RDF Partnership Export Implementation

Status: COMPLETE
Date: 2025-11-07
Version: 1.0

Overview

This document describes the RDF/JSON-LD serialization of Partnership data, which follows W3C Organization Ontology (ORG) patterns. The exporter combines heritage-domain and general-purpose ontologies: CIDOC-CRM, RiC-O, Schema.org, PROV-O, and W3C ORG.

Implementation

Files Created/Modified

  1. src/glam_extractor/exporters/rdf_exporter.py (343 lines) - NEW

    • Full RDF exporter with multi-ontology support
    • Partnership serialization using org:Membership pattern
    • Supports Turtle, RDF/XML, JSON-LD, N-Triples formats
  2. src/glam_extractor/exporters/__init__.py - UPDATED

    • Exported RDFExporter class for public API
  3. tests/exporters/test_rdf_exporter.py (292 lines) - NEW

    • 5 comprehensive tests covering:
      • Single partnership export
      • Multiple partnerships export
      • Partnerships with temporal scope (start/end dates)
      • Full Turtle serialization
      • Complete custodian with all fields

Test Results

tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_single_partnership_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_multiple_partnerships_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_partnership_with_temporal_scope PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_export_to_turtle PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterCompleteness::test_full_custodian_export PASSED

5 passed in 1.00s
Coverage: 89% for rdf_exporter.py

RDF Partnership Pattern

W3C Organization Ontology Pattern

Partnerships are serialized using the org:Membership class with the following structure:

<custodian-uri>
  org:hasMembership [
    a org:Membership, ghcid:Partnership ;
    org:organization <custodian-uri> ;
    org:member [
      a org:Organization ;
      schema:name "Partner Name"
    ] ;
    org:role "partnership_type" ;
    ghcid:partner_name "Partner Name" ;
    ghcid:partnership_type "partnership_type" ;
    schema:startDate "2022-01-01"^^xsd:date ;
    schema:endDate "2025-12-31"^^xsd:date ;
    schema:description "Partnership description" ;
  ] .

Ontology Integration

Primary Classes:

  • org:Membership - W3C Organization Ontology (standardized pattern)
  • ghcid:Partnership - GHCID-specific type for domain queries

Properties:

  • org:organization - Links membership to custodian
  • org:member - Partner organization (blank node or URI)
  • org:role - Partnership type (string literal)
  • schema:startDate / schema:endDate - Temporal scope (XSD dates)
  • schema:description - Partnership description
  • ghcid:partner_name - Partner organization name (string)
  • ghcid:partnership_type - Partnership classification
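
The pattern and property list above can be illustrated without an RDF library. The following is a minimal sketch, not the exporter's actual code: `PartnershipSketch`, `turtle_string`, and `membership_turtle` are hypothetical names, and the real `Partnership` model may have different fields. It renders one partnership as the blank-node block shown in the pattern and emits optional properties only when they are set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PartnershipSketch:
    """Hypothetical stand-in for the project's Partnership model."""
    partner_name: str
    partnership_type: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    description: Optional[str] = None


def turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle literal (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def membership_turtle(custodian_uri: str, p: PartnershipSketch) -> str:
    """Render one partnership as an org:Membership blank node (prefixes not included)."""
    name = turtle_string(p.partner_name)
    kind = turtle_string(p.partnership_type)
    lines = [
        "a org:Membership, ghcid:Partnership",
        f"org:organization <{custodian_uri}>",
        f"org:member [ a org:Organization ; schema:name {name} ]",
        f"org:role {kind}",
        f"ghcid:partner_name {name}",
        f"ghcid:partnership_type {kind}",
    ]
    # Optional properties are emitted only when a value is present.
    if p.start_date:
        lines.append(f'schema:startDate "{p.start_date.isoformat()}"^^xsd:date')
    if p.end_date:
        lines.append(f'schema:endDate "{p.end_date.isoformat()}"^^xsd:date')
    if p.description:
        lines.append(f"schema:description {turtle_string(p.description)}")
    body = " ;\n    ".join(lines)
    return f"<{custodian_uri}> org:hasMembership [\n    {body}\n] ."


print(membership_turtle(
    "https://w3id.org/heritage/custodian/nl/test",
    PartnershipSketch("WO2Net", "thematic_network", start_date=date(2022, 1, 1)),
))
```

The output still needs the `@prefix` declarations for `org`, `schema`, `ghcid`, and `xsd` before a Turtle parser will accept it.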

Partner Organization Representation

Partners are represented as blank nodes with:

  • rdf:type org:Organization
  • schema:name - Organization name

Future Enhancement: When partner organizations have resolvable URIs in the GHCID dataset, replace blank nodes with URI references.

Real-World Example

Input Data

From Dutch Organizations CSV (data/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv):

Regionaal Historisch Centrum (RHC) Drents Archief
- ISIL: NL-AsnDA
- City: Assen
- Partnerships:
  - Archieven.nl (aggregator_participation)
  - Archives Portal Europe (international_aggregator)
  - WO2Net (thematic_network)
  - OODE24 (Mondriaan) (thematic_network)

RDF Output (Turtle)

<NL-AsnDA> a schema:ArchiveOrganization,
        schema:Organization,
        org:Organization,
        prov:Entity,
        ghcid:HeritageCustodian,
        rico:CorporateBody ;
    schema:name "Regionaal Historisch Centrum (RHC) Drents Archief" ;
    
    org:hasMembership [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "Archieven.nl" ] ;
        org:role "aggregator_participation" ;
        schema:description "Dutch national archive portal" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "Archives Portal Europe" ] ;
        org:role "international_aggregator" ;
        schema:description "European archive aggregation network" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "WO2Net" ] ;
        org:role "thematic_network" ;
        schema:description "WWII heritage network" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "OODE24 (Mondriaan)" ] ;
        org:role "thematic_network" ;
        schema:description "Mondriaan art project" ;
    ] .

Export Formats Supported

1. Turtle (RDF/Turtle)

exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")

Features:

  • Human-readable RDF serialization
  • Prefix declarations for all ontologies
  • Blank node lists for partnerships

2. JSON-LD

jsonld = exporter.export([custodian], format="json-ld")

Features:

  • JSON structure with @context, @type, @id
  • Machine-parseable linked data
  • Interoperable with IIIF, Web Annotations, Activity Streams
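
To make the `@context`, `@type`, and `@id` structure concrete, here is a hand-built sketch of what a JSON-LD document for one custodian might look like. It is an assumption about the general shape only: the exact keys, nesting, and context layout depend on the serializer, and the exporter's real output may differ.

```python
import json

# Hypothetical context: prefix IRIs are the ones listed under "Ontology
# Namespaces" in this document; the real exporter's @context may differ.
context = {
    "org": "http://www.w3.org/ns/org#",
    "schema": "http://schema.org/",
    "ghcid": "https://w3id.org/heritage/custodian/",
}

# One partnership, following the org:Membership pattern described earlier.
membership = {
    "@type": ["org:Membership", "ghcid:Partnership"],
    "org:organization": {"@id": "https://w3id.org/heritage/custodian/nl/test"},
    "org:member": {"@type": "org:Organization", "schema:name": "Museum Register"},
    "org:role": "national_museum_certification",
}

document = {
    "@context": context,
    "@id": "https://w3id.org/heritage/custodian/nl/test",
    "@type": "ghcid:HeritageCustodian",
    "schema:name": "Test Museum",
    "org:hasMembership": [membership],
}

jsonld_text = json.dumps(document, indent=2, ensure_ascii=False)
print(jsonld_text)
```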

3. RDF/XML

rdfxml = exporter.export([custodian], format="xml")

Features:

  • XML serialization for OAI-PMH, SWORD
  • Traditional Semantic Web format

4. N-Triples

ntriples = exporter.export([custodian], format="nt")

Features:

  • Simple triple format (subject, predicate, object per line)
  • Easy to parse with Unix tools
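
Because each line holds exactly one triple, simple line-based processing is often enough. The sketch below counts predicates in an N-Triples string using only the standard library. The sample data is hand-written for illustration (including the `example.org` subject IRI and blank node labels) and is not actual exporter output.

```python
from collections import Counter

# Hand-written sample in N-Triples form; IRIs and blank node labels are illustrative.
sample = """\
<https://example.org/NL-AsnDA> <http://www.w3.org/ns/org#hasMembership> _:m1 .
_:m1 <http://www.w3.org/ns/org#role> "thematic_network" .
_:m1 <https://w3id.org/heritage/custodian/partner_name> "WO2Net" .
<https://example.org/NL-AsnDA> <http://www.w3.org/ns/org#hasMembership> _:m2 .
_:m2 <http://www.w3.org/ns/org#role> "aggregator_participation" .
"""


def predicate_counts(ntriples: str) -> Counter:
    """Count predicates by splitting each line on whitespace.

    Subjects and predicates never contain spaces in N-Triples, so the second
    whitespace-separated token is always the predicate.
    """
    counts = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _subject, predicate, _rest = line.split(None, 2)
        counts[predicate] += 1
    return counts


counts = predicate_counts(sample)
for predicate, n in counts.most_common():
    print(n, predicate)
```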

Usage Examples

Export Single Custodian

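from glam_extractor.exporters.rdf_exporter import RDFExporter
# Assumes InstitutionType and Provenance live in the same models module.
from glam_extractor.models import (
    HeritageCustodian,
    InstitutionType,
    Partnership,
    Provenance,
)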

custodian = HeritageCustodian(
    id="https://w3id.org/heritage/custodian/nl/test",
    name="Test Museum",
    institution_type=InstitutionType.MUSEUM,
    partnerships=[
        Partnership(
            partner_name="Museum Register",
            partnership_type="national_museum_certification"
        )
    ],
    provenance=Provenance(...)
)

exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")
print(turtle)

Export Multiple Custodians

exporter = RDFExporter()
for custodian in custodians:
    exporter.add_custodian(custodian)

# Export all at once
turtle = exporter.export(custodians, format="turtle")

Export to File

exporter = RDFExporter()
turtle = exporter.export(custodians, format="turtle")

with open("output.ttl", "w", encoding="utf-8") as f:
    f.write(turtle)
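
When writing several formats, it helps to derive the file name from the format string. The helper below is a small sketch: the format names are the ones used in this document, while `EXTENSIONS` and `output_path` are hypothetical names and the file extensions are conventional choices rather than anything the exporter prescribes.

```python
from pathlib import Path

# Format names are the ones passed to export() in this document; the
# extensions are conventional choices, not dictated by the exporter.
EXTENSIONS = {
    "turtle": ".ttl",
    "json-ld": ".jsonld",
    "xml": ".rdf",
    "nt": ".nt",
}


def output_path(directory: str, stem: str, fmt: str) -> Path:
    """Build an output path such as out/custodians.ttl for a given format."""
    try:
        suffix = EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return Path(directory) / f"{stem}{suffix}"


for fmt in EXTENSIONS:
    print(fmt, "->", output_path("out", "custodians", fmt))
```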

Ontology Namespaces

The RDF exporter integrates the following ontologies:

| Prefix | Namespace | Purpose |
|---------|-----------|---------|
| ghcid | https://w3id.org/heritage/custodian/ | GHCID domain classes and properties |
| cidoc | http://www.cidoc-crm.org/cidoc-crm/ | CIDOC Conceptual Reference Model (cultural heritage) |
| rico | https://www.ica.org/standards/RiC/ontology# | Records in Contexts (archival description) |
| schema | http://schema.org/ | Schema.org vocabulary (web search, IIIF) |
| org | http://www.w3.org/ns/org# | W3C Organization Ontology (partnerships, hierarchy) |
| prov | http://www.w3.org/ns/prov# | W3C PROV Ontology (provenance tracking) |
| foaf | http://xmlns.com/foaf/0.1/ | Friend of a Friend (agents, names) |
| dcterms | http://purl.org/dc/terms/ | Dublin Core metadata terms |

Design Decisions

Why org:Membership?

The W3C Organization Ontology provides org:Membership for representing the membership or affiliation of agents with organizations, optionally qualified by a role and a time period. This fits heritage institution partnerships well:

  • Standardized pattern - Established W3C recommendation
  • Flexible scope - Supports temporal bounds, roles, descriptions
  • Interoperable - Used by government data portals (UK, EU)
  • Extensible - Can add GHCID-specific properties via ghcid:Partnership

Blank Nodes vs. URIs

Current: Partner organizations are blank nodes
Rationale: Most partners don't have GHCIDs (yet)
Future: Replace blank nodes with URIs when partners are in GHCID dataset

Example migration:

# Current (blank node)
org:member [ a org:Organization ; schema:name "Museum Register" ]

# Future (URI reference)
org:member <https://w3id.org/heritage/custodian/nl/museum-register>
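
The migration can be handled with a lookup at serialization time. This is a sketch under stated assumptions: `KNOWN_PARTNER_URIS` and `member_term` are hypothetical names, and the lookup table stands in for whatever the GHCID dataset eventually provides.

```python
from typing import Dict

# Hypothetical lookup of partners that already have a URI; in practice this
# would be derived from the GHCID dataset.
KNOWN_PARTNER_URIS: Dict[str, str] = {
    "Museum Register": "https://w3id.org/heritage/custodian/nl/museum-register",
}


def member_term(partner_name: str, known: Dict[str, str] = KNOWN_PARTNER_URIS) -> str:
    """Return the Turtle object for org:member: a URI if known, else a blank node."""
    uri = known.get(partner_name)
    if uri is not None:
        return f"<{uri}>"
    # Fall back to the current behaviour: an anonymous organization with a name.
    escaped = partner_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'[ a org:Organization ; schema:name "{escaped}" ]'


print("org:member", member_term("Museum Register"))
print("org:member", member_term("WO2Net"))
```

Keeping the fallback in one function means existing output stays unchanged for partners that have no URI yet.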

Dual Typing (org:Membership + ghcid:Partnership)

Memberships are typed as both org:Membership and ghcid:Partnership:

[ a org:Membership, ghcid:Partnership ; ... ]

Rationale:

  • org:Membership - Standard interoperability with non-GLAM systems
  • ghcid:Partnership - Domain-specific queries (e.g., SPARQL: ?s org:hasMembership ?m . ?m a ghcid:Partnership)

SPARQL Query Examples

Find All Partnerships of an Institution

PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>

SELECT ?partner ?type WHERE {
  <NL-AsnDA> org:hasMembership ?membership .
  ?membership a ghcid:Partnership ;
    ghcid:partner_name ?partner ;
    ghcid:partnership_type ?type .
}

Find All Institutions in a Network

PREFIX org: <http://www.w3.org/ns/org#>
PREFIX schema: <http://schema.org/>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>

SELECT ?institution ?name WHERE {
  ?institution org:hasMembership ?membership .
  ?membership org:role "thematic_network" ;
    ghcid:partner_name "WO2Net" .
  ?institution schema:name ?name .
}

Find Partnerships with Temporal Scope

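PREFIX schema: <http://schema.org/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>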

SELECT ?institution ?partner ?start ?end WHERE {
  ?institution org:hasMembership ?membership .
  ?membership ghcid:partner_name ?partner ;
    schema:startDate ?start ;
    schema:endDate ?end .
  FILTER(?end > "2025-01-01"^^xsd:date)
}
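
Note that this query only matches memberships that carry both a start and an end date, so open-ended partnerships are excluded. The same selection logic is sketched in Python below, with made-up rows and a hypothetical `ending_after` helper, to make that behaviour explicit.

```python
from datetime import date
from typing import List, Optional, Tuple

# (partner name, start date, end date); values are made up for illustration.
Row = Tuple[str, Optional[str], Optional[str]]

rows: List[Row] = [
    ("Partner A", "2022-01-01", "2025-12-31"),
    ("Partner B", "2020-06-01", "2024-06-30"),
    ("Partner C", "2023-03-15", None),  # open-ended: no end date recorded
]


def ending_after(rows: List[Row], cutoff: date) -> List[str]:
    """Mirror the query: keep rows that have both dates and end after the cutoff."""
    matches = []
    for partner, start, end in rows:
        if start is None or end is None:
            continue  # the query's required triple patterns would not match
        if date.fromisoformat(end) > cutoff:
            matches.append(partner)
    return matches


print(ending_after(rows, date(2025, 1, 1)))
```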

Next Steps

Task 3: Conversation JSON Parser Enhancement

Add Partnership extraction to src/glam_extractor/parsers/conversation.py:

  1. Pattern detection for partnership mentions
  2. Classify partnership types from context
  3. Extract temporal scope when mentioned
  4. Link to partner organizations if identifiable
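
As a starting point for steps 1 and 2, partnership mentions could be detected by matching known partner names. The sketch below is one possible approach, not the planned implementation: `KNOWN_PARTNERS` and `find_partnerships` are hypothetical names, and the name-to-type mapping reuses only the taxonomy examples given in this document.

```python
import re
from typing import Dict, List, Tuple

# Partner names and types taken from the taxonomy examples in this document;
# a real implementation would need a much larger, curated list.
KNOWN_PARTNERS: Dict[str, str] = {
    "Museum Register": "national_museum_certification",
    "Collectie Nederland": "aggregator_participation",
    "Archieven.nl": "aggregator_participation",
    "WO2Net": "thematic_network",
    "Van Gogh Worldwide": "thematic_network",
    "DC4EU": "digitization_program",
}

# One alternation over all names, longest first so longer names win on overlap.
_PATTERN = re.compile(
    "|".join(re.escape(n) for n in sorted(KNOWN_PARTNERS, key=len, reverse=True)),
    re.IGNORECASE,
)
_CANONICAL = {name.lower(): name for name in KNOWN_PARTNERS}


def find_partnerships(text: str) -> List[Tuple[str, str]]:
    """Return (partner_name, partnership_type) pairs in order of first mention."""
    found: List[Tuple[str, str]] = []
    for match in _PATTERN.finditer(text):
        name = _CANONICAL[match.group(0).lower()]
        pair = (name, KNOWN_PARTNERS[name])
        if pair not in found:  # report each partner once
            found.append(pair)
    return found


print(find_partnerships("The archive publishes via archieven.nl and takes part in WO2Net."))
```

Plain substring matching like this will also fire inside longer words, so a production version would likely need word-boundary handling and context checks.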

Task 4: Global Partnership Taxonomy Documentation

Document the partnership type taxonomy in docs/PARTNERSHIP_TAXONOMY.md:

  1. Dutch Partnership Types (18 types observed):

    • national_museum_certification - Museum Register
    • aggregator_participation - Collectie Nederland, Archieven.nl
    • digitization_program - Versnellen, DC4EU
    • thematic_network - WO2Net, Mondriaan, Van Gogh Worldwide
    • (and 14 more types)
  2. Global Partnership Categories:

    • National certifications/registers
    • Aggregation platforms
    • Digitization programs
    • Thematic networks
    • International collaborations
    • Funding partnerships
    • Technical infrastructure
  3. Mapping to Controlled Vocabularies:

    • AAT (Art & Architecture Thesaurus)
    • PROV-O activity types
    • EU corporate vocabularies (CPOV)

Contributors: OpenCODE AI Agent
License: CC0 1.0 Universal (Public Domain)
Project: GLAM Data Extractor - Global Heritage Custodian Identifier (GHCID) System