# RDF Partnership Export Implementation
**Status**: ✅ COMPLETE
**Date**: 2025-11-07
**Version**: 1.0
## Overview
Partnership data is now serialized to RDF (including JSON-LD) using W3C Organization Ontology (ORG) patterns. The exporter combines several heritage and general-purpose ontologies: CIDOC-CRM, RiC-O, Schema.org, PROV-O, and W3C ORG.
## Implementation
### Files Created/Modified
1. **`src/glam_extractor/exporters/rdf_exporter.py`** (343 lines) - NEW
   - Full RDF exporter with multi-ontology support
   - Partnership serialization using `org:Membership` pattern
   - Supports Turtle, RDF/XML, JSON-LD, N-Triples formats
2. **`src/glam_extractor/exporters/__init__.py`** - UPDATED
   - Exported `RDFExporter` class for public API
3. **`tests/exporters/test_rdf_exporter.py`** (292 lines) - NEW
   - 5 comprehensive tests covering:
     - Single partnership export
     - Multiple partnerships export
     - Partnerships with temporal scope (start/end dates)
     - Full Turtle serialization
     - Complete custodian with all fields
### Test Results
```
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_single_partnership_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_multiple_partnerships_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_partnership_with_temporal_scope PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_export_to_turtle PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterCompleteness::test_full_custodian_export PASSED
5 passed in 1.00s
Coverage: 89% for rdf_exporter.py
```
## RDF Partnership Pattern
### W3C Organization Ontology Pattern
Partnerships are serialized using the `org:Membership` class with the following structure:
```turtle
<custodian-uri>
    org:hasMembership [
        a org:Membership, ghcid:Partnership ;
        org:organization <custodian-uri> ;
        org:member [
            a org:Organization ;
            schema:name "Partner Name"
        ] ;
        org:role "partnership_type" ;
        ghcid:partner_name "Partner Name" ;
        ghcid:partnership_type "partnership_type" ;
        schema:startDate "2022-01-01"^^xsd:date ;
        schema:endDate "2025-12-31"^^xsd:date ;
        schema:description "Partnership description" ;
    ] .
```
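The pattern above can be sketched as a small standalone function. This is not the code in `rdf_exporter.py`, which is not shown here; `turtle_string` and `membership_turtle` are hypothetical names, and the sketch only illustrates which properties are always emitted and which are optional.

```python
from datetime import date
from typing import Optional


def turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def membership_turtle(
    custodian_uri: str,
    partner_name: str,
    partnership_type: str,
    start: Optional[date] = None,
    end: Optional[date] = None,
    description: Optional[str] = None,
) -> str:
    """Render one partnership as a Turtle fragment (prefix declarations omitted)."""
    name = turtle_string(partner_name)
    kind = turtle_string(partnership_type)
    lines = [
        "a org:Membership, ghcid:Partnership",
        f"org:organization <{custodian_uri}>",
        f"org:member [ a org:Organization ; schema:name {name} ]",
        f"org:role {kind}",
        f"ghcid:partner_name {name}",
        f"ghcid:partnership_type {kind}",
    ]
    # Optional properties are emitted only when a value is present.
    if start is not None:
        lines.append(f'schema:startDate "{start.isoformat()}"^^xsd:date')
    if end is not None:
        lines.append(f'schema:endDate "{end.isoformat()}"^^xsd:date')
    if description is not None:
        lines.append(f"schema:description {turtle_string(description)}")
    body = " ;\n".join(f"        {line}" for line in lines)
    return f"<{custodian_uri}>\n    org:hasMembership [\n{body}\n    ] ."


print(
    membership_turtle(
        "https://w3id.org/heritage/custodian/nl/test",
        "Museum Register",
        "national_museum_certification",
        start=date(2022, 1, 1),
    )
)
```

Building the fragment from a list of predicate-object pairs keeps the optional temporal and descriptive properties out of the output when the source data lacks them.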
### Ontology Integration
**Primary Classes**:
- `org:Membership` - W3C Organization Ontology (standardized pattern)
- `ghcid:Partnership` - GHCID-specific type for domain queries

**Properties**:
- `org:organization` - Links membership to custodian
- `org:member` - Partner organization (blank node or URI)
- `org:role` - Partnership type (string literal)
- `schema:startDate` / `schema:endDate` - Temporal scope (XSD dates)
- `schema:description` - Partnership description
- `ghcid:partner_name` - Partner organization name (string)
- `ghcid:partnership_type` - Partnership classification
### Partner Organization Representation
Partners are represented as blank nodes with:
- `rdf:type org:Organization`
- `schema:name` - Organization name

**Future Enhancement**: When partner organizations have resolvable URIs in the GHCID dataset, replace blank nodes with URI references.
## Real-World Example
### Input Data
From Dutch Organizations CSV (`data/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv`):
```
Regionaal Historisch Centrum (RHC) Drents Archief
- ISIL: NL-AsnDA
- City: Assen
- Partnerships:
- Archieven.nl (aggregator_participation)
- Archives Portal Europe (international_aggregator)
- WO2Net (thematic_network)
- OODE24 (Mondriaan) (thematic_network)
```
### RDF Output (Turtle)
```turtle
<NL-AsnDA> a schema:ArchiveOrganization,
        schema:Organization,
        org:Organization,
        prov:Entity,
        ghcid:HeritageCustodian,
        rico:CorporateBody ;
    schema:name "Regionaal Historisch Centrum (RHC) Drents Archief" ;
    org:hasMembership [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "Archieven.nl" ] ;
        org:role "aggregator_participation" ;
        schema:description "Dutch national archive portal" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "Archives Portal Europe" ] ;
        org:role "international_aggregator" ;
        schema:description "European archive aggregation network" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "WO2Net" ] ;
        org:role "thematic_network" ;
        schema:description "WWII heritage network" ;
    ] ,
    [
        a org:Membership, ghcid:Partnership ;
        org:organization <NL-AsnDA> ;
        org:member [ a org:Organization ; schema:name "OODE24 (Mondriaan)" ] ;
        org:role "thematic_network" ;
        schema:description "Mondriaan art project" ;
    ] .
```
## Export Formats Supported
### 1. Turtle (RDF/Turtle)
```python
exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")
```
**Features**:
- Human-readable RDF serialization
- Prefix declarations for all ontologies
- Blank node lists for partnerships
### 2. JSON-LD
```python
jsonld = exporter.export([custodian], format="json-ld")
```
**Features**:
- JSON structure with `@context`, `@type`, `@id`
- Machine-parseable linked data
- Interoperable with IIIF, Web Annotations, Activity Streams
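
To make the `@context`, `@type`, `@id` structure concrete, here is a hand-built sketch of what a compacted JSON-LD document for one partnership could look like. The exact shape produced by `RDFExporter` may differ (for example, it may emit expanded IRIs); `partnership_jsonld` is a hypothetical helper, not part of the exporter.

```python
import json


def partnership_jsonld(
    custodian_id: str,
    custodian_name: str,
    partner_name: str,
    partnership_type: str,
) -> dict:
    """Build a compacted JSON-LD document for a custodian with one partnership."""
    return {
        "@context": {
            "org": "http://www.w3.org/ns/org#",
            "schema": "http://schema.org/",
            "ghcid": "https://w3id.org/heritage/custodian/",
        },
        "@id": custodian_id,
        "@type": ["org:Organization", "ghcid:HeritageCustodian"],
        "schema:name": custodian_name,
        "org:hasMembership": [
            {
                # No "@id" here: the membership is a blank node.
                "@type": ["org:Membership", "ghcid:Partnership"],
                "org:organization": {"@id": custodian_id},
                "org:member": {
                    "@type": "org:Organization",
                    "schema:name": partner_name,
                },
                "org:role": partnership_type,
                "ghcid:partner_name": partner_name,
                "ghcid:partnership_type": partnership_type,
            }
        ],
    }


doc = partnership_jsonld(
    "https://w3id.org/heritage/custodian/nl/test",
    "Test Museum",
    "Museum Register",
    "national_museum_certification",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```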
### 3. RDF/XML
```python
rdfxml = exporter.export([custodian], format="xml")
```
**Features**:
- XML serialization for OAI-PMH, SWORD
- Traditional Semantic Web format
### 4. N-Triples
```python
ntriples = exporter.export([custodian], format="nt")
```
**Features**:
- Simple triple format (subject, predicate, object per line)
- Easy to parse with Unix tools
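
Because each N-Triples line holds exactly one triple, simple line-based processing is enough for quick checks. The sketch below counts triples per predicate using only the standard library; the sample data is illustrative (with an assumed `https://example.org/` subject IRI), not actual exporter output.

```python
from collections import Counter


def predicate_counts(ntriples: str) -> Counter:
    """Count triples per predicate in an N-Triples document."""
    counts: Counter = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace; the object may,
        # so split at most twice and ignore the remainder.
        _subject, predicate, _rest = line.split(None, 2)
        counts[predicate] += 1
    return counts


sample = """\
<https://example.org/NL-AsnDA> <http://www.w3.org/ns/org#hasMembership> _:b0 .
_:b0 <http://www.w3.org/ns/org#role> "thematic_network" .
_:b0 <http://schema.org/description> "WWII heritage network" .
<https://example.org/NL-AsnDA> <http://www.w3.org/ns/org#hasMembership> _:b1 .
"""

counts = predicate_counts(sample)
print(counts["<http://www.w3.org/ns/org#hasMembership>"])  # prints 2
```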
## Usage Examples
### Export Single Custodian
```python
from glam_extractor.exporters.rdf_exporter import RDFExporter
# Assumption: InstitutionType and Provenance are importable from the same
# models module as HeritageCustodian and Partnership.
from glam_extractor.models import (
    HeritageCustodian,
    InstitutionType,
    Partnership,
    Provenance,
)

custodian = HeritageCustodian(
    id="https://w3id.org/heritage/custodian/nl/test",
    name="Test Museum",
    institution_type=InstitutionType.MUSEUM,
    partnerships=[
        Partnership(
            partner_name="Museum Register",
            partnership_type="national_museum_certification"
        )
    ],
    provenance=Provenance(...)  # provenance fields elided
)

exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")
print(turtle)
```
### Export Multiple Custodians
```python
exporter = RDFExporter()
for custodian in custodians:
    exporter.add_custodian(custodian)

# Export all at once
turtle = exporter.export(custodians, format="turtle")
```
### Export to File
```python
exporter = RDFExporter()
turtle = exporter.export(custodians, format="turtle")

with open("output.ttl", "w", encoding="utf-8") as f:
    f.write(turtle)
```
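
When writing more than one format, it can help to derive the file extension from the format name. The sketch below assumes a mapping from the format strings used in this document to conventional extensions; `EXTENSIONS` and `write_serialization` are hypothetical names and are not part of `RDFExporter`.

```python
import tempfile
from pathlib import Path

# Assumed mapping from format names to conventional file extensions.
EXTENSIONS = {
    "turtle": ".ttl",
    "json-ld": ".jsonld",
    "xml": ".rdf",
    "nt": ".nt",
}


def write_serialization(text: str, directory: Path, stem: str, rdf_format: str) -> Path:
    """Write serialized RDF text to `<directory>/<stem><extension>` as UTF-8."""
    path = directory / f"{stem}{EXTENSIONS[rdf_format]}"
    path.write_text(text, encoding="utf-8")
    return path


# Usage with a placeholder string standing in for exporter output.
with tempfile.TemporaryDirectory() as tmp:
    written = write_serialization("# empty graph\n", Path(tmp), "custodians", "turtle")
    written_name = written.name
    round_trip = written.read_text(encoding="utf-8")

print(written_name)
```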
## Ontology Namespaces
The RDF exporter integrates the following ontologies:
| Prefix | Namespace | Purpose |
|--------|-----------|---------|
| `ghcid` | `https://w3id.org/heritage/custodian/` | GHCID domain classes and properties |
| `cidoc` | `http://www.cidoc-crm.org/cidoc-crm/` | CIDOC Conceptual Reference Model (cultural heritage) |
| `rico` | `https://www.ica.org/standards/RiC/ontology#` | Records in Contexts (archival description) |
| `schema` | `http://schema.org/` | Schema.org vocabulary (web search, IIIF) |
| `org` | `http://www.w3.org/ns/org#` | W3C Organization Ontology (partnerships, hierarchy) |
| `prov` | `http://www.w3.org/ns/prov#` | W3C PROV Ontology (provenance tracking) |
| `foaf` | `http://xmlns.com/foaf/0.1/` | Friend of a Friend (agents, names) |
| `dcterms` | `http://purl.org/dc/terms/` | Dublin Core metadata terms |
## Design Decisions
### Why org:Membership?
The W3C Organization Ontology provides `org:Membership` to represent the membership or affiliation of an agent with an organization, modeled as an n-ary relationship between the agent, the organization, and a role. This fits heritage institution partnerships well:
- **Standardized pattern** - Established W3C recommendation
- **Flexible scope** - Supports temporal bounds, roles, descriptions
- **Interoperable** - Used by government data portals (UK, EU)
- **Extensible** - Can add GHCID-specific properties via `ghcid:Partnership`
### Blank Nodes vs. URIs
- **Current**: Partner organizations are blank nodes
- **Rationale**: Most partners don't have GHCIDs (yet)
- **Future**: Replace blank nodes with URIs when partners are in GHCID dataset

Example migration:
```turtle
# Current (blank node)
org:member [ a org:Organization ; schema:name "Museum Register" ]
# Future (URI reference)
org:member <https://w3id.org/heritage/custodian/nl/museum-register>
```
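
The migration could be handled by a lookup that falls back to a blank node when no URI is known. This is a minimal sketch under that assumption; `partner_member_turtle` and the `known_uris` mapping are hypothetical and do not describe how the exporter resolves partners today.

```python
from typing import Mapping


def partner_member_turtle(partner_name: str, known_uris: Mapping[str, str]) -> str:
    """Return the `org:member` predicate-object pair for a partner.

    Uses a URI reference when the partner is known, otherwise a blank node
    carrying only the partner's name.
    """
    uri = known_uris.get(partner_name)
    if uri is not None:
        return f"org:member <{uri}>"
    escaped = partner_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'org:member [ a org:Organization ; schema:name "{escaped}" ]'


known = {
    "Museum Register": "https://w3id.org/heritage/custodian/nl/museum-register",
}
print(partner_member_turtle("Museum Register", known))
print(partner_member_turtle("WO2Net", known))
```

Keeping the fallback in one place means existing exports stay valid while partners are added to the dataset incrementally.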
### Dual Typing (org:Membership + ghcid:Partnership)
Memberships are typed as **both** `org:Membership` and `ghcid:Partnership`:
```turtle
[ a org:Membership, ghcid:Partnership ; ... ]
```
**Rationale**:
- `org:Membership` - Standard interoperability with non-GLAM systems
- `ghcid:Partnership` - Domain-specific queries (e.g., SPARQL: `?s org:hasMembership ?m . ?m a ghcid:Partnership`)
## SPARQL Query Examples
### Find All Partnerships of an Institution
```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
SELECT ?partner ?type WHERE {
  <NL-AsnDA> org:hasMembership ?membership .
  ?membership a ghcid:Partnership ;
              ghcid:partner_name ?partner ;
              ghcid:partnership_type ?type .
}
```
### Find All Institutions in a Network
```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX schema: <http://schema.org/>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
SELECT ?institution ?name WHERE {
  ?institution org:hasMembership ?membership .
  ?membership org:role "thematic_network" ;
              ghcid:partner_name "WO2Net" .
  ?institution schema:name ?name .
}
```
### Find Partnerships with Temporal Scope
```sparql
PREFIX schema: <http://schema.org/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?institution ?partner ?start ?end WHERE {
  ?institution org:hasMembership ?membership .
  ?membership ghcid:partner_name ?partner ;
              schema:startDate ?start ;
              schema:endDate ?end .
  FILTER(?end > "2025-01-01"^^xsd:date)
}
```
## Next Steps
### Task 3: Conversation JSON Parser Enhancement
Add Partnership extraction to `src/glam_extractor/parsers/conversation.py`:
1. Pattern detection for partnership mentions
2. Classify partnership types from context
3. Extract temporal scope when mentioned
4. Link to partner organizations if identifiable
### Task 4: Global Partnership Taxonomy Documentation
Document the partnership type taxonomy in `docs/PARTNERSHIP_TAXONOMY.md`:
1. **Dutch Partnership Types** (18 types observed):
   - `national_museum_certification` - Museum Register
   - `aggregator_participation` - Collectie Nederland, Archieven.nl
   - `digitization_program` - Versnellen, DC4EU
   - `thematic_network` - WO2Net, Mondriaan, Van Gogh Worldwide
   - (and 14 more types)
2. **Global Partnership Categories**:
   - National certifications/registers
   - Aggregation platforms
   - Digitization programs
   - Thematic networks
   - International collaborations
   - Funding partnerships
   - Technical infrastructure
3. **Mapping to Controlled Vocabularies**:
   - AAT (Art & Architecture Thesaurus)
   - PROV-O activity types
   - EU Core Public Organisation Vocabulary (CPOV)
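
Once documented, the taxonomy could also be mirrored in code as a closed vocabulary. The sketch below is hypothetical: it lists only the five type identifiers that appear in this document, and `PartnershipType` and `parse_partnership_type` are illustrative names rather than existing project classes.

```python
from enum import Enum
from typing import Optional


class PartnershipType(str, Enum):
    """Partnership types named in this document (the full taxonomy has more)."""

    NATIONAL_MUSEUM_CERTIFICATION = "national_museum_certification"
    AGGREGATOR_PARTICIPATION = "aggregator_participation"
    INTERNATIONAL_AGGREGATOR = "international_aggregator"
    DIGITIZATION_PROGRAM = "digitization_program"
    THEMATIC_NETWORK = "thematic_network"


def parse_partnership_type(value: str) -> Optional[PartnershipType]:
    """Return the matching enum member, or None for an unrecognized type."""
    try:
        return PartnershipType(value.strip().lower())
    except ValueError:
        return None


print(parse_partnership_type("thematic_network"))
print(parse_partnership_type("not_a_known_type"))
```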
## References
- **W3C Organization Ontology**: https://www.w3.org/TR/vocab-org/
- **CIDOC-CRM**: https://www.cidoc-crm.org/
- **RiC-O**: https://www.ica.org/standards/RiC/ontology
- **PROV-O**: https://www.w3.org/TR/prov-o/
- **Schema.org**: https://schema.org/
- **LinkML Schema**: `schemas/collections.yaml` (Partnership class definition)
---
**Contributors**: OpenCODE AI Agent
**License**: CC0 1.0 Universal (Public Domain)
**Project**: GLAM Data Extractor - Global Heritage Custodian Identifier (GHCID) System