# RDF Partnership Export Implementation

**Status**: ✅ COMPLETE
**Date**: 2025-11-07
**Version**: 1.0

## Overview

Implemented RDF serialization (including JSON-LD) of Partnership data using W3C Organization Ontology (ORG) patterns. The implementation combines heritage-specific ontologies (CIDOC-CRM, RiC-O) with general-purpose vocabularies (Schema.org, PROV-O, W3C ORG).

## Implementation

### Files Created/Modified

1. **`src/glam_extractor/exporters/rdf_exporter.py`** (343 lines) - NEW
   - Full RDF exporter with multi-ontology support
   - Partnership serialization using `org:Membership` pattern
   - Supports Turtle, RDF/XML, JSON-LD, N-Triples formats

2. **`src/glam_extractor/exporters/__init__.py`** - UPDATED
   - Exported `RDFExporter` class for public API

3. **`tests/exporters/test_rdf_exporter.py`** (292 lines) - NEW
   - 5 comprehensive tests covering:
     - Single partnership export
     - Multiple partnerships export
     - Partnerships with temporal scope (start/end dates)
     - Full Turtle serialization
     - Complete custodian with all fields

### Test Results

```
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_single_partnership_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_multiple_partnerships_export PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_partnership_with_temporal_scope PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterPartnership::test_export_to_turtle PASSED
tests/exporters/test_rdf_exporter.py::TestRDFExporterCompleteness::test_full_custodian_export PASSED

5 passed in 1.00s
Coverage: 89% for rdf_exporter.py
```

## RDF Partnership Pattern

### W3C Organization Ontology Pattern

Partnerships are serialized using the `org:Membership` class with the following structure:

```turtle
<custodian-uri>
  org:hasMembership [
    a org:Membership, ghcid:Partnership ;
    org:organization <custodian-uri> ;
    org:member [
      a org:Organization ;
      schema:name "Partner Name"
    ] ;
    org:role "partnership_type" ;
    ghcid:partner_name "Partner Name" ;
    ghcid:partnership_type "partnership_type" ;
    schema:startDate "2022-01-01"^^xsd:date ;
    schema:endDate "2025-12-31"^^xsd:date ;
    schema:description "Partnership description" ;
  ] .
```

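To make the pattern concrete, here is a minimal sketch that renders one partnership into the Turtle structure above using plain string formatting. The helper name `membership_turtle` and its parameters are hypothetical and not part of `RDFExporter`; the real exporter may build the graph differently, for example through an RDF library. The literal escaping covers only backslashes and double quotes.

```python
from typing import Optional


def membership_turtle(
    partner_name: str,
    partnership_type: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    description: Optional[str] = None,
    custodian: str = "custodian-uri",
) -> str:
    """Render one partnership as an org:Membership blank node (hypothetical helper)."""

    def quote(text: str) -> str:
        # Escape backslashes and double quotes for a Turtle string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "a org:Membership, ghcid:Partnership",
        f"org:organization <{custodian}>",
        f"org:member [ a org:Organization ; schema:name {quote(partner_name)} ]",
        f"org:role {quote(partnership_type)}",
        f"ghcid:partner_name {quote(partner_name)}",
        f"ghcid:partnership_type {quote(partnership_type)}",
    ]
    # Optional properties are emitted only when a value is present.
    if start:
        lines.append(f"schema:startDate {quote(start)}^^xsd:date")
    if end:
        lines.append(f"schema:endDate {quote(end)}^^xsd:date")
    if description:
        lines.append(f"schema:description {quote(description)}")

    body = " ;\n".join("    " + line for line in lines)
    return f"<{custodian}>\n  org:hasMembership [\n{body}\n  ] ."


print(membership_turtle("WO2Net", "thematic_network", start="2022-01-01"))
```

Omitting `end` and `description` simply drops those predicates, which matches the real-world example later in this document where most partnerships carry no dates.
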
### Ontology Integration

**Primary Classes**:
- `org:Membership` - W3C Organization Ontology (standardized pattern)
- `ghcid:Partnership` - GHCID-specific type for domain queries

**Properties**:
- `org:organization` - Links membership to custodian
- `org:member` - Partner organization (blank node or URI)
- `org:role` - Partnership type (string literal)
- `schema:startDate` / `schema:endDate` - Temporal scope (XSD dates)
- `schema:description` - Partnership description
- `ghcid:partner_name` - Partner organization name (string)
- `ghcid:partnership_type` - Partnership classification

### Partner Organization Representation

Partners are represented as blank nodes with:
- `rdf:type org:Organization`
- `schema:name` - Organization name

**Future Enhancement**: When partner organizations have resolvable URIs in the GHCID dataset, replace blank nodes with URI references.

## Real-World Example

### Input Data

From Dutch Organizations CSV (`data/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv`):

```
Regionaal Historisch Centrum (RHC) Drents Archief
- ISIL: NL-AsnDA
- City: Assen
- Partnerships:
  - Archieven.nl (aggregator_participation)
  - Archives Portal Europe (international_aggregator)
  - WO2Net (thematic_network)
  - OODE24 (Mondriaan) (thematic_network)
```

### RDF Output (Turtle)

```turtle
<NL-AsnDA> a schema:ArchiveOrganization,
    schema:Organization,
    org:Organization,
    prov:Entity,
    ghcid:HeritageCustodian,
    rico:CorporateBody ;
  schema:name "Regionaal Historisch Centrum (RHC) Drents Archief" ;

  org:hasMembership [
    a org:Membership, ghcid:Partnership ;
    org:organization <NL-AsnDA> ;
    org:member [ a org:Organization ; schema:name "Archieven.nl" ] ;
    org:role "aggregator_participation" ;
    schema:description "Dutch national archive portal" ;
  ] ,
  [
    a org:Membership, ghcid:Partnership ;
    org:organization <NL-AsnDA> ;
    org:member [ a org:Organization ; schema:name "Archives Portal Europe" ] ;
    org:role "international_aggregator" ;
    schema:description "European archive aggregation network" ;
  ] ,
  [
    a org:Membership, ghcid:Partnership ;
    org:organization <NL-AsnDA> ;
    org:member [ a org:Organization ; schema:name "WO2Net" ] ;
    org:role "thematic_network" ;
    schema:description "WWII heritage network" ;
  ] ,
  [
    a org:Membership, ghcid:Partnership ;
    org:organization <NL-AsnDA> ;
    org:member [ a org:Organization ; schema:name "OODE24 (Mondriaan)" ] ;
    org:role "thematic_network" ;
    schema:description "Mondriaan art project" ;
  ] .
```

## Export Formats Supported

### 1. Turtle (RDF/Turtle)

```python
exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")
```

**Features**:
- Human-readable RDF serialization
- Prefix declarations for all ontologies
- Blank node lists for partnerships

### 2. JSON-LD

```python
jsonld = exporter.export([custodian], format="json-ld")
```

**Features**:
- JSON structure with `@context`, `@type`, `@id`
- Machine-parseable linked data
- Same serialization family used by IIIF, W3C Web Annotation, and Activity Streams

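As a rough illustration of what such a document can look like, the sketch below builds a JSON-LD structure by hand and reads the partner names back with the standard `json` module. The exact shape emitted by `RDFExporter` is an assumption here (a real serializer may flatten nodes or use a different `@context`), and `partner_names` is a hypothetical helper.

```python
import json

# Illustrative JSON-LD document. The @context reuses namespaces from this
# document; the overall layout is an assumption, not the exporter's output.
doc = {
    "@context": {
        "org": "http://www.w3.org/ns/org#",
        "schema": "http://schema.org/",
        "ghcid": "https://w3id.org/heritage/custodian/",
    },
    "@id": "https://w3id.org/heritage/custodian/nl/test",
    "@type": ["org:Organization", "ghcid:HeritageCustodian"],
    "schema:name": "Test Museum",
    "org:hasMembership": [
        {
            # No @id: the membership is a blank node, as in the Turtle pattern.
            "@type": ["org:Membership", "ghcid:Partnership"],
            "org:member": {
                "@type": "org:Organization",
                "schema:name": "Museum Register",
            },
            "org:role": "national_museum_certification",
        }
    ],
}


def partner_names(jsonld_doc: dict) -> list:
    """Collect schema:name of every partner in a nested (non-flattened) document."""
    memberships = jsonld_doc.get("org:hasMembership", [])
    return [m["org:member"]["schema:name"] for m in memberships]


serialized = json.dumps(doc, indent=2, ensure_ascii=False)
print(partner_names(json.loads(serialized)))
```
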
### 3. RDF/XML

```python
rdfxml = exporter.export([custodian], format="xml")
```

**Features**:
- XML serialization for OAI-PMH, SWORD
- Traditional Semantic Web format

### 4. N-Triples

```python
ntriples = exporter.export([custodian], format="nt")
```

**Features**:
- Simple triple format (subject, predicate, object per line)
- Easy to parse with Unix tools

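The line-oriented layout also makes N-Triples easy to process in a few lines of Python. The sketch below counts how often each predicate occurs; the `SAMPLE` data and the `example.org` subject are made up for illustration and are not actual exporter output.

```python
from collections import Counter

# Hypothetical N-Triples sample, one triple per line.
SAMPLE = """\
<https://example.org/NL-AsnDA> <http://schema.org/name> "Drents Archief" .
<https://example.org/NL-AsnDA> <http://www.w3.org/ns/org#hasMembership> _:b0 .
_:b0 <http://www.w3.org/ns/org#role> "thematic_network" .
_:b0 <https://w3id.org/heritage/custodian/partner_name> "WO2Net" .
"""


def count_predicates(ntriples: str) -> Counter:
    """Count predicate occurrences in N-Triples text."""
    counts = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subjects and predicates never contain spaces, so splitting twice
        # isolates them; the object (which may contain spaces) stays in _rest.
        _subject, predicate, _rest = line.split(" ", 2)
        counts[predicate] += 1
    return counts


for predicate, n in count_predicates(SAMPLE).most_common():
    print(n, predicate)
```
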
## Usage Examples

### Export Single Custodian

```python
from glam_extractor.exporters.rdf_exporter import RDFExporter
# InstitutionType and Provenance are assumed to live alongside the other models.
from glam_extractor.models import (
    HeritageCustodian,
    InstitutionType,
    Partnership,
    Provenance,
)

custodian = HeritageCustodian(
    id="https://w3id.org/heritage/custodian/nl/test",
    name="Test Museum",
    institution_type=InstitutionType.MUSEUM,
    partnerships=[
        Partnership(
            partner_name="Museum Register",
            partnership_type="national_museum_certification"
        )
    ],
    provenance=Provenance(...)  # provenance fields elided
)

exporter = RDFExporter()
turtle = exporter.export([custodian], format="turtle")
print(turtle)
```

### Export Multiple Custodians

```python
exporter = RDFExporter()
for custodian in custodians:
    exporter.add_custodian(custodian)

# Export all at once
turtle = exporter.export(custodians, format="turtle")
```

### Export to File

```python
exporter = RDFExporter()
turtle = exporter.export(custodians, format="turtle")

with open("output.ttl", "w", encoding="utf-8") as f:
    f.write(turtle)
```

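When writing more than one format, it can help to derive the file name from the format string. The following sketch maps the four format names used in this document to conventional file extensions; `EXTENSIONS` and `output_path` are hypothetical names, and the extension choices are common conventions rather than anything the exporter enforces.

```python
from pathlib import Path

# Keys are the format names passed to export() in this document;
# values are conventional file extensions for each serialization.
EXTENSIONS = {
    "turtle": ".ttl",
    "json-ld": ".jsonld",
    "xml": ".rdf",
    "nt": ".nt",
}


def output_path(directory: str, stem: str, fmt: str) -> Path:
    """Build an output file path for the given export format."""
    try:
        suffix = EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return Path(directory) / f"{stem}{suffix}"


for fmt in EXTENSIONS:
    print(fmt, "->", output_path("out", "custodians", fmt))
```

The resulting path can be passed to `open(..., "w", encoding="utf-8")` in the same way as `"output.ttl"` above.
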
## Ontology Namespaces

The RDF exporter integrates the following ontologies:

| Prefix | Namespace | Purpose |
|--------|-----------|---------|
| `ghcid` | `https://w3id.org/heritage/custodian/` | GHCID domain classes and properties |
| `cidoc` | `http://www.cidoc-crm.org/cidoc-crm/` | CIDOC Conceptual Reference Model (cultural heritage) |
| `rico` | `https://www.ica.org/standards/RiC/ontology#` | Records in Contexts (archival description) |
| `schema` | `http://schema.org/` | Schema.org vocabulary (web search, IIIF) |
| `org` | `http://www.w3.org/ns/org#` | W3C Organization Ontology (partnerships, hierarchy) |
| `prov` | `http://www.w3.org/ns/prov#` | W3C PROV Ontology (provenance tracking) |
| `foaf` | `http://xmlns.com/foaf/0.1/` | Friend of a Friend (agents, names) |
| `dcterms` | `http://purl.org/dc/terms/` | Dublin Core metadata terms |

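The table translates directly into a prefix map. As a small sketch (the names `NAMESPACES`, `turtle_prefixes`, and `expand` are hypothetical and not taken from `rdf_exporter.py`), the same mapping can generate Turtle `@prefix` declarations and expand prefixed names such as `org:Membership` into full IRIs:

```python
# Prefix map copied from the table above.
NAMESPACES = {
    "ghcid": "https://w3id.org/heritage/custodian/",
    "cidoc": "http://www.cidoc-crm.org/cidoc-crm/",
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "schema": "http://schema.org/",
    "org": "http://www.w3.org/ns/org#",
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def turtle_prefixes(namespaces: dict) -> str:
    """Render @prefix declarations, sorted by prefix for stable output."""
    return "\n".join(
        f"@prefix {prefix}: <{iri}> ." for prefix, iri in sorted(namespaces.items())
    )


def expand(curie: str) -> str:
    """Expand a prefixed name like 'org:Membership' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return NAMESPACES[prefix] + local


print(turtle_prefixes(NAMESPACES))
print(expand("ghcid:Partnership"))
```
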
## Design Decisions

### Why org:Membership?

The W3C Organization Ontology defines `org:Membership` as an n-ary relationship between an agent, an organization, and a role. That structure fits heritage institution partnerships well:

- **Standardized pattern** - Established W3C Recommendation
- **Flexible scope** - Supports temporal bounds, roles, descriptions
- **Interoperable** - Used by government data portals (UK, EU)
- **Extensible** - Can add GHCID-specific properties via `ghcid:Partnership`

### Blank Nodes vs. URIs

**Current**: Partner organizations are blank nodes
**Rationale**: Most partners don't have GHCIDs (yet)
**Future**: Replace blank nodes with URIs when partners are in GHCID dataset

Example migration:
```turtle
# Current (blank node)
org:member [ a org:Organization ; schema:name "Museum Register" ]

# Future (URI reference)
org:member <https://w3id.org/heritage/custodian/nl/museum-register>
```

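One way the migration could work is a lookup that falls back to a blank node when no URI is known. This is only a sketch under that assumption: `member_object` and the `KNOWN` mapping are hypothetical, and how partner URIs would actually be resolved against the GHCID dataset is not specified in this document.

```python
from typing import Mapping


def member_object(partner_name: str, known_uris: Mapping[str, str]) -> str:
    """Return the Turtle object for org:member: a URI if known, else a blank node."""
    uri = known_uris.get(partner_name)
    if uri is not None:
        return f"<{uri}>"
    # Escape backslashes and double quotes for the Turtle string literal.
    escaped = partner_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'[ a org:Organization ; schema:name "{escaped}" ]'


# Hypothetical lookup table; the URI is the one from the migration example above.
KNOWN = {
    "Museum Register": "https://w3id.org/heritage/custodian/nl/museum-register",
}

print("org:member", member_object("Museum Register", KNOWN))
print("org:member", member_object("WO2Net", KNOWN))
```
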
### Dual Typing (org:Membership + ghcid:Partnership)

Memberships are typed as **both** `org:Membership` and `ghcid:Partnership`:

```turtle
[ a org:Membership, ghcid:Partnership ; ... ]
```

**Rationale**:
- `org:Membership` - Standard interoperability with non-GLAM systems
- `ghcid:Partnership` - Domain-specific queries (e.g., SPARQL: `?s org:hasMembership ?m . ?m a ghcid:Partnership`)

## SPARQL Query Examples

### Find All Partnerships of an Institution

```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>

SELECT ?partner ?type WHERE {
  <NL-AsnDA> org:hasMembership ?membership .
  ?membership a ghcid:Partnership ;
              ghcid:partner_name ?partner ;
              ghcid:partnership_type ?type .
}
```

### Find All Institutions in a Network

```sparql
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
PREFIX schema: <http://schema.org/>

SELECT ?institution ?name WHERE {
  ?institution org:hasMembership ?membership .
  ?membership org:role "thematic_network" ;
              ghcid:partner_name "WO2Net" .
  ?institution schema:name ?name .
}
```

### Find Partnerships with Temporal Scope

```sparql
PREFIX schema: <http://schema.org/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX ghcid: <https://w3id.org/heritage/custodian/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?institution ?partner ?start ?end WHERE {
  ?institution org:hasMembership ?membership .
  ?membership ghcid:partner_name ?partner ;
              schema:startDate ?start ;
              schema:endDate ?end .
  FILTER(?end > "2025-01-01"^^xsd:date)
}
```

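The same filter can be expressed in plain Python, which is handy for checking query results in a test. Note that the query only matches memberships that have both a start and an end date, so open-ended partnerships are excluded. The sample rows and the `ending_after` helper below are hypothetical.

```python
from datetime import date

# Hypothetical sample rows standing in for query bindings.
partnerships = [
    {"partner": "Partner A", "start": date(2022, 1, 1), "end": date(2025, 12, 31)},
    {"partner": "Partner B", "start": date(2020, 1, 1), "end": date(2024, 6, 30)},
    {"partner": "Partner C", "start": date(2023, 1, 1), "end": None},  # open-ended
]


def ending_after(rows, cutoff: date) -> list:
    """Mirror FILTER(?end > cutoff): keep rows whose end date is strictly later."""
    return [
        row["partner"]
        for row in rows
        # Rows without an end date never bind ?end in the query, so skip them.
        if row["end"] is not None and row["end"] > cutoff
    ]


print(ending_after(partnerships, date(2025, 1, 1)))
```
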
## Next Steps

### Task 3: Conversation JSON Parser Enhancement

Add Partnership extraction to `src/glam_extractor/parsers/conversation.py`:

1. Pattern detection for partnership mentions
2. Classify partnership types from context
3. Extract temporal scope when mentioned
4. Link to partner organizations if identifiable

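Since this task is not implemented yet, the following is only one possible starting point for steps 1 and 2: a lookup of known partner names (taken from the Dutch partnership types listed under Task 4) matched case-insensitively against free text. `KNOWN_PARTNERS` and `find_partnerships` are hypothetical names, and a real parser would likely need more robust matching than substring search.

```python
# Partner names and types taken from the taxonomy examples in this document.
KNOWN_PARTNERS = {
    "museum register": "national_museum_certification",
    "collectie nederland": "aggregator_participation",
    "archieven.nl": "aggregator_participation",
    "versnellen": "digitization_program",
    "dc4eu": "digitization_program",
    "wo2net": "thematic_network",
    "van gogh worldwide": "thematic_network",
}


def find_partnerships(text: str) -> list:
    """Return (partner, partnership_type) pairs for known partners mentioned in text."""
    lowered = text.lower()
    # Naive substring match; results follow the order of KNOWN_PARTNERS.
    return [
        (name, partnership_type)
        for name, partnership_type in KNOWN_PARTNERS.items()
        if name in lowered
    ]


message = "The archive joined WO2Net and publishes through Archieven.nl."
print(find_partnerships(message))
```
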
### Task 4: Global Partnership Taxonomy Documentation

Document the partnership type taxonomy in `docs/PARTNERSHIP_TAXONOMY.md`:

1. **Dutch Partnership Types** (18 types observed):
   - `national_museum_certification` - Museum Register
   - `aggregator_participation` - Collectie Nederland, Archieven.nl
   - `digitization_program` - Versnellen, DC4EU
   - `thematic_network` - WO2Net, Mondriaan, Van Gogh Worldwide
   - (and 14 more types)

2. **Global Partnership Categories**:
   - National certifications/registers
   - Aggregation platforms
   - Digitization programs
   - Thematic networks
   - International collaborations
   - Funding partnerships
   - Technical infrastructure

3. **Mapping to Controlled Vocabularies**:
   - AAT (Art & Architecture Thesaurus)
   - PROV-O activity types
   - EU corporate vocabularies (CPOV)

## References

- **W3C Organization Ontology**: https://www.w3.org/TR/vocab-org/
- **CIDOC-CRM**: https://www.cidoc-crm.org/
- **RiC-O**: https://www.ica.org/standards/RiC/ontology
- **PROV-O**: https://www.w3.org/TR/prov-o/
- **Schema.org**: https://schema.org/
- **LinkML Schema**: `schemas/collections.yaml` (Partnership class definition)

---

**Contributors**: OpenCODE AI Agent
**License**: CC0 1.0 Universal (Public Domain)
**Project**: GLAM Data Extractor - Global Heritage Custodian Identifier (GHCID) System