Session Summary: RDF Partnership Export Implementation
Date: 2025-11-07
Status: ✅ TASKS 1-2 COMPLETE
What We Accomplished
Task 1: Verify LinkML Dataclasses ✅ COMPLETE
- Finding: The project uses Pydantic v1, but LinkML gen-pydantic requires Pydantic v2.
- Solution: Models are maintained manually in src/glam_extractor/models.py (the correct approach).
- Verification: The Partnership class exists at line 223 with all the fields defined in schemas/collections.yaml.
- Result: No action needed; the dataclasses are current.
Task 2: RDF/JSON-LD Partnership Serialization ✅ COMPLETE
Files Created
- src/glam_extractor/exporters/rdf_exporter.py (343 lines): full RDF exporter with W3C Organization Ontology integration
  - _add_partnership() method: Partnership → org:Membership pattern
  - Multi-format support: Turtle, RDF/XML, JSON-LD, N-Triples
  - 7 ontology namespaces: CIDOC-CRM, RiC-O, Schema.org, W3C ORG, PROV-O, FOAF, Dublin Core
- tests/exporters/test_rdf_exporter.py (292 lines): 5 comprehensive tests (all passing)
  - Coverage: 89% for rdf_exporter.py
- docs/RDF_PARTNERSHIP_EXPORT.md (comprehensive documentation)
  - Implementation details
  - Real-world examples with Dutch institutions
  - SPARQL query patterns
  - Design rationale
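Multi-format support can be organized as a small dispatch table from user-facing format names to serializer names and file extensions. This is a minimal sketch that assumes the exporter sits on rdflib (this summary does not say so): "turtle", "xml", "json-ld" and "nt" are rdflib's built-in serializer names, while the friendly keys and the resolve_format helper are hypothetical.

```python
from pathlib import Path

# Hypothetical table: normalized friendly name -> (rdflib serializer name, file extension).
RDF_FORMATS: dict[str, tuple[str, str]] = {
    "turtle": ("turtle", ".ttl"),
    "rdfxml": ("xml", ".rdf"),
    "jsonld": ("json-ld", ".jsonld"),
    "ntriples": ("nt", ".nt"),
}


def resolve_format(name: str, base: str) -> tuple[str, Path]:
    """Return the serializer name and an output path for a format such as 'JSON-LD'."""
    # Normalize "RDF/XML", "N-Triples", "json_ld" etc. to the table keys.
    key = name.lower().replace("-", "").replace("/", "").replace("_", "")
    if key not in RDF_FORMATS:
        raise ValueError(f"unsupported RDF format: {name!r}")
    plugin, ext = RDF_FORMATS[key]
    return plugin, Path(base).with_suffix(ext)
```

With rdflib, the pair would typically feed Graph.serialize(destination=str(path), format=plugin).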
Test Results
✅ test_single_partnership_export - Verify org:Membership triples
✅ test_multiple_partnerships_export - Rijksmuseum with 3 partnerships
✅ test_partnership_with_temporal_scope - Start/end dates + descriptions
✅ test_export_to_turtle - Full Turtle serialization
✅ test_full_custodian_export - Complete institution with 50+ triples
5 passed in 1.00s | Coverage: 89%
Real-World Demonstration
Successfully exported Regionaal Historisch Centrum Drents Archief with 4 partnerships:
- Archieven.nl (aggregator_participation)
- Archives Portal Europe (international_aggregator)
- WO2Net (thematic_network)
- OODE24 Mondriaan (thematic_network)
Output verified in Turtle and JSON-LD formats.
RDF Pattern Implemented
W3C Organization Ontology Structure
<custodian-uri>
    org:hasMembership [
        a org:Membership, ghcid:Partnership ;
        org:organization <custodian-uri> ;
        org:member [ a org:Organization ; schema:name "Partner" ] ;
        org:role "partnership_type" ;
        schema:startDate "2022-01-01"^^xsd:date ;
        schema:endDate "2025-12-31"^^xsd:date ;
        schema:description "Partnership description" ;
    ] .
Key Design Decisions:
- Use org:Membership (W3C standard) + ghcid:Partnership (domain-specific)
- Partner organizations as blank nodes (until GHCIDs are assigned)
- Temporal scope via schema:startDate/schema:endDate (XSD dates)
- Descriptions via schema:description
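The pattern above can be reproduced without an RDF library by emitting Turtle text directly, which is handy for eyeballing output or writing test fixtures. A minimal sketch, assuming plain-string inputs: turtle_literal and partnership_to_turtle are hypothetical helpers, and the ghcid namespace URI is a placeholder because this summary does not give it. Optional dates and descriptions are emitted only when present, and the partner stays a blank node, following the design decisions listed above. One point worth checking against the real exporter: the W3C ORG specification declares org:hasMembership as the inverse of org:member, so a reasoner applying that axiom would also infer <custodian-uri> as an org:member of the same membership, next to the partner.

```python
from datetime import date
from typing import Optional

# org and xsd are the standard namespaces; the ghcid URI is a placeholder assumption.
PREFIXES = """@prefix org: <http://www.w3.org/ns/org#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ghcid: <https://example.org/ghcid#> .
"""


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def partnership_to_turtle(custodian_uri: str, partner_name: str, role: str,
                          start: Optional[date] = None, end: Optional[date] = None,
                          description: Optional[str] = None) -> str:
    """Render one partnership in the org:Membership shape documented above."""
    props = [
        "a org:Membership, ghcid:Partnership",
        f"org:organization <{custodian_uri}>",
        # Partner as a blank node until it has its own GHCID.
        f"org:member [ a org:Organization ; schema:name {turtle_literal(partner_name)} ]",
        f"org:role {turtle_literal(role)}",
    ]
    if start is not None:
        props.append(f'schema:startDate "{start.isoformat()}"^^xsd:date')
    if end is not None:
        props.append(f'schema:endDate "{end.isoformat()}"^^xsd:date')
    if description:
        props.append(f"schema:description {turtle_literal(description)}")
    body = " ;\n    ".join(props)
    return f"{PREFIXES}\n<{custodian_uri}> org:hasMembership [\n    {body}\n] .\n"
```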
Next Steps
Task 3: Conversation JSON Parser Enhancement ⏳ PENDING
Add Partnership extraction to src/glam_extractor/parsers/conversation.py:
Patterns to Detect:
- "collaborates with", "partner of", "member of"
- "participated in [PROJECT]", "joined [NETWORK]"
- "affiliated with", "consortium member"
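The cue phrases above could be turned into a single regular expression that also captures the partner name following each cue. A minimal sketch, assuming a partner name is a run of capitalised tokens (so "Archives Portal Europe" and "Archieven.nl" are caught, lowercase continuations are not); find_partnership_mentions and Mention are hypothetical names, not the existing parser API.

```python
import re
from typing import NamedTuple, Optional

# Cue phrases copied from the list above; extend as new patterns are observed.
CUES = ["collaborates with", "partner of", "member of", "participated in",
        "joined", "affiliated with", "consortium member"]

# A token starts with a capital or digit; dots are allowed only inside a token
# (e.g. "Archieven.nl"), so a sentence-final period ends the name.
_TOKEN = r"[A-Z0-9][\w-]*(?:\.[\w-]+)*"
CUE_RE = re.compile(
    r"(?i:\b(?P<cue>" + "|".join(re.escape(c) for c in CUES) + r"))\b"
    r"(?:\s+(?i:of|the)\b)*"  # skip filler words such as "the" before the name
    r"(?:\s+(?P<partner>" + _TOKEN + r"(?:\s+" + _TOKEN + r")*))?"
)


class Mention(NamedTuple):
    cue: str
    partner: Optional[str]  # None when no capitalised name follows the cue


def find_partnership_mentions(text: str) -> list[Mention]:
    """Return every cue phrase in text, with the partner name that follows it if any."""
    return [Mention(m.group("cue").lower(), m.group("partner"))
            for m in CUE_RE.finditer(text)]
```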
Classification Logic:
- Project names → digitization_program (DC4EU, Versnellen)
- Portal names → aggregator_participation (Europeana, DPLA)
- Network names → thematic_network (WO2Net, IIIF)
- Register mentions → national_certification (Museum Register)
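The classification rules above amount to an ordered keyword lookup: known names first, then generic cue words such as "portal" or "network". A minimal sketch; the keyword lists are illustrative examples drawn from the rules above rather than a complete vocabulary, and classify_partnership is a hypothetical function.

```python
from typing import Optional

# Known names from the examples above, checked in order; the first match wins.
NAME_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("digitization_program", ("dc4eu", "versnellen")),
    ("aggregator_participation", ("europeana", "dpla")),
    ("thematic_network", ("wo2net", "iiif")),
    ("national_certification", ("museum register",)),
]
# Generic cue words used only when no known name matches.
GENERIC_RULES: list[tuple[str, str]] = [
    ("register", "national_certification"),
    ("portal", "aggregator_participation"),
    ("network", "thematic_network"),
    ("project", "digitization_program"),
    ("program", "digitization_program"),
]


def classify_partnership(name: str, context: str = "") -> Optional[str]:
    """Map a partner name (plus optional surrounding text) to a partnership type, or None."""
    haystack = f"{name} {context}".lower()
    for partnership_type, keywords in NAME_RULES:
        if any(keyword in haystack for keyword in keywords):
            return partnership_type
    for cue, partnership_type in GENERIC_RULES:
        if cue in haystack:
            return partnership_type
    return None
```

In the Drents Archief export above, Archives Portal Europe was recorded as international_aggregator, whereas this table would label it aggregator_participation; reproducing that distinction would need an extra rule for international portals.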
Temporal Extraction:
- "from 2020 to 2025" → start_date, end_date
- "since 2018" → start_date only
- "until 2023" → end_date only
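The three temporal patterns above map naturally onto three regular expressions, with the explicit range checked first so that "from ... until ..." is not misread as an end date only. A minimal sketch, assuming a bare year spans the whole calendar year (1 January to 31 December, the same shape as the xsd:date example in the RDF pattern); extract_temporal_scope is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional

RANGE_RE = re.compile(r"\bfrom\s+(\d{4})\s+(?:to|until|through)\s+(\d{4})\b", re.IGNORECASE)
SINCE_RE = re.compile(r"\bsince\s+(\d{4})\b", re.IGNORECASE)
UNTIL_RE = re.compile(r"\buntil\s+(\d{4})\b", re.IGNORECASE)


def extract_temporal_scope(text: str) -> tuple[Optional[date], Optional[date]]:
    """Return (start_date, end_date) from year phrases; either side may be None."""
    if m := RANGE_RE.search(text):
        # Assumption: a bare year covers the whole calendar year.
        return date(int(m[1]), 1, 1), date(int(m[2]), 12, 31)
    start: Optional[date] = None
    end: Optional[date] = None
    if m := SINCE_RE.search(text):
        start = date(int(m[1]), 1, 1)
    if m := UNTIL_RE.search(text):
        end = date(int(m[1]), 12, 31)
    return start, end
```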
Task 4: Partnership Taxonomy Documentation ⏳ PENDING
Create docs/PARTNERSHIP_TAXONOMY.md:
Content:
- Dutch Partnership Types (18 observed types):
  - national_museum_certification
  - national_collection_designation
  - aggregator_participation
  - international_aggregator
  - digitization_program
  - thematic_network
  - linked_data_platform
  - dataset_registry
  - academic_network
  - regional_cooperation
  - [... 8 more types]
- Global Partnership Categories:
  - National certifications/registers
  - Aggregation platforms (national/international)
  - Digitization programs (EU-funded, national)
  - Thematic networks (subject-based)
  - Technical infrastructure (Linked Data, APIs)
  - Funding partnerships
  - Academic collaborations
- Controlled Vocabulary Mapping:
  - Map to AAT (Art & Architecture Thesaurus)
  - Map to PROV-O activity types
  - Map to EU CPOV (Core Public Organisation Vocabulary)
- Examples from Global Conversations:
  - Extract partnership mentions from 139 conversation JSONs
  - Document patterns per country/region
  - Identify common vs. country-specific partnerships
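Separating common from country-specific partnerships reduces to counting, per partner, how many distinct countries mention it. A minimal sketch, assuming mentions have already been extracted from the conversation JSONs as hypothetical (country_code, partner_name) pairs; split_common_vs_specific and its min_countries threshold are illustrative.

```python
from collections import defaultdict
from typing import Iterable


def split_common_vs_specific(
    mentions: Iterable[tuple[str, str]], min_countries: int = 2
) -> tuple[list[str], list[str]]:
    """Split partner names into those seen in >= min_countries countries and the rest."""
    countries_by_partner: dict[str, set[str]] = defaultdict(set)
    for country, partner in mentions:
        countries_by_partner[partner].add(country)
    common = sorted(p for p, cs in countries_by_partner.items() if len(cs) >= min_countries)
    specific = sorted(p for p, cs in countries_by_partner.items() if len(cs) < min_countries)
    return common, specific
```

Usage with hypothetical input: split_common_vs_specific([("NL", "Europeana"), ("BE", "Europeana"), ("NL", "WO2Net")]) puts Europeana in the common list and WO2Net in the country-specific one.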
Files Modified
Created
- src/glam_extractor/exporters/rdf_exporter.py (343 lines)
- tests/exporters/test_rdf_exporter.py (292 lines)
- docs/RDF_PARTNERSHIP_EXPORT.md (comprehensive guide)
- SESSION_SUMMARY_RDF_PARTNERSHIPS.md (this file)
Modified
- src/glam_extractor/exporters/__init__.py (exports RDFExporter)
Technical Notes
Pydantic v1 Enum Behavior
IMPORTANT: This project uses Pydantic v1. Enum fields are already strings, not enum objects:
# ❌ WRONG
print(custodian.institution_type.value) # AttributeError!
# ✅ CORRECT
print(custodian.institution_type) # "MUSEUM", "ARCHIVE", etc.
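In Pydantic v1, storing enum fields as plain strings is the behavior of Config.use_enum_values = True (by default v1 keeps enum members), so whether a particular model does this is worth confirming in src/glam_extractor/models.py. Code that must cope with either representation, for example when comparing model fields against Enum constants, could normalize through a small helper. A minimal sketch; enum_text is a hypothetical name.

```python
from enum import Enum
from typing import Optional, Union


def enum_text(value: Union[Enum, str, None]) -> Optional[str]:
    """Return the plain string behind a field that may hold an Enum member or a str."""
    if isinstance(value, Enum):
        return value.value
    return value  # already a str (or None)
```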
Required vs. Optional Fields
HeritageCustodian:
- Required: id, name, institution_type
- Optional with defaults: organization_status (defaults to OrganizationStatus.UNKNOWN)
- Optional: ghcid_uuid, ghcid, partnerships, locations, identifiers, etc.
Provenance:
- Required: data_source, data_tier, extraction_date
- Optional: extraction_method, confidence_score, conversation_id, etc.
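Missing required fields otherwise surface only as a validation error when the model is constructed, so a pre-check on raw records can give clearer parser diagnostics. A minimal sketch: the field sets are copied from the lists above, and missing_required is a hypothetical helper, not part of the project's models.

```python
from typing import Any, Mapping

# Required fields as listed above (from this summary, not read from models.py).
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "HeritageCustodian": ("id", "name", "institution_type"),
    "Provenance": ("data_source", "data_tier", "extraction_date"),
}


def missing_required(model_name: str, record: Mapping[str, Any]) -> list[str]:
    """List required fields that are absent or None in a raw record."""
    return [field for field in REQUIRED_FIELDS[model_name] if record.get(field) is None]
```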
CSV Parsing Gotchas
- UTF-8 BOM: use encoding='utf-8-sig' when reading CSVs
- Dutch Organizations Parser:
  - Returns DutchOrgRecord objects (not HeritageCustodian)
  - Use parser.to_heritage_custodian(org_record) to convert
  - Field name is organisatie (not naam)
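The BOM gotcha is easy to reproduce: decoded as plain 'utf-8', the byte-order mark survives as '\ufeff' glued to the first header, so row['organisatie'] raises KeyError when organisatie is the first column. A minimal sketch of a reader that avoids this, assuming comma-separated input (the real file's delimiter is not stated here); read_rows is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Union


def read_rows(path: Union[str, Path], delimiter: str = ",") -> list[dict[str, str]]:
    """Read a CSV into dicts; 'utf-8-sig' strips a leading BOM and is harmless without one."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))
```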
Statistics
- Test Coverage: 89% for rdf_exporter.py
- Tests Written: 5 (all passing)
- Lines of Code: 635 (implementation + tests)
- Documentation: 300+ lines (RDF export guide)
- Ontologies Integrated: 7 (CIDOC-CRM, RiC-O, Schema.org, W3C ORG, PROV-O, FOAF, DCTERMS)
Verification Commands
Run Tests
cd /Users/kempersc/apps/glam
python -m pytest tests/exporters/test_rdf_exporter.py -v
Test Real Data Export
from glam_extractor.parsers.dutch_orgs import DutchOrgsParser
from glam_extractor.exporters.rdf_exporter import RDFExporter

parser = DutchOrgsParser()
orgs = parser.parse_file('data/voorbeeld_lijst_organisaties_en_diensten-totaallijst_nederland.csv')

# Find the first Drents Archief record with partnerships and print it as Turtle
for org in orgs:
    if 'Drents Archief' in org.organisatie:
        custodian = parser.to_heritage_custodian(org)
        if custodian.partnerships:
            exporter = RDFExporter()
            turtle = exporter.export([custodian], format='turtle')
            print(turtle)
            break
References
- Schema: schemas/collections.yaml (Partnership class definition)
- W3C ORG: https://www.w3.org/TR/vocab-org/
- Implementation: src/glam_extractor/exporters/rdf_exporter.py:218-237
- Tests: tests/exporters/test_rdf_exporter.py
- Documentation: docs/RDF_PARTNERSHIP_EXPORT.md
- Ontology Integration: docs/ONTOLOGY_INTEGRATION.md
Session Duration: ~1 hour
AI Agent: OpenCODE
Status: Ready to continue with Task 3 (Conversation JSON Parser)