Session Summary: Schema v0.2.1 and Ontology Integration Documentation
Date: 2025-11-09
Session Type: Schema Version Update + Documentation Enhancement
Status: ✅ Complete
Objectives
- ✅ Update schema version from 0.2.0 to 0.2.1 (reflecting LEARNING_MANAGEMENT addition)
- ✅ Document TOOI and CPOV ontology integration in ONTOLOGY_EXTENSIONS.md
- ✅ Add ontology reference instructions to AGENTS.md
- ✅ Update validation script with new platform type enum
Changes Made
1. Schema Version Update (0.2.0 → 0.2.1)
Updated version numbers in all 6 modular schema files:
Files Modified:
- schemas/heritage_custodian.yaml - Line 12: version: 0.2.1
- schemas/core.yaml - Line 11: version: 0.2.1
- schemas/enums.yaml - Line 10: version: 0.2.1
- schemas/provenance.yaml - Line 10: version: 0.2.1
- schemas/collections.yaml - Line 10: version: 0.2.1
- schemas/dutch.yaml - Line 11: version: 0.2.1
Reason: Version 0.2.1 reflects the addition of LEARNING_MANAGEMENT to DigitalPlatformTypeEnum (added during Libyan extraction work in previous session, 2025-11-09).
Evidence:
$ rg "^version:" schemas/*.yaml
schemas/collections.yaml:version: 0.2.1
schemas/core.yaml:version: 0.2.1
schemas/heritage_custodian.yaml:version: 0.2.1
schemas/provenance.yaml:version: 0.2.1
schemas/dutch.yaml:version: 0.2.1
schemas/enums.yaml:version: 0.2.1
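The same consistency check can be scripted so it runs alongside validation. The following is a minimal sketch, not part of the repository: the helper names `extract_version` and `find_version_mismatches` are hypothetical, and the sample texts are in-memory stand-ins rather than the real schema contents.

```python
import re

# Matches a top-level "version: X" line, like the rg pattern "^version:".
VERSION_LINE = re.compile(r"^version:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_version(schema_text):
    """Return the value of the first top-level `version:` line, or None."""
    match = VERSION_LINE.search(schema_text)
    return match.group(1) if match else None


def find_version_mismatches(schema_texts, expected):
    """Map each file whose version differs from `expected` to the value found."""
    return {
        name: extract_version(text)
        for name, text in schema_texts.items()
        if extract_version(text) != expected
    }


# Illustrative stand-ins for two schema files (assumed content).
sample = {
    "schemas/core.yaml": "id: core\nversion: 0.2.1\n",
    "schemas/enums.yaml": "id: enums\nversion: 0.2.0\n",
}
mismatches = find_version_mismatches(sample, "0.2.1")
print(mismatches)
```

In practice the dictionary would be filled by reading each file under schemas/, and an empty result would mean all modules agree on the version.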
2. ONTOLOGY_EXTENSIONS.md Enhancement
File: docs/ONTOLOGY_EXTENSIONS.md
Added Section: "## Integrating TOOI and CPOV Ontologies" (inserted after line 101, before "Extension Guidelines")
Content Added (~350 lines):
2.1 TOOI Integration Documentation
TOOI - Dutch Government Organizational Ontology:
- File location: /data/ontology/tooiont.ttl
- Namespace: https://identifier.overheid.nl/tooi/def/ont/
- Key classes: tooi:Overheidsorganisatie, tooi:Wijzigingsgebeurtenis
- Key properties: tooi:officieleNaamInclSoort, tooi:begindatum, tooi:einddatum
- PROV-O integration for temporal tracking
- Heritage custodian mapping to DutchHeritageCustodian class
- RDF serialization examples (Noord-Hollands Archief merger case)
When to Use TOOI:
- ✅ Dutch heritage institutions (government archives, state museums)
- ✅ Dutch organizational change events (mergers, splits, reorganizations)
- ✅ Dutch ISIL registry or KvK data integration
- ❌ Non-Dutch institutions (use CPOV instead)
2.2 CPOV Integration Documentation
CPOV - EU Core Public Organisation Vocabulary:
- Files: /data/ontology/core-public-organisation-ap.ttl, /data/ontology/core-public-organisation-ap.jsonld
- Namespace: http://data.europa.eu/m8g/
- Key classes: cpov:PublicOrganisation, cv:ChangeEvent, locn:Address
- Key properties: skos:prefLabel, skos:altLabel, dct:identifier, locn:address
- W3C Organization Ontology integration
- Heritage custodian mapping to base HeritageCustodian class
- RDF serialization examples (Biblioteca Nacional do Brasil founding case)
When to Use CPOV:
- ✅ European heritage institutions (non-Dutch)
- ✅ Global public-sector cultural organizations
- ✅ EU Linked Open Data alignment (Europeana, DPLA)
- ⚠️ Global institutions (use CPOV patterns + regional ontologies)
2.3 Ontology Decision Tree
Added flowchart-style decision tree for agents:
Is the institution Dutch?
├─ YES → Use TOOI
└─ NO → Is it EU public organization?
    ├─ YES → Use CPOV
    └─ NO → Use CPOV patterns + regional ontologies
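The decision tree translates directly into a small function. This is a sketch only: the function name `choose_ontology` and its boolean parameters are hypothetical, and deciding whether an institution counts as Dutch or as an EU public organization is left to the caller.

```python
def choose_ontology(is_dutch, is_eu_public_org):
    """Follow the decision tree: Dutch first, then EU public organization."""
    if is_dutch:
        return "TOOI"
    if is_eu_public_org:
        return "CPOV"
    return "CPOV patterns + regional ontologies"


# A Dutch archive resolves to TOOI regardless of the second flag.
print(choose_ontology(is_dutch=True, is_eu_public_org=True))
# A non-EU institution falls through to the last branch.
print(choose_ontology(is_dutch=False, is_eu_public_org=False))
```

Checking the Dutch branch first mirrors the tree's ordering, so a Dutch institution never falls through to CPOV.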
2.4 Practical Extraction Workflow
Step 1: Read ontology files before extraction
Step 2: Map conversation data to ontology properties (comparison table)
Step 3: Generate RDF-compatible LinkML with class_uri and slot_uri
Step 4: Export to RDF/Turtle using linkml-convert
Property Mapping Table:
| Extracted Data | TOOI Property | CPOV Property | Schema.org |
|---|---|---|---|
| Institution name | tooi:officieleNaamInclSoort | skos:prefLabel | schema:name |
| Founding date | tooi:begindatum | schema:startDate | schema:foundingDate |
| ISIL code | tooi:organisatieIdentificatie | dct:identifier | schema:identifier |
| Merger event | tooi:Wijzigingsgebeurtenis | cv:ChangeEvent | schema:Event |
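For extraction code, the table above can be held as a lookup structure. The sketch below copies the table values verbatim; the field keys (`institution_name`, `founding_date`, and so on), the ontology keys, and the helper `map_field` are hypothetical names chosen for illustration.

```python
# Property mapping copied from the table; keys are illustrative assumptions.
PROPERTY_MAP = {
    "institution_name": {
        "tooi": "tooi:officieleNaamInclSoort",
        "cpov": "skos:prefLabel",
        "schema_org": "schema:name",
    },
    "founding_date": {
        "tooi": "tooi:begindatum",
        "cpov": "schema:startDate",
        "schema_org": "schema:foundingDate",
    },
    "isil_code": {
        "tooi": "tooi:organisatieIdentificatie",
        "cpov": "dct:identifier",
        "schema_org": "schema:identifier",
    },
    "merger_event": {
        "tooi": "tooi:Wijzigingsgebeurtenis",
        "cpov": "cv:ChangeEvent",
        "schema_org": "schema:Event",
    },
}


def map_field(field, ontology):
    """Return the ontology term for an extracted field, or None if unmapped."""
    return PROPERTY_MAP.get(field, {}).get(ontology)


print(map_field("isil_code", "cpov"))
```

Returning None for unmapped fields lets the caller flag a gap instead of inventing a custom property.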
2.5 Updated Extension Guidelines
Reordered "Research Base Ontologies" checklist to prioritize TOOI and CPOV:
- TOOI (if Dutch institutions)
- CPOV (if EU/global public organizations)
- Schema.org (web semantics)
- CIDOC-CRM (cultural heritage)
- RiC-O (archival)
- BIBFRAME (bibliographic)
- Dublin Core (metadata elements)
Added search strategy using rg (ripgrep) to find relevant ontology classes.
3. AGENTS.md Enhancement
File: AGENTS.md
Changes Made:
3.1 Schema Version Update (Lines 11, 13)
Changed references from "v0.2.0" to "v0.2.1":
- Line 11: **Schema**: See the modular LinkML schema v0.2.1 described below.
- Line 13: ## Schema Reference (v0.2.1)
3.2 New Section: "Base Ontologies for Global GLAM Data"
Location: Inserted after "Schema Reference (v0.2.1)" section (line 49), before "Institution Type Taxonomy" (line 51)
Content Added (~300 lines):
Foundation Ontologies (3 primary ontologies)
1. TOOI - Dutch Government Organizational Ontology
- File: /data/ontology/tooiont.ttl
- Scope: Dutch heritage institutions
- When to use: Dutch extraction, Dutch ISIL/KvK data, mergers/splits
- Key classes: tooi:Overheidsorganisatie, tooi:Wijzigingsgebeurtenis
- LinkML mapping: DutchHeritageCustodian.class_uri: tooi:Overheidsorganisatie
2. CPOV - EU Core Public Organisation Vocabulary
- Files: /data/ontology/core-public-organisation-ap.ttl, core-public-organisation-ap.jsonld
- Scope: EU-wide and global public heritage organizations
- When to use: European/global extraction, Europeana/DPLA alignment
- Key classes: cpov:PublicOrganisation, cv:ChangeEvent, locn:Address
- LinkML mapping: HeritageCustodian.class_uri: cpov:PublicOrganisation
3. Schema.org - Web Vocabulary
- File: /data/ontology/schemaorg.owl
- Scope: Universal web semantics
- When to use: Private collections, digital platforms, SEO optimization
- Key classes: schema:Museum, schema:Library, schema:LearningResource
Ontology Decision Tree
ASCII flowchart guiding agents on which ontology to use based on institution type and location.
Required Ontology Consultation Workflow
4-step process for agents before extraction:
Step 1: Identify institution geographic scope (Dutch → TOOI, EU → CPOV, Other → Schema.org)
Step 2: Review ontology classes and properties using rg search commands:
rg "tooi:Overheidsorganisatie" /data/ontology/tooiont.ttl
rg "cpov:PublicOrganisation" /data/ontology/core-public-organisation-ap.ttl
Step 3: Map conversation data to ontology properties (table with 7 common fields)
Step 4: Document ontology alignment in provenance metadata:
provenance:
  base_ontology: "http://data.europa.eu/m8g/"
  ontology_alignment:
    - "cpov:PublicOrganisation"
    - "cv:ChangeEvent"
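A provenance block of this shape could be assembled programmatically once the ontology has been chosen. The following is a rough sketch under assumptions: the helper `build_provenance` and the `NAMESPACES` lookup are hypothetical, and only the two namespace URIs are taken from the documentation above.

```python
# Namespace URIs as listed for TOOI and CPOV; the dict itself is illustrative.
NAMESPACES = {
    "TOOI": "https://identifier.overheid.nl/tooi/def/ont/",
    "CPOV": "http://data.europa.eu/m8g/",
}


def build_provenance(ontology, aligned_classes):
    """Build the provenance block recording which ontology terms were used."""
    return {
        "provenance": {
            "base_ontology": NAMESPACES[ontology],
            "ontology_alignment": list(aligned_classes),
        }
    }


block = build_provenance("CPOV", ["cpov:PublicOrganisation", "cv:ChangeEvent"])
print(block)
```

An unknown ontology key raises KeyError here, which surfaces a missing namespace entry early rather than writing an incomplete provenance record.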
Common Ontology Patterns
Pattern 1: Organizational change events (TOOI vs. CPOV vs. Schema.org)
Pattern 2: Multilingual names (language-tagged literals in CPOV)
Pattern 3: Hierarchical relationships (W3C Org Ontology patterns)
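To make Pattern 2 concrete, the sketch below renders multilingual names as Turtle statements with language-tagged literals on skos:prefLabel. The function name, the `ex:` subject identifier, and the English label are illustrative assumptions, and the output presumes the `skos:` and `ex:` prefixes are declared elsewhere in the Turtle file.

```python
def pref_label_triples(subject, names_by_lang):
    """Render one skos:prefLabel statement per language, sorted by tag."""
    lines = []
    for lang, name in sorted(names_by_lang.items()):
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} skos:prefLabel "{escaped}"@{lang} .')
    return lines


triples = pref_label_triples(
    "ex:biblioteca-nacional",  # hypothetical identifier
    {"pt": "Biblioteca Nacional do Brasil", "en": "National Library of Brazil"},
)
for line in triples:
    print(line)
```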
Anti-Patterns to Avoid
❌ Inventing custom properties when ontology equivalents exist
❌ Ignoring ontology namespace conventions
❌ Extracting without reviewing ontology files
✅ Always map to base ontologies and document alignment
Additional Ontology Resources
Brief mentions of:
- CIDOC-CRM (museum objects, provenance)
- RiC-O (archival description)
- BIBFRAME (bibliographic metadata)
3.3 Updated "When Asked to Design NLP Components" Section (Lines 1072-1083)
Changed:
- Line 1074: Schema version 0.2.0 → 0.2.1
- Line 1078: Schema reference section 0.2.0 → 0.2.1
Added (new item #2):
2. **Consult Base Ontologies**: BEFORE designing extraction logic, review relevant ontologies:
   - **Dutch institutions**: Study TOOI ontology (`/data/ontology/tooiont.ttl`)
   - **EU/global institutions**: Study CPOV ontology (`/data/ontology/core-public-organisation-ap.ttl`)
   - **All institutions**: Reference Schema.org patterns (`/data/ontology/schemaorg.owl`)
   - See "Base Ontologies for Global GLAM Data" section above for decision tree
Renumbered: Original items 2-5 became 3-6.
4. Validation Script Update
File: validate_instances.py
Change: Added LEARNING_MANAGEMENT to platform_type enum (lines 54-57)
Before:
'platform_type': [
    'COLLECTION_MANAGEMENT', 'DISCOVERY_PORTAL', 'DIGITAL_REPOSITORY',
    'SPARQL_ENDPOINT', 'API', 'OTHER'
],
After:
'platform_type': [
    'COLLECTION_MANAGEMENT', 'DISCOVERY_PORTAL', 'DIGITAL_REPOSITORY',
    'SPARQL_ENDPOINT', 'API', 'LEARNING_MANAGEMENT', 'OTHER'
],
Impact: Validator now accepts Libyan university extraction files with LMS platforms (Google Classroom, Moodle).
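The effect of the enum change can be illustrated with a small membership check. This is a sketch, not the actual logic of validate_instances.py: the function `platform_type_errors` and the record field `platform_name` are hypothetical, and only the enum values come from the change shown above.

```python
# Allowed values after the change, copied from the updated enum.
PLATFORM_TYPES = [
    "COLLECTION_MANAGEMENT", "DISCOVERY_PORTAL", "DIGITAL_REPOSITORY",
    "SPARQL_ENDPOINT", "API", "LEARNING_MANAGEMENT", "OTHER",
]


def platform_type_errors(platforms):
    """Return one message per platform record with an unknown platform_type."""
    errors = []
    for index, platform in enumerate(platforms):
        value = platform.get("platform_type")
        if value not in PLATFORM_TYPES:
            errors.append(f"platform[{index}]: invalid platform_type {value!r}")
    return errors


# Illustrative records; the second uses a value outside the enum.
records = [
    {"platform_name": "Moodle", "platform_type": "LEARNING_MANAGEMENT"},
    {"platform_name": "Example portal", "platform_type": "LMS"},
]
errors = platform_type_errors(records)
print(errors)
```

Before the change, the first record would also have been rejected, which is why the Libyan university files could not pass validation.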
Verification
Schema Version Consistency
All 6 schema files updated to v0.2.1:
$ rg "^version:" schemas/*.yaml
schemas/collections.yaml:version: 0.2.1
schemas/core.yaml:version: 0.2.1
schemas/heritage_custodian.yaml:version: 0.2.1
schemas/provenance.yaml:version: 0.2.1
schemas/dutch.yaml:version: 0.2.1
schemas/enums.yaml:version: 0.2.1
Documentation Updates Confirmed
ONTOLOGY_EXTENSIONS.md:
$ rg -A 3 "## Integrating TOOI and CPOV Ontologies" docs/ONTOLOGY_EXTENSIONS.md | head -10
## Integrating TOOI and CPOV Ontologies
The GLAM project builds on two foundational ontologies for organizational data modeling.
**AI agents should always consult these ontologies** when designing extraction pipelines...
AGENTS.md:
$ rg -A 3 "## Base Ontologies for Global GLAM" AGENTS.md | head -10
## Base Ontologies for Global GLAM Data
**CRITICAL**: Before designing extraction pipelines or extending the schema, AI agents
MUST consult the base ontologies that the LinkML schema builds upon...
Context from Previous Session
Previous Session (2025-11-09, earlier):
- ✅ Extracted 37 Libyan heritage institutions from conversation files
- ✅ Discovered gap: Learning management systems (LMS) not covered by existing platform types
- ✅ Added LEARNING_MANAGEMENT to DigitalPlatformTypeEnum in schemas/enums.yaml
- ✅ Created 5 Libyan extraction JSON files (universities, museums, sites, buildings)
- ✅ Validated data with 3 universities using Google Classroom and Moodle
- ⏳ Deferred: LinkML validation (linkml-validate command crashed)
This Session (2025-11-09, resumed):
- ✅ Updated schema version to 0.2.1 to reflect LEARNING_MANAGEMENT addition
- ✅ Documented TOOI/CPOV ontology integration for future extraction work
- ✅ Enhanced agent instructions to require ontology consultation before extraction
- ✅ Updated validation script to support new platform type
Files Modified
| File | Lines Changed | Type | Status |
|---|---|---|---|
| schemas/heritage_custodian.yaml | 1 line (version) | Schema | ✅ |
| schemas/core.yaml | 1 line (version) | Schema | ✅ |
| schemas/enums.yaml | 1 line (version) | Schema | ✅ |
| schemas/provenance.yaml | 1 line (version) | Schema | ✅ |
| schemas/collections.yaml | 1 line (version) | Schema | ✅ |
| schemas/dutch.yaml | 1 line (version) | Schema | ✅ |
| docs/ONTOLOGY_EXTENSIONS.md | +350 lines | Documentation | ✅ |
| AGENTS.md | +300 lines, version updates | Documentation | ✅ |
| validate_instances.py | 1 line (enum) | Validation | ✅ |
Total Changes: 9 files modified, ~650 lines added, 8 lines updated
Impact and Benefits
1. Schema Versioning
- Clear evolution tracking: v0.2.1 documents the LEARNING_MANAGEMENT extension
- Version consistency: All 6 schema modules updated in sync
- Historical record: ONTOLOGY_EXTENSIONS.md logs the rationale and evidence
2. Ontology Integration
- Standards compliance: Agents now know to align with TOOI, CPOV, Schema.org
- Semantic interoperability: Extraction data maps to established ontologies
- RDF-ready: LinkML class_uri and slot_uri mappings documented
- Multi-ontology support: Institutions can implement multiple ontology classes
3. Agent Guidance
- Decision tree: Clear workflow for choosing appropriate ontology
- Practical examples: RDF serialization patterns for TOOI and CPOV
- Anti-patterns: Documented common mistakes to avoid
- Required workflow: 4-step process before extraction begins
4. Validation Support
- Enum coverage: Validator now supports all 7 platform types (including LEARNING_MANAGEMENT)
- Libyan data: 3 university LMS platforms can now be validated
- Quality assurance: Schema validation aligns with schema version 0.2.1
Next Steps
Immediate Priorities
- Continue Extraction Work (Middle East/North Africa cluster):
  - Algeria (conversation available)
  - Morocco (conversation available)
  - Egypt (conversation available)
  - Tunisia (conversation available)
  - Jordan (conversation available)
- Alternative Validation Approach:
  - Investigate why linkml-validate CLI crashed on Libyan JSON files
  - Try Python API approach: from linkml.validator import Validator
  - Or convert JSON to YAML and use existing validation script
- Test Ontology Alignment:
  - Extract one Dutch institution following TOOI patterns
  - Extract one Brazilian institution following CPOV patterns
  - Generate RDF serialization with linkml-convert
  - Verify ontology class mappings in output
Secondary Priorities
- Update PROGRESS.md:
  - Add entry for schema v0.2.1 release
  - Document ontology integration milestone
  - Update schema evolution timeline
- Create Migration Guide (if needed):
  - Document changes from v0.2.0 to v0.2.1
  - Explain LEARNING_MANAGEMENT use cases
  - Provide conversion examples for older datasets
- Expand Ontology Coverage:
  - Document CIDOC-CRM integration patterns (museum objects)
  - Add RiC-O patterns for archival description
  - Create BIBFRAME examples for library catalogs
References
- Schema v0.2.1: All schema files in schemas/ directory
- Ontology Files:
  - TOOI: /data/ontology/tooiont.ttl
  - CPOV: /data/ontology/core-public-organisation-ap.ttl, core-public-organisation-ap.jsonld
  - Schema.org: /data/ontology/schemaorg.owl
- Documentation:
  - /docs/ONTOLOGY_EXTENSIONS.md - Extension guidelines and ontology integration
  - /AGENTS.md - AI agent instructions with ontology decision tree
  - /docs/SCHEMA_MODULES.md - Modular schema architecture
- Validation:
  - validate_instances.py - Custom LinkML validator
Session Metadata
Start Time: 2025-11-09 (resumed from previous session summary)
End Time: 2025-11-09
Duration: ~30 minutes (documentation-focused)
Agent: OpenCODE (Claude 3.7 Sonnet)
Session Type: Schema maintenance + documentation enhancement
Complexity: Medium (version updates + comprehensive documentation)
Status: ✅ Session complete. Schema versioned to 0.2.1, ontology integration documented, agent instructions enhanced.