# Ontology Extensions and Schema Evolution

This document tracks extensions to the Heritage Custodian LinkML schema based on real-world data extraction findings. All extensions are mapped to base ontologies (CIDOC-CRM, Schema.org, RiC-O, etc.) to maintain semantic interoperability.

## Version History

| Version | Date | Description |
|---------|------|-------------|
| 0.2.1 | 2025-11-09 | Added LEARNING_MANAGEMENT to DigitalPlatformTypeEnum (Libyan extraction) |
| 0.2.0 | 2025-11-05 | Modular schema reorganization |

---

## Extensions Log

### 2025-11-09: LEARNING_MANAGEMENT Platform Type

**Schema File**: `schemas/enums.yaml`
**Enum**: `DigitalPlatformTypeEnum`
**Added Value**: `LEARNING_MANAGEMENT`

#### Gap Identified

During extraction of Libyan heritage institutions, 3 universities (Misurata, Benghazi, University of Tripoli) were found using learning management systems (Google Classroom, Moodle) for heritage education and digital resource delivery. The existing `DigitalPlatformTypeEnum` did not have an appropriate category for LMS platforms.


**Source Data**:
- `data/instances/libya_universities_batch1.json` (lines 78, 190, 286)
  - Misurata University: Google Classroom Integration
  - Benghazi University: Moodle platform for heritage courses
  - University of Tripoli: Moodle integration

**Original Schema Coverage**:
- COLLECTION_MANAGEMENT ❌ (too specific - for museum/archive systems)
- DIGITAL_REPOSITORY ❌ (for digital preservation, not learning)
- DISCOVERY_PORTAL ❌ (for search/discovery, not education)
- WEBSITE ❌ (too generic)
- GENERIC ❌ (too generic, loses semantic meaning)

#### Proposal

Add `LEARNING_MANAGEMENT` to `DigitalPlatformTypeEnum`:

```yaml
LEARNING_MANAGEMENT:
  description: Learning management systems for heritage education (Moodle, Google Classroom, Blackboard, Canvas)
  meaning: schema:LearningResource
```

#### Ontology Mapping

**Base Ontology**: Schema.org
**Class**: `schema:LearningResource`
**Reference**: https://schema.org/LearningResource

**RDF Serialization**:
```turtle
@prefix schema: .
@prefix heritage: .

a heritage:DigitalPlatform ;
  heritage:platform_name "Google Classroom Integration" ;
  heritage:platform_type "LEARNING_MANAGEMENT" ;
  rdf:type schema:LearningResource ;
  schema:isPartOf .
```

#### Use Cases

1. **Heritage Education Tracking**: Document how institutions deliver heritage education digitally
2. **Platform Integration Mapping**: Identify which LMS platforms are used in heritage sector
3. **E-Learning Resource Discovery**: Enable discovery of heritage learning platforms
4. **Digital Pedagogy Research**: Support research on digital heritage education methods

#### Implementation

**Status**: ✅ Implemented (2025-11-09)

**Affected Files**:
- `schemas/enums.yaml` (lines 191-212, added LEARNING_MANAGEMENT at line 208)

**Validation**:
- Libyan extraction data now validates correctly
- 3 institutions using LEARNING_MANAGEMENT platform type

**Backward Compatibility**:
- New enum value is additive (non-breaking change)
- Existing data unaffected
- Future extractions can use new value

#### Related Work

**Similar Patterns in Other Domains**:
- Schema.org `schema:Course` - For structured course information
- LTI (Learning Tools Interoperability) - Standard for LMS integration
- LRMI (Learning Resource Metadata Initiative) - Metadata for learning resources

**Future Extensions**:
- Consider adding `course_url` slot to DigitalPlatform for linking to specific courses
- May need `MetadataStandardEnum` value for LRMI if heritage institutions adopt it

---

## Integrating TOOI and CPOV Ontologies

The GLAM project builds on two foundational ontologies for organizational data modeling. **AI agents should always consult these ontologies** when designing extraction pipelines or extending the schema.

### TOOI - Dutch Government Organizational Ontology

**File**: `/data/ontology/tooiont.ttl`
**Namespace**: `https://identifier.overheid.nl/tooi/def/ont/`
**Purpose**: Model Dutch government organizations, their lifecycle events, and temporal changes

**Key Classes**:
- `tooi:Overheidsorganisatie` - Government organization (base for `DutchHeritageCustodian`)
- `tooi:Wijzigingsgebeurtenis` - Change event (merger, split, closure)
- `tooi:organisatieIdentificatie` - Organizational identifiers

**Key Properties**:
- `tooi:officieleNaamInclSoort` - Official name including organizational type
- `tooi:begindatum` - Start date (founding, change effective date)
- `tooi:einddatum` - End date (closure, change expiry)
- `tooi:resultaat` - Resulting organization from change event
- `tooi:voorafgaandeOrganisatie` - Predecessor organization

**PROV-O Integration**:
TOOI uses PROV-O (W3C Provenance Ontology) for temporal tracking:
- Change events as `prov:Activity`
- Organizations linked via `prov:wasInfluencedBy` and `prov:generated`
- Temporal bounds via `prov:atTime`

**Heritage Custodian Mapping**:
```yaml
# LinkML schemas/dutch.yaml extends TOOI
DutchHeritageCustodian:
  is_a: HeritageCustodian
  class_uri: tooi:Overheidsorganisatie  # Maps to TOOI base class
  slots:
    - isil_code       # Maps to tooi:organisatieIdentificatie
    - change_history  # Maps to tooi:Wijzigingsgebeurtenis
```

**RDF Serialization Example**:
```turtle
@prefix tooi: .
@prefix prov: .
@prefix heritage: .

a tooi:Overheidsorganisatie, heritage:HeritageCustodian ;
  tooi:officieleNaamInclSoort "Noord-Hollands Archief" ;
  tooi:begindatum "2001-01-01"^^xsd:date ;
  heritage:institution_type "ARCHIVE" ;
  heritage:isil_code "NL-HlmNHA" .

# Change event: Merger of two archives
a tooi:Wijzigingsgebeurtenis, prov:Activity ;
  prov:atTime "2001-01-01T00:00:00Z"^^xsd:dateTime ;
  tooi:resultaat ;
  tooi:voorafgaandeOrganisatie , ;
  heritage:change_type "MERGER" ;
  heritage:event_description "Merger of Gemeentearchief Haarlem and Rijksarchief in Noord-Holland" .
```

**When to Use TOOI**:
- ✅ Extracting Dutch heritage institutions (government archives, state museums)
- ✅ Modeling mergers, splits, reorganizations of Dutch organizations
- ✅ Tracking historical changes to organizational structure
- ✅ Integrating with Dutch national registries (ISIL, KvK)
- ❌ Non-Dutch institutions (use CPOV instead)
- ❌ Private collections without government affiliation

---

### CPOV - EU Core Public Organisation Vocabulary

**Files**:
- `/data/ontology/core-public-organisation-ap.ttl` (RDF schema)
- `/data/ontology/core-public-organisation-ap.jsonld` (JSON-LD context)

**Namespace**: `http://data.europa.eu/m8g/`
**Purpose**: EU-wide vocabulary for public sector organizations (governments, NGOs, cultural institutions)

**Key Classes**:
- `cpov:PublicOrganisation` - Any public-sector organization (base for global heritage custodians)
- `cv:ChangeEvent` - Organizational change (founding, closure, name change)
- `cv:ContactPoint` - Contact information for public services
- `locn:Address` - Physical location details

**Key Properties**:
- `dct:identifier` - Formal identifier (ISIL, national registry ID)
- `skos:prefLabel` - Preferred name
- `skos:altLabel` - Alternative names
- `dct:temporal` - Temporal coverage (founding to closure)
- `cv:contactPoint` - Contact details
- `locn:address` - Physical address

**W3C Org Ontology Integration**:
CPOV builds on W3C Organization Ontology:
- `org:Organization` - Base organizational structure
- `org:hasUnit` - Hierarchical relationships (parent-child)
- `org:linkedTo` - Partnerships, networks
- `org:changedBy` - Change events affecting organization

**Heritage Custodian Mapping**:
```yaml
# LinkML schemas/core.yaml aligns with CPOV
HeritageCustodian:
  class_uri: cpov:PublicOrganisation  # Maps to CPOV for EU-wide interoperability
  slots:
    name:
      slot_uri: skos:prefLabel
    alternative_names:
      slot_uri: skos:altLabel
    identifiers:
      slot_uri: dct:identifier
    locations:
      slot_uri: locn:address
    change_history:
      slot_uri: cv:ChangeEvent
```

**RDF Serialization Example**:
```turtle
@prefix cpov: .
@prefix cv: .
@prefix dct: .
@prefix skos: .
@prefix locn: .
@prefix schema: .

a cpov:PublicOrganisation ;
  skos:prefLabel "Biblioteca Nacional do Brasil"@pt ;
  skos:altLabel "National Library of Brazil"@en, "BNB"@pt ;
  dct:identifier [
    a dct:Identifier ;
    skos:notation "BR-RjBN" ;
    dct:creator "International Standard Identifier for Libraries and Related Organisations"
  ] ;
  locn:address [
    a locn:Address ;
    locn:thoroughfare "Avenida Rio Branco, 219" ;
    locn:postCode "20040-008" ;
    locn:adminUnitL2 "Rio de Janeiro" ;
    locn:adminUnitL1 "BR"
  ] ;
  dct:temporal [
    schema:startDate "1810-01-01"^^xsd:date
  ] .

# Change event: Founding
a cv:ChangeEvent ;
  dct:date "1810-01-01"^^xsd:date ;
  dct:type "FOUNDING" ;
  dct:description "Founded by King João VI of Portugal as Royal Library"@en ;
  cv:changedOrganisation .
```

**When to Use CPOV**:
- ✅ Extracting non-Dutch European heritage institutions (France, Germany, Belgium, etc.)
- ✅ Modeling public-sector cultural organizations (national museums, state archives)
- ✅ EU Linked Open Data alignment (e.g., Europeana)
- ✅ Cross-border organizational relationships (EU heritage networks)
- ⚠️ Global institutions outside EU (use CPOV patterns but add regional ontologies)
- ❌ Purely private collections (consider Schema.org `schema:Organization` instead)

---

### Ontology Decision Tree for AI Agents

When designing extraction pipelines, choose the appropriate ontology:

```
Is the institution Dutch?
├─ YES → Use TOOI (tooi:Overheidsorganisatie)
│        Map to schemas/dutch.yaml
│        Extract ISIL codes, KvK numbers
│
└─ NO → Is the institution in the EU?
    ├─ YES → Use CPOV (cpov:PublicOrganisation)
    │        Map to schemas/core.yaml
    │        Extract EU-standard identifiers
    │
    └─ NO → Use CPOV patterns + regional ontologies
             Example: Brazilian institutions → CPOV + national heritage codes
             Fallback to Schema.org for private/informal collections
```

**Combining Ontologies**:
A single institution can be typed with multiple ontology classes at once:

```turtle
a tooi:Overheidsorganisatie,   # Dutch government organization
  cpov:PublicOrganisation,     # EU public sector
  schema:Museum,               # Schema.org for web discoverability
  crm:E74_Group ;              # CIDOC-CRM for cultural heritage domain
  ...
```

---

### Practical Extraction Workflow

**Step 1: Read Ontology Files**

Before designing extraction logic, review:

```bash
# Dutch institutions
grep -A 10 "tooi:Overheidsorganisatie" /data/ontology/tooiont.ttl

# EU/global institutions
grep -A 10 "cpov:PublicOrganisation" /data/ontology/core-public-organisation-ap.ttl

# JSON-LD context for CPOV
cat /data/ontology/core-public-organisation-ap.jsonld
```

**Step 2: Map Conversation Data to Ontology Classes**

Identify which ontology properties correspond to extracted data:

| Extracted Data | TOOI Property | CPOV Property | Schema.org |
|----------------|---------------|---------------|------------|
| Institution name | `tooi:officieleNaamInclSoort` | `skos:prefLabel` | `schema:name` |
| Founding date | `tooi:begindatum` | `schema:startDate` | `schema:foundingDate` |
| ISIL code | `tooi:organisatieIdentificatie` | `dct:identifier` | `schema:identifier` |
| Address | (use `locn:Address`) | `locn:address` | `schema:address` |
| Merger event | `tooi:Wijzigingsgebeurtenis` | `cv:ChangeEvent` | `schema:Event` |

**Step 3: Generate RDF-Compatible LinkML**

LinkML YAML maps to RDF when `class_uri` and `slot_uri` are defined:

```yaml
# Extraction output (LinkML YAML)
- id: https://w3id.org/heritage/custodian/nl/amsterdam-museum
  name: Amsterdam Museum
  institution_type: MUSEUM
  identifiers:
    - identifier_scheme: ISIL
      identifier_value: NL-AsdAM
  locations:
    - city: Amsterdam
      country: NL
  change_history:
    - event_id: https://w3id.org/heritage/custodian/event/am-renaming-2011
      change_type: NAME_CHANGE
      event_date: "2011-01-01"
      event_description: "Renamed from Amsterdams Historisch Museum to Amsterdam Museum"
```

**Step 4: Export to RDF**

LinkML can serialize instance data to RDF/Turtle using the ontology mappings:

```bash
# Use linkml-convert (when implemented)
linkml-convert -s schemas/heritage_custodian.yaml \
  -t ttl \
  data/instances/netherlands_batch1.yaml \
  > output/netherlands_batch1.ttl
```

---

### Extension Guidelines for AI Agents

When extracting data reveals a gap in the schema, follow this process:

#### 1. Document the Gap

- **What data was found?** (exact field values, institution names)
- **Why doesn't existing schema fit?** (explain semantic mismatch)
- **How many instances?** (frequency of occurrence)
- **Geographic/domain scope?** (is this regional or global?)

#### 2. Research Base Ontologies

Check existing ontologies for appropriate mappings (in priority order):

1. **TOOI** (`/data/ontology/tooiont.ttl`) - Dutch government organizations (if applicable)
2. **CPOV** (`/data/ontology/core-public-organisation-ap.ttl`) - EU public sector organizations
3. **Schema.org** (`/data/ontology/schemaorg.owl`) - Web semantics, broad coverage
4. **CIDOC-CRM** (`/data/ontology/CIDOC_CRM_v7.1.3.rdf`) - Cultural heritage domain
5. **RiC-O** (Records in Contexts) - Archival description
6. **BIBFRAME** - Bibliographic resources
7. **Dublin Core** (`dcterms:`) - Metadata elements

**Prefer existing ontology classes over inventing new ones.**

**Search Strategy**:
```bash
# Search for relevant classes in ontologies
rg "Organisatie|Organization|Museum|Archive" /data/ontology/*.ttl
rg "ChangeEvent|Wijziging|Merger" /data/ontology/*.ttl
```

#### 3. Propose Extension

Create a proposal including:
- **Enum/slot name**: Follow LinkML naming conventions (snake_case for slots, UPPER_CASE for enum values)
- **Description**: Clear, concise explanation of the concept
- **Meaning**: Link to base ontology class (`meaning: schema:ClassName`)
- **Use cases**: Minimum 2-3 real-world use cases
- **RDF example**: Show how it serializes to RDF

#### 4. Validate with Real Data

- Test the extension against the data that revealed the gap
- Check if it applies to other extracted datasets
- Ensure backward compatibility (prefer additive changes)

#### 5. Update Documentation

- Add entry to this file (ONTOLOGY_EXTENSIONS.md)
- Update schema version number if needed
- Note affected files and line numbers
- Document validation results

---

## Schema Evolution Principles

### 1. Ontology Reuse Over Invention

**Always prefer**:
- Existing ontology classes (Schema.org, CIDOC-CRM, RiC-O)
- Widely adopted standards (Dublin Core, BIBFRAME)
- Industry conventions (ISIL codes, Wikidata identifiers)

**Avoid**:
- Inventing new properties when existing ones exist
- Creating parallel taxonomies to established standards
- Over-specialization (prefer general + description field)

### 2. Additive Changes > Breaking Changes

**Safe changes** (additive):
- ✅ Add new enum values
- ✅ Add optional slots
- ✅ Add new classes
- ✅ Expand multivalued slots

**Breaking changes** (avoid):
- ❌ Remove enum values
- ❌ Change slot ranges
- ❌ Make optional slots required
- ❌ Rename classes/slots

**If breaking change is necessary**:
- Document migration path in `/docs/MIGRATION.md`
- Provide conversion script in `/scripts/migrations/`
- Bump the minor version number (0.2.x → 0.3.0)

### 3. Evidence-Based Extensions

**Require**:
- Minimum 2-3 real-world instances found in extraction
- Clear semantic gap (no existing enum/slot fits)
- Use case justification (why is this distinction important?)

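
The instance-count requirement above could be checked mechanically before a proposal is written. The following is only a minimal sketch: the `(institution, candidate_value)` pair format, the function name, and the `PODCAST_HOSTING` outlier are hypothetical illustrations, not part of the schema or the extraction tooling.

```python
from collections import defaultdict

MIN_INSTANCES = 2  # lower bound of the "2-3 real-world instances" rule


def candidates_with_evidence(observations, min_instances=MIN_INSTANCES):
    """Return candidate values seen at >= min_instances distinct institutions.

    `observations` is an iterable of (institution_name, candidate_value)
    pairs; this shape is an assumption made for the example.
    """
    seen = defaultdict(set)
    for institution, candidate in observations:
        seen[candidate].add(institution)  # sets ignore repeated sightings
    return {
        candidate: sorted(names)
        for candidate, names in seen.items()
        if len(names) >= min_instances
    }


observations = [
    ("Misurata University", "LEARNING_MANAGEMENT"),
    ("Benghazi University", "LEARNING_MANAGEMENT"),
    ("University of Tripoli", "LEARNING_MANAGEMENT"),
    ("Misurata University", "LEARNING_MANAGEMENT"),  # duplicate, counted once
    ("Benghazi University", "PODCAST_HOSTING"),      # hypothetical single outlier
]
qualified = candidates_with_evidence(observations)
print(qualified)
```

Counting distinct institutions rather than raw rows keeps one institution with many platform records from satisfying the threshold on its own.
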

**Don't extend for**:
- Single outlier instances (use free-text description instead)
- Regional idiosyncrasies (consider Dutch-specific extension module)
- Speculative future needs (extend when needed, not preemptively)

### 4. Semantic Clarity

**Good enum/slot names**:
- `LEARNING_MANAGEMENT` - Clear, unambiguous, scoped to heritage education
- `collection_type` - Flexible, allows domain-specific values
- `platform_url` - Self-explanatory, no ambiguity

**Poor enum/slot names**:
- `SYSTEM` - Too generic, unclear semantics
- `other_stuff` - Vague, unmaintainable
- `lms` - Abbreviation, unclear to non-experts

### 5. Balance Granularity and Usability

**Too coarse**:
```yaml
# BAD: Loses semantic precision
platform_type: GENERIC
notes: "This is a learning management system"
```

**Too fine-grained**:
```yaml
# BAD: Unmaintainable, too many enums
platform_type: MOODLE_LMS
platform_type: GOOGLE_CLASSROOM_LMS
platform_type: BLACKBOARD_LMS
platform_type: CANVAS_LMS
```

**Just right**:
```yaml
# GOOD: Semantic category + specific name
platform_type: LEARNING_MANAGEMENT
platform_name: "Moodle"
```

---

## Future Extension Candidates

These are **potential** extensions identified but not yet implemented (waiting for more evidence):

### CollectionTypeEnum

**Status**: ⏳ Under review

**Current Implementation**: Free text (`collection_type: string`)

**Found in Libyan Data**:
- "archaeological", "bibliographic", "archival" (standard)
- "historical", "architectural", "mixed", "digital objects" (non-standard)

**Proposal**: Create optional controlled vocabulary while keeping free text fallback

**Questions**:
- Is there an existing standard (AAT, LCSH subject headings)?
- Would enum improve data quality or restrict flexibility?
- Do different countries use different typologies?

**Decision**: Defer until we have 50+ institutions to analyze usage patterns.

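
Once enough institutions are available, the usage analysis mentioned in the decision could start from a simple tally of the free-text values. This is a sketch under assumptions: the record layout (a `collections` list with `collection_type` strings) and the function name are hypothetical, and the "standard" set simply reuses the three values listed above.

```python
from collections import Counter

# Values listed above as standard; anything else is reported as non-standard.
STANDARD_TYPES = {"archaeological", "bibliographic", "archival"}
MIN_INSTITUTIONS = 50  # review threshold taken from the decision above


def summarize_collection_types(institutions):
    """Tally free-text collection_type values across institution records.

    Assumes each record is a dict with an optional "collections" list whose
    items carry a "collection_type" string (hypothetical layout).
    """
    counts = Counter()
    for inst in institutions:
        for coll in inst.get("collections", []):
            # Normalize case and whitespace so "Archival " and "archival" match.
            value = (coll.get("collection_type") or "").strip().lower()
            if value:
                counts[value] += 1
    non_standard = {k: v for k, v in counts.items() if k not in STANDARD_TYPES}
    return {
        "institutions": len(institutions),
        "ready_for_review": len(institutions) >= MIN_INSTITUTIONS,
        "counts": dict(counts),
        "non_standard": non_standard,
    }


sample = [
    {"name": "A", "collections": [{"collection_type": "Archival"}]},
    {"name": "B", "collections": [{"collection_type": "mixed"},
                                  {"collection_type": "archival "}]},
    {"name": "C"},  # no collections recorded
]
summary = summarize_collection_types(sample)
print(summary)
```

A frequency table like this helps answer the open questions: values that recur across many institutions are candidates for a controlled vocabulary, while long-tail values argue for keeping the free-text fallback.
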

---

### UNESCO Heritage Status

**Status**: ✅ Adequate (no extension needed)

**Current Implementation**: Use `Identifier` class with `identifier_scheme: UNESCO_WHC`

**Found in Libyan Data**:
- 5 UNESCO World Heritage Sites with WHC identifiers
- Status changes tracked via `ChangeEvent` (inscription, delisting)

**Conclusion**: Current schema handles this well. No extension needed.

---

### War/Conflict Heritage Markers

**Status**: ⏳ Monitoring

**Found in Libyan Data**:
- Misrata War Museum (2011 Libyan Civil War)
- Tobruk WWII Commonwealth War Cemetery

**Current Handling**: Use `description` field + `subjects` in `Collection` class

**Question**: Should we add `conflict_period` or `war_era` enum for specialized search?

**Decision**: Monitor usage across more conflict-affected countries (Syria, Yemen, Bosnia). Defer extension for now.

---

## References

- **Base Ontologies**: `/data/ontology/` directory
  - `CIDOC_CRM_v7.1.3.rdf` - Cultural heritage modeling
  - `schemaorg.owl` - Schema.org vocabulary
- **LinkML Documentation**: https://linkml.io/linkml/
- **Schema Design Patterns**: `/docs/plan/global_glam/05-design-patterns.md`
- **Data Standardization**: `/docs/plan/global_glam/04-data-standardization.md`

---

**Maintained by**: GLAM Data Extraction Project
**Last Updated**: 2025-11-09
**Schema Version**: 0.2.1 (development)