# Heritage Custodian Ontology Integration Design

**Version**: 0.2.0  
**Date**: 2025-11-05  
**Status**: DRAFT - Awaiting subagent analysis

## Executive Summary

This document outlines the design for integrating multiple established ontologies into the Heritage Custodian LinkML schema:

- **TOOI** (Dutch government organizational ontology)
- **CPOV** (Core Public Organization Vocabulary - EU)
- **Schema.org** (web semantics)
- **EDM** (Europeana Data Model)
- **PROV-O** (W3C Provenance Ontology)

## Key Ontology Patterns Identified

### 1. TOOI Temporal Model

**Source**: `data/ontology/tooiont.ttl`

TOOI provides a sophisticated temporal tracking system for organizational changes:

#### Core Classes

```turtle
tooi:Overheidsorganisatie
  - rdfs:subClassOf org:FormalOrganization
  - rdfs:subClassOf prov:Agent
  - rdfs:subClassOf prov:Entity
  - rdfs:subClassOf prov:Organization
```

**Key Properties**:
- `prov:generatedAtTime` - when organization was founded
- `prov:invalidatedAtTime` - when organization ceased to exist
- `tooi:begindatum` - first calendar day organization existed (date component of generatedAtTime)
- `tooi:einddatum` - last calendar day organization existed (date component of invalidatedAtTime)
- `tooi:afkorting` - abbreviation
- `tooi:alternatieveNaam` - alternative names (multivalued)
- `dcterms:isPartOf` - parent organization (recursive)
- `tooi:organisatiecode` - unique organization code

#### Change Event Model

```turtle
tooi:Wijzigingsgebeurtenis (ChangeEvent)
  - rdfs:subClassOf prov:Activity
  - Properties:
    - tooi:tijdstipWijziging (dateTime when change formally occurred)
    - tooi:tijdstipRegistratie (dateTime when change was registered)
    - tooi:redenWijziging (reason for change)
    - tooi:heeftJuridischeGrondslag (legal basis for change)

tooi:ExistentieleWijziging (ExistentialChange)
  - rdfs:subClassOf tooi:Wijzigingsgebeurtenis
  - Subtypes:
    - tooi:Afsplitsing (Split/spinoff)
    - tooi:Fusie (Merger)
    - tooi:Opheffing (Dissolution)
```

**Design Implication**: Our current
`ghcid_history` uses a simple list of history entries. We should consider adding a `ChangeEvent` class that follows TOOI's pattern of linking changes to PROV-O activities.

### 2. CPOV Public Organization Model

**Source**: `data/ontology/core-public-organisation-ap.ttl`

CPOV focuses on public sector organizations with EU interoperability:

#### Core Classes

```turtle
cpov:PublicOrganisation
  - rdfs:subClassOf org:Organization
  - Represents government/public heritage organizations

cpov:ContactPoint
  - Properties:
    - cpov:email
    - cpov:telephone
    - cpov:contactPage (foaf:Document)
```

**Design Implication**: Our `ContactInfo` class aligns well with `cpov:ContactPoint`. We should add `class_uri: cpov:ContactPoint` mapping.

### 3. PROV-O Provenance Model

Both TOOI and CPOV heavily use PROV-O for temporal and provenance tracking:

- `prov:Entity` - things with provenance
- `prov:Activity` - activities that affect entities
- `prov:Agent` - agents responsible for activities
- `prov:generatedAtTime` - when entity was created
- `prov:invalidatedAtTime` - when entity ceased to be valid
- `prov:wasGeneratedBy` - links entity to creating activity
- `prov:wasInvalidatedBy` - links entity to ending activity

**Design Implication**: We should make our `HeritageCustodian` a subclass of `prov:Entity` and use PROV-O properties for temporal tracking.

## Proposed Schema Extensions

### New Classes to Add

#### 1. ChangeEvent

Models organizational changes over time (inspired by TOOI):

```yaml
ChangeEvent:
  description: >-
    An event that changed the state of a heritage custodian organization
    (e.g., founding, closure, relocation, name change, merger, split).
    Based on tooi:Wijzigingsgebeurtenis pattern.
  class_uri: prov:Activity
  mixins:
    - tooi:Wijzigingsgebeurtenis
  slots:
    - change_type            # founding, closure, relocation, rename, merger, split
    - effective_date         # when change formally occurred (tooi:tijdstipWijziging)
    - registration_date      # when change was recorded (tooi:tijdstipRegistratie)
    - reason                 # reason for change (tooi:redenWijziging)
    - legal_basis            # legal document/regulation (tooi:heeftJuridischeGrondslag)
    - affected_organization  # link to HeritageCustodian
    - resulting_ghcid        # new GHCID after this change
    - previous_ghcid         # GHCID before this change
```

#### 2. OrganizationalUnit

For departments/branches of larger institutions:

```yaml
OrganizationalUnit:
  description: >-
    A unit, department, or branch within a larger heritage custodian organization.
  class_uri: org:OrganizationalUnit
  is_a: HeritageCustodian
  slots:
    - unit_type    # department, branch, division, section
    - parent_unit  # recursive
```

### Properties to Enhance

#### Temporal Properties

Add PROV-O temporal tracking to `HeritageCustodian`:

```yaml
# In HeritageCustodian class
slots:
  - prov_generated_at    # maps to prov:generatedAtTime
  - prov_invalidated_at  # maps to prov:invalidatedAtTime
  - change_history       # list of ChangeEvent instances

# Slot definitions
prov_generated_at:
  description: Timestamp when organization was formally founded
  range: datetime
  slot_uri: prov:generatedAtTime

prov_invalidated_at:
  description: Timestamp when organization ceased to exist
  range: datetime
  slot_uri: prov:invalidatedAtTime

change_history:
  description: Historical record of changes to this organization
  range: ChangeEvent
  multivalued: true
  inlined: true
  inlined_as_list: true
```

#### Name Properties (TOOI-inspired)

```yaml
official_name:
  description: Official legal name of the organization
  range: string
  slot_uri: tooi:officieleNaamInclSoort

sorting_name:
  description: Name formatted for alphabetical sorting
  range: string
  slot_uri: tooi:officieleNaamSorteer

abbreviation:
  description: Official abbreviation or acronym
  range: string
  slot_uri: tooi:afkorting
```

### Ontology Mappings to Update

#### HeritageCustodian

```yaml
HeritageCustodian:
  class_uri: org:Organization
  mixins:
    - prov:Entity                # Add PROV-O provenance tracking
    - tooi:Overheidsorganisatie  # For Dutch institutions
    - cpov:PublicOrganisation    # For government institutions
    - schema:Organization        # For Schema.org compatibility
```

#### ContactInfo

```yaml
ContactInfo:
  class_uri: cpov:ContactPoint
  exact_mappings:
    - schema:ContactPoint
```

### Enumerations to Add

#### ChangeTypeEnum

```yaml
ChangeTypeEnum:
  description: Types of organizational changes
  permissible_values:
    FOUNDING:
      description: Organization was founded
      meaning: tooi:Oprichting
    CLOSURE:
      description: Organization ceased operations
      meaning: tooi:Opheffing
    MERGER:
      description: Organization merged with another
      meaning: tooi:Fusie
    SPLIT:
      description: Organization split into multiple entities
      meaning: tooi:Afsplitsing
    RELOCATION:
      description: Organization moved to new location
    NAME_CHANGE:
      description: Organization changed its name
    TYPE_CHANGE:
      description: Institution type changed
    STATUS_CHANGE:
      description: Operational status changed
```

## Integration with GHCID System

The GHCID system already tracks identifier changes via `ghcid_history`. We should:

1. **Keep `ghcid_history`** as-is (simple, functional)
2. **Add `change_history`** for richer semantic change tracking
3.
   **Link the two**: Each `GHCIDHistoryEntry` should reference a `ChangeEvent` if applicable

### Example Mapping

```yaml
# Simple GHCID history (current system)
ghcid_history:
  - ghcid: "NL-NH-AMS-M-RM"
    ghcid_numeric: 12345678901234567890
    valid_from: "2020-01-01T00:00:00Z"
    valid_to: null
    reason: "Initial identifier"
    institution_name: "Rijksmuseum"
    location_city: "Amsterdam"
    location_country: "NL"

# Rich semantic change history (new system)
change_history:
  - change_type: FOUNDING
    effective_date: "1800-11-19T00:00:00Z"
    registration_date: "2020-01-01T00:00:00Z"
    reason: "Founded as national art museum"
    resulting_ghcid: "NL-NH-AMS-M-RM"
    affected_organization: "https://example.org/custodian/12345"
```

## EDM Aggregator/Provider Pattern

*(Awaiting subagent analysis of EDM conversations)*

Expected patterns:
- `edm:ProvidedCHO` (Cultural Heritage Object)
- `edm:WebResource` (digital representation)
- `edm:Agent` (provider organization)
- `ore:Aggregation` (metadata aggregation)

## Namespace Strategy

### Recommended Approach

1. **Create our own namespace**: `https://w3id.org/heritage/custodian/`
2. **Reuse existing properties** via `slot_uri` mappings
3. **Define custom properties** only when no suitable property exists

### Prefix Registry

```yaml
prefixes:
  heritage: https://w3id.org/heritage/custodian/
  tooi: https://identifier.overheid.nl/tooi/def/ont/
  cpov: http://data.europa.eu/m8g/
  org: http://www.w3.org/ns/org#
  prov: http://www.w3.org/ns/prov#
  schema: http://schema.org/
  rico: https://www.ica.org/standards/RiC/ontology#
  edm: http://www.europeana.eu/schemas/edm/
  ore: http://www.openarchives.org/ore/terms/
```

## Validation Strategy

### SHACL Constraints

TOOI uses SHACL extensively for validation. We should:

1. **Generate SHACL from LinkML** using `gen-shacl`
2.
   **Define custom constraints** for:
   - GHCID format validation
   - Date consistency (founded_date < closed_date)
   - Identifier uniqueness
   - Geographic coordinate validation

### Example SHACL (conceptual)

```turtle
heritage:HeritageCustodianShape
  a sh:NodeShape ;
  sh:targetClass heritage:HeritageCustodian ;
  sh:property [
    sh:path heritage:ghcid_current ;
    sh:pattern "^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]{1,10}(-Q[0-9]+)?$" ;
  ] ;
  sh:property [
    sh:path prov:generatedAtTime ;
    sh:maxCount 1 ;
    sh:datatype xsd:dateTime ;
  ] .
```

## Implementation Roadmap

### Phase 1: Core Extensions (Current Priority)

- [ ] Add `ChangeEvent` class
- [ ] Add PROV-O temporal properties to `HeritageCustodian`
- [ ] Add `ChangeTypeEnum`
- [ ] Update `class_uri` and `slot_uri` mappings
- [ ] Create example instances

### Phase 2: TOOI Integration

- [ ] Add `OrganizationalUnit` class
- [ ] Add Dutch-specific TOOI properties
- [ ] Implement TOOI name variants (official, preferred, sorting)
- [ ] Add legal basis tracking

### Phase 3: EDM Integration

- [ ] Add aggregator/provider relationship model
- [ ] Add collection digitization tracking
- [ ] Add EDM-specific metadata

### Phase 4: Validation

- [ ] Generate SHACL constraints from LinkML
- [ ] Implement custom validators
- [ ] Create validation test suite

## Open Questions

1. **Class hierarchy**: Should `HeritageCustodian` use `is_a` or `mixins` for multiple ontology mappings?
   - **Recommendation**: Use `mixins` to avoid diamond inheritance issues
2. **Temporal model**: Dual tracking (`founded_date` vs `prov:generatedAtTime`)?
   - **Recommendation**: Keep both - `founded_date` for simple queries, PROV-O for semantic interoperability
3. **Change events**: Link to `ghcid_history` or keep separate?
   - **Recommendation**: Keep separate but allow optional cross-references
4. **Dutch-specific fields**: In base class or subclass?
   - **Current approach**: `DutchHeritageCustodian` subclass ✅

## References

- TOOI Ontology: `data/ontology/tooiont.ttl`
- CPOV Ontology: `data/ontology/core-public-organisation-ap.ttl`
- W3C PROV-O: https://www.w3.org/TR/prov-o/
- W3C Org Ontology: https://www.w3.org/TR/vocab-org/
- LinkML Docs: https://linkml.io/linkml/

---

**Next Steps**:
1. Wait for subagent analysis of ontology conversations
2. Refine design based on subagent findings
3. Implement `heritage_custodian_extended.yaml`
4. Create example instances
5. Validate with LinkML tools
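
The GHCID format and date-consistency constraints listed under Validation Strategy could be prototyped in plain Python before the SHACL shapes are generated. The following is a minimal sketch, not a committed design: the function names `is_valid_ghcid`, `parse_timestamp`, and `dates_consistent` are hypothetical, the regular expression is copied from the conceptual SHACL example, and timestamps are assumed to be ISO 8601 strings in the form used in the Example Mapping.

```python
import re
from datetime import datetime
from typing import Optional

# Same pattern as the conceptual SHACL shape (sh:pattern on ghcid_current).
GHCID_PATTERN = re.compile(
    r"^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]{1,10}(-Q[0-9]+)?$"
)


def is_valid_ghcid(ghcid: str) -> bool:
    """Return True if the identifier matches the GHCID pattern."""
    return GHCID_PATTERN.fullmatch(ghcid) is not None


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def dates_consistent(founded: str, closed: Optional[str]) -> bool:
    """Check the founded < closed rule; an open-ended record always passes.

    Assumption: both timestamps carry a UTC offset (or both omit it),
    since Python refuses to compare naive and aware datetimes.
    """
    if closed is None:
        return True
    return parse_timestamp(founded) < parse_timestamp(closed)


if __name__ == "__main__":
    print(is_valid_ghcid("NL-NH-AMS-M-RM"))
    print(dates_consistent("1800-11-19T00:00:00Z", None))
```

Keeping the pattern in one constant would let the Python check and the generated SHACL shape be compared in the validation test suite, so the two do not drift apart.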