Heritage Custodian Ontology Integration Design

Version: 0.2.0
Date: 2025-11-05
Status: DRAFT - Awaiting subagent analysis

Executive Summary

This document outlines the design for integrating multiple established ontologies into the Heritage Custodian LinkML schema:

  • TOOI (Dutch government organizational ontology)
  • CPOV (Core Public Organization Vocabulary - EU)
  • Schema.org (web semantics)
  • EDM (Europeana Data Model)
  • PROV-O (W3C Provenance Ontology)

Key Ontology Patterns Identified

1. TOOI Temporal Model

Source: data/ontology/tooiont.ttl

TOOI provides a sophisticated temporal tracking system for organizational changes:

Core Classes

tooi:Overheidsorganisatie
  - rdfs:subClassOf org:FormalOrganization
  - rdfs:subClassOf prov:Agent
  - rdfs:subClassOf prov:Entity
  - rdfs:subClassOf prov:Organization

Key Properties:

  • prov:generatedAtTime - when organization was founded
  • prov:invalidatedAtTime - when organization ceased to exist
  • tooi:begindatum - first calendar day organization existed (date component of generatedAtTime)
  • tooi:einddatum - last calendar day organization existed (date component of invalidatedAtTime)
  • tooi:afkorting - abbreviation
  • tooi:alternatieveNaam - alternative names (multivalued)
  • dcterms:isPartOf - parent organization (recursive)
  • tooi:organisatiecode - unique organization code
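
The relationship between the PROV-O timestamps and the TOOI calendar dates can be sketched in Python. This is only an illustration of the "date component" idea; the helper name `date_component` is hypothetical, and it assumes the calendar day is read in the timestamp's own UTC offset.

```python
from datetime import date, datetime


def date_component(timestamp: str) -> date:
    """Return the calendar day of an ISO 8601 timestamp.

    Assumption: the day is taken in the offset the timestamp carries,
    without converting to another time zone first.
    """
    return datetime.fromisoformat(timestamp).date()


# prov:generatedAtTime -> tooi:begindatum (illustrative value)
begindatum = date_component("2020-01-01T00:00:00+01:00")
```

If timestamps are stored in UTC, the derived day can differ from the local Dutch calendar day around midnight, so the reference time zone would need to be agreed on before deriving one field from the other.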

Change Event Model

tooi:Wijzigingsgebeurtenis (ChangeEvent)
  - rdfs:subClassOf prov:Activity
  - Properties:
    - tooi:tijdstipWijziging (dateTime when change formally occurred)
    - tooi:tijdstipRegistratie (dateTime when change was registered)
    - tooi:redenWijziging (reason for change)
    - tooi:heeftJuridischeGrondslag (legal basis for change)

tooi:ExistentieleWijziging (ExistentialChange)
  - rdfs:subClassOf tooi:Wijzigingsgebeurtenis
  - Subtypes:
    - tooi:Afsplitsing (Split/spinoff)
    - tooi:Fusie (Merger)
    - tooi:Opheffing (Dissolution)

Design Implication: Our current ghcid_history uses a simple list of history entries. We should consider adding a ChangeEvent class that follows TOOI's pattern of linking changes to PROV-O activities.

2. CPOV Public Organization Model

Source: data/ontology/core-public-organisation-ap.ttl

CPOV models public sector organizations with a focus on interoperability across the EU:

Core Classes

cpov:PublicOrganisation
  - rdfs:subClassOf org:Organization
  - Represents government/public heritage organizations

cpov:ContactPoint
  - Properties:
    - cpov:email
    - cpov:telephone
    - cpov:contactPage (foaf:Document)

Design Implication: Our ContactInfo class aligns well with cpov:ContactPoint. We should add a class_uri: cpov:ContactPoint mapping.

3. PROV-O Provenance Model

Both TOOI and CPOV heavily use PROV-O for temporal and provenance tracking:

  • prov:Entity - things with provenance
  • prov:Activity - activities that affect entities
  • prov:Agent - agents responsible for activities
  • prov:generatedAtTime - when entity was created
  • prov:invalidatedAtTime - when entity ceased to be valid
  • prov:wasGeneratedBy - links entity to creating activity
  • prov:wasInvalidatedBy - links entity to ending activity

Design Implication: We should align our HeritageCustodian with prov:Entity and use PROV-O properties for temporal tracking.

Proposed Schema Extensions

New Classes to Add

1. ChangeEvent

Models organizational changes over time (inspired by TOOI):

ChangeEvent:
  description: >-
    An event that changed the state of a heritage custodian organization
    (e.g., founding, closure, relocation, name change, merger, split).
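    Based on tooi:Wijzigingsgebeurtenis pattern.
  class_uri: prov:Activity
  # External ontology classes are referenced as mappings; LinkML mixins
  # must be classes defined in this schema.
  close_mappings:
    - tooi:Wijzigingsgebeurtenis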
  slots:
    - change_type  # founding, closure, relocation, rename, merger, split
    - effective_date  # when change formally occurred (tooi:tijdstipWijziging)
    - registration_date  # when change was recorded (tooi:tijdstipRegistratie)
    - reason  # reason for change (tooi:redenWijziging)
    - legal_basis  # legal document/regulation (tooi:heeftJuridischeGrondslag)
    - affected_organization  # link to HeritageCustodian
    - resulting_ghcid  # new GHCID after this change
    - previous_ghcid  # GHCID before this change
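
To make the intended shape of a ChangeEvent more concrete, the slots above can be mirrored in a small Python dataclass. This is a sketch, not generated LinkML code: which fields are required, the `changes_identifier` helper, and the second GHCID value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChangeEvent:
    # Assumed to be required: what changed, when, and for whom.
    change_type: str
    effective_date: datetime
    affected_organization: str
    # Assumed to be optional.
    registration_date: Optional[datetime] = None
    reason: Optional[str] = None
    legal_basis: Optional[str] = None
    resulting_ghcid: Optional[str] = None
    previous_ghcid: Optional[str] = None

    def changes_identifier(self) -> bool:
        """True when the event replaced one GHCID with a different one."""
        return (
            self.previous_ghcid is not None
            and self.resulting_ghcid is not None
            and self.previous_ghcid != self.resulting_ghcid
        )


# Hypothetical relocation that also changes the identifier.
relocation = ChangeEvent(
    change_type="RELOCATION",
    effective_date=datetime(2020, 1, 1),
    affected_organization="https://example.org/custodian/12345",
    previous_ghcid="NL-NH-AMS-M-RM",
    resulting_ghcid="NL-ZH-DHA-M-RM",  # made-up value for illustration
)
```

Keeping `previous_ghcid` and `resulting_ghcid` on the event makes it possible to tell identifier-changing events apart from purely descriptive ones such as a status change.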

2. OrganizationalUnit

For departments/branches of larger institutions:

OrganizationalUnit:
  description: >-
    A unit, department, or branch within a larger heritage custodian organization.    
  class_uri: org:OrganizationalUnit
  is_a: HeritageCustodian
  slots:
    - unit_type  # department, branch, division, section
    - parent_unit  # recursive

Properties to Enhance

Temporal Properties

Add PROV-O temporal tracking to HeritageCustodian:

# In HeritageCustodian class
slots:
  - prov_generated_at  # maps to prov:generatedAtTime
  - prov_invalidated_at  # maps to prov:invalidatedAtTime
  - change_history  # list of ChangeEvent instances

# Slot definitions
prov_generated_at:
  description: Timestamp when organization was formally founded
  range: datetime
  slot_uri: prov:generatedAtTime

prov_invalidated_at:
  description: Timestamp when organization ceased to exist
  range: datetime
  slot_uri: prov:invalidatedAtTime

change_history:
  description: Historical record of changes to this organization
  range: ChangeEvent
  multivalued: true
  inlined: true
  inlined_as_list: true

Name Properties (TOOI-inspired)

official_name:
  description: Official legal name of the organization
  range: string
  slot_uri: tooi:officieleNaamInclSoort

sorting_name:
  description: Name formatted for alphabetical sorting
  range: string
  slot_uri: tooi:officieleNaamSorteer

abbreviation:
  description: Official abbreviation or acronym
  range: string
  slot_uri: tooi:afkorting

Ontology Mappings to Update

HeritageCustodian

HeritageCustodian:
  class_uri: org:Organization
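  # External ontology classes are attached as mappings; LinkML mixins
  # must be classes defined in this schema.
  broad_mappings:
    - prov:Entity  # Add PROV-O provenance tracking
  narrow_mappings:
    - tooi:Overheidsorganisatie  # For Dutch institutions
    - cpov:PublicOrganisation  # For government institutions
  close_mappings:
    - schema:Organization  # For Schema.org compatibility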

ContactInfo

ContactInfo:
  class_uri: cpov:ContactPoint
  exact_mappings:
    - schema:ContactPoint

Enumerations to Add

ChangeTypeEnum

ChangeTypeEnum:
  description: Types of organizational changes
  permissible_values:
    FOUNDING:
      description: Organization was founded
      meaning: tooi:Oprichting
    CLOSURE:
      description: Organization ceased operations
      meaning: tooi:Opheffing
    MERGER:
      description: Organization merged with another
      meaning: tooi:Fusie
    SPLIT:
      description: Organization split into multiple entities
      meaning: tooi:Afsplitsing
    RELOCATION:
      description: Organization moved to new location
    NAME_CHANGE:
      description: Organization changed its name
    TYPE_CHANGE:
      description: Institution type changed
    STATUS_CHANGE:
      description: Operational status changed
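
Only four of the eight values carry a TOOI meaning. A short Python sketch makes that gap explicit; the names `ChangeType`, `TOOI_MEANING`, and `unmapped_types` are hypothetical, and the CURIEs are copied from the enumeration above rather than checked against the TOOI ontology.

```python
from enum import Enum
from typing import List, Optional


class ChangeType(Enum):
    FOUNDING = "FOUNDING"
    CLOSURE = "CLOSURE"
    MERGER = "MERGER"
    SPLIT = "SPLIT"
    RELOCATION = "RELOCATION"
    NAME_CHANGE = "NAME_CHANGE"
    TYPE_CHANGE = "TYPE_CHANGE"
    STATUS_CHANGE = "STATUS_CHANGE"


# Values with a `meaning` in the enumeration above.
TOOI_MEANING = {
    ChangeType.FOUNDING: "tooi:Oprichting",
    ChangeType.CLOSURE: "tooi:Opheffing",
    ChangeType.MERGER: "tooi:Fusie",
    ChangeType.SPLIT: "tooi:Afsplitsing",
}


def tooi_meaning(change_type: ChangeType) -> Optional[str]:
    """Return the TOOI CURIE for a change type, or None if it has none."""
    return TOOI_MEANING.get(change_type)


def unmapped_types() -> List[ChangeType]:
    """List change types that still lack an ontology mapping."""
    return [t for t in ChangeType if t not in TOOI_MEANING]
```

The unmapped values are the ones that would need either a custom term in the heritage namespace or a mapping to another ontology.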

Integration with GHCID System

The GHCID system already tracks identifier changes via ghcid_history. We should:

  1. Keep ghcid_history as-is (simple, functional)
  2. Add change_history for richer semantic change tracking
  3. Link the two: Each GHCIDHistoryEntry should reference a ChangeEvent if applicable

Example Mapping

# Simple GHCID history (current system)
ghcid_history:
  - ghcid: "NL-NH-AMS-M-RM"
    ghcid_numeric: 12345678901234567890
    valid_from: "2020-01-01T00:00:00Z"
    valid_to: null
    reason: "Initial identifier"
    institution_name: "Rijksmuseum"
    location_city: "Amsterdam"
    location_country: "NL"

# Rich semantic change history (new system)
change_history:
  - change_type: FOUNDING
    effective_date: "1798-11-19T00:00:00Z"
    registration_date: "2020-01-01T00:00:00Z"
    reason: "Founded as national art museum"
    resulting_ghcid: "NL-NH-AMS-M-RM"
    affected_organization: "https://example.org/custodian/12345"
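
One way the optional link between the two histories could work is to match a history entry's `ghcid` against the `resulting_ghcid` of a change event. The sketch below assumes both histories are loaded as plain dictionaries; the function name `find_change_event` is hypothetical and the matching rule is only one possible design.

```python
from typing import List, Optional


def find_change_event(history_entry: dict, change_history: List[dict]) -> Optional[dict]:
    """Return the first change event that produced the entry's GHCID, if any."""
    for event in change_history:
        if event.get("resulting_ghcid") == history_entry["ghcid"]:
            return event
    return None


# Trimmed-down versions of the records in the example mapping.
ghcid_history = [{"ghcid": "NL-NH-AMS-M-RM", "reason": "Initial identifier"}]
change_history = [{"change_type": "FOUNDING", "resulting_ghcid": "NL-NH-AMS-M-RM"}]

linked = find_change_event(ghcid_history[0], change_history)
```

Because the reference is looked up rather than stored, `ghcid_history` can stay unchanged, which fits the goal of keeping it simple.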

EDM Aggregator/Provider Pattern

(Awaiting subagent analysis of EDM conversations)

Expected patterns:

  • edm:ProvidedCHO (Cultural Heritage Object)
  • edm:WebResource (digital representation)
  • edm:Agent (provider organization)
  • ore:Aggregation (metadata aggregation)

Namespace Strategy

  1. Create our own namespace: https://w3id.org/heritage/custodian/
  2. Reuse existing properties via slot_uri mappings
  3. Define custom properties only when no suitable property exists

Prefix Registry

prefixes:
  heritage: https://w3id.org/heritage/custodian/
  tooi: https://identifier.overheid.nl/tooi/def/ont/
  cpov: http://data.europa.eu/m8g/
  org: http://www.w3.org/ns/org#
  prov: http://www.w3.org/ns/prov#
  schema: http://schema.org/
  rico: https://www.ica.org/standards/RiC/ontology#
  edm: http://www.europeana.eu/schemas/edm/
  ore: http://www.openarchives.org/ore/terms/
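
As a quick check that the registry resolves the CURIEs used throughout this document, the prefix map can be applied in a few lines of Python. The helper `expand_curie` is a hypothetical name, and real tooling (for example the LinkML generators) handles this expansion itself.

```python
# Prefix map copied from the registry above.
PREFIXES = {
    "heritage": "https://w3id.org/heritage/custodian/",
    "tooi": "https://identifier.overheid.nl/tooi/def/ont/",
    "cpov": "http://data.europa.eu/m8g/",
    "org": "http://www.w3.org/ns/org#",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
}


def expand_curie(curie: str) -> str:
    """Expand 'prefix:local' to a full IRI; raise KeyError for unknown prefixes."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local_name
```

For example, `expand_curie("prov:generatedAtTime")` yields `http://www.w3.org/ns/prov#generatedAtTime`.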

Validation Strategy

SHACL Constraints

TOOI uses SHACL extensively for validation. We should:

  1. Generate SHACL from LinkML using gen-shacl
  2. Define custom constraints for:
    • GHCID format validation
    • Date consistency (founded_date < closed_date)
    • Identifier uniqueness
    • Geographic coordinate validation
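
Some of these constraints are easy to prototype in plain Python before they are written as SHACL. The sketch below is a set of hypothetical helpers, not part of any existing validator; it assumes missing dates are acceptable and that coordinates are WGS 84 decimal degrees.

```python
from datetime import date
from typing import Iterable, Optional, Set


def dates_consistent(founded_date: Optional[date], closed_date: Optional[date]) -> bool:
    """Check founded_date < closed_date; a missing date is not an error."""
    if founded_date is None or closed_date is None:
        return True
    return founded_date < closed_date


def coordinates_valid(latitude: float, longitude: float) -> bool:
    """Check that a latitude/longitude pair lies within valid ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


def duplicate_identifiers(identifiers: Iterable[str]) -> Set[str]:
    """Return every identifier that occurs more than once."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for identifier in identifiers:
        if identifier in seen:
            duplicates.add(identifier)
        seen.add(identifier)
    return duplicates
```

Uniqueness is checked across a whole dataset rather than within a single record, which is why it is harder to express as a per-node SHACL shape than the other two checks.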

Example SHACL (conceptual)

heritage:HeritageCustodianShape
  a sh:NodeShape ;
  sh:targetClass heritage:HeritageCustodian ;
  sh:property [
    sh:path heritage:ghcid_current ;
    sh:pattern "^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]{1,10}(-Q[0-9]+)?$" ;
  ] ;
  sh:property [
    sh:path prov:generatedAtTime ;
    sh:maxCount 1 ;
    sh:datatype xsd:dateTime ;
  ] .
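
The `sh:pattern` above can be tried out against sample identifiers with Python's `re` module. The function name `is_valid_ghcid` is hypothetical, and the `-Q...` suffix in the usage example is only an illustrative value.

```python
import re

# Same expression as the sh:pattern in the conceptual shape.
GHCID_PATTERN = re.compile(
    r"^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]{1,10}(-Q[0-9]+)?$"
)


def is_valid_ghcid(value: str) -> bool:
    """Return True if the whole string matches the GHCID pattern."""
    return GHCID_PATTERN.fullmatch(value) is not None
```

With this pattern, `is_valid_ghcid("NL-NH-AMS-M-RM")` and `is_valid_ghcid("NL-NH-AMS-M-RM-Q190804")` both hold, while a lowercase identifier or a city code longer than three letters is rejected.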

Implementation Roadmap

Phase 1: Core Extensions (Current Priority)

  • Add ChangeEvent class
  • Add PROV-O temporal properties to HeritageCustodian
  • Add ChangeTypeEnum
  • Update class_uri and slot_uri mappings
  • Create example instances

Phase 2: TOOI Integration

  • Add OrganizationalUnit class
  • Add Dutch-specific TOOI properties
  • Implement TOOI name variants (official, preferred, sorting)
  • Add legal basis tracking

Phase 3: EDM Integration

  • Add aggregator/provider relationship model
  • Add collection digitization tracking
  • Add EDM-specific metadata

Phase 4: Validation

  • Generate SHACL constraints from LinkML
  • Implement custom validators
  • Create validation test suite

Open Questions

  1. Class hierarchy: Should HeritageCustodian use is_a or mixins for multiple ontology mappings?
    • Recommendation: Use mixins to avoid diamond inheritance issues. LinkML mixins must be classes defined in the schema, so external ontology classes are attached through class_uri and mapping slots instead.
  2. Temporal model: Dual tracking (founded_date vs prov:generatedAtTime)?
    • Recommendation: Keep both - founded_date for simple queries, PROV-O for semantic interoperability
  3. Change events: Link to ghcid_history or keep separate?
    • Recommendation: Keep separate but allow optional cross-references
  4. Dutch-specific fields: In base class or subclass?
    • Current approach: DutchHeritageCustodian subclass

Next Steps:

  1. Wait for subagent analysis of ontology conversations
  2. Refine design based on subagent findings
  3. Implement heritage_custodian_extended.yaml
  4. Create example instances
  5. Validate with LinkML tools