# Ontology Mapping Rules for Heritage Custodian Project

**Version**: 1.0

**Last Updated**: 2025-11-20

**Purpose**: Define rigorous ontological mapping procedures for AI agents working on the GLAM heritage custodian data project

---

## Core Principle: Ontology-First Design

**CRITICAL**: The primary objective of this project is to create a **comprehensive, nuanced ontology** that can accurately represent the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.

### What This Means

- ✅ **DO**: Study ontology files deeply before creating classes or properties
- ✅ **DO**: Map Wikidata entities to formal ontology classes with explicit rationale
- ✅ **DO**: Model temporal independence of different aspects (place, custodian, legal form, collections, people)
- ✅ **DO**: Support multiple ontology classes for the same entity (CPOV + TOOI + Schema.org + CIDOC-CRM)
- ❌ **DON'T**: Use Wikidata Q-numbers directly as ontology classes
- ❌ **DON'T**: Create generic "HeritageCustodian" mappings without considering semantic aspects
- ❌ **DON'T**: Ignore temporal dimensions (everything changes over time!)

---

## Heritage-First Framing Principle

**CRITICAL FRAMING**: This project exclusively focuses on entities with **heritage significance**. All Wikidata entities in our taxonomy are evaluated through a heritage lens.
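
The DON'T about Wikidata Q-numbers under the Core Principle lends itself to a mechanical check. The following is a minimal sketch, not part of the project tooling: the function name `find_wikidata_class_uris` and the in-memory `schema` dictionary are hypothetical, and the `class_uri` field name is assumed from the YAML examples in this document.

```python
import re

# Matches a Wikidata Q-number written as a CURIE (wd:Q123) or as a full URI.
# Assumption: class definitions carry their mapping in a `class_uri` field.
WIKIDATA_ID = re.compile(
    r"^(wd:|https?://www\.wikidata\.org/(entity|wiki)/)Q\d+$"
)


def find_wikidata_class_uris(schema: dict) -> list[str]:
    """Return the names of classes whose class_uri is a Wikidata Q-number."""
    offenders = []
    for name, definition in schema.items():
        # A class may be declared with no body at all (YAML null).
        uri = (definition or {}).get("class_uri") or ""
        if WIKIDATA_ID.match(uri):
            offenders.append(name)
    return offenders


# Hypothetical schema fragment, already loaded into a dict.
schema = {
    "HeritageCustodian": {"class_uri": "wd:Q1802963"},  # violates the rule
    "Mansion": {"class_uri": "crm:E27_Site"},           # formal ontology class
}

print(find_wikidata_class_uris(schema))
```

A check like this only catches the most literal violation. It cannot tell whether the chosen formal class is semantically appropriate, which still requires reading the ontology files.
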
### Heritage Significance Default

When mapping Wikidata entities to ontology classes:

- ✅ **ALWAYS assume heritage significance** - We only include entities that are or could become heritage custodians
- ✅ **ALWAYS use heritage-focused ontology classes** - Prefer crm:E27_Site over generic schema:Place, prefer schema:LandmarksOrHistoricalBuildings over schema:Building
- ✅ **ALWAYS model place aspect for physical sites** - Buildings, monuments, landscapes in our taxonomy have heritage value
- ❌ **DON'T use generic real estate classes** - schema:Accommodation, schema:Residence are TOO GENERIC for our heritage focus
- ❌ **DON'T require "proof of heritage status"** - If an entity type is in our Wikidata extraction, it has heritage potential

### Examples

**Vacation Properties (Q3694)**

- ❌ WRONG: "Use schema:Accommodation as primary class because most vacation properties are commercial rentals"
- ✅ CORRECT: "Use crm:E27_Site as primary class because vacation properties in our taxonomy are HISTORIC VACATION PROPERTIES (royal summer palaces, historic villas) with documented heritage significance"

**Mansions (Q1802963)**

- ❌ WRONG: "Use schema:Residence because mansions are residential buildings"
- ✅ CORRECT: "Use crm:E27_Site + schema:LandmarksOrHistoricalBuildings because mansions in our taxonomy are HERITAGE BUILDINGS with architectural significance"

**Buitenplaatsen (Q2927789)**

- ❌ WRONG: "Use schema:House because buitenplaatsen are country houses"
- ✅ CORRECT: "Use crm:E27_Site because buitenplaatsen are HISTORIC ESTATES, many with Rijksmonument status and heritage protection"

### Ontology Selection Decision Tree for Physical Sites

```
Is the entity a physical place/building/site?
    ↓ YES
Is it in our GLAMORCUBESFIXPHDNT taxonomy?
    ↓ YES
THEN it has heritage significance
    ↓
PRIMARY CLASS: crm:E27_Site (CIDOC-CRM heritage site)
SECONDARY CLASS: schema:LandmarksOrHistoricalBuildings (Schema.org)
TERTIARY CLASS: dbo:HistoricPlace OR dbo:HistoricBuilding (DBpedia)
    ↓
NEVER USE: schema:Accommodation, schema:Residence, schema:Building (too generic)
```

### Rationale

1. **Taxonomy Scope**: Our Wikidata extraction targets GLAM entities - by definition, these have heritage significance
2. **Project Mission**: We model heritage custodians, not generic real estate
3. **Ontology Precision**: Heritage-specific classes (crm:E27_Site) provide richer semantics than generic classes
4. **Data Quality**: Using heritage classes signals to consumers that these are culturally significant entities
5. **Interoperability**: CIDOC-CRM is the STANDARD for cultural heritage - we must use it for heritage sites

---

## Rule 1: Ontology Files Are Source of Truth

**All ontology design MUST reference base ontologies in `/data/ontology/`.**

### Available Ontologies

| Ontology | File | Scope | When to Use |
|----------|------|-------|-------------|
| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector | Government archives, state museums, public cultural institutions |
| **TOOI** | `tooiont.ttl` | Dutch government | Netherlands government heritage organizations |
| **Schema.org** | `schemaorg.owl` | Web semantics | Private collections, web discoverability, general fallback |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain | Museums, sites, curated holdings, provenance |
| **RiC-O** | `RiC-O_1-1.rdf` | Archival description | Archives, record sets, corporate bodies |
| **BIBFRAME** | `bibframe_vocabulary.rdf` | Bibliographic resources | Libraries, bibliographic collections |
| **PiCo** | `pico.ttl` | Person observations | Staff, curators, archivists, directors |
| **W3C Org** | (embedded in CPOV) | Organizational structure | Legal forms, organizational units |

### Mandatory Ontology Consultation Workflow

**Before designing any LinkML class, agents MUST:**

1. **Identify the semantic domain** (cultural, archival, educational, legal, etc.)
2. **Read relevant ontology files** using `read` or `grep` tools
3. **Extract applicable classes and properties**
4. **Document ontology alignment** in design notes
5. **Map Wikidata hypernyms to ontology classes** (not vice versa!)

**Example Workflow**:

```bash
# Step 1: Identify domain
# Entity: "mansion" (building + potential heritage custodian)

# Step 2: Search CIDOC-CRM for site/building classes
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf

# Step 3: Search Schema.org for building types
rg "LandmarksOrHistoricalBuildings|TouristAttraction" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Step 4: Search CPOV for organization classes (if mansion operates as museum)
rg "PublicOrganisation|classification" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Step 5: Document findings in design notes
# "Mansion should map to crm:E27_Site (place aspect) AND
# cpov:PublicOrganisation (custodian aspect if operates as museum)"
```

---

## Rule 2: Never Use Wikidata Entities Directly

**Wikidata Q-numbers are NOT ontology classes. They are ENTITY IDENTIFIERS.**

### Incorrect Approach ❌

```yaml
# BAD - Wikidata Q-number used as class
HeritageCustodian:
  class_uri: wd:Q1802963  # ← This is a Wikidata ENTITY (mansion), not an ontology CLASS!
```

### Correct Approach ✅

```yaml
# GOOD - Wikidata entity mapped to formal ontology classes
Mansion:
  description: >-
    Large residential building, often with heritage significance.
    Wikidata reference: Q1802963

  # Place aspect
  place_class_uri: crm:E27_Site
  place_secondary_uri: schema:LandmarksOrHistoricalBuildings

  # Custodian aspect (if operates as heritage institution)
  custodian_class_uri: cpov:PublicOrganisation  # If public
  custodian_alt_uri: schema:Museum  # If private

  # Collections aspect
  collections_class_uri: crm:E78_Curated_Holding
```

### Wikidata Hypernym Files Purpose

The files `/schemas/hyponyms_curated.yaml` and `/schemas/hyponyms_curated_full.yaml` are:

- ✅ **Source data** for identifying heritage entity TYPES
- ✅ **Analysis input** for understanding domain taxonomy
- ✅ **Reference** for multilingual labels and descriptions
- ❌ **NOT** direct ontology class definitions

**Required Mapping Workflow**:

```
hyponyms_curated.yaml (Wikidata entities)
    ↓
ANALYZE semantic properties
    ↓
SEARCH base ontologies for appropriate classes
    ↓
MAP Wikidata entity to ontology class(es)
    ↓
DOCUMENT rationale and properties
    ↓
CREATE LinkML schema with ontology class_uri
```

---

## Rule 3: Multi-Aspect Modeling is Mandatory

**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**

### Required Aspects

All heritage custodian entities MUST model these aspects:

1. **Place Aspect** (physical location/site)
   - Ontology: CIDOC-CRM (E27_Site, E53_Place) + Schema.org (Place)
   - Temporal: Construction → Demolition/Present
   - Properties: Address, coordinates, building type, heritage designation

2. **Custodian Aspect** (organization managing heritage)
   - Ontology: CPOV (public) OR Schema.org (private) + CIDOC-CRM (E39_Actor)
   - Temporal: Founding → Dissolution/Present
   - Properties: Legal identifiers, organizational structure, mission

3. **Legal Form Aspect** (legal entity registration)
   - Ontology: W3C Org (FormalOrganization) + TOOI (Dutch)
   - Temporal: Registration → Deregistration/Present
   - Properties: KvK number, legal classification, registered address

4. **Collections Aspect** (heritage materials preserved)
   - Ontology: RiC-O (archival) OR CIDOC-CRM (museum) OR BIBFRAME (library)
   - Temporal: Accession → Deaccession (per item/collection)
   - Properties: Provenance, extent, access restrictions

5. **People Aspect** (staff/curators)
   - Ontology: PiCo (PersonObservation) + CIDOC-CRM (E21_Person)
   - Temporal: Employment start → Employment end (per person)
   - Properties: Roles, activities, employment records

6. **Temporal Events** (organizational changes)
   - Ontology: CIDOC-CRM (E10_Transfer_of_Custody, E8_Acquisition) + RiC-O (Event)
   - Properties: Custody transfers, mergers, relocations, transformations

### Example: Modeling a Historic Mansion Operating as Museum

```yaml
# Entity: Villa Mondriaan (Winterswijk, Netherlands)

# PLACE ASPECT
villa_mondriaan_place:
  aspect_type: place
  class_uri: crm:E27_Site
  secondary_class_uri: schema:LandmarksOrHistoricalBuildings
  temporal_extent:
    construction_date: "1880-01-01"
    current_status: standing
  properties:
    address: "Zonnebrink 4, 7101 NP Winterswijk"
    coordinates: [51.9711, 6.7197]
    heritage_designation: "Rijksmonument"

# CUSTODIAN ASPECT
stichting_villa_mondriaan:
  aspect_type: custodian
  class_uri: cpov:PublicOrganisation  # Dutch foundation with public benefit
  secondary_class_uri: schema:Museum
  temporal_extent:
    founding_date: "1994-05-12"
    current_status: active
  properties:
    legal_name: "Stichting Villa Mondriaan"
    isil_code: "NL-WtVM"
    manages: [villa_mondriaan_collections]

# LEGAL FORM ASPECT
stichting_legal_entity:
  aspect_type: legal_form
  class_uri: org:FormalOrganization
  mixin_class_uri: tooi:Overheidsorganisatie  # Dutch government org
  temporal_extent:
    registration_date: "1994-05-12"
    current_status: registered
  properties:
    kvk_number: "12345678"
    legal_form: "stichting"  # Dutch foundation

# COLLECTIONS ASPECT
villa_mondriaan_collections:
  aspect_type: collections
  class_uri: crm:E78_Curated_Holding
  archival_class_uri: rico:RecordSet
  temporal_extent:
    accession_start: "1994-01-01"
    current_status: growing
  properties:
    provenance: "Mondriaan family"
    extent: "500 objects, 200 archival documents"

# PEOPLE ASPECT
curator_maria_van_der_berg:
  aspect_type: person
  class_uri: pico:PersonObservation
  secondary_class_uri: crm:E21_Person
  temporal_extent:
    employment_start: "2020-01-01"
    current_status: employed
  properties:
    role: picot_roles:curator
    works_for: stichting_villa_mondriaan
```

---

## Rule 4: Temporal Independence Documentation

**All aspects have SEPARATE temporal lifecycles. Document this explicitly.**

### Required Temporal Properties

Every aspect MUST include:

```yaml
temporal_extent:
  start_date: "YYYY-MM-DD"  # When this aspect began
  end_date: null            # "YYYY-MM-DD" when the aspect ended (null = ongoing)
  certainty: certain        # One of: certain | approximate | inferred
  source: archival_record   # e.g. archival_record | legal_registration | oral_history | etc.
```

### Example: Temporal Independence in Custody Transfer

```yaml
# Heineken corporate archive custody transfer (2005)

# BEFORE TRANSFER (1864-2005)
heineken_corporate_archive:
  custodian_aspect:
    custodian_id: heineken_nv
    class_uri: schema:Corporation
    temporal_extent:
      start_date: "1864-01-01"  # Heineken founded
      end_date: "2005-06-15"  # Custody transferred

  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."
    temporal_extent:
      start_date: "1864-01-01"
      end_date: null  # Collection still exists (just moved)

# AFTER TRANSFER (2005-present)
heineken_archive_at_stadsarchief:
  custodian_aspect:
    custodian_id: stadsarchief_amsterdam
    class_uri: cpov:PublicOrganisation
    temporal_extent:
      start_date: "2005-06-15"  # Custody received
      end_date: null  # Ongoing

  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."  # ← Provenance unchanged!
    temporal_extent:
      start_date: "1864-01-01"  # ← Collection dates unchanged!
      end_date: null

# CUSTODY TRANSFER EVENT
custody_transfer_event:
  event_type: crm:E10_Transfer_of_Custody
  class_uri: rico:Event
  temporal_extent:
    event_date: "2005-06-15"
  properties:
    surrendered_by: heineken_nv
    received_by: stadsarchief_amsterdam
    transferred_object: heineken_corporate_archive
```

---

## Rule 5: Ontology Properties Must Be Researched

**Never invent custom properties when ontology equivalents exist.**

### Property Research Workflow

1. **Identify the relationship** you need to express
2. **Search base ontologies** for existing properties
3. **Use ontology property** with proper namespace
4. **Document property source** in comments

**Example**:

```yaml
# ❌ WRONG - Custom property invented
institution:
  official_name: "Rijksarchief in Noord-Holland"

# ✅ CORRECT - CPOV ontology property used
institution:
  skos:prefLabel: "Rijksarchief in Noord-Holland"  # Language tag: nl
  # Source: CPOV uses SKOS for preferred labels
```

### Common Property Mappings

| Need | Ontology Property | Namespace |
|------|-------------------|-----------|
| Preferred name | `skos:prefLabel` | SKOS (used by CPOV) |
| Alternative names | `skos:altLabel` | SKOS |
| Identifiers | `dct:identifier` | Dublin Core Terms |
| Address | `locn:address` | W3C Location Core |
| Coordinates | `schema:geo` | Schema.org |
| Founding date | `schema:foundingDate` OR `tooi:begindatum` | Schema.org / TOOI |
| Organizational unit | `cpov:hasUnit` OR `org:hasUnit` | CPOV / W3C Org |
| Curated collection | `crm:P147_curated` | CIDOC-CRM |
| Archival holdings | `rico:isOrWasHolderOf` | RiC-O |
| Person role | `pico:hasRole` | PiCo |
| Provenance | `rico:hasProvenance` OR `prov:hadPrimarySource` | RiC-O / PROV-O |

---

## Rule 6: Decision Trees for Ontology Selection

**Use structured decision trees to select appropriate ontologies.**

### Decision Tree: Primary Ontology Class

```
START: Heritage entity identified
    ↓
Is it a physical place/site?
├─ YES → PRIMARY: crm:E27_Site + schema:Place
│        Continue to check if also a custodian organization ↓
│
└─ NO → Is it an organization?
        ├─ YES → Is it public sector?
        │        ├─ YES → cpov:PublicOrganisation
        │        │        Is it Dutch government?
        │        │        ├─ YES → ADD MIXIN: tooi:Overheidsorganisatie
        │        │        └─ NO → CPOV only
        │        │
        │        └─ NO → schema:Organization
        │                 What type?
        │                 ├─ Museum → schema:Museum
        │                 ├─ Library → schema:Library
        │                 ├─ Archive → schema:ArchiveOrganization
        │                 ├─ Education → schema:EducationalOrganization
        │                 └─ NGO → schema:NGO
        │
        └─ NO → Is it a collection?
                ├─ Archival → rico:RecordSet
                ├─ Museum → crm:E78_Curated_Holding
                ├─ Library → bf:Collection
                └─ Mixed → Use multiple classes
```

### Decision Tree: Dutch vs. EU vs. Global

```
START: Determine geographic/legal scope
    ↓
Country == "Netherlands"?
├─ YES → Legal status == "public"?
│        ├─ YES → USE: tooi:Overheidsorganisatie (Dutch government)
│        │        ALSO ADD: cpov:PublicOrganisation (EU compliance)
│        │
│        └─ NO → USE: schema:Organization (private)
│                 ADD: DutchLegalEntityMixin (KvK numbers)
│
└─ NO → In Europe?
        ├─ YES → Legal status == "public"?
        │        ├─ YES → USE: cpov:PublicOrganisation
        │        └─ NO → USE: schema:Organization
        │
        └─ NO → USE: schema:Organization (global)
                 ADD domain-specific class:
                 - schema:Museum
                 - schema:ArchiveOrganization
                 - schema:Library
```

---

## Rule 7: Documentation Requirements

**All ontology mappings MUST be documented with rationale.**

### Required Documentation Fields

```yaml
ontology_mapping:
  wikidata_source: Q1802963  # Wikidata entity being mapped
  wikidata_label: mansion

  primary_class:
    uri: crm:E27_Site
    namespace: http://www.cidoc-crm.org/cidoc-crm/
    rationale: >-
      CIDOC-CRM E27_Site for physical heritage buildings with
      archaeological/architectural significance.
    ontology_file: data/ontology/CIDOC_CRM_v7.1.3.rdf
    ontology_section: "Lines 1234-1267"  # Optional

  secondary_class:
    uri: schema:LandmarksOrHistoricalBuildings
    namespace: http://schema.org/
    rationale: Web discoverability for historic landmarks
    ontology_file: data/ontology/schemaorg.owl

  properties:
    - uri: crm:P1_is_identified_by
      range: crm:E41_Appellation
      usage: Building name identification
      example: "Buitenplaats Beeckestijn"

    - uri: schema:geo
      range: schema:GeoCoordinates
      usage: Geographic coordinates
      example: "{latitude: 51.9711, longitude: 6.7197}"

  temporal_model:
    aspects:
      - place  # Physical site
      - custodian  # If operates as heritage institution
      - collections  # If holds curated materials
    temporal_independence_note: >-
      Place existence (construction → present) is independent from
      custodian organization lifecycle (founding → present).

  complexity_score: 9  # 1-10 scale
  reviewed_by: human_expert
  review_date: "2025-11-20"
```

---

## Rule 8: Prohibited Practices

**The following practices are STRICTLY FORBIDDEN:**

### ❌ Prohibited

1. **Using Wikidata Q-numbers as class URIs**

   ```yaml
   # FORBIDDEN
   class_uri: wd:Q33506  # This is an entity, not a class!
   ```

2. **Creating custom properties without ontology research**

   ```yaml
   # FORBIDDEN
   slots:
     institution_official_name:  # Use skos:prefLabel instead!
   ```

3. **Single-ontology mappings for complex entities**

   ```yaml
   # FORBIDDEN - Mansion is BOTH place AND potential custodian
   Mansion:
     class_uri: schema:Place  # ← Missing custodian aspect!
   ```

4. **Ignoring temporal dimensions**

   ```yaml
   # FORBIDDEN - No temporal tracking
   custodian:
     name: "Heineken Archive"
     location: "Amsterdam"
     # ← Where are the dates? Which period does this describe?
   ```

5. **Binary public/private classifications**

   ```yaml
   # FORBIDDEN - Too simplistic
   PublicHeritageCustodian:  # What about NGOs? Foundations? Mixed?
   PrivateHeritageCustodian:  # What about government corporations?
   ```

---

## Rule 9: Quality Assurance Checklist

**Before submitting any ontology design, verify:**

- [ ] All base ontologies consulted (`/data/ontology/` files read)
- [ ] Wikidata entities mapped to formal ontology classes (not used directly)
- [ ] Multi-aspect modeling applied (place, custodian, legal, collections, people)
- [ ] Temporal independence documented for each aspect
- [ ] Properties sourced from ontologies (not custom inventions)
- [ ] Decision trees applied for ontology selection
- [ ] Rationale documented for all class/property choices
- [ ] Examples provided with real-world entities
- [ ] Complexity score assigned (1-10 scale)
- [ ] Human review requested for complexity ≥ 7

---

## Rule 10: Agent Collaboration Protocol

**When working with other agents or humans:**

1. **Always cite ontology files** in design discussions
   - "According to CIDOC-CRM (lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf)..."

2. **Share ontology search commands** for reproducibility

   ```bash
   rg "E27_Site" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
   ```

3. **Document disagreements** with explicit rationale
   - "Agent A suggests schema:Museum, but I recommend cpov:PublicOrganisation because institution is government-operated (see TOOI classification rules)."

4. **Request human review** for:
   - Complexity score ≥ 7
   - Conflicting ontology recommendations
   - Temporal modeling ambiguities
   - Novel aspect combinations

---

## Example: Complete Ontology Mapping Workflow

**Scenario**: Map Wikidata Q3437789 (heemkamer - local history room)

### Step 1: Research Entity

```bash
# Read Wikidata metadata from hyponyms_curated_full.yaml
grep -A 100 "Q3437789" /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml
```

**Findings**:

- Dutch concept: "Local history room/museum"
- Usually operated by volunteers/heritage societies
- Mix of museum, archive, library functions
- Often in small municipalities

### Step 2: Search Base Ontologies

```bash
# Search CPOV for organizational types
rg "classification|OrganisationType" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Search Schema.org for community organizations
rg "NGO|CivicStructure|LocalBusiness" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Search CIDOC-CRM for community groups
rg "E74_Group|E40_Legal_Body" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```

### Step 3: Apply Decision Trees

**Geographic scope**: Netherlands → Check TOOI

**Legal status**: Usually private foundation (stichting) or association (vereniging)

**Function**: Collects + Preserves + Exhibits local heritage → Multi-functional

**Decision**:

- PRIMARY: `schema:NGO` (non-governmental heritage organization)
- SECONDARY: `crm:E74_Group` (community heritage group)
- DUTCH MIXIN: `DutchLegalEntityMixin` (KvK registration)

### Step 4: Model Aspects

```yaml
heemkamer:
  wikidata_id: Q3437789

  ontology_mapping:
    # CUSTODIAN ASPECT
    custodian_class: schema:NGO
    custodian_secondary: crm:E74_Group
    rationale: >-
      Non-governmental community heritage organization.
      Not public sector (excludes CPOV). Uses Schema.org NGO.

    # PLACE ASPECT (often operates in specific building)
    place_class: schema:CivicStructure
    place_secondary: crm:E27_Site

    # LEGAL FORM ASPECT (Dutch foundation/association)
    legal_class: org:FormalOrganization
    legal_dutch_mixin: DutchLegalEntityMixin
    properties:
      kvk_number: required
      legal_form: "stichting OR vereniging"

    # COLLECTIONS ASPECT (multi-functional)
    collections_classes:
      - rico:RecordSet  # Local archival materials
      - crm:E78_Curated_Holding  # Museum objects
      - bf:Collection  # Local history books

    # PEOPLE ASPECT (volunteers)
    people_class: pico:PersonObservation
    people_roles:
      - picot_roles:curator
      - picot_roles:volunteer_archivist
      - picot_roles:educator

  temporal_model:
    aspects:
      - custodian  # Founding → present/closure
      - place  # Building occupancy (may change)
      - collections  # Accessions over time
      - people  # Volunteer participation periods
```

### Step 5: Document and Review

```yaml
ontology_enrichment:
  complexity_score: 8  # Multi-functional, temporal complexity
  requires_human_review: true
  review_notes: >-
    Heemkamer concept is Dutch-specific with no direct international equivalent.
    Multi-functional nature (museum + archive + library) requires careful
    aspect modeling.
```

---

## Summary: Key Takeaways for Agents

1. **Ontology files are your bible** - Read them first, always
2. **Wikidata is data, not ontology** - Map Q-numbers to formal classes
3. **Everything has multiple aspects** - Place, custodian, legal, collections, people
4. **Time is always a factor** - Model temporal independence
5. **Properties must be justified** - Use ontology properties, document rationale
6. **Complexity is reality** - Don't oversimplify, embrace nuance
7. **Document everything** - Future agents/humans need your reasoning
8. **Ask for help** - Complex cases require human review

**When in doubt**: Read the ontology files, consult AGENTS.md, request human guidance.

---

**End of Ontology Mapping Rules v1.0**