Ontology Mapping Rules for Heritage Custodian Project
Version: 1.0
Last Updated: 2025-11-20
Purpose: Define rigorous ontological mapping procedures for AI agents working on the GLAM heritage custodian data project
Core Principle: Ontology-First Design
CRITICAL: The primary objective of this project is to create a comprehensive, nuanced ontology that can accurately represent the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.
What This Means
- ✅ DO: Study ontology files deeply before creating classes or properties
- ✅ DO: Map Wikidata entities to formal ontology classes with explicit rationale
- ✅ DO: Model temporal independence of different aspects (place, custodian, legal form, collections, people)
- ✅ DO: Support multiple ontology classes for the same entity (CPOV + TOOI + Schema.org + CIDOC-CRM)
- ❌ DON'T: Use Wikidata Q-numbers directly as ontology classes
- ❌ DON'T: Create generic "HeritageCustodian" mappings without considering semantic aspects
- ❌ DON'T: Ignore temporal dimensions (everything changes over time!)
Heritage-First Framing Principle
CRITICAL FRAMING: This project exclusively focuses on entities with heritage significance. All Wikidata entities in our taxonomy are evaluated through a heritage lens.
Heritage Significance Default
When mapping Wikidata entities to ontology classes:
- ✅ ALWAYS assume heritage significance - We only include entities that are or could become heritage custodians
- ✅ ALWAYS use heritage-focused ontology classes - Prefer crm:E27_Site over generic schema:Place, prefer schema:LandmarksOrHistoricalBuildings over schema:Building
- ✅ ALWAYS model place aspect for physical sites - Buildings, monuments, landscapes in our taxonomy have heritage value
- ❌ DON'T use generic real estate classes - schema:Accommodation, schema:Residence are TOO GENERIC for our heritage focus
- ❌ DON'T require "proof of heritage status" - If an entity type is in our Wikidata extraction, it has heritage potential
Examples
Vacation Properties (Q3694)
- ❌ WRONG: "Use schema:Accommodation as primary class because most vacation properties are commercial rentals"
- ✅ CORRECT: "Use crm:E27_Site as primary class because vacation properties in our taxonomy are HISTORIC VACATION PROPERTIES (royal summer palaces, historic villas) with documented heritage significance"
Mansions (Q1802963)
- ❌ WRONG: "Use schema:Residence because mansions are residential buildings"
- ✅ CORRECT: "Use crm:E27_Site + schema:LandmarksOrHistoricalBuildings because mansions in our taxonomy are HERITAGE BUILDINGS with architectural significance"
Buitenplaatsen (Q2927789)
- ❌ WRONG: "Use schema:House because buitenplaatsen are country houses"
- ✅ CORRECT: "Use crm:E27_Site because buitenplaatsen are HISTORIC ESTATES, many with Rijksmonument status and heritage protection"
Ontology Selection Decision Tree for Physical Sites
```
Is the entity a physical place/building/site?
    ↓ YES
Is it in our GLAMORCUBESFIXPHDNT taxonomy?
    ↓ YES
THEN it has heritage significance
    ↓
PRIMARY CLASS:   crm:E27_Site (CIDOC-CRM heritage site)
SECONDARY CLASS: schema:LandmarksOrHistoricalBuildings (Schema.org)
TERTIARY CLASS:  dbo:HistoricPlace OR dbo:HistoricBuilding (DBpedia)
    ↓
NEVER USE: schema:Accommodation, schema:Residence, schema:Building (too generic)
```
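The tree above is small enough to encode as a check that agents can run before submitting a mapping. The following is a minimal Python sketch, assuming classes are recorded as CURIE strings. The names `site_classes`, `forbidden_classes` and `FORBIDDEN_SITE_CLASSES` are hypothetical and not existing project code. The tree does not say when to pick `dbo:HistoricBuilding` over `dbo:HistoricPlace`, so the building/non-building split below is also an assumption.

```python
# Hypothetical helper encoding the physical-site decision tree.
FORBIDDEN_SITE_CLASSES = {
    "schema:Accommodation",
    "schema:Residence",
    "schema:Building",
}


def site_classes(is_physical_site: bool, in_taxonomy: bool,
                 is_building: bool = True) -> list[str]:
    """Return primary, secondary and tertiary classes for a heritage site."""
    if not (is_physical_site and in_taxonomy):
        raise ValueError("entity is not a physical site in the taxonomy")
    # Assumption: the DBpedia choice depends on whether the site is a building.
    tertiary = "dbo:HistoricBuilding" if is_building else "dbo:HistoricPlace"
    return [
        "crm:E27_Site",                           # PRIMARY (CIDOC-CRM)
        "schema:LandmarksOrHistoricalBuildings",  # SECONDARY (Schema.org)
        tertiary,                                 # TERTIARY (DBpedia)
    ]


def forbidden_classes(class_uris: list[str]) -> list[str]:
    """Return the generic real-estate classes the tree says NEVER to use."""
    return [uri for uri in class_uris if uri in FORBIDDEN_SITE_CLASSES]


print(site_classes(True, True))
print(forbidden_classes(["crm:E27_Site", "schema:Residence"]))
```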
Rationale
- Taxonomy Scope: Our Wikidata extraction targets GLAM entities - by definition, these have heritage significance
- Project Mission: We model heritage custodians, not generic real estate
- Ontology Precision: Heritage-specific classes (crm:E27_Site) provide richer semantics than generic classes
- Data Quality: Using heritage classes signals to consumers that these are culturally significant entities
- Interoperability: CIDOC-CRM is the STANDARD for cultural heritage - we must use it for heritage sites
Rule 1: Ontology Files Are Source of Truth
All ontology design MUST reference base ontologies in /data/ontology/.
Available Ontologies
| Ontology | File | Scope | When to Use |
|---|---|---|---|
| CPOV | core-public-organisation-ap.ttl | EU public sector | Government archives, state museums, public cultural institutions |
| TOOI | tooiont.ttl | Dutch government | Netherlands government heritage organizations |
| Schema.org | schemaorg.owl | Web semantics | Private collections, web discoverability, general fallback |
| CIDOC-CRM | CIDOC_CRM_v7.1.3.rdf | Cultural heritage domain | Museums, sites, curated holdings, provenance |
| RiC-O | RiC-O_1-1.rdf | Archival description | Archives, record sets, corporate bodies |
| BIBFRAME | bibframe_vocabulary.rdf | Bibliographic resources | Libraries, bibliographic collections |
| PiCo | pico.ttl | Person observations | Staff, curators, archivists, directors |
| W3C Org | (embedded in CPOV) | Organizational structure | Legal forms, organizational units |
Mandatory Ontology Consultation Workflow
Before designing any LinkML class, agents MUST:
- Identify the semantic domain (cultural, archival, educational, legal, etc.)
- Read relevant ontology files using the read or grep tools
- Extract applicable classes and properties
- Document ontology alignment in design notes
- Map Wikidata hypernyms to ontology classes (not vice versa!)
Example Workflow:
```bash
# Step 1: Identify domain
# Entity: "mansion" (building + potential heritage custodian)

# Step 2: Search CIDOC-CRM for site/building classes
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf

# Step 3: Search Schema.org for building types
rg "LandmarksOrHistoricalBuildings|TouristAttraction" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Step 4: Search CPOV for organization classes (if mansion operates as museum)
rg "PublicOrganisation|classification" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Step 5: Document findings in design notes
# "Mansion should map to crm:E27_Site (place aspect) AND
#  cpov:PublicOrganisation (custodian aspect if operates as museum)"
```
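Design notes and Rule 10 both ask for line-number citations into the ontology files. It therefore helps to capture where each search hit occurs. The sketch below mimics `rg -n` in plain Python and assumes the ontology file can be read as UTF-8 text. `find_terms` and `search_ontology` are hypothetical helpers, and `rg` remains the tool named in this workflow. The `sample` lines are made up for illustration and are not copied from the real CIDOC-CRM file.

```python
import re
from pathlib import Path
from typing import Iterable


def find_terms(lines: Iterable[str], pattern: str) -> list[tuple[int, str]]:
    """Return (1-based line number, stripped line) for every matching line."""
    regex = re.compile(pattern)
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


def search_ontology(path: str, pattern: str) -> list[tuple[int, str]]:
    """Search an ontology file on disk, e.g. data/ontology/CIDOC_CRM_v7.1.3.rdf."""
    with Path(path).open(encoding="utf-8") as handle:
        return find_terms(handle, pattern)


# Illustrative input lines (not real file content).
sample = [
    '<rdfs:Class rdf:about="E27_Site">',
    '<rdfs:label xml:lang="en">Site</rdfs:label>',
    '<rdfs:Class rdf:about="E53_Place">',
]
print(find_terms(sample, r"E27_Site|E53_Place"))
```

The returned line numbers can be quoted directly, in the form "lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf", when citing a class definition.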
Rule 2: Never Use Wikidata Entities Directly
Wikidata Q-numbers are NOT ontology classes. They are ENTITY IDENTIFIERS.
Incorrect Approach ❌
```yaml
# BAD - Wikidata Q-number used as class
HeritageCustodian:
  class_uri: wd:Q1802963  # ← This is a Wikidata item identifier (mansion), not an ontology class!
```
Correct Approach ✅
```yaml
# GOOD - Wikidata entity mapped to formal ontology classes
Mansion:
  description: >-
    Large residential building, often with heritage significance.
    Wikidata reference: Q1802963

  # Place aspect
  place_class_uri: crm:E27_Site
  place_secondary_uri: schema:LandmarksOrHistoricalBuildings

  # Custodian aspect (if operates as heritage institution)
  custodian_class_uri: cpov:PublicOrganisation  # If public
  custodian_alt_uri: schema:Museum              # If private

  # Collections aspect
  collections_class_uri: crm:E78_Curated_Holding
```
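This rule can be enforced mechanically once a schema has been parsed. The following is a minimal sketch that assumes the YAML is already loaded into plain dicts with a YAML parser of your choice. It also assumes that every key ending in `_uri` holds a class URI, as in the examples above. `WIKIDATA_ITEM` and `invalid_class_uris` are hypothetical names.

```python
import re

# Wikidata items written as CURIEs (wd:Q1802963) or as full entity URIs.
WIKIDATA_ITEM = re.compile(r"^(?:wd:|https?://www\.wikidata\.org/entity/)Q\d+$")


def invalid_class_uris(classes: dict[str, dict]) -> list[tuple[str, str, str]]:
    """Return (class, key, value) for every *_uri key that holds a Wikidata item."""
    return [
        (name, key, value)
        for name, definition in classes.items()
        for key, value in definition.items()
        if key.endswith("_uri")
        and isinstance(value, str)
        and WIKIDATA_ITEM.match(value)
    ]


bad = {"HeritageCustodian": {"class_uri": "wd:Q1802963"}}
good = {"Mansion": {"place_class_uri": "crm:E27_Site",
                    "custodian_class_uri": "cpov:PublicOrganisation"}}
print(invalid_class_uris(bad))
print(invalid_class_uris(good))
```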
Wikidata Hypernym Files Purpose
The files /schemas/hyponyms_curated.yaml and /schemas/hyponyms_curated_full.yaml are:
- ✅ Source data for identifying heritage entity TYPES
- ✅ Analysis input for understanding domain taxonomy
- ✅ Reference for multilingual labels and descriptions
- ❌ NOT direct ontology class definitions
Required Mapping Workflow:
```
hyponyms_curated.yaml (Wikidata entities)
    ↓
ANALYZE semantic properties
    ↓
SEARCH base ontologies for appropriate classes
    ↓
MAP Wikidata entity to ontology class(es)
    ↓
DOCUMENT rationale and properties
    ↓
CREATE LinkML schema with ontology class_uri
```
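The last step of this workflow turns a documented mapping into a LinkML class. Below is a minimal sketch of that step, assuming the earlier steps produce a record like the hypothetical `OntologyMapping` dataclass. The Q-number only ever appears in the description, and never as `class_uri`. Two further choices are assumptions and not project conventions: recording secondary classes under LinkML's `related_mappings` slot, and deriving the class name by camel-casing the label. Serialisation to YAML is left to whichever YAML library the project uses.

```python
import json
from dataclasses import dataclass, field

WIKIDATA_PREFIXES = ("wd:", "http://www.wikidata.org/entity/")


@dataclass
class OntologyMapping:
    """Result of the ANALYZE/SEARCH/MAP/DOCUMENT steps (hypothetical record)."""
    wikidata_id: str   # kept only as a reference, e.g. "Q1802963"
    label: str         # Wikidata label, e.g. "mansion"
    class_uri: str     # formal ontology class, e.g. "crm:E27_Site"
    rationale: str
    secondary_uris: list[str] = field(default_factory=list)


def to_linkml_class(mapping: OntologyMapping) -> dict:
    """Build a LinkML class definition (as a dict) from a documented mapping."""
    if mapping.class_uri.startswith(WIKIDATA_PREFIXES):
        raise ValueError("class_uri must be an ontology class, not a Wikidata item")
    # Camel-case the label: "vacation property" -> "VacationProperty".
    name = "".join(part.capitalize() for part in mapping.label.split())
    definition = {
        "description": f"{mapping.rationale} Wikidata reference: {mapping.wikidata_id}",
        "class_uri": mapping.class_uri,
    }
    if mapping.secondary_uris:
        # Assumption: secondary classes are recorded as related_mappings.
        definition["related_mappings"] = list(mapping.secondary_uris)
    return {name: definition}


mansion = OntologyMapping(
    wikidata_id="Q1802963",
    label="mansion",
    class_uri="crm:E27_Site",
    rationale="Large residential building, often with heritage significance.",
    secondary_uris=["schema:LandmarksOrHistoricalBuildings"],
)
print(json.dumps({"classes": to_linkml_class(mansion)}, indent=2))
```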
Rule 3: Multi-Aspect Modeling is Mandatory
Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.
Required Aspects
All heritage custodian entities MUST model these aspects:
- Place Aspect (physical location/site)
  - Ontology: CIDOC-CRM (E27_Site, E53_Place) + Schema.org (Place)
  - Temporal: Construction → Demolition/Present
  - Properties: Address, coordinates, building type, heritage designation
- Custodian Aspect (organization managing heritage)
  - Ontology: CPOV (public) OR Schema.org (private) + CIDOC-CRM (E39_Actor)
  - Temporal: Founding → Dissolution/Present
  - Properties: Legal identifiers, organizational structure, mission
- Legal Form Aspect (legal entity registration)
  - Ontology: W3C Org (FormalOrganization) + TOOI (Dutch)
  - Temporal: Registration → Deregistration/Present
  - Properties: KvK number, legal classification, registered address
- Collections Aspect (heritage materials preserved)
  - Ontology: RiC-O (archival) OR CIDOC-CRM (museum) OR BIBFRAME (library)
  - Temporal: Accession → Deaccession (per item/collection)
  - Properties: Provenance, extent, access restrictions
- People Aspect (staff/curators)
  - Ontology: PiCo (PersonObservation) + CIDOC-CRM (E21_Person)
  - Temporal: Employment start → Employment end (per person)
  - Properties: Roles, activities, employment records
- Temporal Events (organizational changes)
  - Ontology: CIDOC-CRM (E10_Transfer_of_Custody, E8_Acquisition) + RiC-O (Event)
  - Properties: Custody transfers, mergers, relocations, transformations
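A quick completeness check helps catch models that silently drop an aspect. The following is a minimal sketch, assuming an entity model is summarised as a dict from aspect name to its chosen class. `ASPECT_CLASSES`, `missing_aspects` and `aspect_for_class` are hypothetical. The class sets are examples taken from the list above and are not exhaustive; Schema.org custodian types, for instance, are omitted. Temporal events are left out on the assumption that they are recorded as separate event records, not as a lifecycle aspect of the entity.

```python
from typing import Optional

# Lifecycle aspects from Rule 3 -> example ontology classes named for each.
ASPECT_CLASSES: dict[str, set[str]] = {
    "place": {"crm:E27_Site", "crm:E53_Place", "schema:Place"},
    "custodian": {"cpov:PublicOrganisation", "crm:E39_Actor"},
    "legal_form": {"org:FormalOrganization"},
    "collections": {"rico:RecordSet", "crm:E78_Curated_Holding", "bf:Collection"},
    "people": {"pico:PersonObservation", "crm:E21_Person"},
}


def missing_aspects(model: dict[str, str]) -> list[str]:
    """Return lifecycle aspects (in Rule 3 order) that the model does not define."""
    return [aspect for aspect in ASPECT_CLASSES if aspect not in model]


def aspect_for_class(class_uri: str) -> Optional[str]:
    """Return the aspect a class is listed under, or None if it is not listed."""
    for aspect, classes in ASPECT_CLASSES.items():
        if class_uri in classes:
            return aspect
    return None


partial_model = {"place": "crm:E27_Site", "custodian": "cpov:PublicOrganisation"}
print(missing_aspects(partial_model))
```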
Example: Modeling a Historic Mansion Operating as Museum
```yaml
# Entity: Villa Mondriaan (Winterswijk, Netherlands)

# PLACE ASPECT
villa_mondriaan_place:
  aspect_type: place
  class_uri: crm:E27_Site
  secondary_class_uri: schema:LandmarksOrHistoricalBuildings
  temporal_extent:
    construction_date: "1880-01-01"
    current_status: standing
  properties:
    address: "Zonnebrink 4, 7101 NP Winterswijk"
    coordinates: [51.9711, 6.7197]
    heritage_designation: "Rijksmonument"

# CUSTODIAN ASPECT
stichting_villa_mondriaan:
  aspect_type: custodian
  class_uri: cpov:PublicOrganisation  # Dutch foundation with public benefit
  secondary_class_uri: schema:Museum
  temporal_extent:
    founding_date: "1994-05-12"
    current_status: active
  properties:
    legal_name: "Stichting Villa Mondriaan"
    isil_code: "NL-WtVM"
    manages: [villa_mondriaan_collections]

# LEGAL FORM ASPECT
stichting_legal_entity:
  aspect_type: legal_form
  class_uri: org:FormalOrganization
  mixin_class_uri: tooi:Overheidsorganisatie  # Dutch government org
  temporal_extent:
    registration_date: "1994-05-12"
    current_status: registered
  properties:
    kvk_number: "12345678"
    legal_form: "stichting"  # Dutch foundation

# COLLECTIONS ASPECT
villa_mondriaan_collections:
  aspect_type: collections
  class_uri: crm:E78_Curated_Holding
  archival_class_uri: rico:RecordSet
  temporal_extent:
    accession_start: "1994-01-01"
    current_status: growing
  properties:
    provenance: "Mondriaan family"
    extent: "500 objects, 200 archival documents"

# PEOPLE ASPECT
curator_maria_van_der_berg:
  aspect_type: person
  class_uri: pico:PersonObservation
  secondary_class_uri: crm:E21_Person
  temporal_extent:
    employment_start: "2020-01-01"
    current_status: employed
  properties:
    role: picot_roles:curator
    works_for: stichting_villa_mondriaan
```
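Aspects in this example point at each other by id: the custodian `manages` the collections, and the curator `works_for` the custodian. A typo in one of those ids silently breaks the link between aspects. The following is a minimal referential-integrity sketch, assuming the YAML above has been loaded into a dict keyed by aspect id. It further assumes that only the hypothetical `REFERENCE_KEYS` inside `properties` hold aspect ids. The `villa_mondriaan` dict is a trimmed copy of the example.

```python
# Hypothetical: property keys whose values are ids of other aspects.
REFERENCE_KEYS = ("manages", "works_for")


def dangling_references(model: dict[str, dict]) -> list[tuple[str, str, str]]:
    """Return (aspect id, property, target) for references to undefined aspects."""
    problems = []
    for aspect_id, aspect in model.items():
        properties = aspect.get("properties", {})
        for key in REFERENCE_KEYS:
            targets = properties.get(key, [])
            if isinstance(targets, str):  # a single id instead of a list
                targets = [targets]
            problems.extend(
                (aspect_id, key, target) for target in targets if target not in model
            )
    return problems


villa_mondriaan = {
    "villa_mondriaan_place": {"aspect_type": "place"},
    "stichting_villa_mondriaan": {
        "aspect_type": "custodian",
        "properties": {"manages": ["villa_mondriaan_collections"]},
    },
    "villa_mondriaan_collections": {"aspect_type": "collections"},
    "curator_maria_van_der_berg": {
        "aspect_type": "person",
        "properties": {"works_for": "stichting_villa_mondriaan"},
    },
}
print(dangling_references(villa_mondriaan))
```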
Rule 4: Temporal Independence Documentation
All aspects have SEPARATE temporal lifecycles. Document this explicitly.
Required Temporal Properties
Every aspect MUST include:
```yaml
temporal_extent:
  start_date: "YYYY-MM-DD"  # When this aspect began
  end_date: "YYYY-MM-DD"    # When this aspect ended, or null if ongoing
  certainty: certain        # One of: certain | approximate | inferred
  source: archival_record   # e.g. archival_record | legal_registration | oral_history
```
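These fields can be validated as soon as a record is read. The following is a minimal sketch using a frozen dataclass, assuming dates arrive as quoted ISO strings as in the template. `TemporalExtent`, `from_mapping` and `CERTAINTY_VALUES` are hypothetical names. `source` is kept as an open string because the rule lists its values only as examples. The record values in the usage line are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CERTAINTY_VALUES = {"certain", "approximate", "inferred"}


@dataclass(frozen=True)
class TemporalExtent:
    """Temporal extent of one aspect; end_date=None means the aspect is ongoing."""
    start_date: date
    end_date: Optional[date]
    certainty: str
    source: str  # open vocabulary: archival_record, legal_registration, ...

    def __post_init__(self) -> None:
        if self.certainty not in CERTAINTY_VALUES:
            raise ValueError(f"unknown certainty: {self.certainty!r}")
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")

    @property
    def ongoing(self) -> bool:
        return self.end_date is None

    @classmethod
    def from_mapping(cls, data: dict) -> "TemporalExtent":
        """Build from a parsed temporal_extent mapping with ISO date strings."""
        end = data.get("end_date")
        return cls(
            start_date=date.fromisoformat(data["start_date"]),
            end_date=date.fromisoformat(end) if end is not None else None,
            certainty=data["certainty"],
            source=data["source"],
        )


archive = TemporalExtent.from_mapping({
    "start_date": "1864-01-01",
    "end_date": None,
    "certainty": "certain",
    "source": "archival_record",
})
print(archive.ongoing)
```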
Example: Temporal Independence in Custody Transfer
```yaml
# Heineken corporate archive custody transfer (2005)

# BEFORE TRANSFER (1864-2005)
heineken_corporate_archive:
  custodian_aspect:
    custodian_id: heineken_nv
    class_uri: schema:Corporation
    temporal_extent:
      start_date: "1864-01-01"  # Heineken founded
      end_date: "2005-06-15"    # Custody transferred
  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."
    temporal_extent:
      start_date: "1864-01-01"
      end_date: null  # Collection still exists (just moved)

# AFTER TRANSFER (2005-present)
heineken_archive_at_stadsarchief:
  custodian_aspect:
    custodian_id: stadsarchief_amsterdam
    class_uri: cpov:PublicOrganisation
    temporal_extent:
      start_date: "2005-06-15"  # Custody received
      end_date: null            # Ongoing
  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."  # ← Provenance unchanged!
    temporal_extent:
      start_date: "1864-01-01"   # ← Collection dates unchanged!
      end_date: null

# CUSTODY TRANSFER EVENT
custody_transfer_event:
  event_type: crm:E10_Transfer_of_Custody
  class_uri: rico:Event
  temporal_extent:
    event_date: "2005-06-15"
  properties:
    surrendered_by: heineken_nv
    received_by: stadsarchief_amsterdam
    transferred_object: heineken_corporate_archive
```
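The collection keeps its own extent (1864 → ongoing) while custodians change. "Who held it on a given date" is therefore a query over the custodian periods and not a property of the collection. The following is a minimal sketch, assuming each custody period is a `(custodian_id, start, end)` tuple. It treats periods as half-open: on the transfer date itself, the receiving custodian holds the collection. That convention is an assumption of this sketch; the rule does not fix it. `custodian_on` and `gaps_or_overlaps` are hypothetical helpers.

```python
from datetime import date
from typing import Optional

Period = tuple[str, date, Optional[date]]  # (custodian_id, start, end or None)

HEINEKEN_CUSTODY: list[Period] = [
    ("heineken_nv", date(1864, 1, 1), date(2005, 6, 15)),
    ("stadsarchief_amsterdam", date(2005, 6, 15), None),
]


def custodian_on(periods: list[Period], day: date) -> Optional[str]:
    """Return the custodian holding the collection on `day` (half-open periods)."""
    for custodian_id, start, end in periods:
        if start <= day and (end is None or day < end):
            return custodian_id
    return None


def gaps_or_overlaps(periods: list[Period]) -> list[tuple[str, str]]:
    """Report consecutive periods that do not meet exactly at the transfer date."""
    ordered = sorted(periods, key=lambda period: period[1])
    return [
        (prev_id, next_id)
        for (prev_id, _, prev_end), (next_id, next_start, _) in zip(ordered, ordered[1:])
        if prev_end != next_start
    ]


print(custodian_on(HEINEKEN_CUSTODY, date(1990, 1, 1)))
print(custodian_on(HEINEKEN_CUSTODY, date(2005, 6, 15)))
print(gaps_or_overlaps(HEINEKEN_CUSTODY))
```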
Rule 5: Ontology Properties Must Be Researched
Never invent custom properties when ontology equivalents exist.
Property Research Workflow
- Identify the relationship you need to express
- Search base ontologies for existing properties
- Use ontology property with proper namespace
- Document property source in comments
Example:
```yaml
# ❌ WRONG - Custom property invented
institution:
  official_name: "Rijksarchief in Noord-Holland"

# ✅ CORRECT - CPOV ontology property used
institution:
  skos:prefLabel:
    "@value": "Rijksarchief in Noord-Holland"
    "@language": nl
  # Source: CPOV uses SKOS for preferred labels
```
Common Property Mappings
| Need | Ontology Property | Namespace |
|---|---|---|
| Preferred name | skos:prefLabel | SKOS (used by CPOV) |
| Alternative names | skos:altLabel | SKOS |
| Identifiers | dct:identifier | Dublin Core Terms |
| Address | locn:address | W3C Location Core |
| Coordinates | schema:geo | Schema.org |
| Founding date | schema:foundingDate OR tooi:begindatum | Schema.org / TOOI |
| Organizational unit | cpov:hasUnit OR org:hasUnit | CPOV / W3C Org |
| Curated collection | crm:P147_curated | CIDOC-CRM |
| Archival holdings | rico:isOrWasHolderOf | RiC-O |
| Person role | pico:hasRole | PiCo |
| Provenance | rico:hasProvenance OR prov:hadPrimarySource | RiC-O / PROV-O |
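A lightweight linter can flag invented properties before review. The following is a minimal sketch, assuming records are flat dicts whose keys should be CURIEs. `KNOWN_PREFIXES` is built from the namespaces in the table above. The `SUGGESTIONS` lookup is hypothetical: only `official_name` and `institution_official_name` come from examples in this document, and the other entries are illustrative. `lint_properties` is also a hypothetical name.

```python
# Prefixes of the namespaces listed in the Common Property Mappings table.
KNOWN_PREFIXES = {"skos", "dct", "locn", "schema", "tooi", "cpov",
                  "org", "crm", "rico", "pico", "prov"}

# Hypothetical lookup from invented names to the table's ontology properties.
SUGGESTIONS = {
    "official_name": "skos:prefLabel",
    "institution_official_name": "skos:prefLabel",
    "alternative_names": "skos:altLabel",
    "founding_date": "schema:foundingDate",
}


def lint_properties(record: dict) -> list[str]:
    """Flag keys that are not CURIEs with a known ontology prefix."""
    messages = []
    for key in record:
        prefix, sep, _ = key.partition(":")
        if sep and prefix in KNOWN_PREFIXES:
            continue  # ontology property with a recognised namespace
        message = f"{key}: custom property"
        hint = SUGGESTIONS.get(key)
        if hint:
            message += f"; use {hint}"
        messages.append(message)
    return messages


print(lint_properties({"official_name": "Rijksarchief in Noord-Holland"}))
print(lint_properties({"skos:prefLabel": "Rijksarchief in Noord-Holland"}))
```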
Rule 6: Decision Trees for Ontology Selection
Use structured decision trees to select appropriate ontologies.
Decision Tree: Primary Ontology Class
```
START: Heritage entity identified
    ↓
Is it a physical place/site?
├─ YES → PRIMARY: crm:E27_Site + schema:Place
│        Continue to check if also a custodian organization ↓
│
└─ NO → Is it an organization?
        ├─ YES → Is it public sector?
        │        ├─ YES → cpov:PublicOrganisation
        │        │        Is it Dutch government?
        │        │        ├─ YES → ADD MIXIN: tooi:Overheidsorganisatie
        │        │        └─ NO  → CPOV only
        │        │
        │        └─ NO → schema:Organization
        │                What type?
        │                ├─ Museum    → schema:Museum
        │                ├─ Library   → schema:Library
        │                ├─ Archive   → schema:ArchiveOrganization
        │                ├─ Education → schema:EducationalOrganization
        │                └─ NGO       → schema:NGO
        │
        └─ NO → Is it a collection?
                ├─ Archival → rico:RecordSet
                ├─ Museum   → crm:E78_Curated_Holding
                ├─ Library  → bf:Collection
                └─ Mixed    → Use multiple classes
```
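The primary-class tree can be walked by a single function, so that every agent applies it the same way. The following is a minimal sketch, assuming the relevant facts about an entity are captured in the hypothetical `Entity` dataclass. It has two further assumptions. First, a private organization with an unknown type falls back to `schema:Organization`. Second, a place that is also an organization gets both its site classes and its organization classes, following the tree's "continue to check" note. `primary_classes` is a hypothetical name.

```python
from dataclasses import dataclass, replace
from typing import Optional

SCHEMA_ORG_TYPES = {
    "museum": "schema:Museum",
    "library": "schema:Library",
    "archive": "schema:ArchiveOrganization",
    "education": "schema:EducationalOrganization",
    "ngo": "schema:NGO",
}
COLLECTION_TYPES = {
    "archival": "rico:RecordSet",
    "museum": "crm:E78_Curated_Holding",
    "library": "bf:Collection",
}


@dataclass(frozen=True)
class Entity:
    """Facts about a heritage entity that the decision tree asks about."""
    is_place: bool = False
    is_organization: bool = False
    is_public: bool = False
    is_dutch_government: bool = False
    org_type: Optional[str] = None          # key of SCHEMA_ORG_TYPES
    collection_types: tuple[str, ...] = ()  # keys of COLLECTION_TYPES; several = "Mixed"


def primary_classes(entity: Entity) -> list[str]:
    """Walk the primary-class decision tree and return the selected classes."""
    if entity.is_place:
        site = ["crm:E27_Site", "schema:Place"]
        # "Continue to check if also a custodian organization"
        if entity.is_organization:
            return site + primary_classes(replace(entity, is_place=False))
        return site
    if entity.is_organization:
        if entity.is_public:
            classes = ["cpov:PublicOrganisation"]
            if entity.is_dutch_government:
                classes.append("tooi:Overheidsorganisatie")  # mixin
            return classes
        # Private: the specific Schema.org type if known, else the generic class.
        return [SCHEMA_ORG_TYPES.get(entity.org_type or "", "schema:Organization")]
    return [COLLECTION_TYPES[kind] for kind in entity.collection_types]


heemkamer = Entity(is_organization=True, org_type="ngo")
print(primary_classes(heemkamer))
```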
Decision Tree: Dutch vs. EU vs. Global
```
START: Determine geographic/legal scope
    ↓
Country == "Netherlands"?
├─ YES → Legal status == "public"?
│        ├─ YES → USE: tooi:Overheidsorganisatie (Dutch government)
│        │        ALSO ADD: cpov:PublicOrganisation (EU compliance)
│        │
│        └─ NO → USE: schema:Organization (private)
│                ADD: DutchLegalEntityMixin (KvK numbers)
│
└─ NO → In Europe?
        ├─ YES → Legal status == "public"?
        │        ├─ YES → USE: cpov:PublicOrganisation
        │        └─ NO  → USE: schema:Organization
        │
        └─ NO → USE: schema:Organization (global)
                ADD domain-specific class:
                  - schema:Museum
                  - schema:ArchiveOrganization
                  - schema:Library
```
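The scope tree maps directly onto nested conditionals. The following is a minimal sketch, assuming the caller already knows the country, whether it is in Europe, and the legal status. The "In Europe?" membership test is deliberately left to the caller and not hard-coded. `scope_classes` and `GLOBAL_DOMAIN_CLASSES` are hypothetical names. Outside Europe the tree ignores legal status, and the sketch does the same.

```python
from typing import Optional

GLOBAL_DOMAIN_CLASSES = {"schema:Museum", "schema:ArchiveOrganization", "schema:Library"}


def scope_classes(country: str, in_europe: bool, is_public: bool,
                  domain_class: Optional[str] = None) -> list[str]:
    """Return classes and mixins chosen by the Dutch vs. EU vs. global tree."""
    if country == "Netherlands":
        if is_public:
            # Dutch government class plus CPOV for EU compliance.
            return ["tooi:Overheidsorganisatie", "cpov:PublicOrganisation"]
        return ["schema:Organization", "DutchLegalEntityMixin"]
    if in_europe:
        return ["cpov:PublicOrganisation"] if is_public else ["schema:Organization"]
    # Outside Europe: always Schema.org, plus a domain-specific class if given.
    classes = ["schema:Organization"]
    if domain_class in GLOBAL_DOMAIN_CLASSES:
        classes.append(domain_class)
    return classes


print(scope_classes("Netherlands", in_europe=True, is_public=False))
```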
Rule 7: Documentation Requirements
All ontology mappings MUST be documented with rationale.
Required Documentation Fields
```yaml
ontology_mapping:
  wikidata_source: Q1802963  # Wikidata entity being mapped
  wikidata_label: mansion

  primary_class:
    uri: crm:E27_Site
    namespace: http://www.cidoc-crm.org/cidoc-crm/
    rationale: >-
      CIDOC-CRM E27_Site for physical heritage buildings with
      archaeological/architectural significance.
    ontology_file: data/ontology/CIDOC_CRM_v7.1.3.rdf
    ontology_section: "Lines 1234-1267"  # Optional

  secondary_class:
    uri: schema:LandmarksOrHistoricalBuildings
    namespace: http://schema.org/
    rationale: Web discoverability for historic landmarks
    ontology_file: data/ontology/schemaorg.owl

  properties:
    - uri: crm:P1_is_identified_by
      range: crm:E41_Appellation
      usage: Building name identification
      example: "Buitenplaats Beeckestijn"
    - uri: schema:geo
      range: schema:GeoCoordinates
      usage: Geographic coordinates
      example: "{latitude: 51.9711, longitude: 6.7197}"

  temporal_model:
    aspects:
      - place        # Physical site
      - custodian    # If operates as heritage institution
      - collections  # If holds curated materials
    temporal_independence_note: >-
      Place existence (construction → present) is independent from
      custodian organization lifecycle (founding → present).

  complexity_score: 9  # 1-10 scale
  reviewed_by: human_expert
  review_date: "2025-11-20"
```
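A documentation record like this can be checked automatically before it reaches a reviewer. The following is a minimal sketch, assuming the contents of `ontology_mapping` are parsed into a dict. The sketch makes several assumptions:

- The required top-level keys are the ones shown in the example. `ontology_section` and `secondary_class` are optional.
- `NAMESPACES` covers only the two prefixes used above.
- The human-review threshold of 7 is taken from the quality checklist (Rule 9).

`check_mapping`, `REQUIRED_KEYS` and `REQUIRED_CLASS_KEYS` are hypothetical names.

```python
# Namespace IRIs for the prefixes used in the example above.
NAMESPACES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "http://schema.org/",
}
REQUIRED_KEYS = ("wikidata_source", "wikidata_label", "primary_class",
                 "properties", "temporal_model", "complexity_score")
REQUIRED_CLASS_KEYS = ("uri", "namespace", "rationale", "ontology_file")


def check_mapping(mapping: dict) -> list[str]:
    """Return documentation problems for one ontology_mapping record."""
    issues = [f"missing {key}" for key in REQUIRED_KEYS if key not in mapping]
    for slot in ("primary_class", "secondary_class"):
        entry = mapping.get(slot)
        if entry is None:
            continue
        issues += [f"{slot}: missing {key}" for key in REQUIRED_CLASS_KEYS if key not in entry]
        prefix = entry.get("uri", "").partition(":")[0]
        if prefix in NAMESPACES and entry.get("namespace") != NAMESPACES[prefix]:
            issues.append(f"{slot}: namespace does not match prefix {prefix!r}")
    score = mapping.get("complexity_score")
    if score is not None:
        if not isinstance(score, int) or not 1 <= score <= 10:
            issues.append("complexity_score must be an integer from 1 to 10")
        elif score >= 7 and "reviewed_by" not in mapping:
            issues.append("complexity_score >= 7 requires human review")
    return issues


example = {
    "wikidata_source": "Q1802963",
    "wikidata_label": "mansion",
    "primary_class": {
        "uri": "crm:E27_Site",
        "namespace": "http://www.cidoc-crm.org/cidoc-crm/",
        "rationale": "CIDOC-CRM E27_Site for physical heritage buildings.",
        "ontology_file": "data/ontology/CIDOC_CRM_v7.1.3.rdf",
    },
    "properties": [],
    "temporal_model": {"aspects": ["place", "custodian", "collections"]},
    "complexity_score": 9,
    "reviewed_by": "human_expert",
}
print(check_mapping(example))
```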
Rule 8: Prohibited Practices
The following practices are STRICTLY FORBIDDEN:
❌ Prohibited
- Using Wikidata Q-numbers as class URIs
  ```yaml
  # FORBIDDEN
  class_uri: wd:Q33506  # This is an entity, not a class!
  ```
- Creating custom properties without ontology research
  ```yaml
  # FORBIDDEN
  slots:
    institution_official_name:  # Use skos:prefLabel instead!
  ```
- Single-ontology mappings for complex entities
  ```yaml
  # FORBIDDEN - Mansion is BOTH place AND potential custodian
  Mansion:
    class_uri: schema:Place  # ← Missing custodian aspect!
  ```
- Ignoring temporal dimensions
  ```yaml
  # FORBIDDEN - No temporal tracking
  custodian:
    name: "Heineken Archive"
    location: "Amsterdam"  # ← Where are the dates? Which period does this describe?
  ```
- Binary public/private classifications
  ```yaml
  # FORBIDDEN - Too simplistic
  PublicHeritageCustodian:   # What about NGOs? Foundations? Mixed?
  PrivateHeritageCustodian:  # What about government corporations?
  ```
Rule 9: Quality Assurance Checklist
Before submitting any ontology design, verify:
- All base ontologies consulted (/data/ontology/ files read)
- Wikidata entities mapped to formal ontology classes (not used directly)
- Multi-aspect modeling applied (place, custodian, legal, collections, people)
- Temporal independence documented for each aspect
- Properties sourced from ontologies (not custom inventions)
- Decision trees applied for ontology selection
- Rationale documented for all class/property choices
- Examples provided with real-world entities
- Complexity score assigned (1-10 scale)
- Human review requested for complexity ≥ 7
Rule 10: Agent Collaboration Protocol
When working with other agents or humans:
- Always cite ontology files in design discussions
  - "According to CIDOC-CRM (lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf)..."
- Share ontology search commands for reproducibility
  ```bash
  rg "E27_Site" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
  ```
- Document disagreements with explicit rationale
  - "Agent A suggests schema:Museum, but I recommend cpov:PublicOrganisation because the institution is government-operated (see TOOI classification rules)."
- Request human review for:
  - Complexity score ≥ 7
  - Conflicting ontology recommendations
  - Temporal modeling ambiguities
  - Novel aspect combinations
Example: Complete Ontology Mapping Workflow
Scenario: Map Wikidata Q3437789 (heemkamer - local history room)
Step 1: Research Entity
```bash
# Read Wikidata metadata from hyponyms_curated_full.yaml
grep -A 100 "Q3437789" /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml
```
Findings:
- Dutch concept: "Local history room/museum"
- Usually operated by volunteers/heritage societies
- Mix of museum, archive, library functions
- Often in small municipalities
Step 2: Search Base Ontologies
```bash
# Search CPOV for organizational types
rg "classification|OrganisationType" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Search Schema.org for community organizations
rg "NGO|CivicStructure|LocalBusiness" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Search CIDOC-CRM for community groups
rg "E74_Group|E40_Legal_Body" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```
Step 3: Apply Decision Trees
Geographic scope: Netherlands → Check TOOI
Legal status: Usually private foundation (stichting) or association (vereniging)
Function: Collects + Preserves + Exhibits local heritage → Multi-functional
Decision:
- PRIMARY: schema:NGO (non-governmental heritage organization)
- SECONDARY: crm:E74_Group (community heritage group)
- DUTCH MIXIN: DutchLegalEntityMixin (KvK registration)
Step 4: Model Aspects
```yaml
heemkamer:
  wikidata_id: Q3437789
  ontology_mapping:
    # CUSTODIAN ASPECT
    custodian_class: schema:NGO
    custodian_secondary: crm:E74_Group
    rationale: >-
      Non-governmental community heritage organization.
      Not public sector (excludes CPOV). Uses Schema.org NGO.

    # PLACE ASPECT (often operates in specific building)
    place_class: schema:CivicStructure
    place_secondary: crm:E27_Site

    # LEGAL FORM ASPECT (Dutch foundation/association)
    legal_class: org:FormalOrganization
    legal_dutch_mixin: DutchLegalEntityMixin
    properties:
      kvk_number: required
      legal_form: "stichting OR vereniging"

    # COLLECTIONS ASPECT (multi-functional)
    collections_classes:
      - rico:RecordSet           # Local archival materials
      - crm:E78_Curated_Holding  # Museum objects
      - bf:Collection            # Local history books

    # PEOPLE ASPECT (volunteers)
    people_class: pico:PersonObservation
    people_roles:
      - picot_roles:curator
      - picot_roles:volunteer_archivist
      - picot_roles:educator

  temporal_model:
    aspects:
      - custodian    # Founding → present/closure
      - place        # Building occupancy (may change)
      - collections  # Accessions over time
      - people       # Volunteer participation periods
```
Step 5: Document and Review
```yaml
ontology_enrichment:
  complexity_score: 8  # Multi-functional, temporal complexity
  requires_human_review: true
  review_notes: >-
    Heemkamer concept is Dutch-specific with no direct
    international equivalent. Multi-functional nature
    (museum + archive + library) requires careful aspect modeling.
```
Summary: Key Takeaways for Agents
- Ontology files are your bible - Read them first, always
- Wikidata is data, not ontology - Map Q-numbers to formal classes
- Everything has multiple aspects - Place, custodian, legal, collections, people
- Time is always a factor - Model temporal independence
- Properties must be justified - Use ontology properties, document rationale
- Complexity is reality - Don't oversimplify, embrace nuance
- Document everything - Future agents/humans need your reasoning
- Ask for help - Complex cases require human review
When in doubt: Read the ontology files, consult AGENTS.md, request human guidance.
End of Ontology Mapping Rules v1.0