Add comprehensive ontology mapping rules and update project mission

- Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus
- Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines:
  * Ontology consultation workflows (Rule 1)
  * Wikidata entity mapping procedures (Rule 2)
  * Multi-aspect modeling requirements (Rule 3)
  * Temporal independence documentation (Rule 4)
  * Property research workflows (Rule 5)
  * Decision trees for ontology selection (Rule 6)
  * Documentation requirements and prohibited practices (Rules 7-8)
  * Quality assurance checklist (Rule 9)
  * Agent collaboration protocols (Rule 10)
- Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide

Key principles established:
1. Wikidata Q-numbers are NOT ontology classes (must be mapped)
2. Every heritage entity has multiple aspects with independent temporal lifecycles
3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth
4. Custom properties forbidden when ontology equivalents exist

Example: 'Mansion' (Q1802963) requires modeling as:
- Place aspect (crm:E27_Site, construction→present)
- Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present)
- Legal form aspect (org:FormalOrganization, registration→present)
- Collections aspect (crm:E78_Curated_Holding, accession→present)
- People aspect (picom:PersonObservation, employment periods)
- Temporal events (crm:E10_Transfer_of_Custody for custody changes)

All agents MUST read ontology files before schema design.
kempersc 2025-11-20 23:09:02 +01:00
parent e6684e815b
commit 176a7479f9
3 changed files with 1006 additions and 0 deletions

.opencode/agent/ontology-mapping-rules.md

@@ -0,0 +1,665 @@
# Ontology Mapping Rules for Heritage Custodian Project
**Version**: 1.0
**Last Updated**: 2025-11-20
**Purpose**: Define rigorous ontological mapping procedures for AI agents working on the GLAM heritage custodian data project
---
## Core Principle: Ontology-First Design
**CRITICAL**: The primary objective of this project is to create a **comprehensive, nuanced ontology** that can accurately represent the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.
### What This Means
- ✅ **DO**: Study ontology files deeply before creating classes or properties
- ✅ **DO**: Map Wikidata entities to formal ontology classes with explicit rationale
- ✅ **DO**: Model temporal independence of different aspects (place, custodian, legal form, collections, people)
- ✅ **DO**: Support multiple ontology classes for the same entity (CPOV + TOOI + Schema.org + CIDOC-CRM)
- ❌ **DON'T**: Use Wikidata Q-numbers directly as ontology classes
- ❌ **DON'T**: Create generic "HeritageCustodian" mappings without considering semantic aspects
- ❌ **DON'T**: Ignore temporal dimensions (everything changes over time!)
---
## Rule 1: Ontology Files Are Source of Truth
**All ontology design MUST reference base ontologies in `/data/ontology/`.**
### Available Ontologies
| Ontology | File | Scope | When to Use |
|----------|------|-------|-------------|
| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector | Government archives, state museums, public cultural institutions |
| **TOOI** | `tooiont.ttl` | Dutch government | Netherlands government heritage organizations |
| **Schema.org** | `schemaorg.owl` | Web semantics | Private collections, web discoverability, general fallback |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain | Museums, sites, curated holdings, provenance |
| **RiC-O** | `RiC-O_1-1.rdf` | Archival description | Archives, record sets, corporate bodies |
| **BIBFRAME** | `bibframe_vocabulary.rdf` | Bibliographic resources | Libraries, bibliographic collections |
| **PiCo** | `pico.ttl` | Person observations | Staff, curators, archivists, directors |
| **W3C Org** | (embedded in CPOV) | Organizational structure | Legal forms, organizational units |
### Mandatory Ontology Consultation Workflow
**Before designing any LinkML class, agents MUST:**
1. **Identify the semantic domain** (cultural, archival, educational, legal, etc.)
2. **Read relevant ontology files** using `read` or `grep` tools
3. **Extract applicable classes and properties**
4. **Document ontology alignment** in design notes
5. **Map Wikidata hypernyms to ontology classes** (not vice versa!)
**Example Workflow**:
```bash
# Step 1: Identify domain
# Entity: "mansion" (building + potential heritage custodian)
# Step 2: Search CIDOC-CRM for site/building classes
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
# Step 3: Search Schema.org for building types
rg "LandmarksOrHistoricalBuildings|TouristAttraction" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl
# Step 4: Search CPOV for organization classes (if mansion operates as museum)
rg "PublicOrganisation|classification" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl
# Step 5: Document findings in design notes
# "Mansion should map to crm:E27_Site (place aspect) AND
# cpov:PublicOrganisation (custodian aspect if operates as museum)"
```
---
## Rule 2: Never Use Wikidata Entities Directly
**Wikidata Q-numbers are NOT ontology classes. They are ENTITY IDENTIFIERS.**
### Incorrect Approach ❌
```yaml
# BAD - Wikidata Q-number used as class
HeritageCustodian:
  class_uri: wd:Q1802963  # ← This is a Wikidata ENTITY identifier (mansion), not an ontology CLASS!
```
### Correct Approach ✅
```yaml
# GOOD - Wikidata entity mapped to formal ontology classes
Mansion:
  description: >-
    Large residential building, often with heritage significance.
    Wikidata reference: Q1802963
  # Place aspect
  place_class_uri: crm:E27_Site
  place_secondary_uri: schema:LandmarksOrHistoricalBuildings
  # Custodian aspect (if operates as heritage institution)
  custodian_class_uri: cpov:PublicOrganisation  # If public
  custodian_alt_uri: schema:Museum  # If private
  # Collections aspect
  collections_class_uri: crm:E78_Curated_Holding
```
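The rule above lends itself to an automated check. The following is a minimal sketch, assuming a mapping has already been loaded into plain dictionaries and that class-bearing keys end in `_uri`; the function name `find_wikidata_class_uris` and the prefix list are hypothetical and not part of the project.

```python
from typing import Any

# Prefixes that mark a value as a Wikidata entity (assumed for illustration).
WIKIDATA_PREFIXES = ("wd:", "http://www.wikidata.org/entity/")


def find_wikidata_class_uris(mapping: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths of *_uri keys whose value is a Wikidata identifier."""
    violations: list[str] = []
    for key, value in mapping.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Recurse into nested aspects or class definitions.
            violations.extend(find_wikidata_class_uris(value, here))
        elif key.endswith("_uri") and isinstance(value, str):
            if value.startswith(WIKIDATA_PREFIXES):
                violations.append(here)
    return violations


bad = {"HeritageCustodian": {"class_uri": "wd:Q1802963"}}
good = {"Mansion": {"place_class_uri": "crm:E27_Site",
                    "custodian_class_uri": "cpov:PublicOrganisation"}}

print(find_wikidata_class_uris(bad))   # ['HeritageCustodian.class_uri']
print(find_wikidata_class_uris(good))  # []
```

A Q-number kept in a description or a `wikidata_source` field passes the check, since only keys ending in `_uri` are inspected.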
### Wikidata Hypernym Files Purpose
The files `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml` and `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml` are:
- ✅ **Source data** for identifying heritage entity TYPES
- ✅ **Analysis input** for understanding domain taxonomy
- ✅ **Reference** for multilingual labels and descriptions
- ❌ **NOT** direct ontology class definitions
**Required Mapping Workflow**:
```
hyponyms_curated.yaml (Wikidata entities)
        ↓
ANALYZE semantic properties
        ↓
SEARCH base ontologies for appropriate classes
        ↓
MAP Wikidata entity to ontology class(es)
        ↓
DOCUMENT rationale and properties
        ↓
CREATE LinkML schema with ontology class_uri
```
---
## Rule 3: Multi-Aspect Modeling is Mandatory
**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**
### Required Aspects
All heritage custodian entities MUST model these aspects:
1. **Place Aspect** (physical location/site)
   - Ontology: CIDOC-CRM (E27_Site, E53_Place) + Schema.org (Place)
   - Temporal: Construction → Demolition/Present
   - Properties: Address, coordinates, building type, heritage designation
2. **Custodian Aspect** (organization managing heritage)
   - Ontology: CPOV (public) OR Schema.org (private) + CIDOC-CRM (E39_Actor)
   - Temporal: Founding → Dissolution/Present
   - Properties: Legal identifiers, organizational structure, mission
3. **Legal Form Aspect** (legal entity registration)
   - Ontology: W3C Org (FormalOrganization) + TOOI (Dutch)
   - Temporal: Registration → Deregistration/Present
   - Properties: KvK number, legal classification, registered address
4. **Collections Aspect** (heritage materials preserved)
   - Ontology: RiC-O (archival) OR CIDOC-CRM (museum) OR BIBFRAME (library)
   - Temporal: Accession → Deaccession (per item/collection)
   - Properties: Provenance, extent, access restrictions
5. **People Aspect** (staff/curators)
   - Ontology: PiCo (PersonObservation) + CIDOC-CRM (E21_Person)
   - Temporal: Employment start → Employment end (per person)
   - Properties: Roles, activities, employment records
6. **Temporal Events** (organizational changes)
   - Ontology: CIDOC-CRM (E10_Transfer_of_Custody, E8_Acquisition) + RiC-O (Event)
   - Properties: Custody transfers, mergers, relocations, transformations
### Example: Modeling a Historic Mansion Operating as Museum
```yaml
# Entity: Villa Mondriaan (Winterswijk, Netherlands)
# PLACE ASPECT
villa_mondriaan_place:
  aspect_type: place
  class_uri: crm:E27_Site
  secondary_class_uri: schema:LandmarksOrHistoricalBuildings
  temporal_extent:
    construction_date: "1880-01-01"
    current_status: standing
  properties:
    address: "Zonnebrink 4, 7101 NP Winterswijk"
    coordinates: [51.9711, 6.7197]
    heritage_designation: "Rijksmonument"
# CUSTODIAN ASPECT
stichting_villa_mondriaan:
  aspect_type: custodian
  class_uri: cpov:PublicOrganisation  # Dutch foundation with public benefit
  secondary_class_uri: schema:Museum
  temporal_extent:
    founding_date: "1994-05-12"
    current_status: active
  properties:
    legal_name: "Stichting Villa Mondriaan"
    isil_code: "NL-WtVM"
    manages: [villa_mondriaan_collections]
# LEGAL FORM ASPECT
stichting_legal_entity:
  aspect_type: legal_form
  class_uri: org:FormalOrganization
  mixin_class_uri: tooiont:Overheidsorganisatie  # Dutch government org
  temporal_extent:
    registration_date: "1994-05-12"
    current_status: registered
  properties:
    kvk_number: "12345678"
    legal_form: "stichting"  # Dutch foundation
# COLLECTIONS ASPECT
villa_mondriaan_collections:
  aspect_type: collections
  class_uri: crm:E78_Curated_Holding
  archival_class_uri: rico:RecordSet
  temporal_extent:
    accession_start: "1994-01-01"
    current_status: growing
  properties:
    provenance: "Mondriaan family"
    extent: "500 objects, 200 archival documents"
# PEOPLE ASPECT
curator_maria_van_der_berg:
  aspect_type: person
  class_uri: picom:PersonObservation
  secondary_class_uri: crm:E21_Person
  temporal_extent:
    employment_start: "2020-01-01"
    current_status: employed
  properties:
    role: picot_roles:curator
    works_for: stichting_villa_mondriaan
```
---
## Rule 4: Temporal Independence Documentation
**All aspects have SEPARATE temporal lifecycles. Document this explicitly.**
### Required Temporal Properties
Every aspect MUST include:
```yaml
temporal_extent:
  start_date: "YYYY-MM-DD"  # When this aspect began
  end_date: null            # "YYYY-MM-DD" when the aspect ended; null = ongoing
  certainty: certain        # One of: certain | approximate | inferred
  source: archival_record   # e.g. archival_record, legal_registration, oral_history
```
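The required fields can also be enforced in code. This is a minimal sketch under the assumption that dates are parsed into `datetime.date` objects before validation; the `TemporalExtent` class and its `is_active_on` method are hypothetical helpers, not an existing project API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Allowed certainty values, taken from the template above.
CERTAINTY_VALUES = {"certain", "approximate", "inferred"}


@dataclass(frozen=True)
class TemporalExtent:
    start_date: date
    end_date: Optional[date] = None  # None means the aspect is ongoing
    certainty: str = "certain"
    source: str = "archival_record"

    def __post_init__(self) -> None:
        if self.certainty not in CERTAINTY_VALUES:
            raise ValueError(f"unknown certainty: {self.certainty!r}")
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")

    def is_active_on(self, day: date) -> bool:
        """True if the aspect existed on `day` (the end date is exclusive)."""
        if day < self.start_date:
            return False
        return self.end_date is None or day < self.end_date


custody = TemporalExtent(date(1864, 1, 1), date(2005, 6, 15))
print(custody.is_active_on(date(2000, 1, 1)))   # True
print(custody.is_active_on(date(2005, 6, 15)))  # False
```

Treating the end date as exclusive is a design choice made here so that a successor aspect can start on the same day without overlap.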
### Example: Temporal Independence in Custody Transfer
```yaml
# Heineken corporate archive custody transfer (2005)
# BEFORE TRANSFER (1864-2005)
heineken_corporate_archive:
  custodian_aspect:
    custodian_id: heineken_nv
    class_uri: schema:Corporation
    temporal_extent:
      start_date: "1864-01-01"  # Heineken founded
      end_date: "2005-06-15"  # Custody transferred
  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."
    temporal_extent:
      start_date: "1864-01-01"
      end_date: null  # Collection still exists (just moved)
# AFTER TRANSFER (2005-present)
heineken_archive_at_stadsarchief:
  custodian_aspect:
    custodian_id: stadsarchief_amsterdam
    class_uri: cpov:PublicOrganisation
    temporal_extent:
      start_date: "2005-06-15"  # Custody received
      end_date: null  # Ongoing
  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."  # ← Provenance unchanged!
    temporal_extent:
      start_date: "1864-01-01"  # ← Collection dates unchanged!
      end_date: null
# CUSTODY TRANSFER EVENT
custody_transfer_event:
  event_type: crm:E10_Transfer_of_Custody
  class_uri: rico:Event
  temporal_extent:
    event_date: "2005-06-15"
  properties:
    surrendered_by: heineken_nv
    received_by: stadsarchief_amsterdam
    transferred_object: heineken_corporate_archive
```
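One practical consequence of keeping custody periods separate from the collection's own lifecycle is that "who held this collection on a given date?" becomes a simple lookup. The sketch below assumes the custody periods from the example are available as tuples; `CUSTODY_PERIODS`, `custodian_on`, and `periods_are_contiguous` are hypothetical names used for illustration only.

```python
from datetime import date
from typing import Optional

# (custodian_id, start, end) taken from the example above; None = ongoing.
CUSTODY_PERIODS = [
    ("heineken_nv", date(1864, 1, 1), date(2005, 6, 15)),
    ("stadsarchief_amsterdam", date(2005, 6, 15), None),
]


def custodian_on(day: date, periods=CUSTODY_PERIODS) -> Optional[str]:
    """Return the custodian holding the collection on `day` (end exclusive)."""
    for custodian_id, start, end in periods:
        if start <= day and (end is None or day < end):
            return custodian_id
    return None


def periods_are_contiguous(periods=CUSTODY_PERIODS) -> bool:
    """Check that each period starts exactly when the previous one ends."""
    ordered = sorted(periods, key=lambda p: p[1])
    return all(prev[2] == nxt[1] for prev, nxt in zip(ordered, ordered[1:]))


print(custodian_on(date(1990, 1, 1)))   # heineken_nv
print(custodian_on(date(2005, 6, 15)))  # stadsarchief_amsterdam
print(periods_are_contiguous())         # True
```

The collection's own `temporal_extent` is not consulted at all, which mirrors the point of the example: custody changes while provenance and collection dates stay fixed.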
---
## Rule 5: Ontology Properties Must Be Researched
**Never invent custom properties when ontology equivalents exist.**
### Property Research Workflow
1. **Identify the relationship** you need to express
2. **Search base ontologies** for existing properties
3. **Use ontology property** with proper namespace
4. **Document property source** in comments
**Example**:
```yaml
# ❌ WRONG - Custom property invented
institution:
  official_name: "Rijksarchief in Noord-Holland"
# ✅ CORRECT - CPOV ontology property used
institution:
  skos:prefLabel: "Rijksarchief in Noord-Holland"  # Dutch label ("..."@nl in Turtle)
  # Source: CPOV uses SKOS for preferred labels
```
### Common Property Mappings
| Need | Ontology Property | Namespace |
|------|-------------------|-----------|
| Preferred name | `skos:prefLabel` | SKOS (used by CPOV) |
| Alternative names | `skos:altLabel` | SKOS |
| Identifiers | `dct:identifier` | Dublin Core Terms |
| Address | `locn:address` | W3C Location Core |
| Coordinates | `schema:geo` | Schema.org |
| Founding date | `schema:foundingDate` OR `tooiont:begindatum` | Schema.org / TOOI |
| Organizational unit | `cpov:hasUnit` OR `org:hasUnit` | CPOV / W3C Org |
| Curated collection | `crm:P147_curated` | CIDOC-CRM |
| Archival holdings | `rico:isOrWasHolderOf` | RiC-O |
| Person role | `picom:hasRole` | PiCo |
| Provenance | `rico:hasProvenance` OR `prov:hadPrimarySource` | RiC-O / PROV-O |
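
A lookup table like the one above can be made to fail loudly when a need has no researched property yet. This is a minimal sketch; the dictionary keys are hypothetical labels chosen for illustration, and only a subset of the table is included.

```python
# Subset of the table above; the keys are hypothetical "need" labels.
PROPERTY_MAP = {
    "preferred_name": "skos:prefLabel",
    "alternative_name": "skos:altLabel",
    "identifier": "dct:identifier",
    "address": "locn:address",
    "coordinates": "schema:geo",
}


def resolve_property(need: str) -> str:
    """Return the mapped ontology property, or raise so the gap gets researched."""
    try:
        return PROPERTY_MAP[need]
    except KeyError:
        raise LookupError(
            f"no mapped ontology property for {need!r}; "
            "search the base ontologies before adding one"
        ) from None


print(resolve_property("preferred_name"))  # skos:prefLabel
```

Raising instead of returning a default keeps an unmapped need from silently turning into a custom property.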
---
## Rule 6: Decision Trees for Ontology Selection
**Use structured decision trees to select appropriate ontologies.**
### Decision Tree: Primary Ontology Class
```
START: Heritage entity identified
  ↓
Is it a physical place/site?
├─ YES → PRIMARY: crm:E27_Site + schema:Place
│        Continue to check if also a custodian organization ↓
│
└─ NO → Is it an organization?
        ├─ YES → Is it public sector?
        │        ├─ YES → cpov:PublicOrganisation
        │        │        Is it Dutch government?
        │        │        ├─ YES → ADD MIXIN: tooiont:Overheidsorganisatie
        │        │        └─ NO → CPOV only
        │        │
        │        └─ NO → schema:Organization
        │                 What type?
        │                 ├─ Museum → schema:Museum
        │                 ├─ Library → schema:Library
        │                 ├─ Archive → schema:ArchiveOrganization
        │                 ├─ Education → schema:EducationalOrganization
        │                 └─ NGO → schema:NGO
        │
        └─ NO → Is it a collection?
                 ├─ Archival → rico:RecordSet
                 ├─ Museum → crm:E78_Curated_Holding
                 ├─ Library → bf:Collection
                 └─ Mixed → Use multiple classes
```
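The primary-class tree can be written as a small function, which makes the branches easy to test. This is a sketch only: the function name `primary_classes`, its parameters, and the lowercase `kind`/`subtype` labels are assumptions introduced here, and "Mixed" is interpreted as returning all three collection classes.

```python
# Subtype tables transcribed from the decision tree above.
ORG_TYPES = {
    "museum": "schema:Museum",
    "library": "schema:Library",
    "archive": "schema:ArchiveOrganization",
    "education": "schema:EducationalOrganization",
    "ngo": "schema:NGO",
}
COLLECTION_TYPES = {
    "archival": "rico:RecordSet",
    "museum": "crm:E78_Curated_Holding",
    "library": "bf:Collection",
}


def primary_classes(kind: str, *, public: bool = False,
                    dutch_government: bool = False, subtype: str = "") -> list[str]:
    """Walk the primary-class decision tree for one aspect of an entity."""
    if kind == "place":
        return ["crm:E27_Site", "schema:Place"]
    if kind == "organization":
        if public:
            classes = ["cpov:PublicOrganisation"]
            if dutch_government:
                classes.append("tooiont:Overheidsorganisatie")  # Dutch mixin
            return classes
        # Private sector: most specific Schema.org type, else the generic one.
        return [ORG_TYPES.get(subtype, "schema:Organization")]
    if kind == "collection":
        if subtype == "mixed":
            return list(COLLECTION_TYPES.values())
        if subtype in COLLECTION_TYPES:
            return [COLLECTION_TYPES[subtype]]
    raise ValueError(f"cannot classify kind={kind!r}, subtype={subtype!r}")


print(primary_classes("organization", public=True, dutch_government=True))
# ['cpov:PublicOrganisation', 'tooiont:Overheidsorganisatie']
```

Because a site can also be a custodian, the function is meant to be called once per aspect rather than once per entity.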
### Decision Tree: Dutch vs. EU vs. Global
```
START: Determine geographic/legal scope
  ↓
Country == "Netherlands"?
├─ YES → Legal status == "public"?
│        ├─ YES → USE: tooiont:Overheidsorganisatie (Dutch government)
│        │        ALSO ADD: cpov:PublicOrganisation (EU compliance)
│        │
│        └─ NO → USE: schema:Organization (private)
│                 ADD: DutchLegalEntityMixin (KvK numbers)
│
└─ NO → In Europe?
        ├─ YES → Legal status == "public"?
        │        ├─ YES → USE: cpov:PublicOrganisation
        │        └─ NO → USE: schema:Organization
        │
        └─ NO → USE: schema:Organization (global)
                 ADD domain-specific class:
                 - schema:Museum
                 - schema:ArchiveOrganization
                 - schema:Library
```
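The geographic/legal scope tree reduces to three branches. The sketch below assumes the caller already knows whether a non-Dutch country is in Europe; `scope_classes` and its parameters are hypothetical names, and the domain-specific class for the global branch is left to the caller.

```python
def scope_classes(country: str, legal_status: str, *, in_europe: bool = False) -> list[str]:
    """Select base classes/mixins from country and legal status."""
    is_public = legal_status == "public"
    if country == "Netherlands":
        if is_public:
            # Dutch government body, plus CPOV for EU-level compliance.
            return ["tooiont:Overheidsorganisatie", "cpov:PublicOrganisation"]
        # Private Dutch entity: generic organization plus KvK mixin.
        return ["schema:Organization", "DutchLegalEntityMixin"]
    if in_europe and is_public:
        return ["cpov:PublicOrganisation"]
    # European private bodies and all non-European bodies.
    return ["schema:Organization"]


print(scope_classes("Netherlands", "private"))
# ['schema:Organization', 'DutchLegalEntityMixin']
```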
---
## Rule 7: Documentation Requirements
**All ontology mappings MUST be documented with rationale.**
### Required Documentation Fields
```yaml
ontology_mapping:
  wikidata_source: Q1802963  # Wikidata entity being mapped
  wikidata_label: mansion
  primary_class:
    uri: crm:E27_Site
    namespace: http://www.cidoc-crm.org/cidoc-crm/
    rationale: >-
      CIDOC-CRM E27_Site for physical heritage buildings with
      archaeological/architectural significance.
    ontology_file: data/ontology/CIDOC_CRM_v7.1.3.rdf
    ontology_section: "Lines 1234-1267"  # Optional
  secondary_class:
    uri: schema:LandmarksOrHistoricalBuildings
    namespace: http://schema.org/
    rationale: Web discoverability for historic landmarks
    ontology_file: data/ontology/schemaorg.owl
  properties:
    - uri: crm:P1_is_identified_by
      range: crm:E41_Appellation
      usage: Building name identification
      example: "Buitenplaats Beeckestijn"
    - uri: schema:geo
      range: schema:GeoCoordinates
      usage: Geographic coordinates
      example: "{latitude: 51.9711, longitude: 6.7197}"
  temporal_model:
    aspects:
      - place  # Physical site
      - custodian  # If operates as heritage institution
      - collections  # If holds curated materials
    temporal_independence_note: >-
      Place existence (construction → present) is independent from
      custodian organization lifecycle (founding → present).
  complexity_score: 9  # 1-10 scale
  reviewed_by: human_expert
  review_date: "2025-11-20"
```
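A completeness check for these documentation fields might look like the following. It is a sketch under assumptions: the choice of which fields count as required (`REQUIRED_TOP_LEVEL`, `REQUIRED_CLASS_FIELDS`) is inferred from the example above rather than stated by the project, and `missing_fields` is a hypothetical helper.

```python
# Assumed required fields, inferred from the example mapping above.
REQUIRED_TOP_LEVEL = ("wikidata_source", "primary_class", "properties",
                      "temporal_model", "complexity_score")
REQUIRED_CLASS_FIELDS = ("uri", "namespace", "rationale", "ontology_file")


def missing_fields(mapping: dict) -> list[str]:
    """List required documentation fields absent from an ontology_mapping dict."""
    missing = [name for name in REQUIRED_TOP_LEVEL if name not in mapping]
    primary = mapping.get("primary_class")
    if isinstance(primary, dict):
        missing += [f"primary_class.{name}"
                    for name in REQUIRED_CLASS_FIELDS if name not in primary]
    return missing


draft = {
    "wikidata_source": "Q1802963",
    "primary_class": {"uri": "crm:E27_Site", "rationale": "Physical heritage building"},
    "properties": [],
}
print(missing_fields(draft))
# ['temporal_model', 'complexity_score', 'primary_class.namespace', 'primary_class.ontology_file']
```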
---
## Rule 8: Prohibited Practices
**The following practices are STRICTLY FORBIDDEN:**
### ❌ Prohibited
1. **Using Wikidata Q-numbers as class URIs**
   ```yaml
   # FORBIDDEN
   class_uri: wd:Q33506  # This is an entity, not a class!
   ```
2. **Creating custom properties without ontology research**
   ```yaml
   # FORBIDDEN
   slots:
     institution_official_name:  # Use skos:prefLabel instead!
   ```
3. **Single-ontology mappings for complex entities**
   ```yaml
   # FORBIDDEN - Mansion is BOTH place AND potential custodian
   Mansion:
     class_uri: schema:Place  # ← Missing custodian aspect!
   ```
4. **Ignoring temporal dimensions**
   ```yaml
   # FORBIDDEN - No temporal tracking
   custodian:
     name: "Heineken Archive"
     location: "Amsterdam"
     # ← Where are the dates? Which period does this describe?
   ```
5. **Binary public/private classifications**
   ```yaml
   # FORBIDDEN - Too simplistic
   PublicHeritageCustodian:  # What about NGOs? Foundations? Mixed?
   PrivateHeritageCustodian:  # What about government corporations?
   ```
---
## Rule 9: Quality Assurance Checklist
**Before submitting any ontology design, verify:**
- [ ] All base ontologies consulted (`/data/ontology/` files read)
- [ ] Wikidata entities mapped to formal ontology classes (not used directly)
- [ ] Multi-aspect modeling applied (place, custodian, legal, collections, people)
- [ ] Temporal independence documented for each aspect
- [ ] Properties sourced from ontologies (not custom inventions)
- [ ] Decision trees applied for ontology selection
- [ ] Rationale documented for all class/property choices
- [ ] Examples provided with real-world entities
- [ ] Complexity score assigned (1-10 scale)
- [ ] Human review requested for complexity ≥ 7
---
## Rule 10: Agent Collaboration Protocol
**When working with other agents or humans:**
1. **Always cite ontology files** in design discussions
   - "According to CIDOC-CRM (lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf)..."
2. **Share ontology search commands** for reproducibility
   ```bash
   rg "E27_Site" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
   ```
3. **Document disagreements** with explicit rationale
   - "Agent A suggests schema:Museum, but I recommend cpov:PublicOrganisation
     because the institution is government-operated (see TOOI classification rules)."
4. **Request human review** for:
   - Complexity score ≥ 7
   - Conflicting ontology recommendations
   - Temporal modeling ambiguities
   - Novel aspect combinations
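
The review triggers listed above can be combined into a single predicate. This is a minimal sketch; `needs_human_review` and its keyword flags are hypothetical names, and the threshold of 7 is taken from the rule text.

```python
REVIEW_THRESHOLD = 7  # Complexity score at or above which review is required


def needs_human_review(complexity_score: int, *,
                       conflicting_recommendations: bool = False,
                       temporal_ambiguity: bool = False,
                       novel_aspect_combination: bool = False) -> bool:
    """Return True if any review trigger from the collaboration protocol applies."""
    if not 1 <= complexity_score <= 10:
        raise ValueError("complexity_score must be on the 1-10 scale")
    return (complexity_score >= REVIEW_THRESHOLD
            or conflicting_recommendations
            or temporal_ambiguity
            or novel_aspect_combination)


print(needs_human_review(8))                                    # True
print(needs_human_review(3, conflicting_recommendations=True))  # True
print(needs_human_review(6))                                    # False
```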
---
## Example: Complete Ontology Mapping Workflow
**Scenario**: Map Wikidata Q3437789 (heemkamer - local history room)
### Step 1: Research Entity
```bash
# Read Wikidata metadata from hyponyms_curated_full.yaml
grep -A 100 "Q3437789" /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml
```
**Findings**:
- Dutch concept: "Local history room/museum"
- Usually operated by volunteers/heritage societies
- Mix of museum, archive, library functions
- Often in small municipalities
### Step 2: Search Base Ontologies
```bash
# Search CPOV for organizational types
rg "classification|OrganisationType" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl
# Search Schema.org for community organizations
rg "NGO|CivicStructure|LocalBusiness" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl
# Search CIDOC-CRM for community groups
rg "E74_Group|E40_Legal_Body" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```
### Step 3: Apply Decision Trees
**Geographic scope**: Netherlands → Check TOOI
**Legal status**: Usually private foundation (stichting) or association (vereniging)
**Function**: Collects + Preserves + Exhibits local heritage → Multi-functional
**Decision**:
- PRIMARY: `schema:NGO` (non-governmental heritage organization)
- SECONDARY: `crm:E74_Group` (community heritage group)
- DUTCH MIXIN: `DutchLegalEntityMixin` (KvK registration)
### Step 4: Model Aspects
```yaml
heemkamer:
  wikidata_id: Q3437789
  ontology_mapping:
    # CUSTODIAN ASPECT
    custodian_class: schema:NGO
    custodian_secondary: crm:E74_Group
    rationale: >-
      Non-governmental community heritage organization.
      Not public sector (excludes CPOV). Uses Schema.org NGO.
    # PLACE ASPECT (often operates in specific building)
    place_class: schema:CivicStructure
    place_secondary: crm:E27_Site
    # LEGAL FORM ASPECT (Dutch foundation/association)
    legal_class: org:FormalOrganization
    legal_dutch_mixin: DutchLegalEntityMixin
    properties:
      kvk_number: required
      legal_form: "stichting OR vereniging"
    # COLLECTIONS ASPECT (multi-functional)
    collections_classes:
      - rico:RecordSet  # Local archival materials
      - crm:E78_Curated_Holding  # Museum objects
      - bf:Collection  # Local history books
    # PEOPLE ASPECT (volunteers)
    people_class: picom:PersonObservation
    people_roles:
      - picot_roles:curator
      - picot_roles:volunteer_archivist
      - picot_roles:educator
  temporal_model:
    aspects:
      - custodian  # Founding → present/closure
      - place  # Building occupancy (may change)
      - collections  # Accessions over time
      - people  # Volunteer participation periods
```
### Step 5: Document and Review
```yaml
ontology_enrichment:
  complexity_score: 8  # Multi-functional, temporal complexity
  requires_human_review: true
  review_notes: >-
    Heemkamer concept is Dutch-specific with no direct
    international equivalent. Multi-functional nature
    (museum + archive + library) requires careful aspect modeling.
```
---
## Summary: Key Takeaways for Agents
1. **Ontology files are your bible** - Read them first, always
2. **Wikidata is data, not ontology** - Map Q-numbers to formal classes
3. **Everything has multiple aspects** - Place, custodian, legal, collections, people
4. **Time is always a factor** - Model temporal independence
5. **Properties must be justified** - Use ontology properties, document rationale
6. **Complexity is reality** - Don't oversimplify, embrace nuance
7. **Document everything** - Future agents/humans need your reasoning
8. **Ask for help** - Complex cases require human review
**When in doubt**: Read the ontology files, consult AGENTS.md, request human guidance.
---
**End of Ontology Mapping Rules v1.0**

128
AGENTS.md

@@ -2,6 +2,134 @@
This document provides instructions for AI agents (particularly OpenCODE and Claude) to assist with extracting heritage institution data from conversation JSON files and other sources.
---
## 🎯 PROJECT CORE MISSION
**PRIMARY OBJECTIVE**: Create a comprehensive, nuanced ontology that accurately represents the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.
This is NOT a simple data extraction project. This is an **ontology engineering project** that:
- Models heritage entities as multi-aspect temporal entities (place, custodian, legal form, collections, people)
- Integrates multiple base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo)
- Captures organizational change events over time (custody transfers, mergers, transformations)
- Distinguishes between nominal references and formal organizational structures
- Links heritage custodians to people, collections, and locations with independent temporal lifecycles
**If you're looking for simple NER extraction, this is not the right project.**
---
## 🚨 CRITICAL RULES FOR ALL AGENTS
### Rule 1: Ontology Files Are Your Primary Reference
**BEFORE** designing any schema, class, or property:
1. **READ the base ontology files** in `/data/ontology/`
2. **SEARCH for existing classes and properties** that match your needs
3. **DOCUMENT your ontology alignment** with explicit rationale
4. **NEVER invent custom properties** when ontology equivalents exist
**Available Ontologies**:
- `data/ontology/core-public-organisation-ap.ttl` - CPOV (EU public sector)
- `data/ontology/tooiont.ttl` - TOOI (Dutch government)
- `data/ontology/schemaorg.owl` - Schema.org (web semantics, private sector)
- `data/ontology/CIDOC_CRM_v7.1.3.rdf` - CIDOC-CRM (cultural heritage domain)
- `data/ontology/RiC-O_1-1.rdf` - Records in Contexts (archival description)
- `data/ontology/bibframe_vocabulary.rdf` - BIBFRAME (libraries)
- `data/ontology/pico.ttl` - PiCo (person observations, staff roles)
**See** `.opencode/agent/ontology-mapping-rules.md` for complete ontology consultation workflow.
### Rule 2: Wikidata Entities Are NOT Ontology Classes
**Files**:
- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml`
- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml`
**These files contain**:
- ✅ Wikidata entity identifiers (Q-numbers) for heritage institution TYPES
- ✅ Multilingual labels and descriptions
- ✅ Hypernym classifications (upper-level categories)
- ✅ Source data for ontology mapping analysis
**These files DO NOT contain**:
- ❌ Formal ontology class definitions
- ❌ Direct `class_uri` mappings for LinkML
- ❌ Ontology properties or relationships
**REQUIRED WORKFLOW**:
```
hyponyms_curated.yaml (Wikidata Q-numbers)
        ↓
ANALYZE semantic meaning + hypernyms
        ↓
SEARCH base ontologies for matching classes
        ↓
MAP Wikidata entity → Ontology class(es)
        ↓
DOCUMENT rationale + properties
        ↓
CREATE LinkML schema with ontology class_uri
```
**Example - WRONG** ❌:
```yaml
Mansion:
  class_uri: wd:Q1802963  # ← This is an ENTITY, not a CLASS!
```
**Example - CORRECT** ✅:
```yaml
Mansion:
  # Wikidata source: Q1802963
  place_aspect:
    class_uri: crm:E27_Site  # CIDOC-CRM ontology class
  custodian_aspect:
    class_uri: cpov:PublicOrganisation  # If operates as museum
```
### Rule 3: Multi-Aspect Modeling is Mandatory
**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**
**Required Aspects**:
1. **Place Aspect** (physical location/site)
   - Ontology: `crm:E27_Site` + `schema:Place`
   - Temporal: Construction → Demolition/Present
2. **Custodian Aspect** (organization managing heritage)
   - Ontology: `cpov:PublicOrganisation` OR `schema:Organization`
   - Temporal: Founding → Dissolution/Present
3. **Legal Form Aspect** (legal entity registration)
   - Ontology: `org:FormalOrganization` + `tooiont:Overheidsorganisatie` (Dutch)
   - Temporal: Registration → Deregistration/Present
4. **Collections Aspect** (heritage materials)
   - Ontology: `rico:RecordSet` OR `crm:E78_Curated_Holding` OR `bf:Collection`
   - Temporal: Accession → Deaccession (per item)
5. **People Aspect** (staff, curators)
   - Ontology: `picom:PersonObservation` + `crm:E21_Person`
   - Temporal: Employment start → Employment end (per person)
6. **Temporal Events** (organizational changes)
   - Ontology: `crm:E10_Transfer_of_Custody`, `rico:Event`
   - Tracks custody transfers, mergers, relocations, transformations
**Example**: A historic mansion operating as a museum has:
- **Place aspect**: Building constructed 1880, still standing (145 years as of 2025)
- **Custodian aspect**: Foundation established 1994 to operate museum (31 years as of 2025)
- **Legal form**: Dutch stichting registered 1994, KvK #12345678
- **Collections**: Mondrian artworks acquired 1994-2024
- **People**: Current curator employed 2020-present
**Each aspect changes independently over time!**
---
## Project Overview
**Goal**: Extract structured data about worldwide GLAMORCUBESFIXPHDNT (Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Unknown, Botanical gardens/zoos, Educational providers, Societies, Features, Intangible heritage groups, miXed, Personal collections, Holy sites, Digital platforms, NGOs, Taste/smell heritage) institutions from 139+ Claude conversation JSON files and integrate with authoritative CSV datasets.

213
ONTOLOGY_RULES_SUMMARY.md

@@ -0,0 +1,213 @@
# Ontology Mapping Rules - Quick Reference
**Created**: 2025-11-20
**Purpose**: Summary of critical ontology engineering rules for heritage custodian project
---
## Key Changes Made
### 1. Updated AGENTS.md
Added **PROJECT CORE MISSION** section at top emphasizing:
- This is an **ontology engineering project**, not simple data extraction
- Multi-aspect temporal modeling is required
- Multiple base ontologies must be integrated
- Wikidata entities are NOT ontology classes
### 2. Created .opencode/agent/ontology-mapping-rules.md
Comprehensive guide (665 lines) covering:
- Ontology consultation workflows
- Wikidata entity mapping procedures
- Multi-aspect modeling requirements
- Temporal independence documentation
- Property research workflows
- Decision trees for ontology selection
- Quality assurance checklists
---
## Core Principles
### Principle 1: Ontology Files Are Source of Truth
**ALWAYS** read base ontologies before designing:
```bash
# Example: Research CIDOC-CRM for heritage sites
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```
### Principle 2: Wikidata ≠ Ontology
**NEVER** use Wikidata Q-numbers as `class_uri`:
```yaml
# ❌ WRONG: Wikidata entity used directly
class_uri: wd:Q1802963
# ✅ RIGHT: ontology class chosen after mapping Q1802963
class_uri: crm:E27_Site
```
### Principle 3: Multi-Aspect Modeling
**EVERY** heritage entity has multiple aspects:
- **Place** (construction → present)
- **Custodian** (founding → present)
- **Legal form** (registration → present)
- **Collections** (accession → present)
- **People** (employment periods)
- **Events** (custody transfers, mergers)
### Principle 4: Temporal Independence
**Each aspect has its OWN timeline:**
```yaml
# Building exists 1880-present (145 years as of 2025)
place_aspect:
  temporal_extent:
    start_date: "1880-01-01"
    end_date: null
# Museum organization founded 1994-present (31 years as of 2025)
custodian_aspect:
  temporal_extent:
    start_date: "1994-05-12"
    end_date: null
```
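The age of each aspect follows directly from its own `start_date`, independent of the others. The sketch below computes whole elapsed years; `years_elapsed` is a hypothetical helper, and the reference date is assumed to be this summary's creation date (2025-11-20).

```python
from datetime import date
from typing import Optional


def years_elapsed(start: date, end: Optional[date], today: date) -> int:
    """Whole years between `start` and `end`, or `today` if the aspect is ongoing."""
    stop = end or today
    # Subtract one year if the anniversary has not yet occurred in the stop year.
    return stop.year - start.year - ((stop.month, stop.day) < (start.month, start.day))


as_of = date(2025, 11, 20)  # Assumed reference date (creation date of this file)
print(years_elapsed(date(1880, 1, 1), None, as_of))   # 145
print(years_elapsed(date(1994, 5, 12), None, as_of))  # 31
```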
---
## Available Ontologies
| Ontology | File | Use For |
|----------|------|---------|
| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector heritage |
| **TOOI** | `tooiont.ttl` | Dutch government organizations |
| **Schema.org** | `schemaorg.owl` | Web semantics, private sector |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain |
| **RiC-O** | `RiC-O_1-1.rdf` | Archival description |
| **BIBFRAME** | `bibframe_vocabulary.rdf` | Library collections |
| **PiCo** | `pico.ttl` | Person observations, staff roles |
---
## Required Workflow
```
1. Read hyponyms_curated.yaml (Wikidata entities)
2. Analyze hypernym + semantic properties
3. Search base ontologies for matching classes
4. Map Wikidata entity → Ontology class(es)
5. Extract relevant properties from ontologies
6. Document rationale and temporal model
7. Create LinkML schema with class_uri
8. Human review if complexity ≥ 7/10
```
---
## Example: Mansion (Q1802963)
### ❌ Wrong Approach
```yaml
Mansion:
  class_uri: wd:Q1802963  # Wikidata entity used directly
```
### ✅ Correct Approach
```yaml
Mansion:
  wikidata_source: Q1802963
  # PLACE ASPECT
  place_aspect:
    class_uri: crm:E27_Site  # CIDOC-CRM
    secondary_class_uri: schema:LandmarksOrHistoricalBuildings
    temporal_extent:
      start_date: "1880-01-01"  # Construction
  # CUSTODIAN ASPECT (if operates as museum)
  custodian_aspect:
    class_uri: cpov:PublicOrganisation  # If public
    alt_class_uri: schema:Museum  # If private
    temporal_extent:
      start_date: "1994-05-12"  # Foundation established
  # COLLECTIONS ASPECT
  collections_aspect:
    class_uri: crm:E78_Curated_Holding
    temporal_extent:
      start_date: "1994-01-01"  # Accessions begin
```
---
## Decision Tree: Ontology Selection
```
Is it Dutch government?
├─ YES → tooiont:Overheidsorganisatie + cpov:PublicOrganisation
└─ NO → Is it public sector?
        ├─ YES → cpov:PublicOrganisation
        └─ NO → schema:Organization
                 ├─ Museum → schema:Museum
                 ├─ Archive → schema:ArchiveOrganization
                 ├─ Library → schema:Library
                 └─ NGO → schema:NGO

Is it a physical site?
├─ YES → crm:E27_Site + schema:Place
└─ NO → Continue with organizational classes

Does it hold collections?
├─ Archival → rico:RecordSet
├─ Museum → crm:E78_Curated_Holding
└─ Library → bf:Collection

Does it have staff?
└─ YES → picom:PersonObservation + crm:E21_Person
```
---
## Quality Checklist
Before submitting ontology design:
- [ ] Base ontologies consulted (`/data/ontology/` files read)
- [ ] Wikidata entities mapped (not used directly as classes)
- [ ] Multi-aspect modeling applied
- [ ] Temporal independence documented
- [ ] Properties sourced from ontologies
- [ ] Rationale documented
- [ ] Examples provided
- [ ] Complexity score assigned (1-10)
- [ ] Human review requested if complexity ≥ 7
---
## Files Updated
1. **AGENTS.md** - Added PROJECT CORE MISSION and CRITICAL RULES sections near the top (128 lines added)
2. **.opencode/agent/ontology-mapping-rules.md** - NEW comprehensive guide
3. **This file** (ONTOLOGY_RULES_SUMMARY.md) - Quick reference
---
## Next Steps
1. Continue manual ontology mapping for hyponyms_curated.yaml entries
2. Document each mapping with full rationale
3. Build aspect-based LinkML schema modules
4. Create temporal modeling examples for common patterns
---
## Key Resources
- **Full Rules**: `.opencode/agent/ontology-mapping-rules.md`
- **Agent Instructions**: `AGENTS.md`
- **Ontology Files**: `data/ontology/`
- **Wikidata Sources**: `data/wikidata/GLAMORCUBEPSXHFN/`
**Remember**: This is ontology engineering, not data extraction. Precision matters more than speed.