From 176a7479f9c34503a5b39ac33df55c5cf7f4fde4 Mon Sep 17 00:00:00 2001
From: kempersc
Date: Thu, 20 Nov 2025 23:09:02 +0100
Subject: [PATCH] Add comprehensive ontology mapping rules and update project mission
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus
- Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines:
  * Ontology consultation workflows (Rule 1)
  * Wikidata entity mapping procedures (Rule 2)
  * Multi-aspect modeling requirements (Rule 3)
  * Temporal independence documentation (Rule 4)
  * Property research workflows (Rule 5)
  * Decision trees for ontology selection (Rule 6)
  * Documentation requirements (Rule 7)
  * Prohibited practices (Rule 8)
  * Quality assurance checklist (Rule 9)
  * Agent collaboration protocols (Rule 10)
- Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide

Key principles established:
1. Wikidata Q-numbers are NOT ontology classes (must be mapped)
2. Every heritage entity has multiple aspects with independent temporal lifecycles
3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth
4. Custom properties forbidden when ontology equivalents exist

Example: 'Mansion' (Q1802963) requires modeling as:
- Place aspect (crm:E27_Site, construction→present)
- Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present)
- Legal form aspect (org:FormalOrganization, registration→present)
- Collections aspect (crm:E78_Curated_Holding, accession→present)
- People aspect (picom:PersonObservation, employment periods)
- Temporal events (crm:E10_Transfer_of_Custody for custody changes)

All agents MUST read ontology files before schema design.
--- .opencode/agent/ontology-mapping-rules.md | 665 ++++++++++++++++++++++ AGENTS.md | 128 +++++ ONTOLOGY_RULES_SUMMARY.md | 213 +++++++ 3 files changed, 1006 insertions(+) create mode 100644 .opencode/agent/ontology-mapping-rules.md create mode 100644 ONTOLOGY_RULES_SUMMARY.md diff --git a/.opencode/agent/ontology-mapping-rules.md b/.opencode/agent/ontology-mapping-rules.md new file mode 100644 index 0000000000..10e39eb903 --- /dev/null +++ b/.opencode/agent/ontology-mapping-rules.md @@ -0,0 +1,665 @@ +# Ontology Mapping Rules for Heritage Custodian Project + +**Version**: 1.0 +**Last Updated**: 2025-11-20 +**Purpose**: Define rigorous ontological mapping procedures for AI agents working on the GLAM heritage custodian data project + +--- + +## Core Principle: Ontology-First Design + +**CRITICAL**: The primary objective of this project is to create a **comprehensive, nuanced ontology** that can accurately represent the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide. + +### What This Means + +- ✅ **DO**: Study ontology files deeply before creating classes or properties +- ✅ **DO**: Map Wikidata entities to formal ontology classes with explicit rationale +- ✅ **DO**: Model temporal independence of different aspects (place, custodian, legal form, collections, people) +- ✅ **DO**: Support multiple ontology classes for the same entity (CPOV + TOOI + Schema.org + CIDOC-CRM) +- ❌ **DON'T**: Use Wikidata Q-numbers directly as ontology classes +- ❌ **DON'T**: Create generic "HeritageCustodian" mappings without considering semantic aspects +- ❌ **DON'T**: Ignore temporal dimensions (everything changes over time!) 
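The first DON'T above (no Wikidata Q-numbers as ontology classes) lends itself to an automated check. The following is a minimal sketch, not part of the project's tooling: the helper names `is_wikidata_curie` and `find_wikidata_class_uris` are hypothetical, the `wd:` prefix is assumed to be the only spelling in use, and the schema is assumed to be already parsed into nested dictionaries whose class keys end in `class_uri`.

```python
import re
from typing import Any, Iterator

# Matches Wikidata entity CURIEs such as "wd:Q1802963" (assumed prefix: "wd").
WIKIDATA_CURIE = re.compile(r"^wd:Q[1-9]\d*$")


def is_wikidata_curie(value: Any) -> bool:
    """Return True if value looks like a Wikidata Q-number CURIE."""
    return isinstance(value, str) and WIKIDATA_CURIE.match(value) is not None


def find_wikidata_class_uris(schema: dict, path: str = "") -> Iterator[str]:
    """Yield dotted paths of '*class_uri' keys whose value is a Wikidata CURIE."""
    for key, value in schema.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Recurse into nested aspects (place_aspect, custodian_aspect, ...).
            yield from find_wikidata_class_uris(value, here)
        elif key.endswith("class_uri") and is_wikidata_curie(value):
            yield here


# Hypothetical parsed schemas mirroring the BAD and GOOD patterns in this guide.
bad = {"HeritageCustodian": {"class_uri": "wd:Q1802963"}}
good = {
    "Mansion": {
        "wikidata_source": "Q1802963",
        "place_aspect": {"class_uri": "crm:E27_Site"},
    }
}

print(list(find_wikidata_class_uris(bad)))   # ['HeritageCustodian.class_uri']
print(list(find_wikidata_class_uris(good)))  # []
```

Keeping the Q-number in a separate `wikidata_source` field, as the guide does, is what lets a check like this stay simple: provenance is preserved without the identifier ever appearing in a class position.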
+ +--- + +## Rule 1: Ontology Files Are Source of Truth + +**All ontology design MUST reference base ontologies in `/data/ontology/`.** + +### Available Ontologies + +| Ontology | File | Scope | When to Use | +|----------|------|-------|-------------| +| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector | Government archives, state museums, public cultural institutions | +| **TOOI** | `tooiont.ttl` | Dutch government | Netherlands government heritage organizations | +| **Schema.org** | `schemaorg.owl` | Web semantics | Private collections, web discoverability, general fallback | +| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain | Museums, sites, curated holdings, provenance | +| **RiC-O** | `RiC-O_1-1.rdf` | Archival description | Archives, record sets, corporate bodies | +| **BIBFRAME** | `bibframe_vocabulary.rdf` | Bibliographic resources | Libraries, bibliographic collections | +| **PiCo** | `pico.ttl` | Person observations | Staff, curators, archivists, directors | +| **W3C Org** | (embedded in CPOV) | Organizational structure | Legal forms, organizational units | + +### Mandatory Ontology Consultation Workflow + +**Before designing any LinkML class, agents MUST:** + +1. **Identify the semantic domain** (cultural, archival, educational, legal, etc.) +2. **Read relevant ontology files** using `read` or `grep` tools +3. **Extract applicable classes and properties** +4. **Document ontology alignment** in design notes +5. **Map Wikidata hypernyms to ontology classes** (not vice versa!) 
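Steps 2 and 3 of the workflow above (read the ontology files, extract applicable classes) amount to a pattern search over the ontology text. As a rough illustration, assuming plain-text search is sufficient and no RDF parsing is needed, the sketch below does this in Python. The function name `search_ontology_text` is hypothetical, and `SAMPLE` is an invented three-line stand-in, not an excerpt from any real ontology file.

```python
import re
from typing import List, Tuple


def search_ontology_text(text: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs for lines matching the regex."""
    regex = re.compile(pattern)
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


# Invented stand-in content; a real run would read a file under data/ontology/.
SAMPLE = """\
<rdfs:Class rdf:about="E27_Site">
<rdfs:Class rdf:about="E39_Actor">
<rdfs:Class rdf:about="E53_Place">
"""

hits = search_ontology_text(SAMPLE, r"E27_Site|E53_Place")
for line_number, line in hits:
    # Line numbers can be cited in design notes (step 4 of the workflow).
    print(line_number, line)
```

Returning line numbers alongside the matches makes it easy to record where in the ontology file a class was found, which the documentation step asks for.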
+ +**Example Workflow**: + +```bash +# Step 1: Identify domain +# Entity: "mansion" (building + potential heritage custodian) + +# Step 2: Search CIDOC-CRM for site/building classes +rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf + +# Step 3: Search Schema.org for building types +rg "LandmarksOrHistoricalBuildings|TouristAttraction" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl + +# Step 4: Search CPOV for organization classes (if mansion operates as museum) +rg "PublicOrganisation|classification" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl + +# Step 5: Document findings in design notes +# "Mansion should map to crm:E27_Site (place aspect) AND +# cpov:PublicOrganisation (custodian aspect if operates as museum)" +``` + +--- + +## Rule 2: Never Use Wikidata Entities Directly + +**Wikidata Q-numbers are NOT ontology classes. They are ENTITY IDENTIFIERS.** + +### Incorrect Approach ❌ + +```yaml +# BAD - Wikidata Q-number used as class +HeritageCustodian: + class_uri: wd:Q1802963 # ← This is an INSTANCE (mansion), not a CLASS! +``` + +### Correct Approach ✅ + +```yaml +# GOOD - Wikidata entity mapped to formal ontology classes +Mansion: + description: >- + Large residential building, often with heritage significance. 
+    Wikidata reference: Q1802963
+
+  # Place aspect
+  place_class_uri: crm:E27_Site
+  place_secondary_uri: schema:LandmarksOrHistoricalBuildings
+
+  # Custodian aspect (if operates as heritage institution)
+  custodian_class_uri: cpov:PublicOrganisation  # If public
+  custodian_alt_uri: schema:Museum              # If private
+
+  # Collections aspect
+  collections_class_uri: crm:E78_Curated_Holding
+```
+
+### Purpose of the Wikidata Hyponym Files
+
+The files `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml` and `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml` are:
+
+- ✅ **Source data** for identifying heritage entity TYPES
+- ✅ **Analysis input** for understanding domain taxonomy
+- ✅ **Reference** for multilingual labels and descriptions
+- ❌ **NOT** direct ontology class definitions
+
+**Required Mapping Workflow**:
+
+```
+hyponyms_curated.yaml (Wikidata entities)
+         ↓
+ANALYZE semantic properties
+         ↓
+SEARCH base ontologies for appropriate classes
+         ↓
+MAP Wikidata entity to ontology class(es)
+         ↓
+DOCUMENT rationale and properties
+         ↓
+CREATE LinkML schema with ontology class_uri
+```
+
+---
+
+## Rule 3: Multi-Aspect Modeling is Mandatory
+
+**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**
+
+### Required Aspects
+
+All heritage custodian entities MUST model these aspects:
+
+1. **Place Aspect** (physical location/site)
+   - Ontology: CIDOC-CRM (E27_Site, E53_Place) + Schema.org (Place)
+   - Temporal: Construction → Demolition/Present
+   - Properties: Address, coordinates, building type, heritage designation
+
+2. **Custodian Aspect** (organization managing heritage)
+   - Ontology: CPOV (public) OR Schema.org (private) + CIDOC-CRM (E39_Actor)
+   - Temporal: Founding → Dissolution/Present
+   - Properties: Legal identifiers, organizational structure, mission
+
+3. 
**Legal Form Aspect** (legal entity registration) + - Ontology: W3C Org (FormalOrganization) + TOOI (Dutch) + - Temporal: Registration → Deregistration/Present + - Properties: KvK number, legal classification, registered address + +4. **Collections Aspect** (heritage materials preserved) + - Ontology: RiC-O (archival) OR CIDOC-CRM (museum) OR BIBFRAME (library) + - Temporal: Accession → Deaccession (per item/collection) + - Properties: Provenance, extent, access restrictions + +5. **People Aspect** (staff/curators) + - Ontology: PiCo (PersonObservation) + CIDOC-CRM (E21_Person) + - Temporal: Employment start → Employment end (per person) + - Properties: Roles, activities, employment records + +6. **Temporal Events** (organizational changes) + - Ontology: CIDOC-CRM (E10_Transfer_of_Custody, E8_Acquisition) + RiC-O (Event) + - Properties: Custody transfers, mergers, relocations, transformations + +### Example: Modeling a Historic Mansion Operating as Museum + +```yaml +# Entity: Villa Mondriaan (Winterswijk, Netherlands) + +# PLACE ASPECT +villa_mondriaan_place: + aspect_type: place + class_uri: crm:E27_Site + secondary_class_uri: schema:LandmarksOrHistoricalBuildings + temporal_extent: + construction_date: "1880-01-01" + current_status: standing + properties: + address: "Zonnebrink 4, 7101 NP Winterswijk" + coordinates: [51.9711, 6.7197] + heritage_designation: "Rijksmonument" + +# CUSTODIAN ASPECT +stichting_villa_mondriaan: + aspect_type: custodian + class_uri: cpov:PublicOrganisation # Dutch foundation with public benefit + secondary_class_uri: schema:Museum + temporal_extent: + founding_date: "1994-05-12" + current_status: active + properties: + legal_name: "Stichting Villa Mondriaan" + isil_code: "NL-WtVM" + manages: [villa_mondriaan_collections] + +# LEGAL FORM ASPECT +stichting_legal_entity: + aspect_type: legal_form + class_uri: org:FormalOrganization + mixin_class_uri: tooiont:Overheidsorganisatie # Dutch government org + temporal_extent: + 
registration_date: "1994-05-12" + current_status: registered + properties: + kvk_number: "12345678" + legal_form: "stichting" # Dutch foundation + +# COLLECTIONS ASPECT +villa_mondriaan_collections: + aspect_type: collections + class_uri: crm:E78_Curated_Holding + archival_class_uri: rico:RecordSet + temporal_extent: + accession_start: "1994-01-01" + current_status: growing + properties: + provenance: "Mondriaan family" + extent: "500 objects, 200 archival documents" + +# PEOPLE ASPECT +curator_maria_van_der_berg: + aspect_type: person + class_uri: picom:PersonObservation + secondary_class_uri: crm:E21_Person + temporal_extent: + employment_start: "2020-01-01" + current_status: employed + properties: + role: picot_roles:curator + works_for: stichting_villa_mondriaan +``` + +--- + +## Rule 4: Temporal Independence Documentation + +**All aspects have SEPARATE temporal lifecycles. Document this explicitly.** + +### Required Temporal Properties + +Every aspect MUST include: + +```yaml +temporal_extent: + start_date: "YYYY-MM-DD" # When this aspect began + end_date: "YYYY-MM-DD" or null # When aspect ended (null = ongoing) + certainty: "certain" | "approximate" | "inferred" + source: "archival_record" | "legal_registration" | "oral_history" | etc. +``` + +### Example: Temporal Independence in Custody Transfer + +```yaml +# Heineken corporate archive custody transfer (2005) + +# BEFORE TRANSFER (1864-2005) +heineken_corporate_archive: + custodian_aspect: + custodian_id: heineken_nv + class_uri: schema:Corporation + temporal_extent: + start_date: "1864-01-01" # Heineken founded + end_date: "2005-06-15" # Custody transferred + + collections_aspect: + class_uri: rico:RecordSet + provenance: "Heineken N.V." 
+ temporal_extent: + start_date: "1864-01-01" + end_date: null # Collection still exists (just moved) + +# AFTER TRANSFER (2005-present) +heineken_archive_at_stadsarchief: + custodian_aspect: + custodian_id: stadsarchief_amsterdam + class_uri: cpov:PublicOrganisation + temporal_extent: + start_date: "2005-06-15" # Custody received + end_date: null # Ongoing + + collections_aspect: + class_uri: rico:RecordSet + provenance: "Heineken N.V." # ← Provenance unchanged! + temporal_extent: + start_date: "1864-01-01" # ← Collection dates unchanged! + end_date: null + +# CUSTODY TRANSFER EVENT +custody_transfer_event: + event_type: crm:E10_Transfer_of_Custody + class_uri: rico:Event + temporal_extent: + event_date: "2005-06-15" + properties: + surrendered_by: heineken_nv + received_by: stadsarchief_amsterdam + transferred_object: heineken_corporate_archive +``` + +--- + +## Rule 5: Ontology Properties Must Be Researched + +**Never invent custom properties when ontology equivalents exist.** + +### Property Research Workflow + +1. **Identify the relationship** you need to express +2. **Search base ontologies** for existing properties +3. **Use ontology property** with proper namespace +4. 
**Document property source** in comments + +**Example**: + +```yaml +# ❌ WRONG - Custom property invented +institution: + official_name: "Rijksarchief in Noord-Holland" + +# ✅ CORRECT - CPOV ontology property used +institution: + skos:prefLabel: "Rijksarchief in Noord-Holland"@nl + # Source: CPOV uses SKOS for preferred labels +``` + +### Common Property Mappings + +| Need | Ontology Property | Namespace | +|------|-------------------|-----------| +| Preferred name | `skos:prefLabel` | SKOS (used by CPOV) | +| Alternative names | `skos:altLabel` | SKOS | +| Identifiers | `dct:identifier` | Dublin Core Terms | +| Address | `locn:address` | W3C Location Core | +| Coordinates | `schema:geo` | Schema.org | +| Founding date | `schema:foundingDate` OR `tooiont:begindatum` | Schema.org / TOOI | +| Organizational unit | `cpov:hasUnit` OR `org:hasUnit` | CPOV / W3C Org | +| Curated collection | `crm:P147_curated` | CIDOC-CRM | +| Archival holdings | `rico:isOrWasHolderOf` | RiC-O | +| Person role | `picom:hasRole` | PiCo | +| Provenance | `rico:hasProvenance` OR `prov:hadPrimarySource` | RiC-O / PROV-O | + +--- + +## Rule 6: Decision Trees for Ontology Selection + +**Use structured decision trees to select appropriate ontologies.** + +### Decision Tree: Primary Ontology Class + +``` +START: Heritage entity identified + ↓ +Is it a physical place/site? + ├─ YES → PRIMARY: crm:E27_Site + schema:Place + │ Continue to check if also a custodian organization ↓ + │ + └─ NO → Is it an organization? + ├─ YES → Is it public sector? + │ ├─ YES → cpov:PublicOrganisation + │ │ Is it Dutch government? + │ │ ├─ YES → ADD MIXIN: tooiont:Overheidsorganisatie + │ │ └─ NO → CPOV only + │ │ + │ └─ NO → schema:Organization + │ What type? + │ ├─ Museum → schema:Museum + │ ├─ Library → schema:Library + │ ├─ Archive → schema:ArchiveOrganization + │ ├─ Education → schema:EducationalOrganization + │ └─ NGO → schema:NGO + │ + └─ NO → Is it a collection? 
+ ├─ Archival → rico:RecordSet + ├─ Museum → crm:E78_Curated_Holding + ├─ Library → bf:Collection + └─ Mixed → Use multiple classes +``` + +### Decision Tree: Dutch vs. EU vs. Global + +``` +START: Determine geographic/legal scope + ↓ +Country == "Netherlands"? + ├─ YES → Legal status == "public"? + │ ├─ YES → USE: tooiont:Overheidsorganisatie (Dutch government) + │ │ ALSO ADD: cpov:PublicOrganisation (EU compliance) + │ │ + │ └─ NO → USE: schema:Organization (private) + │ ADD: DutchLegalEntityMixin (KvK numbers) + │ + └─ NO → In Europe? + ├─ YES → Legal status == "public"? + │ ├─ YES → USE: cpov:PublicOrganisation + │ └─ NO → USE: schema:Organization + │ + └─ NO → USE: schema:Organization (global) + ADD domain-specific class: + - schema:Museum + - schema:ArchiveOrganization + - schema:Library +``` + +--- + +## Rule 7: Documentation Requirements + +**All ontology mappings MUST be documented with rationale.** + +### Required Documentation Fields + +```yaml +ontology_mapping: + wikidata_source: Q1802963 # Wikidata entity being mapped + wikidata_label: mansion + + primary_class: + uri: crm:E27_Site + namespace: http://www.cidoc-crm.org/cidoc-crm/ + rationale: >- + CIDOC-CRM E27_Site for physical heritage buildings with + archaeological/architectural significance. 
+ ontology_file: data/ontology/CIDOC_CRM_v7.1.3.rdf + ontology_section: "Lines 1234-1267" # Optional + + secondary_class: + uri: schema:LandmarksOrHistoricalBuildings + namespace: http://schema.org/ + rationale: Web discoverability for historic landmarks + ontology_file: data/ontology/schemaorg.owl + + properties: + - uri: crm:P1_is_identified_by + range: crm:E41_Appellation + usage: Building name identification + example: "Buitenplaats Beeckestijn" + + - uri: schema:geo + range: schema:GeoCoordinates + usage: Geographic coordinates + example: "{latitude: 51.9711, longitude: 6.7197}" + + temporal_model: + aspects: + - place # Physical site + - custodian # If operates as heritage institution + - collections # If holds curated materials + + temporal_independence_note: >- + Place existence (construction → present) is independent from + custodian organization lifecycle (founding → present). + + complexity_score: 9 # 1-10 scale + reviewed_by: human_expert + review_date: "2025-11-20" +``` + +--- + +## Rule 8: Prohibited Practices + +**The following practices are STRICTLY FORBIDDEN:** + +### ❌ Prohibited + +1. **Using Wikidata Q-numbers as class URIs** + ```yaml + # FORBIDDEN + class_uri: wd:Q33506 # This is an entity, not a class! + ``` + +2. **Creating custom properties without ontology research** + ```yaml + # FORBIDDEN + slots: + institution_official_name: # Use skos:prefLabel instead! + ``` + +3. **Single-ontology mappings for complex entities** + ```yaml + # FORBIDDEN - Mansion is BOTH place AND potential custodian + Mansion: + class_uri: schema:Place # ← Missing custodian aspect! + ``` + +4. **Ignoring temporal dimensions** + ```yaml + # FORBIDDEN - No temporal tracking + custodian: + name: "Heineken Archive" + location: "Amsterdam" + # ← Where are the dates? Which period does this describe? + ``` + +5. **Binary public/private classifications** + ```yaml + # FORBIDDEN - Too simplistic + PublicHeritageCustodian: # What about NGOs? Foundations? Mixed? 
+ PrivateHeritageCustodian: # What about government corporations? + ``` + +--- + +## Rule 9: Quality Assurance Checklist + +**Before submitting any ontology design, verify:** + +- [ ] All base ontologies consulted (`/data/ontology/` files read) +- [ ] Wikidata entities mapped to formal ontology classes (not used directly) +- [ ] Multi-aspect modeling applied (place, custodian, legal, collections, people) +- [ ] Temporal independence documented for each aspect +- [ ] Properties sourced from ontologies (not custom inventions) +- [ ] Decision trees applied for ontology selection +- [ ] Rationale documented for all class/property choices +- [ ] Examples provided with real-world entities +- [ ] Complexity score assigned (1-10 scale) +- [ ] Human review requested for complexity ≥ 7 + +--- + +## Rule 10: Agent Collaboration Protocol + +**When working with other agents or humans:** + +1. **Always cite ontology files** in design discussions + - "According to CIDOC-CRM (lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf)..." + +2. **Share ontology search commands** for reproducibility + ```bash + rg "E27_Site" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf + ``` + +3. **Document disagreements** with explicit rationale + - "Agent A suggests schema:Museum, but I recommend cpov:PublicOrganisation + because institution is government-operated (see TOOI classification rules)." + +4. 
**Request human review** for: + - Complexity score ≥ 7 + - Conflicting ontology recommendations + - Temporal modeling ambiguities + - Novel aspect combinations + +--- + +## Example: Complete Ontology Mapping Workflow + +**Scenario**: Map Wikidata Q3437789 (heemkamer - local history room) + +### Step 1: Research Entity +```bash +# Read Wikidata metadata from hyponyms_curated_full.yaml +grep -A 100 "Q3437789" /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml +``` + +**Findings**: +- Dutch concept: "Local history room/museum" +- Usually operated by volunteers/heritage societies +- Mix of museum, archive, library functions +- Often in small municipalities + +### Step 2: Search Base Ontologies +```bash +# Search CPOV for organizational types +rg "classification|OrganisationType" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl + +# Search Schema.org for community organizations +rg "NGO|CivicStructure|LocalBusiness" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl + +# Search CIDOC-CRM for community groups +rg "E74_Group|E40_Legal_Body" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf +``` + +### Step 3: Apply Decision Trees + +**Geographic scope**: Netherlands → Check TOOI +**Legal status**: Usually private foundation (stichting) or association (vereniging) +**Function**: Collects + Preserves + Exhibits local heritage → Multi-functional + +**Decision**: +- PRIMARY: `schema:NGO` (non-governmental heritage organization) +- SECONDARY: `crm:E74_Group` (community heritage group) +- DUTCH MIXIN: `DutchLegalEntityMixin` (KvK registration) + +### Step 4: Model Aspects + +```yaml +heemkamer: + wikidata_id: Q3437789 + ontology_mapping: + + # CUSTODIAN ASPECT + custodian_class: schema:NGO + custodian_secondary: crm:E74_Group + rationale: >- + Non-governmental community heritage organization. + Not public sector (excludes CPOV). Uses Schema.org NGO. 
+ + # PLACE ASPECT (often operates in specific building) + place_class: schema:CivicStructure + place_secondary: crm:E27_Site + + # LEGAL FORM ASPECT (Dutch foundation/association) + legal_class: org:FormalOrganization + legal_dutch_mixin: DutchLegalEntityMixin + properties: + kvk_number: required + legal_form: "stichting OR vereniging" + + # COLLECTIONS ASPECT (multi-functional) + collections_classes: + - rico:RecordSet # Local archival materials + - crm:E78_Curated_Holding # Museum objects + - bf:Collection # Local history books + + # PEOPLE ASPECT (volunteers) + people_class: picom:PersonObservation + people_roles: + - picot_roles:curator + - picot_roles:volunteer_archivist + - picot_roles:educator + + temporal_model: + aspects: + - custodian # Founding → present/closure + - place # Building occupancy (may change) + - collections # Accessions over time + - people # Volunteer participation periods +``` + +### Step 5: Document and Review + +```yaml +ontology_enrichment: + complexity_score: 8 # Multi-functional, temporal complexity + requires_human_review: true + review_notes: >- + Heemkamer concept is Dutch-specific with no direct + international equivalent. Multi-functional nature + (museum + archive + library) requires careful aspect modeling. +``` + +--- + +## Summary: Key Takeaways for Agents + +1. **Ontology files are your bible** - Read them first, always +2. **Wikidata is data, not ontology** - Map Q-numbers to formal classes +3. **Everything has multiple aspects** - Place, custodian, legal, collections, people +4. **Time is always a factor** - Model temporal independence +5. **Properties must be justified** - Use ontology properties, document rationale +6. **Complexity is reality** - Don't oversimplify, embrace nuance +7. **Document everything** - Future agents/humans need your reasoning +8. **Ask for help** - Complex cases require human review + +**When in doubt**: Read the ontology files, consult AGENTS.md, request human guidance. 
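Takeaway 4 (temporal independence) can be made concrete with a small data structure. The sketch below is only an illustration under stated assumptions: `TemporalExtent` is a hypothetical class, not something defined by the project's LinkML schema, and it keeps just the `start_date` and `end_date` fields from Rule 4. The dates are the ones used in the Heineken custody-transfer example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalExtent:
    """Lifecycle of a single aspect; end_date=None means ongoing."""

    start_date: date
    end_date: Optional[date] = None

    def __post_init__(self) -> None:
        # Reject extents that end before they begin.
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")

    def is_active_on(self, day: date) -> bool:
        """Return True if the aspect exists on the given day (bounds inclusive)."""
        if day < self.start_date:
            return False
        return self.end_date is None or day <= self.end_date


# Each aspect carries its own extent, so a custody transfer ends one
# custodian's extent without touching the collection's extent.
collection = TemporalExtent(date(1864, 1, 1))
old_custodian = TemporalExtent(date(1864, 1, 1), date(2005, 6, 15))
new_custodian = TemporalExtent(date(2005, 6, 15))

day = date(2010, 1, 1)
print(collection.is_active_on(day))     # True
print(old_custodian.is_active_on(day))  # False
print(new_custodian.is_active_on(day))  # True
```

Modeling the extent per aspect rather than per entity is the design choice that matters here: the collection's timeline stays continuous while custodians come and go.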
+ +--- + +**End of Ontology Mapping Rules v1.0** diff --git a/AGENTS.md b/AGENTS.md index 42af156ff4..8c5b861066 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,6 +2,134 @@ This document provides instructions for AI agents (particularly OpenCODE and Claude) to assist with extracting heritage institution data from conversation JSON files and other sources. +--- + +## 🎯 PROJECT CORE MISSION + +**PRIMARY OBJECTIVE**: Create a comprehensive, nuanced ontology that accurately represents the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide. + +This is NOT a simple data extraction project. This is an **ontology engineering project** that: +- Models heritage entities as multi-aspect temporal entities (place, custodian, legal form, collections, people) +- Integrates multiple base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) +- Captures organizational change events over time (custody transfers, mergers, transformations) +- Distinguishes between nominal references and formal organizational structures +- Links heritage custodians to people, collections, and locations with independent temporal lifecycles + +**If you're looking for simple NER extraction, this is not the right project.** + +--- + +## 🚨 CRITICAL RULES FOR ALL AGENTS + +### Rule 1: Ontology Files Are Your Primary Reference + +**BEFORE** designing any schema, class, or property: + +1. **READ the base ontology files** in `/data/ontology/` +2. **SEARCH for existing classes and properties** that match your needs +3. **DOCUMENT your ontology alignment** with explicit rationale +4. 
**NEVER invent custom properties** when ontology equivalents exist + +**Available Ontologies**: +- `data/ontology/core-public-organisation-ap.ttl` - CPOV (EU public sector) +- `data/ontology/tooiont.ttl` - TOOI (Dutch government) +- `data/ontology/schemaorg.owl` - Schema.org (web semantics, private sector) +- `data/ontology/CIDOC_CRM_v7.1.3.rdf` - CIDOC-CRM (cultural heritage domain) +- `data/ontology/RiC-O_1-1.rdf` - Records in Contexts (archival description) +- `data/ontology/bibframe_vocabulary.rdf` - BIBFRAME (libraries) +- `data/ontology/pico.ttl` - PiCo (person observations, staff roles) + +**See** `.opencode/agent/ontology-mapping-rules.md` for complete ontology consultation workflow. + +### Rule 2: Wikidata Entities Are NOT Ontology Classes + +**Files**: +- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml` +- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml` + +**These files contain**: +- ✅ Wikidata entity identifiers (Q-numbers) for heritage institution TYPES +- ✅ Multilingual labels and descriptions +- ✅ Hypernym classifications (upper-level categories) +- ✅ Source data for ontology mapping analysis + +**These files DO NOT contain**: +- ❌ Formal ontology class definitions +- ❌ Direct `class_uri` mappings for LinkML +- ❌ Ontology properties or relationships + +**REQUIRED WORKFLOW**: +``` +hyponyms_curated.yaml (Wikidata Q-numbers) + ↓ +ANALYZE semantic meaning + hypernyms + ↓ +SEARCH base ontologies for matching classes + ↓ +MAP Wikidata entity → Ontology class(es) + ↓ +DOCUMENT rationale + properties + ↓ +CREATE LinkML schema with ontology class_uri +``` + +**Example - WRONG** ❌: +```yaml +Mansion: + class_uri: wd:Q1802963 # ← This is an ENTITY, not a CLASS! 
+``` + +**Example - CORRECT** ✅: +```yaml +Mansion: + # Wikidata source: Q1802963 + place_aspect: + class_uri: crm:E27_Site # CIDOC-CRM ontology class + custodian_aspect: + class_uri: cpov:PublicOrganisation # If operates as museum +``` + +### Rule 3: Multi-Aspect Modeling is Mandatory + +**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.** + +**Required Aspects**: + +1. **Place Aspect** (physical location/site) + - Ontology: `crm:E27_Site` + `schema:Place` + - Temporal: Construction → Demolition/Present + +2. **Custodian Aspect** (organization managing heritage) + - Ontology: `cpov:PublicOrganisation` OR `schema:Organization` + - Temporal: Founding → Dissolution/Present + +3. **Legal Form Aspect** (legal entity registration) + - Ontology: `org:FormalOrganization` + `tooiont:Overheidsorganisatie` (Dutch) + - Temporal: Registration → Deregistration/Present + +4. **Collections Aspect** (heritage materials) + - Ontology: `rico:RecordSet` OR `crm:E78_Curated_Holding` OR `bf:Collection` + - Temporal: Accession → Deaccession (per item) + +5. **People Aspect** (staff, curators) + - Ontology: `picom:PersonObservation` + `crm:E21_Person` + - Temporal: Employment start → Employment end (per person) + +6. 
**Temporal Events** (organizational changes)
+   - Ontology: `crm:E10_Transfer_of_Custody`, `rico:Event`
+   - Tracks custody transfers, mergers, relocations, transformations
+
+**Example**: A historic mansion operating as a museum has:
+- **Place aspect**: Building constructed 1880, still standing (145 years as of 2025)
+- **Custodian aspect**: Foundation established 1994 to operate museum (31 years as of 2025)
+- **Legal form**: Dutch stichting registered 1994, KvK #12345678
+- **Collections**: Mondrian artworks acquired 1994-2024
+- **People**: Current curator employed 2020-present
+
+**Each aspect changes independently over time!**
+
+---
+
 ## Project Overview
 
 **Goal**: Extract structured data about worldwide GLAMORCUBESFIXPHDNT (Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Unknown, Botanical gardens/zoos, Educational providers, Societies, Features, Intangible heritage groups, miXed, Personal collections, Holy sites, Digital platforms, NGOs, Taste/smell heritage) institutions from 139+ Claude conversation JSON files and integrate with authoritative CSV datasets.
diff --git a/ONTOLOGY_RULES_SUMMARY.md b/ONTOLOGY_RULES_SUMMARY.md
new file mode 100644
index 0000000000..11d80cb4c8
--- /dev/null
+++ b/ONTOLOGY_RULES_SUMMARY.md
@@ -0,0 +1,213 @@
+# Ontology Mapping Rules - Quick Reference
+
+**Created**: 2025-11-20
+**Purpose**: Summary of critical ontology engineering rules for heritage custodian project
+
+---
+
+## Key Changes Made
+
+### 1. Updated AGENTS.md
+Added **PROJECT CORE MISSION** section at top emphasizing:
+- This is an **ontology engineering project**, not simple data extraction
+- Multi-aspect temporal modeling is required
+- Multiple base ontologies must be integrated
+- Wikidata entities are NOT ontology classes
+
+### 2. 
Created .opencode/agent/ontology-mapping-rules.md
+Comprehensive 665-line guide covering:
+- Ontology consultation workflows
+- Wikidata entity mapping procedures
+- Multi-aspect modeling requirements
+- Temporal independence documentation
+- Property research workflows
+- Decision trees for ontology selection
+- Quality assurance checklists
+
+---
+
+## Core Principles
+
+### Principle 1: Ontology Files Are Source of Truth
+**ALWAYS** read base ontologies before designing:
+```bash
+# Example: Research CIDOC-CRM for heritage sites
+rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
+```
+
+### Principle 2: Wikidata ≠ Ontology
+**NEVER** use Wikidata Q-numbers as `class_uri`:
+```yaml
+class_uri: wd:Q1802963   # ❌ WRONG: Wikidata entity used directly
+class_uri: crm:E27_Site  # ✅ RIGHT: after mapping Q1802963 to ontology
+```
+
+### Principle 3: Multi-Aspect Modeling
+**EVERY** heritage entity has multiple aspects:
+- **Place** (construction → present)
+- **Custodian** (founding → present)
+- **Legal form** (registration → present)
+- **Collections** (accession → present)
+- **People** (employment periods)
+- **Events** (custody transfers, mergers)
+
+### Principle 4: Temporal Independence
+**Each aspect has its OWN timeline:**
+```yaml
+# Building exists 1880-present (145 years as of 2025)
+place_aspect:
+  temporal_extent:
+    start_date: "1880-01-01"
+    end_date: null
+
+# Museum organization founded 1994-present (31 years as of 2025)
+custodian_aspect:
+  temporal_extent:
+    start_date: "1994-05-12"
+    end_date: null
+```
+
+---
+
+## Available Ontologies
+
+| Ontology | File | Use For |
+|----------|------|---------|
+| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector heritage |
+| **TOOI** | `tooiont.ttl` | Dutch government organizations |
+| **Schema.org** | `schemaorg.owl` | Web semantics, private sector |
+| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain |
+| **RiC-O** | `RiC-O_1-1.rdf` | Archival description |
+| **BIBFRAME** | 
`bibframe_vocabulary.rdf` | Library collections | +| **PiCo** | `pico.ttl` | Person observations, staff roles | + +--- + +## Required Workflow + +``` +1. Read hyponyms_curated.yaml (Wikidata entities) + ↓ +2. Analyze hypernym + semantic properties + ↓ +3. Search base ontologies for matching classes + ↓ +4. Map Wikidata entity → Ontology class(es) + ↓ +5. Extract relevant properties from ontologies + ↓ +6. Document rationale and temporal model + ↓ +7. Create LinkML schema with class_uri + ↓ +8. Human review if complexity ≥ 7/10 +``` + +--- + +## Example: Mansion (Q1802963) + +### ❌ Wrong Approach +```yaml +Mansion: + class_uri: wd:Q1802963 # Wikidata entity used directly +``` + +### ✅ Correct Approach +```yaml +Mansion: + wikidata_source: Q1802963 + + # PLACE ASPECT + place_aspect: + class_uri: crm:E27_Site # CIDOC-CRM + secondary_class_uri: schema:LandmarksOrHistoricalBuildings + temporal_extent: + start_date: "1880-01-01" # Construction + + # CUSTODIAN ASPECT (if operates as museum) + custodian_aspect: + class_uri: cpov:PublicOrganisation # If public + alt_class_uri: schema:Museum # If private + temporal_extent: + start_date: "1994-05-12" # Foundation established + + # COLLECTIONS ASPECT + collections_aspect: + class_uri: crm:E78_Curated_Holding + temporal_extent: + start_date: "1994-01-01" # Accessions begin +``` + +--- + +## Decision Tree: Ontology Selection + +``` +Is it Dutch government? + ├─ YES → tooiont:Overheidsorganisatie + cpov:PublicOrganisation + └─ NO → Is it public sector? + ├─ YES → cpov:PublicOrganisation + └─ NO → schema:Organization + ├─ Museum → schema:Museum + ├─ Archive → schema:ArchiveOrganization + ├─ Library → schema:Library + └─ NGO → schema:NGO + +Is it a physical site? + ├─ YES → crm:E27_Site + schema:Place + └─ NO → Continue with organizational classes + +Does it hold collections? + ├─ Archival → rico:RecordSet + ├─ Museum → crm:E78_Curated_Holding + └─ Library → bf:Collection + +Does it have staff? 
+  └─ YES → picom:PersonObservation + crm:E21_Person
+```
+
+---
+
+## Quality Checklist
+
+Before submitting ontology design:
+
+- [ ] Base ontologies consulted (`/data/ontology/` files read)
+- [ ] Wikidata entities mapped (not used directly as classes)
+- [ ] Multi-aspect modeling applied
+- [ ] Temporal independence documented
+- [ ] Properties sourced from ontologies
+- [ ] Rationale documented
+- [ ] Examples provided
+- [ ] Complexity score assigned (1-10)
+- [ ] Human review requested if complexity ≥ 7
+
+---
+
+## Files Updated
+
+1. **AGENTS.md** - Added PROJECT CORE MISSION section (128 lines, inserted near the top)
+2. **.opencode/agent/ontology-mapping-rules.md** - NEW comprehensive guide
+3. **This file** (ONTOLOGY_RULES_SUMMARY.md) - Quick reference
+
+---
+
+## Next Steps
+
+1. Continue manual ontology mapping for hyponyms_curated.yaml entries
+2. Document each mapping with full rationale
+3. Build aspect-based LinkML schema modules
+4. Create temporal modeling examples for common patterns
+
+---
+
+## Key Resources
+
+- **Full Rules**: `.opencode/agent/ontology-mapping-rules.md`
+- **Agent Instructions**: `AGENTS.md`
+- **Ontology Files**: `data/ontology/`
+- **Wikidata Sources**: `data/wikidata/GLAMORCUBEPSXHFN/`
+
+**Remember**: This is ontology engineering, not data extraction. Precision matters more than speed.
+