Add comprehensive ontology mapping rules and update project mission
- Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus
- Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines:
  * Ontology consultation workflows (Rule 1)
  * Wikidata entity mapping procedures (Rule 2)
  * Multi-aspect modeling requirements (Rule 3)
  * Temporal independence documentation (Rule 4)
  * Property research workflows (Rule 5)
  * Decision trees for ontology selection (Rule 6-7)
  * Quality assurance checklists (Rule 8-9)
  * Agent collaboration protocols (Rule 10)
- Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide

Key principles established:
1. Wikidata Q-numbers are NOT ontology classes (must be mapped)
2. Every heritage entity has multiple aspects with independent temporal lifecycles
3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth
4. Custom properties forbidden when ontology equivalents exist

Example: 'Mansion' (Q1802963) requires modeling as:
- Place aspect (crm:E27_Site, construction→present)
- Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present)
- Legal form aspect (org:FormalOrganization, registration→present)
- Collections aspect (crm:E78_Curated_Holding, accession→present)
- People aspect (picom:PersonObservation, employment periods)
- Temporal events (crm:E10_Transfer_of_Custody for custody changes)

All agents MUST read ontology files before schema design.
parent
e6684e815b
commit
176a7479f9
3 changed files with 1006 additions and 0 deletions
665
.opencode/agent/ontology-mapping-rules.md
Normal file
@ -0,0 +1,665 @@
# Ontology Mapping Rules for Heritage Custodian Project

**Version**: 1.0
**Last Updated**: 2025-11-20
**Purpose**: Define rigorous ontological mapping procedures for AI agents working on the GLAM heritage custodian data project

---

## Core Principle: Ontology-First Design

**CRITICAL**: The primary objective of this project is to create a **comprehensive, nuanced ontology** that can accurately represent the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.

### What This Means

- ✅ **DO**: Study ontology files deeply before creating classes or properties
- ✅ **DO**: Map Wikidata entities to formal ontology classes with explicit rationale
- ✅ **DO**: Model temporal independence of different aspects (place, custodian, legal form, collections, people)
- ✅ **DO**: Support multiple ontology classes for the same entity (CPOV + TOOI + Schema.org + CIDOC-CRM)
- ❌ **DON'T**: Use Wikidata Q-numbers directly as ontology classes
- ❌ **DON'T**: Create generic "HeritageCustodian" mappings without considering semantic aspects
- ❌ **DON'T**: Ignore temporal dimensions (everything changes over time!)

---

## Rule 1: Ontology Files Are Source of Truth

**All ontology design MUST reference base ontologies in `/data/ontology/`.**

### Available Ontologies

| Ontology | File | Scope | When to Use |
|----------|------|-------|-------------|
| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector | Government archives, state museums, public cultural institutions |
| **TOOI** | `tooiont.ttl` | Dutch government | Netherlands government heritage organizations |
| **Schema.org** | `schemaorg.owl` | Web semantics | Private collections, web discoverability, general fallback |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain | Museums, sites, curated holdings, provenance |
| **RiC-O** | `RiC-O_1-1.rdf` | Archival description | Archives, record sets, corporate bodies |
| **BIBFRAME** | `bibframe_vocabulary.rdf` | Bibliographic resources | Libraries, bibliographic collections |
| **PiCo** | `pico.ttl` | Person observations | Staff, curators, archivists, directors |
| **W3C Org** | (embedded in CPOV) | Organizational structure | Legal forms, organizational units |

### Mandatory Ontology Consultation Workflow

**Before designing any LinkML class, agents MUST:**

1. **Identify the semantic domain** (cultural, archival, educational, legal, etc.)
2. **Read relevant ontology files** using `read` or `grep` tools
3. **Extract applicable classes and properties**
4. **Document ontology alignment** in design notes
5. **Map Wikidata hypernyms to ontology classes** (not vice versa!)

**Example Workflow**:

```bash
# Step 1: Identify domain
# Entity: "mansion" (building + potential heritage custodian)

# Step 2: Search CIDOC-CRM for site/building classes
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf

# Step 3: Search Schema.org for building types
rg "LandmarksOrHistoricalBuildings|TouristAttraction" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Step 4: Search CPOV for organization classes (if mansion operates as museum)
rg "PublicOrganisation|classification" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Step 5: Document findings in design notes
# "Mansion should map to crm:E27_Site (place aspect) AND
#  cpov:PublicOrganisation (custodian aspect if operates as museum)"
```

---

## Rule 2: Never Use Wikidata Entities Directly

**Wikidata Q-numbers are NOT ontology classes. They are ENTITY IDENTIFIERS.**

### Incorrect Approach ❌

```yaml
# BAD - Wikidata Q-number used as class
HeritageCustodian:
  class_uri: wd:Q1802963  # ← This is a Wikidata ENTITY identifier (mansion), not an ontology CLASS!
```

### Correct Approach ✅

```yaml
# GOOD - Wikidata entity mapped to formal ontology classes
Mansion:
  description: >-
    Large residential building, often with heritage significance.
    Wikidata reference: Q1802963

  # Place aspect
  place_class_uri: crm:E27_Site
  place_secondary_uri: schema:LandmarksOrHistoricalBuildings

  # Custodian aspect (if operates as heritage institution)
  custodian_class_uri: cpov:PublicOrganisation  # If public
  custodian_alt_uri: schema:Museum  # If private

  # Collections aspect
  collections_class_uri: crm:E78_Curated_Holding
```

### Purpose of the Wikidata Hyponym Files

The files `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml` and `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml` are:

- ✅ **Source data** for identifying heritage entity TYPES
- ✅ **Analysis input** for understanding domain taxonomy
- ✅ **Reference** for multilingual labels and descriptions
- ❌ **NOT** direct ontology class definitions

**Required Mapping Workflow**:

```
hyponyms_curated.yaml (Wikidata entities)
    ↓
ANALYZE semantic properties
    ↓
SEARCH base ontologies for appropriate classes
    ↓
MAP Wikidata entity to ontology class(es)
    ↓
DOCUMENT rationale and properties
    ↓
CREATE LinkML schema with ontology class_uri
```
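The prohibition on Q-numbers as class URIs lends itself to a mechanical check. The following is a minimal sketch, assuming the schema classes are available as plain dictionaries keyed by class name. The function name `find_wikidata_class_uris` and the convention that every key ending in `class_uri` is inspected are illustrative assumptions, not part of the project tooling.

```python
import re

# Matches Wikidata entity CURIEs (wd:Q123) and full entity IRIs.
WIKIDATA_PATTERN = re.compile(r"^(wd:|https?://www\.wikidata\.org/entity/)Q\d+$")


def find_wikidata_class_uris(classes: dict) -> list:
    """Return (class_name, key, value) for every *class_uri set to a Q-number."""
    violations = []
    for class_name, definition in classes.items():
        for key, value in definition.items():
            # Hypothetical convention: any key ending in "class_uri" is checked.
            if key.endswith("class_uri") and isinstance(value, str):
                if WIKIDATA_PATTERN.match(value.strip()):
                    violations.append((class_name, key, value))
    return violations


# The two examples from this rule, expressed as dictionaries.
schema_classes = {
    "HeritageCustodian": {"class_uri": "wd:Q1802963"},
    "Mansion": {
        "place_class_uri": "crm:E27_Site",
        "custodian_class_uri": "cpov:PublicOrganisation",
    },
}

for violation in find_wikidata_class_uris(schema_classes):
    print("Q-number used as class URI:", violation)
```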

---

## Rule 3: Multi-Aspect Modeling is Mandatory

**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**

### Required Aspects

All heritage custodian entities MUST model these aspects:

1. **Place Aspect** (physical location/site)
   - Ontology: CIDOC-CRM (E27_Site, E53_Place) + Schema.org (Place)
   - Temporal: Construction → Demolition/Present
   - Properties: Address, coordinates, building type, heritage designation

2. **Custodian Aspect** (organization managing heritage)
   - Ontology: CPOV (public) OR Schema.org (private) + CIDOC-CRM (E39_Actor)
   - Temporal: Founding → Dissolution/Present
   - Properties: Legal identifiers, organizational structure, mission

3. **Legal Form Aspect** (legal entity registration)
   - Ontology: W3C Org (FormalOrganization) + TOOI (Dutch)
   - Temporal: Registration → Deregistration/Present
   - Properties: KvK number, legal classification, registered address

4. **Collections Aspect** (heritage materials preserved)
   - Ontology: RiC-O (archival) OR CIDOC-CRM (museum) OR BIBFRAME (library)
   - Temporal: Accession → Deaccession (per item/collection)
   - Properties: Provenance, extent, access restrictions

5. **People Aspect** (staff/curators)
   - Ontology: PiCo (PersonObservation) + CIDOC-CRM (E21_Person)
   - Temporal: Employment start → Employment end (per person)
   - Properties: Roles, activities, employment records

6. **Temporal Events** (organizational changes)
   - Ontology: CIDOC-CRM (E10_Transfer_of_Custody, E8_Acquisition) + RiC-O (Event)
   - Properties: Custody transfers, mergers, relocations, transformations

### Example: Modeling a Historic Mansion Operating as Museum

```yaml
# Entity: Villa Mondriaan (Winterswijk, Netherlands)

# PLACE ASPECT
villa_mondriaan_place:
  aspect_type: place
  class_uri: crm:E27_Site
  secondary_class_uri: schema:LandmarksOrHistoricalBuildings
  temporal_extent:
    construction_date: "1880-01-01"
    current_status: standing
  properties:
    address: "Zonnebrink 4, 7101 NP Winterswijk"
    coordinates: [51.9711, 6.7197]
    heritage_designation: "Rijksmonument"

# CUSTODIAN ASPECT
stichting_villa_mondriaan:
  aspect_type: custodian
  class_uri: cpov:PublicOrganisation  # Dutch foundation with public benefit
  secondary_class_uri: schema:Museum
  temporal_extent:
    founding_date: "1994-05-12"
    current_status: active
  properties:
    legal_name: "Stichting Villa Mondriaan"
    isil_code: "NL-WtVM"
    manages: [villa_mondriaan_collections]

# LEGAL FORM ASPECT
stichting_legal_entity:
  aspect_type: legal_form
  class_uri: org:FormalOrganization
  mixin_class_uri: tooiont:Overheidsorganisatie  # Dutch government org
  temporal_extent:
    registration_date: "1994-05-12"
    current_status: registered
  properties:
    kvk_number: "12345678"
    legal_form: "stichting"  # Dutch foundation

# COLLECTIONS ASPECT
villa_mondriaan_collections:
  aspect_type: collections
  class_uri: crm:E78_Curated_Holding
  archival_class_uri: rico:RecordSet
  temporal_extent:
    accession_start: "1994-01-01"
    current_status: growing
  properties:
    provenance: "Mondriaan family"
    extent: "500 objects, 200 archival documents"

# PEOPLE ASPECT
curator_maria_van_der_berg:
  aspect_type: person
  class_uri: picom:PersonObservation
  secondary_class_uri: crm:E21_Person
  temporal_extent:
    employment_start: "2020-01-01"
    current_status: employed
  properties:
    role: picot_roles:curator
    works_for: stichting_villa_mondriaan
```

---

## Rule 4: Temporal Independence Documentation

**All aspects have SEPARATE temporal lifecycles. Document this explicitly.**

### Required Temporal Properties

Every aspect MUST include:

```yaml
temporal_extent:
  start_date: "YYYY-MM-DD"    # When this aspect began
  end_date: "YYYY-MM-DD"      # When this aspect ended; use null if ongoing
  certainty: certain          # One of: certain, approximate, inferred
  source: archival_record     # e.g. archival_record, legal_registration, oral_history
```
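The required temporal properties can be validated before an aspect is accepted. The following is a minimal sketch, assuming dates are parsed into `datetime.date` values. The `TemporalExtent` dataclass, its method names, and the rule that `end_date` may not precede `start_date` are illustrative assumptions rather than part of the project schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Certainty values taken from the template above.
ALLOWED_CERTAINTY = {"certain", "approximate", "inferred"}


@dataclass
class TemporalExtent:
    start_date: date
    end_date: Optional[date]  # None means the aspect is ongoing
    certainty: str
    source: str

    def __post_init__(self):
        if self.certainty not in ALLOWED_CERTAINTY:
            raise ValueError(f"unknown certainty: {self.certainty!r}")
        if not self.source:
            raise ValueError("source is required")
        # Assumed sanity rule: an aspect cannot end before it began.
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")

    def is_ongoing(self) -> bool:
        return self.end_date is None


# Place aspect of a building constructed around 1880 and still standing.
place = TemporalExtent(date(1880, 1, 1), None, "approximate", "archival_record")
print(place.is_ongoing())
```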

### Example: Temporal Independence in Custody Transfer

```yaml
# Heineken corporate archive custody transfer (2005)

# BEFORE TRANSFER (1864-2005)
heineken_corporate_archive:
  custodian_aspect:
    custodian_id: heineken_nv
    class_uri: schema:Corporation
    temporal_extent:
      start_date: "1864-01-01"  # Heineken founded
      end_date: "2005-06-15"  # Custody transferred

  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."
    temporal_extent:
      start_date: "1864-01-01"
      end_date: null  # Collection still exists (just moved)

# AFTER TRANSFER (2005-present)
heineken_archive_at_stadsarchief:
  custodian_aspect:
    custodian_id: stadsarchief_amsterdam
    class_uri: cpov:PublicOrganisation
    temporal_extent:
      start_date: "2005-06-15"  # Custody received
      end_date: null  # Ongoing

  collections_aspect:
    class_uri: rico:RecordSet
    provenance: "Heineken N.V."  # ← Provenance unchanged!
    temporal_extent:
      start_date: "1864-01-01"  # ← Collection dates unchanged!
      end_date: null

# CUSTODY TRANSFER EVENT
custody_transfer_event:
  event_type: crm:E10_Transfer_of_Custody
  class_uri: rico:Event
  temporal_extent:
    event_date: "2005-06-15"
  properties:
    surrendered_by: heineken_nv
    received_by: stadsarchief_amsterdam
    transferred_object: heineken_corporate_archive
```
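Because custody periods are recorded separately from the collection's own lifecycle, the custodian at any given date can be looked up without touching the collection dates. The following is a minimal sketch using the dates from the example above. The function `custodian_on` and the half-open interval convention (the transfer date belongs to the receiving custodian) are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# (custodian_id, start_date, end_date); end_date None means ongoing.
CUSTODY_PERIODS = [
    ("heineken_nv", date(1864, 1, 1), date(2005, 6, 15)),
    ("stadsarchief_amsterdam", date(2005, 6, 15), None),
]


def custodian_on(day: date, periods=CUSTODY_PERIODS) -> Optional[str]:
    """Return the custodian holding the collection on `day`, or None.

    Periods are treated as half-open [start, end), so the transfer date
    itself is attributed to the receiving custodian (an assumption).
    """
    for custodian_id, start, end in periods:
        if start <= day and (end is None or day < end):
            return custodian_id
    return None


print(custodian_on(date(1990, 1, 1)))
print(custodian_on(date(2005, 6, 15)))
```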

---

## Rule 5: Ontology Properties Must Be Researched

**Never invent custom properties when ontology equivalents exist.**

### Property Research Workflow

1. **Identify the relationship** you need to express
2. **Search base ontologies** for existing properties
3. **Use ontology property** with proper namespace
4. **Document property source** in comments

**Example**:

```yaml
# ❌ WRONG - Custom property invented
institution:
  official_name: "Rijksarchief in Noord-Holland"

# ✅ CORRECT - CPOV ontology property used
institution:
  skos:prefLabel: "Rijksarchief in Noord-Holland"  # Language tag: nl
  # Source: CPOV uses SKOS for preferred labels
```

### Common Property Mappings

| Need | Ontology Property | Namespace |
|------|-------------------|-----------|
| Preferred name | `skos:prefLabel` | SKOS (used by CPOV) |
| Alternative names | `skos:altLabel` | SKOS |
| Identifiers | `dct:identifier` | Dublin Core Terms |
| Address | `locn:address` | W3C Location Core |
| Coordinates | `schema:geo` | Schema.org |
| Founding date | `schema:foundingDate` OR `tooiont:begindatum` | Schema.org / TOOI |
| Organizational unit | `cpov:hasUnit` OR `org:hasUnit` | CPOV / W3C Org |
| Curated collection | `crm:P147_curated` | CIDOC-CRM |
| Archival holdings | `rico:isOrWasHolderOf` | RiC-O |
| Person role | `picom:hasRole` | PiCo |
| Provenance | `rico:hasProvenance` OR `prov:hadPrimarySource` | RiC-O / PROV-O |

---

## Rule 6: Decision Trees for Ontology Selection

**Use structured decision trees to select appropriate ontologies.**

### Decision Tree: Primary Ontology Class

```
START: Heritage entity identified
    ↓
Is it a physical place/site?
    ├─ YES → PRIMARY: crm:E27_Site + schema:Place
    │         Continue to check if also a custodian organization ↓
    │
    └─ NO → Is it an organization?
        ├─ YES → Is it public sector?
        │   ├─ YES → cpov:PublicOrganisation
        │   │         Is it Dutch government?
        │   │         ├─ YES → ADD MIXIN: tooiont:Overheidsorganisatie
        │   │         └─ NO → CPOV only
        │   │
        │   └─ NO → schema:Organization
        │             What type?
        │             ├─ Museum → schema:Museum
        │             ├─ Library → schema:Library
        │             ├─ Archive → schema:ArchiveOrganization
        │             ├─ Education → schema:EducationalOrganization
        │             └─ NGO → schema:NGO
        │
        └─ NO → Is it a collection?
            ├─ Archival → rico:RecordSet
            ├─ Museum → crm:E78_Curated_Holding
            ├─ Library → bf:Collection
            └─ Mixed → Use multiple classes
```
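The primary-class tree above can be expressed as a small function. This is a sketch under stated assumptions: the function name `primary_classes`, its boolean parameters, and the choice to resolve "Mixed" to all three collection classes are illustrative and not defined by the rules themselves.

```python
from typing import List, Optional

# Private-sector organization subtypes from the tree.
ORG_TYPES = {
    "museum": "schema:Museum",
    "library": "schema:Library",
    "archive": "schema:ArchiveOrganization",
    "education": "schema:EducationalOrganization",
    "ngo": "schema:NGO",
}

# Collection classes from the tree; "mixed" listing all three is an assumption.
COLLECTION_TYPES = {
    "archival": ["rico:RecordSet"],
    "museum": ["crm:E78_Curated_Holding"],
    "library": ["bf:Collection"],
    "mixed": ["rico:RecordSet", "crm:E78_Curated_Holding", "bf:Collection"],
}


def primary_classes(
    is_place: bool = False,
    is_organization: bool = False,
    is_public: bool = False,
    is_dutch_government: bool = False,
    org_type: Optional[str] = None,
    collection_type: Optional[str] = None,
) -> List[str]:
    """Walk the primary ontology class decision tree and collect class URIs."""
    classes: List[str] = []
    if is_place:
        classes += ["crm:E27_Site", "schema:Place"]
        # The tree continues: a place may also be a custodian organization.
    if is_organization:
        if is_public:
            classes.append("cpov:PublicOrganisation")
            if is_dutch_government:
                classes.append("tooiont:Overheidsorganisatie")
        else:
            # Fall back to the generic class when no subtype is known.
            classes.append(ORG_TYPES.get(org_type, "schema:Organization"))
    elif not is_place and collection_type is not None:
        classes += COLLECTION_TYPES[collection_type]
    return classes


# A mansion that is both a site and a publicly operated museum.
print(primary_classes(is_place=True, is_organization=True, is_public=True))
```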

### Decision Tree: Dutch vs. EU vs. Global

```
START: Determine geographic/legal scope
    ↓
Country == "Netherlands"?
    ├─ YES → Legal status == "public"?
    │   ├─ YES → USE: tooiont:Overheidsorganisatie (Dutch government)
    │   │         ALSO ADD: cpov:PublicOrganisation (EU compliance)
    │   │
    │   └─ NO → USE: schema:Organization (private)
    │             ADD: DutchLegalEntityMixin (KvK numbers)
    │
    └─ NO → In Europe?
        ├─ YES → Legal status == "public"?
        │   ├─ YES → USE: cpov:PublicOrganisation
        │   └─ NO → USE: schema:Organization
        │
        └─ NO → USE: schema:Organization (global)
                  ADD domain-specific class:
                  - schema:Museum
                  - schema:ArchiveOrganization
                  - schema:Library
```
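The geographic scope tree can be sketched the same way. The function name `scope_classes` and its parameters are hypothetical, and the caller is assumed to already know whether a non-Dutch institution is in Europe. The optional `domain_class` stands for one of the Schema.org classes listed in the last branch.

```python
from typing import List, Optional


def scope_classes(
    country: str,
    is_public: bool,
    in_europe: bool = False,
    domain_class: Optional[str] = None,
) -> List[str]:
    """Select classes and mixins by geographic and legal scope."""
    if country == "Netherlands":
        if is_public:
            # Dutch government class plus CPOV for EU compliance.
            return ["tooiont:Overheidsorganisatie", "cpov:PublicOrganisation"]
        # Private Dutch entity: generic class plus the KvK mixin.
        return ["schema:Organization", "DutchLegalEntityMixin"]
    if in_europe:
        return ["cpov:PublicOrganisation"] if is_public else ["schema:Organization"]
    # Global fallback, optionally refined with a domain-specific class.
    classes = ["schema:Organization"]
    if domain_class is not None:
        classes.append(domain_class)
    return classes


print(scope_classes("Netherlands", is_public=False))
print(scope_classes("Japan", is_public=True, domain_class="schema:Museum"))
```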

---

## Rule 7: Documentation Requirements

**All ontology mappings MUST be documented with rationale.**

### Required Documentation Fields

```yaml
ontology_mapping:
  wikidata_source: Q1802963  # Wikidata entity being mapped
  wikidata_label: mansion

  primary_class:
    uri: crm:E27_Site
    namespace: http://www.cidoc-crm.org/cidoc-crm/
    rationale: >-
      CIDOC-CRM E27_Site for physical heritage buildings with
      archaeological/architectural significance.
    ontology_file: data/ontology/CIDOC_CRM_v7.1.3.rdf
    ontology_section: "Lines 1234-1267"  # Optional

  secondary_class:
    uri: schema:LandmarksOrHistoricalBuildings
    namespace: http://schema.org/
    rationale: Web discoverability for historic landmarks
    ontology_file: data/ontology/schemaorg.owl

  properties:
    - uri: crm:P1_is_identified_by
      range: crm:E41_Appellation
      usage: Building name identification
      example: "Buitenplaats Beeckestijn"

    - uri: schema:geo
      range: schema:GeoCoordinates
      usage: Geographic coordinates
      example: "{latitude: 51.9711, longitude: 6.7197}"

  temporal_model:
    aspects:
      - place  # Physical site
      - custodian  # If operates as heritage institution
      - collections  # If holds curated materials

    temporal_independence_note: >-
      Place existence (construction → present) is independent from
      custodian organization lifecycle (founding → present).

  complexity_score: 9  # 1-10 scale
  reviewed_by: human_expert
  review_date: "2025-11-20"
```
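A mapping record can be checked for missing documentation before review. The following is a minimal sketch, assuming the `ontology_mapping` block has been loaded into a dictionary. The function `missing_fields` is hypothetical, and the selection of which fields count as mandatory is inferred from the example above rather than stated by the rule.

```python
# Assumption: these field lists are inferred from the example record above.
REQUIRED_TOP_LEVEL = (
    "wikidata_source",
    "wikidata_label",
    "primary_class",
    "temporal_model",
    "complexity_score",
)
REQUIRED_CLASS_FIELDS = ("uri", "namespace", "rationale", "ontology_file")


def missing_fields(mapping: dict) -> list:
    """List required documentation fields that are absent or empty."""
    missing = [name for name in REQUIRED_TOP_LEVEL if name not in mapping]
    primary = mapping.get("primary_class")
    if isinstance(primary, dict):
        missing += [
            f"primary_class.{name}"
            for name in REQUIRED_CLASS_FIELDS
            if not primary.get(name)
        ]
    return missing


# An incomplete draft: no temporal model, no rationale, no ontology file.
draft = {
    "wikidata_source": "Q1802963",
    "wikidata_label": "mansion",
    "primary_class": {
        "uri": "crm:E27_Site",
        "namespace": "http://www.cidoc-crm.org/cidoc-crm/",
    },
    "complexity_score": 9,
}
print(missing_fields(draft))
```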

---

## Rule 8: Prohibited Practices

**The following practices are STRICTLY FORBIDDEN:**

### ❌ Prohibited

1. **Using Wikidata Q-numbers as class URIs**
   ```yaml
   # FORBIDDEN
   class_uri: wd:Q33506  # This is an entity, not a class!
   ```

2. **Creating custom properties without ontology research**
   ```yaml
   # FORBIDDEN
   slots:
     institution_official_name:  # Use skos:prefLabel instead!
   ```

3. **Single-ontology mappings for complex entities**
   ```yaml
   # FORBIDDEN - Mansion is BOTH place AND potential custodian
   Mansion:
     class_uri: schema:Place  # ← Missing custodian aspect!
   ```

4. **Ignoring temporal dimensions**
   ```yaml
   # FORBIDDEN - No temporal tracking
   custodian:
     name: "Heineken Archive"
     location: "Amsterdam"
     # ← Where are the dates? Which period does this describe?
   ```

5. **Binary public/private classifications**
   ```yaml
   # FORBIDDEN - Too simplistic
   PublicHeritageCustodian:  # What about NGOs? Foundations? Mixed?
   PrivateHeritageCustodian:  # What about government corporations?
   ```

---

## Rule 9: Quality Assurance Checklist

**Before submitting any ontology design, verify:**

- [ ] All base ontologies consulted (`/data/ontology/` files read)
- [ ] Wikidata entities mapped to formal ontology classes (not used directly)
- [ ] Multi-aspect modeling applied (place, custodian, legal, collections, people)
- [ ] Temporal independence documented for each aspect
- [ ] Properties sourced from ontologies (not custom inventions)
- [ ] Decision trees applied for ontology selection
- [ ] Rationale documented for all class/property choices
- [ ] Examples provided with real-world entities
- [ ] Complexity score assigned (1-10 scale)
- [ ] Human review requested for complexity ≥ 7

---

## Rule 10: Agent Collaboration Protocol

**When working with other agents or humans:**

1. **Always cite ontology files** in design discussions
   - "According to CIDOC-CRM (lines 1234-1267 in CIDOC_CRM_v7.1.3.rdf)..."

2. **Share ontology search commands** for reproducibility
   ```bash
   rg "E27_Site" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
   ```

3. **Document disagreements** with explicit rationale
   - "Agent A suggests schema:Museum, but I recommend cpov:PublicOrganisation
     because institution is government-operated (see TOOI classification rules)."

4. **Request human review** for:
   - Complexity score ≥ 7
   - Conflicting ontology recommendations
   - Temporal modeling ambiguities
   - Novel aspect combinations

---
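The human-review triggers listed under Rule 10 can be combined into a single predicate. This is a sketch only: the function `needs_human_review` and its flag names are hypothetical, and how an agent detects conflicts, ambiguities, or novel combinations is left outside the example.

```python
REVIEW_THRESHOLD = 7  # Complexity score at or above which review is requested


def needs_human_review(
    complexity_score: int,
    conflicting_recommendations: bool = False,
    temporal_ambiguity: bool = False,
    novel_aspect_combination: bool = False,
) -> bool:
    """Return True if any of the review triggers applies."""
    if not 1 <= complexity_score <= 10:
        raise ValueError("complexity_score must be on the 1-10 scale")
    return (
        complexity_score >= REVIEW_THRESHOLD
        or conflicting_recommendations
        or temporal_ambiguity
        or novel_aspect_combination
    )


print(needs_human_review(8))
print(needs_human_review(3, temporal_ambiguity=True))
print(needs_human_review(3))
```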

## Example: Complete Ontology Mapping Workflow

**Scenario**: Map Wikidata Q3437789 (heemkamer - local history room)

### Step 1: Research Entity
```bash
# Read Wikidata metadata from hyponyms_curated_full.yaml
grep -A 100 "Q3437789" /Users/kempersc/apps/glam/data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml
```

**Findings**:
- Dutch concept: "Local history room/museum"
- Usually operated by volunteers/heritage societies
- Mix of museum, archive, library functions
- Often in small municipalities

### Step 2: Search Base Ontologies
```bash
# Search CPOV for organizational types
rg "classification|OrganisationType" /Users/kempersc/apps/glam/data/ontology/core-public-organisation-ap.ttl

# Search Schema.org for community organizations
rg "NGO|CivicStructure|LocalBusiness" /Users/kempersc/apps/glam/data/ontology/schemaorg.owl

# Search CIDOC-CRM for community groups
rg "E74_Group|E40_Legal_Body" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```

### Step 3: Apply Decision Trees

**Geographic scope**: Netherlands → Check TOOI
**Legal status**: Usually private foundation (stichting) or association (vereniging)
**Function**: Collects + Preserves + Exhibits local heritage → Multi-functional

**Decision**:
- PRIMARY: `schema:NGO` (non-governmental heritage organization)
- SECONDARY: `crm:E74_Group` (community heritage group)
- DUTCH MIXIN: `DutchLegalEntityMixin` (KvK registration)

### Step 4: Model Aspects

```yaml
heemkamer:
  wikidata_id: Q3437789
  ontology_mapping:

    # CUSTODIAN ASPECT
    custodian_class: schema:NGO
    custodian_secondary: crm:E74_Group
    rationale: >-
      Non-governmental community heritage organization.
      Not public sector (excludes CPOV). Uses Schema.org NGO.

    # PLACE ASPECT (often operates in specific building)
    place_class: schema:CivicStructure
    place_secondary: crm:E27_Site

    # LEGAL FORM ASPECT (Dutch foundation/association)
    legal_class: org:FormalOrganization
    legal_dutch_mixin: DutchLegalEntityMixin
    properties:
      kvk_number: required
      legal_form: "stichting OR vereniging"

    # COLLECTIONS ASPECT (multi-functional)
    collections_classes:
      - rico:RecordSet  # Local archival materials
      - crm:E78_Curated_Holding  # Museum objects
      - bf:Collection  # Local history books

    # PEOPLE ASPECT (volunteers)
    people_class: picom:PersonObservation
    people_roles:
      - picot_roles:curator
      - picot_roles:volunteer_archivist
      - picot_roles:educator

  temporal_model:
    aspects:
      - custodian  # Founding → present/closure
      - place  # Building occupancy (may change)
      - collections  # Accessions over time
      - people  # Volunteer participation periods
```

### Step 5: Document and Review

```yaml
ontology_enrichment:
  complexity_score: 8  # Multi-functional, temporal complexity
  requires_human_review: true
  review_notes: >-
    Heemkamer concept is Dutch-specific with no direct
    international equivalent. Multi-functional nature
    (museum + archive + library) requires careful aspect modeling.
```

---

## Summary: Key Takeaways for Agents

1. **Ontology files are your bible** - Read them first, always
2. **Wikidata is data, not ontology** - Map Q-numbers to formal classes
3. **Everything has multiple aspects** - Place, custodian, legal, collections, people
4. **Time is always a factor** - Model temporal independence
5. **Properties must be justified** - Use ontology properties, document rationale
6. **Complexity is reality** - Don't oversimplify, embrace nuance
7. **Document everything** - Future agents/humans need your reasoning
8. **Ask for help** - Complex cases require human review

**When in doubt**: Read the ontology files, consult AGENTS.md, request human guidance.

---

**End of Ontology Mapping Rules v1.0**
128
AGENTS.md
@ -2,6 +2,134 @@

This document provides instructions for AI agents (particularly OpenCODE and Claude) to assist with extracting heritage institution data from conversation JSON files and other sources.

---

## 🎯 PROJECT CORE MISSION

**PRIMARY OBJECTIVE**: Create a comprehensive, nuanced ontology that accurately represents the complex, temporal, multi-faceted nature of heritage custodian institutions worldwide.

This is NOT a simple data extraction project. This is an **ontology engineering project** that:
- Models heritage entities as multi-aspect temporal entities (place, custodian, legal form, collections, people)
- Integrates multiple base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo)
- Captures organizational change events over time (custody transfers, mergers, transformations)
- Distinguishes between nominal references and formal organizational structures
- Links heritage custodians to people, collections, and locations with independent temporal lifecycles

**If you're looking for simple NER extraction, this is not the right project.**

---

## 🚨 CRITICAL RULES FOR ALL AGENTS

### Rule 1: Ontology Files Are Your Primary Reference

**BEFORE** designing any schema, class, or property:

1. **READ the base ontology files** in `/data/ontology/`
2. **SEARCH for existing classes and properties** that match your needs
3. **DOCUMENT your ontology alignment** with explicit rationale
4. **NEVER invent custom properties** when ontology equivalents exist

**Available Ontologies**:
- `data/ontology/core-public-organisation-ap.ttl` - CPOV (EU public sector)
- `data/ontology/tooiont.ttl` - TOOI (Dutch government)
- `data/ontology/schemaorg.owl` - Schema.org (web semantics, private sector)
- `data/ontology/CIDOC_CRM_v7.1.3.rdf` - CIDOC-CRM (cultural heritage domain)
- `data/ontology/RiC-O_1-1.rdf` - Records in Contexts (archival description)
- `data/ontology/bibframe_vocabulary.rdf` - BIBFRAME (libraries)
- `data/ontology/pico.ttl` - PiCo (person observations, staff roles)

**See** `.opencode/agent/ontology-mapping-rules.md` for complete ontology consultation workflow.

### Rule 2: Wikidata Entities Are NOT Ontology Classes

**Files**:
- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated.yaml`
- `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml`

**These files contain**:
- ✅ Wikidata entity identifiers (Q-numbers) for heritage institution TYPES
- ✅ Multilingual labels and descriptions
- ✅ Hypernym classifications (upper-level categories)
- ✅ Source data for ontology mapping analysis

**These files DO NOT contain**:
- ❌ Formal ontology class definitions
- ❌ Direct `class_uri` mappings for LinkML
- ❌ Ontology properties or relationships

**REQUIRED WORKFLOW**:
```
hyponyms_curated.yaml (Wikidata Q-numbers)
    ↓
ANALYZE semantic meaning + hypernyms
    ↓
SEARCH base ontologies for matching classes
    ↓
MAP Wikidata entity → Ontology class(es)
    ↓
DOCUMENT rationale + properties
    ↓
CREATE LinkML schema with ontology class_uri
```

**Example - WRONG** ❌:
```yaml
Mansion:
  class_uri: wd:Q1802963  # ← This is an ENTITY, not a CLASS!
```

**Example - CORRECT** ✅:
```yaml
Mansion:
  # Wikidata source: Q1802963
  place_aspect:
    class_uri: crm:E27_Site  # CIDOC-CRM ontology class
  custodian_aspect:
    class_uri: cpov:PublicOrganisation  # If operates as museum
```

### Rule 3: Multi-Aspect Modeling is Mandatory

**Every heritage entity has MULTIPLE ontological aspects with INDEPENDENT temporal lifecycles.**

**Required Aspects**:

1. **Place Aspect** (physical location/site)
   - Ontology: `crm:E27_Site` + `schema:Place`
   - Temporal: Construction → Demolition/Present

2. **Custodian Aspect** (organization managing heritage)
   - Ontology: `cpov:PublicOrganisation` OR `schema:Organization`
   - Temporal: Founding → Dissolution/Present

3. **Legal Form Aspect** (legal entity registration)
   - Ontology: `org:FormalOrganization` + `tooiont:Overheidsorganisatie` (Dutch)
   - Temporal: Registration → Deregistration/Present

4. **Collections Aspect** (heritage materials)
   - Ontology: `rico:RecordSet` OR `crm:E78_Curated_Holding` OR `bf:Collection`
   - Temporal: Accession → Deaccession (per item)

5. **People Aspect** (staff, curators)
   - Ontology: `picom:PersonObservation` + `crm:E21_Person`
   - Temporal: Employment start → Employment end (per person)

6. **Temporal Events** (organizational changes)
   - Ontology: `crm:E10_Transfer_of_Custody`, `rico:Event`
   - Tracks custody transfers, mergers, relocations, transformations

**Example**: A historic mansion operating as a museum has:
|
||||
- **Place aspect**: Building constructed 1880, still standing (143 years)
|
||||
- **Custodian aspect**: Foundation established 1994 to operate museum (30 years)
|
||||
- **Legal form**: Dutch stichting registered 1994, KvK #12345678
|
||||
- **Collections**: Mondrian artworks acquired 1994-2024
|
||||
- **People**: Current curator employed 2020-present
|
||||
|
||||
**Each aspect changes independently over time!**
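
To make the independence of lifecycles concrete, here is a minimal sketch of the mansion example as a data structure. The class names `Aspect` and `HeritageEntity`, the year-level granularity, and the choice of one class per aspect are all assumptions made for illustration; the project's actual LinkML schema may look quite different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Aspect:
    """One ontological aspect of a heritage entity, with its own lifecycle."""
    class_uris: list[str]
    start_year: int
    end_year: Optional[int] = None  # None means the aspect is still ongoing

    def active_in(self, year: int) -> bool:
        return self.start_year <= year and (self.end_year is None or year <= self.end_year)


@dataclass
class HeritageEntity:
    name: str
    aspects: dict[str, Aspect] = field(default_factory=dict)

    def active_aspects(self, year: int) -> list[str]:
        """Names of the aspects that exist in the given year."""
        return [name for name, aspect in self.aspects.items() if aspect.active_in(year)]


# Hypothetical instance built from the mansion example above.
mansion = HeritageEntity("Mansion", {
    "place": Aspect(["crm:E27_Site", "schema:Place"], 1880),
    "custodian": Aspect(["cpov:PublicOrganisation"], 1994),
    "legal_form": Aspect(["org:FormalOrganization"], 1994),
    "collections": Aspect(["crm:E78_Curated_Holding"], 1994),
    "people": Aspect(["picom:PersonObservation", "crm:E21_Person"], 2020),
})

print(mansion.active_aspects(1950))  # ['place']
print(mansion.active_aspects(2000))  # ['place', 'custodian', 'legal_form', 'collections']
```

Querying by year shows why a single start and end date on the entity would be wrong: in 1950 the building exists but no custodian, legal form, or collection does.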

---

## Project Overview

**Goal**: Extract structured data about worldwide GLAMORCUBESFIXPHDNT (Galleries, Libraries, Archives, Museums, Official institutions, Research centers, Corporations, Unknown, Botanical gardens/zoos, Educational providers, Societies, Features, Intangible heritage groups, miXed, Personal collections, Holy sites, Digital platforms, NGOs, Taste/smell heritage) institutions from 139+ Claude conversation JSON files and integrate with authoritative CSV datasets.

213
ONTOLOGY_RULES_SUMMARY.md
Normal file
@@ -0,0 +1,213 @@
# Ontology Mapping Rules - Quick Reference

**Created**: 2025-11-20
**Purpose**: Summary of critical ontology engineering rules for heritage custodian project

---

## Key Changes Made

### 1. Updated AGENTS.md
Added **PROJECT CORE MISSION** section at top emphasizing:
- This is an **ontology engineering project**, not simple data extraction
- Multi-aspect temporal modeling is required
- Multiple base ontologies must be integrated
- Wikidata entities are NOT ontology classes

### 2. Created .opencode/agent/ontology-mapping-rules.md
Comprehensive 30-page guide covering:
- Ontology consultation workflows
- Wikidata entity mapping procedures
- Multi-aspect modeling requirements
- Temporal independence documentation
- Property research workflows
- Decision trees for ontology selection
- Quality assurance checklists

---

## Core Principles

### Principle 1: Ontology Files Are Source of Truth
**ALWAYS** read base ontologies before designing:
```bash
# Example: Research CIDOC-CRM for heritage sites
rg "E27_Site|E53_Place" /Users/kempersc/apps/glam/data/ontology/CIDOC_CRM_v7.1.3.rdf
```

### Principle 2: Wikidata ≠ Ontology
**NEVER** use Wikidata Q-numbers as `class_uri`:
```yaml
# ❌ WRONG: Wikidata entity used directly as a class
# class_uri: wd:Q1802963

# ✅ RIGHT: ontology class, chosen after mapping Q1802963 to an ontology
class_uri: crm:E27_Site
```

### Principle 3: Multi-Aspect Modeling
**EVERY** heritage entity has multiple aspects:
- **Place** (construction → present)
- **Custodian** (founding → present)
- **Legal form** (registration → present)
- **Collections** (accession → present)
- **People** (employment periods)
- **Events** (custody transfers, mergers)

### Principle 4: Temporal Independence
**Each aspect has its OWN timeline:**
```yaml
# Building exists 1880-present (145 years as of 2025)
place_aspect:
  temporal_extent:
    start_date: "1880-01-01"
    end_date: null

# Museum organization founded 1994-present (31 years as of 2025)
custodian_aspect:
  temporal_extent:
    start_date: "1994-05-12"
    end_date: null
```
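
Hard-coded ages in comments go stale, so it can help to derive them from `temporal_extent` instead. The sketch below assumes ISO 8601 date strings and treats a null `end_date` as "ongoing as of a given reference date"; the helper name `years_active` and the reference date are illustrative choices, not project conventions.

```python
from datetime import date
from typing import Optional


def years_active(start_date: str, end_date: Optional[str], as_of: date) -> int:
    """Whole years between start_date and end_date, or as_of when end_date is null."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date) if end_date else as_of
    years = end.year - start.year
    # Subtract one if the anniversary has not yet been reached in the final year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


# The two temporal extents from the example, as plain dictionaries.
aspects = {
    "place_aspect": {"start_date": "1880-01-01", "end_date": None},
    "custodian_aspect": {"start_date": "1994-05-12", "end_date": None},
}

as_of = date(2025, 11, 20)  # assumed reference date for the calculation
for name, extent in aspects.items():
    print(name, years_active(extent["start_date"], extent["end_date"], as_of))
# place_aspect 145
# custodian_aspect 31
```

Because each aspect carries its own extent, the two durations are computed independently and never share a start or end date.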

---

## Available Ontologies

| Ontology | File | Use For |
|----------|------|---------|
| **CPOV** | `core-public-organisation-ap.ttl` | EU public sector heritage |
| **TOOI** | `tooiont.ttl` | Dutch government organizations |
| **Schema.org** | `schemaorg.owl` | Web semantics, private sector |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | Cultural heritage domain |
| **RiC-O** | `RiC-O_1-1.rdf` | Archival description |
| **BIBFRAME** | `bibframe_vocabulary.rdf` | Library collections |
| **PiCo** | `pico.ttl` | Person observations, staff roles |

---

## Required Workflow

```
1. Read hyponyms_curated.yaml (Wikidata entities)
   ↓
2. Analyze hypernym + semantic properties
   ↓
3. Search base ontologies for matching classes
   ↓
4. Map Wikidata entity → Ontology class(es)
   ↓
5. Extract relevant properties from ontologies
   ↓
6. Document rationale and temporal model
   ↓
7. Create LinkML schema with class_uri
   ↓
8. Human review if complexity ≥ 7/10
```
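
The final step of the workflow is a simple gate, which can be sketched as follows. The `MappingRecord` class, its field names, and the status strings are hypothetical; only the threshold of 7 on a 1-10 scale comes from the workflow itself.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 7  # step 8: human review if complexity >= 7/10


@dataclass
class MappingRecord:
    """Hypothetical record of one Wikidata-to-ontology mapping decision."""
    wikidata_id: str
    class_uris: list[str]
    rationale: str
    complexity: int  # 1-10, assigned by the agent doing the mapping

    def status(self) -> str:
        if not 1 <= self.complexity <= 10:
            raise ValueError("complexity must be between 1 and 10")
        # Steps 4 and 6: a mapping without classes or rationale is not finished.
        if not self.class_uris or not self.rationale.strip():
            return "incomplete"
        return "needs_human_review" if self.complexity >= REVIEW_THRESHOLD else "ready"


record = MappingRecord(
    wikidata_id="Q1802963",
    class_uris=["crm:E27_Site", "cpov:PublicOrganisation"],
    rationale="Building is a site; the operating foundation is a separate custodian.",
    complexity=8,
)
print(record.status())  # needs_human_review
```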

---

## Example: Mansion (Q1802963)

### ❌ Wrong Approach
```yaml
Mansion:
  class_uri: wd:Q1802963  # Wikidata entity used directly
```

### ✅ Correct Approach
```yaml
Mansion:
  wikidata_source: Q1802963

  # PLACE ASPECT
  place_aspect:
    class_uri: crm:E27_Site  # CIDOC-CRM
    secondary_class_uri: schema:LandmarksOrHistoricalBuildings
    temporal_extent:
      start_date: "1880-01-01"  # Construction

  # CUSTODIAN ASPECT (if operates as museum)
  custodian_aspect:
    class_uri: cpov:PublicOrganisation  # If public
    alt_class_uri: schema:Museum  # If private
    temporal_extent:
      start_date: "1994-05-12"  # Foundation established

  # COLLECTIONS ASPECT
  collections_aspect:
    class_uri: crm:E78_Curated_Holding
    temporal_extent:
      start_date: "1994-01-01"  # Accessions begin
```

---

## Decision Tree: Ontology Selection

```
Is it Dutch government?
├─ YES → tooiont:Overheidsorganisatie + cpov:PublicOrganisation
└─ NO → Is it public sector?
    ├─ YES → cpov:PublicOrganisation
    └─ NO → schema:Organization
        ├─ Museum → schema:Museum
        ├─ Archive → schema:ArchiveOrganization
        ├─ Library → schema:Library
        └─ NGO → schema:NGO

Is it a physical site?
├─ YES → crm:E27_Site + schema:Place
└─ NO → Continue with organizational classes

Does it hold collections?
├─ Archival → rico:RecordSet
├─ Museum → crm:E78_Curated_Holding
└─ Library → bf:Collection

Does it have staff?
└─ YES → picom:PersonObservation + crm:E21_Person
```
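
The organizational and collections branches of the tree translate directly into lookup logic. This is a sketch only: the function names, parameter names, and the lowercase `kind` keys are assumptions introduced for illustration, and real mappings still need the rationale and ontology research described elsewhere in this guide.

```python
from typing import Optional

# Private-sector subtypes from the tree; keys are illustrative lowercase labels.
PRIVATE_SUBTYPES = {
    "museum": "schema:Museum",
    "archive": "schema:ArchiveOrganization",
    "library": "schema:Library",
    "ngo": "schema:NGO",
}

# Collection classes from the "Does it hold collections?" branch.
COLLECTION_CLASSES = {
    "archival": "rico:RecordSet",
    "museum": "crm:E78_Curated_Holding",
    "library": "bf:Collection",
}


def custodian_classes(is_dutch_government: bool, is_public_sector: bool,
                      kind: Optional[str] = None) -> list[str]:
    """Follow the first branch of the tree and return candidate class CURIEs."""
    if is_dutch_government:
        return ["tooiont:Overheidsorganisatie", "cpov:PublicOrganisation"]
    if is_public_sector:
        return ["cpov:PublicOrganisation"]
    # Private sector: use the specific Schema.org subtype when one is known,
    # otherwise fall back to the generic organization class.
    return [PRIVATE_SUBTYPES.get((kind or "").lower(), "schema:Organization")]


def collection_class(kind: str) -> Optional[str]:
    """Return the collection class for a holding type, or None if unknown."""
    return COLLECTION_CLASSES.get(kind.lower())


print(custodian_classes(True, True))              # Dutch government body
print(custodian_classes(False, False, "Museum"))  # private museum
print(collection_class("archival"))               # rico:RecordSet
```

Returning a list keeps the "+" combinations from the tree intact, since a single entity can carry more than one class.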

---

## Quality Checklist

Before submitting ontology design:

- [ ] Base ontologies consulted (`/data/ontology/` files read)
- [ ] Wikidata entities mapped (not used directly as classes)
- [ ] Multi-aspect modeling applied
- [ ] Temporal independence documented
- [ ] Properties sourced from ontologies
- [ ] Rationale documented
- [ ] Examples provided
- [ ] Complexity score assigned (1-10)
- [ ] Human review requested if complexity ≥ 7

---

## Files Updated

1. **AGENTS.md** - Added PROJECT CORE MISSION section (lines 1-100)
2. **.opencode/agent/ontology-mapping-rules.md** - NEW comprehensive guide
3. **This file** (ONTOLOGY_RULES_SUMMARY.md) - Quick reference

---

## Next Steps

1. Continue manual ontology mapping for hyponyms_curated.yaml entries
2. Document each mapping with full rationale
3. Build aspect-based LinkML schema modules
4. Create temporal modeling examples for common patterns

---

## Key Resources

- **Full Rules**: `.opencode/agent/ontology-mapping-rules.md`
- **Agent Instructions**: `AGENTS.md`
- **Ontology Files**: `data/ontology/`
- **Wikidata Sources**: `data/wikidata/GLAMORCUBEPSXHFN/`

**Remember**: This is ontology engineering, not data extraction. Precision matters more than speed.