glam/ONTOLOGY_RULES_SUMMARY.md

Ontology Mapping Rules - Quick Reference

Created: 2025-11-20
Purpose: Summary of the critical ontology engineering rules for the heritage custodian project


Key Changes Made

1. Updated AGENTS.md

Added a PROJECT CORE MISSION section at the top, emphasizing:

  • This is an ontology engineering project, not simple data extraction
  • Multi-aspect temporal modeling is required
  • Multiple base ontologies must be integrated
  • Wikidata entities are NOT ontology classes

2. Created .opencode/agent/ontology-mapping-rules.md

Comprehensive 30-page guide covering:

  • Ontology consultation workflows
  • Wikidata entity mapping procedures
  • Multi-aspect modeling requirements
  • Temporal independence documentation
  • Property research workflows
  • Decision trees for ontology selection
  • Quality assurance checklists

Core Principles

Principle 1: Ontology Files Are Source of Truth

ALWAYS read base ontologies before designing:

# Example: Research CIDOC-CRM for heritage sites (run from the repository root)
rg "E27_Site|E53_Place" data/ontology/CIDOC_CRM_v7.1.3.rdf
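
Where ripgrep is not available, the same lookup can be scripted. The following is a minimal sketch, assuming the ontology file is readable as plain text; the function names `find_class_mentions` and `search_ontology_file` are hypothetical, and the code does simple substring matching rather than RDF parsing.

```python
from pathlib import Path


def find_class_mentions(ontology_text: str, class_names: list[str]) -> dict[str, list[int]]:
    """Return, for each class name, the 1-based line numbers that mention it."""
    hits: dict[str, list[int]] = {name: [] for name in class_names}
    for line_no, line in enumerate(ontology_text.splitlines(), start=1):
        for name in class_names:
            if name in line:
                hits[name].append(line_no)
    return hits


def search_ontology_file(path: str, class_names: list[str]) -> dict[str, list[int]]:
    """Read an ontology file from disk and search it for the given class names."""
    text = Path(path).read_text(encoding="utf-8")
    return find_class_mentions(text, class_names)
```

For example, `search_ontology_file("data/ontology/CIDOC_CRM_v7.1.3.rdf", ["E27_Site", "E53_Place"])` would list the lines to read before deciding on a class. An empty list for a name is a signal that the class does not exist in that ontology version.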

Principle 2: Wikidata ≠ Ontology

NEVER use Wikidata Q-numbers as class_uri:

❌ WRONG: class_uri: wd:Q1802963
✅ RIGHT: class_uri: crm:E27_Site  # After mapping Q1802963 to ontology
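
This rule is easy to check mechanically. The sketch below assumes the schema has already been loaded into nested Python dictionaries (for example from YAML) and that every key ending in `class_uri` holds a class reference. The helper names and the list of Wikidata prefixes are illustrative assumptions, not part of the project.

```python
# Prefixes that identify a Wikidata entity reference (assumed list).
WIKIDATA_PREFIXES = (
    "wd:",
    "http://www.wikidata.org/entity/",
    "https://www.wikidata.org/entity/",
)


def is_valid_class_uri(class_uri: str) -> bool:
    """True when the value is not a Wikidata entity reference."""
    return not class_uri.startswith(WIKIDATA_PREFIXES)


def find_wikidata_class_uris(schema: dict) -> list[str]:
    """Return dotted paths of every *class_uri key that points at Wikidata."""
    offenders: list[str] = []

    def walk(node: object, path: str) -> None:
        if not isinstance(node, dict):
            return
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).endswith("class_uri") and isinstance(value, str):
                if not is_valid_class_uri(value):
                    offenders.append(child)
            else:
                walk(value, child)

    walk(schema, "")
    return offenders
```

Run against the WRONG example above, `find_wikidata_class_uris({"Mansion": {"class_uri": "wd:Q1802963"}})` reports `Mansion.class_uri`. The Q-number itself stays legitimate as provenance in a non-class field.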

Principle 3: Multi-Aspect Modeling

EVERY heritage entity has multiple aspects:

  • Place (construction → present)
  • Custodian (founding → present)
  • Legal form (registration → present)
  • Collections (accession → present)
  • People (employment periods)
  • Events (custody transfers, mergers)

Principle 4: Temporal Independence

Each aspect has its OWN timeline:

# Building exists 1880-present (145 years as of 2025)
place_aspect:
  temporal_extent:
    start_date: "1880-01-01"
    end_date: null

# Museum organization founded 1994-present (31 years as of 2025)
custodian_aspect:
  temporal_extent:
    start_date: "1994-05-12"
    end_date: null
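
One way to keep the timelines independent in code is to give each aspect its own extent object. This is a minimal sketch: the `TemporalExtent` class and its `years_elapsed` method are hypothetical names, with a `None` end date standing in for the YAML `null` (still ongoing).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalExtent:
    start_date: date
    end_date: Optional[date] = None  # None means the aspect is still ongoing

    def years_elapsed(self, as_of: date) -> int:
        """Whole years from start_date to end_date, or to as_of when ongoing."""
        end = self.end_date or as_of
        years = end.year - self.start_date.year
        # Subtract one year if the anniversary has not been reached yet.
        if (end.month, end.day) < (self.start_date.month, self.start_date.day):
            years -= 1
        return years


# Each aspect carries its own extent; neither is derived from the other.
place_aspect = TemporalExtent(start_date=date(1880, 1, 1))
custodian_aspect = TemporalExtent(start_date=date(1994, 5, 12))
```

Passing the reference date explicitly, for example `place_aspect.years_elapsed(date.today())`, avoids hard-coding ages in comments that go stale as time passes.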

Available Ontologies

| Ontology | File | Use For |
| --- | --- | --- |
| CPOV | core-public-organisation-ap.ttl | EU public sector heritage |
| TOOI | tooiont.ttl | Dutch government organizations |
| Schema.org | schemaorg.owl | Web semantics, private sector |
| CIDOC-CRM | CIDOC_CRM_v7.1.3.rdf | Cultural heritage domain |
| RiC-O | RiC-O_1-1.rdf | Archival description |
| BIBFRAME | bibframe_vocabulary.rdf | Library collections |
| PiCo | pico.ttl | Person observations, staff roles |

Required Workflow

1. Read hyponyms_curated.yaml (Wikidata entities)
       ↓
2. Analyze hypernym + semantic properties
       ↓
3. Search base ontologies for matching classes
       ↓
4. Map Wikidata entity → Ontology class(es)
       ↓
5. Extract relevant properties from ontologies
       ↓
6. Document rationale and temporal model
       ↓
7. Create LinkML schema with class_uri
       ↓
8. Human review if complexity ≥ 7/10
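
The workflow can be tracked per entity with a small record that refuses to produce a schema stub until the earlier steps are filled in. This is only a sketch. `MappingRecord`, its field names, and the shape of the emitted stub are assumptions for illustration; they are not the real layout of hyponyms_curated.yaml or a complete LinkML class definition.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 7  # step 8: human review if complexity >= 7/10


@dataclass
class MappingRecord:
    wikidata_id: str                                       # step 1
    label: str
    hypernym: str                                          # step 2
    class_uris: list[str] = field(default_factory=list)    # steps 3-4
    properties: list[str] = field(default_factory=list)    # step 5
    rationale: str = ""                                    # step 6
    complexity: int = 1

    def to_schema_stub(self) -> dict:
        """Step 7: emit a minimal class stub, rejecting incomplete mappings."""
        if not self.class_uris:
            raise ValueError(f"{self.wikidata_id}: no ontology class mapped yet")
        if not self.rationale.strip():
            raise ValueError(f"{self.wikidata_id}: rationale is missing")
        return {
            self.label: {
                "wikidata_source": self.wikidata_id,
                "class_uri": self.class_uris[0],
                "needs_human_review": self.complexity >= REVIEW_THRESHOLD,
            }
        }
```

Raising an error on a missing class or rationale keeps steps 4 and 6 from being skipped under time pressure, which is the failure mode the workflow is meant to prevent.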

Example: Mansion (Q1802963)

Wrong Approach

Mansion:
  class_uri: wd:Q1802963  # Wikidata entity used directly

Correct Approach

Mansion:
  wikidata_source: Q1802963
  
  # PLACE ASPECT
  place_aspect:
    class_uri: crm:E27_Site  # CIDOC-CRM
    secondary_class_uri: schema:LandmarksOrHistoricalBuildings
    temporal_extent:
      start_date: "1880-01-01"  # Construction
  
  # CUSTODIAN ASPECT (if operates as museum)
  custodian_aspect:
    class_uri: cpov:PublicOrganisation  # If public
    alt_class_uri: schema:Museum  # If private
    temporal_extent:
      start_date: "1994-05-12"  # Foundation established
  
  # COLLECTIONS ASPECT
  collections_aspect:
    class_uri: crm:E78_Curated_Holding
    temporal_extent:
      start_date: "1994-01-01"  # Accessions begin

Decision Tree: Ontology Selection

Is it Dutch government?
  ├─ YES → tooi:Overheidsorganisatie + cpov:PublicOrganisation
  └─ NO → Is it public sector?
           ├─ YES → cpov:PublicOrganisation
           └─ NO → schema:Organization
                    ├─ Museum → schema:Museum
                    ├─ Archive → schema:ArchiveOrganization
                    ├─ Library → schema:Library
                    └─ NGO → schema:NGO

Is it a physical site?
  ├─ YES → crm:E27_Site + schema:Place
  └─ NO → Continue with organizational classes

Does it hold collections?
  ├─ Archival → rico:RecordSet
  ├─ Museum → crm:E78_Curated_Holding
  └─ Library → bf:Collection

Does it have staff?
  └─ YES → pico:PersonObservation + crm:E21_Person
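
The four questions above translate directly into a function. The sketch below is one possible encoding. `EntityProfile` and `select_classes` are hypothetical names, and it assumes that a known private-sector subtype (for example schema:Museum) replaces the generic schema:Organization rather than being listed alongside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityProfile:
    is_dutch_government: bool = False
    is_public_sector: bool = False
    institution_type: Optional[str] = None  # "museum", "archive", "library", "ngo"
    is_physical_site: bool = False
    collection_type: Optional[str] = None   # "archival", "museum", "library"
    has_staff: bool = False


PRIVATE_CLASSES = {
    "museum": "schema:Museum",
    "archive": "schema:ArchiveOrganization",
    "library": "schema:Library",
    "ngo": "schema:NGO",
}
COLLECTION_CLASSES = {
    "archival": "rico:RecordSet",
    "museum": "crm:E78_Curated_Holding",
    "library": "bf:Collection",
}


def select_classes(profile: EntityProfile) -> dict[str, list[str]]:
    """Return candidate class URIs per aspect, following the decision tree."""
    aspects: dict[str, list[str]] = {}
    if profile.is_dutch_government:
        aspects["custodian"] = ["tooi:Overheidsorganisatie", "cpov:PublicOrganisation"]
    elif profile.is_public_sector:
        aspects["custodian"] = ["cpov:PublicOrganisation"]
    else:
        # Fall back to the generic class when no subtype is known.
        specific = PRIVATE_CLASSES.get(profile.institution_type or "")
        aspects["custodian"] = [specific or "schema:Organization"]
    if profile.is_physical_site:
        aspects["place"] = ["crm:E27_Site", "schema:Place"]
    if profile.collection_type in COLLECTION_CLASSES:
        aspects["collections"] = [COLLECTION_CLASSES[profile.collection_type]]
    if profile.has_staff:
        aspects["people"] = ["pico:PersonObservation", "crm:E21_Person"]
    return aspects
```

The result is a list of candidates per aspect, not a final answer: each candidate still has to be confirmed against the ontology files before it is used as a class_uri.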

Quality Checklist

Before submitting ontology design:

  • Base ontologies consulted (data/ontology/ files read)
  • Wikidata entities mapped (not used directly as classes)
  • Multi-aspect modeling applied
  • Temporal independence documented
  • Properties sourced from ontologies
  • Rationale documented
  • Examples provided
  • Complexity score assigned (1-10)
  • Human review requested if complexity ≥ 7
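
The checklist can also be enforced before a design is submitted. The following is a minimal sketch, assuming each item is tracked as a boolean and the complexity score as an integer; `DesignChecklist` and `open_items` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 7

BOOLEAN_ITEMS = (
    "base_ontologies_consulted",
    "wikidata_entities_mapped",
    "multi_aspect_modeling_applied",
    "temporal_independence_documented",
    "properties_sourced_from_ontologies",
    "rationale_documented",
    "examples_provided",
)


@dataclass
class DesignChecklist:
    base_ontologies_consulted: bool = False
    wikidata_entities_mapped: bool = False
    multi_aspect_modeling_applied: bool = False
    temporal_independence_documented: bool = False
    properties_sourced_from_ontologies: bool = False
    rationale_documented: bool = False
    examples_provided: bool = False
    complexity_score: Optional[int] = None  # 1-10
    human_review_requested: bool = False


def open_items(checklist: DesignChecklist) -> list[str]:
    """Return the names of checklist items that still block submission."""
    missing = [name for name in BOOLEAN_ITEMS if not getattr(checklist, name)]
    score = checklist.complexity_score
    if score is None or not 1 <= score <= 10:
        missing.append("complexity_score")
    elif score >= REVIEW_THRESHOLD and not checklist.human_review_requested:
        missing.append("human_review_requested")
    return missing
```

A design is ready to submit when `open_items` returns an empty list. Human review only becomes a blocking item once the complexity score reaches 7.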

Files Updated

  1. AGENTS.md - Added PROJECT CORE MISSION section (lines 1-100)
  2. .opencode/agent/ontology-mapping-rules.md - NEW comprehensive guide
  3. This file (ONTOLOGY_RULES_SUMMARY.md) - Quick reference

Next Steps

  1. Continue manual ontology mapping for hyponyms_curated.yaml entries
  2. Document each mapping with full rationale
  3. Build aspect-based LinkML schema modules
  4. Create temporal modeling examples for common patterns

Key Resources

  • Full Rules: .opencode/agent/ontology-mapping-rules.md
  • Agent Instructions: AGENTS.md
  • Ontology Files: data/ontology/
  • Wikidata Sources: data/wikidata/GLAMORCUBEPSXHFN/

Remember: This is ontology engineering, not data extraction. Precision matters more than speed.