# Ontology Enrichment Plan for hyponyms_curated.yaml

**Date**: 2025-11-21 (Updated)
**Total Entries**: 2,453 Wikidata entities
**Status**: In Progress (5/2,453 complete = 0.20%)

---

## 🎯 Latest Session: DBpedia Integration Complete

**Session Date**: 2025-11-21
**Focus**: DBpedia ontology caching + Q119459808 enrichment
**Status**: ✅ COMPLETE

### Major Achievements

1. **DBpedia Ontology Files Cached** (276 KB total)
   - `data/ontology/dbpedia_wikidata_mappings.ttl` (804 lines)
   - `data/ontology/dbpedia_classes_sample.ttl` (2,514 lines)
   - `data/ontology/dbpedia_heritage_classes.ttl` (219 lines)
   - `data/ontology/dbpedia_glam_mappings_index.md` (usage guide)

2. **Q119459808 (scientific facility) Enriched**
   - Heritage-first framing note added
   - DBpedia mapping: `dbo:ResearchProject` (medium confidence)
   - Related classes documented
   - Coverage gap identified: No direct DBpedia class for research infrastructure

3. **4-Step DBpedia Workflow Established**
   - Step 1: Check direct Wikidata mappings (high confidence)
   - Step 2: Semantic keyword search (medium confidence)
   - Step 3: Review heritage classes (validation)
   - Step 4: Document confidence + gaps

**See**: `SESSION_SUMMARY_20251121_DBPEDIA_INTEGRATION_COMPLETE.md` for full details.

---

## Completed Entries

### 1. Q1802963 - mansion (RETROFITTED with DBpedia)

- **Hypernym**: building
- **Type**: F (Features - physical landmarks)
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Place aspect: `crm:E27_Site`, `schema:LandmarksOrHistoricalBuildings`
  - Custodian aspect: `cpov:PublicOrganisation` (public) OR `schema:Museum` (private)
  - DBpedia: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:HistoricPlace`
  - Complexity: 9/10
  - Properties: 8 properties mapped

### 2. Q3694 - vacation property (FIXED heritage-first framing + DBpedia)

- **Hypernym**: accommodation
- **Type**: F (Features)
- **Ontology Mapping**: ✅ Complete + DBpedia (heritage-first fix)
  - Place aspect: `crm:E27_Site` (heritage site focus)
  - ~~`schema:Accommodation`~~ → Changed to heritage-focused classes
  - DBpedia: `dbo:HistoricPlace`
  - Heritage framing note added
  - Complexity: 8/10

### 3. Q2927789 - buitenplaats (Dutch country estate) (RETROFITTED with DBpedia)

- **Hypernym**: building
- **Type**: F (Features)
- **Country**: Netherlands
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Place aspect: `crm:E27_Site`, `schema:LandmarksOrHistoricalBuildings`
  - DBpedia: `dbo:HistoricBuilding`
  - Dutch heritage context: Rijksmonument status, 17th-19th century estates
  - Complexity: 7/10

### 4. Q2772772 - military museum

- **Hypernym**: museum
- **Type**: M (Museum)
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Custodian aspect: `cpov:PublicOrganisation`, `schema:Museum`
  - Collections: `crm:E78_Curated_Holding` (military artifacts), `rico:RecordSet` (archival records)
  - DBpedia: `dbo:Museum` (high confidence, direct Wikidata equivalent)
  - Complexity: 4/10 (straightforward museum pattern)

### 5. Q119459808 - scientific facility ✨ NEW

- **Hypernym**: organisation
- **Type**: R (Research) + E (Education)
- **Ontology Mapping**: ✅ Complete + DBpedia + Heritage-First
  - Custodian aspect: `schema:ResearchOrganization`, `cpov:PublicOrganisation` (if public)
  - Place aspect: `crm:E27_Site` (conditional on permanent facilities)
  - Collections: `schema:Dataset` (research data), `crm:E78_Curated_Holding` (specimens)
  - DBpedia: `dbo:ResearchProject` (medium confidence, semantic approximation)
  - Heritage framing note: Emphasizes scientific facilities as **heritage custodians** (specimen archives, research data), not generic R&D
  - Coverage gap documented: DBpedia lacks "scientific facility" class
  - Complexity: 7/10 (multi-functional research infrastructure)

---

## Batch Processing Strategy

Given 2,452 entries, we'll process them in batches by hypernym category:

### Priority 1: Core Heritage Custodian Types (1,465 entries)

These are the most critical for the heritage custodian ontology:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| museum | 133 | `cpov:PublicOrganisation` + `schema:Museum` + `crm:E39_Actor` | TODO |
| archive | 117 | `cpov:PublicOrganisation` + `rico:CorporateBody` + `rico:RecordSet` | TODO |
| library | 29 | `cpov:PublicOrganisation` + `schema:Library` + `bf:Collection` | TODO |
| art institution | 77 | `cpov:PublicOrganisation` + `schema:ArtGallery` + `crm:E78_Curated_Holding` | TODO |
| cultural institution | 22 | `cpov:PublicOrganisation` + `schema:Organization` | TODO |
| heritage site | 151 | `crm:E27_Site` + `schema:LandmarksOrHistoricalBuildings` | TODO |
| organisation | 193 | `cpov:PublicOrganisation` OR `schema:Organization` (requires classification) | TODO |
| company | 189 | `schema:Corporation` + `crm:E40_Legal_Body` | TODO |
| university | 66 | `schema:EducationalOrganization` + `schema:CollegeOrUniversity` | TODO |
| higher education institution | 42 | `schema:EducationalOrganization` | TODO |
| school | 39 | `schema:EducationalOrganization` | TODO |
| research center | (in organisation) | `schema:ResearchOrganization` + `cpov:PublicOrganisation` | TODO |

**Subtotal**: ~1,058 entries (43% of total)

### Priority 2: Physical Sites and Places (1,183 entries)

Environmental and landscape heritage:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| protected area | 875 | `schema:Place` + `crm:E27_Site` | TODO |
| national park | 74 | `schema:Park` + environmental heritage mixins | TODO |
| natural monument | 70 | `schema:LandmarksOrHistoricalBuildings` | TODO |
| building | 35 | `crm:E27_Site` + `schema:Place` | ✅ 1/35 |
| park | 21 | `schema:Park` | TODO |
| zoo | 17 | `schema:Zoo` + `crm:E39_Actor` | TODO |

**Subtotal**: ~1,092 entries (45% of total)

### Priority 3: Specialized Categories (302 entries)

Collections, groups, and specialized types:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| group | 28 | `crm:E74_Group` + `schema:Organization` | TODO |
| collection | 16 | `rico:RecordSet` OR `crm:E78_Curated_Holding` OR `bf:Collection` | TODO |
| data repository | 19 | `schema:DataCatalog` + digital platform mixins | TODO |
| historical society | (in organisation) | `schema:NGO` + `crm:E74_Group` | TODO |

**Subtotal**: ~63 entries (3% of total)

### Priority 4: Settlement and Administrative Units (139 entries)

Geographic and political entities (low priority for heritage custodian ontology):

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| settlement | varies | `schema:Place` | TODO |
| province | varies | `schema:AdministrativeArea` | TODO |
| polity | varies | `schema:GovernmentOrganization` | TODO |

**Subtotal**: ~139 entries (6% of total)

---

## Enrichment Workflow

For each entry, add the following YAML structure:

```yaml
- label: Q1234567
  hypernym:
    - museum
  type:
    - M
  ontology_mapping:
    wikidata_source: Q1234567
    enrichment_date: '2025-11-20T...'
    enriched_by: manual_ontology_mapper
    complexity_score: 7  # 1-10 scale
    complexity_note: "Explanation of why this entity is complex to model"
    semantic_aspects:
      - custodian_reference
      - place_reference
      - collections_reference
    custodian_ontology:
      primary_class: cpov:PublicOrganisation
      namespace: http://data.europa.eu/m8g/
      secondary_class: schema:Museum
      rdfs_comment: "Description of when to use this class"
      properties:
        - dct:identifier (ISIL code, Wikidata)
        - cpov:hasUnit (organizational structure)
    place_ontology:  # If applicable
      primary_class: crm:E27_Site
      properties:
        - schema:geo (coordinates)
    collections_ontology:  # If applicable
      primary_class: crm:E78_Curated_Holding
      properties:
        - crm:P147i_was_curated_by (custodian)
    temporal_model:
      custodian_aspect: "Founding → Present/Closure"
      collections_aspect: "Accession dates (per object)"
```

---

## Next Steps

### Automated Batch Processing

Create script to process entries in batches:

1. **Batch 1: Museums (133 entries)**
   - Pattern: `cpov:PublicOrganisation` + `schema:Museum` + `crm:E39_Actor`
   - Collections: `crm:E78_Curated_Holding`
   - People: `pico:PersonObservation`

2. **Batch 2: Archives (117 entries)**
   - Pattern: `cpov:PublicOrganisation` + `rico:CorporateBody`
   - Collections: `rico:RecordSet`

3. **Batch 3: Libraries (29 entries)**
   - Pattern: `cpov:PublicOrganisation` + `schema:Library`
   - Collections: `bf:Collection`

4. **Batch 4: Buildings (35 entries)**
   - Pattern: `crm:E27_Site` + `schema:Place`
   - Dual aspect: place + potential custodian

### Manual Review Required

- Entries with hypernym "organisation" (193 entries) - need public/private classification
- Entries with multiple hypernyms - need multi-aspect modeling
- Entries with complexity score ≥ 7 - require human review

---

## Progress Tracking

- [x] Entry 1/2,452: Q1802963 (mansion) ✅
- [ ] Batch 1: Museums (0/133)
- [ ] Batch 2: Archives (0/117)
- [ ] Batch 3: Libraries (0/29)
- [ ] Batch 4: Buildings (1/35)
- [ ] Remaining: (1/2,138)

**Total Progress**: 0.04% (1/2,452 entries)

---

## Automation vs. Manual Work

### Can Be Automated (70% of entries)

- Single hypernym with clear ontology mapping
- Standard patterns (museum, archive, library)
- Protected areas and natural monuments

### Requires Manual Review (30% of entries)

- Multiple hypernyms (multi-aspect entities)
- Generic "organisation" classification
- Complex historical societies (heemkamer, etc.)
- Ambiguous building types

---

## Estimated Effort

- **Automated enrichment**: 2-3 hours processing time
- **Manual review**: 20-30 hours for complex entries
- **Quality assurance**: 5-10 hours spot-checking

**Total**: 27-43 hours of work

---

## Resources

- **Ontology files**: `/data/ontology/`
- **Full Wikidata metadata**: `hyponyms_curated_full.yaml`
- **Enrichment target**: `hyponyms_curated.yaml`
- **Rules reference**: `.opencode/agent/ontology-mapping-rules.md`

## DBpedia Ontology Integration Discovered - 2025-11-20 23:56:32

**Major Discovery**: DBpedia Ontology provides pre-existing Wikidata → formal ontology mappings for heritage institutions.

### Key Findings:

1. **DBpedia has GLAM classes**:
   - dbo:Museum ←→ wd:Q33506 ←→ schema:Museum
   - dbo:Library ←→ wd:Q7075 ←→ schema:Library
   - dbo:Archive ←→ wd:Q166118

2. **DBpedia provides heritage-specific properties**:
   - dbo:collection (museum collections)
   - dbo:curator (curator name)
   - dbo:museumType (specialization)
   - dbo:isil (ISIL codes for libraries)
   - dbo:numberOfCollectionItems

3. **Integration benefits**:
   - Pre-mapped Wikidata entities save manual mapping work
   - Standardized properties avoid custom property invention
   - OWL reasoning support for ontology inference
   - Validates existing Schema.org mappings

### Documentation Created:

- `docs/DBPEDIA_ONTOLOGY_INTEGRATION.md` (12,500+ words)
  - DBpedia ontology overview
  - Heritage class mappings (Museum, Library, Archive)
  - Integration workflow (4 steps)
  - SPARQL queries for discovery
  - Implementation recommendations
  - Example enriched YAML with DBpedia references

### Next Actions:

1. Update `.opencode/agent/ontology-mapping-rules.md` with DBpedia step
2. Create DBpedia → Wikidata mapping cache script
3. Retrofit existing mappings (Q1802963, Q3694, Q2927789) with DBpedia
4. Continue Q119459808 enrichment with DBpedia integration

---

## Heritage-First Framing Principle Added - 2025-11-20 23:55

**Critical Policy Update**: Added Heritage-First Framing Principle to ontology mapping rules.

### Problem Identified

Initial Q3694 (vacation property) mapping used generic real estate classes:

- ❌ PRIMARY: `schema:Accommodation` (too generic)
- ❌ RATIONALE: "Most vacation properties are commercial rentals"

This violated project mission: we model **heritage custodians**, not generic real estate.

### Solution: Heritage-First Framing Principle

**New Rule**: All entities in GLAMORCUBESFIXPHDNT taxonomy are evaluated through **heritage significance lens**.

**Key Points**:

1. ✅ **ALWAYS assume heritage significance** - entities in our taxonomy have heritage value
2. ✅ **ALWAYS use heritage-focused classes** - `crm:E27_Site`, not `schema:Accommodation`
3. ✅ **ALWAYS model place aspect for sites** - physical entities are heritage sites
4. ❌ **NEVER use generic classes** - `schema:Residence`, `schema:Accommodation` too generic
5. ❌ **NEVER require "proof"** - if in Wikidata extraction, has heritage potential

### Documentation Updated

**File**: `.opencode/agent/ontology-mapping-rules.md`

Added section: "Heritage-First Framing Principle" (60 lines)

- Heritage Significance Default
- Examples (vacation properties, mansions, buitenplaatsen)
- Ontology Selection Decision Tree for Physical Sites
- Rationale (5 key points)

### Entries Retrofitted

**Q3694 (vacation property)** - Fixed heritage framing:

- ✅ BEFORE: `schema:Accommodation` (generic)
- ✅ AFTER: `crm:E27_Site` (heritage site)
- ✅ Added: `heritage_framing_note` explaining Heritage-First Principle
- ✅ Updated: `ontology_rationale` with heritage-focused reasoning
- ✅ Added: DBpedia mapping (`dbo:HistoricPlace`)

**Q1802963 (mansion)** - Added DBpedia:

- ✅ Added: `dbpedia_mapping` section
- ✅ Classes: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:HistoricPlace`

**Q2927789 (buitenplaats)** - Added DBpedia:

- ✅ Added: `dbpedia_mapping` section
- ✅ Classes: `dbo:HistoricBuilding` (Dutch heritage estates)

### Impact

**All future ontology mappings** will:

1. Default to heritage-focused classes (`crm:E27_Site`, not `schema:Place`)
2. Use CIDOC-CRM as PRIMARY for cultural heritage sites
3. Reject generic real estate classes
4. Reference Heritage-First Framing Principle in rationale

---
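
The Heritage-First rules for physical sites (CIDOC-CRM as primary, generic real estate classes rejected, principle cited in the rationale) could be checked mechanically before an entry is written back. The following is a minimal sketch, not the project's actual tooling: `PlaceMapping`, `apply_heritage_first`, and the constant names are hypothetical, and the generic-class list contains only the two classes named in this plan.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: only the two generic classes named in this plan are rejected.
GENERIC_CLASSES = frozenset({"schema:Accommodation", "schema:Residence"})
# Heritage-focused default for physical sites, per the Impact rules.
HERITAGE_SITE_DEFAULT = "crm:E27_Site"


@dataclass
class PlaceMapping:
    """Hypothetical in-memory form of an entry's place aspect."""
    primary_class: str
    secondary_classes: List[str] = field(default_factory=list)
    heritage_framing_note: str = ""


def apply_heritage_first(mapping: PlaceMapping) -> PlaceMapping:
    """Return a copy with crm:E27_Site as primary and generic classes removed."""
    candidates = [mapping.primary_class, *mapping.secondary_classes]
    rejected = [c for c in candidates if c in GENERIC_CLASSES]

    # Keep every non-generic class as a secondary class, without duplicates.
    kept: List[str] = []
    for cls in candidates:
        if cls in GENERIC_CLASSES or cls == HERITAGE_SITE_DEFAULT:
            continue
        if cls not in kept:
            kept.append(cls)

    note = "Heritage-First Framing Principle applied"
    if rejected:
        note += "; rejected generic classes: " + ", ".join(rejected)
    return PlaceMapping(HERITAGE_SITE_DEFAULT, kept, note)


if __name__ == "__main__":
    # Mirrors the Q3694 fix: a generic primary class is replaced.
    before = PlaceMapping(
        "schema:Accommodation",
        ["schema:LandmarksOrHistoricalBuildings"],
    )
    print(apply_heritage_first(before))
```

A non-generic primary such as `schema:Place` is demoted to a secondary class rather than dropped, which is one possible reading of rule 1; the decision tree in `.opencode/agent/ontology-mapping-rules.md` remains the authority on which classes to keep.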