Ontology Enrichment Plan for hyponyms_curated.yaml
Date: 2025-11-21 (Updated)
Total Entries: 2,453 Wikidata entities
Status: In Progress (5/2,453 complete = 0.20%)
🎯 Latest Session: DBpedia Integration Complete
Session Date: 2025-11-21
Focus: DBpedia ontology caching + Q119459808 enrichment
Status: ✅ COMPLETE
Major Achievements
- DBpedia Ontology Files Cached (276 KB total)
  - data/ontology/dbpedia_wikidata_mappings.ttl (804 lines)
  - data/ontology/dbpedia_classes_sample.ttl (2,514 lines)
  - data/ontology/dbpedia_heritage_classes.ttl (219 lines)
  - data/ontology/dbpedia_glam_mappings_index.md (usage guide)
- Q119459808 (scientific facility) Enriched
  - Heritage-first framing note added
  - DBpedia mapping: dbo:ResearchProject (medium confidence)
  - Related classes documented
  - Coverage gap identified: No direct DBpedia class for research infrastructure
- 4-Step DBpedia Workflow Established
  - Step 1: Check direct Wikidata mappings (high confidence)
  - Step 2: Semantic keyword search (medium confidence)
  - Step 3: Review heritage classes (validation)
  - Step 4: Document confidence + gaps
See: SESSION_SUMMARY_20251121_DBPEDIA_INTEGRATION_COMPLETE.md for full details.
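
The 4-step DBpedia workflow listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function `map_to_dbpedia`, the `DbpediaMapping` record, and the `KEYWORD_HINTS` table are hypothetical names introduced here. The direct mappings and heritage classes are the ones named elsewhere in this plan; the keyword table is an assumed stand-in for the semantic keyword search.

```python
from dataclasses import dataclass
from typing import Optional

# Step 1 table: direct Wikidata -> DBpedia equivalents named in this plan.
DIRECT_MAPPINGS = {
    "Q33506": "dbo:Museum",
    "Q7075": "dbo:Library",
    "Q166118": "dbo:Archive",
}

# Step 2 table: ASSUMED keyword hints, for illustration only.
KEYWORD_HINTS = {
    "scientific": "dbo:ResearchProject",
    "estate": "dbo:HistoricBuilding",
}

# Step 3 table: heritage-oriented DBpedia classes mentioned in this plan.
HERITAGE_CLASSES = {
    "dbo:Museum", "dbo:Library", "dbo:Archive",
    "dbo:Building", "dbo:HistoricBuilding", "dbo:HistoricPlace",
}


@dataclass
class DbpediaMapping:
    dbpedia_class: Optional[str]
    confidence: str               # "high", "medium", or "none"
    heritage_validated: bool
    coverage_gap: Optional[str]


def map_to_dbpedia(qid: str, label: str) -> DbpediaMapping:
    # Step 1: direct Wikidata mapping (high confidence).
    cls = DIRECT_MAPPINGS.get(qid)
    confidence = "high" if cls else "none"

    # Step 2: keyword search on the label (medium confidence).
    if cls is None:
        for keyword, candidate in KEYWORD_HINTS.items():
            if keyword in label.lower():
                cls, confidence = candidate, "medium"
                break

    # Step 3: validate against the known heritage classes.
    validated = cls in HERITAGE_CLASSES

    # Step 4: record the confidence level and any coverage gap.
    gap = None
    if cls is None:
        gap = f"No DBpedia class found for '{label}'"
    elif confidence == "medium":
        gap = f"No direct DBpedia class for '{label}'; semantic approximation used"
    return DbpediaMapping(cls, confidence, validated, gap)
```

Called with `("Q119459808", "scientific facility")`, this sketch falls through to Step 2 and records a coverage gap, which mirrors how that entry is described in this plan.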
Completed Entries
1. Q1802963 - mansion (RETROFITTED with DBpedia)
- Hypernym: building
- Type: F (Features - physical landmarks)
- Ontology Mapping: ✅ Complete + DBpedia
  - Place aspect: crm:E27_Site, schema:LandmarksOrHistoricalBuildings
  - Custodian aspect: cpov:PublicOrganisation (public) OR schema:Museum (private)
  - DBpedia: dbo:Building, dbo:HistoricBuilding, dbo:HistoricPlace
- Complexity: 9/10
- Properties: 8 properties mapped
2. Q3694 - vacation property (FIXED heritage-first framing + DBpedia)
- Hypernym: accommodation
- Type: F (Features)
- Ontology Mapping: ✅ Complete + DBpedia (heritage-first fix)
  - Place aspect: crm:E27_Site (heritage site focus)
  - Previously schema:Accommodation → changed to heritage-focused classes
  - DBpedia: dbo:HistoricPlace
  - Heritage framing note added
- Complexity: 8/10
3. Q2927789 - buitenplaats (Dutch country estate) (RETROFITTED with DBpedia)
- Hypernym: building
- Type: F (Features)
- Country: Netherlands
- Ontology Mapping: ✅ Complete + DBpedia
  - Place aspect: crm:E27_Site, schema:LandmarksOrHistoricalBuildings
  - DBpedia: dbo:HistoricBuilding
  - Dutch heritage context: Rijksmonument status, 17th-19th century estates
- Complexity: 7/10
4. Q2772772 - military museum
- Hypernym: museum
- Type: M (Museum)
- Ontology Mapping: ✅ Complete + DBpedia
  - Custodian aspect: cpov:PublicOrganisation, schema:Museum
  - Collections: crm:E78_Curated_Holding (military artifacts), rico:RecordSet (archival records)
  - DBpedia: dbo:Museum (high confidence, direct Wikidata equivalent)
- Complexity: 4/10 (straightforward museum pattern)
5. Q119459808 - scientific facility ✨ NEW
- Hypernym: organisation
- Type: R (Research) + E (Education)
- Ontology Mapping: ✅ Complete + DBpedia + Heritage-First
  - Custodian aspect: schema:ResearchOrganization, cpov:PublicOrganisation (if public)
  - Place aspect: crm:E27_Site (conditional on permanent facilities)
  - Collections: schema:Dataset (research data), crm:E78_Curated_Holding (specimens)
  - DBpedia: dbo:ResearchProject (medium confidence, semantic approximation)
  - Heritage framing note: Emphasizes scientific facilities as heritage custodians (specimen archives, research data), not generic R&D
  - Coverage gap documented: DBpedia lacks "scientific facility" class
- Complexity: 7/10 (multi-functional research infrastructure)
Batch Processing Strategy
Given 2,452 entries, we'll process them in batches by hypernym category:
Priority 1: Core Heritage Custodian Types (1,465 entries)
These are the most critical for the heritage custodian ontology:
| Hypernym | Count | Ontology Pattern | Status |
|---|---|---|---|
| museum | 133 | cpov:PublicOrganisation + schema:Museum + crm:E39_Actor | TODO |
| archive | 117 | cpov:PublicOrganisation + rico:CorporateBody + rico:RecordSet | TODO |
| library | 29 | cpov:PublicOrganisation + schema:Library + bf:Collection | TODO |
| art institution | 77 | cpov:PublicOrganisation + schema:ArtGallery + crm:E78_Curated_Holding | TODO |
| cultural institution | 22 | cpov:PublicOrganisation + schema:Organization | TODO |
| heritage site | 151 | crm:E27_Site + schema:LandmarksOrHistoricalBuildings | TODO |
| organisation | 193 | cpov:PublicOrganisation OR schema:Organization (requires classification) | TODO |
| company | 189 | schema:Corporation + crm:E40_Legal_Body | TODO |
| university | 66 | schema:EducationalOrganization + schema:CollegeOrUniversity | TODO |
| higher education institution | 42 | schema:EducationalOrganization | TODO |
| school | 39 | schema:EducationalOrganization | TODO |
| research center | (in organisation) | schema:ResearchOrganization + cpov:PublicOrganisation | TODO |
Subtotal: ~1,058 entries (43% of total)
Priority 2: Physical Sites and Places (1,183 entries)
Environmental and landscape heritage:
| Hypernym | Count | Ontology Pattern | Status |
|---|---|---|---|
| protected area | 875 | schema:Place + crm:E27_Site | TODO |
| national park | 74 | schema:Park + environmental heritage mixins | TODO |
| natural monument | 70 | schema:LandmarksOrHistoricalBuildings | TODO |
| building | 35 | crm:E27_Site + schema:Place | ✅ 1/35 |
| park | 21 | schema:Park | TODO |
| zoo | 17 | schema:Zoo + crm:E39_Actor | TODO |
Subtotal: ~1,092 entries (45% of total)
Priority 3: Specialized Categories (302 entries)
Collections, groups, and specialized types:
| Hypernym | Count | Ontology Pattern | Status |
|---|---|---|---|
| group | 28 | crm:E74_Group + schema:Organization | TODO |
| collection | 16 | rico:RecordSet OR crm:E78_Curated_Holding OR bf:Collection | TODO |
| data repository | 19 | schema:DataCatalog + digital platform mixins | TODO |
| historical society | (in organisation) | schema:NGO + crm:E74_Group | TODO |
Subtotal: ~63 entries (3% of total)
Priority 4: Settlement and Administrative Units (139 entries)
Geographic and political entities (low priority for heritage custodian ontology):
| Hypernym | Count | Ontology Pattern | Status |
|---|---|---|---|
| settlement | varies | schema:Place | TODO |
| province | varies | schema:AdministrativeArea | TODO |
| polity | varies | schema:GovernmentOrganization | TODO |
Subtotal: ~139 entries (6% of total)
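
As a quick check on the subtotals quoted for Priorities 1-3, the per-hypernym counts from the tables above can be summed and expressed as a share of the 2,453 entries given in the header. This is only an arithmetic sketch; the names `PRIORITY_COUNTS` and `subtotals` are introduced here for illustration, and rows listed as "(in organisation)" or "varies" are left out because they carry no count of their own.

```python
# Counts copied from the Priority 1-3 tables in this plan.
PRIORITY_COUNTS = {
    "Priority 1": {
        "museum": 133, "archive": 117, "library": 29,
        "art institution": 77, "cultural institution": 22,
        "heritage site": 151, "organisation": 193, "company": 189,
        "university": 66, "higher education institution": 42, "school": 39,
    },
    "Priority 2": {
        "protected area": 875, "national park": 74, "natural monument": 70,
        "building": 35, "park": 21, "zoo": 17,
    },
    "Priority 3": {"group": 28, "collection": 16, "data repository": 19},
}

TOTAL_ENTRIES = 2453  # "Total Entries" from the document header


def subtotals(counts, total):
    """Return {priority: (entry_count, rounded_percent_of_total)}."""
    result = {}
    for priority, rows in counts.items():
        n = sum(rows.values())
        result[priority] = (n, round(100 * n / total))
    return result


for priority, (n, pct) in subtotals(PRIORITY_COUNTS, TOTAL_ENTRIES).items():
    print(f"{priority}: {n} entries ({pct}% of total)")
```

The sums reproduce the "Subtotal" lines (1,058, 1,092 and 63 entries), not the larger figures given in the priority headings.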
Enrichment Workflow
For each entry, add the following YAML structure:
    - label: Q1234567
      hypernym:
        - museum
      type:
        - M
      ontology_mapping:
        wikidata_source: Q1234567
        enrichment_date: '2025-11-20T...'
        enriched_by: manual_ontology_mapper
        complexity_score: 7  # 1-10 scale
        complexity_note: "Explanation of why this entity is complex to model"
        semantic_aspects:
          - custodian_reference
          - place_reference
          - collections_reference
        custodian_ontology:
          primary_class: cpov:PublicOrganisation
          namespace: http://data.europa.eu/m8g/
          secondary_class: schema:Museum
          rdfs_comment: "Description of when to use this class"
          properties:
            - dct:identifier (ISIL code, Wikidata)
            - cpov:hasUnit (organizational structure)
        place_ontology:  # If applicable
          primary_class: crm:E27_Site
          properties:
            - schema:geo (coordinates)
        collections_ontology:  # If applicable
          primary_class: crm:E78_Curated_Holding
          properties:
            - crm:P147i_was_curated_by (custodian)
        temporal_model:
          custodian_aspect: "Founding → Present/Closure"
          collections_aspect: "Accession dates (per object)"
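
For entries that follow a standard pattern, a skeleton of the structure above could be generated programmatically before manual refinement. The sketch below is an assumption about how that might look, not existing project code: `build_entry` and its parameters are hypothetical, only the custodian aspect is filled in, and serializing the result to YAML is left to whichever YAML library the project uses.

```python
from datetime import datetime, timezone


def build_entry(qid, hypernym, type_code, primary_class,
                secondary_class=None, complexity_score=5):
    """Build a minimal enrichment skeleton mirroring the YAML structure above."""
    if not 1 <= complexity_score <= 10:
        raise ValueError("complexity_score must be on the 1-10 scale")

    custodian = {"primary_class": primary_class}
    if secondary_class:
        custodian["secondary_class"] = secondary_class

    return {
        "label": qid,
        "hypernym": [hypernym],
        "type": [type_code],
        "ontology_mapping": {
            "wikidata_source": qid,
            # ISO 8601 timestamp in UTC, generated at enrichment time.
            "enrichment_date": datetime.now(timezone.utc).isoformat(),
            "enriched_by": "manual_ontology_mapper",
            "complexity_score": complexity_score,
            "semantic_aspects": ["custodian_reference"],
            "custodian_ontology": custodian,
        },
    }


entry = build_entry("Q1234567", "museum", "M",
                    "cpov:PublicOrganisation", "schema:Museum",
                    complexity_score=7)
```

Place and collections aspects are deliberately omitted here, since the template marks them "If applicable" and they need a per-entry decision.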
Next Steps
Automated Batch Processing
Create script to process entries in batches:
- Batch 1: Museums (133 entries)
  - Pattern: cpov:PublicOrganisation + schema:Museum + crm:E39_Actor
  - Collections: crm:E78_Curated_Holding
  - People: picom:PersonObservation
- Batch 2: Archives (117 entries)
  - Pattern: cpov:PublicOrganisation + rico:CorporateBody
  - Collections: rico:RecordSet
- Batch 3: Libraries (29 entries)
  - Pattern: cpov:PublicOrganisation + schema:Library
  - Collections: bf:Collection
- Batch 4: Buildings (35 entries)
  - Pattern: crm:E27_Site + schema:Place
  - Dual aspect: place + potential custodian
Manual Review Required
- Entries with hypernym "organisation" (193 entries) - need public/private classification
- Entries with multiple hypernyms - need multi-aspect modeling
- Entries with complexity score ≥ 7 - require human review
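
The three manual-review criteria above lend themselves to a simple triage function. The following is a hedged sketch that assumes entries are loaded as dictionaries shaped like the enrichment template in this plan; `manual_review_reasons` is a hypothetical helper name, not part of any existing script.

```python
def manual_review_reasons(entry):
    """Return the reasons (possibly none) why an entry needs human review."""
    reasons = []
    hypernyms = entry.get("hypernym", [])

    if "organisation" in hypernyms:
        reasons.append("generic organisation: needs public/private classification")
    if len(hypernyms) > 1:
        reasons.append("multiple hypernyms: needs multi-aspect modeling")

    # complexity_score is only present once an entry has been enriched.
    score = entry.get("ontology_mapping", {}).get("complexity_score")
    if score is not None and score >= 7:
        reasons.append(f"complexity score {score} >= 7")

    return reasons


sample = {
    "label": "Q119459808",
    "hypernym": ["organisation"],
    "ontology_mapping": {"complexity_score": 7},
}
for reason in manual_review_reasons(sample):
    print(reason)
```

An empty list means the entry is a candidate for the automated batches; anything else routes it to the manual queue.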
Progress Tracking
- Entry 1/2,452: Q1802963 (mansion) ✅
- Batch 1: Museums (0/133)
- Batch 2: Archives (0/117)
- Batch 3: Libraries (0/29)
- Batch 4: Buildings (1/35)
- Remaining: (1/2,138)
Total Progress: 0.04% (1/2,452 entries)
Automation vs. Manual Work
Can Be Automated (70% of entries)
- Single hypernym with clear ontology mapping
- Standard patterns (museum, archive, library)
- Protected areas and natural monuments
Requires Manual Review (30% of entries)
- Multiple hypernyms (multi-aspect entities)
- Generic "organisation" classification
- Complex historical societies (heemkamer, etc.)
- Ambiguous building types
Estimated Effort
- Automated enrichment: 2-3 hours processing time
- Manual review: 20-30 hours for complex entries
- Quality assurance: 5-10 hours spot-checking
Total: 27-43 hours of work
Resources
- Ontology files: /data/ontology/
- Full Wikidata metadata: hyponyms_curated_full.yaml
- Enrichment target: hyponyms_curated.yaml
- Rules reference: .opencode/agent/ontology-mapping-rules.md
DBpedia Ontology Integration Discovered - 2025-11-20 23:56:32
Major Discovery: DBpedia Ontology provides pre-existing Wikidata → formal ontology mappings for heritage institutions.
Key Findings:
- DBpedia has GLAM classes:
  - dbo:Museum ←→ wd:Q33506 ←→ schema:Museum
  - dbo:Library ←→ wd:Q7075 ←→ schema:Library
  - dbo:Archive ←→ wd:Q166118
- DBpedia provides heritage-specific properties:
  - dbo:collection (museum collections)
  - dbo:curator (curator name)
  - dbo:museumType (specialization)
  - dbo:isil (ISIL codes for libraries)
  - dbo:numberOfCollectionItems
- Integration benefits:
  - Pre-mapped Wikidata entities save manual mapping work
  - Standardized properties avoid custom property invention
  - OWL reasoning support for ontology inference
  - Validates existing Schema.org mappings
Documentation Created:
docs/DBPEDIA_ONTOLOGY_INTEGRATION.md (12,500+ words)
- DBpedia ontology overview
- Heritage class mappings (Museum, Library, Archive)
- Integration workflow (4 steps)
- SPARQL queries for discovery
- Implementation recommendations
- Example enriched YAML with DBpedia references
Next Actions:
- Update .opencode/agent/ontology-mapping-rules.md with DBpedia step
- Create DBpedia → Wikidata mapping cache script
- Retrofit existing mappings (Q1802963, Q3694, Q2927789) with DBpedia
- Continue Q119459808 enrichment with DBpedia integration
Heritage-First Framing Principle Added - 2025-11-20 23:55
Critical Policy Update: Added Heritage-First Framing Principle to ontology mapping rules.
Problem Identified
Initial Q3694 (vacation property) mapping used generic real estate classes:
- ❌ PRIMARY: schema:Accommodation (too generic)
- ❌ RATIONALE: "Most vacation properties are commercial rentals"

This violated the project mission: we model heritage custodians, not generic real estate.
Solution: Heritage-First Framing Principle
New Rule: All entities in the GLAMORCUBESFIXPHDNT taxonomy are evaluated through a heritage-significance lens.
Key Points:
- ✅ ALWAYS assume heritage significance - entities in our taxonomy have heritage value
- ✅ ALWAYS use heritage-focused classes - crm:E27_Site, not schema:Accommodation
- ✅ ALWAYS model place aspect for sites - physical entities are heritage sites
- ❌ NEVER use generic classes - schema:Residence, schema:Accommodation too generic
- ❌ NEVER require "proof" - if in Wikidata extraction, has heritage potential
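
The "NEVER use generic classes" rule above could be enforced with a small lint-style check on each mapping's primary class. This is a sketch under stated assumptions: `check_heritage_first` is a hypothetical helper, and the generic-class list contains only the two classes this rule names explicitly.

```python
# Classes this plan explicitly rejects as too generic for a primary class.
GENERIC_CLASSES = {"schema:Residence", "schema:Accommodation"}

# Heritage-focused replacement suggested by the rule above.
PREFERRED_SITE_CLASS = "crm:E27_Site"


def check_heritage_first(primary_class):
    """Return a violation message, or None if the class passes the check."""
    if primary_class in GENERIC_CLASSES:
        return (f"{primary_class} is too generic; "
                f"use a heritage-focused class such as {PREFERRED_SITE_CLASS}")
    return None


# The original Q3694 mapping fails the check; the retrofitted one passes.
print(check_heritage_first("schema:Accommodation"))
print(check_heritage_first("crm:E27_Site"))
```

A check like this only catches the explicitly listed classes; whether a mapping is genuinely heritage-focused still requires the rationale review described in the rules file.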
Documentation Updated
File: .opencode/agent/ontology-mapping-rules.md
Added section: "Heritage-First Framing Principle" (60 lines)
- Heritage Significance Default
- Examples (vacation properties, mansions, buitenplaatsen)
- Ontology Selection Decision Tree for Physical Sites
- Rationale (5 key points)
Entries Retrofitted
Q3694 (vacation property) - Fixed heritage framing:
- ✅ BEFORE: schema:Accommodation (generic)
- ✅ AFTER: crm:E27_Site (heritage site)
- ✅ Added: heritage_framing_note explaining Heritage-First Principle
- ✅ Updated: ontology_rationale with heritage-focused reasoning
- ✅ Added: DBpedia mapping (dbo:HistoricPlace)
Q1802963 (mansion) - Added DBpedia:
- ✅ Added: dbpedia_mapping section
- ✅ Classes: dbo:Building, dbo:HistoricBuilding, dbo:HistoricPlace
Q2927789 (buitenplaats) - Added DBpedia:
- ✅ Added: dbpedia_mapping section
- ✅ Classes: dbo:HistoricBuilding (Dutch heritage estates)
Impact
All future ontology mappings will:
- Default to heritage-focused classes (crm:E27_Site, not schema:Place)
- Use CIDOC-CRM as PRIMARY for cultural heritage sites
- Reject generic real estate classes
- Reference Heritage-First Framing Principle in rationale