# Ontology Enrichment Plan for hyponyms_curated.yaml

**Date**: 2025-11-21 (Updated)
**Total Entries**: 2,453 Wikidata entities
**Status**: In Progress (5/2,453 complete = 0.20%)

---

## 🎯 Latest Session: DBpedia Integration Complete

**Session Date**: 2025-11-21
**Focus**: DBpedia ontology caching + Q119459808 enrichment
**Status**: ✅ COMPLETE

### Major Achievements

1. **DBpedia Ontology Files Cached** (276 KB total)
   - `data/ontology/dbpedia_wikidata_mappings.ttl` (804 lines)
   - `data/ontology/dbpedia_classes_sample.ttl` (2,514 lines)
   - `data/ontology/dbpedia_heritage_classes.ttl` (219 lines)
   - `data/ontology/dbpedia_glam_mappings_index.md` (usage guide)

2. **Q119459808 (scientific facility) Enriched**
   - Heritage-first framing note added
   - DBpedia mapping: `dbo:ResearchProject` (medium confidence)
   - Related classes documented
   - Coverage gap identified: No direct DBpedia class for research infrastructure

3. **4-Step DBpedia Workflow Established**
   - Step 1: Check direct Wikidata mappings (high confidence)
   - Step 2: Semantic keyword search (medium confidence)
   - Step 3: Review heritage classes (validation)
   - Step 4: Document confidence + gaps

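The four-step workflow can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's actual tooling: the function `map_to_dbpedia`, the `DbpediaMapping` record, and the keyword index are hypothetical, and the lookup tables are tiny in-memory stand-ins for the cached `.ttl` files. Only the three direct `dbo:` ↔ Wikidata pairs and the `dbo:ResearchProject` approximation come from this plan.

```python
from dataclasses import dataclass, field
from typing import Optional

# Step 1 data: direct Wikidata -> DBpedia equivalences named in this plan.
DIRECT_MAPPINGS = {
    "Q33506": "dbo:Museum",
    "Q7075": "dbo:Library",
    "Q166118": "dbo:Archive",
}

# Step 2 data: hypothetical keyword index (a real one would be derived
# from the cached ontology files).
KEYWORD_INDEX = {
    "scientific": "dbo:ResearchProject",
    "historic": "dbo:HistoricPlace",
}

# Step 3 data: illustrative subset of heritage-relevant classes.
HERITAGE_CLASSES = {
    "dbo:Museum", "dbo:Library", "dbo:Archive",
    "dbo:HistoricBuilding", "dbo:HistoricPlace",
}


@dataclass
class DbpediaMapping:
    dbo_class: Optional[str]
    confidence: str                      # "high", "medium", or "none"
    heritage_class: bool = False         # result of the step 3 check
    gaps: list = field(default_factory=list)


def map_to_dbpedia(qid: str, label: str) -> DbpediaMapping:
    # Step 1: a direct Wikidata mapping gives high confidence.
    if qid in DIRECT_MAPPINGS:
        cls = DIRECT_MAPPINGS[qid]
        return DbpediaMapping(cls, "high", cls in HERITAGE_CLASSES)
    # Step 2: fall back to a keyword match on the label (medium confidence).
    for keyword, cls in KEYWORD_INDEX.items():
        if keyword in label.lower():
            # Step 3: validate against the heritage classes.
            # Step 4: record the confidence level and the coverage gap.
            gap = f"no direct DBpedia class for '{label}'"
            return DbpediaMapping(cls, "medium", cls in HERITAGE_CLASSES, [gap])
    # Nothing matched: document the gap so it can be reviewed manually.
    return DbpediaMapping(None, "none", False, [f"no DBpedia class found for '{label}'"])
```

Calling `map_to_dbpedia("Q119459808", "scientific facility")` in this sketch yields a medium-confidence `dbo:ResearchProject` mapping with one documented gap, mirroring the enrichment described above.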

**See**: `SESSION_SUMMARY_20251121_DBPEDIA_INTEGRATION_COMPLETE.md` for full details.

---

## Completed Entries

### 1. Q1802963 - mansion (RETROFITTED with DBpedia)
- **Hypernym**: building
- **Type**: F (Features - physical landmarks)
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Place aspect: `crm:E27_Site`, `schema:LandmarksOrHistoricalBuildings`
  - Custodian aspect: `cpov:PublicOrganisation` (public) OR `schema:Museum` (private)
  - DBpedia: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:HistoricPlace`
  - Complexity: 9/10
  - Properties: 8 properties mapped

### 2. Q3694 - vacation property (FIXED heritage-first framing + DBpedia)
- **Hypernym**: accommodation
- **Type**: F (Features)
- **Ontology Mapping**: ✅ Complete + DBpedia (heritage-first fix)
  - Place aspect: `crm:E27_Site` (heritage site focus)
  - ~~`schema:Accommodation`~~ → Changed to heritage-focused classes
  - DBpedia: `dbo:HistoricPlace`
  - Heritage framing note added
  - Complexity: 8/10

### 3. Q2927789 - buitenplaats (Dutch country estate) (RETROFITTED with DBpedia)
- **Hypernym**: building
- **Type**: F (Features)
- **Country**: Netherlands
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Place aspect: `crm:E27_Site`, `schema:LandmarksOrHistoricalBuildings`
  - DBpedia: `dbo:HistoricBuilding`
  - Dutch heritage context: Rijksmonument status, 17th-19th century estates
  - Complexity: 7/10

### 4. Q2772772 - military museum
- **Hypernym**: museum
- **Type**: M (Museum)
- **Ontology Mapping**: ✅ Complete + DBpedia
  - Custodian aspect: `cpov:PublicOrganisation`, `schema:Museum`
  - Collections: `crm:E78_Curated_Holding` (military artifacts), `rico:RecordSet` (archival records)
  - DBpedia: `dbo:Museum` (high confidence, direct Wikidata equivalent)
  - Complexity: 4/10 (straightforward museum pattern)

### 5. Q119459808 - scientific facility ✨ NEW
- **Hypernym**: organisation
- **Type**: R (Research) + E (Education)
- **Ontology Mapping**: ✅ Complete + DBpedia + Heritage-First
  - Custodian aspect: `schema:ResearchOrganization`, `cpov:PublicOrganisation` (if public)
  - Place aspect: `crm:E27_Site` (conditional on permanent facilities)
  - Collections: `schema:Dataset` (research data), `crm:E78_Curated_Holding` (specimens)
  - DBpedia: `dbo:ResearchProject` (medium confidence, semantic approximation)
  - Heritage framing note: Emphasizes scientific facilities as **heritage custodians** (specimen archives, research data), not generic R&D
  - Coverage gap documented: DBpedia lacks "scientific facility" class
  - Complexity: 7/10 (multi-functional research infrastructure)

---

## Batch Processing Strategy

Given 2,452 entries, we'll process them in batches by hypernym category:

### Priority 1: Core Heritage Custodian Types (1,465 entries)
These are the most critical for the heritage custodian ontology:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| museum | 133 | `cpov:PublicOrganisation` + `schema:Museum` + `crm:E39_Actor` | TODO |
| archive | 117 | `cpov:PublicOrganisation` + `rico:CorporateBody` + `rico:RecordSet` | TODO |
| library | 29 | `cpov:PublicOrganisation` + `schema:Library` + `bf:Collection` | TODO |
| art institution | 77 | `cpov:PublicOrganisation` + `schema:ArtGallery` + `crm:E78_Curated_Holding` | TODO |
| cultural institution | 22 | `cpov:PublicOrganisation` + `schema:Organization` | TODO |
| heritage site | 151 | `crm:E27_Site` + `schema:LandmarksOrHistoricalBuildings` | TODO |
| organisation | 193 | `cpov:PublicOrganisation` OR `schema:Organization` (requires classification) | TODO |
| company | 189 | `schema:Corporation` + `crm:E40_Legal_Body` | TODO |
| university | 66 | `schema:EducationalOrganization` + `schema:CollegeOrUniversity` | TODO |
| higher education institution | 42 | `schema:EducationalOrganization` | TODO |
| school | 39 | `schema:EducationalOrganization` | TODO |
| research center | (in organisation) | `schema:ResearchOrganization` + `cpov:PublicOrganisation` | TODO |

**Subtotal**: ~1,058 entries (43% of total)

### Priority 2: Physical Sites and Places (1,183 entries)
Environmental and landscape heritage:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| protected area | 875 | `schema:Place` + `crm:E27_Site` | TODO |
| national park | 74 | `schema:Park` + environmental heritage mixins | TODO |
| natural monument | 70 | `schema:LandmarksOrHistoricalBuildings` | TODO |
| building | 35 | `crm:E27_Site` + `schema:Place` | ✅ 1/35 |
| park | 21 | `schema:Park` | TODO |
| zoo | 17 | `schema:Zoo` + `crm:E39_Actor` | TODO |

**Subtotal**: ~1,092 entries (45% of total)

### Priority 3: Specialized Categories (302 entries)
Collections, groups, and specialized types:

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| group | 28 | `crm:E74_Group` + `schema:Organization` | TODO |
| collection | 16 | `rico:RecordSet` OR `crm:E78_Curated_Holding` OR `bf:Collection` | TODO |
| data repository | 19 | `schema:DataCatalog` + digital platform mixins | TODO |
| historical society | (in organisation) | `schema:NGO` + `crm:E74_Group` | TODO |

**Subtotal**: ~63 entries (3% of total)

### Priority 4: Settlement and Administrative Units (139 entries)
Geographic and political entities (low priority for heritage custodian ontology):

| Hypernym | Count | Ontology Pattern | Status |
|----------|-------|------------------|--------|
| settlement | varies | `schema:Place` | TODO |
| province | varies | `schema:AdministrativeArea` | TODO |
| polity | varies | `schema:GovernmentOrganization` | TODO |

**Subtotal**: ~139 entries (6% of total)

---

## Enrichment Workflow

For each entry, add the following YAML structure:

```yaml
- label: Q1234567
  hypernym:
    - museum
  type:
    - M
  ontology_mapping:
    wikidata_source: Q1234567
    enrichment_date: '2025-11-20T...'
    enriched_by: manual_ontology_mapper
    complexity_score: 7  # 1-10 scale
    complexity_note: "Explanation of why this entity is complex to model"

    semantic_aspects:
      - custodian_reference
      - place_reference
      - collections_reference

    custodian_ontology:
      primary_class: cpov:PublicOrganisation
      namespace: http://data.europa.eu/m8g/
      secondary_class: schema:Museum
      rdfs_comment: "Description of when to use this class"
      properties:
        - dct:identifier (ISIL code, Wikidata)
        - cpov:hasUnit (organizational structure)

    place_ontology:  # If applicable
      primary_class: crm:E27_Site
      properties:
        - schema:geo (coordinates)

    collections_ontology:  # If applicable
      primary_class: crm:E78_Curated_Holding
      properties:
        - crm:P147i_was_curated_by (custodian)

    temporal_model:
      custodian_aspect: "Founding → Present/Closure"
      collections_aspect: "Accession dates (per object)"
```

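A small completeness check can catch half-filled records before they are written back to `hyponyms_curated.yaml`. The sketch below is an assumption-laden illustration: `validate_entry`, the list of required keys, and the rule that every declared semantic aspect needs a matching `*_ontology` block with a `primary_class` are inferred from the template, not taken from the project's rules file. It also assumes the aspect blocks are nested under `ontology_mapping` and that the entry has already been parsed into a plain `dict`.

```python
# Keys assumed to be mandatory inside ontology_mapping (inferred from the template).
REQUIRED_MAPPING_KEYS = (
    "wikidata_source",
    "enrichment_date",
    "enriched_by",
    "complexity_score",
    "semantic_aspects",
)

# Each semantic aspect is assumed to require the corresponding ontology block.
ASPECT_TO_BLOCK = {
    "custodian_reference": "custodian_ontology",
    "place_reference": "place_ontology",
    "collections_reference": "collections_ontology",
}


def validate_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry looks complete."""
    mapping = entry.get("ontology_mapping")
    if mapping is None:
        return ["missing ontology_mapping"]

    problems = [f"missing {key}" for key in REQUIRED_MAPPING_KEYS if key not in mapping]

    # The template documents a 1-10 complexity scale.
    score = mapping.get("complexity_score")
    if score is not None and not 1 <= score <= 10:
        problems.append("complexity_score outside 1-10")

    # Every declared aspect must be backed by a block with a primary class.
    for aspect in mapping.get("semantic_aspects", []):
        block = ASPECT_TO_BLOCK.get(aspect)
        if block is None:
            problems.append(f"unknown aspect {aspect}")
        elif "primary_class" not in mapping.get(block, {}):
            problems.append(f"{aspect} declared but {block}.primary_class missing")
    return problems
```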
---

## Next Steps

### Automated Batch Processing

Create a script to process entries in batches:

1. **Batch 1: Museums (133 entries)**
   - Pattern: `cpov:PublicOrganisation` + `schema:Museum` + `crm:E39_Actor`
   - Collections: `crm:E78_Curated_Holding`
   - People: `pico:PersonObservation`

2. **Batch 2: Archives (117 entries)**
   - Pattern: `cpov:PublicOrganisation` + `rico:CorporateBody`
   - Collections: `rico:RecordSet`

3. **Batch 3: Libraries (29 entries)**
   - Pattern: `cpov:PublicOrganisation` + `schema:Library`
   - Collections: `bf:Collection`

4. **Batch 4: Buildings (35 entries)**
   - Pattern: `crm:E27_Site` + `schema:Place`
   - Dual aspect: place + potential custodian

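One possible shape for such a script is sketched below. It is a hedged outline rather than the planned implementation: `build_batches`, `apply_pattern`, and the `ontology_classes` output key are hypothetical names, and entries are assumed to be dicts with a `hypernym` list as in the YAML template. The class patterns themselves are copied from the four batches listed above.

```python
from collections import defaultdict

# Hypernym -> default class pattern, copied from the batch list.
BATCH_PATTERNS = {
    "museum": ["cpov:PublicOrganisation", "schema:Museum", "crm:E39_Actor"],
    "archive": ["cpov:PublicOrganisation", "rico:CorporateBody"],
    "library": ["cpov:PublicOrganisation", "schema:Library"],
    "building": ["crm:E27_Site", "schema:Place"],
}


def build_batches(entries):
    """Group entries with exactly one known hypernym; route the rest to 'manual'."""
    batches = defaultdict(list)
    for entry in entries:
        hypernyms = entry.get("hypernym", [])
        if len(hypernyms) == 1 and hypernyms[0] in BATCH_PATTERNS:
            batches[hypernyms[0]].append(entry)
        else:
            # Multiple or unrecognised hypernyms are not safe to automate.
            batches["manual"].append(entry)
    return dict(batches)


def apply_pattern(entry):
    """Return a copy of the entry with its batch's default classes attached."""
    classes = BATCH_PATTERNS[entry["hypernym"][0]]
    return {**entry, "ontology_classes": list(classes)}
```

Keeping the grouping separate from the pattern application means a batch can be inspected, or counted against the expected totals, before any entry is modified.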
### Manual Review Required

- Entries with hypernym "organisation" (193 entries) - need public/private classification
- Entries with multiple hypernyms - need multi-aspect modeling
- Entries with complexity score ≥ 7 - require human review

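The three review criteria translate directly into a predicate. The following is a minimal sketch, assuming entries are dicts shaped like the YAML template (a `hypernym` list and an optional `ontology_mapping.complexity_score`); the function name `needs_manual_review` is hypothetical.

```python
def needs_manual_review(entry: dict) -> bool:
    """True if any of the three manual-review criteria applies to the entry."""
    hypernyms = entry.get("hypernym", [])
    # A missing score is treated as 0, i.e. not complex (an assumption).
    score = entry.get("ontology_mapping", {}).get("complexity_score", 0)
    return (
        "organisation" in hypernyms   # needs public/private classification
        or len(hypernyms) > 1         # needs multi-aspect modeling
        or score >= 7                 # complex enough to require human review
    )
```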
---

## Progress Tracking

- [x] Entry 1/2,452: Q1802963 (mansion) ✅
- [ ] Batch 1: Museums (0/133)
- [ ] Batch 2: Archives (0/117)
- [ ] Batch 3: Libraries (0/29)
- [ ] Batch 4: Buildings (1/35)
- [ ] Remaining: (1/2,138)

**Total Progress**: 0.04% (1/2,452 entries)

---

## Automation vs. Manual Work

### Can Be Automated (70% of entries)
- Single hypernym with clear ontology mapping
- Standard patterns (museum, archive, library)
- Protected areas and natural monuments

### Requires Manual Review (30% of entries)
- Multiple hypernyms (multi-aspect entities)
- Generic "organisation" classification
- Complex historical societies (heemkamer, etc.)
- Ambiguous building types

---

## Estimated Effort

- **Automated enrichment**: 2-3 hours processing time
- **Manual review**: 20-30 hours for complex entries
- **Quality assurance**: 5-10 hours spot-checking

**Total**: 27-43 hours of work

---

## Resources

- **Ontology files**: `/data/ontology/`
- **Full Wikidata metadata**: `hyponyms_curated_full.yaml`
- **Enrichment target**: `hyponyms_curated.yaml`
- **Rules reference**: `.opencode/agent/ontology-mapping-rules.md`

## DBpedia Ontology Integration Discovered - 2025-11-20 23:56:32

**Major Discovery**: DBpedia Ontology provides pre-existing Wikidata → formal ontology mappings for heritage institutions.

### Key Findings:

1. **DBpedia has GLAM classes**:
   - `dbo:Museum` ←→ `wd:Q33506` ←→ `schema:Museum`
   - `dbo:Library` ←→ `wd:Q7075` ←→ `schema:Library`
   - `dbo:Archive` ←→ `wd:Q166118`

2. **DBpedia provides heritage-specific properties**:
   - `dbo:collection` (museum collections)
   - `dbo:curator` (curator name)
   - `dbo:museumType` (specialization)
   - `dbo:isil` (ISIL codes for libraries)
   - `dbo:numberOfCollectionItems`

3. **Integration benefits**:
   - Pre-mapped Wikidata entities save manual mapping work
   - Standardized properties avoid custom property invention
   - OWL reasoning support for ontology inference
   - Validates existing Schema.org mappings

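To look up such class equivalences for other Wikidata items, a discovery query can be generated per QID. The helper below only builds the query text and does not contact any endpoint. It assumes the DBpedia ontology expresses these links with `owl:equivalentClass` pointing at `http://www.wikidata.org/entity/` IRIs; the function name `equivalent_class_query` is hypothetical, and the queries documented in `docs/DBPEDIA_ONTOLOGY_INTEGRATION.md` may be shaped differently.

```python
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def equivalent_class_query(qid: str) -> str:
    """Build a SPARQL query for classes declared equivalent to a Wikidata item."""
    # Reject anything that is not of the form Q<digits> to avoid malformed IRIs.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?dboClass WHERE {\n"
        f"  ?dboClass owl:equivalentClass <{WIKIDATA_ENTITY}{qid}> .\n"
        "}"
    )


# The three GLAM items listed in the findings above.
for qid in ("Q33506", "Q7075", "Q166118"):
    print(equivalent_class_query(qid))
```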
### Documentation Created:

- `docs/DBPEDIA_ONTOLOGY_INTEGRATION.md` (12,500+ words)
  - DBpedia ontology overview
  - Heritage class mappings (Museum, Library, Archive)
  - Integration workflow (4 steps)
  - SPARQL queries for discovery
  - Implementation recommendations
  - Example enriched YAML with DBpedia references

### Next Actions:

1. Update `.opencode/agent/ontology-mapping-rules.md` with DBpedia step
2. Create DBpedia → Wikidata mapping cache script
3. Retrofit existing mappings (Q1802963, Q3694, Q2927789) with DBpedia
4. Continue Q119459808 enrichment with DBpedia integration

---

## Heritage-First Framing Principle Added - 2025-11-20 23:55

**Critical Policy Update**: Added Heritage-First Framing Principle to ontology mapping rules.

### Problem Identified

The initial Q3694 (vacation property) mapping used generic real estate classes:
- ❌ PRIMARY: `schema:Accommodation` (too generic)
- ❌ RATIONALE: "Most vacation properties are commercial rentals"

This violated the project mission: we model **heritage custodians**, not generic real estate.

### Solution: Heritage-First Framing Principle

**New Rule**: All entities in the GLAMORCUBESFIXPHDNT taxonomy are evaluated through a **heritage significance lens**.

**Key Points**:
1. ✅ **ALWAYS assume heritage significance** - entities in our taxonomy have heritage value
2. ✅ **ALWAYS use heritage-focused classes** - `crm:E27_Site`, not `schema:Accommodation`
3. ✅ **ALWAYS model place aspect for sites** - physical entities are heritage sites
4. ❌ **NEVER use generic classes** - `schema:Residence` and `schema:Accommodation` are too generic
5. ❌ **NEVER require "proof"** - if an entity is in the Wikidata extraction, it has heritage potential

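For physical sites, points 2 to 4 can be expressed as a small filter over a proposed class list. This is a sketch under assumptions, not the decision tree from the rules file: `apply_heritage_first` is a hypothetical name, and the generic-class set contains only the two classes named above.

```python
# Generic classes explicitly ruled out by the principle.
GENERIC_CLASSES = {"schema:Residence", "schema:Accommodation"}

# Heritage-focused default for the place aspect of a physical site.
HERITAGE_DEFAULT = "crm:E27_Site"


def apply_heritage_first(proposed):
    """Return (classes, rejected) for a physical site's proposed class list.

    Generic classes are removed and reported; the heritage default is
    always placed first so it acts as the primary class.
    """
    rejected = [c for c in proposed if c in GENERIC_CLASSES]
    kept = [c for c in proposed
            if c not in GENERIC_CLASSES and c != HERITAGE_DEFAULT]
    return [HERITAGE_DEFAULT] + kept, rejected
```

Applied to the original Q3694 proposal, `apply_heritage_first(["schema:Accommodation", "dbo:HistoricPlace"])` keeps `dbo:HistoricPlace`, promotes `crm:E27_Site` to primary, and reports `schema:Accommodation` as rejected.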
### Documentation Updated

**File**: `.opencode/agent/ontology-mapping-rules.md`

Added section: "Heritage-First Framing Principle" (60 lines)
- Heritage Significance Default
- Examples (vacation properties, mansions, buitenplaatsen)
- Ontology Selection Decision Tree for Physical Sites
- Rationale (5 key points)

### Entries Retrofitted

**Q3694 (vacation property)** - Fixed heritage framing:
- ✅ BEFORE: `schema:Accommodation` (generic)
- ✅ AFTER: `crm:E27_Site` (heritage site)
- ✅ Added: `heritage_framing_note` explaining Heritage-First Principle
- ✅ Updated: `ontology_rationale` with heritage-focused reasoning
- ✅ Added: DBpedia mapping (`dbo:HistoricPlace`)

**Q1802963 (mansion)** - Added DBpedia:
- ✅ Added: `dbpedia_mapping` section
- ✅ Classes: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:HistoricPlace`

**Q2927789 (buitenplaats)** - Added DBpedia:
- ✅ Added: `dbpedia_mapping` section
- ✅ Classes: `dbo:HistoricBuilding` (Dutch heritage estates)

### Impact

**All future ontology mappings** will:
1. Default to heritage-focused classes (`crm:E27_Site`, not `schema:Place`)
2. Use CIDOC-CRM as PRIMARY for cultural heritage sites
3. Reject generic real estate classes
4. Reference Heritage-First Framing Principle in rationale

---