# FeaturePlace Ontology Mapping Strategy
**Date**: 2025-11-22

**Task**: Map 298 Wikidata feature types to ontology classes from `/data/ontology/`

---
## Ontology Sources Available
### Primary Ontologies
1. **CIDOC-CRM** (`CIDOC_CRM_v7.1.3.rdf`)
   - Cultural heritage domain standard
   - Key classes: `E27_Site`, `E22_Human-Made_Object`, `E25_Human-Made_Feature`, `E26_Physical_Feature`
2. **Schema.org** (`schemaorg.owl`)
   - Web semantics, general-purpose
   - Key classes: `schema:Place`, `schema:LandmarksOrHistoricalBuildings`, `schema:Museum`, `schema:Church`, `schema:PlaceOfWorship`
3. **DBpedia Ontology** (`dbpedia_heritage_classes.ttl`, `dbpedia_ontology.owl`)
   - Linked data from Wikipedia
   - Key classes: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:Museum`, `dbo:Library`, `dbo:Archive`
   - **Mappings**: 804-line `dbpedia_wikidata_mappings.ttl` provides `dbo:Class ↔ wd:Q*` equivalences
4. **W3C Org Ontology** (`org.rdf`)
   - Organizational structures
   - Key classes: `org:Organization`, `org:FormalOrganization`
5. **GeoSPARQL** (`geo.ttl`)
   - Spatial features
   - Key classes: `geo:Feature`, `geo:Geometry`
### Supporting Ontologies
- **PROV-O** (`prov.ttl`, `prov-o.rdf`) - Provenance
- **Dublin Core** (`dublin_core_elements.rdf`) - Metadata
- **SKOS** (`skos.rdf`) - Knowledge organization
- **FOAF** (`foaf.ttl`) - Social networks
- **VCARD** (`vcard.rdf`) - Contact information
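
The mapping patterns below write classes as CURIEs such as `crm:E27_Site` or `wd:Q33506`. Here is a minimal sketch for expanding them to full IRIs. The prefix table is an assumption. It lists the commonly published namespace IRIs for each ontology, and the project's LinkML schema may declare different ones (for example, `https://schema.org/` instead of `http://schema.org/`). `PREFIXES` and `expand_curie` are illustrative names.

```python
# Assumed prefix bindings: commonly published namespace IRIs for each ontology.
# Verify against the prefixes actually declared in the project's LinkML schema.
PREFIXES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "org": "http://www.w3.org/ns/org#",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "wd": "http://www.wikidata.org/entity/",
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'crm:E27_Site' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"Unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local
```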
---
## Mapping Strategy by Hypernym Category
### 1. Buildings (33 entries, 11.1%)
**Wikidata Examples**: Q1802963 (mansion), Q317557 (parish church), Q1021645 (office building)

**Ontology Mappings**:
- **Primary**: `crm:E22_Human-Made_Object` (CIDOC-CRM)
- **Secondary**: `dbo:Building` (DBpedia)
- **Web**: `schema:LandmarksOrHistoricalBuildings` (Schema.org for heritage buildings)
- **Specific types**:
  - Churches → `schema:Church`, `schema:PlaceOfWorship`
  - Museums → `schema:Museum`, `dbo:Museum`
  - Historic buildings → `dbo:HistoricBuilding`

**Mapping Pattern**:
```yaml
MANSION:
  meaning: wd:Q1802963
  exact_mappings:
    - crm:E22_Human-Made_Object
    - dbo:Building
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
    - dbo:HistoricBuilding
```
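
Here is a minimal sketch of the "Specific types" refinement for buildings. It assumes subtypes are detected by a keyword in the entry's English label. `SPECIFIC_BUILDING_TYPES` and `specific_building_classes` are hypothetical names. Label matching is only a heuristic; a Wikidata subclass-of lookup would likely be more reliable. The returned classes could then be appended to the entry's `close_mappings`.

```python
# Hypothetical keyword table for the building subtypes listed above.
SPECIFIC_BUILDING_TYPES = {
    "church": ["schema:Church", "schema:PlaceOfWorship"],
    "museum": ["schema:Museum", "dbo:Museum"],
    "historic building": ["dbo:HistoricBuilding"],
}


def specific_building_classes(label: str) -> list[str]:
    """Return subtype classes whose keyword occurs in the label (case-insensitive)."""
    lowered = label.lower()
    found: list[str] = []
    for keyword, classes in SPECIFIC_BUILDING_TYPES.items():
        if keyword in lowered:
            for ontology_class in classes:
                if ontology_class not in found:  # keep each class once
                    found.append(ontology_class)
    return found
```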
---
### 2. Heritage Sites (144 entries, 48.3%)
**Wikidata Examples**: Q3694 (vacation property), Q2927789 (buitenplaats)

**Ontology Mappings**:
- **Primary**: `crm:E27_Site` (CIDOC-CRM physical site)
- **Secondary**: `dbo:HistoricPlace` (DBpedia)
- **Web**: `schema:LandmarksOrHistoricalBuildings`, `schema:TouristAttraction`

**Mapping Pattern**:
```yaml
HERITAGE_SITE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
  close_mappings:
    - dbo:HistoricPlace
    - schema:LandmarksOrHistoricalBuildings
```
---
### 3. Protected Areas (23 entries, 7.7%)
**Wikidata Examples**: National parks, nature reserves, conservation areas

**Ontology Mappings**:
- **Primary**: `crm:E27_Site` (CIDOC-CRM)
- **Web**: `schema:Park`, `schema:Place`
- **Geo**: `geo:Feature` (GeoSPARQL)

**Mapping Pattern**:
```yaml
PROTECTED_AREA:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
    - geo:Feature
  close_mappings:
    - schema:Park
```
---
### 4. Structures (12 entries, 4.0%)
**Wikidata Examples**: Q336164 (sewerage pumping station), Q15710813 (physical structure)

**Ontology Mappings**:
- **Primary**: `crm:E25_Human-Made_Feature` (CIDOC-CRM)
- **Secondary**: `crm:E26_Physical_Feature` (broader)
- **Web**: `schema:Place`

**Mapping Pattern**:
```yaml
STRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - crm:E26_Physical_Feature
```
---
### 5. Museums (8 entries, 2.7%)
**Wikidata Examples**: Military museums, art museums, historical museums

**Ontology Mappings**:
- **Primary**: `schema:Museum` (Schema.org)
- **Secondary**: `dbo:Museum` (DBpedia)
- **Heritage**: `crm:E22_Human-Made_Object` (building as object)

**Mapping Pattern**:
```yaml
MUSEUM:
  meaning: wd:Q33506
  exact_mappings:
    - schema:Museum
    - dbo:Museum
  close_mappings:
    - crm:E22_Human-Made_Object
```
---
### 6. Infrastructure (6 entries, 2.0%)
**Wikidata Examples**: Q376799 (transport infrastructure), Q1311670 (rail infrastructure)

**Ontology Mappings**:
- **Primary**: `crm:E25_Human-Made_Feature` (CIDOC-CRM)
- **Web**: `schema:Place`
- **Note**: Infrastructure is underrepresented in cultural heritage ontologies

**Mapping Pattern**:
```yaml
INFRASTRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - schema:Place
  related_mappings:
    - crm:E26_Physical_Feature
```
---
### 7. Organizations (monasteries, etc.)
**Wikidata Examples**: Q44613 (monastery)

**Ontology Mappings**:
- **Primary**: `org:Organization` (W3C Org)
- **Secondary**: `dbo:Organisation` (DBpedia)
- **But also**: `crm:E22_Human-Made_Object` (monastery as building)

**Note**: Monasteries are BOTH organizations AND buildings, so use a multi-aspect approach (see Rule 1 below).

**Mapping Pattern**:
```yaml
MONASTERY:
  meaning: wd:Q44613
  exact_mappings:
    - org:Organization           # Organizational aspect
    - crm:E22_Human-Made_Object  # Building aspect
  close_mappings:
    - dbo:Organisation
    - schema:PlaceOfWorship
```
---
## General Mapping Rules
### Rule 1: Multiple Mappings (Multi-Aspect Entities)
Many heritage features have MULTIPLE ontological aspects:
```yaml
CASTLE:
  exact_mappings:
    - crm:E22_Human-Made_Object  # Physical building
    - crm:E27_Site               # Historic site
    - dbo:Building               # DBpedia building class
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
```
**Rationale**: A castle is simultaneously:
- A physical building (E22)
- A historic site (E27)
- A landmark (Schema.org)
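
Rule 1 can be applied mechanically by writing one mapping fragment per aspect and then merging the fragments tier by tier. The helper below is a hypothetical sketch and is not part of any existing tooling. Within each tier it keeps first-seen order and drops duplicates.

```python
def merge_aspects(*aspects: dict[str, list[str]]) -> dict[str, list[str]]:
    """Merge per-aspect mapping fragments tier by tier, dropping duplicate classes."""
    merged: dict[str, list[str]] = {
        "exact_mappings": [],
        "close_mappings": [],
        "related_mappings": [],
    }
    for aspect in aspects:
        for tier, classes in aspect.items():
            target = merged.setdefault(tier, [])
            for ontology_class in classes:
                if ontology_class not in target:
                    target.append(ontology_class)
    return merged
```

For a castle, the building aspect and the site aspect could each be written separately and merged. The result contains the same exact classes as the CASTLE entry above, possibly in a different order.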
### Rule 2: Hierarchy (Exact → Close → Related)
```yaml
exact_mappings:
  # Direct equivalence (this IS that class)
  - crm:E27_Site
close_mappings:
  # Close semantic match (this is SIMILAR to that class)
  - dbo:HistoricPlace
  - schema:LandmarksOrHistoricalBuildings
related_mappings:
  # Related but not equivalent (this RELATES to that class)
  - geo:Feature
  - dcterms:Location
```
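
One way to enforce this hierarchy is to keep each class only in its strongest tier. This assumes a class should never appear in more than one tier. `normalize_tiers` is an illustrative helper; it modifies the entry in place and also returns it.

```python
TIERS = ("exact_mappings", "close_mappings", "related_mappings")


def normalize_tiers(entry: dict) -> dict:
    """Keep each class only in its strongest tier (exact > close > related)."""
    seen: set[str] = set()
    for tier in TIERS:  # strongest tier first
        kept = []
        for ontology_class in entry.get(tier, []):
            if ontology_class not in seen:
                kept.append(ontology_class)
                seen.add(ontology_class)
        if tier in entry:
            entry[tier] = kept
    return entry
```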
### Rule 3: Prefer Heritage-Specific Ontologies
**Priority order**:
1. **CIDOC-CRM** (cultural heritage domain standard)
2. **DBpedia** (linked data with Wikidata mappings)
3. **Schema.org** (web semantics, broad coverage)
4. **Domain-specific** (GeoSPARQL for geographic, Org for organizations)
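
When several candidate classes exist, one way to apply this priority is to sort the CURIEs by prefix before choosing a primary class. This is a sketch. The numeric ranks are an assumption derived from the list above, with GeoSPARQL and Org sharing the last rank and unlisted prefixes sorting after everything else.

```python
# Lower rank = higher priority, following the order above.
PRIORITY = {"crm": 0, "dbo": 1, "schema": 2, "geo": 3, "org": 3}
UNRANKED = 99  # any prefix not listed above sorts last


def by_priority(classes: list[str]) -> list[str]:
    """Sort CURIEs by ontology priority; sorted() is stable, so ties keep input order."""
    return sorted(classes, key=lambda curie: PRIORITY.get(curie.split(":", 1)[0], UNRANKED))
```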
### Rule 4: Use DBpedia Wikidata Mappings When Available
**Check first**: `dbpedia_wikidata_mappings.ttl`
```bash
# Example: Look up DBpedia class for Wikidata Q33506 (museum)
grep "wikidata:Q33506" /Users/kempersc/apps/glam/data/ontology/dbpedia_wikidata_mappings.ttl
# Returns: dbo:Museum owl:equivalentClass wikidata:Q33506
```
- **If found**: use the matching `dbo:` class as an exact mapping.
- **If not found**: use a semantic approximation and document it in `mapping_note`.
---
## Implementation Workflow
### Step 1: Automated Mapping (High Confidence)
Use `dbpedia_wikidata_mappings.ttl` to automatically map entries with direct DBpedia equivalents:
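```python
from rdflib import Graph
from rdflib.namespace import OWL

# Load mappings into a dict keyed by Q-ID, e.g. {'Q33506': 'dbo:Museum'}
# (assumes triples of the form: dbo:Museum owl:equivalentClass wikidata:Q33506)
graph = Graph().parse('dbpedia_wikidata_mappings.ttl', format='turtle')
dbpedia_wd_mappings = {
    str(wd).rsplit('/', 1)[-1]: 'dbo:' + str(dbo).rsplit('/', 1)[-1]
    for dbo, wd in graph.subject_objects(OWL.equivalentClass)
}

# For each feature type (feature_types: the enum entries as a list of dicts)
for feature in feature_types:
    q_number = feature['meaning'].split(':', 1)[-1]  # 'wd:Q33506' -> 'Q33506'
    # Check for DBpedia mapping
    if q_number in dbpedia_wd_mappings:
        dbo_class = dbpedia_wd_mappings[q_number]
        feature.setdefault('exact_mappings', []).append(dbo_class)
        feature['mapping_confidence'] = 'high'
```
**Coverage estimate**: ~60-70% of entries (based on DBpedia's GLAM coverage)

---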
### Step 2: Semantic Rule-Based Mapping (Medium Confidence)
Use hypernym categories to apply ontology mapping rules:
```python
# Mapping rules by hypernym
hypernym_rules = {
    'building': ['crm:E22_Human-Made_Object', 'dbo:Building'],
    'heritage site': ['crm:E27_Site', 'dbo:HistoricPlace'],
    'museum': ['schema:Museum', 'dbo:Museum'],
    'park': ['crm:E27_Site', 'schema:Park'],
    'structure': ['crm:E25_Human-Made_Feature'],
    'infrastructure': ['crm:E25_Human-Made_Feature'],
    # ... etc.
}

# Apply rules; entries already mapped in Step 1 keep their 'high' confidence
for feature in feature_types:
    exact = feature.setdefault('exact_mappings', [])
    for hypernym in feature.get('hypernyms', []):  # assumes a list of hypernym labels
        for ontology_class in hypernym_rules.get(hypernym, []):
            if ontology_class not in exact:  # skip classes already added (e.g., by Step 1)
                exact.append(ontology_class)
            feature.setdefault('mapping_confidence', 'medium')
```
**Coverage estimate**: ~25-30% additional entries

---
### Step 3: Manual Review (Low Confidence)
Remaining entries (~5-10%) require manual ontology consultation:
- Read Wikidata descriptions
- Search ontology files for semantic matches
- Document mapping rationale
```yaml
ESOTERIC_FEATURE_TYPE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site  # Default fallback
  mapping_note: "No specific ontology class found. Using general site class."
  mapping_confidence: low
```
---
## Default Fallback Mappings
When no specific mapping is found, use these defaults:
```yaml
# Physical features (default)
exact_mappings:
  - crm:E27_Site  # CIDOC-CRM site (general physical place; subclass of E26_Physical_Feature)
close_mappings:
  - schema:Place  # Schema.org generic place
related_mappings:
  - geo:Feature   # GeoSPARQL spatial feature
```
**Rationale**: Every feature type is AT LEAST:
- A site (E27)
- A place (Schema.org)
- A geographic feature (GeoSPARQL)
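
The fallback can be applied automatically to any entry that still lacks an exact mapping after Steps 1 and 2. The hypothetical `apply_fallback` helper below reuses the default tiers above and the low-confidence note from Step 3. It assumes entries are plain dicts with top-level mapping keys, as in the Step 1 and Step 2 snippets.

```python
# Default tiers, copied from the fallback YAML above.
FALLBACK = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["schema:Place"],
    "related_mappings": ["geo:Feature"],
}


def apply_fallback(entry: dict) -> dict:
    """Add the default mappings to an entry that has no exact mapping yet."""
    if entry.get("exact_mappings"):
        return entry  # already mapped; leave untouched
    for tier, classes in FALLBACK.items():
        target = entry.setdefault(tier, [])
        for ontology_class in classes:
            if ontology_class not in target:
                target.append(ontology_class)
    entry["mapping_confidence"] = "low"
    entry.setdefault("mapping_note", "No specific ontology class found. Using general site class.")
    return entry
```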
---
## Quality Assurance
### Validation Checks
1. **Every entry has at least one exact_mapping**: No orphaned entries
2. **CIDOC-CRM class present**: Cultural heritage standard compliance
3. **Mapping confidence documented**: Transparency about mapping quality
4. **Wikidata Q-number valid**: All `wd:Q*` references resolve
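
The four checks can be expressed as a per-entry function. This is a sketch that rests on two assumptions:

- Entries are plain dicts with top-level keys, as in the Step 1 to 3 snippets.
- Check 4 is reduced to a format check (`wd:Q` followed by digits). Confirming that an ID actually resolves would require a request to Wikidata, which is omitted here.

A CIDOC-CRM class in any tier is accepted, since the MUSEUM pattern places it under `close_mappings`.

```python
import re

# Format check only: 'wd:Q' followed by a positive integer without leading zeros.
WD_CURIE = re.compile(r"^wd:Q[1-9]\d*$")


def validate_entry(name: str, entry: dict) -> list[str]:
    """Return a list of problems for one enum entry; an empty list means it passes."""
    problems = []
    exact = entry.get("exact_mappings", [])
    if not exact:
        problems.append(f"{name}: no exact_mappings")
    all_classes = exact + entry.get("close_mappings", []) + entry.get("related_mappings", [])
    if not any(c.startswith("crm:") for c in all_classes):
        problems.append(f"{name}: no CIDOC-CRM class")
    if entry.get("mapping_confidence") not in {"high", "medium", "low"}:
        problems.append(f"{name}: mapping_confidence missing or invalid")
    if not WD_CURIE.match(entry.get("meaning", "")):
        problems.append(f"{name}: meaning is not a well-formed wd:Q* CURIE")
    return problems
```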
### Confidence Levels
```yaml
mapping_confidence:
  high:    # DBpedia direct equivalence or clear 1:1 match
  medium:  # Semantic rule-based mapping
  low:     # Manual approximation or fallback to general class
```
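
For the mapping quality report, a per-level count is a natural starting point. This is a minimal sketch. It assumes `entries` maps enum names to dicts that carry a top-level `mapping_confidence`, and it counts entries without one as `unset`.

```python
from collections import Counter


def confidence_report(entries: dict[str, dict]) -> dict[str, str]:
    """Count entries per mapping_confidence level, formatted as 'count (share)'."""
    if not entries:
        return {}
    counts = Counter(e.get("mapping_confidence", "unset") for e in entries.values())
    total = len(entries)
    # most_common() lists the largest groups first
    return {level: f"{n} ({n / total:.1%})" for level, n in counts.most_common()}
```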
### Mapping Notes
Document rationale for non-obvious mappings:
```yaml
SCIENTIFIC_FACILITY:
  meaning: wd:Q119459808
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E27_Site      # Physical site aspect
  mapping_note: >-
    DBpedia lacks specific 'scientific facility' class.
    Mapped to Organization (function) + Site (physical).
  mapping_confidence: medium
```
---
## Expected Output Format
```yaml
enums:
  FeatureTypeEnum:
    permissible_values:
      MANSION:
        title: mansion
        description: very large and imposing dwelling house
        meaning: wd:Q1802963
        # NEW: Ontology mappings
        exact_mappings:
          - crm:E22_Human-Made_Object
          - dbo:Building
        close_mappings:
          - schema:LandmarksOrHistoricalBuildings
          - dbo:HistoricBuilding
        related_mappings:
          - geo:Feature
        # NEW: Mapping metadata
        annotations:
          wikidata_id: Q1802963
          wikidata_url: https://www.wikidata.org/wiki/Q1802963
          hypernyms: building
          dbpedia_class: dbo:Building
          cidoc_crm_class: crm:E22_Human-Made_Object
          schema_org_class: schema:LandmarksOrHistoricalBuildings
          mapping_confidence: high
          mapping_date: 2025-11-22
```
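
Most of the `annotations` block can be derived from fields already present on the entry. The sketch below shows one hypothetical way to do it:

- The Wikidata ID and URL come from `meaning`.
- Each `*_class` annotation takes the first class with the matching prefix, searching `exact_mappings` first and then `close_mappings`.

That selection rule is an assumption, although it reproduces the MANSION example above.

```python
from datetime import date


def build_annotations(entry: dict, hypernym: str, confidence: str, when: date) -> dict:
    """Derive the annotation fields for one enum entry (hypothetical helper)."""
    qid = entry["meaning"].split(":", 1)[1]  # 'wd:Q1802963' -> 'Q1802963'
    # Candidate classes in tier order: exact first, then close
    ordered = entry.get("exact_mappings", []) + entry.get("close_mappings", [])

    def first_with_prefix(prefix: str):
        return next((c for c in ordered if c.startswith(prefix + ":")), None)

    return {
        "wikidata_id": qid,
        "wikidata_url": f"https://www.wikidata.org/wiki/{qid}",
        "hypernyms": hypernym,
        "dbpedia_class": first_with_prefix("dbo"),
        "cidoc_crm_class": first_with_prefix("crm"),
        "schema_org_class": first_with_prefix("schema"),
        "mapping_confidence": confidence,
        "mapping_date": when.isoformat(),
    }
```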
---
## Implementation Plan
### Phase 1: Automated Mapping (2 hours)
1. Parse `dbpedia_wikidata_mappings.ttl`
2. Create hypernym → ontology class rules
3. Apply automated mapping to all 298 entries
4. Generate updated `FeatureTypeEnum.yaml`
### Phase 2: Manual Review (3 hours)
1. Review entries with `mapping_confidence: low`
2. Search ontology files for better matches
3. Document mapping rationale
4. Update entries with improved mappings
### Phase 3: Validation (1 hour)
1. Check all entries have exact_mappings
2. Verify CIDOC-CRM coverage
3. Validate Wikidata Q-numbers
4. Generate mapping quality report
### Phase 4: Documentation (1 hour)
1. Update AGENTS.md with mapping workflow
2. Create ontology mapping reference guide
3. Generate mapping statistics report
4. Update FeaturePlace.yaml with ontology references
**Total estimated time**: 7 hours

---
## References
- **CIDOC-CRM Specification**: http://www.cidoc-crm.org/html/cidoc_crm_v7.1.3.html
- **Schema.org**: https://schema.org/
- **DBpedia Ontology**: https://dbpedia.org/ontology/
- **DBpedia Wikidata Mappings**: `/data/ontology/dbpedia_wikidata_mappings.ttl`
- **DBpedia Heritage Classes**: `/data/ontology/dbpedia_heritage_classes.ttl`
- **GeoSPARQL**: https://www.ogc.org/standards/geosparql
---
**Next Step**: Implement Phase 1 automated mapping script