# FeaturePlace Ontology Mapping Strategy

**Date**: 2025-11-22

**Task**: Map 298 Wikidata feature types to ontology classes from `/data/ontology/`

---

## Ontology Sources Available

### Primary Ontologies

1. **CIDOC-CRM** (`CIDOC_CRM_v7.1.3.rdf`)
   - Cultural heritage domain standard
   - Key classes: `E27_Site`, `E22_Human-Made_Object`, `E25_Human-Made_Feature`, `E26_Physical_Feature`

2. **Schema.org** (`schemaorg.owl`)
   - Web semantics, general-purpose
   - Key classes: `schema:Place`, `schema:LandmarksOrHistoricalBuildings`, `schema:Museum`, `schema:Church`, `schema:PlaceOfWorship`

3. **DBpedia Ontology** (`dbpedia_heritage_classes.ttl`, `dbpedia_ontology.owl`)
   - Linked data from Wikipedia
   - Key classes: `dbo:Building`, `dbo:HistoricBuilding`, `dbo:Museum`, `dbo:Library`, `dbo:Archive`
   - **Mappings**: 804-line `dbpedia_wikidata_mappings.ttl` provides `dbo:Class ↔ wd:Q*` equivalences

4. **W3C Org Ontology** (`org.rdf`)
   - Organizational structures
   - Key classes: `org:Organization`, `org:FormalOrganization`

5. **GeoSPARQL** (`geo.ttl`)
   - Spatial features
   - Key classes: `geo:Feature`, `geo:Geometry`

### Supporting Ontologies

- **PROV-O** (`prov.ttl`, `prov-o.rdf`) - Provenance
- **Dublin Core** (`dublin_core_elements.rdf`) - Metadata
- **SKOS** (`skos.rdf`) - Knowledge organization
- **FOAF** (`foaf.ttl`) - Social networks
- **VCARD** (`vcard.rdf`) - Contact information
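
Every class in this document is written as a CURIE (`crm:`, `dbo:`, `schema:`, `org:`, `geo:`, `wd:`), and any script that emits RDF has to expand those prefixes to IRIs. The sketch below assumes a hypothetical `PREFIXES` table and `expand_curie` helper. The namespace IRIs are the commonly published ones, so check them against the files in `/data/ontology/`. For example, a local `schemaorg.owl` may use `http://schema.org/` rather than `https://schema.org/`.

```python
# Hypothetical prefix table for the CURIEs used in this document.
# Verify each namespace against the local ontology files before relying on it.
PREFIXES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "https://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "org": "http://www.w3.org/ns/org#",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
    "wd": "http://www.wikidata.org/entity/",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'dbo:Museum' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"Unknown prefix in CURIE: {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("crm:E27_Site"))  # http://www.cidoc-crm.org/cidoc-crm/E27_Site
```
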

---

## Mapping Strategy by Hypernym Category

### 1. Buildings (33 entries, 11.1%)

**Wikidata Examples**: Q1802963 (mansion), Q317557 (parish church), Q1021645 (office building)

**Ontology Mappings**:
- **Primary**: `crm:E22_Human-Made_Object` (CIDOC-CRM)
- **Secondary**: `dbo:Building` (DBpedia)
- **Web**: `schema:LandmarksOrHistoricalBuildings` (Schema.org for heritage buildings)
- **Specific types**:
  - Churches → `schema:Church`, `schema:PlaceOfWorship`
  - Museums → `schema:Museum`, `dbo:Museum`
  - Historic buildings → `dbo:HistoricBuilding`

**Mapping Pattern**:
```yaml
MANSION:
  meaning: wd:Q1802963
  exact_mappings:
    - crm:E22_Human-Made_Object
    - dbo:Building
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
    - dbo:HistoricBuilding
```
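
The **Specific types** list can be read as classes layered on top of the generic building mappings. The sketch below assumes a hypothetical `SUBTYPE_MAPPINGS` table keyed by subtype label. Historic buildings use the tier shown in the MANSION pattern, while the tier for churches is an assumption. Museums are omitted because section 5 gives them their own pattern.

```python
# Generic building mappings, as in the MANSION pattern above
BUILDING_BASE = {
    "exact_mappings": ["crm:E22_Human-Made_Object", "dbo:Building"],
    "close_mappings": ["schema:LandmarksOrHistoricalBuildings"],
}

# Hypothetical subtype overrides from the "Specific types" list.
# Tier placement: historic buildings follow MANSION; the church tier is assumed.
SUBTYPE_MAPPINGS = {
    "church": {"close_mappings": ["schema:Church", "schema:PlaceOfWorship"]},
    "historic building": {"close_mappings": ["dbo:HistoricBuilding"]},
}


def building_mappings(subtype: str = "") -> dict:
    """Merge the generic building mappings with any subtype-specific classes."""
    result = {tier: list(curies) for tier, curies in BUILDING_BASE.items()}  # copy, never mutate the base
    for tier, curies in SUBTYPE_MAPPINGS.get(subtype, {}).items():
        bucket = result.setdefault(tier, [])
        for curie in curies:
            if curie not in bucket:
                bucket.append(curie)
    return result


print(building_mappings("church")["close_mappings"])
# ['schema:LandmarksOrHistoricalBuildings', 'schema:Church', 'schema:PlaceOfWorship']
```

With the subtype `"historic building"`, the function returns the same close mappings as the MANSION entry.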

---

### 2. Heritage Sites (144 entries, 48.3%)

**Wikidata Examples**: Q3694 (vacation property), Q2927789 (buitenplaats)

**Ontology Mappings**:
- **Primary**: `crm:E27_Site` (CIDOC-CRM physical site)
- **Secondary**: `dbo:HistoricPlace` (DBpedia)
- **Web**: `schema:LandmarksOrHistoricalBuildings`, `schema:TouristAttraction`

**Mapping Pattern**:
```yaml
HERITAGE_SITE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
  close_mappings:
    - dbo:HistoricPlace
    - schema:LandmarksOrHistoricalBuildings
```

---

### 3. Protected Areas (23 entries, 7.7%)

**Wikidata Examples**: National parks, nature reserves, conservation areas

**Ontology Mappings**:
- **Primary**: `crm:E27_Site` (CIDOC-CRM)
- **Web**: `schema:Park`, `schema:Place`
- **Geo**: `geo:Feature` (GeoSPARQL)

**Mapping Pattern**:
```yaml
PROTECTED_AREA:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
    - geo:Feature
  close_mappings:
    - schema:Park
```

---

### 4. Structures (12 entries, 4.0%)

**Wikidata Examples**: Q336164 (sewerage pumping station), Q15710813 (physical structure)

**Ontology Mappings**:
- **Primary**: `crm:E25_Human-Made_Feature` (CIDOC-CRM)
- **Secondary**: `crm:E26_Physical_Feature` (broader)
- **Web**: `schema:Place`

**Mapping Pattern**:
```yaml
STRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - crm:E26_Physical_Feature
```

---

### 5. Museums (8 entries, 2.7%)

**Wikidata Examples**: Military museums, art museums, historical museums

**Ontology Mappings**:
- **Primary**: `schema:Museum` (Schema.org)
- **Secondary**: `dbo:Museum` (DBpedia)
- **Heritage**: `crm:E22_Human-Made_Object` (building as object)

**Mapping Pattern**:
```yaml
MUSEUM:
  meaning: wd:Q33506
  exact_mappings:
    - schema:Museum
    - dbo:Museum
  close_mappings:
    - crm:E22_Human-Made_Object
```

---

### 6. Infrastructure (6 entries, 2.0%)

**Wikidata Examples**: Q376799 (transport infrastructure), Q1311670 (rail infrastructure)

**Ontology Mappings**:
- **Primary**: `crm:E25_Human-Made_Feature` (CIDOC-CRM)
- **Web**: `schema:Place`
- **Note**: Infrastructure is underrepresented in cultural heritage ontologies

**Mapping Pattern**:
```yaml
INFRASTRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - schema:Place
  related_mappings:
    - crm:E26_Physical_Feature
```

---

### 7. Organizations (monasteries, etc.)

**Wikidata Examples**: Q44613 (monastery)

**Ontology Mappings**:
- **Primary**: `org:Organization` (W3C Org)
- **Secondary**: `dbo:Organisation` (DBpedia)
- **But also**: `crm:E22_Human-Made_Object` (monastery as building)

**Note**: Monasteries are BOTH organizations AND buildings, so use the multi-aspect approach (Rule 1).

**Mapping Pattern**:
```yaml
MONASTERY:
  meaning: wd:Q44613
  exact_mappings:
    - org:Organization           # Organizational aspect
    - crm:E22_Human-Made_Object  # Building aspect
  close_mappings:
    - dbo:Organisation
    - schema:PlaceOfWorship
```

---

## General Mapping Rules

### Rule 1: Multiple Mappings (Multi-Aspect Entities)

Many heritage features have MULTIPLE ontological aspects:

```yaml
CASTLE:
  exact_mappings:
    - crm:E22_Human-Made_Object  # Physical building
    - crm:E27_Site               # Historic site
    - dbo:Building               # DBpedia building class
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
```

**Rationale**: A castle is simultaneously:
- A physical building (E22)
- A historic site (E27)
- A landmark (Schema.org)
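
One way to assemble a multi-aspect entry is to define each aspect's mappings separately and then merge them tier by tier, removing duplicates while preserving order. The sketch below uses the CASTLE aspects. The aspect dictionaries and `merge_aspects` are hypothetical names, not part of an existing tool.

```python
from typing import Dict, List

Mappings = Dict[str, List[str]]

# Aspect definitions taken from the CASTLE example above
PHYSICAL_BUILDING: Mappings = {"exact_mappings": ["crm:E22_Human-Made_Object", "dbo:Building"]}
HISTORIC_SITE: Mappings = {"exact_mappings": ["crm:E27_Site"]}
LANDMARK: Mappings = {"close_mappings": ["schema:LandmarksOrHistoricalBuildings"]}


def merge_aspects(*aspects: Mappings) -> Mappings:
    """Union the mappings of several aspects, tier by tier, without duplicates."""
    merged: Mappings = {}
    for aspect in aspects:
        for tier, curies in aspect.items():
            bucket = merged.setdefault(tier, [])
            for curie in curies:
                if curie not in bucket:
                    bucket.append(curie)
    return merged


castle = merge_aspects(PHYSICAL_BUILDING, HISTORIC_SITE, LANDMARK)
print(castle["exact_mappings"])
# ['crm:E22_Human-Made_Object', 'dbo:Building', 'crm:E27_Site']
```
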

### Rule 2: Hierarchy (Exact → Close → Related)

```yaml
exact_mappings:
  # Direct equivalence (this IS that class)
  - crm:E27_Site

close_mappings:
  # Close semantic match (this is SIMILAR to that class)
  - dbo:HistoricPlace
  - schema:LandmarksOrHistoricalBuildings

related_mappings:
  # Related but not equivalent (this RELATES to that class)
  - geo:Feature
  - dcterms:Location
```
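
In LinkML, the `exact_mappings`, `close_mappings` and `related_mappings` slots correspond to `skos:exactMatch`, `skos:closeMatch` and `skos:relatedMatch`, so the three tiers can be exported as SKOS statements. The sketch below builds N-Triples lines with plain string formatting. The `to_ntriples` helper and the abbreviated prefix table are for illustration only, and the subject shown is the buitenplaats example from section 2.

```python
PREFIXES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "dbo": "http://dbpedia.org/ontology/",
    "schema": "https://schema.org/",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "dcterms": "http://purl.org/dc/terms/",
    "wd": "http://www.wikidata.org/entity/",
}
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Mapping tier -> SKOS mapping property
TIER_TO_SKOS = {
    "exact_mappings": SKOS + "exactMatch",
    "close_mappings": SKOS + "closeMatch",
    "related_mappings": SKOS + "relatedMatch",
}


def expand(curie: str) -> str:
    """'crm:E27_Site' -> full IRI (raises KeyError for an unknown prefix)."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def to_ntriples(subject: str, entry: dict) -> list:
    """One N-Triples statement per mapped class, in tier order."""
    return [
        f"<{expand(subject)}> <{predicate}> <{expand(curie)}> ."
        for tier, predicate in TIER_TO_SKOS.items()
        for curie in entry.get(tier, [])
    ]


heritage_site = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["dbo:HistoricPlace", "schema:LandmarksOrHistoricalBuildings"],
    "related_mappings": ["geo:Feature", "dcterms:Location"],
}
for line in to_ntriples("wd:Q2927789", heritage_site):
    print(line)
```
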

### Rule 3: Prefer Heritage-Specific Ontologies

**Priority order**:
1. **CIDOC-CRM** (cultural heritage domain standard)
2. **DBpedia** (linked data with Wikidata mappings)
3. **Schema.org** (web semantics, broad coverage)
4. **Domain-specific** (GeoSPARQL for geographic, Org for organizations)
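
When several candidate classes are found, the priority order can also serve as a sort key. In the sketch below, the `PRIORITY` ranks and the choice of `geo:` and `org:` as the "domain-specific" tier are assumptions.

```python
# Ranks per Rule 3 (lower is preferred). Treating geo: and org: as the
# "domain-specific" tier is an assumption of this sketch.
PRIORITY = {"crm": 1, "dbo": 2, "schema": 3, "geo": 4, "org": 4}


def by_priority(curies: list) -> list:
    """Order candidate classes by ontology priority; unknown prefixes sort last.

    sorted() is stable, so classes with the same rank keep their input order.
    """
    return sorted(curies, key=lambda curie: PRIORITY.get(curie.split(":", 1)[0], 99))


candidates = ["schema:Museum", "geo:Feature", "dbo:Museum", "crm:E22_Human-Made_Object"]
print(by_priority(candidates))
# ['crm:E22_Human-Made_Object', 'dbo:Museum', 'schema:Museum', 'geo:Feature']
```
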

### Rule 4: Use DBpedia Wikidata Mappings When Available

**Check first**: `dbpedia_wikidata_mappings.ttl`

```bash
# Example: look up the DBpedia class for Wikidata Q33506 (museum).
# -w matches whole words only, so Q33506 does not also match e.g. Q335060.
grep -w "wikidata:Q33506" /Users/kempersc/apps/glam/data/ontology/dbpedia_wikidata_mappings.ttl
# Returns: dbo:Museum owl:equivalentClass wikidata:Q33506
```

**If found**: Use the matching `dbo:` class as an exact mapping

**If not found**: Use semantic approximation + document in `mapping_note`
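
Both Rule 4 branches can be sketched with a regular expression over the line format shown in the `grep` example. This assumes that every equivalence appears on a single line as `dbo:X owl:equivalentClass wikidata:Q<id>`. Turtle that spreads predicates across lines with `;` would be missed, which makes a real RDF parser the safer choice. `load_equivalences` and `apply_rule_4` are hypothetical names.

```python
import re

# Assumed line shape, as in the grep output: dbo:Museum owl:equivalentClass wikidata:Q33506 .
EQUIVALENCE = re.compile(r"(dbo:\w+)\s+owl:equivalentClass\s+wikidata:(Q\d+)\b")


def load_equivalences(ttl_text: str) -> dict:
    """Map Wikidata QIDs to DBpedia classes, e.g. {'Q33506': 'dbo:Museum'}."""
    return {qid: dbo_class for dbo_class, qid in EQUIVALENCE.findall(ttl_text)}


def apply_rule_4(entry: dict, equivalences: dict) -> dict:
    """Add the DBpedia class as an exact mapping, or record why there is none."""
    qid = entry["meaning"].split(":", 1)[1]  # "wd:Q33506" -> "Q33506"
    if qid in equivalences:
        entry.setdefault("exact_mappings", []).append(equivalences[qid])
    else:
        entry["mapping_note"] = f"No DBpedia equivalent for {qid}; semantic approximation used."
    return entry


sample = "dbo:Museum owl:equivalentClass wikidata:Q33506 .\n"
print(apply_rule_4({"meaning": "wd:Q33506"}, load_equivalences(sample)))
# {'meaning': 'wd:Q33506', 'exact_mappings': ['dbo:Museum']}
```
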

---

## Implementation Workflow

### Step 1: Automated Mapping (High Confidence)

Use `dbpedia_wikidata_mappings.ttl` to automatically map entries with direct DBpedia equivalents:

```python
from rdflib import Graph
from rdflib.namespace import OWL

DBO = "http://dbpedia.org/ontology/"
WD = "http://www.wikidata.org/entity/"  # assumed expansion of the file's wikidata: prefix

# Load mappings into a dict such as {"Q33506": "dbo:Museum"}, whichever way the triple is written
graph = Graph().parse("dbpedia_wikidata_mappings.ttl", format="turtle")
dbpedia_wd_mappings = {}
for s, _, o in graph.triples((None, OWL.equivalentClass, None)):
    for dbo_iri, wd_iri in ((str(s), str(o)), (str(o), str(s))):
        if dbo_iri.startswith(DBO) and wd_iri.startswith(WD):
            dbpedia_wd_mappings[wd_iri[len(WD):]] = "dbo:" + dbo_iri[len(DBO):]

# For each feature type (feature_types: the enum's permissible values as dicts)
for feature in feature_types:
    q_number = feature["meaning"].removeprefix("wd:")  # "wd:Q33506" -> "Q33506"
    if q_number in dbpedia_wd_mappings:  # direct DBpedia equivalent exists
        feature.setdefault("exact_mappings", []).append(dbpedia_wd_mappings[q_number])
        feature["mapping_confidence"] = "high"
```

**Coverage estimate**: ~60-70% of entries (based on DBpedia's GLAM coverage)

---

### Step 2: Semantic Rule-Based Mapping (Medium Confidence)

Use hypernym categories to apply ontology mapping rules:

```python
# Mapping rules by hypernym
hypernym_rules = {
    'building': ['crm:E22_Human-Made_Object', 'dbo:Building'],
    'heritage site': ['crm:E27_Site', 'dbo:HistoricPlace'],
    'museum': ['schema:Museum', 'dbo:Museum'],
    'park': ['crm:E27_Site', 'schema:Park'],
    'structure': ['crm:E25_Human-Made_Feature'],
    'infrastructure': ['crm:E25_Human-Made_Feature'],
    # ... etc.
}

# Apply rules (feature['hypernyms'] is assumed to be a list of hypernym labels)
for feature in feature_types:
    exact = feature.setdefault('exact_mappings', [])
    for hypernym in feature.get('hypernyms', []):
        for curie in hypernym_rules.get(hypernym, []):
            if curie not in exact:  # skip classes already added, e.g. by Step 1
                exact.append(curie)
    # Keep 'high' from Step 1; otherwise a rule match means medium confidence
    if exact and 'mapping_confidence' not in feature:
        feature['mapping_confidence'] = 'medium'
```

**Coverage estimate**: ~25-30% additional entries

---

### Step 3: Manual Review (Low Confidence)

Remaining entries (~5-10%) require manual ontology consultation:
- Read Wikidata descriptions
- Search ontology files for semantic matches
- Document mapping rationale

```yaml
ESOTERIC_FEATURE_TYPE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site  # Default fallback
  mapping_note: "No specific ontology class found. Using general site class."
  mapping_confidence: low
```
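
Part of the "search ontology files" step can be automated by shortlisting classes whose labels share words with the feature's Wikidata label. The reviewer still makes the final decision. The sketch below assumes a hypothetical `LABEL_INDEX` of class labels already extracted from the ontology files. Its tokenizer is deliberately crude.

```python
import re

# Hypothetical label index; in practice it would be built from the rdfs:label
# values in the ontology files under /data/ontology/.
LABEL_INDEX = {
    "crm:E27_Site": "Site",
    "crm:E25_Human-Made_Feature": "Human-Made Feature",
    "dbo:HistoricBuilding": "historic building",
    "schema:LandmarksOrHistoricalBuildings": "Landmarks or Historical Buildings",
}


def tokens(text: str) -> set:
    """Lower-cased words with trailing 's' characters stripped (a crude plural fix)."""
    return {word.rstrip("s") for word in re.findall(r"[a-z]+", text.lower())}


def candidate_classes(feature_label: str, index: dict = LABEL_INDEX) -> list:
    """Classes sharing at least one word with the label, largest overlap first."""
    wanted = tokens(feature_label)
    scored = [(len(wanted & tokens(label)), curie) for curie, label in index.items()]
    return [curie for score, curie in sorted(scored, reverse=True) if score > 0]


print(candidate_classes("historic country building"))
# ['dbo:HistoricBuilding', 'schema:LandmarksOrHistoricalBuildings']
```
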

---

## Default Fallback Mappings

When no specific mapping is found, use these defaults:

```yaml
# Physical features (default)
exact_mappings:
  - crm:E27_Site  # CIDOC-CRM site (general default; E26_Physical_Feature is its broader superclass)

close_mappings:
  - schema:Place  # Schema.org generic place

related_mappings:
  - geo:Feature   # GeoSPARQL spatial feature
```

**Rationale**: Every feature type is AT LEAST:
- A site (E27)
- A place (Schema.org)
- A geographic feature (GeoSPARQL)
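
Applied mechanically, the fallback should fill in only what is missing. The sketch below assumes a hypothetical `apply_fallback` helper that acts only on entries without `exact_mappings`. It reuses the note text from the Step 3 example and sets the confidence to `low`, which is the level reserved for fallback mappings.

```python
# Default tiers from the block above
DEFAULT_MAPPINGS = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["schema:Place"],
    "related_mappings": ["geo:Feature"],
}
DEFAULT_NOTE = "No specific ontology class found. Using general site class."


def apply_fallback(entry: dict) -> dict:
    """Give an entry without exact_mappings the default tiers and low confidence."""
    if entry.get("exact_mappings"):
        return entry  # already mapped by an earlier step; leave untouched
    for tier, curies in DEFAULT_MAPPINGS.items():
        bucket = entry.setdefault(tier, [])  # the entry's own list, never the shared default
        for curie in curies:
            if curie not in bucket:
                bucket.append(curie)
    entry.setdefault("mapping_note", DEFAULT_NOTE)
    entry["mapping_confidence"] = "low"
    return entry


print(apply_fallback({"meaning": "wd:Q???"})["exact_mappings"])  # ['crm:E27_Site']
```
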

---

## Quality Assurance

### Validation Checks

1. **Every entry has at least one exact_mapping**: No orphaned entries
2. **CIDOC-CRM class present**: Cultural heritage standard compliance
3. **Mapping confidence documented**: Transparency about mapping quality
4. **Wikidata Q-number valid**: All `wd:Q*` references resolve
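
The four checks are mechanical enough to script. The sketch below assumes that entries are plain dicts shaped like the YAML in this document and that `mapping_confidence` may appear either on the entry or under `annotations`. In this sketch, check 4 only verifies that `meaning` is a well-formed `wd:Q<digits>` CURIE. Confirming that the item actually resolves would require a Wikidata lookup.

```python
import re

QID_CURIE = re.compile(r"wd:Q[1-9][0-9]*")
ALLOWED_CONFIDENCE = {"high", "medium", "low"}
TIERS = ("exact_mappings", "close_mappings", "related_mappings")


def validate_entry(name: str, entry: dict) -> list:
    """Return the problems found for one permissible value (empty list = passes)."""
    problems = []
    # Check 1: at least one exact mapping
    if not entry.get("exact_mappings"):
        problems.append(f"{name}: no exact_mappings")
    # Check 2: a CIDOC-CRM class in any tier
    if not any(c.startswith("crm:") for tier in TIERS for c in entry.get(tier, [])):
        problems.append(f"{name}: no CIDOC-CRM class")
    # Check 3: confidence recorded on the entry or under annotations
    confidence = entry.get("mapping_confidence") or entry.get("annotations", {}).get("mapping_confidence")
    if confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"{name}: mapping_confidence missing or invalid")
    # Check 4 (format only): meaning must look like wd:Q<digits>
    if not QID_CURIE.fullmatch(str(entry.get("meaning", ""))):
        problems.append(f"{name}: meaning is not a well-formed wd:Q CURIE")
    return problems


print(validate_entry("HERITAGE_SITE", {"meaning": "wd:Q???", "exact_mappings": ["crm:E27_Site"]}))
# ['HERITAGE_SITE: mapping_confidence missing or invalid', 'HERITAGE_SITE: meaning is not a well-formed wd:Q CURIE']
```
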

### Confidence Levels

```yaml
mapping_confidence:
  high:    # DBpedia direct equivalence or clear 1:1 match
  medium:  # Semantic rule-based mapping
  low:     # Manual approximation or fallback to general class
```

### Mapping Notes

Document rationale for non-obvious mappings:

```yaml
SCIENTIFIC_FACILITY:
  meaning: wd:Q119459808
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E27_Site      # Physical site aspect
  mapping_note: >-
    DBpedia lacks a specific 'scientific facility' class.
    Mapped to Organization (function) + Site (physical).
  mapping_confidence: medium
```

---

## Expected Output Format

```yaml
enums:
  FeatureTypeEnum:
    permissible_values:
      MANSION:
        title: mansion
        description: very large and imposing dwelling house
        meaning: wd:Q1802963

        # NEW: Ontology mappings
        exact_mappings:
          - crm:E22_Human-Made_Object
          - dbo:Building

        close_mappings:
          - schema:LandmarksOrHistoricalBuildings
          - dbo:HistoricBuilding

        related_mappings:
          - geo:Feature

        # NEW: Mapping metadata
        annotations:
          wikidata_id: Q1802963
          wikidata_url: https://www.wikidata.org/wiki/Q1802963
          hypernyms: building
          dbpedia_class: dbo:Building
          cidoc_crm_class: crm:E22_Human-Made_Object
          schema_org_class: schema:LandmarksOrHistoricalBuildings
          mapping_confidence: high
          mapping_date: 2025-11-22
```
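
Most of the `annotations` block can be derived from fields already present on the entry. The sketch below assumes a hypothetical `build_annotations` helper that takes the first matching CURIE from the exact tier and then from the close tier. With the MANSION inputs, it reproduces the annotation values shown above.

```python
def first_with_prefix(prefix: str, curies: list):
    """First CURIE with the given prefix, or None if there is none."""
    return next((c for c in curies if c.startswith(prefix + ":")), None)


def build_annotations(qid, hypernym, exact, close, confidence, mapping_date):
    """Derive a MANSION-style annotations block from an entry's mappings."""
    candidates = exact + close  # exact tier takes precedence over close tier
    return {
        "wikidata_id": qid,
        "wikidata_url": f"https://www.wikidata.org/wiki/{qid}",
        "hypernyms": hypernym,
        "dbpedia_class": first_with_prefix("dbo", candidates),
        "cidoc_crm_class": first_with_prefix("crm", candidates),
        "schema_org_class": first_with_prefix("schema", candidates),
        "mapping_confidence": confidence,
        "mapping_date": mapping_date,
    }


ann = build_annotations(
    "Q1802963", "building",
    exact=["crm:E22_Human-Made_Object", "dbo:Building"],
    close=["schema:LandmarksOrHistoricalBuildings", "dbo:HistoricBuilding"],
    confidence="high", mapping_date="2025-11-22",
)
print(ann["dbpedia_class"], ann["cidoc_crm_class"])  # dbo:Building crm:E22_Human-Made_Object
```
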

---

## Implementation Plan

### Phase 1: Automated Mapping (2 hours)
1. Parse `dbpedia_wikidata_mappings.ttl`
2. Create hypernym → ontology class rules
3. Apply automated mapping to all 298 entries
4. Generate updated `FeatureTypeEnum.yaml`

### Phase 2: Manual Review (3 hours)
1. Review entries with `mapping_confidence: low`
2. Search ontology files for better matches
3. Document mapping rationale
4. Update entries with improved mappings

### Phase 3: Validation (1 hour)
1. Check all entries have exact_mappings
2. Verify CIDOC-CRM coverage
3. Validate Wikidata Q-numbers
4. Generate mapping quality report

### Phase 4: Documentation (1 hour)
1. Update AGENTS.md with mapping workflow
2. Create ontology mapping reference guide
3. Generate mapping statistics report
4. Update FeaturePlace.yaml with ontology references
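
The quality and statistics reports can start from two numbers: the count of entries at each confidence level and the share of entries with any CIDOC-CRM class. The sketch below works over entries keyed by enum name. `mapping_statistics` is a hypothetical helper, and the report layout is an assumption.

```python
from collections import Counter

TIERS = ("exact_mappings", "close_mappings", "related_mappings")


def confidence_of(entry: dict) -> str:
    """Confidence on the entry itself or under annotations, else 'unknown'."""
    return (entry.get("mapping_confidence")
            or entry.get("annotations", {}).get("mapping_confidence")
            or "unknown")


def has_cidoc_class(entry: dict) -> bool:
    """True if any mapping tier contains a crm: class."""
    return any(c.startswith("crm:") for tier in TIERS for c in entry.get(tier, []))


def mapping_statistics(entries: dict) -> dict:
    """Summarise entries (keyed by enum name) by confidence and CIDOC-CRM coverage."""
    total = len(entries)
    with_crm = sum(has_cidoc_class(e) for e in entries.values())
    return {
        "total": total,
        "by_confidence": dict(Counter(confidence_of(e) for e in entries.values())),
        "cidoc_crm_coverage_pct": round(100 * with_crm / total, 1) if total else 0.0,
    }


print(mapping_statistics({
    "MANSION": {"exact_mappings": ["crm:E22_Human-Made_Object"],
                "annotations": {"mapping_confidence": "high"}},
    "MUSEUM": {"exact_mappings": ["schema:Museum"], "mapping_confidence": "medium"},
}))
# {'total': 2, 'by_confidence': {'high': 1, 'medium': 1}, 'cidoc_crm_coverage_pct': 50.0}
```
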

**Total estimated time**: 7 hours

---

## References

- **CIDOC-CRM Specification**: http://www.cidoc-crm.org/html/cidoc_crm_v7.1.3.html
- **Schema.org**: https://schema.org/
- **DBpedia Ontology**: https://dbpedia.org/ontology/
- **DBpedia Wikidata Mappings**: `/data/ontology/dbpedia_wikidata_mappings.ttl`
- **DBpedia Heritage Classes**: `/data/ontology/dbpedia_heritage_classes.ttl`
- **GeoSPARQL**: https://www.ogc.org/standards/geosparql

---

**Next Step**: Implement Phase 1 automated mapping script