FeaturePlace Ontology Mapping Strategy
Date: 2025-11-22
Task: Map 298 Wikidata feature types to ontology classes from /data/ontology/
Ontology Sources Available
Primary Ontologies
- CIDOC-CRM (CIDOC_CRM_v7.1.3.rdf) - Cultural heritage domain standard
  - Key classes: E27_Site, E22_Human-Made_Object, E25_Human-Made_Feature, E26_Physical_Feature
- Schema.org (schemaorg.owl) - Web semantics, general-purpose
  - Key classes: schema:Place, schema:LandmarksOrHistoricalBuildings, schema:Museum, schema:Church, schema:PlaceOfWorship
- DBpedia Ontology (dbpedia_heritage_classes.ttl, dbpedia_ontology.owl) - Linked data from Wikipedia
  - Key classes: dbo:Building, dbo:HistoricBuilding, dbo:Museum, dbo:Library, dbo:Archive
  - Mappings: 804-line dbpedia_wikidata_mappings.ttl provides dbo:Class ↔ wd:Q* equivalences
- W3C Org Ontology (org.rdf) - Organizational structures
  - Key classes: org:Organization, org:FormalOrganization
- GeoSPARQL (geo.ttl) - Spatial features
  - Key classes: geo:Feature, geo:Geometry
Supporting Ontologies
- PROV-O (prov.ttl, prov-o.rdf) - Provenance
- Dublin Core (dublin_core_elements.rdf) - Metadata
- SKOS (skos.rdf) - Knowledge organization
- FOAF (foaf.ttl) - Social networks
- VCARD (vcard.rdf) - Contact information
Mapping Strategy by Hypernym Category
1. Buildings (33 entries, 11.1%)
Wikidata Examples: Q1802963 (mansion), Q317557 (parish church), Q1021645 (office building)
Ontology Mappings:
- Primary: crm:E22_Human-Made_Object (CIDOC-CRM)
- Secondary: dbo:Building (DBpedia)
- Web: schema:LandmarksOrHistoricalBuildings (Schema.org for heritage buildings)
- Specific types:
  - Churches → schema:Church, schema:PlaceOfWorship
  - Museums → schema:Museum, dbo:Museum
  - Historic buildings → dbo:HistoricBuilding
Mapping Pattern:
MANSION:
  meaning: wd:Q1802963
  exact_mappings:
    - crm:E22_Human-Made_Object
    - dbo:Building
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
    - dbo:HistoricBuilding
2. Heritage Sites (144 entries, 48.3%)
Wikidata Examples: Q3694 (vacation property), Q2927789 (buitenplaats)
Ontology Mappings:
- Primary: crm:E27_Site (CIDOC-CRM physical site)
- Secondary: dbo:HistoricPlace (DBpedia)
- Web: schema:LandmarksOrHistoricalBuildings, schema:TouristAttraction
Mapping Pattern:
HERITAGE_SITE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
  close_mappings:
    - dbo:HistoricPlace
    - schema:LandmarksOrHistoricalBuildings
3. Protected Areas (23 entries, 7.7%)
Wikidata Examples: National parks, nature reserves, conservation areas
Ontology Mappings:
- Primary: crm:E27_Site (CIDOC-CRM)
- Web: schema:Park, schema:Place
- Geo: geo:Feature (GeoSPARQL)
Mapping Pattern:
PROTECTED_AREA:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
    - geo:Feature
  close_mappings:
    - schema:Park
4. Structures (12 entries, 4.0%)
Wikidata Examples: Q336164 (sewerage pumping station), Q15710813 (physical structure)
Ontology Mappings:
- Primary: crm:E25_Human-Made_Feature (CIDOC-CRM)
- Secondary: crm:E26_Physical_Feature (broader)
- Web: schema:Place
Mapping Pattern:
STRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - crm:E26_Physical_Feature
5. Museums (8 entries, 2.7%)
Wikidata Examples: Military museums, art museums, historical museums
Ontology Mappings:
- Primary: schema:Museum (Schema.org)
- Secondary: dbo:Museum (DBpedia)
- Heritage: crm:E22_Human-Made_Object (building as object)
Mapping Pattern:
MUSEUM:
  meaning: wd:Q33506
  exact_mappings:
    - schema:Museum
    - dbo:Museum
  close_mappings:
    - crm:E22_Human-Made_Object
6. Infrastructure (6 entries, 2.0%)
Wikidata Examples: Q376799 (transport infrastructure), Q1311670 (rail infrastructure)
Ontology Mappings:
- Primary: crm:E25_Human-Made_Feature (CIDOC-CRM)
- Web: schema:Place
- Note: Infrastructure is underrepresented in cultural heritage ontologies
Mapping Pattern:
INFRASTRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - schema:Place
  related_mappings:
    - crm:E26_Physical_Feature
7. Organizations (monasteries, etc.)
Wikidata Examples: Q44613 (monastery)
Ontology Mappings:
- Primary: org:Organization (W3C Org)
- Secondary: dbo:Organisation (DBpedia)
- But also: crm:E22_Human-Made_Object (monastery as building)
Note: Monasteries are BOTH organizations AND buildings - use multi-aspect approach
Mapping Pattern:
MONASTERY:
  meaning: wd:Q44613
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E22_Human-Made_Object  # Building aspect
  close_mappings:
    - dbo:Organisation
    - schema:PlaceOfWorship
General Mapping Rules
Rule 1: Multiple Mappings (Multi-Aspect Entities)
Many heritage features have MULTIPLE ontological aspects:
CASTLE:
  exact_mappings:
    - crm:E22_Human-Made_Object  # Physical building
    - crm:E27_Site  # Historic site
    - dbo:Building  # DBpedia building class
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
Rationale: A castle is simultaneously:
- A physical building (E22)
- A historic site (E27)
- A landmark (Schema.org)
Rule 2: Hierarchy (Exact → Close → Related)
exact_mappings:
  # Direct equivalence (this IS that class)
  - crm:E27_Site
close_mappings:
  # Close semantic match (this is SIMILAR to that class)
  - dbo:HistoricPlace
  - schema:LandmarksOrHistoricalBuildings
related_mappings:
  # Related but not equivalent (this RELATES to that class)
  - geo:Feature
  - dcterms:Location
Rule 3: Prefer Heritage-Specific Ontologies
Priority order:
1. CIDOC-CRM (cultural heritage domain standard)
2. DBpedia (linked data with Wikidata mappings)
3. Schema.org (web semantics, broad coverage)
4. Domain-specific (GeoSPARQL for geographic, Org for organizations)
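The priority order can be applied mechanically when several candidate classes compete for the primary slot. The following is a minimal sketch, assuming the CURIE prefixes used in this document (crm, dbo, schema, geo, org); PREFIX_PRIORITY and pick_primary are hypothetical names chosen for illustration, not part of the project's code.

```python
# Rank candidate classes by CURIE prefix, following the Rule 3 priority order.
# PREFIX_PRIORITY and pick_primary are hypothetical names for illustration.
PREFIX_PRIORITY = {"crm": 0, "dbo": 1, "schema": 2, "geo": 3, "org": 3}


def pick_primary(candidates):
    """Return the candidate whose ontology ranks highest under Rule 3."""
    def rank(curie):
        prefix = curie.split(":", 1)[0]
        # Unknown prefixes sort after every known ontology
        return PREFIX_PRIORITY.get(prefix, len(PREFIX_PRIORITY))
    # min() returns the first minimum, so ties keep input order
    return min(candidates, key=rank)


primary = pick_primary(["schema:Museum", "dbo:Museum", "crm:E22_Human-Made_Object"])
print(primary)
```

Category-specific choices in this document, such as schema:Museum as the primary class for museums, would have to override this generic ranking.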
Rule 4: Use DBpedia Wikidata Mappings When Available
Check first: dbpedia_wikidata_mappings.ttl
# Example: Look up DBpedia class for Wikidata Q33506 (museum)
grep "wikidata:Q33506" /Users/kempersc/apps/glam/data/ontology/dbpedia_wikidata_mappings.ttl
# Returns: dbo:Museum owl:equivalentClass wikidata:Q33506
If found: Use dbo:Class as exact mapping
If not found: Use semantic approximation + document in mapping_note
Implementation Workflow
Step 1: Automated Mapping (High Confidence)
Use dbpedia_wikidata_mappings.ttl to automatically map entries with direct DBpedia equivalents:
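# Load mappings (parse_ttl is a placeholder helper, not defined in this document;
# it is expected to return a dict keyed by Q-number with the same prefix as
# feature['meaning'], e.g. wd:Q33506, although the file itself uses wikidata:)
dbpedia_wd_mappings = parse_ttl('dbpedia_wikidata_mappings.ttl')
# For each feature type
for feature in feature_types:
    q_number = feature['meaning']  # e.g., wd:Q33506
    # Check for DBpedia mapping
    if q_number in dbpedia_wd_mappings:
        dbo_class = dbpedia_wd_mappings[q_number]
        # setdefault avoids a KeyError when the entry has no mappings yet
        feature.setdefault('exact_mappings', []).append(dbo_class)
        feature['mapping_confidence'] = 'high'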
Coverage estimate: ~60-70% of entries (based on DBpedia's GLAM coverage)
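Step 1 calls parse_ttl without defining it. One possible shape for that helper is sketched below using only the standard library. It assumes every equivalence sits on a single line in the prefixed form shown in the Rule 4 grep output (dbo:Museum owl:equivalentClass wikidata:Q33506); if the real file spreads statements across lines or uses full IRIs, a proper RDF parser would be needed instead. The name parse_ttl_text is hypothetical.

```python
import re

# Matches one-line statements such as:
#   dbo:Museum owl:equivalentClass wikidata:Q33506 .
# Assumption: prefixed names on a single line, as in the Rule 4 grep output.
EQUIV_RE = re.compile(r"(dbo:\w+)\s+owl:equivalentClass\s+wikidata:(Q\d+)")


def parse_ttl_text(text):
    """Return {'wd:Q...': 'dbo:Class'} from Turtle text (hypothetical helper)."""
    mappings = {}
    for line in text.splitlines():
        match = EQUIV_RE.search(line)
        if match:
            dbo_class, q_id = match.groups()
            # Normalise to the wd: prefix used in the 'meaning' fields
            mappings[f"wd:{q_id}"] = dbo_class
    return mappings


def parse_ttl(path):
    """Read a Turtle file and extract the dbo/Wikidata equivalences."""
    with open(path, encoding="utf-8") as handle:
        return parse_ttl_text(handle.read())


sample = "dbo:Museum owl:equivalentClass wikidata:Q33506 .\n"
print(parse_ttl_text(sample))
```

Normalising the keys to wd: matters because the mappings file uses the wikidata: prefix while the enum entries use wd:, so a direct dictionary lookup would otherwise never match.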
Step 2: Semantic Rule-Based Mapping (Medium Confidence)
Use hypernym categories to apply ontology mapping rules:
# Mapping rules by hypernym
hypernym_rules = {
    'building': ['crm:E22_Human-Made_Object', 'dbo:Building'],
    'heritage site': ['crm:E27_Site', 'dbo:HistoricPlace'],
    'museum': ['schema:Museum', 'dbo:Museum'],
    'park': ['crm:E27_Site', 'schema:Park'],
    'structure': ['crm:E25_Human-Made_Feature'],
    'infrastructure': ['crm:E25_Human-Made_Feature'],
    # ... etc.
}
# Apply rules
for feature in feature_types:
    for hypernym in feature['hypernyms']:
        if hypernym in hypernym_rules:
            mappings = feature.setdefault('exact_mappings', [])
            # Skip classes already present (e.g. added in Step 1) to avoid duplicates
            mappings.extend(c for c in hypernym_rules[hypernym] if c not in mappings)
            # Keep 'high' from Step 1 if it was already set
            feature.setdefault('mapping_confidence', 'medium')
Coverage estimate: ~25-30% additional entries
Step 3: Manual Review (Low Confidence)
Remaining entries (~5-10%) require manual ontology consultation:
- Read Wikidata descriptions
- Search ontology files for semantic matches
- Document mapping rationale
ESOTERIC_FEATURE_TYPE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site  # Default fallback
  mapping_note: "No specific ontology class found. Using general site class."
  mapping_confidence: low
Default Fallback Mappings
When no specific mapping found, use these defaults:
# Physical features (default)
exact_mappings:
  - crm:E27_Site  # CIDOC-CRM site (broadest physical feature)
close_mappings:
  - schema:Place  # Schema.org generic place
related_mappings:
  - geo:Feature  # GeoSPARQL spatial feature
Rationale: Every feature type is AT LEAST:
- A site (E27)
- A place (Schema.org)
- A geographic feature (GeoSPARQL)
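The fallback can be applied as a final pass over entries that are still unmapped. This is a minimal sketch, assuming each entry is a plain dict with the mapping keys used throughout this document; FALLBACK and apply_fallback are hypothetical names.

```python
# Default classes taken from the fallback mappings above
FALLBACK = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["schema:Place"],
    "related_mappings": ["geo:Feature"],
}


def apply_fallback(feature):
    """Fill in defaults only when no exact mapping exists (hypothetical helper)."""
    if feature.get("exact_mappings"):
        return feature  # already mapped; leave untouched
    for key, classes in FALLBACK.items():
        existing = feature.setdefault(key, [])
        existing.extend(c for c in classes if c not in existing)
    # Fallback entries are low confidence by definition
    feature["mapping_confidence"] = "low"
    feature.setdefault(
        "mapping_note",
        "No specific ontology class found. Using general site class.",
    )
    return feature


unmapped = apply_fallback({"meaning": "wd:Q???"})
print(unmapped["exact_mappings"], unmapped["mapping_confidence"])
```

Entries that already carry an exact mapping are returned unchanged, so the pass can run over the whole enum without downgrading earlier results.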
Quality Assurance
Validation Checks
- Every entry has at least one exact_mapping: No orphaned entries
- CIDOC-CRM class present: Cultural heritage standard compliance
- Mapping confidence documented: Transparency about mapping quality
- Wikidata Q-number valid: All wd:Q* references resolve
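These checks could be automated along the following lines. The sketch assumes each entry is a dict with mapping_confidence stored at the top level (the expected output format places it under annotations, so the lookup would need adjusting there), and it only tests that the Q-number is well formed; confirming that it actually resolves would require a request to Wikidata. The name validate_entry is hypothetical.

```python
import re

Q_NUMBER_RE = re.compile(r"^wd:Q[1-9]\d*$")
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def validate_entry(name, entry):
    """Return a list of problems for one enum entry (hypothetical helper)."""
    problems = []
    all_classes = (
        entry.get("exact_mappings", [])
        + entry.get("close_mappings", [])
        + entry.get("related_mappings", [])
    )
    if not entry.get("exact_mappings"):
        problems.append(f"{name}: no exact_mappings")
    # A CIDOC-CRM class at any mapping level counts as coverage
    if not any(c.startswith("crm:") for c in all_classes):
        problems.append(f"{name}: no CIDOC-CRM class")
    if entry.get("mapping_confidence") not in CONFIDENCE_LEVELS:
        problems.append(f"{name}: mapping_confidence missing or invalid")
    # Syntax check only; does not confirm the item exists on Wikidata
    if not Q_NUMBER_RE.match(entry.get("meaning", "")):
        problems.append(f"{name}: malformed Wikidata Q-number")
    return problems


mansion = {
    "meaning": "wd:Q1802963",
    "exact_mappings": ["crm:E22_Human-Made_Object", "dbo:Building"],
    "mapping_confidence": "high",
}
print(validate_entry("MANSION", mansion))
```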
Confidence Levels
mapping_confidence:
  high:    # DBpedia direct equivalence or clear 1:1 match
  medium:  # Semantic rule-based mapping
  low:     # Manual approximation or fallback to general class
Mapping Notes
Document rationale for non-obvious mappings:
SCIENTIFIC_FACILITY:
  meaning: wd:Q119459808
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E27_Site  # Physical site aspect
  mapping_note: >-
    DBpedia lacks specific 'scientific facility' class.
    Mapped to Organization (function) + Site (physical).
  mapping_confidence: medium
Expected Output Format
enums:
  FeatureTypeEnum:
    permissible_values:
      MANSION:
        title: mansion
        description: very large and imposing dwelling house
        meaning: wd:Q1802963
        # NEW: Ontology mappings
        exact_mappings:
          - crm:E22_Human-Made_Object
          - dbo:Building
        close_mappings:
          - schema:LandmarksOrHistoricalBuildings
          - dbo:HistoricBuilding
        related_mappings:
          - geo:Feature
        # NEW: Mapping metadata
        annotations:
          wikidata_id: Q1802963
          wikidata_url: https://www.wikidata.org/wiki/Q1802963
          hypernyms: building
          dbpedia_class: dbo:Building
          cidoc_crm_class: crm:E22_Human-Made_Object
          schema_org_class: schema:LandmarksOrHistoricalBuildings
          mapping_confidence: high
          mapping_date: 2025-11-22
Implementation Plan
Phase 1: Automated Mapping (2 hours)
- Parse dbpedia_wikidata_mappings.ttl
- Create hypernym → ontology class rules
- Apply automated mapping to all 298 entries
- Generate updated FeatureTypeEnum.yaml
Phase 2: Manual Review (3 hours)
- Review entries with mapping_confidence: low
- Search ontology files for better matches
- Document mapping rationale
- Update entries with improved mappings
Phase 3: Validation (1 hour)
- Check all entries have exact_mappings
- Verify CIDOC-CRM coverage
- Validate Wikidata Q-numbers
- Generate mapping quality report
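A mapping quality report can start as a simple tally of confidence levels. The sketch below assumes the entries are held in a dict keyed by enum name, with mapping_confidence at the top level of each entry; confidence_report is a hypothetical name, and the sample data is illustrative only.

```python
from collections import Counter


def confidence_report(entries):
    """Count entries per confidence level and compute percentage shares."""
    counts = Counter(
        entry.get("mapping_confidence", "undocumented")
        for entry in entries.values()
    )
    total = sum(counts.values())
    # Each value is (count, percentage of all entries)
    return {level: (n, round(100 * n / total, 1)) for level, n in counts.items()}


# Illustrative data only, not real project results
sample_entries = {
    "MANSION": {"mapping_confidence": "high"},
    "MUSEUM": {"mapping_confidence": "high"},
    "ESOTERIC_FEATURE_TYPE": {"mapping_confidence": "low"},
    "UNREVIEWED": {},
}
for level, (count, share) in confidence_report(sample_entries).items():
    print(f"{level}: {count} entries ({share}%)")
```

Entries with no documented confidence are reported separately so that the "mapping confidence documented" check stays visible in the report.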
Phase 4: Documentation (1 hour)
- Update AGENTS.md with mapping workflow
- Create ontology mapping reference guide
- Generate mapping statistics report
- Update FeaturePlace.yaml with ontology references
Total estimated time: 7 hours
References
- CIDOC-CRM Specification: http://www.cidoc-crm.org/html/cidoc_crm_v7.1.3.html
- Schema.org: https://schema.org/
- DBpedia Ontology: https://dbpedia.org/ontology/
- DBpedia Wikidata Mappings: /data/ontology/dbpedia_wikidata_mappings.ttl
- DBpedia Heritage Classes: /data/ontology/dbpedia_heritage_classes.ttl
- GeoSPARQL: https://www.ogc.org/standards/geosparql
Next Step: Implement Phase 1 automated mapping script