FeaturePlace Ontology Mapping Strategy

Date: 2025-11-22
Task: Map 298 Wikidata feature types to ontology classes from /data/ontology/


Ontology Sources Available

Primary Ontologies

  1. CIDOC-CRM (CIDOC_CRM_v7.1.3.rdf)

    • Cultural heritage domain standard
    • Key classes: E27_Site, E22_Human-Made_Object, E25_Human-Made_Feature, E26_Physical_Feature
  2. Schema.org (schemaorg.owl)

    • Web semantics, general-purpose
    • Key classes: schema:Place, schema:LandmarksOrHistoricalBuildings, schema:Museum, schema:Church, schema:PlaceOfWorship
  3. DBpedia Ontology (dbpedia_heritage_classes.ttl, dbpedia_ontology.owl)

    • Linked data from Wikipedia
    • Key classes: dbo:Building, dbo:HistoricBuilding, dbo:Museum, dbo:Library, dbo:Archive
    • Mappings: 804-line dbpedia_wikidata_mappings.ttl provides dbo:Class ↔ wd:Q* equivalences
  4. W3C Org Ontology (org.rdf)

    • Organizational structures
    • Key classes: org:Organization, org:FormalOrganization
  5. GeoSPARQL (geo.ttl)

    • Spatial features
    • Key classes: geo:Feature, geo:Geometry

Supporting Ontologies

  • PROV-O (prov.ttl, prov-o.rdf) - Provenance
  • Dublin Core (dublin_core_elements.rdf) - Metadata
  • SKOS (skos.rdf) - Knowledge organization
  • FOAF (foaf.ttl) - Social networks
  • VCARD (vcard.rdf) - Contact information
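
All mappings below are written as CURIEs (crm:, dbo:, schema:, ...). The following is a minimal sketch of how to expand them into full IRIs. It assumes the commonly published namespace IRI for each prefix. The project's actual LinkML prefixes block is not shown here and may differ (for example, https://schema.org/ instead of http://schema.org/). PREFIXES and expand_curie are illustrative names.

```python
# Assumed prefix table: the usual published namespace IRIs for each ontology.
# The schema's own prefixes block may differ (e.g. https vs http for schema.org).
PREFIXES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "org": "http://www.w3.org/ns/org#",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "wd": "http://www.wikidata.org/entity/",
}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'crm:E27_Site' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"Unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("crm:E27_Site"))
# → http://www.cidoc-crm.org/cidoc-crm/E27_Site
```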

Mapping Strategy by Hypernym Category

1. Buildings (33 entries, 11.1%)

Wikidata Examples: Q1802963 (mansion), Q317557 (parish church), Q1021645 (office building)

Ontology Mappings:

  • Primary: crm:E22_Human-Made_Object (CIDOC-CRM)
  • Secondary: dbo:Building (DBpedia)
  • Web: schema:LandmarksOrHistoricalBuildings (Schema.org for heritage buildings)
  • Specific types:
    • Churches → schema:Church, schema:PlaceOfWorship
    • Museums → schema:Museum, dbo:Museum
    • Historic buildings → dbo:HistoricBuilding

Mapping Pattern:

MANSION:
  meaning: wd:Q1802963
  exact_mappings:
    - crm:E22_Human-Made_Object
    - dbo:Building
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings
    - dbo:HistoricBuilding

2. Heritage Sites (144 entries, 48.3%)

Wikidata Examples: Q3694 (vacation property), Q2927789 (buitenplaats)

Ontology Mappings:

  • Primary: crm:E27_Site (CIDOC-CRM physical site)
  • Secondary: dbo:HistoricPlace (DBpedia)
  • Web: schema:LandmarksOrHistoricalBuildings, schema:TouristAttraction

Mapping Pattern:

HERITAGE_SITE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
  close_mappings:
    - dbo:HistoricPlace
    - schema:LandmarksOrHistoricalBuildings

3. Protected Areas (23 entries, 7.7%)

Wikidata Examples: National parks, nature reserves, conservation areas

Ontology Mappings:

  • Primary: crm:E27_Site (CIDOC-CRM)
  • Web: schema:Park, schema:Place
  • Geo: geo:Feature (GeoSPARQL)

Mapping Pattern:

PROTECTED_AREA:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site
    - geo:Feature
  close_mappings:
    - schema:Park

4. Structures (12 entries, 4.0%)

Wikidata Examples: Q336164 (sewerage pumping station), Q15710813 (physical structure)

Ontology Mappings:

  • Primary: crm:E25_Human-Made_Feature (CIDOC-CRM)
  • Secondary: crm:E26_Physical_Feature (broader)
  • Web: schema:Place

Mapping Pattern:

STRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - crm:E26_Physical_Feature

5. Museums (8 entries, 2.7%)

Wikidata Examples: Military museums, art museums, historical museums

Ontology Mappings:

  • Primary: schema:Museum (Schema.org)
  • Secondary: dbo:Museum (DBpedia)
  • Heritage: crm:E22_Human-Made_Object (building as object)

Mapping Pattern:

MUSEUM:
  meaning: wd:Q33506
  exact_mappings:
    - schema:Museum
    - dbo:Museum
  close_mappings:
    - crm:E22_Human-Made_Object

6. Infrastructure (6 entries, 2.0%)

Wikidata Examples: Q376799 (transport infrastructure), Q1311670 (rail infrastructure)

Ontology Mappings:

  • Primary: crm:E25_Human-Made_Feature (CIDOC-CRM)
  • Web: schema:Place
  • Note: Infrastructure is underrepresented in cultural heritage ontologies

Mapping Pattern:

INFRASTRUCTURE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E25_Human-Made_Feature
  close_mappings:
    - schema:Place
  related_mappings:
    - crm:E26_Physical_Feature

7. Organizations (monasteries, etc.)

Wikidata Examples: Q44613 (monastery)

Ontology Mappings:

  • Primary: org:Organization (W3C Org)
  • Secondary: dbo:Organisation (DBpedia)
  • But also: crm:E22_Human-Made_Object (monastery as building)

Note: Monasteries are BOTH organizations AND buildings - use multi-aspect approach

Mapping Pattern:

MONASTERY:
  meaning: wd:Q44613
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E22_Human-Made_Object  # Building aspect
  close_mappings:
    - dbo:Organisation
    - schema:PlaceOfWorship
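
Each percentage above is the category count divided by the 298 entries in the task. The quick check below recomputes those shares. It also shows that the six counted categories cover 226 entries. That leaves 72 entries outside these counts (section 7, Organizations, gives no count).

```python
TOTAL_ENTRIES = 298

# Entry counts per hypernym category, as listed in sections 1-6 above
category_counts = {
    "Buildings": 33,
    "Heritage Sites": 144,
    "Protected Areas": 23,
    "Structures": 12,
    "Museums": 8,
    "Infrastructure": 6,
}

for name, count in category_counts.items():
    print(f"{name}: {count} entries, {count / TOTAL_ENTRIES:.1%}")

covered = sum(category_counts.values())
print(f"Counted categories: {covered}/{TOTAL_ENTRIES} ({covered / TOTAL_ENTRIES:.1%})")
# → Counted categories: 226/298 (75.8%)
```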

General Mapping Rules

Rule 1: Multiple Mappings (Multi-Aspect Entities)

Many heritage features have MULTIPLE ontological aspects:

CASTLE:
  exact_mappings:
    - crm:E22_Human-Made_Object  # Physical building
    - crm:E27_Site  # Historic site
    - dbo:Building  # DBpedia building class
  close_mappings:
    - schema:LandmarksOrHistoricalBuildings

Rationale: A castle is simultaneously:

  • A physical building (E22)
  • A historic site (E27)
  • A landmark (Schema.org)
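
Rule 1 treats a multi-aspect entity as the union of its aspects. The sketch below shows one way to build such a list. It assumes a hypothetical ASPECT_CLASSES table, which is not part of the schema. Per-aspect class lists are merged into a single exact_mappings list, keeping first-seen order and dropping duplicates.

```python
# Hypothetical aspect table (illustrative only): ontology classes per aspect
ASPECT_CLASSES = {
    "building": ["crm:E22_Human-Made_Object", "dbo:Building"],
    "site": ["crm:E27_Site"],
    "organization": ["org:Organization"],
}


def merge_aspects(*aspects: str) -> list[str]:
    """Union the classes of every aspect, keeping first-seen order, no duplicates."""
    merged: list[str] = []
    for aspect in aspects:
        for cls in ASPECT_CLASSES[aspect]:
            if cls not in merged:
                merged.append(cls)
    return merged


# CASTLE = physical building + historic site
print(merge_aspects("building", "site"))
# → ['crm:E22_Human-Made_Object', 'dbo:Building', 'crm:E27_Site']
```

This yields the same three classes as the CASTLE example, in a different order.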

Rule 2: Distinguish Exact, Close, and Related Mappings

exact_mappings:
  # Direct equivalence (this IS that class)
  - crm:E27_Site

close_mappings:
  # Close semantic match (this is SIMILAR to that class)
  - dbo:HistoricPlace
  - schema:LandmarksOrHistoricalBuildings

related_mappings:
  # Related but not equivalent (this RELATES to that class)
  - geo:Feature
  - dcterms:Location
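
In LinkML, these three slots correspond to the SKOS mapping properties skos:exactMatch, skos:closeMatch and skos:relatedMatch. The minimal sketch below flattens an entry's mappings into (subject, predicate, object) triples. The subject CURIE ex:HERITAGE_SITE and the function name are hypothetical, and entries are assumed to be plain dicts.

```python
# LinkML mapping slots and the SKOS properties they correspond to
SLOT_TO_SKOS = {
    "exact_mappings": "skos:exactMatch",
    "close_mappings": "skos:closeMatch",
    "related_mappings": "skos:relatedMatch",
}


def mapping_triples(subject: str, entry: dict) -> list[tuple[str, str, str]]:
    """Flatten an entry's mapping slots into (subject, SKOS predicate, target) triples."""
    triples = []
    for slot, predicate in SLOT_TO_SKOS.items():
        for target in entry.get(slot) or []:
            triples.append((subject, predicate, target))
    return triples


heritage_site = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["dbo:HistoricPlace", "schema:LandmarksOrHistoricalBuildings"],
    "related_mappings": ["geo:Feature", "dcterms:Location"],
}
for triple in mapping_triples("ex:HERITAGE_SITE", heritage_site):
    print(*triple)
```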

Rule 3: Prefer Heritage-Specific Ontologies

Priority order:

  1. CIDOC-CRM (cultural heritage domain standard)
  2. DBpedia (linked data with Wikidata mappings)
  3. Schema.org (web semantics, broad coverage)
  4. Domain-specific (GeoSPARQL for geographic, Org for organizations)
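
The priority order can be expressed as a sort key on the CURIE prefix. In the minimal sketch below, GeoSPARQL and Org share rank 4 as "domain-specific", and any other prefix (e.g. dcterms) is sorted last. Both choices are interpretations, not rules stated above.

```python
# Rule 3 ranks by ontology prefix; GeoSPARQL and Org share the "domain-specific" rank
PREFIX_PRIORITY = {"crm": 1, "dbo": 2, "schema": 3, "geo": 4, "org": 4}
UNRANKED = 99  # any other prefix sorts after the ranked ontologies


def by_priority(curies: list[str]) -> list[str]:
    """Sort CURIEs by Rule 3 priority; sorted() is stable, so ties keep input order."""
    return sorted(curies, key=lambda c: PREFIX_PRIORITY.get(c.split(":", 1)[0], UNRANKED))


print(by_priority(["schema:Museum", "dbo:Museum", "crm:E22_Human-Made_Object"]))
# → ['crm:E22_Human-Made_Object', 'dbo:Museum', 'schema:Museum']
```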

Rule 4: Use DBpedia Wikidata Mappings When Available

Check first: dbpedia_wikidata_mappings.ttl

# Example: Look up DBpedia class for Wikidata Q33506 (museum)
# (-w keeps Q33506 from also matching longer IDs such as Q335060)
grep -w "wikidata:Q33506" /Users/kempersc/apps/glam/data/ontology/dbpedia_wikidata_mappings.ttl
# Returns: dbo:Museum owl:equivalentClass wikidata:Q33506

If found: Use dbo:Class as exact mapping
If not found: Use semantic approximation + document in mapping_note
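
The lookup and the found/not-found decision can be sketched with the standard library alone. This assumes that mapping statements sit on one line in the form shown in the grep output above. Real Turtle can span several lines or use other prefixes, in which case an RDF parser is safer. The function name and the mapping_note wording are illustrative.

```python
import re

# Assumes one-line statements like: dbo:Museum owl:equivalentClass wikidata:Q33506
EQUIV_LINE = re.compile(r"(dbo:\w+)\s+owl:equivalentClass\s+wikidata:(Q\d+)\b")


def lookup_dbpedia(q_id: str, ttl_text: str) -> dict:
    """Rule 4: DBpedia class as exact mapping if found, else flag for approximation."""
    for dbo_class, found_q in EQUIV_LINE.findall(ttl_text):
        if found_q == q_id:  # whole-ID comparison, so Q3350 never matches Q33506
            return {"exact_mappings": [dbo_class], "mapping_confidence": "high"}
    return {
        "exact_mappings": [],
        "mapping_note": f"No DBpedia equivalent for {q_id}; semantic approximation needed.",
    }


sample = "dbo:Museum owl:equivalentClass wikidata:Q33506 .\n"
print(lookup_dbpedia("Q33506", sample))
```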


Implementation Workflow

Step 1: Automated Mapping (High Confidence)

Use dbpedia_wikidata_mappings.ttl to automatically map entries with direct DBpedia equivalents:

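from rdflib import Graph
from rdflib.namespace import OWL

# Load mappings into {Wikidata Q-id: DBpedia class CURIE}
# (assumes the file binds wikidata: to http://www.wikidata.org/entity/)
graph = Graph()
graph.parse('dbpedia_wikidata_mappings.ttl', format='turtle')
dbpedia_wd_mappings = {}
for dbo_iri, _, wd_iri in graph.triples((None, OWL.equivalentClass, None)):
    if (str(dbo_iri).startswith('http://dbpedia.org/ontology/')
            and str(wd_iri).startswith('http://www.wikidata.org/entity/Q')):
        dbpedia_wd_mappings[str(wd_iri).rsplit('/', 1)[-1]] = 'dbo:' + str(dbo_iri).rsplit('/', 1)[-1]

# For each feature type (feature_types: entry dicts loaded from FeatureTypeEnum.yaml)
for feature in feature_types:
    q_number = feature['meaning'].removeprefix('wd:')  # e.g., wd:Q33506 -> Q33506

    # Check for DBpedia mapping
    if q_number in dbpedia_wd_mappings:
        feature.setdefault('exact_mappings', []).append(dbpedia_wd_mappings[q_number])
        feature['mapping_confidence'] = 'high'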

Coverage estimate: ~60-70% of entries (based on DBpedia's GLAM coverage)


Step 2: Semantic Rule-Based Mapping (Medium Confidence)

Use hypernym categories to apply ontology mapping rules:

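# Mapping rules by hypernym
hypernym_rules = {
    'building': ['crm:E22_Human-Made_Object', 'dbo:Building'],
    'heritage site': ['crm:E27_Site', 'dbo:HistoricPlace'],
    'museum': ['schema:Museum', 'dbo:Museum'],
    'park': ['crm:E27_Site', 'schema:Park'],
    'structure': ['crm:E25_Human-Made_Feature'],
    'infrastructure': ['crm:E25_Human-Made_Feature'],
    # ... etc.
}

# Apply rules: skip classes already present and keep any 'high' confidence from Step 1
for feature in feature_types:
    mappings = feature.setdefault('exact_mappings', [])
    hypernyms = feature.get('hypernyms', [])
    if isinstance(hypernyms, str):  # a single hypernym may be stored as a plain string
        hypernyms = [hypernyms]
    for hypernym in hypernyms:
        for ontology_class in hypernym_rules.get(hypernym, []):
            if ontology_class not in mappings:
                mappings.append(ontology_class)
                feature.setdefault('mapping_confidence', 'medium')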

Coverage estimate: ~25-30% additional entries


Step 3: Manual Review (Low Confidence)

Remaining entries (~5-10%) require manual ontology consultation:

  • Read Wikidata descriptions
  • Search ontology files for semantic matches
  • Document mapping rationale

ESOTERIC_FEATURE_TYPE:
  meaning: wd:Q???
  exact_mappings:
    - crm:E27_Site  # Default fallback
  mapping_note: "No specific ontology class found. Using general site class."
  mapping_confidence: low

Default Fallback Mappings

When no specific mapping is found, use these defaults:

# Physical features (default)
exact_mappings:
  - crm:E27_Site  # CIDOC-CRM site (general-purpose physical fallback)

close_mappings:
  - schema:Place  # Schema.org generic place

related_mappings:
  - geo:Feature  # GeoSPARQL spatial feature

Rationale: Every feature type is AT LEAST:

  • A site (E27)
  • A place (Schema.org)
  • A geographic feature (GeoSPARQL)
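
Below is a minimal sketch of the fallback step, assuming entries are plain dicts. Entries that already have an exact mapping from Step 1 or Step 2 are left untouched. All other entries receive the default classes listed above, the low-confidence note from the ESOTERIC_FEATURE_TYPE example, and mapping_confidence: low. FALLBACK and apply_fallback are illustrative names.

```python
# Defaults copied from the fallback block above
FALLBACK = {
    "exact_mappings": ["crm:E27_Site"],
    "close_mappings": ["schema:Place"],
    "related_mappings": ["geo:Feature"],
}


def apply_fallback(entry: dict) -> dict:
    """Give an entry with no exact mapping the default site/place/feature classes."""
    if entry.get("exact_mappings"):
        return entry  # already mapped by Step 1 or Step 2; leave untouched
    for slot, defaults in FALLBACK.items():
        existing = entry.get(slot) or []
        entry[slot] = existing + [c for c in defaults if c not in existing]
    entry.setdefault("mapping_note", "No specific ontology class found. Using general site class.")
    entry["mapping_confidence"] = "low"
    return entry


print(apply_fallback({"title": "esoteric feature type"}))
```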

Quality Assurance

Validation Checks

  1. Every entry has at least one exact_mapping: No orphaned entries
  2. CIDOC-CRM class present: Cultural heritage standard compliance
  3. Mapping confidence documented: Transparency about mapping quality
  4. Wikidata Q-number valid: All wd:Q* references resolve
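
Checks 1 to 3, plus the format half of check 4, can run offline. The sketch below makes several assumptions. A CIDOC-CRM class in any mapping slot satisfies check 2, since MUSEUM carries crm:E22_Human-Made_Object only as a close mapping. mapping_confidence may sit at top level or under annotations, because both placements occur in this document. Confirming that a Q-number actually resolves would need a live Wikidata lookup, which is not shown.

```python
import re

WIKIDATA_ID = re.compile(r"wd:Q\d+")
CONFIDENCE_LEVELS = {"high", "medium", "low"}
MAPPING_SLOTS = ("exact_mappings", "close_mappings", "related_mappings")


def validate_entry(name: str, entry: dict) -> list[str]:
    """Run the offline QA checks on one enum value; an empty list means it passes."""
    problems = []
    # Check 1: at least one exact mapping
    if not entry.get("exact_mappings"):
        problems.append(f"{name}: no exact_mappings")
    # Check 2: a CIDOC-CRM class in any mapping slot
    classes = [c for slot in MAPPING_SLOTS for c in (entry.get(slot) or [])]
    if not any(c.startswith("crm:") for c in classes):
        problems.append(f"{name}: no CIDOC-CRM class")
    # Check 3: confidence recorded, either top-level or under annotations
    confidence = entry.get("mapping_confidence") or entry.get("annotations", {}).get("mapping_confidence")
    if confidence not in CONFIDENCE_LEVELS:
        problems.append(f"{name}: mapping_confidence missing or invalid")
    # Check 4 (format only): resolving the ID needs a live Wikidata lookup
    if not WIKIDATA_ID.fullmatch(entry.get("meaning", "")):
        problems.append(f"{name}: meaning is not a wd:Q-number")
    return problems
```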

Confidence Levels

mapping_confidence:
  high:   # DBpedia direct equivalence or clear 1:1 match
  medium: # Semantic rule-based mapping
  low:    # Manual approximation or fallback to general class

Mapping Notes

Document rationale for non-obvious mappings:

SCIENTIFIC_FACILITY:
  meaning: wd:Q119459808
  exact_mappings:
    - org:Organization  # Organizational aspect
    - crm:E27_Site  # Physical site aspect
  mapping_note: >-
    DBpedia lacks specific 'scientific facility' class.
    Mapped to Organization (function) + Site (physical).    
  mapping_confidence: medium

Expected Output Format

enums:
  FeatureTypeEnum:
    permissible_values:
      MANSION:
        title: mansion
        description: very large and imposing dwelling house
        meaning: wd:Q1802963
        
        # NEW: Ontology mappings
        exact_mappings:
          - crm:E22_Human-Made_Object
          - dbo:Building
        
        close_mappings:
          - schema:LandmarksOrHistoricalBuildings
          - dbo:HistoricBuilding
        
        related_mappings:
          - geo:Feature
        
        # NEW: Mapping metadata
        annotations:
          wikidata_id: Q1802963
          wikidata_url: https://www.wikidata.org/wiki/Q1802963
          hypernyms: building
          dbpedia_class: dbo:Building
          cidoc_crm_class: crm:E22_Human-Made_Object
          schema_org_class: schema:LandmarksOrHistoricalBuildings
          mapping_confidence: high
          mapping_date: 2025-11-22
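
Most of the annotations block can be derived from meaning and the mapping slots. The minimal sketch below picks the first class from each ontology, searching the exact, close and related slots in that order. It assumes entries are plain dicts with mapping_confidence at top level. hypernyms is not derivable from the mappings, so it is left to the mapping pipeline. build_annotations is an illustrative name.

```python
def build_annotations(entry: dict, mapping_date: str) -> dict:
    """Derive the annotations block of a FeatureTypeEnum value from its mappings."""
    q_id = entry["meaning"].removeprefix("wd:")
    all_classes = ((entry.get("exact_mappings") or []) + (entry.get("close_mappings") or [])
                   + (entry.get("related_mappings") or []))

    def first(prefix):
        # First class from the given ontology, or None if the entry has none
        return next((c for c in all_classes if c.startswith(prefix + ":")), None)

    return {
        "wikidata_id": q_id,
        "wikidata_url": f"https://www.wikidata.org/wiki/{q_id}",
        "dbpedia_class": first("dbo"),
        "cidoc_crm_class": first("crm"),
        "schema_org_class": first("schema"),
        "mapping_confidence": entry.get("mapping_confidence"),
        "mapping_date": mapping_date,
    }


mansion = {
    "meaning": "wd:Q1802963",
    "exact_mappings": ["crm:E22_Human-Made_Object", "dbo:Building"],
    "close_mappings": ["schema:LandmarksOrHistoricalBuildings", "dbo:HistoricBuilding"],
    "related_mappings": ["geo:Feature"],
    "mapping_confidence": "high",
}
print(build_annotations(mansion, "2025-11-22"))
```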

Implementation Plan

Phase 1: Automated Mapping (2 hours)

  1. Parse dbpedia_wikidata_mappings.ttl
  2. Create hypernym → ontology class rules
  3. Apply automated mapping to all 298 entries
  4. Generate updated FeatureTypeEnum.yaml

Phase 2: Manual Review (3 hours)

  1. Review entries with mapping_confidence: low
  2. Search ontology files for better matches
  3. Document mapping rationale
  4. Update entries with improved mappings

Phase 3: Validation (1 hour)

  1. Check all entries have exact_mappings
  2. Verify CIDOC-CRM coverage
  3. Validate Wikidata Q-numbers
  4. Generate mapping quality report
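
The quality report in step 4 could summarize two figures: the number of entries at each confidence level, and the share of entries with a CIDOC-CRM class in any mapping slot. The sketch below assumes entries are a dict of name to entry dict. The report fields are assumptions, not a defined format.

```python
from collections import Counter

MAPPING_SLOTS = ("exact_mappings", "close_mappings", "related_mappings")


def quality_report(entries: dict) -> dict:
    """Summarize confidence levels and CIDOC-CRM coverage over all enum values."""
    def confidence(entry):
        # mapping_confidence may sit at top level or under annotations
        return (entry.get("mapping_confidence")
                or entry.get("annotations", {}).get("mapping_confidence")
                or "missing")

    def has_crm(entry):
        return any(c.startswith("crm:") for slot in MAPPING_SLOTS for c in (entry.get(slot) or []))

    total = len(entries)
    with_crm = sum(has_crm(e) for e in entries.values())
    return {
        "total": total,
        "by_confidence": dict(Counter(confidence(e) for e in entries.values())),
        "cidoc_crm_coverage": with_crm / total if total else 0.0,
    }
```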

Phase 4: Documentation (1 hour)

  1. Update AGENTS.md with mapping workflow
  2. Create ontology mapping reference guide
  3. Generate mapping statistics report
  4. Update FeaturePlace.yaml with ontology references

Total estimated time: 7 hours


Next Step: Implement Phase 1 automated mapping script