glam/docs/DBPEDIA_ONTOLOGY_INTEGRATION.md
2025-11-21 22:12:33 +01:00

# DBpedia Ontology Integration for Heritage Custodian Project
**Date**: 2025-11-20
**Purpose**: Document DBpedia Ontology (DBO) conventions for mapping Wikidata entities to specialized heritage ontologies
---
## Executive Summary
**DBpedia Ontology (DBO)** provides a critical bridge between Wikidata entities and formal ontology classes. This document establishes conventions for integrating DBO mappings into the heritage custodian ontology enrichment workflow.
**Key Finding**: DBpedia already maps many Wikidata GLAM entities to ontology classes via `owl:equivalentClass` assertions. We should leverage these existing mappings instead of creating them from scratch.
---
## DBpedia Ontology Overview
**Namespace**: `http://dbpedia.org/ontology/`
**Prefix**: `dbo:`
**Coverage**: 768 classes, 3000 properties, ~4.2M instances
**Scope**: Cross-domain ontology (shallow but broad coverage)
### Key Resources
- **Ontology Browser**: http://dbpedia.org/ontology/
- **SPARQL Endpoint**: http://dbpedia.org/sparql
- **Mappings Wiki**: http://mappings.dbpedia.org
- **Archivo (Ontology Archive)**: https://archivo.dbpedia.org/info?o=http://dbpedia.org/ontology/
- **Development Version**: https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV
---
## Why DBpedia Matters for This Project
### 1. **Pre-existing Wikidata Mappings**
DBpedia already maps many heritage institution Wikidata entities to ontology classes:
```turtle
dbo:Museum owl:equivalentClass wd:Q33506 .
dbo:Library owl:equivalentClass wd:Q7075 .
dbo:Archive owl:equivalentClass wd:Q166118 .
```
**Benefit**: We can use DBpedia as an intermediary to discover ontology mappings for Wikidata entities.
### 2. **Schema.org Alignment**
DBpedia classes map to Schema.org (which we already use):
```turtle
dbo:Library owl:equivalentClass schema:Library .
```
**Benefit**: DBpedia validates our existing Schema.org mappings.
### 3. **Domain-Specific Properties**
DBpedia defines heritage-specific properties:
- `dbo:collection` - Museum collections
- `dbo:curator` - Museum curator
- `dbo:museumType` - Museum specialization
- `dbo:isil` - ISIL code (for libraries)
- `dbo:numberOfCollectionItems` - Collection size
**Benefit**: We can reference DBpedia properties in our mappings instead of inventing custom ones.
---
## DBpedia Heritage Classes
### Museums
**Class**: `dbo:Museum`
**Wikidata**: `wd:Q33506`
**Subclass of**: `dbo:Building`
**Properties**:
- `dbo:collection` (museum collections)
- `dbo:curator` (curator name)
- `dbo:museumType` (specialization: art, history, science, etc.)
**Example RDF**:
```turtle
<http://dbpedia.org/resource/Rijksmuseum>
    rdf:type dbo:Museum ;
    dbo:collection "Dutch Golden Age paintings" ;
    dbo:curator "Taco Dibbits" ;
    dbo:museumType "Art museum" .
```
### Libraries
**Class**: `dbo:Library`
**Wikidata**: `wd:Q7075`
**Schema.org**: `schema:Library`
**Subclass of**: `dbo:EducationalInstitution`
**Properties**:
- `dbo:isil` (ISIL code)
- `dbo:numberOfCollectionItems` (collection size)
**Example RDF**:
```turtle
<http://dbpedia.org/resource/Library_of_Congress>
    rdf:type dbo:Library ;
    dbo:isil "US-DLC" ;
    dbo:numberOfCollectionItems 17000000 .
```
### Archives
**Class**: `dbo:Archive`
**Wikidata**: `wd:Q166118`
**Subclass of**: `dbo:CollectionOfValuables`
**Properties**: (fewer specialized properties than Museum/Library)
**Example RDF**:
```turtle
<http://dbpedia.org/resource/National_Archives_and_Records_Administration>
    rdf:type dbo:Archive ;
    rdfs:label "National Archives and Records Administration"@en .
```
---
## Integration Workflow for Ontology Enrichment
### Step 1: Check DBpedia for Wikidata Mapping
When enriching a Wikidata entity (e.g., Q2772772 - military museum):
```sparql
# Query DBpedia SPARQL endpoint
SELECT ?dboClass WHERE {
  ?dboClass owl:equivalentClass <http://www.wikidata.org/entity/Q2772772> .
}
```
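**If match found**: Use DBpedia class as secondary/tertiary ontology reference.
**If no match found**: Repeat the query for a broader Wikidata class (e.g., Q33506 - museum) and record that entity as `dbpedia_equivalent_wikidata`.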
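The Step 1 lookup can be sketched in Python. This is a minimal sketch rather than project code: `build_equivalent_class_query` and `first_mapped_class` are hypothetical names, the `lookup` callable stands in for whatever function actually queries the endpoint, and falling back to broader Wikidata classes via `parent_qids` is an assumption about how to handle entities that have no direct DBpedia equivalent.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_equivalent_class_query(qid: str) -> str:
    """Return the Step 1 SPARQL query for a validated Wikidata QID."""
    # Validate first so arbitrary text is never spliced into the query.
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?dboClass WHERE {\n"
        f"  ?dboClass owl:equivalentClass <http://www.wikidata.org/entity/{qid}> .\n"
        "}"
    )


def first_mapped_class(
    qid: str,
    parent_qids: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try the entity itself, then each broader class, in order.

    Returns (matched_qid, class_iri) for the first hit, or None.
    """
    for candidate in [qid, *parent_qids]:
        class_iri = lookup(candidate)
        if class_iri is not None:
            return candidate, class_iri
    return None
```

Keeping the network call behind `lookup` lets the fallback logic be exercised with a plain dictionary, without touching the endpoint.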
### Step 2: Discover DBpedia Subclass Hierarchy
```sparql
# Find superclasses
SELECT ?superclass WHERE {
  dbo:Museum rdfs:subClassOf ?superclass .
}
# Result: dbo:Building
```
**Use this to understand** where DBpedia places the entity in the ontology hierarchy.
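Once the `rdfs:subClassOf` pairs have been fetched, walking the hierarchy is a small loop. The sketch below assumes each class has at most one direct superclass and uses only the three pairs stated in this document; `superclass_chain` and `PARENT_OF` are hypothetical names.

```python
from typing import Dict, List

# Direct superclasses as listed in this document; extend from query results.
PARENT_OF: Dict[str, str] = {
    "dbo:Museum": "dbo:Building",
    "dbo:Library": "dbo:EducationalInstitution",
    "dbo:Archive": "dbo:CollectionOfValuables",
}


def superclass_chain(cls: str, parent_of: Dict[str, str]) -> List[str]:
    """Follow single-parent subclass links upward, nearest superclass first."""
    chain: List[str] = []
    seen = {cls}
    current = cls
    while current in parent_of:
        current = parent_of[current]
        if current in seen:  # guard against cyclic data
            break
        seen.add(current)
        chain.append(current)
    return chain


print(superclass_chain("dbo:Museum", PARENT_OF))
```

With more complete data the same loop continues up to the root class; here it stops after one step because the table only holds direct parents.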
### Step 3: Extract DBpedia Properties
```sparql
# Find properties applicable to Museum class
SELECT DISTINCT ?property WHERE {
  ?property rdfs:domain dbo:Museum .
}
```
**Result**:
- `dbo:collection`
- `dbo:curator`
- `dbo:museumType`
**Action**: Reference these properties in our ontology mapping `properties:` section.
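Turning the discovered properties into `properties:` entries is mechanical. The following sketch builds the `label`/`value` structure used in the mapping YAML; `to_property_entries` is a hypothetical helper, and the description strings are the ones used in Step 4.

```python
from typing import Dict, List


def to_property_entries(descriptions: Dict[str, str]) -> List[dict]:
    """Build `properties:` entries in the label/value shape of the mapping files."""
    return [
        {"label": prop, "value": [{"label": text}]}
        for prop, text in descriptions.items()
    ]


entries = to_property_entries({
    "dbo:collection": "Museum collections",
    "dbo:curator": "Curator name",
    "dbo:museumType": "Museum specialization (military, art, history, etc.)",
})
```

The resulting list can be passed to a YAML serializer and placed under the relevant `properties:` key.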
### Step 4: Document DBpedia Mapping in YAML
```yaml
ontology_mapping:
  wikidata_source: Q2772772
  dbpedia_class: dbo:Museum  # ← ADD THIS
  dbpedia_equivalent_wikidata: wd:Q33506  # ← ADD THIS
  custodian_ontology:
    public_sector:
      class: cpov:PublicOrganisation
      secondary_class: schema:Museum
      tertiary_class: dbo:Museum  # ← REFERENCE DBpedia
      quaternary_class: crm:E39_Actor
      properties:
        - label: dbo:collection  # ← USE DBpedia property
          value:
            - label: Museum collections
        - label: dbo:curator  # ← USE DBpedia property
          value:
            - label: Curator name
        - label: dbo:museumType  # ← USE DBpedia property
          value:
            - label: Museum specialization (military, art, history, etc.)
```
---
## DBpedia Advantages Over Wikidata
| Feature | Wikidata | DBpedia |
|---------|----------|---------|
| **Ontology Structure** | Flat entity graph | Hierarchical class ontology |
| **Property Definitions** | No formal domains/ranges | Typed properties with domain/range |
| **OWL Semantics** | Limited OWL support | Full OWL ontology |
| **Reasoning Support** | Manual queries | OWL reasoning possible |
| **Multilingual Labels** | Excellent | Good (40+ languages) |
| **Heritage Coverage** | Comprehensive instances | Structured classes + properties |
**Use Case**: Wikidata provides entity instances; DBpedia provides ontology structure.
---
## Updated Ontology Mapping Template
### New Fields to Add
```yaml
ontology_mapping:
  wikidata_source: Q[number]
  # NEW: DBpedia integration
  dbpedia_mapping:
    dbpedia_class: dbo:[ClassName]  # If DBpedia has equivalent class
    dbpedia_equivalent_wikidata: wd:Q[number]  # Wikidata entity DBpedia maps to
    dbpedia_properties:  # DBpedia-specific properties to use
      - dbo:collection
      - dbo:curator
      - dbo:isil
    sparql_query: |  # SPARQL query used to discover mapping
      SELECT ?dboClass WHERE {
        ?dboClass owl:equivalentClass <http://www.wikidata.org/entity/Q[number]> .
      }
  semantic_aspects: [...]
  complexity_score: N
  custodian_ontology:
    public_sector:
      class: cpov:PublicOrganisation
      secondary_class: schema:Museum
      tertiary_class: dbo:Museum  # ← REFERENCE DBpedia class
      quaternary_class: crm:E39_Actor
      properties:
        - label: dbo:collection  # ← USE DBpedia properties
          value:
            - label: Collection description
```
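A small check can catch malformed `dbpedia_mapping` blocks before they are committed. This is a sketch under stated assumptions: `check_dbpedia_mapping` is a hypothetical helper, and the regular expressions assume that DBpedia class names are UpperCamelCase and property names lowerCamelCase, which matches every identifier used in this document.

```python
import re
from typing import Any, Dict, List

# Assumed naming patterns (see lead-in); adjust if the project finds exceptions.
CLASS_RE = re.compile(r"dbo:[A-Z][A-Za-z0-9]*")
PROPERTY_RE = re.compile(r"dbo:[a-z][A-Za-z0-9]*")
WIKIDATA_RE = re.compile(r"wd:Q[1-9][0-9]*")


def check_dbpedia_mapping(mapping: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the block looks valid."""
    problems: List[str] = []
    if not CLASS_RE.fullmatch(str(mapping.get("dbpedia_class", ""))):
        problems.append("dbpedia_class must look like dbo:ClassName")
    if not WIKIDATA_RE.fullmatch(str(mapping.get("dbpedia_equivalent_wikidata", ""))):
        problems.append("dbpedia_equivalent_wikidata must look like wd:Q123")
    for prop in mapping.get("dbpedia_properties", []):
        if not PROPERTY_RE.fullmatch(str(prop)):
            problems.append(f"unexpected property name: {prop}")
    return problems
```

Calling it on the template as written (with `Q[number]` placeholders still in place) reports problems, which makes unfilled templates easy to spot.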
---
## DBpedia Properties for Heritage Institutions
### Museum Properties
| Property | Domain | Range | Description |
|----------|--------|-------|-------------|
| `dbo:collection` | `dbo:Museum` | `xsd:string` | Collections held by museum |
| `dbo:curator` | `dbo:Museum` | `dbo:Person` | Museum curator |
| `dbo:museumType` | `dbo:Museum` | `xsd:string` | Museum specialization |
### Library Properties
| Property | Domain | Range | Description |
|----------|--------|-------|-------------|
| `dbo:isil` | `dbo:Library` | `xsd:string` | ISIL code |
| `dbo:numberOfCollectionItems` | `dbo:Library` | `xsd:integer` | Collection size |
### General Organizational Properties
| Property | Domain | Range | Description |
|----------|--------|-------|-------------|
| `dbo:foundingDate` | `dbo:Organisation` | `xsd:date` | Founding date |
| `dbo:dissolutionDate` | `dbo:Organisation` | `xsd:date` | Closure date |
| `dbo:location` | `dbo:Organisation` | `dbo:Place` | Physical location |
| `dbo:affiliation` | `dbo:Organisation` | `dbo:Organisation` | Parent organization |
---
## Comparison: CPOV vs Schema.org vs DBpedia
### Museum Example
```turtle
# CPOV (EU Public Sector)
<http://example.org/rijksmuseum>
    rdf:type cpov:PublicOrganisation ;
    skos:prefLabel "Rijksmuseum" ;
    dct:identifier "NL-AmRMA" .  # ISIL

# Schema.org (Web Semantics)
<http://example.org/rijksmuseum>
    rdf:type schema:Museum ;
    schema:name "Rijksmuseum" ;
    schema:identifier "NL-AmRMA" .

# DBpedia (Cross-Domain Ontology)
<http://example.org/rijksmuseum>
    rdf:type dbo:Museum ;
    rdfs:label "Rijksmuseum" ;
    dbo:isil "NL-AmRMA" ;
    dbo:collection "Dutch Golden Age paintings" ;
    dbo:curator "Taco Dibbits" .
```
**Decision**: Use ALL THREE in multi-typed assertions for maximum interoperability.
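A multi-typed assertion puts all three classes on a single subject. The sketch below emits such a statement with plain string formatting to stay dependency-free; `multi_typed_turtle` is a hypothetical helper, prefix declarations are omitted, and a real pipeline would more likely use an RDF library for serialization.

```python
from typing import Sequence


def multi_typed_turtle(subject_iri: str, types: Sequence[str], label: str) -> str:
    """Return one Turtle statement typing the subject with every class in `types`."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    type_list = ", ".join(types)
    return (
        f"<{subject_iri}>\n"
        f"    rdf:type {type_list} ;\n"
        f'    rdfs:label "{escaped}" .'
    )


print(multi_typed_turtle(
    "http://example.org/rijksmuseum",
    ["cpov:PublicOrganisation", "schema:Museum", "dbo:Museum"],
    "Rijksmuseum",
))
```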
---
## SPARQL Queries for DBpedia Integration
### Query 1: Find DBpedia Class for Wikidata Entity
```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?dboClass ?label WHERE {
  ?dboClass owl:equivalentClass wd:Q2772772 .
  ?dboClass rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
```
### Query 2: Find All Museum-Related DBpedia Classes
```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class rdfs:subClassOf* dbo:Museum .
  ?class rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
```
### Query 3: Find DBpedia Properties for a Class
```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?property ?label WHERE {
  ?property rdfs:domain dbo:Museum .
  ?property rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
```
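All three queries return results in the standard SPARQL JSON results format when JSON is requested, so one small function can flatten them. In this sketch `bindings_to_rows` is a hypothetical helper and `sample` is an illustrative payload in the shape Query 1 would produce, not captured endpoint output.

```python
from typing import Any, Dict, List


def bindings_to_rows(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...};
        # unbound variables are simply absent from the binding.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Illustrative payload only (see lead-in).
sample = {
    "head": {"vars": ["dboClass", "label"]},
    "results": {
        "bindings": [
            {
                "dboClass": {"type": "uri", "value": "http://dbpedia.org/ontology/Museum"},
                "label": {"type": "literal", "xml:lang": "en", "value": "museum"},
            }
        ]
    },
}

print(bindings_to_rows(sample))
```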
---
## Implementation Recommendations
### Recommendation 1: Add DBpedia as Fourth Ontology Layer
**Current**: CPOV (primary) + Schema.org (secondary) + CIDOC-CRM (tertiary)
**Proposed**: CPOV + Schema.org + **DBpedia** + CIDOC-CRM
**Rationale**: DBpedia bridges Wikidata entities to formal ontologies.
### Recommendation 2: Use DBpedia Properties in LinkML Schema
**Current**: Custom properties or Schema.org properties
**Proposed**: Reference DBpedia properties when available
**Example** (LinkML schema):
```yaml
HeritageCustodian:
  attributes:  # inline slot definitions; a class-level `slots:` key only lists slot names
    collection_description:
      slot_uri: dbo:collection  # ← Map to DBpedia property
    curator_name:
      slot_uri: dbo:curator
    museum_type:
      slot_uri: dbo:museumType
    isil_code:
      slot_uri: dbo:isil
```
### Recommendation 3: Create DBpedia Mapping Cache
**Problem**: Querying the DBpedia SPARQL endpoint for every entity is slow.
**Solution**: Pre-cache Wikidata → DBpedia mappings for common heritage classes.
**Script** (`scripts/cache_dbpedia_mappings.py`):
```python
import requests
import yaml  # PyYAML; required by yaml.dump below

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def get_dbpedia_class(wikidata_id):
    """Query DBpedia for equivalent class of Wikidata entity."""
    query = f"""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    SELECT ?dboClass WHERE {{
      ?dboClass owl:equivalentClass wd:{wikidata_id} .
    }}
    """
    response = requests.get(
        DBPEDIA_SPARQL,
        params={'query': query, 'format': 'json'},
        timeout=30,  # avoid hanging on a slow endpoint
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    results = response.json()['results']['bindings']
    if results:
        # Return the IRI of the first equivalent class found
        return results[0]['dboClass']['value']
    return None


if __name__ == "__main__":
    # Cache mappings for GLAM entities
    qids = ['Q33506', 'Q7075', 'Q166118', 'Q2772772']  # ... extend with further QIDs
    cache = {qid: get_dbpedia_class(qid) for qid in qids}

    # Save to YAML
    with open('data/ontology/dbpedia_wikidata_mappings.yaml', 'w') as f:
        yaml.dump(cache, f)
```
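On the consuming side, the cache can be consulted before any network call is made. The sketch below is one possible shape, not part of the script above: `cached_lookup` is a hypothetical helper, `cache` is the dictionary loaded from the YAML file, and `fetch` would typically be `get_dbpedia_class`.

```python
from typing import Callable, Dict, Optional


def cached_lookup(
    qid: str,
    cache: Dict[str, Optional[str]],
    fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the cached class for qid, calling fetch only on a cache miss."""
    if qid not in cache:
        # Store None as well, so entities without a mapping are not re-queried.
        cache[qid] = fetch(qid)
    return cache[qid]
```

Caching negative results matters here because many specialized Wikidata classes are likely to have no direct DBpedia equivalent, and those would otherwise trigger a query on every run.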
### Recommendation 4: Enrich Ontology Mapping Workflow
**Updated workflow** (`.opencode/agent/ontology-mapping-rules.md`):
1. Read Wikidata entity metadata
2. **Query DBpedia for equivalent class** ← NEW STEP
3. Search base ontologies (CPOV, TOOI, Schema.org, CIDOC-CRM)
4. **Reference DBpedia properties** ← NEW STEP
5. Map to ontology classes
6. Document rationale
7. Write ontology_mapping YAML
---
## Example: Military Museum with DBpedia Integration
```yaml
- label: Q2772772
  hypernym:
    - museum
  type:
    - M
  ontology_mapping:
    wikidata_source: Q2772772
    # DBpedia integration
    dbpedia_mapping:
      dbpedia_class: dbo:Museum
      dbpedia_equivalent_wikidata: wd:Q33506
      dbpedia_subclass_of: dbo:Building
      dbpedia_properties:
        - dbo:collection
        - dbo:curator
        - dbo:museumType
      sparql_discovery_date: "2025-11-20"
    semantic_aspects:
      - custodian
      - collections
    complexity_score: 4
    custodian_ontology:
      public_sector:
        class: cpov:PublicOrganisation
        namespace: http://data.europa.eu/m8g/
        secondary_class: schema:Museum
        tertiary_class: dbo:Museum  # ← DBpedia class
        quaternary_class: crm:E39_Actor
        properties:
          - label: dbo:collection
            value:
              - label: Military artifacts and archival records
          - label: dbo:curator
            value:
              - label: Museum curator name
          - label: dbo:museumType
            value:
              - label: Military history specialization
          - label: dct:identifier
            value:
              - label: ISIL code
          - label: schema:url
            value:
              - label: Official website
    collections_ontology:
      museum_collections:
        class: crm:E78_Curated_Holding
        properties:
          - label: dbo:collection  # ← Reference DBpedia property
            value:
              - label: Military artifacts description
```
---
## References
- **DBpedia Homepage**: https://www.dbpedia.org/
- **DBpedia Ontology**: http://dbpedia.org/ontology/
- **DBpedia SPARQL Endpoint**: http://dbpedia.org/sparql
- **DBpedia Mappings Wiki**: http://mappings.dbpedia.org
- **Archivo (DBpedia Ontology Archive)**: https://archivo.dbpedia.org/
- **DBpedia Databus**: https://databus.dbpedia.org/ontologies/dbpedia.org/ontology--DEV
---
## Next Steps
1. ✅ Document DBpedia integration conventions (THIS DOCUMENT)
2. ⏳ Create Wikidata → DBpedia mapping cache script
3. ⏳ Update `.opencode/agent/ontology-mapping-rules.md` with DBpedia step
4. ⏳ Retrofit existing ontology mappings (Q1802963, Q3694, Q2927789) with DBpedia references
5. ⏳ Add `dbpedia_class` field to LinkML schema
6. ⏳ Continue ontology enrichment with DBpedia integration
---
**Version**: 1.0
**Last Updated**: 2025-11-20
**Maintained By**: Heritage Custodian Ontology Project