SPARQL Predicate Architecture
Overview
The GLAM RAG system uses two different predicate URI styles that coexist:
- LinkML Schema - Uses semantic URIs from base ontologies
- RAG SPARQL Queries - Uses custom hc:-prefixed predicates
This document explains why this dual system exists and how it's handled.
The Two Predicate Systems
1. LinkML Schema Predicates (Semantic URIs)
The LinkML schema in schemas/20251121/linkml/ uses slot_uri properties that map to established ontology vocabularies:
| Slot | slot_uri | Ontology |
|---|---|---|
| custodian_type | org:classification | W3C Organization Ontology |
| settlement | schema:location | Schema.org |
| country | schema:addressCountry | Schema.org |
| name | skos:prefLabel | SKOS |
Rationale: Semantic interoperability with linked data ecosystems (Europeana, Wikidata, etc.)
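To make the slot_uri mappings concrete, the sketch below expands each CURIE from the table into its full IRI. The prefix table uses the published namespaces of the W3C Organization Ontology, Schema.org and SKOS. The names SLOT_URIS, PREFIXES and expand_curie are illustrative and are not taken from the project's code. Whether the LinkML schema declares Schema.org as http:// or https:// is an assumption.

```python
# Illustrative copy of the slot -> slot_uri table above (not the project's code)
SLOT_URIS = {
    "custodian_type": "org:classification",
    "settlement": "schema:location",
    "country": "schema:addressCountry",
    "name": "skos:prefLabel",
}

# Published namespace IRIs for the three prefixes; Schema.org is also served
# under https://schema.org/, so the exact form used by the schema is an assumption
PREFIXES = {
    "org": "http://www.w3.org/ns/org#",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefix:local CURIE into a full IRI (KeyError for unknown prefixes)."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefixes[prefix] + local


for slot, curie in SLOT_URIS.items():
    print(f"{slot:<15} {curie:<22} <{expand_curie(curie)}>")
```

The long IRIs this produces are the "Long URIs" cost listed in the trade-off table. The short prefixed forms are what a query author or LLM actually writes.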
2. RAG SPARQL Predicates (Custom hc: prefix)
The RAG system generates SPARQL queries using custom hc: prefixed predicates:
| Predicate | Purpose |
|---|---|
| hc:institutionType | Filter by heritage type (M, L, A, G, etc.) |
| hc:settlementName | Filter by city name |
| hc:subregionCode | Filter by province/state (NL-NH, NL-GE) |
| hc:countryCode | Filter by country (ISO 3166-1 alpha-2) |
| hc:ghcid | Global Heritage Custodian Identifier |
Rationale: Simplified, consistent predicates for RAG query generation
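A RAG template can compose these predicates into a short filter query. The sketch below assumes that each custodian carries hc:ghcid, hc:institutionType, hc:settlementName and hc:countryCode as plain literal values. The hc: namespace IRI, the ?custodian variable and build_filter_query are hypothetical placeholders, not the project's actual template code.

```python
# Placeholder namespace IRI: the real hc: IRI is not given in this document
HC_NS = "https://example.org/hc#"


def _literal(value: str) -> str:
    """Quote a Python string as a SPARQL string literal, escaping \\ and "."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_filter_query(institution_type=None, settlement=None, country=None, limit=50):
    """Build a SELECT that filters custodians on hc: predicates (hypothetical template)."""
    patterns = ["?custodian hc:ghcid ?ghcid ."]
    filters = {
        "hc:institutionType": institution_type,  # e.g. "M" for museum
        "hc:settlementName": settlement,
        "hc:countryCode": country,  # ISO 3166-1 alpha-2, e.g. "NL"
    }
    for predicate, value in filters.items():
        if value is not None:
            patterns.append(f"?custodian {predicate} {_literal(value)} .")
    body = "\n  ".join(patterns)
    return (
        f"PREFIX hc: <{HC_NS}>\n"
        f"SELECT ?custodian ?ghcid WHERE {{\n  {body}\n}}\nLIMIT {limit}"
    )


print(build_filter_query(institution_type="M", settlement="Amsterdam", country="NL"))
```

Every filter reduces to the same one-line triple pattern. This uniform shape is the "easier patterns" advantage claimed for LLM generation.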
Why Two Systems?
Historical Context
- LinkML Schema was designed for semantic web interoperability and RDF serialization
- RAG Queries evolved independently for efficient knowledge graph querying
- The Oxigraph knowledge graph stores data using the hc: namespace
Technical Trade-offs
| Aspect | Semantic URIs | Custom hc: URIs |
|---|---|---|
| Interoperability | ✅ Standards-compliant | ❌ Project-specific |
| Query Simplicity | ❌ Long URIs | ✅ Short, memorable |
| LLM Generation | ❌ Harder to generate | ✅ Easier patterns |
| Validation | ✅ LinkML tooling | ⚠️ Custom validation |
How SPARQLValidator Handles This
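The SPARQLValidator class in backend/rag/template_sparql.py accepts predicates from both systems. Its constructor (excerpt) builds a single allowed-predicate set from three sources:
```python
def __init__(self):
    # 1. Core RAG predicates (always included, even if no schema predicates load)
    hc_predicates = set(self._FALLBACK_HC_PREDICATES)

    # 2. Schema predicates from OntologyLoader (semantic URIs from LinkML slots);
    #    `ontology` is assumed to be an OntologyLoader created elsewhere (not shown)
    schema_predicates = ontology.get_predicates()
    if schema_predicates:
        hc_predicates = hc_predicates | schema_predicates

    # 3. External predicates (base ontology URIs); the union is the allowed set
    self._all_predicates = hc_predicates | self.VALID_EXTERNAL_PREDICATES
```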
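Because the three sources are unioned, a generated query passes if every predicate it uses appears in any one of them. The self-contained sketch below mimics that check, with several assumptions. The sample predicate sets are small illustrative subsets, and all three are assumed to store prefixed names. find_unknown_predicates and its regular expression are hypothetical, not the code in template_sparql.py. The regex only recognises a prefixed name directly after a subject variable, so ';' shorthand and full IRIs are not handled.

```python
import re

# Illustrative subsets of the three sources (the real sets are much larger)
FALLBACK_HC_PREDICATES = {
    "hc:institutionType", "hc:settlementName", "hc:subregionCode",
    "hc:countryCode", "hc:ghcid",
}
SCHEMA_PREDICATES = {
    "org:classification", "schema:location", "schema:addressCountry", "skos:prefLabel",
}
VALID_EXTERNAL_PREDICATES = {"rdfs:label", "owl:sameAs"}  # assumed examples

ALL_PREDICATES = FALLBACK_HC_PREDICATES | SCHEMA_PREDICATES | VALID_EXTERNAL_PREDICATES

# A prefixed name directly after a subject variable, e.g. "?s hc:ghcid"
_PREDICATE_RE = re.compile(r"\?\w+\s+([A-Za-z][\w-]*:[A-Za-z_][\w-]*)")


def find_unknown_predicates(query: str, allowed: set = ALL_PREDICATES) -> list:
    """Return predicates used in simple triple patterns that are not allowed."""
    used = _PREDICATE_RE.findall(query)
    return sorted({p for p in used if p not in allowed})


query = """
SELECT ?s WHERE {
  ?s hc:institutionType "M" .
  ?s hc:cityName "Leiden" .
}"""
print(find_unknown_predicates(query))  # ['hc:cityName']
```

A query that mixes hc: and semantic predicates (for example hc:ghcid with org:classification) passes, which is the intended behaviour while both systems coexist.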
Predicate Categories
| Category | Count | Source |
|---|---|---|
| Core RAG predicates | 12 | _FALLBACK_HC_PREDICATES |
| Schema predicates | 286 | OntologyLoader (LinkML) |
| External predicates | ~40 | VALID_EXTERNAL_PREDICATES |
Future Considerations
Option A: Unify to Semantic URIs (Recommended Long-term)
- Update Oxigraph data to use semantic URIs
- Update RAG query templates to use org:classification, etc.
- Deprecate custom hc: predicates
Pros: Single source of truth, better interoperability. Cons: Migration effort, breaking changes.
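The data side of Option A could be expressed as SPARQL 1.1 Update DELETE/INSERT operations, one per predicate pair. The sketch below only generates the update text. Its mapping holds the one pair this document names (hc:institutionType to org:classification); any other pairs would need to be confirmed against the schema first. The hc: namespace IRI and build_migration_update are hypothetical.

```python
# Placeholder: the real hc: namespace IRI is not stated in this document
PREFIX_DECLS = {
    "hc": "https://example.org/hc#",
    "org": "http://www.w3.org/ns/org#",
}

# Old hc: predicate -> semantic URI (only the pair named in this document)
PREDICATE_MIGRATIONS = {"hc:institutionType": "org:classification"}


def build_migration_update(mappings=PREDICATE_MIGRATIONS, prefixes=PREFIX_DECLS) -> str:
    """Return a SPARQL 1.1 Update that renames each old predicate in place."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    ops = [
        f"DELETE {{ ?s {old} ?o }}\nINSERT {{ ?s {new} ?o }}\nWHERE {{ ?s {old} ?o }}"
        for old, new in mappings.items()
    ]
    # Operations in one update request are separated by ';'
    return header + "\n" + " ;\n".join(ops)


print(build_migration_update())
```

Running the update against a copy of the store first would show how many triples move before the RAG templates are switched over.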
Option B: Maintain Dual System
- Keep custom hc: predicates for RAG queries
- Add a URI mapping layer in Oxigraph (CONSTRUCT queries)
- Document both systems
Pros: No breaking changes. Cons: Ongoing maintenance, potential confusion.
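For Option B, the mapping layer can be a CONSTRUCT query that republishes the stored hc: triples under their semantic URIs without touching the data. The sketch assumes a placeholder hc: namespace IRI and only the predicate pair named in this document. build_mapping_construct is hypothetical.

```python
HC_NS = "https://example.org/hc#"  # placeholder namespace IRI
ORG_NS = "http://www.w3.org/ns/org#"

# hc: predicate -> semantic URI (only the pair named in this document)
MAPPINGS = {"hc:institutionType": "org:classification"}


def build_mapping_construct(mappings=MAPPINGS) -> str:
    """Return a CONSTRUCT query exposing hc: triples under semantic predicates."""
    # Each pair gets its own ?sN/?oN so a UNION branch only fills its own template triple
    template = "\n  ".join(f"?s{i} {new} ?o{i} ." for i, new in enumerate(mappings.values()))
    where = "\n  UNION\n  ".join(f"{{ ?s{i} {old} ?o{i} }}" for i, old in enumerate(mappings))
    return (
        f"PREFIX hc: <{HC_NS}>\nPREFIX org: <{ORG_NS}>\n"
        f"CONSTRUCT {{\n  {template}\n}}\nWHERE {{\n  {where}\n}}"
    )


print(build_mapping_construct())
```

Using distinct variables per branch matters. A CONSTRUCT template triple with unbound variables is skipped, so each solution emits only the triple for its own mapping rather than every mapped triple.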
Option C: Namespace Aliasing
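Configure Oxigraph to treat hc:institutionType as equivalent to org:classification, for example with an OWL equivalence axiom:
```turtle
# OWL axiom in Turtle (prefix declarations omitted)
hc:institutionType owl:equivalentProperty org:classification .
```
The axiom only takes effect where a reasoner or query-rewriting step applies it. A store without inference still treats the two predicates as unrelated.
Pros: Transparent to the RAG system. Cons: Reasoning overhead, added complexity.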
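Without a reasoner, a similar effect can be approximated at query time with a SPARQL 1.1 alternative property path, (hc:institutionType|org:classification), which matches triples that use either predicate. The helper below rewrites one prefixed predicate in a query string into such a path. widen_predicate and its regex boundaries are illustrative assumptions, and the rewrite only applies to predicates written as prefixed names.

```python
import re


def widen_predicate(query: str, predicate: str, equivalent: str) -> str:
    """Replace `predicate` with the alternative path (predicate|equivalent)."""
    # Lookarounds stop matches inside longer names such as hc:institutionTypeCode
    pattern = re.compile(rf"(?<![\w:]){re.escape(predicate)}(?![\w-])")
    return pattern.sub(f"({predicate}|{equivalent})", query)


query = '?s hc:institutionType "M" .'
print(widen_predicate(query, "hc:institutionType", "org:classification"))
# ?s (hc:institutionType|org:classification) "M" .
```

This moves the equivalence into the query rather than the store. It avoids reasoning overhead at the cost of slightly longer generated queries.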
Current State (January 2025)
- SPARQLValidator: Accepts both predicate systems ✅
- SynonymResolver: Uses OntologyLoader for type codes ✅
- SchemaAwareSlotValidator: Uses validation rules JSON ✅
- Oxigraph: Uses the hc: namespace for data storage
Related Files
| File | Purpose |
|---|---|
| backend/rag/template_sparql.py | SPARQLValidator, OntologyLoader |
| data/validation/sparql_validation_rules.json | Enum definitions, mappings |
| schemas/20251121/linkml/modules/slots/*.yaml | LinkML slot definitions |
| .opencode/rules/slot-centralization-and-semantic-uri-rule.md | Rule 38 |