# SPARQL Predicate Architecture

## Overview

The GLAM RAG system uses two different predicate URI styles that coexist:

1. **LinkML Schema** - Uses semantic URIs from base ontologies
2. **RAG SPARQL Queries** - Use custom `hc:`-prefixed predicates

This document explains why this dual system exists and how it is handled.

---

## The Two Predicate Systems

### 1. LinkML Schema Predicates (Semantic URIs)

The LinkML schema in `schemas/20251121/linkml/` uses `slot_uri` properties that map slots to established ontology vocabularies:

| Slot | slot_uri | Ontology |
|------|----------|----------|
| `custodian_type` | `org:classification` | W3C Organization Ontology |
| `settlement` | `schema:location` | Schema.org |
| `country` | `schema:addressCountry` | Schema.org |
| `name` | `skos:prefLabel` | SKOS |

**Rationale**: Semantic interoperability with linked data ecosystems (Europeana, Wikidata, etc.)

### 2. RAG SPARQL Predicates (Custom hc: prefix)

The RAG system generates SPARQL queries using custom `hc:`-prefixed predicates:

| Predicate | Purpose |
|-----------|---------|
| `hc:institutionType` | Filter by heritage type code (M, L, A, G, etc.) |
| `hc:settlementName` | Filter by city name |
| `hc:subregionCode` | Filter by province/state (ISO 3166-2, e.g. NL-NH, NL-GE) |
| `hc:countryCode` | Filter by country (ISO 3166-1 alpha-2) |
| `hc:ghcid` | Global Heritage Custodian Identifier |

**Rationale**: Short, consistent predicates that simplify RAG query generation

---

## Why Two Systems?

### Historical Context

1. **LinkML Schema** was designed for semantic web interoperability and RDF serialization.
2. **RAG Queries** evolved independently for efficient knowledge graph querying.
3. The Oxigraph knowledge graph stores data using the `hc:` namespace.

### Technical Trade-offs

| Aspect | Semantic URIs | Custom hc: URIs |
|--------|---------------|-----------------|
| **Interoperability** | ✅ Standards-compliant | ❌ Project-specific |
| **Query Simplicity** | ❌ Long URIs | ✅ Short, memorable |
| **LLM Generation** | ❌ Harder to generate | ✅ Easier patterns |
| **Validation** | ✅ LinkML tooling | ⚠️ Custom validation |

---

## How SPARQLValidator Handles This

The `SPARQLValidator` class in `backend/rag/template_sparql.py` accepts predicates from BOTH systems. Its constructor merges three predicate sets:

```python
def __init__(self):
    # Excerpt from SPARQLValidator. `ontology` refers to an OntologyLoader
    # instance defined elsewhere (not shown in this excerpt).

    # 1. Core RAG predicates (always included)
    hc_predicates = set(self._FALLBACK_HC_PREDICATES)

    # 2. Schema predicates from OntologyLoader (semantic URIs)
    schema_predicates = ontology.get_predicates()
    if schema_predicates:  # only merge if OntologyLoader returned predicates
        hc_predicates = hc_predicates | schema_predicates

    # 3. External predicates (base ontology URIs)
    self._all_predicates = hc_predicates | self.VALID_EXTERNAL_PREDICATES
```

### Predicate Categories

| Category | Count | Source |
|----------|-------|--------|
| Core RAG predicates | 12 | `_FALLBACK_HC_PREDICATES` |
| Schema predicates | 286 | OntologyLoader (LinkML) |
| External predicates | ~40 | `VALID_EXTERNAL_PREDICATES` |

---

## Future Considerations

### Option A: Unify to Semantic URIs (Recommended Long-term)

1. Update Oxigraph data to use semantic URIs.
2. Update RAG query templates to use `org:classification` etc.
3. Deprecate custom `hc:` predicates.

**Pros**: Single source of truth, better interoperability

**Cons**: Migration effort, breaking changes

### Option B: Maintain Dual System

1. Keep custom `hc:` predicates for RAG queries.
2. Add a URI mapping layer in Oxigraph (CONSTRUCT queries).
3. Document both systems.

**Pros**: No breaking changes

**Cons**: Ongoing maintenance, potential confusion

### Option C: Namespace Aliasing

Declare `hc:institutionType` as equivalent to `org:classification`, so that queries using either predicate match the same data:

```turtle
# OWL axiom in Turtle (prefix declarations omitted)
hc:institutionType owl:equivalentProperty org:classification .
```

`owl:equivalentProperty` is an OWL axiom, not a SPARQL 1.1 feature: it only changes query results where OWL reasoning is applied, either by the store or by a separate materialization step that adds the inferred triples. Without reasoning, the equivalence has to be emulated in the query itself, e.g. with the SPARQL 1.1 alternative property path `(hc:institutionType|org:classification)`.

**Pros**: Transparent to RAG system

**Cons**: Reasoning overhead, complexity

---

## Current State (January 2025)

- **SPARQLValidator**: Accepts both predicate systems ✅
- **SynonymResolver**: Uses OntologyLoader for type codes ✅
- **SchemaAwareSlotValidator**: Uses validation rules JSON ✅
- **Oxigraph**: Uses `hc:` namespace for data storage

---

## Related Files

| File | Purpose |
|------|---------|
| `backend/rag/template_sparql.py` | SPARQLValidator, OntologyLoader |
| `data/validation/sparql_validation_rules.json` | Enum definitions, mappings |
| `schemas/20251121/linkml/modules/slots/*.yaml` | LinkML slot definitions |
| `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` | Rule 38 (slot centralization and semantic URIs) |

---

## References

- [W3C Organization Ontology](https://www.w3.org/TR/vocab-org/)
- [Schema.org](https://schema.org/)
- [SKOS](https://www.w3.org/TR/skos-reference/)
- [LinkML slot_uri documentation](https://linkml.io/linkml/schemas/uris-and-mappings.html)
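
The union-of-sets check described under "How SPARQLValidator Handles This" can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `backend/rag/template_sparql.py`: the trimmed predicate sets, the `PREDICATE_RE` heuristic and the `unknown_predicates` helper are all hypothetical. The `hc:` and schema entries come from the tables in this document; the external entries (`rdf:type`, `rdfs:label`) are assumed examples.

```python
import re

# Hypothetical, heavily trimmed stand-ins for the three predicate sets.
# hc: and schema entries come from this document's tables; the external
# entries are assumed examples. The real sets are much larger.
CORE_HC_PREDICATES = {
    "hc:institutionType", "hc:settlementName", "hc:subregionCode",
    "hc:countryCode", "hc:ghcid",
}
SCHEMA_PREDICATES = {
    "org:classification", "schema:location",
    "schema:addressCountry", "skos:prefLabel",
}
EXTERNAL_PREDICATES = {"rdf:type", "rdfs:label"}  # assumed examples

# Union of all accepted predicates, mirroring the three-way merge above.
ALL_PREDICATES = CORE_HC_PREDICATES | SCHEMA_PREDICATES | EXTERNAL_PREDICATES

# Heuristic, not a SPARQL parser: a prefixed name directly following a
# variable or ';' is treated as a predicate. An object term after a variable
# predicate (?s ?p hc:X) would be misread as a predicate.
PREDICATE_RE = re.compile(r"(?:\?\w+|;)\s+([A-Za-z][\w-]*:[A-Za-z_][\w-]*)")


def unknown_predicates(query: str, allowed: set[str] = ALL_PREDICATES) -> list[str]:
    """Return predicates used in `query` that are not in `allowed`,
    de-duplicated, in order of first appearance."""
    unknown: list[str] = []
    for name in PREDICATE_RE.findall(query):
        if name not in allowed and name not in unknown:
            unknown.append(name)
    return unknown


query = """
SELECT ?s WHERE {
  ?s hc:institutionType "M" ;
     hc:settlementName "Amsterdam" .
  ?s hc:cityName ?city .
}
"""
print(unknown_predicates(query))  # ['hc:cityName']
```

Here `hc:cityName` is deliberately absent from every set, standing in for a predicate an LLM might invent. Keeping the three sources as separate sets and only unioning them at the end means the schema-derived set can be missing without the core RAG predicates being rejected.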
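
Because `owl:equivalentProperty` only takes effect where OWL reasoning is applied, one lightweight way to approximate Option C (Namespace Aliasing) without a reasoner is query rewriting: each custom predicate is replaced by a SPARQL 1.1 alternative property path covering both URIs. The sketch below assumes this approach; the `EQUIVALENT_PREDICATES` mapping and the `expand_equivalents` helper are hypothetical and not part of the existing code base. Only the `hc:institutionType` / `org:classification` pair is taken from this document.

```python
import re

# Hypothetical mapping from custom hc: predicates to semantic URIs. Only the
# hc:institutionType -> org:classification pair is named in this document;
# further pairs would have to be checked against the LinkML slot_uri values.
EQUIVALENT_PREDICATES: dict[str, str] = {
    "hc:institutionType": "org:classification",
}


def expand_equivalents(query: str, mapping: dict[str, str] = EQUIVALENT_PREDICATES) -> str:
    """Rewrite each mapped hc: predicate into a SPARQL 1.1 alternative path,
    e.g. hc:institutionType -> (hc:institutionType|org:classification),
    so the query matches triples stored under either predicate.
    Simplified: occurrences inside string literals are rewritten too."""
    for custom, semantic in mapping.items():
        # Whole-token match only, so longer names like hc:institutionTypeLabel
        # are left alone.
        token = re.compile(rf"(?<![\w:]){re.escape(custom)}(?![\w-])")
        query = token.sub(f"({custom}|{semantic})", query)
    return query


query = 'SELECT ?s WHERE { ?s hc:institutionType "M" . }'
print(expand_equivalents(query))
# SELECT ?s WHERE { ?s (hc:institutionType|org:classification) "M" . }
```

The rewritten query also needs a `PREFIX org: <http://www.w3.org/ns/org#>` declaration. The rewrite is not idempotent (a second pass would wrap the path again), so it should run once per generated query. The trade-off against a reasoner is a slightly longer query in exchange for leaving the stored data untouched.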