# SPARQL Predicate Architecture

## Overview

The GLAM RAG system uses two different predicate URI styles that coexist:

1. **LinkML Schema** - Uses semantic URIs from base ontologies
2. **RAG SPARQL Queries** - Uses custom `hc:` prefixed predicates

This document explains why this dual system exists and how it is handled.

---

## The Two Predicate Systems

### 1. LinkML Schema Predicates (Semantic URIs)

The LinkML schema in `schemas/20251121/linkml/` uses `slot_uri` properties that map to established ontology vocabularies:

| Slot | slot_uri | Ontology |
|------|----------|----------|
| `custodian_type` | `org:classification` | W3C Organization Ontology |
| `settlement` | `schema:location` | Schema.org |
| `country` | `schema:addressCountry` | Schema.org |
| `name` | `skos:prefLabel` | SKOS |

**Rationale**: Semantic interoperability with linked data ecosystems (Europeana, Wikidata, etc.)

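
To make the `slot_uri` values concrete, the following minimal sketch expands the prefixed names from the table into full IRIs. The prefix map is an assumption: it uses the standard namespace IRIs published for these three vocabularies, and the project's own prefix declarations may differ. `PREFIXES`, `SLOT_URIS`, and `expand_curie` are illustrative names, not part of the codebase.

```python
# Assumed prefix map: standard namespace IRIs for the three vocabularies.
# The project's actual prefix declarations may differ.
PREFIXES = {
    "org": "http://www.w3.org/ns/org#",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Slot-to-slot_uri pairs copied from the table in this section.
SLOT_URIS = {
    "custodian_type": "org:classification",
    "settlement": "schema:location",
    "country": "schema:addressCountry",
    "name": "skos:prefLabel",
}


def expand_curie(curie: str) -> str:
    """Expand a prefixed name such as 'org:classification' to a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for slot, curie in SLOT_URIS.items():
        print(f"{slot}: {expand_curie(curie)}")
```
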

### 2. RAG SPARQL Predicates (Custom hc: prefix)

The RAG system generates SPARQL queries using custom `hc:` prefixed predicates:

| Predicate | Purpose |
|-----------|---------|
| `hc:institutionType` | Filter by heritage type (M, L, A, G, etc.) |
| `hc:settlementName` | Filter by city name |
| `hc:subregionCode` | Filter by province/state (NL-NH, NL-GE) |
| `hc:countryCode` | Filter by country (ISO 3166-1 alpha-2) |
| `hc:ghcid` | Global Heritage Custodian Identifier |

**Rationale**: Simplified, consistent predicates for RAG query generation

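
As an illustration of how these predicates combine in a generated query, here is a minimal sketch of a query builder. It is not the project's template code. The `hc:` namespace IRI is a placeholder because this document does not state it, the function and variable names are hypothetical, and the sketch assumes the filter values are stored as plain string literals.

```python
from typing import Optional

# Hypothetical namespace IRI: this document does not state what `hc:` expands to.
HC_NAMESPACE = "https://example.org/hc/"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_institution_query(institution_type: str, country_code: str,
                            settlement: Optional[str] = None,
                            limit: int = 10) -> str:
    """Build a SELECT query that filters institutions with hc: predicates."""
    # Assumes the values are stored as plain string literals.
    patterns = [
        f'?inst hc:institutionType "{escape_literal(institution_type)}" .',
        f'?inst hc:countryCode "{escape_literal(country_code)}" .',
    ]
    if settlement is not None:
        patterns.append(f'?inst hc:settlementName "{escape_literal(settlement)}" .')
    body = "\n  ".join(patterns)
    return (
        f"PREFIX hc: <{HC_NAMESPACE}>\n"
        "SELECT ?inst ?ghcid WHERE {\n"
        f"  {body}\n"
        "  ?inst hc:ghcid ?ghcid .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_institution_query("M", "NL", settlement="Amsterdam"))
```

Each filter adds exactly one triple pattern, which is the kind of regularity the rationale above refers to.
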

---

## Why Two Systems?

### Historical Context

1. **LinkML Schema** was designed for semantic web interoperability and RDF serialization
2. **RAG Queries** evolved independently for efficient knowledge graph querying
3. The Oxigraph knowledge graph stores data using the `hc:` namespace

### Technical Trade-offs

| Aspect | Semantic URIs | Custom hc: URIs |
|--------|---------------|-----------------|
| **Interoperability** | ✅ Standards-compliant | ❌ Project-specific |
| **Query Simplicity** | ❌ Long URIs | ✅ Short, memorable |
| **LLM Generation** | ❌ Harder to generate | ✅ Easier patterns |
| **Validation** | ✅ LinkML tooling | ⚠️ Custom validation |

---

## How SPARQLValidator Handles This

The `SPARQLValidator` class in `backend/rag/template_sparql.py` accepts both predicate systems by merging them into a single set of valid predicates:

```python
class SPARQLValidator:
    def __init__(self):
        # 1. Core RAG predicates (always included)
        hc_predicates = set(self._FALLBACK_HC_PREDICATES)

        # 2. Schema predicates from OntologyLoader (semantic URIs)
        #    Excerpt: `ontology` is the OntologyLoader instance, set up elsewhere.
        schema_predicates = ontology.get_predicates()
        if schema_predicates:
            hc_predicates = hc_predicates | schema_predicates

        # 3. External predicates (base ontology URIs)
        self._all_predicates = hc_predicates | self.VALID_EXTERNAL_PREDICATES
```

### Predicate Categories

| Category | Count | Source |
|----------|-------|--------|
| Core RAG predicates | 12 | `_FALLBACK_HC_PREDICATES` |
| Schema predicates | 286 | OntologyLoader (LinkML) |
| External predicates | ~40 | `VALID_EXTERNAL_PREDICATES` |

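
The following self-contained sketch shows how a merged predicate set like this can be used for a coarse membership check. It is not the actual `SPARQLValidator` logic. The three sets hold small illustrative subsets rather than the real 12, 286, and ~40 entries, the external entries are hypothetical, and the check covers every prefixed name in the query, not only those in predicate position.

```python
import re
from typing import Set

# Illustrative subsets only; the real sets live in backend/rag/template_sparql.py.
FALLBACK_HC_PREDICATES = {
    "hc:institutionType", "hc:settlementName", "hc:subregionCode",
    "hc:countryCode", "hc:ghcid",
}
SCHEMA_PREDICATES = {
    "org:classification", "schema:location",
    "schema:addressCountry", "skos:prefLabel",
}
EXTERNAL_PREDICATES = {"rdf:type", "rdfs:label"}  # hypothetical entries

ALL_PREDICATES = FALLBACK_HC_PREDICATES | SCHEMA_PREDICATES | EXTERNAL_PREDICATES

STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
IRI_REF = re.compile(r"<[^>]*>")
# A prefixed name such as hc:ghcid; a bare "hc:" in a PREFIX line does not match.
PREFIXED_NAME = re.compile(r"(?<![\w:])([A-Za-z][\w-]*:[A-Za-z_][\w-]*)")


def unknown_prefixed_names(query: str) -> Set[str]:
    """Return prefixed names in the query that are not in the merged set."""
    # Drop string literals and <IRI> references so their contents are ignored.
    # Coarse by design: a FILTER using both "<" and ">" would also be stripped.
    stripped = IRI_REF.sub(" ", STRING_LITERAL.sub(" ", query))
    return set(PREFIXED_NAME.findall(stripped)) - ALL_PREDICATES


if __name__ == "__main__":
    query = (
        "PREFIX hc: <https://example.org/hc/> "
        'SELECT ?s WHERE { ?s hc:institutionType "M" ; hc:madeUp ?x . }'
    )
    print(unknown_prefixed_names(query))
```
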

---

## Future Considerations

### Option A: Unify to Semantic URIs (Recommended Long-term)

1. Update Oxigraph data to use semantic URIs
2. Update RAG query templates to use `org:classification` etc.
3. Deprecate custom `hc:` predicates

**Pros**: Single source of truth, better interoperability

**Cons**: Migration effort, breaking changes


### Option B: Maintain Dual System

1. Keep custom `hc:` predicates for RAG queries
2. Add a URI mapping layer in Oxigraph (CONSTRUCT queries)
3. Document both systems

**Pros**: No breaking changes

**Cons**: Ongoing maintenance, potential confusion

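
A mapping layer of this kind could be sketched as a lookup table that is turned into a single CONSTRUCT query. This is only an illustration under stated assumptions. The pairing of `hc:institutionType` with `org:classification` is the one named under Option C; the other two pairings are guesses based on the tables in this document. The `hc:` namespace IRI is a placeholder, and `HC_TO_SEMANTIC` and `build_construct_query` are hypothetical names.

```python
from typing import Dict

HC_TO_SEMANTIC = {
    "hc:institutionType": "org:classification",  # pairing named under Option C
    "hc:countryCode": "schema:addressCountry",   # assumed pairing
    "hc:settlementName": "schema:location",      # assumed pairing
}

PREFIXES = (
    "PREFIX hc: <https://example.org/hc/>\n"     # hypothetical namespace IRI
    "PREFIX org: <http://www.w3.org/ns/org#>\n"
    "PREFIX schema: <http://schema.org/>"
)


def build_construct_query(mapping: Dict[str, str]) -> str:
    """Build a CONSTRUCT query that re-expresses hc: triples semantically."""
    templates, branches = [], []
    for i, (hc_pred, semantic_pred) in enumerate(sorted(mapping.items())):
        var = f"?o{i}"  # one object variable per mapped predicate
        templates.append(f"  ?s {semantic_pred} {var} .")
        branches.append(f"  {{ ?s {hc_pred} {var} }}")
    # Each UNION branch binds only its own variable; template triples whose
    # variable is unbound in a given solution are omitted from the output.
    lines = [PREFIXES, "CONSTRUCT {", *templates, "}",
             "WHERE {", "\n  UNION\n".join(branches), "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_construct_query(HC_TO_SEMANTIC))
```

Keeping the mapping in one table means the documentation, the validator, and the CONSTRUCT query can all be derived from the same source.
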
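
### Option C: Namespace Aliasing

Declare `hc:institutionType` as equivalent to `org:classification` so that Oxigraph can treat the two predicates interchangeably. The statement below is an OWL axiom, not a SPARQL query, and it only takes effect if a reasoning or query-rewriting step applies it:
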
```turtle
# OWL axiom (Turtle syntax) declaring the two properties equivalent
hc:institutionType owl:equivalentProperty org:classification .
```

**Pros**: Transparent to RAG system

**Cons**: Reasoning overhead, complexity

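
If the store does not perform OWL inference, one way to approximate aliasing is to rewrite queries before execution so that each aliased predicate becomes a SPARQL 1.1 alternative property path. The sketch below assumes exactly that setup. `EQUIVALENTS` and `expand_equivalents` are hypothetical names, the rewrite is purely textual, and the rewritten query still needs a `PREFIX` declaration for the target vocabulary.

```python
import re
from typing import Dict

# Equivalence taken from the Option C example in this document.
EQUIVALENTS = {"hc:institutionType": "org:classification"}


def expand_equivalents(query: str, equivalents: Dict[str, str] = EQUIVALENTS) -> str:
    """Replace each aliased predicate with an alternative path (alias|target)."""
    for alias, target in equivalents.items():
        # The lookbehind skips occurrences already inside a rewritten path;
        # the lookahead avoids matching a longer name sharing the same prefix.
        pattern = re.compile(r"(?<![\w:|(])" + re.escape(alias) + r"(?![\w-])")
        query = pattern.sub(f"({alias}|{target})", query)
    # Textual rewrite only: occurrences inside string literals are also replaced.
    return query


if __name__ == "__main__":
    print(expand_equivalents('?inst hc:institutionType "M" .'))
```
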

---

## Current State (January 2025)

- **SPARQLValidator**: Accepts both predicate systems ✅
- **SynonymResolver**: Uses OntologyLoader for type codes ✅
- **SchemaAwareSlotValidator**: Uses validation rules JSON ✅
- **Oxigraph**: Uses `hc:` namespace for data storage

---

## Related Files

| File | Purpose |
|------|---------|
| `backend/rag/template_sparql.py` | SPARQLValidator, OntologyLoader |
| `data/validation/sparql_validation_rules.json` | Enum definitions, mappings |
| `schemas/20251121/linkml/modules/slots/*.yaml` | LinkML slot definitions |
| `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` | Rule 38 |

---

## References

- [W3C Organization Ontology](https://www.w3.org/TR/vocab-org/)
- [Schema.org](https://schema.org/)
- [SKOS](https://www.w3.org/TR/skos-reference/)
- [LinkML slot_uri documentation](https://linkml.io/linkml/schemas/uris-and-mappings.html)