Template-Based SPARQL Query Generation
Overview
This plan documents the implementation of a template-based SPARQL query generation system for the de Aa ArchiefAssistent (archief.support). The system replaces unreliable LLM-generated SPARQL with validated, pre-defined query templates that are selected and instantiated based on user intent.
Problem Statement
The current RAG system generates SPARQL queries via LLM, which produces:
- Syntax errors: orphaned punctuation (`.` on empty lines) after auto-correction
- Invalid predicates: `crm:P53_has_former_or_current_location` (CIDOC-CRM not in ontology)
- Undefined prefixes: `wd:` used without declaration
- Inconsistent results: the same question generates different (sometimes broken) SPARQL
Example Failure
Question: "Welke archieven zijn er in Drenthe?" ("Which archives are there in Drenthe?")
# LLM-generated query with orphaned "." causing 400 error:
?archive skos:prefLabel ?name .
. <-- SYNTAX ERROR
FILTER(CONTAINS(STR(?archive), "NL-DR")) .
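Two of the failure modes listed in the problem statement (orphaned `.` lines and undeclared prefixes) can be caught with cheap text checks before a query reaches the endpoint. The following is a minimal sketch, not the project's `sparql_linter.py`: the function names are hypothetical, and the regex-based prefix check is a heuristic rather than a full SPARQL parse.

```python
import re

def find_orphaned_dots(query: str) -> list[int]:
    """Return 1-based numbers of lines that contain only a '.' (ignoring whitespace)."""
    return [
        number
        for number, line in enumerate(query.splitlines(), start=1)
        if line.strip() == "."
    ]

def find_undeclared_prefixes(query: str) -> set[str]:
    """Return prefixes used in prefixed names but never declared with PREFIX.

    Heuristic only: string literals and <IRIs> are blanked out first so that
    colons inside them are not mistaken for prefixed names.
    """
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+([A-Za-z][\w-]*):", query))
    stripped = re.sub(r'"(?:[^"\\]|\\.)*"', '""', query)  # drop string contents
    stripped = re.sub(r"<[^<>\s]*>", "<>", stripped)      # drop IRI contents
    used = set(re.findall(r"\b([A-Za-z][\w-]*):(?=[A-Za-z_0-9])", stripped))
    return used - declared

# The failing fragment from the example, without the "<-- SYNTAX ERROR" annotation.
BROKEN_QUERY = '''?archive skos:prefLabel ?name .
.
FILTER(CONTAINS(STR(?archive), "NL-DR")) .'''

print(find_orphaned_dots(BROKEN_QUERY))        # line numbers holding a lone "."
print(find_undeclared_prefixes(BROKEN_QUERY))  # prefixes missing a PREFIX line
```

Checks like these only flag symptoms; with pre-validated templates the aim is that they never fire in the first place.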
Solution: Template-Based KBQA
Formica et al. (2023) report that template-based Knowledge Base Question Answering (KBQA) achieves 65.44% precision, compared with 10.52% for LLM-only approaches.
Architecture
User Question
|
v
+--------------------+
| Intent Classifier | <-- DSPy Signature for question classification
+--------------------+
|
v
+--------------------+
| Template Router | <-- Selects appropriate SPARQL template
+--------------------+
|
v
+--------------------+
| Entity Extractor | <-- Extracts slots (province, type, etc.)
+--------------------+
|
v
+--------------------+
| Template Filler | <-- Instantiates template with slot values
+--------------------+
|
v
+--------------------+
| SPARQL Validator | <-- Validates syntax before execution
+--------------------+
|
v
Valid SPARQL Query
Documentation Index
| Document | Description |
|---|---|
| methodology.md | Academic methodology and research citations |
| design-patterns.md | Software patterns (Strategy, Template Method, CoR) |
| tdd.md | Test-driven development approach with test cases |
| dspy-compatibility.md | DSPy integration for template classification |
| rag-integration.md | SPARQL-first retrieval flow integration |
| external-dependencies.md | Required libraries and services |
| templates-schema.md | YAML/JSON schema for template definitions |
| competency-questions.md | Ontology coverage testing, fyke filter for irrelevant questions |
| conversation-context.md | Follow-up prompt handling with DSPy History |
Quick Start
1. Template Definition
templates:
  region_institution_search:
    id: "region_institution_search"
    question_patterns:
      - "Welke {institution_type_nl} zijn er in {province}?"
      - "{institution_type_nl} in {province}"
    sparql_template: |
      PREFIX hc: <https://nde.nl/ontology/hc/class/>
      PREFIX hcp: <https://nde.nl/ontology/hc/>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      SELECT ?institution ?name WHERE {
        ?institution a hc:Custodian ;
            hcp:institutionType "{{institution_type_code}}" ;
            skos:prefLabel ?name .
        FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
      }
    slots:
      institution_type_code:
        source: sparql_validation_rules.json#institution_types
      province_code:
        source: sparql_validation_rules.json#subregions
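To illustrate how the `{{slot}}` placeholders in `sparql_template` might be instantiated safely, here is a minimal sketch. The `fill_template` function and the `ALLOWED` whitelist are hypothetical stand-ins: the real enum values would come from `sparql_validation_rules.json`, whose structure is not shown here, so the sets below contain only the two values that appear in this document.

```python
import re

# Matches {{slot_name}} placeholders as used in sparql_template.
SLOT_PATTERN = re.compile(r"\{\{(\w+)\}\}")

def fill_template(template: str, slots: dict[str, str],
                  allowed: dict[str, set[str]]) -> str:
    """Replace every {{slot}} placeholder, rejecting missing or non-whitelisted values."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"missing slot: {name}")
        value = slots[name]
        # Only whitelisted values are inserted, so user text never reaches the query.
        if value not in allowed.get(name, set()):
            raise ValueError(f"value {value!r} not allowed for slot {name!r}")
        return value
    return SLOT_PATTERN.sub(substitute, template)

# Body of the template above (PREFIX lines omitted for brevity).
TEMPLATE = '''SELECT ?institution ?name WHERE {
  ?institution a hc:Custodian ;
      hcp:institutionType "{{institution_type_code}}" ;
      skos:prefLabel ?name .
  FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
}'''

# Assumption: stand-in for the enums in sparql_validation_rules.json.
ALLOWED = {"institution_type_code": {"A"}, "province_code": {"NL-DR"}}

query = fill_template(
    TEMPLATE,
    {"institution_type_code": "A", "province_code": "NL-DR"},
    ALLOWED,
)
print(query)
```

Restricting slot values to a closed set is one way to keep the instantiated query syntactically identical to the validated template, whatever the user typed.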
2. Usage Flow
# 1. Classify question intent
intent = classifier.classify("Welke archieven zijn er in Drenthe?")
# -> Intent(template_id="region_institution_search", slots={"province": "Drenthe", "type": "archieven"})
# 2. Extract slot values
slots = extractor.extract(intent)
# -> {"institution_type_code": "A", "province_code": "NL-DR"}
# 3. Instantiate template
query = filler.fill("region_institution_search", slots)
# -> Valid SPARQL query
# 4. Execute against SPARQL endpoint
results = execute_sparql(query)
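Step 2 of the flow maps surface forms from the question to the codes the template expects. A minimal sketch of that lookup follows, assuming plain dictionaries. The `resolve_slots` name and both tables are hypothetical, and they hold only the mappings shown in the comments above ("Drenthe" to "NL-DR", "archieven" to "A"); the real values would be loaded from `sparql_validation_rules.json`.

```python
# Assumption: illustrative lookup tables, keyed by case-folded surface form.
PROVINCE_CODES = {"drenthe": "NL-DR"}
INSTITUTION_TYPE_CODES = {"archieven": "A"}

def resolve_slots(raw_slots: dict[str, str]) -> dict[str, str]:
    """Map raw slot text (e.g. from the classifier) to template slot codes."""
    province = raw_slots["province"].strip().casefold()
    institution_type = raw_slots["type"].strip().casefold()
    if province not in PROVINCE_CODES:
        raise LookupError(f"unknown province: {raw_slots['province']!r}")
    if institution_type not in INSTITUTION_TYPE_CODES:
        raise LookupError(f"unknown institution type: {raw_slots['type']!r}")
    return {
        "institution_type_code": INSTITUTION_TYPE_CODES[institution_type],
        "province_code": PROVINCE_CODES[province],
    }

resolved = resolve_slots({"province": "Drenthe", "type": "archieven"})
print(resolved)
```

Raising on unknown values, rather than passing them through, gives the caller a clear signal to fall back to another strategy instead of executing a query with an unvalidated slot.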
Integration Points
| Component | File | Integration |
|---|---|---|
| Query Router | dspy_heritage_rag.py | Add template classification before LLM |
| Slot Filler | ontology_mapping.py | Use for entity extraction |
| Validation | sparql_linter.py | Validate instantiated templates |
| Rules | sparql_validation_rules.json | Source for slot enum values |
Key Metrics
| Metric | Current (LLM-only) | Target (Template-based) |
|---|---|---|
| Query success rate | ~40% | >95% |
| Syntax errors | ~30% | 0% |
| Response time | ~3s | <1s |
| Consistency | Low | High |
Next Steps After Planning
- Create template definitions for top 10 question types
- Implement TemplateRouter (DSPy Signature for classification)
- Implement SlotFiller (entity extraction using ontology_mapping.py)
- Implement TemplateInstantiator (fill slots -> valid SPARQL)
- Add fallback to LLM generation for unmatched questions
- Write tests for each template (TDD approach)
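As a rough illustration of routing with an LLM fallback, the sketch below compiles `question_patterns` strings into regular expressions and returns `None` when nothing matches, which is the signal to fall back to LLM generation. This is a deliberately simple regex stand-in, not the planned DSPy-based TemplateRouter; `pattern_to_regex`, `route`, and the `TEMPLATES` dictionary are hypothetical names.

```python
import re
from typing import Optional

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn 'Welke {x} zijn er in {y}?' into a regex with one named group per slot."""
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", pattern)
    body = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>.+?)"
        for index, part in enumerate(parts)
    )
    return re.compile(rf"^\s*{body}\s*$", re.IGNORECASE)

def route(question: str,
          templates: dict[str, list[str]]) -> Optional[tuple[str, dict[str, str]]]:
    """Return (template_id, raw slots) for the first matching pattern, else None."""
    for template_id, patterns in templates.items():
        for pattern in patterns:  # patterns are tried in order; list specific ones first
            match = pattern_to_regex(pattern).match(question)
            if match:
                return template_id, match.groupdict()
    return None  # caller falls back to LLM generation

TEMPLATES = {
    "region_institution_search": [
        "Welke {institution_type_nl} zijn er in {province}?",
        "{institution_type_nl} in {province}",
    ],
}

print(route("Welke archieven zijn er in Drenthe?", TEMPLATES))
print(route("Hoe laat is het?", TEMPLATES))  # no pattern matches
```

Pattern order matters here: the shorter second pattern would also match the full question, so the more specific pattern is listed first. A per-template test could assert both the routed template and the extracted slots for each example question.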
References
- Formica et al. (2023) - Template-based approach for QA over knowledge bases
- Steinmetz et al. (2019) - Pattern-based NL to SPARQL
- Zheng et al. (2018) - Template decomposition for complex questions
- SPARQL-LLM (2025) - Hybrid template + LLM approach