
kempersc 84904e344b Make AGENTS more succint by referring to opencode rules & enrich custodians		2025-12-28 14:56:35 +01:00


Template-Based SPARQL Query Generation

Overview

This plan documents the implementation of a template-based SPARQL query generation system for the de Aa ArchiefAssistent (archief.support). The system replaces unreliable LLM-generated SPARQL with validated, pre-defined query templates that are selected and instantiated based on user intent.

Problem Statement

The current RAG system has an LLM generate SPARQL queries directly, which produces:

Syntax errors - orphaned punctuation (a lone . on an otherwise empty line) left behind by auto-correction
Invalid predicates - e.g. crm:P53_has_former_or_current_location, although CIDOC-CRM is not part of the ontology
Undefined prefixes - e.g. wd: used without a PREFIX declaration
Inconsistent results - the same question generates different (sometimes broken) SPARQL on different runs

Example Failure

Question: "Welke archieven zijn er in Drenthe?"

# Excerpt of an LLM-generated query; the orphaned "." causes an HTTP 400 error:
?archive skos:prefLabel ?name .
  .                              # <-- SYNTAX ERROR: orphaned "."
  FILTER(CONTAINS(STR(?archive), "NL-DR")) .
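The two syntactic failure modes above, an orphaned "." and an undeclared prefix, can be caught with cheap text checks before a query reaches the endpoint. The sketch below is a minimal illustration, not the existing `sparql_linter.py`. `find_syntax_issues` is a hypothetical helper, its prefixed-name regex is simplified, and a production validator would rather use a full SPARQL parser (for example the one in rdflib).

```python
import re

# Simplified prefixed-name pattern, e.g. "skos:prefLabel" or "wd:P31" (assumption:
# local names start with a letter or underscore).
PREFIXED_NAME = re.compile(r"(?<![\w<:/#])([A-Za-z][\w-]*):[A-Za-z_]")
PREFIX_DECL = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*):", re.IGNORECASE | re.MULTILINE)


def find_syntax_issues(query: str) -> list[str]:
    """Return human-readable problems found by simple text checks."""
    issues = []
    for lineno, line in enumerate(query.splitlines(), start=1):
        # A line holding nothing but "." terminates no triple: orphaned punctuation.
        if line.strip() == ".":
            issues.append(f"line {lineno}: orphaned '.'")
    declared = set(PREFIX_DECL.findall(query))
    # Drop <IRIs> and "literals" so that "http:" or "NL-DR" are not read as prefixes.
    body = re.sub(r'<[^>]*>|"[^"]*"', " ", query)
    used = set(PREFIXED_NAME.findall(body))
    for prefix in sorted(used - declared):
        issues.append(f"undeclared prefix '{prefix}:'")
    return issues


broken = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?archive ?name WHERE {
  ?archive skos:prefLabel ?name .
  .
  ?archive wd:P31 ?type .
}"""
print(find_syntax_issues(broken))
```

Invalid predicates such as the CIDOC-CRM example cannot be detected this way; that check needs the list of predicates the ontology actually defines.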

Solution: Template-Based KBQA

Formica et al. (2023) report that template-based Knowledge Base Question Answering (KBQA) achieves 65.44% precision, versus 10.52% for LLM-only approaches.

Architecture

User Question
     |
     v
+--------------------+
| Intent Classifier  |  <-- DSPy Signature for question classification
+--------------------+
     |
     v
+--------------------+
| Template Router    |  <-- Selects appropriate SPARQL template
+--------------------+
     |
     v
+--------------------+
| Entity Extractor   |  <-- Extracts slots (province, type, etc.)
+--------------------+
     |
     v
+--------------------+
| Template Filler    |  <-- Instantiates template with slot values
+--------------------+
     |
     v
+--------------------+
| SPARQL Validator   |  <-- Validates syntax before execution
+--------------------+
     |
     v
Valid SPARQL Query
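The stages in the diagram compose into a single function in which any stage can reject the question. This is a minimal sketch assuming each stage is a plain Python callable. The `Intent` dataclass and the stage signatures are hypothetical, and the Intent Classifier and Template Router boxes are merged into one `classify` callable for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Intent:
    template_id: str
    # Raw surface forms taken from the question, e.g. {"province": "Drenthe"}.
    slots: dict[str, str] = field(default_factory=dict)


def run_pipeline(
    question: str,
    classify: Callable[[str], Optional[Intent]],  # Intent Classifier + Template Router
    extract: Callable[[Intent], Optional[dict[str, str]]],  # Entity Extractor
    fill: Callable[[str, dict[str, str]], str],  # Template Filler
    validate: Callable[[str], list[str]],  # SPARQL Validator: returns a list of problems
) -> Optional[str]:
    """Return a validated SPARQL query, or None if any stage rejects the question."""
    intent = classify(question)
    if intent is None:
        return None  # no template matches this question
    slots = extract(intent)
    if slots is None:
        return None  # a surface form could not be mapped to an ontology code
    query = fill(intent.template_id, slots)
    if validate(query):
        return None  # never execute a query the validator flagged
    return query
```

Returning `None` instead of raising leaves the caller free to decide whether to fall back to LLM generation for unmatched questions.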

Documentation Index

Document	Description
methodology.md	Academic methodology and research citations
design-patterns.md	Software patterns (Strategy, Template Method, CoR)
tdd.md	Test-driven development approach with test cases
dspy-compatibility.md	DSPy integration for template classification
rag-integration.md	SPARQL-first retrieval flow integration
external-dependencies.md	Required libraries and services
templates-schema.md	YAML/JSON schema for template definitions
competency-questions.md	Ontology coverage testing, fyke filter for irrelevant questions
conversation-context.md	Follow-up prompt handling with DSPy History

Quick Start

1. Template Definition

templates:
  region_institution_search:
    id: "region_institution_search"
    question_patterns:
      - "Welke {institution_type_nl} zijn er in {province}?"
      - "{institution_type_nl} in {province}"
    sparql_template: |
      PREFIX hc: <https://nde.nl/ontology/hc/class/>
      PREFIX hcp: <https://nde.nl/ontology/hc/>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      SELECT ?institution ?name WHERE {
        ?institution a hc:Custodian ;
                     hcp:institutionType "{{institution_type_code}}" ;
                     skos:prefLabel ?name .
        FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
      }
    slots:
      institution_type_code:
        source: sparql_validation_rules.json#institution_types
      province_code:
        source: sparql_validation_rules.json#subregions
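The {{slot}} placeholders in sparql_template can be filled by plain string substitution, provided every value is first checked against the slot's allowed codes so that user text never reaches the query verbatim. This is a minimal sketch. The `ALLOWED` sets are illustrative stand-ins for the enums in sparql_validation_rules.json, whose real structure is not shown here, and `fill_template` is a hypothetical name.

```python
import re

# Illustrative stand-in for the enums in sparql_validation_rules.json (assumed).
ALLOWED = {
    "institution_type_code": {"A"},
    "province_code": {"NL-DR", "NL-GR"},
}

TEMPLATE = """PREFIX hc: <https://nde.nl/ontology/hc/class/>
PREFIX hcp: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?institution ?name WHERE {
  ?institution a hc:Custodian ;
               hcp:institutionType "{{institution_type_code}}" ;
               skos:prefLabel ?name .
  FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
}"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, slots: dict[str, str]) -> str:
    """Replace every {{name}} with a whitelisted slot value; raise on anything else."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"missing slot: {name}")
        value = slots[name]
        if value not in ALLOWED.get(name, set()):
            raise ValueError(f"{value!r} is not an allowed value for {name}")
        return value

    return PLACEHOLDER.sub(substitute, template)


print(fill_template(TEMPLATE, {"institution_type_code": "A", "province_code": "NL-DR"}))
```

Because only whitelisted codes are substituted, a value like `A" } #` is rejected instead of breaking out of the string literal.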

2. Usage Flow

# 1. Classify question intent
intent = classifier.classify("Welke archieven zijn er in Drenthe?")
# -> Intent(template_id="region_institution_search", slots={"province": "Drenthe", "type": "archieven"})

# 2. Extract slot values
slots = extractor.extract(intent)
# -> {"institution_type_code": "A", "province_code": "NL-DR"}

# 3. Instantiate template
query = filler.fill(intent.template_id, slots)
# -> Valid SPARQL query

# 4. Execute against SPARQL endpoint
results = execute_sparql(query)
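Steps 1 and 2 can be prototyped without an LLM by turning each question_patterns entry into a regular expression and mapping the captured surface forms to codes. This is a minimal sketch assuming the hypothetical lookup tables below, of which only the archieven/Drenthe entries come from the example above. In the planned design, classification is done by a DSPy Signature and the mapping by `ontology_mapping.py`.

```python
import re
from typing import Optional

PATTERNS = {
    "region_institution_search": [
        "Welke {institution_type_nl} zijn er in {province}?",
        "{institution_type_nl} in {province}",
    ],
}

# Hypothetical lookup tables; real values would come from ontology_mapping.py.
TYPE_CODES = {"archieven": "A"}
PROVINCE_CODES = {"drenthe": "NL-DR"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """'{slot}' becomes a named group; all other text is matched literally."""
    # With a capture group, re.split alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(rf"^{regex}$", re.IGNORECASE)


def classify(question: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (template_id, raw slots) for the first matching pattern, else None."""
    for template_id, patterns in PATTERNS.items():
        for pattern in patterns:
            match = pattern_to_regex(pattern).match(question.strip())
            if match:
                return template_id, match.groupdict()
    return None


def extract(raw_slots: dict[str, str]) -> Optional[dict[str, str]]:
    """Map Dutch surface forms to codes; None if any value is unknown."""
    type_code = TYPE_CODES.get(raw_slots.get("institution_type_nl", "").lower())
    province_code = PROVINCE_CODES.get(raw_slots.get("province", "").lower())
    if type_code is None or province_code is None:
        return None
    return {"institution_type_code": type_code, "province_code": province_code}
```

Patterns are tried in order, so the specific "Welke ... zijn er in ...?" form has to precede the bare "... in ..." form, which would otherwise capture "Welke archieven zijn er" as the institution type.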

Integration Points

Component	File	Integration
Query Router	`dspy_heritage_rag.py`	Add template classification before falling back to LLM generation
Slot Filler	`ontology_mapping.py`	Use for entity extraction
Validation	`sparql_linter.py`	Validate instantiated templates
Rules	`sparql_validation_rules.json`	Source for slot enum values
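The slot definitions reference their allowed values as "file#key" (e.g. sparql_validation_rules.json#subregions). This is a minimal sketch of resolving such a reference. It assumes the JSON file is a top-level object whose entry under `key` is either a list of codes or an object keyed by code. The real layout of sparql_validation_rules.json may differ, and `resolve_slot_source` is a hypothetical name.

```python
import json
from pathlib import Path


def resolve_slot_source(source: str, base_dir: Path) -> set[str]:
    """Resolve a 'file.json#key' slot source to the set of allowed codes."""
    filename, _, key = source.partition("#")
    data = json.loads((base_dir / filename).read_text(encoding="utf-8"))
    entry = data[key]
    # Assumed layouts: {"NL-DR": "Drenthe", ...} or ["A", ...].
    if isinstance(entry, dict):
        return set(entry)
    return {str(item) for item in entry}
```

Loading the enums from the same file the linter uses keeps the template filler and the validator from drifting apart.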

Key Metrics

Metric	Current (LLM-only)	Target (Template-based)
Query success rate	~40%	>95%
Syntax errors	~30%	0%
Response time	~3s	<1s
Consistency	Low	High
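The figures in the table can be measured by replaying a fixed set of benchmark questions through both pipelines. This is a minimal sketch assuming each run is logged as a record with the hypothetical fields below. How "success" is judged (non-empty results, or agreement with a gold answer) is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class RunRecord:
    question: str
    query: str          # SPARQL that was sent to the endpoint
    syntax_error: bool  # endpoint rejected the query, e.g. HTTP 400
    success: bool       # query ran and produced the expected answer
    seconds: float      # end-to-end response time


def summarize(records: list[RunRecord]) -> dict[str, float]:
    """Aggregate the table's metrics over a non-empty list of runs."""
    n = len(records)
    by_question: dict[str, set[str]] = defaultdict(set)
    for r in records:
        by_question[r.question].add(r.query)
    return {
        "success_rate": sum(r.success for r in records) / n,
        "syntax_error_rate": sum(r.syntax_error for r in records) / n,
        "median_seconds": median(r.seconds for r in records),
        # Share of questions that produced the identical query on every run.
        "consistency": sum(len(qs) == 1 for qs in by_question.values()) / len(by_question),
    }
```

Measuring consistency requires asking each question more than once, because a single run per question always scores 1.0.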

Next Steps After Planning

Create template definitions for top 10 question types
Implement TemplateRouter (DSPy Signature for classification)
Implement SlotFiller (entity extraction using ontology_mapping.py)
Implement TemplateInstantiator (fill slots -> valid SPARQL)
Add fallback to LLM generation for unmatched questions
Write tests for each template (TDD approach)
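The planned LLM fallback amounts to trying the template path first, calling the LLM only when no template matches, and still passing LLM output through the same validator. This is a minimal sketch in which `template_path`, `llm_generate` and `validate` are hypothetical callables. Returning a source label makes it possible to log how often the fallback fires.

```python
from typing import Callable, Optional


def answer_query(
    question: str,
    template_path: Callable[[str], Optional[str]],  # returns None for unmatched questions
    llm_generate: Callable[[str], str],
    validate: Callable[[str], list[str]],  # returns a list of problems
) -> tuple[str, Optional[str]]:
    """Return (source, query); query is None if the LLM output fails validation."""
    query = template_path(question)
    if query is not None:
        return "template", query
    generated = llm_generate(question)
    if validate(generated):
        return "llm_rejected", None  # never send a query the validator flagged
    return "llm", generated
```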

References

Formica et al. (2023) - Template-based approach for QA over knowledge bases
Steinmetz et al. (2019) - Pattern-based NL to SPARQL
Zheng et al. (2018) - Template decomposition for complex questions
SPARQL-LLM (2025) - Hybrid template + LLM approach