kempersc 30b9cb9d14 Add SOTA analysis and update design pattern documentation
- Add prompt-query_template_mapping/SOTA_analysis.md with Formica et al. research
- Update GraphRAG design patterns documentation
- Update temporal semantic hypergraph documentation
2026-01-07 22:05:01 +01:00

Template-Based SPARQL Query Generation

Overview

This plan documents the implementation of a template-based SPARQL query generation system for the de Aa ArchiefAssistent (archief.support). The system replaces unreliable LLM-generated SPARQL with validated, pre-defined query templates that are selected and instantiated based on user intent.

Problem Statement

The current RAG system asks an LLM to write SPARQL queries directly, which produces:

  1. Syntax errors - orphaned punctuation (a lone . on an otherwise empty line) left behind by auto-correction
  2. Invalid predicates - e.g. crm:P53_has_former_or_current_location, a CIDOC-CRM property that is not part of the ontology
  3. Undefined prefixes - e.g. wd: used without a PREFIX declaration
  4. Inconsistent results - the same question yields different (sometimes broken) SPARQL from run to run
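The undefined-prefix failure can be caught mechanically before a query reaches the endpoint. The sketch below is illustrative only: undeclared_prefixes is a hypothetical helper, not a function in sparql_linter.py. It collects the labels declared with PREFIX and compares them with the prefixed names used in the query body after stripping IRIs, double-quoted literals and comments. It is a regex heuristic, not a SPARQL parser, so single-quoted or triple-quoted literals containing a colon could produce false positives.

```python
from __future__ import annotations

import re

# PREFIX declarations such as "PREFIX skos: <...>"; group 1 is the label.
PREFIX_DECL = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w.-]*)?:\s*<[^>]*>",
                         re.IGNORECASE | re.MULTILINE)
# Prefixed names such as skos:prefLabel or wd:Q123; group 1 is the label.
PREFIXED_NAME = re.compile(r"(?<!\w)([A-Za-z][\w.-]*):\w")


def undeclared_prefixes(query: str) -> set[str]:
    """Return prefix labels used in the query body but never declared."""
    declared = {m.group(1) for m in PREFIX_DECL.finditer(query)}
    body = re.sub(r"<[^<>\s]*>", " ", query)  # drop IRIs (they contain ':' and '#')
    body = re.sub(r'"[^"\n]*"', " ", body)    # drop simple double-quoted literals
    body = re.sub(r"#[^\n]*", " ", body)      # drop comments
    used = {m.group(1) for m in PREFIXED_NAME.finditer(body)}
    return used - declared
```

A non-empty result is reason enough to reject the query instead of sending it and receiving an error from the endpoint.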

Example Failure

Question: "Welke archieven zijn er in Drenthe?"

```sparql
# LLM-generated query with an orphaned "." causing an HTTP 400 error:
?archive skos:prefLabel ?name .
  .                              # <-- SYNTAX ERROR
  FILTER(CONTAINS(STR(?archive), "NL-DR")) .
```
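The orphaned period in this example can be flagged with a simple line-level check. The helper below (orphaned_period_lines, a hypothetical name) is a minimal sketch that only reports lines consisting of a lone "."; it does not replace full syntax validation, for which a real SPARQL parser such as rdflib's prepareQuery would be the safer choice.

```python
from __future__ import annotations


def orphaned_period_lines(query: str) -> list[int]:
    """Return the 1-based numbers of lines that contain nothing but a '.'."""
    return [
        number
        for number, line in enumerate(query.splitlines(), start=1)
        if line.strip() == "."
    ]


# The failing fragment from the example above.
FAILING_QUERY = """\
?archive skos:prefLabel ?name .
  .
  FILTER(CONTAINS(STR(?archive), "NL-DR")) ."""

print(orphaned_period_lines(FAILING_QUERY))  # [2]
```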

Solution: Template-Based KBQA

Formica et al. (2023) report that template-based Knowledge Base Question Answering (KBQA) reaches 65.44% precision, versus 10.52% for LLM-only approaches.

Architecture

```text
User Question
     |
     v
+--------------------+
| Intent Classifier  |  <-- DSPy Signature for question classification
+--------------------+
     |
     v
+--------------------+
| Template Router    |  <-- Selects appropriate SPARQL template
+--------------------+
     |
     v
+--------------------+
| Entity Extractor   |  <-- Extracts slots (province, type, etc.)
+--------------------+
     |
     v
+--------------------+
| Template Filler    |  <-- Instantiates template with slot values
+--------------------+
     |
     v
+--------------------+
| SPARQL Validator   |  <-- Validates syntax before execution
+--------------------+
     |
     v
Valid SPARQL Query
```
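One way to wire the five stages together is to inject each one as a callable, so a stage can be swapped (for example a DSPy classifier for a rule-based one) without touching the others. This is a sketch under that assumption; Intent and TemplatePipeline are illustrative names, not existing code in dspy_heritage_rag.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Intent:
    """Output of the Intent Classifier: a template id plus raw slot text."""
    template_id: str
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class TemplatePipeline:
    """Chains the five stages of the diagram; every stage is injected."""
    classify: Callable[[str], Intent]            # Intent Classifier
    route: Callable[[Intent], str]               # Template Router -> SPARQL template text
    extract: Callable[[Intent], dict[str, str]]  # Entity Extractor -> slot codes
    fill: Callable[[str, dict[str, str]], str]   # Template Filler
    validate: Callable[[str], list[str]]         # SPARQL Validator -> list of problems

    def run(self, question: str) -> str:
        intent = self.classify(question)
        template = self.route(intent)
        slots = self.extract(intent)
        query = self.fill(template, slots)
        problems = self.validate(query)
        if problems:
            # Never hand a query that failed validation to the endpoint.
            raise ValueError(f"instantiated query failed validation: {problems}")
        return query
```

Because the validator sits at the end of the chain, a bug in any earlier stage surfaces as a ValueError instead of an HTTP error from the SPARQL endpoint.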

Documentation Index

| Document | Description |
| --- | --- |
| methodology.md | Academic methodology and research citations |
| design-patterns.md | Software patterns (Strategy, Template Method, Chain of Responsibility) |
| tdd.md | Test-driven development approach with test cases |
| dspy-compatibility.md | DSPy integration for template classification |
| rag-integration.md | SPARQL-first retrieval flow integration |
| external-dependencies.md | Required libraries and services |
| templates-schema.md | YAML/JSON schema for template definitions |
| competency-questions.md | Ontology coverage testing; fyke filter for irrelevant questions |
| conversation-context.md | Follow-up prompt handling with DSPy History |

Quick Start

1. Template Definition

```yaml
templates:
  region_institution_search:
    id: "region_institution_search"
    question_patterns:
      - "Welke {institution_type_nl} zijn er in {province}?"
      - "{institution_type_nl} in {province}"
    sparql_template: |
      PREFIX hc: <https://nde.nl/ontology/hc/class/>
      PREFIX hcp: <https://nde.nl/ontology/hc/>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      SELECT ?institution ?name WHERE {
        ?institution a hc:Custodian ;
                     hcp:institutionType "{{institution_type_code}}" ;
                     skos:prefLabel ?name .
        FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
      }
    slots:
      institution_type_code:
        source: sparql_validation_rules.json#institution_types
      province_code:
        source: sparql_validation_rules.json#subregions
```
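The double-brace placeholders keep slot markers distinct from SPARQL's own { } group delimiters, so a filler can substitute them with a plain regex instead of Python's str.format, which would misread the WHERE braces. A minimal sketch, assuming the allowed values have already been loaded from the sparql_validation_rules.json sections referenced under slots; fill_template is a hypothetical name. Requiring every value to come from its enum also keeps arbitrary user text out of the query.

```python
from __future__ import annotations

import re

# Matches {{ slot_name }} placeholders; single SPARQL braces are left alone.
SLOT = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, slots: dict[str, str],
                  allowed: dict[str, set[str]]) -> str:
    """Substitute {{slot}} placeholders after checking each value.

    Raises KeyError if a placeholder has no value and ValueError if a value
    is not in that slot's allowed set (slots with no allowed set always fail).
    """
    needed = set(SLOT.findall(template))
    missing = needed - set(slots)
    if missing:
        raise KeyError(f"missing slot values: {sorted(missing)}")
    for name in sorted(needed):
        if slots[name] not in allowed.get(name, set()):
            raise ValueError(f"{name}={slots[name]!r} is not an allowed value")
    return SLOT.sub(lambda m: slots[m.group(1)], template)


TEMPLATE = """PREFIX hc: <https://nde.nl/ontology/hc/class/>
PREFIX hcp: <https://nde.nl/ontology/hc/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?institution ?name WHERE {
  ?institution a hc:Custodian ;
               hcp:institutionType "{{institution_type_code}}" ;
               skos:prefLabel ?name .
  FILTER(CONTAINS(STR(?institution), "{{province_code}}"))
}"""
# Only the codes that appear in this plan; the full lists live in the rules file.
ALLOWED = {"institution_type_code": {"A"}, "province_code": {"NL-DR"}}

print(fill_template(TEMPLATE,
                    {"institution_type_code": "A", "province_code": "NL-DR"},
                    ALLOWED))
```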

2. Usage Flow

```python
# classifier, extractor, filler and execute_sparql stand for the pipeline components above.

# 1. Classify question intent
intent = classifier.classify("Welke archieven zijn er in Drenthe?")
# -> Intent(template_id="region_institution_search", slots={"province": "Drenthe", "institution_type_nl": "archieven"})

# 2. Extract slot values
slots = extractor.extract(intent)
# -> {"institution_type_code": "A", "province_code": "NL-DR"}

# 3. Instantiate template
query = filler.fill("region_institution_search", slots)
# -> Valid SPARQL query

# 4. Execute against SPARQL endpoint
results = execute_sparql(query)
```
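The classify and extract steps can be prototyped without an LLM by compiling each question_patterns entry into a regular expression with named groups. In the plan the classifier is a DSPy Signature; the rule-based matcher below is a swapped-in stand-in for illustration, and the lookup tables INSTITUTION_TYPES_NL and PROVINCES are hypothetical stand-ins for sparql_validation_rules.json that hold only the two values appearing in this plan.

```python
from __future__ import annotations

import re

TEMPLATES = {
    "region_institution_search": [
        # More specific patterns first: the second also matches the first's questions.
        "Welke {institution_type_nl} zijn er in {province}?",
        "{institution_type_nl} in {province}",
    ],
}
# Hypothetical lookup tables (lower-cased keys); real values come from the rules file.
INSTITUTION_TYPES_NL = {"archieven": "A"}
PROVINCES = {"drenthe": "NL-DR"}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '{slot}' markers into named groups and escape the literal text."""
    # With a capture group, re.split alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", pattern)
    body = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(rf"\s*{body}\s*", re.IGNORECASE)


def classify(question: str) -> tuple[str, dict[str, str]] | None:
    """Return (template_id, raw slot text) for the first matching pattern."""
    for template_id, patterns in TEMPLATES.items():
        for pattern in patterns:
            match = pattern_to_regex(pattern).fullmatch(question)
            if match:
                return template_id, match.groupdict()
    return None


def extract(raw_slots: dict[str, str]) -> dict[str, str]:
    """Map raw slot text to the codes used by region_institution_search."""
    return {
        "institution_type_code": INSTITUTION_TYPES_NL[raw_slots["institution_type_nl"].strip().lower()],
        "province_code": PROVINCES[raw_slots["province"].strip().lower()],
    }
```

A KeyError from extract means the type or province is unknown, which is a natural point to hand the question to the LLM fallback instead.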

Integration Points

| Component | File | Integration |
| --- | --- | --- |
| Query Router | dspy_heritage_rag.py | Add template classification ahead of LLM query generation |
| Slot Filler | ontology_mapping.py | Use for entity extraction (slot values) |
| Validation | sparql_linter.py | Validate instantiated templates |
| Rules | sparql_validation_rules.json | Source of slot enum values |
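The slot definitions point at their enum values with references such as sparql_validation_rules.json#institution_types. A minimal resolver sketch follows; it assumes (this plan does not confirm it) that each referenced key is a top-level entry holding either a list of codes or an object keyed by code. All three function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def parse_slot_source(ref: str) -> tuple[str, str]:
    """Split 'file.json#key' into ('file.json', 'key')."""
    filename, sep, key = ref.partition("#")
    if not sep or not filename or not key:
        raise ValueError(f"expected 'file#key', got {ref!r}")
    return filename, key


def allowed_values(rules: dict, key: str) -> set[str]:
    """Return the codes under `key`: set() of a list gives its items,
    set() of an object gives its keys (assumed layouts, see above)."""
    return set(rules[key])


def load_allowed_values(ref: str, base_dir: Path = Path(".")) -> set[str]:
    """Resolve a slot source reference against JSON files in base_dir."""
    filename, key = parse_slot_source(ref)
    rules = json.loads((base_dir / filename).read_text(encoding="utf-8"))
    return allowed_values(rules, key)
```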

Key Metrics

| Metric | Current (LLM-only) | Target (Template-based) |
| --- | --- | --- |
| Query success rate | ~40% | >95% |
| Syntax errors | ~30% | 0% |
| Response time | ~3 s | <1 s |
| Consistency | Low | High |

Next Steps After Planning

  1. Create template definitions for top 10 question types
  2. Implement TemplateRouter (DSPy Signature for classification)
  3. Implement SlotFiller (entity extraction using ontology_mapping.py)
  4. Implement TemplateInstantiator (fill slots -> valid SPARQL)
  5. Add fallback to LLM generation for unmatched questions
  6. Write tests for each template (TDD approach)
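The fallback in step 5 is a natural fit for the Chain of Responsibility pattern listed in design-patterns.md: try the template path first, call the LLM only when no template matches, and validate whatever comes back. A sketch with all three collaborators injected; template_path, llm_generate and validate are hypothetical callables, not existing functions.

```python
from __future__ import annotations

from typing import Callable, Optional


def generate_query(
    question: str,
    template_path: Callable[[str], Optional[str]],
    llm_generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
) -> tuple[str, str]:
    """Return (source, query) where source is 'template' or 'llm'.

    template_path returns None when no template matches; validate returns a
    list of problems (empty means the query may be executed).
    """
    query = template_path(question)
    source = "template"
    if query is None:
        # No template matched: fall back to free-form LLM generation.
        query, source = llm_generate(question), "llm"
    problems = validate(query)
    if problems:
        raise ValueError(f"{source} query failed validation: {problems}")
    return source, query
```

Returning the source alongside each query also makes it easy to measure how often the fallback fires, which bears directly on the success-rate and consistency targets under Key Metrics.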

References

  • Formica et al. (2023) - Template-based approach for QA over knowledge bases
  • Steinmetz et al. (2019) - Pattern-based NL to SPARQL
  • Zheng et al. (2018) - Template decomposition for complex questions
  • SPARQL-LLM (2025) - Hybrid template + LLM approach