glam/docs/plan/prompt-query_template_mapping/dspy-compatibility.md

# DSPy Compatibility

## Overview

This document describes how the template-based SPARQL system integrates with DSPy 2.6+. The key requirement is that template classification should be a **DSPy module** that can be optimized alongside existing modules using GEPA or other optimizers.

## DSPy Integration Points

### Current DSPy Architecture (dspy_heritage_rag.py)

The existing RAG system uses DSPy for:
1. **Query Intent Classification** - `ClassifyQueryIntent` Signature
2. **Entity Extraction** - `ExtractHeritageEntities` Signature
3. **SPARQL Generation** - `GenerateSPARQL` Signature (LLM-based)
4. **Answer Generation** - `GenerateHeritageAnswer` Signature

### Template Integration Strategy

We add template-based query generation as a **pre-filter** before LLM-based generation:

```
User Question
     |
     v
+------------------------+
| TemplateClassifier     |  <-- NEW: DSPy Module
| (DSPy Signature)       |
+------------------------+
     |
     v
[Template Match?]
     |
    Yes                       No
     |                         |
     v                         v
+----------------+      +----------------+
| Template       |      | GenerateSPARQL |  <-- Existing
| Instantiation  |      | (LLM-based)    |
+----------------+      +----------------+
     |                         |
     +------------+------------+
                  |
                  v
           Valid SPARQL
```

## DSPy Signatures

### 1. TemplateClassifier Signature

```python
import dspy
from typing import Optional, Literal
from pydantic import BaseModel, Field

class TemplateMatch(BaseModel):
    """Output model for template classification."""

    template_id: str = Field(
        description="ID of the matched template, or 'none' if no match"
    )
    confidence: float = Field(
        description="Confidence score between 0.0 and 1.0",
        ge=0.0,
        le=1.0
    )
    extracted_slots: dict[str, str] = Field(
        default_factory=dict,
        description="Extracted slot values from the question"
    )
    reasoning: str = Field(
        description="Brief explanation of why this template was selected"
    )


class ClassifyTemplateSignature(dspy.Signature):
    """Classify a heritage question and match it to a SPARQL template.

    Given a user question about Dutch heritage institutions, determine which
    predefined SPARQL template best matches the query intent. If no template
    matches, return template_id='none' to fall back to LLM generation.

    Available Templates:
    - region_institution_search: Find institutions of a type in a province
      Example: "Welke archieven zijn er in Drenthe?"
      Slots: institution_type (archieven/musea/bibliotheken), province (Dutch province name)

    - count_by_type: Count institutions by type
      Example: "Hoeveel musea zijn er in Nederland?"
      Slots: institution_type

    - count_by_type_region: Count institutions by type in a region
      Example: "Hoeveel archieven zijn er in Noord-Holland?"
      Slots: institution_type, province

    - entity_lookup: Look up a specific institution by name
      Example: "Wat is het Nationaal Archief?"
      Slots: institution_name

    - entity_lookup_by_ghcid: Look up by GHCID
      Example: "Details van NL-HaNA"
      Slots: ghcid

    - list_by_type: List all institutions of a type
      Example: "Toon alle bibliotheken"
      Slots: institution_type

    Dutch Institution Types:
    - archief/archieven -> type code "A"
    - museum/musea -> type code "M"
    - bibliotheek/bibliotheken -> type code "L"
    - galerie/galerijen -> type code "G"

    Dutch Provinces:
    - Drenthe, Flevoland, Friesland, Gelderland, Groningen,
      Limburg, Noord-Brabant, Noord-Holland, Overijssel,
      Utrecht, Zeeland, Zuid-Holland
    """

    question: str = dspy.InputField(
        desc="The user's question about heritage institutions"
    )
    language: str = dspy.InputField(
        desc="Language code: 'nl' for Dutch, 'en' for English",
        default="nl"
    )

    template_match: TemplateMatch = dspy.OutputField(
        desc="The matched template and extracted slots"
    )


class TemplateClassifier(dspy.Module):
    """DSPy module for template classification."""

    def __init__(self):
        super().__init__()
        self.classify = dspy.ChainOfThought(ClassifyTemplateSignature)

    def forward(self, question: str, language: str = "nl") -> TemplateMatch:
        """Classify question and return template match."""
        result = self.classify(question=question, language=language)
        return result.template_match
```

### 2. SlotExtractor Signature

```python
class ExtractedSlots(BaseModel):
    """Output model for slot extraction."""

    slots: dict[str, str] = Field(
        description="Mapping of slot names to extracted values"
    )
    normalized_slots: dict[str, str] = Field(
        description="Mapping of slot names to normalized values (codes)"
    )
    missing_slots: list[str] = Field(
        default_factory=list,
        description="List of required slots that could not be extracted"
    )


class ExtractSlotsSignature(dspy.Signature):
    """Extract slot values from a question for a specific template.

    Given a question and a template definition, extract the values for each
    slot defined in the template. Normalize values to their standard codes.

    Normalization Rules:
    - Province names -> ISO 3166-2 codes (e.g., "Drenthe" -> "NL-DR")
    - Institution types -> Single-letter codes (e.g., "archieven" -> "A")
    - GHCID -> Keep as-is if valid format (e.g., "NL-HaNA")

    Province Code Mappings:
    - Drenthe: NL-DR
    - Flevoland: NL-FL
    - Friesland: NL-FR
    - Gelderland: NL-GE
    - Groningen: NL-GR
    - Limburg: NL-LI
    - Noord-Brabant: NL-NB
    - Noord-Holland: NL-NH
    - Overijssel: NL-OV
    - Utrecht: NL-UT
    - Zeeland: NL-ZE
    - Zuid-Holland: NL-ZH

    Institution Type Codes:
    - A: Archive (archief, archieven)
    - M: Museum (museum, musea)
    - L: Library (bibliotheek, bibliotheken)
    - G: Gallery (galerie, galerijen)
    """

    question: str = dspy.InputField(
        desc="The user's question"
    )
    template_id: str = dspy.InputField(
        desc="ID of the matched template"
    )
    required_slots: list[str] = dspy.InputField(
        desc="List of required slot names for this template"
    )

    extracted_slots: ExtractedSlots = dspy.OutputField(
        desc="Extracted and normalized slot values"
    )


class SlotExtractor(dspy.Module):
    """DSPy module for slot extraction."""

    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractSlotsSignature)

    def forward(
        self,
        question: str,
        template_id: str,
        required_slots: list[str]
    ) -> ExtractedSlots:
        """Extract slots from question."""
        result = self.extract(
            question=question,
            template_id=template_id,
            required_slots=required_slots
        )
        return result.extracted_slots
```

### 3. Combined TemplateSPARQL Module

```python
class TemplateSPARQL(dspy.Module):
    """Combined DSPy module for template-based SPARQL generation.

    This module orchestrates:
    1. Template classification
    2. Slot extraction (if template matches)
    3. Template instantiation (if slots valid)
    4. Fallback to LLM (if no match or invalid slots)
    """

    def __init__(
        self,
        template_registry: "TemplateRegistry",
        fallback_module: Optional[dspy.Module] = None
    ):
        super().__init__()
        self.classifier = TemplateClassifier()
        self.slot_extractor = SlotExtractor()
        self.template_registry = template_registry
        self.fallback_module = fallback_module

    def forward(self, question: str, language: str = "nl") -> str:
        """Generate SPARQL query from question.

        Args:
            question: User's natural language question
            language: Language code ('nl' or 'en')

        Returns:
            Valid SPARQL query string
        """
        # Step 1: Classify template
        match = self.classifier(question=question, language=language)

        # Step 2: Check if template matched
        if match.template_id == "none" or match.confidence < 0.7:
            # Fall back to LLM generation
            if self.fallback_module:
                return self.fallback_module(question=question)
            raise ValueError("No template match and no fallback configured")

        # Step 3: Get template and extract slots
        template = self.template_registry.get(match.template_id)
        required_slots = list(template.slots.keys())

        slots = self.slot_extractor(
            question=question,
            template_id=match.template_id,
            required_slots=required_slots
        )

        # Step 4: Check for missing required slots
        if slots.missing_slots:
            # Fall back if required slots are missing
            if self.fallback_module:
                return self.fallback_module(question=question)
            raise ValueError(f"Missing required slots: {slots.missing_slots}")

        # Step 5: Instantiate template
        return template.instantiate(slots.normalized_slots)
```

## GEPA Optimization

The template classification can be optimized using GEPA:

```python
from dspy.teleprompt import GEPA

def template_accuracy_metric(example, prediction):
    """Metric for template classification accuracy."""
    # Check if correct template was selected
    if prediction.template_match.template_id != example.expected_template_id:
        return 0.0

    # Check slot extraction accuracy
    expected_slots = example.expected_slots
    predicted_slots = prediction.template_match.extracted_slots

    if not expected_slots:
        return 1.0  # No slots to check

    correct_slots = sum(
        1 for k, v in expected_slots.items()
        if predicted_slots.get(k) == v
    )
    return correct_slots / len(expected_slots)


def create_training_examples():
    """Create training examples for GEPA optimization."""
    return [
        dspy.Example(
            question="Welke archieven zijn er in Drenthe?",
            language="nl",
            expected_template_id="region_institution_search",
            expected_slots={
                "institution_type": "archieven",
                "province": "Drenthe"
            }
        ).with_inputs("question", "language"),

        dspy.Example(
            question="Welke musea zijn er in Noord-Holland?",
            language="nl",
            expected_template_id="region_institution_search",
            expected_slots={
                "institution_type": "musea",
                "province": "Noord-Holland"
            }
        ).with_inputs("question", "language"),

        dspy.Example(
            question="Hoeveel bibliotheken zijn er in Nederland?",
            language="nl",
            expected_template_id="count_by_type",
            expected_slots={
                "institution_type": "bibliotheken"
            }
        ).with_inputs("question", "language"),

        # Add more examples...
    ]


async def optimize_template_classifier():
    """Run GEPA optimization on template classifier."""

    # Create training data
    trainset = create_training_examples()

    # Initialize classifier
    classifier = TemplateClassifier()

    # Configure GEPA optimizer
    optimizer = GEPA(
        metric=template_accuracy_metric,
        auto="light",  # Use light optimization
        max_metric_calls=100,
    )

    # Run optimization
    optimized_classifier = optimizer.compile(
        classifier,
        trainset=trainset,
    )

    # Save optimized module
    optimized_classifier.save("optimized_template_classifier.json")

    return optimized_classifier
```

## Integration with Existing HeritageRAG

### Modified HeritageRAG Class

```python
# In dspy_heritage_rag.py

class HeritageRAG(dspy.Module):
    """Heritage RAG with template-based SPARQL generation."""

    def __init__(
        self,
        template_registry: Optional["TemplateRegistry"] = None,
        use_templates: bool = True,
        template_confidence_threshold: float = 0.7,
        **kwargs
    ):
        super().__init__()

        # Existing components
        self.query_intent = dspy.ChainOfThought(ClassifyQueryIntent)
        self.entity_extractor = dspy.ChainOfThought(ExtractHeritageEntities)
        self.sparql_generator = dspy.ChainOfThought(GenerateSPARQL)
        self.answer_generator = dspy.ChainOfThought(GenerateHeritageAnswer)

        # NEW: Template-based components
        self.use_templates = use_templates
        self.template_confidence_threshold = template_confidence_threshold

        if use_templates:
            self.template_classifier = TemplateClassifier()
            self.slot_extractor = SlotExtractor()
            self.template_registry = template_registry or TemplateRegistry.load_default()

    def generate_sparql(self, question: str, language: str = "nl") -> str:
        """Generate SPARQL query, trying templates first."""

        if self.use_templates:
            # Try template-based generation
            try:
                match = self.template_classifier(question=question, language=language)

                if (match.template_id != "none" and
                    match.confidence >= self.template_confidence_threshold):

                    template = self.template_registry.get(match.template_id)
                    required_slots = list(template.slots.keys())

                    slots = self.slot_extractor(
                        question=question,
                        template_id=match.template_id,
                        required_slots=required_slots
                    )

                    if not slots.missing_slots:
                        # Successfully matched template
                        logger.info(
                            f"Using template '{match.template_id}' "
                            f"(confidence: {match.confidence:.2f})"
                        )
                        return template.instantiate(slots.normalized_slots)

            except Exception as e:
                logger.warning(f"Template generation failed: {e}, falling back to LLM")

        # Fall back to LLM-based generation
        logger.info("Using LLM-based SPARQL generation")
        return self.sparql_generator(question=question).sparql_query
```

## Signature Caching for OpenAI

To leverage OpenAI's prompt caching (1,024+ token threshold), we create cacheable docstrings:

```python
def get_cacheable_template_classifier_docstring() -> str:
    """Generate >1024 token docstring for template classifier."""

    # Load template definitions
    templates = TemplateRegistry.load_default()

    # Build comprehensive docstring
    parts = [
        "Classify a heritage question and match it to a SPARQL template.",
        "",
        "## Available Templates",
        ""
    ]

    for template_id, template in templates.items():
        parts.extend([
            f"### {template_id}",
            f"Description: {template.description}",
            f"Example questions:",
        ])
        for pattern in template.question_patterns[:3]:
            parts.append(f"  - {pattern}")
        parts.extend([
            f"Required slots: {list(template.slots.keys())}",
            ""
        ])

    parts.extend([
        "## Slot Value Mappings",
        "",
        "### Province Codes (ISO 3166-2)",
    ])

    # Add all province mappings
    for province, code in get_subregion_mappings().items():
        parts.append(f"- {province}: {code}")

    parts.extend([
        "",
        "### Institution Type Codes",
    ])

    for type_name, code in get_institution_type_mappings().items():
        parts.append(f"- {type_name}: {code}")

    return "\n".join(parts)
```

## Testing DSPy Integration

```python
# tests/template_sparql/test_dspy_integration.py

import pytest
import dspy

class TestDSPyIntegration:
    """Test DSPy module integration."""

    @pytest.fixture
    def classifier(self):
        """Create template classifier with mock LM."""
        # Use DSPy's testing utilities
        dspy.configure(lm=dspy.LM("gpt-4o-mini"))
        return TemplateClassifier()

    def test_classifier_forward(self, classifier):
        """Test classifier forward pass."""
        result = classifier(
            question="Welke archieven zijn er in Drenthe?",
            language="nl"
        )

        assert result.template_id == "region_institution_search"
        assert result.confidence > 0.5
        assert "institution_type" in result.extracted_slots
        assert "province" in result.extracted_slots

    def test_classifier_is_optimizable(self, classifier):
        """Test that classifier can be compiled with GEPA."""
        from dspy.teleprompt import GEPA

        trainset = [
            dspy.Example(
                question="Welke archieven zijn er in Drenthe?",
                language="nl",
                expected_template_id="region_institution_search"
            ).with_inputs("question", "language")
        ]

        optimizer = GEPA(
            metric=lambda e, p: 1.0 if p.template_match.template_id == e.expected_template_id else 0.0,
            max_metric_calls=5
        )

        # Should not raise
        compiled = optimizer.compile(classifier, trainset=trainset)
        assert compiled is not None
```

## Performance Considerations

### Caching Strategy

```python
from functools import lru_cache

class CachedTemplateClassifier(TemplateClassifier):
    """Template classifier with caching."""

    @lru_cache(maxsize=1000)
    def forward_cached(self, question: str, language: str = "nl") -> TemplateMatch:
        """Cached forward pass for repeated questions."""
        return super().forward(question, language)

    def forward(self, question: str, language: str = "nl") -> TemplateMatch:
        # Normalize question for better cache hits
        normalized = question.lower().strip()
        return self.forward_cached(normalized, language)
```

### Batch Processing

```python
class BatchTemplateClassifier(dspy.Module):
    """Batch classification for efficiency."""

    def __init__(self):
        super().__init__()
        self.classifier = TemplateClassifier()

    async def forward_batch(
        self,
        questions: list[str],
        language: str = "nl"
    ) -> list[TemplateMatch]:
        """Classify multiple questions in parallel."""
        import asyncio

        async def classify_one(q: str) -> TemplateMatch:
            return self.classifier(question=q, language=language)

        return await asyncio.gather(*[classify_one(q) for q in questions])
```

## References

- DSPy Documentation: https://dspy-docs.vercel.app/
- GEPA Paper: https://arxiv.org/abs/2507.19457
- DSPy Signatures: https://dspy-docs.vercel.app/docs/building-blocks/signatures
- DSPy Modules: https://dspy-docs.vercel.app/docs/building-blocks/modules