glam/docs/plan/prompt-query_template_mapping/competency-questions.md

Competency Questions and Ontology Coverage

Overview

This document describes how the template-based SPARQL system serves a dual purpose:

  1. Query generation - Translate user questions to valid SPARQL
  2. Ontology validation - Identify gaps in the Heritage Custodian Ontology through unanswerable questions

The principle is simple: if a relevant question cannot be mapped to a SPARQL template, the ontology likely lacks coverage for that domain.

Competency Questions (CQs)

What are Competency Questions?

Competency Questions are natural language questions that an ontology should be able to answer. They serve as:

  • Requirements during ontology design
  • Validation criteria during ontology evaluation
  • Coverage metrics for ongoing maintenance

"Competency questions define what the ontology knows about. If you can't answer a competency question with a SPARQL query, your ontology is incomplete." — Grüninger & Fox (1995)

CQs as Template Coverage Metrics

Each SPARQL template implicitly defines a set of Competency Questions:

# Template definition implies these CQs are answerable:
region_institution_search:
  question_patterns:
    - "Welke {institution_type_nl} zijn er in {province}?"
    - "Which {institution_type_en} are in {province}?"
  
  # Implied Competency Questions:
  # CQ1: What archives exist in a given Dutch province?
  # CQ2: What museums exist in a given Dutch province?
  # CQ3: What libraries exist in a given Dutch province?
  # etc.
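The way one template pattern implies a family of CQs can be sketched by filling its slots with every combination of values. This is a minimal illustration only: `SLOT_VALUES` and `expand_pattern` are hypothetical names, and the real template system may draw its slot vocabularies from elsewhere.

```python
from __future__ import annotations

from itertools import product

# Hypothetical slot values; the real template definitions may use other vocabularies.
SLOT_VALUES = {
    "institution_type_en": ["archives", "museums", "libraries"],
    "province": ["Utrecht", "Gelderland"],
}


def expand_pattern(pattern: str, slot_values: dict[str, list[str]]) -> list[str]:
    """Fill every slot named in the pattern with each combination of values."""
    # Only consider slots that actually occur in this pattern.
    slots = [name for name in slot_values if "{" + name + "}" in pattern]
    questions = []
    for combo in product(*(slot_values[name] for name in slots)):
        questions.append(pattern.format(**dict(zip(slots, combo))))
    return questions


implied_cqs = expand_pattern(
    "Which {institution_type_en} are in {province}?", SLOT_VALUES
)
for question in implied_cqs:
    print(question)
```

Each generated question is a candidate CQ that the template claims to answer, so the list can be compared against the CQ registry to spot unregistered questions.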

Tracking Ontology Coverage

"""Track which competency questions the ontology can answer."""

from dataclasses import dataclass
from enum import Enum


class CQStatus(Enum):
    """Status of a competency question."""
    ANSWERABLE = "answerable"           # Has matching template
    PARTIAL = "partial"                 # Template exists but limited
    UNANSWERABLE = "unanswerable"       # No template, ontology gap
    OUT_OF_SCOPE = "out_of_scope"       # Not relevant to ontology (fyke)


@dataclass
class CompetencyQuestion:
    """A competency question for ontology validation."""
    
    id: str
    question_nl: str
    question_en: str
    category: str  # geographic, statistical, relational, etc.
    status: CQStatus
    template_id: str | None  # Template that answers this CQ
    ontology_classes: list[str]  # Classes needed to answer
    ontology_properties: list[str]  # Properties needed to answer
    notes: str | None = None


# Example CQ registry
COMPETENCY_QUESTIONS = [
    CompetencyQuestion(
        id="CQ-GEO-001",
        question_nl="Welke archieven zijn er in een bepaalde provincie?",
        question_en="What archives exist in a given province?",
        category="geographic",
        status=CQStatus.ANSWERABLE,
        template_id="region_institution_search",
        ontology_classes=["crm:E39_Actor"],
        ontology_properties=["hc:institutionType", "schema:addressLocality"],
    ),
    CompetencyQuestion(
        id="CQ-REL-001",
        question_nl="Welke instellingen hebben dezelfde directeur gehad?",
        question_en="Which institutions have shared the same director?",
        category="relational",
        status=CQStatus.UNANSWERABLE,  # GAP: No staff employment history
        template_id=None,
        ontology_classes=["crm:E39_Actor", "schema:Person"],
        ontology_properties=["schema:employee", "schema:worksFor"],  # MISSING
        notes="Ontology lacks employment history modeling",
    ),
    CompetencyQuestion(
        id="CQ-OUT-001",
        question_nl="Waar kan ik tandpasta met korting kopen?",
        question_en="Where can I buy toothpaste with a discount?",
        category="out_of_scope",
        status=CQStatus.OUT_OF_SCOPE,
        template_id=None,
        ontology_classes=[],
        ontology_properties=[],
        notes="Not related to heritage institutions - route to fyke",
    ),
]

The Fyke: Catching Irrelevant Questions

What is a Fyke?

A fyke (Dutch: fuik) is a type of fish trap - a net that catches fish swimming in a certain direction. In our system, the fyke catches questions that are irrelevant to the Heritage Custodian Ontology.

User Question
     |
     v
+------------------+
| Relevance Filter |  <-- Is this about heritage institutions?
+------------------+
     |
     +---> Relevant?
     |        |
     |   Yes  |   No
     |    |   |    |
     |    v   |    v
     |  Route |  +-------+
     |  to    |  | FYKE  |  <-- Catch irrelevant questions
     |  SPARQL|  +-------+
     |        |       |
     |        |       v
     |        |  Standard Response:
     |        |  "Deze vraag kan niet beantwoord worden
     |        |   door de ArchiefAssistent. De service
     |        |   bevat informatie over erfgoedinstellingen
     |        |   zoals archieven, musea en bibliotheken."

Fyke Implementation

"""Fyke: Filter for irrelevant questions."""

import dspy
from pydantic import BaseModel, Field


class RelevanceClassification(BaseModel):
    """Structured output for relevance classification."""
    
    is_relevant: bool = Field(
        description="Whether the question relates to heritage institutions"
    )
    confidence: float = Field(
        ge=0.0, le=1.0,
        description="Confidence in the classification"
    )
    reasoning: str = Field(
        description="Brief explanation of why the question is or isn't relevant"
    )
    detected_topics: list[str] = Field(
        description="Topics detected in the question"
    )


class HeritageRelevanceSignature(dspy.Signature):
    """Determine if a question is relevant to the Heritage Custodian Ontology.
    
    The Heritage Custodian Ontology covers:
    - Heritage institutions: museums, archives, libraries, galleries
    - Institution properties: location, founding date, type, collections
    - Staff and personnel at heritage institutions
    - Geographic distribution of heritage institutions
    - Relationships between institutions
    
    Questions about the following are OUT OF SCOPE and should be marked irrelevant:
    - Commercial products or shopping
    - Medical or health advice
    - Legal advice
    - Current news or politics (unless about heritage policy)
    - Personal relationships
    - Technical support for unrelated systems
    - General knowledge not related to heritage
    
    Be generous with relevance: if the question MIGHT relate to heritage institutions,
    mark it as relevant. Only flag clearly unrelated questions as irrelevant.
    """
    
    question: str = dspy.InputField(
        desc="User's question to classify"
    )
    language: str = dspy.InputField(
        desc="Language of the question (nl, en, de, fr)",
        default="nl"
    )
    
    classification: RelevanceClassification = dspy.OutputField(
        desc="Structured relevance classification"
    )


class FykeFilter(dspy.Module):
    """Filter irrelevant questions before template matching.
    
    The fyke catches questions that cannot be answered by the
    Heritage Custodian Ontology, returning a polite standard response.
    """
    
    # Standard responses by language
    STANDARD_RESPONSES = {
        "nl": (
            "Deze vraag kan helaas niet beantwoord worden door de ArchiefAssistent. "
            "Deze service bevat informatie over erfgoedinstellingen in Nederland en "
            "daarbuiten, zoals archieven, musea, bibliotheken en galerieën. "
            "Stel gerust een vraag over deze instellingen!"
        ),
        "en": (
            "Unfortunately, this question cannot be answered by the ArchiefAssistent. "
            "This service contains information about heritage institutions in the "
            "Netherlands and beyond, such as archives, museums, libraries, and galleries. "
            "Feel free to ask a question about these institutions!"
        ),
        "de": (
            "Leider kann diese Frage vom ArchiefAssistent nicht beantwortet werden. "
            "Dieser Service enthält Informationen über Kulturerbe-Einrichtungen in den "
            "Niederlanden und darüber hinaus, wie Archive, Museen, Bibliotheken und Galerien. "
            "Stellen Sie gerne eine Frage zu diesen Einrichtungen!"
        ),
        "fr": (
            "Malheureusement, cette question ne peut pas être répondue par l'ArchiefAssistent. "
            "Ce service contient des informations sur les institutions patrimoniales aux "
            "Pays-Bas et au-delà, telles que les archives, les musées, les bibliothèques "
            "et les galeries. N'hésitez pas à poser une question sur ces institutions!"
        ),
    }
    
    # Confidence threshold for fyke activation
    IRRELEVANCE_THRESHOLD = 0.85
    
    def __init__(self, fast_lm: dspy.LM | None = None):
        """Initialize the fyke filter.
        
        Args:
            fast_lm: Optional fast LM for relevance classification.
                     Recommended: gpt-4o-mini or similar for speed.
        """
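        super().__init__()
        self.fast_lm = fast_lm
        # Note: TypedPredictor belongs to older DSPy releases; newer releases
        # appear to handle Pydantic-typed fields in dspy.Predict directly,
        # so check the installed DSPy version before relying on this call.
        self.classifier = dspy.TypedPredictor(HeritageRelevanceSignature)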
    
    def forward(
        self,
        question: str,
        language: str = "nl",
    ) -> dspy.Prediction:
        """Classify question relevance and optionally catch in fyke.
        
        Returns:
            Prediction with:
                - is_relevant: bool
                - caught_by_fyke: bool
                - fyke_response: str | None (if caught)
                - reasoning: str
                - confidence: float
        """
        # Use fast LM if configured
        if self.fast_lm:
            with dspy.settings.context(lm=self.fast_lm):
                result = self.classifier(question=question, language=language)
        else:
            result = self.classifier(question=question, language=language)
        
        classification = result.classification
        
        # Determine if caught by fyke
        caught = (
            not classification.is_relevant 
            and classification.confidence >= self.IRRELEVANCE_THRESHOLD
        )
        
        # Get appropriate response
        fyke_response = None
        if caught:
            fyke_response = self.STANDARD_RESPONSES.get(
                language, 
                self.STANDARD_RESPONSES["en"]
            )
        
        return dspy.Prediction(
            is_relevant=classification.is_relevant,
            caught_by_fyke=caught,
            fyke_response=fyke_response,
            reasoning=classification.reasoning,
            confidence=classification.confidence,
            detected_topics=classification.detected_topics,
        )


# Example usage in the RAG pipeline
class HeritageRAGWithFyke(dspy.Module):
    """Heritage RAG with fyke pre-filter."""
    
    def __init__(self):
        super().__init__()
        self.fyke = FykeFilter()
        self.router = HeritageQueryRouter()
        self.template_classifier = TemplateClassifier()
        # ... other components
    
    async def answer(
        self,
        question: str,
        language: str = "nl",
    ) -> dspy.Prediction:
        """Answer question with fyke pre-filtering."""
        
        # Step 1: Check relevance (fyke filter)
        relevance = self.fyke(question=question, language=language)
        
        if relevance.caught_by_fyke:
            # Question is irrelevant - return standard response
            return dspy.Prediction(
                answer=relevance.fyke_response,
                caught_by_fyke=True,
                reasoning=relevance.reasoning,
                confidence=relevance.confidence,
            )
        
        # Step 2: Route relevant question to templates
        routing = self.router(question=question, language=language)
        
        # ... continue with normal processing
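The catch rule inside `FykeFilter.forward` can be isolated from DSPy to make its behaviour easy to test. The sketch below assumes a stand-in `Verdict` type (hypothetical, not part of the pipeline) and reuses the 0.85 threshold from the class above.

```python
from dataclasses import dataclass

IRRELEVANCE_THRESHOLD = 0.85  # same value as FykeFilter.IRRELEVANCE_THRESHOLD


@dataclass
class Verdict:
    """Stand-in for the classifier output (hypothetical, for illustration)."""

    is_relevant: bool
    confidence: float


def caught_by_fyke(verdict: Verdict, threshold: float = IRRELEVANCE_THRESHOLD) -> bool:
    """Catch only questions judged irrelevant with high confidence."""
    return (not verdict.is_relevant) and verdict.confidence >= threshold


# Clearly off-topic: caught.
assert caught_by_fyke(Verdict(is_relevant=False, confidence=0.97))
# Judged irrelevant, but the classifier is unsure: passed on to template matching.
assert not caught_by_fyke(Verdict(is_relevant=False, confidence=0.60))
# Relevant questions are never caught, whatever the confidence.
assert not caught_by_fyke(Verdict(is_relevant=True, confidence=0.99))
```

The middle case is the design choice worth noting: a low-confidence "irrelevant" verdict still reaches template matching, which is consistent with the signature's instruction to be generous with relevance.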

Fyke Examples

| Question | Language | Relevant? | Reasoning |
|---|---|---|---|
| "Welke archieven zijn er in Utrecht?" | nl | Yes | Asks about archives in a location |
| "Waar kan ik tandpasta met korting kopen?" | nl | No | Shopping query, not heritage |
| "What is the weather in Amsterdam?" | en | No | Weather query, not heritage |
| "Wie is de directeur van het Rijksmuseum?" | nl | Yes | Asks about museum staff |
| "How do I reset my password?" | en | No | Technical support, not heritage |
| "Welke musea hebben een Van Gogh collectie?" | nl | Yes | Asks about museum collections |

Ontology Gap Detection

Identifying Gaps Through Template Failure

When a relevant question cannot be mapped to any template, this signals a potential ontology gap:

"""Detect ontology gaps through template matching failures."""

from dataclasses import dataclass
from datetime import datetime

import dspy  # needed for the dspy.Prediction type used below


@dataclass
class OntologyGapReport:
    """Report of a potential ontology gap."""
    
    question: str
    timestamp: datetime
    detected_intent: str
    detected_entities: list[str]
    required_classes: list[str]  # Classes that would be needed
    required_properties: list[str]  # Properties that would be needed
    existing_coverage: str  # What the ontology currently covers
    gap_description: str
    priority: str  # high, medium, low
    

class OntologyGapDetector:
    """Detect gaps in the Heritage Custodian Ontology."""
    
    # Known ontology capabilities
    ONTOLOGY_COVERAGE = {
        "institution_identity": {
            "classes": ["crm:E39_Actor"],
            "properties": ["skos:prefLabel", "hc:institutionType", "schema:description"],
            "description": "Institution names, types, and descriptions",
        },
        "institution_location": {
            "classes": ["crm:E39_Actor"],
            "properties": ["schema:addressLocality", "schema:addressCountry", "schema:geo"],
            "description": "Institution geographic locations",
        },
        "institution_founding": {
            "classes": ["crm:E39_Actor"],
            "properties": ["schema:foundingDate", "schema:dissolutionDate"],
            "description": "Institution founding and closure dates",
        },
        "staff_current": {
            "classes": ["schema:Person"],
            "properties": ["schema:name", "schema:jobTitle"],
            "description": "Current staff members and roles",
        },
    }
    
    # Known gaps (to be expanded as gaps are discovered)
    KNOWN_GAPS = {
        "staff_history": {
            "description": "Historical employment records",
            "example_questions": [
                "Wie was de eerste directeur van het Rijksmuseum?",
                "Welke archivarissen hebben bij meerdere instellingen gewerkt?",
            ],
            "required_modeling": "Employment periods with start/end dates",
        },
        "collection_items": {
            "description": "Individual collection items",
            "example_questions": [
                "Welke musea hebben werken van Rembrandt?",
                "Waar kan ik de Nachtwacht zien?",
            ],
            "required_modeling": "Collection items linked to institutions",
        },
        "institutional_relationships": {
            "description": "Relationships between institutions",
            "example_questions": [
                "Welke instellingen zijn onderdeel van de Reinwardt Academie?",
                "Met welke musea werkt het Rijksmuseum samen?",
            ],
            "required_modeling": "Formal relationships (parent, partner, etc.)",
        },
    }
    
    def analyze_unmatched_question(
        self,
        question: str,
        routing: dspy.Prediction,
    ) -> OntologyGapReport | None:
        """Analyze why a relevant question couldn't be matched to a template.
        
        Args:
            question: The user's question
            routing: Routing prediction with intent, entities, etc.
            
        Returns:
            OntologyGapReport if a gap is detected, None otherwise
        """
        # Check if this matches a known gap pattern
        for gap_id, gap_info in self.KNOWN_GAPS.items():
            for example in gap_info["example_questions"]:
                if self._is_similar_question(question, example):
                    return OntologyGapReport(
                        question=question,
                        timestamp=datetime.now(),
                        detected_intent=routing.intent,
                        detected_entities=routing.entities,
                        required_classes=self._infer_required_classes(gap_id),
                        required_properties=self._infer_required_properties(gap_id),
                        existing_coverage=self._describe_current_coverage(routing.intent),
                        gap_description=gap_info["description"],
                        priority=self._assess_priority(gap_id),
                    )
        
        # Unknown gap - log for review
        return OntologyGapReport(
            question=question,
            timestamp=datetime.now(),
            detected_intent=routing.intent,
            detected_entities=routing.entities,
            required_classes=["UNKNOWN"],
            required_properties=["UNKNOWN"],
            existing_coverage=self._describe_current_coverage(routing.intent),
            gap_description="Unknown gap - requires manual review",
            priority="medium",
        )
    
    def _is_similar_question(self, q1: str, q2: str) -> bool:
        """Check if two questions share most of their terms (lexical overlap)."""
        # Simple implementation - could use embeddings for semantic similarity
        q1_terms = set(q1.lower().split())
        q2_terms = set(q2.lower().split())
        
        # Guard against empty input to avoid division by zero
        if not q1_terms or not q2_terms:
            return False
        
        # Share of common terms relative to the longer question
        overlap = len(q1_terms & q2_terms) / max(len(q1_terms), len(q2_terms))
        return overlap > 0.5
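Because `_is_similar_question` splits on whitespace, a token such as `Rijksmuseum?` does not match `Rijksmuseum`. One possible refinement, shown here as a sketch with hypothetical helper names (`tokenize`, `token_overlap`), is to tokenize on word characters before computing the same overlap ratio.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and keep only word tokens, so trailing punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def token_overlap(q1: str, q2: str) -> float:
    """Share of tokens in common, relative to the longer question."""
    t1, t2 = tokenize(q1), tokenize(q2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / max(len(t1), len(t2))


known = "Wie was de eerste directeur van het Rijksmuseum?"
asked = "wie was de eerste directeur van het rijksmuseum"
unrelated = "Hoeveel musea zijn er in Nederland?"

print(token_overlap(known, asked))
print(token_overlap(known, unrelated))
```

This remains a lexical measure, so paraphrases with different wording still score low; embeddings would be needed for genuinely semantic matching.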

Gap Reporting Dashboard

"""Generate ontology coverage reports."""

from collections import Counter


def generate_coverage_report(
    competency_questions: list[CompetencyQuestion],
    gap_reports: list[OntologyGapReport],
) -> dict:
    """Generate an ontology coverage report.
    
    Returns:
        Dict with coverage statistics and gap analysis
    """
    # Count CQ statuses
    status_counts = Counter(cq.status for cq in competency_questions)
    
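    # Calculate coverage over relevant CQs only: out-of-scope CQs are excluded,
    # matching the "Total Relevant CQs" denominator used in Key Metrics
    total_cqs = len(competency_questions)
    answerable = status_counts.get(CQStatus.ANSWERABLE, 0)
    partial = status_counts.get(CQStatus.PARTIAL, 0)
    relevant_cqs = total_cqs - status_counts.get(CQStatus.OUT_OF_SCOPE, 0)
    coverage_pct = (
        (answerable + 0.5 * partial) / relevant_cqs * 100 if relevant_cqs > 0 else 0.0
    )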
    
    # Group gaps by category
    gaps_by_category = {}
    for gap in gap_reports:
        category = gap.detected_intent
        if category not in gaps_by_category:
            gaps_by_category[category] = []
        gaps_by_category[category].append(gap)
    
    # Identify most common gaps
    gap_descriptions = Counter(g.gap_description for g in gap_reports)
    
    return {
        "summary": {
            "total_competency_questions": total_cqs,
            "answerable": answerable,
            "partial": partial,
            "unanswerable": status_counts.get(CQStatus.UNANSWERABLE, 0),
            "out_of_scope": status_counts.get(CQStatus.OUT_OF_SCOPE, 0),
            "coverage_percentage": round(coverage_pct, 1),
        },
        "gaps_by_category": {
            cat: len(gaps) for cat, gaps in gaps_by_category.items()
        },
        "top_gaps": gap_descriptions.most_common(10),
        "recommendations": _generate_recommendations(gap_reports),
    }


def _generate_recommendations(gaps: list[OntologyGapReport]) -> list[str]:
    """Generate recommendations for ontology improvements."""
    recommendations = []
    
    # Analyze gap patterns
    gap_types = Counter(g.gap_description for g in gaps)
    
    for gap_type, count in gap_types.most_common(5):
        if count >= 3:
            recommendations.append(
                f"High priority: Add modeling for '{gap_type}' "
                f"({count} unanswered questions)"
            )
    
    return recommendations

Competency Question Registry

YAML Format for CQ Tracking

# data/competency_questions.yaml
version: "1.0.0"

categories:
  geographic:
    description: "Questions about institution locations"
    coverage: high
  statistical:
    description: "Questions about counts and distributions"
    coverage: high
  relational:
    description: "Questions about relationships between entities"
    coverage: low
  temporal:
    description: "Questions about historical changes"
    coverage: medium
  biographical:
    description: "Questions about people in heritage sector"
    coverage: medium

competency_questions:
  # ANSWERABLE - Geographic
  - id: CQ-GEO-001
    question_nl: "Welke archieven zijn er in een bepaalde provincie?"
    question_en: "What archives exist in a given province?"
    category: geographic
    status: answerable
    template_id: region_institution_search
    ontology_classes: [crm:E39_Actor]
    ontology_properties: [hc:institutionType, schema:addressLocality]
    
  - id: CQ-GEO-002
    question_nl: "Welke musea zijn er in een bepaalde stad?"
    question_en: "What museums exist in a given city?"
    category: geographic
    status: answerable
    template_id: city_institution_search
    ontology_classes: [crm:E39_Actor]
    ontology_properties: [hc:institutionType, schema:addressLocality]

  # ANSWERABLE - Statistical
  - id: CQ-STAT-001
    question_nl: "Hoeveel musea zijn er in Nederland?"
    question_en: "How many museums are there in the Netherlands?"
    category: statistical
    status: answerable
    template_id: count_institutions_by_type
    ontology_classes: [crm:E39_Actor]
    ontology_properties: [hc:institutionType, schema:addressCountry]

  # PARTIAL - Temporal
  - id: CQ-TEMP-001
    question_nl: "Wat is het oudste archief in Nederland?"
    question_en: "What is the oldest archive in the Netherlands?"
    category: temporal
    status: partial
    template_id: find_oldest_institution
    ontology_classes: [crm:E39_Actor]
    ontology_properties: [schema:foundingDate]
    notes: "Founding dates incomplete for many institutions"

  # UNANSWERABLE - Relational (GAP)
  - id: CQ-REL-001
    question_nl: "Welke instellingen hebben dezelfde directeur gehad?"
    question_en: "Which institutions have shared the same director?"
    category: relational
    status: unanswerable
    template_id: null
    ontology_classes: [crm:E39_Actor, schema:Person]
    ontology_properties: [schema:employee]  # MISSING: employment history
    gap_description: "No modeling for historical employment relationships"
    recommended_fix: "Add employment periods with start/end dates"

  - id: CQ-REL-002
    question_nl: "Welke musea zijn onderdeel van een groter netwerk?"
    question_en: "Which museums are part of a larger network?"
    category: relational
    status: unanswerable
    template_id: null
    ontology_classes: [crm:E39_Actor, org:Organization]
    ontology_properties: [org:subOrganizationOf]  # MISSING
    gap_description: "No modeling for organizational hierarchies"

  # OUT OF SCOPE - Fyke
  - id: CQ-OUT-001
    question_nl: "Waar kan ik tandpasta met korting kopen?"
    question_en: "Where can I buy toothpaste with a discount?"
    category: out_of_scope
    status: out_of_scope
    template_id: null
    ontology_classes: []
    ontology_properties: []
    notes: "Shopping query - route to fyke"

  - id: CQ-OUT-002
    question_nl: "Wat is het weer morgen in Amsterdam?"
    question_en: "What is the weather tomorrow in Amsterdam?"
    category: out_of_scope
    status: out_of_scope
    template_id: null
    ontology_classes: []
    ontology_properties: []
    notes: "Weather query - route to fyke"
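Registry entries carry a string `status` and, for gaps, extra keys such as `gap_description` and `recommended_fix` that the `CompetencyQuestion` dataclass does not declare. The sketch below shows one way to turn an already parsed entry (a plain dict, as a YAML parser would return) into the dataclass. `cq_from_mapping` is a hypothetical helper, and folding the extra keys into `notes` is an assumption, not a documented behaviour of the system.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from enum import Enum


class CQStatus(Enum):
    ANSWERABLE = "answerable"
    PARTIAL = "partial"
    UNANSWERABLE = "unanswerable"
    OUT_OF_SCOPE = "out_of_scope"


@dataclass
class CompetencyQuestion:
    id: str
    question_nl: str
    question_en: str
    category: str
    status: CQStatus
    template_id: str | None
    ontology_classes: list[str]
    ontology_properties: list[str]
    notes: str | None = None


def cq_from_mapping(entry: dict) -> CompetencyQuestion:
    """Build a CompetencyQuestion from one parsed registry entry."""
    known = {f.name for f in fields(CompetencyQuestion)}
    data = {k: v for k, v in entry.items() if k in known}
    data["status"] = CQStatus(entry["status"])  # raises ValueError on unknown status
    # Assumption: keys the dataclass lacks (e.g. gap_description) are kept in notes.
    extras = [f"{k}: {v}" for k, v in entry.items() if k not in known]
    if extras:
        data["notes"] = "; ".join(filter(None, [data.get("notes"), *extras]))
    return CompetencyQuestion(**data)


entry = {  # CQ-REL-002 from the registry above, as a parsed mapping
    "id": "CQ-REL-002",
    "question_nl": "Welke musea zijn onderdeel van een groter netwerk?",
    "question_en": "Which museums are part of a larger network?",
    "category": "relational",
    "status": "unanswerable",
    "template_id": None,
    "ontology_classes": ["crm:E39_Actor", "org:Organization"],
    "ontology_properties": ["org:subOrganizationOf"],
    "gap_description": "No modeling for organizational hierarchies",
}
cq = cq_from_mapping(entry)
print(cq.status, cq.notes)
```

An alternative would be to add `gap_description` and `recommended_fix` as optional fields on the dataclass, which keeps the two representations aligned.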

Integration with Template System

Template-CQ Bidirectional Linking

"""Link templates to competency questions bidirectionally."""

def validate_template_cq_coverage(
    templates: dict[str, SPARQLTemplate],
    competency_questions: list[CompetencyQuestion],
) -> dict:
    """Validate that templates cover expected CQs and vice versa.
    
    Returns:
        Validation report with coverage analysis
    """
    # Templates without CQ coverage
    templates_without_cq = []
    for template_id, template in templates.items():
        matching_cqs = [
            cq for cq in competency_questions 
            if cq.template_id == template_id
        ]
        if not matching_cqs:
            templates_without_cq.append(template_id)
    
    # CQs without template coverage
    cqs_without_template = [
        cq for cq in competency_questions
        if cq.status == CQStatus.ANSWERABLE and cq.template_id is None
    ]
    
    # CQs (of any status) that reference a template ID missing from `templates`
    orphaned_cqs = [
        cq for cq in competency_questions
        if cq.template_id and cq.template_id not in templates
    ]
    
    return {
        "templates_without_cq": templates_without_cq,
        "cqs_without_template": [cq.id for cq in cqs_without_template],
        "orphaned_cqs": [cq.id for cq in orphaned_cqs],
        "coverage_complete": (
            len(templates_without_cq) == 0 
            and len(cqs_without_template) == 0
            and len(orphaned_cqs) == 0
        ),
    }
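The same linking check can be run on bare identifiers, which is convenient in a test suite where full template objects are not loaded. This sketch assumes a hypothetical `link_report` helper that takes the set of template IDs and a mapping from CQ ID to template ID. It indexes CQs by template once instead of rescanning the CQ list for every template.

```python
from __future__ import annotations

from collections import defaultdict


def link_report(template_ids: set[str], cq_links: dict[str, str | None]) -> dict:
    """Report unlinked templates and CQs that point at unknown templates.

    cq_links maps a CQ id to its template id, or None when it has no template.
    """
    cqs_by_template: dict[str, list[str]] = defaultdict(list)
    for cq_id, template_id in cq_links.items():
        if template_id is not None:
            cqs_by_template[template_id].append(cq_id)

    return {
        # Templates that no CQ refers to.
        "templates_without_cq": sorted(template_ids - set(cqs_by_template)),
        # CQs whose template id is not among the known templates.
        "orphaned_cqs": sorted(
            cq_id
            for template_id, cq_ids in cqs_by_template.items()
            if template_id not in template_ids
            for cq_id in cq_ids
        ),
    }


report = link_report(
    template_ids={"region_institution_search", "city_institution_search"},
    cq_links={
        "CQ-GEO-001": "region_institution_search",
        "CQ-TEMP-001": "find_oldest_institution",  # template not loaded here
        "CQ-REL-001": None,
    },
)
print(report)
```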

Summary

The template-based SPARQL system provides critical insights into ontology coverage:

| Aspect | Implementation | Benefit |
|---|---|---|
| Competency Questions | YAML registry with status tracking | Defines what ontology should answer |
| Fyke Filter | DSPy relevance classifier | Catches irrelevant questions early |
| Gap Detection | Analysis of unmatched questions | Identifies ontology improvements needed |
| Coverage Reports | Automated metrics generation | Tracks ontology completeness over time |
| Bidirectional Linking | Templates ↔ CQs validation | Ensures consistency |

Key Metrics

  • Coverage % = (Answerable CQs + 0.5 × Partial CQs) / Total Relevant CQs × 100
  • Fyke Rate = Out-of-scope questions / Total questions
  • Gap Rate = Unanswerable relevant questions / Total relevant questions

Target: >90% coverage of relevant competency questions.
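As a worked example, the three metrics can be computed from status counts. The counts below are taken from the sample registry in this document (3 answerable, 1 partial, 2 unanswerable, 2 out of scope). Using registry counts for the fyke rate is an assumption made for illustration, since that metric is defined over incoming questions rather than registered CQs. `key_metrics` is a hypothetical helper.

```python
def key_metrics(answerable: int, partial: int, unanswerable: int, out_of_scope: int) -> dict:
    """Compute coverage, gap rate, and fyke rate as percentages."""
    relevant = answerable + partial + unanswerable
    total = relevant + out_of_scope

    def pct(numerator: float, denominator: int) -> float:
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return round(numerator / denominator * 100, 1) if denominator else 0.0

    return {
        "coverage_pct": pct(answerable + 0.5 * partial, relevant),
        "gap_rate_pct": pct(unanswerable, relevant),
        "fyke_rate_pct": pct(out_of_scope, total),
    }


metrics = key_metrics(answerable=3, partial=1, unanswerable=2, out_of_scope=2)
print(metrics)
```

With these counts, coverage is (3 + 0.5) / 6, or about 58.3%, which is well below the 90% target and reflects the two relational gaps in the sample registry.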