# Competency Questions and Ontology Coverage

## Overview

This document describes how the template-based SPARQL system serves a dual purpose:

1. **Query generation**: translate user questions into valid SPARQL
2. **Ontology validation**: identify gaps in the Heritage Custodian Ontology through unanswerable questions

The principle is simple: **if a relevant question cannot be mapped to a SPARQL template, the ontology likely lacks coverage for that domain**.

## Competency Questions (CQs)

### What are Competency Questions?

Competency Questions are natural-language questions that an ontology should be able to answer. They serve as:

- **Requirements** during ontology design
- **Validation criteria** during ontology evaluation
- **Coverage metrics** for ongoing maintenance

> Competency questions define what the ontology knows about. If you can't answer a competency question with a SPARQL query, your ontology is incomplete. (Paraphrasing the competency-question approach of Grüninger & Fox, 1995.)

### CQs as Template Coverage Metrics

Each SPARQL template implicitly defines a set of Competency Questions:

```yaml
# Template definition implies these CQs are answerable:
region_institution_search:
  question_patterns:
    - "Welke {institution_type_nl} zijn er in {province}?"
    - "Which {institution_type_en} are in {province}?"

  # Implied Competency Questions:
  # CQ1: What archives exist in a given Dutch province?
  # CQ2: What museums exist in a given Dutch province?
  # CQ3: What libraries exist in a given Dutch province?
  # etc.
```
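
The implied CQs can be enumerated mechanically by filling each pattern's slots with every value the template accepts. The sketch below assumes a hypothetical `slot_values` mapping, because the real slot vocabularies are not listed here. It uses only the standard library (`string.Formatter.parse` to find slot names, `itertools.product` to combine values):

```python
from itertools import product
from string import Formatter


def expand_patterns(patterns: list[str], slot_values: dict[str, list[str]]) -> list[str]:
    """Fill every {slot} in each pattern with all combinations of allowed values."""
    questions = []
    for pattern in patterns:
        # Slot names in the order they appear; literal-only chunks yield None
        slots = [name for _, name, _, _ in Formatter().parse(pattern) if name]
        for combo in product(*(slot_values[s] for s in slots)):
            questions.append(pattern.format(**dict(zip(slots, combo))))
    return questions


patterns = ["Which {institution_type_en} are in {province}?"]
# Hypothetical slot vocabularies, for illustration only
slot_values = {
    "institution_type_en": ["archives", "museums", "libraries"],
    "province": ["Utrecht", "Zeeland"],
}
for q in expand_patterns(patterns, slot_values):
    print(q)
# Which archives are in Utrecht?
# Which archives are in Zeeland?
# Which museums are in Utrecht?
# ...
```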

### Tracking Ontology Coverage

```python
"""Track which competency questions the ontology can answer."""

from dataclasses import dataclass
from enum import Enum


class CQStatus(Enum):
    """Status of a competency question."""

    ANSWERABLE = "answerable"      # Has a matching template
    PARTIAL = "partial"            # Template exists but results are limited
    UNANSWERABLE = "unanswerable"  # No template: likely ontology gap
    OUT_OF_SCOPE = "out_of_scope"  # Not relevant to the ontology (fyke)


@dataclass
class CompetencyQuestion:
    """A competency question for ontology validation."""

    id: str
    question_nl: str
    question_en: str
    category: str  # geographic, statistical, relational, etc.
    status: CQStatus
    template_id: str | None  # Template that answers this CQ
    ontology_classes: list[str]  # Classes needed to answer
    ontology_properties: list[str]  # Properties needed to answer
    notes: str | None = None


# Example CQ registry
COMPETENCY_QUESTIONS = [
    CompetencyQuestion(
        id="CQ-GEO-001",
        question_nl="Welke archieven zijn er in een bepaalde provincie?",
        question_en="What archives exist in a given province?",
        category="geographic",
        status=CQStatus.ANSWERABLE,
        template_id="region_institution_search",
        ontology_classes=["crm:E39_Actor"],
        ontology_properties=["hc:institutionType", "schema:addressLocality"],
    ),
    CompetencyQuestion(
        id="CQ-REL-001",
        question_nl="Welke instellingen hebben dezelfde directeur gehad?",
        question_en="Which institutions have shared the same director?",
        category="relational",
        status=CQStatus.UNANSWERABLE,  # GAP: no staff employment history
        template_id=None,
        ontology_classes=["crm:E39_Actor", "schema:Person"],
        ontology_properties=["schema:employee", "schema:worksFor"],  # MISSING
        notes="Ontology lacks employment history modeling",
    ),
    CompetencyQuestion(
        id="CQ-OUT-001",
        question_nl="Waar kan ik tandpasta met korting kopen?",
        question_en="Where can I buy toothpaste with a discount?",
        category="out_of_scope",
        status=CQStatus.OUT_OF_SCOPE,
        template_id=None,
        ontology_classes=[],
        ontology_properties=[],
        notes="Not related to heritage institutions - route to fyke",
    ),
]
```
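
Each CQ records the properties it needs, so a first-pass check can compare those lists against the properties the ontology defines. This is a simplified sketch that assumes a hypothetical `available` property set. It only separates answerable from unanswerable CQs and ignores both the `PARTIAL` case and whether a template exists:

```python
def missing_properties(required: list[str], available: set[str]) -> list[str]:
    """Return the required properties that the ontology does not define."""
    return [p for p in required if p not in available]


# Hypothetical set of properties currently defined in the ontology
available = {"hc:institutionType", "schema:addressLocality", "schema:name", "schema:jobTitle"}

# Required properties copied from the example registry entries
cq_requirements = {
    "CQ-GEO-001": ["hc:institutionType", "schema:addressLocality"],
    "CQ-REL-001": ["schema:employee", "schema:worksFor"],
}

for cq_id, required in cq_requirements.items():
    missing = missing_properties(required, available)
    status = "unanswerable" if missing else "answerable"
    print(cq_id, status, missing)
# CQ-GEO-001 answerable []
# CQ-REL-001 unanswerable ['schema:employee', 'schema:worksFor']
```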

## The Fyke: Catching Irrelevant Questions

### What is a Fyke?

A **fyke** (Dutch: *fuik*) is a type of fish trap: a net that catches fish swimming in a certain direction. In our system, the fyke catches questions that are **irrelevant to the Heritage Custodian Ontology**.

```
          User Question
                |
                v
      +------------------+
      | Relevance Filter |  <-- Is this about heritage institutions?
      +------------------+
                |
            Relevant?
                |
        +-------+-------+
        | Yes           | No
        v               v
   Route to         +-------+
   SPARQL           | FYKE  |  <-- Catch irrelevant questions
                    +-------+
                        |
                        v
              Standard Response:
              "Deze vraag kan niet beantwoord worden
               door de ArchiefAssistent. De service
               bevat informatie over erfgoedinstellingen
               zoals archieven, musea en bibliotheken."
```

### Fyke Implementation

```python
"""Fyke: Filter for irrelevant questions."""

import dspy
from pydantic import BaseModel, Field


class RelevanceClassification(BaseModel):
    """Structured output for relevance classification."""

    is_relevant: bool = Field(
        description="Whether the question relates to heritage institutions"
    )
    confidence: float = Field(
        ge=0.0, le=1.0,
        description="Confidence in the classification",
    )
    reasoning: str = Field(
        description="Brief explanation of why the question is or isn't relevant"
    )
    detected_topics: list[str] = Field(
        description="Topics detected in the question"
    )


class HeritageRelevanceSignature(dspy.Signature):
    """Determine if a question is relevant to the Heritage Custodian Ontology.

    The Heritage Custodian Ontology covers:
    - Heritage institutions: museums, archives, libraries, galleries
    - Institution properties: location, founding date, type, collections
    - Staff and personnel at heritage institutions
    - Geographic distribution of heritage institutions
    - Relationships between institutions

    Questions about the following are OUT OF SCOPE and should be marked irrelevant:
    - Commercial products or shopping
    - Medical or health advice
    - Legal advice
    - Current news or politics (unless about heritage policy)
    - Personal relationships
    - Technical support for unrelated systems
    - General knowledge not related to heritage

    Be generous with relevance: if the question MIGHT relate to heritage institutions,
    mark it as relevant. Only flag clearly unrelated questions as irrelevant.
    """

    question: str = dspy.InputField(desc="User's question to classify")
    language: str = dspy.InputField(
        desc="Language of the question (nl, en, de, fr)",
        default="nl",
    )
    classification: RelevanceClassification = dspy.OutputField(
        desc="Structured relevance classification"
    )


class FykeFilter(dspy.Module):
    """Filter irrelevant questions before template matching.

    The fyke catches questions that cannot be answered by the
    Heritage Custodian Ontology, returning a polite standard response.
    """

    # Standard responses by language
    STANDARD_RESPONSES = {
        "nl": (
            "Deze vraag kan helaas niet beantwoord worden door de ArchiefAssistent. "
            "Deze service bevat informatie over erfgoedinstellingen in Nederland en "
            "daarbuiten, zoals archieven, musea, bibliotheken en galerieën. "
            "Stel gerust een vraag over deze instellingen!"
        ),
        "en": (
            "Unfortunately, this question cannot be answered by the ArchiefAssistent. "
            "This service contains information about heritage institutions in the "
            "Netherlands and beyond, such as archives, museums, libraries, and galleries. "
            "Feel free to ask a question about these institutions!"
        ),
        "de": (
            "Leider kann diese Frage vom ArchiefAssistent nicht beantwortet werden. "
            "Dieser Service enthält Informationen über Kulturerbe-Einrichtungen in den "
            "Niederlanden und darüber hinaus, wie Archive, Museen, Bibliotheken und Galerien. "
            "Stellen Sie gerne eine Frage zu diesen Einrichtungen!"
        ),
        "fr": (
            "Malheureusement, l'ArchiefAssistent ne peut pas répondre à cette question. "
            "Ce service contient des informations sur les institutions patrimoniales aux "
            "Pays-Bas et au-delà, telles que les archives, les musées, les bibliothèques "
            "et les galeries. N'hésitez pas à poser une question sur ces institutions !"
        ),
    }

    # Minimum confidence before an "irrelevant" verdict triggers the fyke
    IRRELEVANCE_THRESHOLD = 0.85

    def __init__(self, fast_lm: dspy.LM | None = None):
        """Initialize the fyke filter.

        Args:
            fast_lm: Optional fast LM for relevance classification.
                Recommended: gpt-4o-mini or similar for speed.
        """
        super().__init__()
        self.fast_lm = fast_lm
        # TypedPredictor parses the Pydantic output field. Recent DSPy releases
        # deprecate it; dspy.Predict accepts typed signatures directly.
        self.classifier = dspy.TypedPredictor(HeritageRelevanceSignature)

    def forward(
        self,
        question: str,
        language: str = "nl",
    ) -> dspy.Prediction:
        """Classify question relevance and optionally catch it in the fyke.

        Returns:
            Prediction with:
            - is_relevant: bool
            - caught_by_fyke: bool
            - fyke_response: str | None (if caught)
            - reasoning: str
            - confidence: float
            - detected_topics: list[str]
        """
        # Use the fast LM for this call only, if one is configured
        if self.fast_lm:
            with dspy.settings.context(lm=self.fast_lm):
                result = self.classifier(question=question, language=language)
        else:
            result = self.classifier(question=question, language=language)

        classification = result.classification

        # Catch only confident "irrelevant" verdicts
        caught = (
            not classification.is_relevant
            and classification.confidence >= self.IRRELEVANCE_THRESHOLD
        )

        # Pick the response in the user's language, falling back to English
        fyke_response = None
        if caught:
            fyke_response = self.STANDARD_RESPONSES.get(
                language,
                self.STANDARD_RESPONSES["en"],
            )

        return dspy.Prediction(
            is_relevant=classification.is_relevant,
            caught_by_fyke=caught,
            fyke_response=fyke_response,
            reasoning=classification.reasoning,
            confidence=classification.confidence,
            detected_topics=classification.detected_topics,
        )


# Example usage in the RAG pipeline
class HeritageRAGWithFyke(dspy.Module):
    """Heritage RAG with fyke pre-filter."""

    def __init__(self):
        super().__init__()
        self.fyke = FykeFilter()
        # HeritageQueryRouter and TemplateClassifier are defined elsewhere
        # in the pipeline.
        self.router = HeritageQueryRouter()
        self.template_classifier = TemplateClassifier()
        # ... other components

    async def answer(
        self,
        question: str,
        language: str = "nl",
    ) -> dspy.Prediction:
        """Answer a question with fyke pre-filtering."""
        # Step 1: Check relevance (fyke filter)
        relevance = self.fyke(question=question, language=language)

        if relevance.caught_by_fyke:
            # Question is irrelevant: return the standard response
            return dspy.Prediction(
                answer=relevance.fyke_response,
                caught_by_fyke=True,
                reasoning=relevance.reasoning,
                confidence=relevance.confidence,
            )

        # Step 2: Route the relevant question to templates
        routing = self.router(question=question, language=language)

        # ... continue with normal processing
```

### Fyke Examples

| Question | Language | Relevant? | Reasoning |
|----------|----------|-----------|-----------|
| "Welke archieven zijn er in Utrecht?" | nl | ✅ Yes | Asks about archives in a location |
| "Waar kan ik tandpasta met korting kopen?" | nl | ❌ No | Shopping query, not heritage |
| "What is the weather in Amsterdam?" | en | ❌ No | Weather query, not heritage |
| "Wie is de directeur van het Rijksmuseum?" | nl | ✅ Yes | Asks about museum staff |
| "How do I reset my password?" | en | ❌ No | Technical support, not heritage |
| "Welke musea hebben een Van Gogh collectie?" | nl | ✅ Yes | Asks about museum collections |
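
The table above also works as a small regression set. Below is a minimal sketch of scoring a classifier against it, assuming a hypothetical `classify(question, language)` callable that returns `True` for relevant questions. The keyword stub only stands in for the DSPy classifier so the example runs offline.

```python
from typing import Callable

# (question, language, expected_relevant) rows from the table above
LABELED = [
    ("Welke archieven zijn er in Utrecht?", "nl", True),
    ("Waar kan ik tandpasta met korting kopen?", "nl", False),
    ("What is the weather in Amsterdam?", "en", False),
    ("Wie is de directeur van het Rijksmuseum?", "nl", True),
    ("How do I reset my password?", "en", False),
    ("Welke musea hebben een Van Gogh collectie?", "nl", True),
]


def evaluate(classify: Callable[[str, str], bool]) -> dict[str, float]:
    """Accuracy, plus the share of relevant questions wrongly caught."""
    correct = 0
    wrongly_caught = 0
    relevant_total = sum(1 for _, _, rel in LABELED if rel)
    for question, lang, expected in LABELED:
        predicted = classify(question, lang)
        correct += predicted == expected
        if expected and not predicted:
            wrongly_caught += 1
    return {
        "accuracy": correct / len(LABELED),
        "false_catch_rate": wrongly_caught / relevant_total,
    }


def keyword_stub(question: str, language: str) -> bool:
    """Hypothetical stand-in for the LM-based relevance classifier."""
    keywords = ("archie", "muse", "bibliothe", "directeur")
    return any(k in question.lower() for k in keywords)


print(evaluate(keyword_stub))  # {'accuracy': 1.0, 'false_catch_rate': 0.0}
```

The false-catch rate deserves closer watching than overall accuracy. A relevant question caught by the fyke never reaches gap detection, so it silently hides a potential ontology gap.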

## Ontology Gap Detection

### Identifying Gaps Through Template Failure

When a relevant question cannot be mapped to any template, this signals a potential ontology gap:

```python
"""Detect ontology gaps through template matching failures."""

import re
from dataclasses import dataclass
from datetime import datetime

import dspy


@dataclass
class OntologyGapReport:
    """Report of a potential ontology gap."""

    question: str
    timestamp: datetime
    detected_intent: str
    detected_entities: list[str]
    required_classes: list[str]  # Classes that would be needed
    required_properties: list[str]  # Properties that would be needed
    existing_coverage: str  # What the ontology currently covers
    gap_description: str
    priority: str  # high, medium, low


class OntologyGapDetector:
    """Detect gaps in the Heritage Custodian Ontology.

    The helpers _infer_required_classes, _infer_required_properties,
    _describe_current_coverage, and _assess_priority are not shown here.
    """

    # Known ontology capabilities
    ONTOLOGY_COVERAGE = {
        "institution_identity": {
            "classes": ["crm:E39_Actor"],
            "properties": ["skos:prefLabel", "hc:institutionType", "schema:description"],
            "description": "Institution names, types, and descriptions",
        },
        "institution_location": {
            "classes": ["crm:E39_Actor"],
            "properties": ["schema:addressLocality", "schema:addressCountry", "schema:geo"],
            "description": "Institution geographic locations",
        },
        "institution_founding": {
            "classes": ["crm:E39_Actor"],
            "properties": ["schema:foundingDate", "schema:dissolutionDate"],
            "description": "Institution founding and closure dates",
        },
        "staff_current": {
            "classes": ["schema:Person"],
            "properties": ["schema:name", "schema:jobTitle"],
            "description": "Current staff members and roles",
        },
    }

    # Known gaps (to be expanded as gaps are discovered)
    KNOWN_GAPS = {
        "staff_history": {
            "description": "Historical employment records",
            "example_questions": [
                "Wie was de eerste directeur van het Rijksmuseum?",
                "Welke archivarissen hebben bij meerdere instellingen gewerkt?",
            ],
            "required_modeling": "Employment periods with start/end dates",
        },
        "collection_items": {
            "description": "Individual collection items",
            "example_questions": [
                "Welke musea hebben werken van Rembrandt?",
                "Waar kan ik de Nachtwacht zien?",
            ],
            "required_modeling": "Collection items linked to institutions",
        },
        "institutional_relationships": {
            "description": "Relationships between institutions",
            "example_questions": [
                "Welke instellingen zijn onderdeel van de Reinwardt Academie?",
                "Met welke musea werkt het Rijksmuseum samen?",
            ],
            "required_modeling": "Formal relationships (parent, partner, etc.)",
        },
    }

    def analyze_unmatched_question(
        self,
        question: str,
        routing: dspy.Prediction,
    ) -> OntologyGapReport | None:
        """Analyze why a relevant question couldn't be matched to a template.

        Args:
            question: The user's question
            routing: Routing prediction with intent, entities, etc.

        Returns:
            OntologyGapReport if a gap is detected, None otherwise
        """
        # Check whether this matches a known gap pattern
        for gap_id, gap_info in self.KNOWN_GAPS.items():
            for example in gap_info["example_questions"]:
                if self._is_similar_question(question, example):
                    return OntologyGapReport(
                        question=question,
                        timestamp=datetime.now(),
                        detected_intent=routing.intent,
                        detected_entities=routing.entities,
                        required_classes=self._infer_required_classes(gap_id),
                        required_properties=self._infer_required_properties(gap_id),
                        existing_coverage=self._describe_current_coverage(routing.intent),
                        gap_description=gap_info["description"],
                        priority=self._assess_priority(gap_id),
                    )

        # Unknown gap: log for manual review
        return OntologyGapReport(
            question=question,
            timestamp=datetime.now(),
            detected_intent=routing.intent,
            detected_entities=routing.entities,
            required_classes=["UNKNOWN"],
            required_properties=["UNKNOWN"],
            existing_coverage=self._describe_current_coverage(routing.intent),
            gap_description="Unknown gap - requires manual review",
            priority="medium",
        )

    def _is_similar_question(self, q1: str, q2: str) -> bool:
        """Check whether two questions share most of their terms.

        Simple lexical heuristic; embeddings would capture semantic similarity.
        """
        # Tokenize on word characters so punctuation such as "?" is ignored
        q1_terms = set(re.findall(r"\w+", q1.lower()))
        q2_terms = set(re.findall(r"\w+", q2.lower()))
        if not q1_terms or not q2_terms:
            return False

        # Key term overlap relative to the longer question
        overlap = len(q1_terms & q2_terms) / max(len(q1_terms), len(q2_terms))
        return overlap > 0.5
```
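
The term-overlap heuristic in `_is_similar_question` is coarse. The standalone function below computes the same score: shared terms divided by the larger term count, compared against 0.5. It tokenizes on word characters, which is an assumption that keeps a trailing `?` from splitting otherwise identical terms. The second pair shows a false positive driven by shared function words:

```python
import re


def term_overlap(q1: str, q2: str) -> float:
    """Shared terms divided by the larger term count of the two questions."""
    t1 = set(re.findall(r"\w+", q1.lower()))
    t2 = set(re.findall(r"\w+", q2.lower()))
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / max(len(t1), len(t2))


known = "Wie was de eerste directeur van het Rijksmuseum?"

# Related staff-history question: 7 shared terms out of max(8, 9) = 9
print(f"{term_overlap(known, 'Wie was de eerste directeur van het Van Gogh Museum?'):.3f}")  # 0.778

# Unrelated question: 5 shared terms out of max(8, 7) = 8, still above 0.5
print(f"{term_overlap(known, 'Wie was de eerste burgemeester van Amsterdam?'):.3f}")  # 0.625
```

Function words such as *wie*, *was*, *de* and *van* dominate the score. A stop-word list or embeddings, as the original code comment suggests, would be needed before relying on this for gap classification.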

### Gap Reporting Dashboard

```python
"""Generate ontology coverage reports."""

from collections import Counter, defaultdict

# CompetencyQuestion, CQStatus, and OntologyGapReport are defined in the
# code blocks above.


def generate_coverage_report(
    competency_questions: list[CompetencyQuestion],
    gap_reports: list[OntologyGapReport],
) -> dict:
    """Generate an ontology coverage report.

    Returns:
        Dict with coverage statistics and gap analysis
    """
    # Count CQ statuses
    status_counts = Counter(cq.status for cq in competency_questions)

    total_cqs = len(competency_questions)
    answerable = status_counts.get(CQStatus.ANSWERABLE, 0)
    partial = status_counts.get(CQStatus.PARTIAL, 0)
    out_of_scope = status_counts.get(CQStatus.OUT_OF_SCOPE, 0)

    # Coverage is measured over relevant CQs only: out-of-scope CQs are excluded
    relevant_cqs = total_cqs - out_of_scope
    coverage_pct = (
        (answerable + 0.5 * partial) / relevant_cqs * 100 if relevant_cqs > 0 else 0.0
    )

    # Group gaps by detected intent
    gaps_by_category: dict[str, list[OntologyGapReport]] = defaultdict(list)
    for gap in gap_reports:
        gaps_by_category[gap.detected_intent].append(gap)

    # Identify the most common gaps
    gap_descriptions = Counter(g.gap_description for g in gap_reports)

    return {
        "summary": {
            "total_competency_questions": total_cqs,
            "answerable": answerable,
            "partial": partial,
            "unanswerable": status_counts.get(CQStatus.UNANSWERABLE, 0),
            "out_of_scope": out_of_scope,
            "coverage_percentage": round(coverage_pct, 1),
        },
        "gaps_by_category": {
            cat: len(gaps) for cat, gaps in gaps_by_category.items()
        },
        "top_gaps": gap_descriptions.most_common(10),
        "recommendations": _generate_recommendations(gap_reports),
    }


def _generate_recommendations(gaps: list[OntologyGapReport]) -> list[str]:
    """Generate recommendations for ontology improvements."""
    recommendations = []

    # Flag gap types that recur in at least three unanswered questions
    gap_types = Counter(g.gap_description for g in gaps)
    for gap_type, count in gap_types.most_common(5):
        if count >= 3:
            recommendations.append(
                f"High priority: Add modeling for '{gap_type}' "
                f"({count} unanswered questions)"
            )

    return recommendations
```
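
Here is a worked example of the coverage formula. It uses the status counts of the eight entries in the example YAML registry: 3 answerable, 1 partial, 2 unanswerable and 2 out of scope. Following the Key Metrics definition, out-of-scope CQs are left out of the denominator. The second print shows how counting them would understate coverage.

```python
from collections import Counter

# Status counts taken from the example YAML registry
statuses = (
    ["answerable"] * 3 + ["partial"] + ["unanswerable"] * 2 + ["out_of_scope"] * 2
)
counts = Counter(statuses)
credited = counts["answerable"] + 0.5 * counts["partial"]  # 3 + 0.5 = 3.5

relevant = len(statuses) - counts["out_of_scope"]  # 8 - 2 = 6
coverage = credited / relevant * 100
print(f"{coverage:.1f}%")  # 58.3%

# Dividing by all eight CQs instead would understate coverage
print(credited / len(statuses) * 100)  # 43.75
```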

## Competency Question Registry

### YAML Format for CQ Tracking

```yaml
# data/competency_questions.yaml
version: "1.0.0"

categories:
  geographic:
    description: "Questions about institution locations"
    coverage: high
  statistical:
    description: "Questions about counts and distributions"
    coverage: high
  relational:
    description: "Questions about relationships between entities"
    coverage: low
  temporal:
    description: "Questions about historical changes"
    coverage: medium
  biographical:
    description: "Questions about people in the heritage sector"
    coverage: medium

competency_questions:
  # ANSWERABLE - Geographic
  - id: CQ-GEO-001
    question_nl: "Welke archieven zijn er in een bepaalde provincie?"
    question_en: "What archives exist in a given province?"
    category: geographic
    status: answerable
    template_id: region_institution_search
    ontology_classes: ["crm:E39_Actor"]
    ontology_properties: ["hc:institutionType", "schema:addressLocality"]

  - id: CQ-GEO-002
    question_nl: "Welke musea zijn er in een bepaalde stad?"
    question_en: "What museums exist in a given city?"
    category: geographic
    status: answerable
    template_id: city_institution_search
    ontology_classes: ["crm:E39_Actor"]
    ontology_properties: ["hc:institutionType", "schema:addressLocality"]

  # ANSWERABLE - Statistical
  - id: CQ-STAT-001
    question_nl: "Hoeveel musea zijn er in Nederland?"
    question_en: "How many museums are there in the Netherlands?"
    category: statistical
    status: answerable
    template_id: count_institutions_by_type
    ontology_classes: ["crm:E39_Actor"]
    ontology_properties: ["hc:institutionType", "schema:addressCountry"]

  # PARTIAL - Temporal
  - id: CQ-TEMP-001
    question_nl: "Wat is het oudste archief in Nederland?"
    question_en: "What is the oldest archive in the Netherlands?"
    category: temporal
    status: partial
    template_id: find_oldest_institution
    ontology_classes: ["crm:E39_Actor"]
    ontology_properties: ["schema:foundingDate"]
    notes: "Founding dates incomplete for many institutions"

  # UNANSWERABLE - Relational (GAP)
  - id: CQ-REL-001
    question_nl: "Welke instellingen hebben dezelfde directeur gehad?"
    question_en: "Which institutions have shared the same director?"
    category: relational
    status: unanswerable
    template_id: null
    ontology_classes: ["crm:E39_Actor", "schema:Person"]
    ontology_properties: ["schema:employee"]  # MISSING: employment history
    gap_description: "No modeling for historical employment relationships"
    recommended_fix: "Add employment periods with start/end dates"

  - id: CQ-REL-002
    question_nl: "Welke musea zijn onderdeel van een groter netwerk?"
    question_en: "Which museums are part of a larger network?"
    category: relational
    status: unanswerable
    template_id: null
    ontology_classes: ["crm:E39_Actor", "org:Organization"]
    ontology_properties: ["org:subOrganizationOf"]  # MISSING
    gap_description: "No modeling for organizational hierarchies"

  # OUT OF SCOPE - Fyke
  - id: CQ-OUT-001
    question_nl: "Waar kan ik tandpasta met korting kopen?"
    question_en: "Where can I buy toothpaste with a discount?"
    category: out_of_scope
    status: out_of_scope
    template_id: null
    ontology_classes: []
    ontology_properties: []
    notes: "Shopping query - route to fyke"

  - id: CQ-OUT-002
    question_nl: "Wat is het weer morgen in Amsterdam?"
    question_en: "What is the weather tomorrow in Amsterdam?"
    category: out_of_scope
    status: out_of_scope
    template_id: null
    ontology_classes: []
    ontology_properties: []
    notes: "Weather query - route to fyke"
```
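
Once parsed (for example with PyYAML's `yaml.safe_load`), the registry is plain dicts and lists. Its internal consistency can therefore be checked before it feeds the coverage report. This is a minimal sketch, and its rules are assumptions drawn from the fields used in this document. The example entries use `out_of_scope` as a category without declaring it under `categories`, so the sketch treats that category as implicitly allowed:

```python
VALID_STATUSES = {"answerable", "partial", "unanswerable", "out_of_scope"}
REQUIRED_KEYS = {"id", "question_nl", "question_en", "category", "status", "template_id"}


def validate_registry(data: dict) -> list[str]:
    """Return a list of problems found in a parsed CQ registry."""
    problems = []
    # Assumption: out_of_scope is allowed even though it is not declared
    known_categories = set(data.get("categories", {})) | {"out_of_scope"}
    for cq in data.get("competency_questions", []):
        cq_id = cq.get("id", "<no id>")
        missing = REQUIRED_KEYS - set(cq)
        if missing:
            problems.append(f"{cq_id}: missing keys {sorted(missing)}")
        if cq.get("status") not in VALID_STATUSES:
            problems.append(f"{cq_id}: unknown status {cq.get('status')!r}")
        if cq.get("category") not in known_categories:
            problems.append(f"{cq_id}: unknown category {cq.get('category')!r}")
        if cq.get("status") == "answerable" and cq.get("template_id") is None:
            problems.append(f"{cq_id}: answerable but no template_id")
    return problems


# Shape of yaml.safe_load output for a trimmed registry (questions abbreviated)
registry = {
    "categories": {"geographic": {}, "relational": {}},
    "competency_questions": [
        {"id": "CQ-GEO-001", "question_nl": "...", "question_en": "...",
         "category": "geographic", "status": "answerable",
         "template_id": "region_institution_search"},
        {"id": "CQ-REL-001", "question_nl": "...", "question_en": "...",
         "category": "relational", "status": "answerable", "template_id": None},
    ],
}
print(validate_registry(registry))
# ['CQ-REL-001: answerable but no template_id']
```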

## Integration with Template System

### Template-CQ Bidirectional Linking

```python
"""Link templates to competency questions bidirectionally."""

# SPARQLTemplate is the template type of the template system (not shown here);
# CompetencyQuestion and CQStatus are defined above.


def validate_template_cq_coverage(
    templates: dict[str, SPARQLTemplate],
    competency_questions: list[CompetencyQuestion],
) -> dict:
    """Validate that templates cover expected CQs and vice versa.

    Returns:
        Validation report with coverage analysis
    """
    # Templates that no CQ references
    templates_without_cq = []
    for template_id in templates:
        matching_cqs = [
            cq for cq in competency_questions
            if cq.template_id == template_id
        ]
        if not matching_cqs:
            templates_without_cq.append(template_id)

    # CQs marked answerable but without a template
    cqs_without_template = [
        cq for cq in competency_questions
        if cq.status == CQStatus.ANSWERABLE and cq.template_id is None
    ]

    # CQs that reference a template that doesn't exist
    orphaned_cqs = [
        cq for cq in competency_questions
        if cq.template_id and cq.template_id not in templates
    ]

    return {
        "templates_without_cq": templates_without_cq,
        "cqs_without_template": [cq.id for cq in cqs_without_template],
        "orphaned_cqs": [cq.id for cq in orphaned_cqs],
        "coverage_complete": (
            len(templates_without_cq) == 0
            and len(cqs_without_template) == 0
            and len(orphaned_cqs) == 0
        ),
    }
```

## Summary

The template-based SPARQL system doubles as a measure of ontology coverage:

| Aspect | Implementation | Benefit |
|--------|----------------|---------|
| **Competency Questions** | YAML registry with status tracking | Defines what ontology should answer |
| **Fyke Filter** | DSPy relevance classifier | Catches irrelevant questions early |
| **Gap Detection** | Analysis of unmatched questions | Identifies ontology improvements needed |
| **Coverage Reports** | Automated metrics generation | Tracks ontology completeness over time |
| **Bidirectional Linking** | Templates ↔ CQs validation | Ensures consistency |

### Key Metrics

- **Coverage %** = (Answerable CQs + 0.5 × Partial CQs) / Total Relevant CQs
- **Fyke Rate** = Out-of-scope questions / Total questions
- **Gap Rate** = Unanswerable relevant questions / Total relevant questions

Target: **>90% coverage** of relevant competency questions.
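
Fyke Rate and Gap Rate are computed over live traffic rather than over the CQ registry. Below is a minimal sketch, assuming each handled question is logged with one of three hypothetical outcome labels (`answered`, `fyke`, `gap`). The Gap Rate denominator excludes fyke-caught questions, as the definition above requires.

```python
from collections import Counter


def runtime_rates(outcomes: list[str]) -> dict[str, float]:
    """Fyke rate over all questions; gap rate over relevant questions only."""
    counts = Counter(outcomes)
    total = len(outcomes)
    relevant = total - counts["fyke"]
    return {
        "fyke_rate": counts["fyke"] / total if total else 0.0,
        "gap_rate": counts["gap"] / relevant if relevant else 0.0,
    }


# Hypothetical log: 100 questions, 20 caught by the fyke, 8 relevant but unanswerable
log = ["answered"] * 72 + ["fyke"] * 20 + ["gap"] * 8
print(runtime_rates(log))  # {'fyke_rate': 0.2, 'gap_rate': 0.1}
```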