Specificity Score System - Design Patterns
Overview
This document describes the software design patterns used in the specificity scoring system. The patterns ensure maintainability, testability, and extensibility.
INTEGRATION NOTE: The design patterns in this document build upon existing infrastructure in the codebase. Specifically:
- The existing TemplateClassifier at backend/rag/template_sparql.py:1104 handles question → SPARQL template classification
- The existing TemplateClassifierSignature at backend/rag/template_sparql.py:634 defines the DSPy signature
- New code should wrap the existing classifier using the Decorator Pattern (see Pattern 2 below)
- DO NOT recreate template classification logic - extend the existing implementation
Existing Infrastructure Reference
Before implementing new patterns, understand what already exists:
| Component | Location | Purpose |
|---|---|---|
| TemplateClassifier | backend/rag/template_sparql.py:1104 | DSPy module for question classification |
| TemplateClassifierSignature | backend/rag/template_sparql.py:634 | Input/output signature definition |
| SlotExtractor | backend/rag/template_sparql.py | Extracts slots (institution_type, location, etc.) |
| ConversationContextResolver | backend/rag/template_sparql.py:745 | Manages conversation context |
| sparql_templates.yaml | data/sparql_templates.yaml | Template definitions |
Key Insight: The Strategy Pattern and Decorator Pattern below should extend these existing components, not replace them.
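As a rough illustration of "extend, not replace", the sketch below wraps a classifier and delegates the template decision to it. Everything here is an assumption made for illustration: `StubTemplateClassifier`, its `classify` method, the `template_id` result key, the `SpecificityAwareClassifier` name, and the 0.5 threshold. The real TemplateClassifier interface in backend/rag/template_sparql.py may look different.

```python
from typing import Callable, Dict, List


class StubTemplateClassifier:
    """Hypothetical stand-in for the existing TemplateClassifier."""

    def classify(self, question: str) -> Dict[str, str]:
        # Assumption: the real classifier returns at least a template ID.
        template_id = "archive_search" if "archive" in question.lower() else "general"
        return {"template_id": template_id}


class SpecificityAwareClassifier:
    """Wraps an existing classifier and adds specificity-based class filtering."""

    def __init__(self, wrapped, score_lookup: Callable[[str, str], float],
                 threshold: float = 0.5):
        self._wrapped = wrapped
        self._score_lookup = score_lookup
        self._threshold = threshold

    def classify(self, question: str, candidate_classes: List[str]) -> Dict[str, object]:
        # Delegate template classification; do not reimplement it here.
        result: Dict[str, object] = dict(self._wrapped.classify(question))
        template_id = str(result["template_id"])
        scored = [(name, self._score_lookup(name, template_id))
                  for name in candidate_classes]
        kept = [pair for pair in scored if pair[1] >= self._threshold]
        # Highest-scoring classes first.
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result["relevant_classes"] = [name for name, _ in kept]
        return result


def demo_lookup(class_name: str, template_id: str) -> float:
    """Illustrative scores only; unknown pairs fall back to 0.5."""
    scores = {("Archive", "archive_search"): 0.9, ("Museum", "archive_search"): 0.2}
    return scores.get((class_name, template_id), 0.5)


classifier = SpecificityAwareClassifier(StubTemplateClassifier(), demo_lookup)
result = classifier.classify("Which archives hold parish records?",
                             ["Museum", "Archive", "Location"])
```

The wrapper only adds a `relevant_classes` entry to whatever the wrapped classifier returns, so the classification logic stays in one place.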
Pattern 1: Strategy Pattern for Score Calculation
Problem
Different conversation templates require different scoring logic. We need to calculate template-specific scores without hardcoding logic for each template.
Solution
Use the Strategy Pattern to encapsulate scoring algorithms for each template type.

    from abc import ABC, abstractmethod
    from typing import Dict

    class ScoringStrategy(ABC):
        """Abstract base class for template-specific scoring strategies."""

        @abstractmethod
        def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
            """Calculate specificity score for a class in this template context."""

        @abstractmethod
        def get_template_id(self) -> str:
            """Return the template ID this strategy handles."""

    class ArchiveSearchStrategy(ScoringStrategy):
        """Scoring strategy for archive-related queries."""

        # Classes highly relevant to archive searches
        HIGH_RELEVANCE = {"Archive", "RecordSet", "Collection", "Fonds", "Series"}
        MEDIUM_RELEVANCE = {"HeritageCustodian", "Location", "GHCID", "Identifier"}
        LOW_RELEVANCE = {"Museum", "Library", "Gallery", "PersonProfile"}

        def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
            if class_name in self.HIGH_RELEVANCE:
                # 0.90 base, plus 0.05 if the class has archival-specific slots
                return 0.90 + (0.05 * self._has_archival_properties(class_metadata))
            elif class_name in self.MEDIUM_RELEVANCE:
                return 0.60
            elif class_name in self.LOW_RELEVANCE:
                return 0.20
            else:
                return 0.40  # Default moderate relevance

        def _has_archival_properties(self, metadata: Dict) -> int:
            """Return 1 if the class has archival-specific slots, else 0."""
            archival_props = {"record_type", "finding_aid", "extent", "date_range"}
            return 1 if any(p in metadata.get("slots", []) for p in archival_props) else 0

        def get_template_id(self) -> str:
            return "archive_search"

    class DefaultScoringStrategy(ScoringStrategy):
        """Fallback for templates without a registered strategy.

        Assumption: this class is referenced but not defined in the original
        design. Here it returns the same moderate default (0.40) that
        ArchiveSearchStrategy uses for unknown classes.
        """

        def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
            return 0.40

        def get_template_id(self) -> str:
            return "default"

    class ScoringStrategyFactory:
        """Factory for looking up scoring strategies by template ID."""

        _strategies: Dict[str, ScoringStrategy] = {}

        @classmethod
        def register(cls, strategy: ScoringStrategy) -> None:
            """Register a scoring strategy under its template ID."""
            cls._strategies[strategy.get_template_id()] = strategy

        @classmethod
        def get_strategy(cls, template_id: str) -> ScoringStrategy:
            """Get the scoring strategy for a template, or the default fallback."""
            if template_id not in cls._strategies:
                return DefaultScoringStrategy()
            return cls._strategies[template_id]

    # Register strategies at module load
    ScoringStrategyFactory.register(ArchiveSearchStrategy())
    # Further strategies register the same way once they are defined:
    # ScoringStrategyFactory.register(MuseumSearchStrategy())
    # ScoringStrategyFactory.register(LocationBrowseStrategy())
    # ... register all strategies

Benefits
- Open/Closed Principle: Add new templates without modifying existing code
- Single Responsibility: Each strategy handles one template's logic
- Testability: Test each strategy in isolation
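To make the Open/Closed point concrete, here is a minimal sketch of adding a second strategy without touching the first. The `museum_search` template ID appears elsewhere in this document; the relevance sets and tier values are assumptions chosen for illustration, and a plain dict stands in for the factory so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ScoringStrategy(ABC):
    """Same interface as in the listing above, repeated so the sketch is runnable."""

    @abstractmethod
    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        ...

    @abstractmethod
    def get_template_id(self) -> str:
        ...


class MuseumSearchStrategy(ScoringStrategy):
    """Hypothetical strategy for museum-related queries."""

    # Assumption: illustrative relevance sets, not taken from the real schema.
    HIGH_RELEVANCE = {"Museum", "Gallery"}
    LOW_RELEVANCE = {"Archive"}

    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        if class_name in self.HIGH_RELEVANCE:
            return 0.90
        if class_name in self.LOW_RELEVANCE:
            return 0.20
        return 0.40  # same moderate default as ArchiveSearchStrategy

    def get_template_id(self) -> str:
        return "museum_search"


# Registration is the only integration step; no existing strategy is edited.
registry: Dict[str, ScoringStrategy] = {}
strategy = MuseumSearchStrategy()
registry[strategy.get_template_id()] = strategy

museum_score = registry["museum_search"].calculate_score("Museum", {})
archive_score = registry["museum_search"].calculate_score("Archive", {})
```

Because each strategy is a small class with no shared state, it can be instantiated and checked directly in a unit test, as the last two lines do.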
Pattern 2: Decorator Pattern for Score Modifiers
Problem
Scores may need adjustment based on multiple factors:
- Custodian type annotations (GLAMORCUBESFIXPHDNT)
- Inheritance depth in class hierarchy
- Slot count (more complex classes may be more specific)
Solution
Use the Decorator Pattern to layer score modifications.

    from abc import ABC, abstractmethod

    class ScoreCalculator(ABC):
        """Base interface for score calculation."""

        @abstractmethod
        def calculate(self, class_name: str, template_id: str) -> float:
            pass

    class BaseScoreCalculator(ScoreCalculator):
        """Base calculator using stored annotation scores."""

        def __init__(self, schema_loader):
            self.schema_loader = schema_loader

        def calculate(self, class_name: str, template_id: str) -> float:
            class_def = self.schema_loader.get_class(class_name)
            annotations = class_def.get("annotations", {})
            # Try template-specific score first
            template_scores = annotations.get("template_specificity", {})
            if template_id in template_scores:
                return template_scores[template_id]
            # Fall back to general score
            return annotations.get("specificity_score", 0.5)

    class ScoreDecorator(ScoreCalculator):
        """Base decorator class."""

        def __init__(self, wrapped: ScoreCalculator):
            self._wrapped = wrapped

        def calculate(self, class_name: str, template_id: str) -> float:
            return self._wrapped.calculate(class_name, template_id)

    class CustodianTypeBoostDecorator(ScoreDecorator):
        """Boost scores for classes matching custodian type context."""

        def __init__(self, wrapped: ScoreCalculator, schema_loader, custodian_type: str):
            super().__init__(wrapped)
            # The loader is needed to read the class's custodian_types annotation
            self.schema_loader = schema_loader
            self.custodian_type = custodian_type

        def calculate(self, class_name: str, template_id: str) -> float:
            base_score = self._wrapped.calculate(class_name, template_id)
            # Check if class has matching custodian_types annotation
            class_def = self.schema_loader.get_class(class_name)
            custodian_types = class_def.get("annotations", {}).get("custodian_types", [])
            if self.custodian_type in custodian_types or "*" in custodian_types:
                return min(1.0, base_score + 0.15)  # Boost by 0.15, cap at 1.0
            return base_score

    class InheritanceDepthDecorator(ScoreDecorator):
        """Adjust scores based on class hierarchy depth."""

        def __init__(self, wrapped: ScoreCalculator, schema_loader):
            super().__init__(wrapped)
            self.schema_loader = schema_loader

        def calculate(self, class_name: str, template_id: str) -> float:
            base_score = self._wrapped.calculate(class_name, template_id)
            depth = self._get_inheritance_depth(class_name)
            # Deeper classes are more specific (higher score)
            # Depth 0 (root) = no change, +0.03 per level, capped at +0.10 (depth 4+)
            depth_boost = min(0.10, depth * 0.03)
            return min(1.0, base_score + depth_boost)

        def _get_inheritance_depth(self, class_name: str) -> int:
            """Calculate inheritance depth from root class."""
            depth = 0
            current = class_name
            seen = {current}  # guard against cyclic is_a chains
            while True:
                class_def = self.schema_loader.get_class(current)
                parent = class_def.get("is_a")
                if not parent or parent in seen:
                    break
                seen.add(parent)
                depth += 1
                current = parent
            return depth

    # Usage: Compose decorators
    # schema_loader is any object exposing get_class(name) -> dict
    calculator = BaseScoreCalculator(schema_loader)
    calculator = CustodianTypeBoostDecorator(calculator, schema_loader, custodian_type="A")  # Archive context
    calculator = InheritanceDepthDecorator(calculator, schema_loader)
    score = calculator.calculate("ArchivalFonds", "archive_search")

Benefits
- Flexible composition: Mix and match score modifiers
- Separation of concerns: Each decorator handles one modification
- Runtime configuration: Add/remove decorators based on context
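The runtime-configuration benefit can be sketched as a small builder that wraps a base calculator with whichever modifiers are enabled for a request. This is a simplified illustration, not the project's implementation: `StaticScoreCalculator`, `ConstantBoostDecorator`, the `MODIFIERS` names, and `build_calculator` are all hypothetical, and the constant boosts stand in for the annotation-driven logic described above.

```python
from typing import Callable, Dict, List


class StaticScoreCalculator:
    """Stand-in for BaseScoreCalculator: returns fixed scores from a dict."""

    def __init__(self, scores: Dict[str, float]):
        self._scores = scores

    def calculate(self, class_name: str, template_id: str) -> float:
        return self._scores.get(class_name, 0.5)


class ConstantBoostDecorator:
    """Decorator that adds a fixed boost to the wrapped result, capped at 1.0."""

    def __init__(self, wrapped, boost: float):
        self._wrapped = wrapped
        self._boost = boost

    def calculate(self, class_name: str, template_id: str) -> float:
        return min(1.0, self._wrapped.calculate(class_name, template_id) + self._boost)


# Hypothetical modifier names mapped to functions that wrap a calculator.
MODIFIERS: Dict[str, Callable] = {
    "custodian_boost": lambda calc: ConstantBoostDecorator(calc, 0.15),
    "depth_boost": lambda calc: ConstantBoostDecorator(calc, 0.06),
}


def build_calculator(base, enabled: List[str]):
    """Wrap `base` with each enabled modifier, in the order given."""
    calculator = base
    for name in enabled:
        calculator = MODIFIERS[name](calculator)
    return calculator


base = StaticScoreCalculator({"ArchivalFonds": 0.75})
plain = build_calculator(base, [])
boosted = build_calculator(base, ["custodian_boost", "depth_boost"])

plain_score = plain.calculate("ArchivalFonds", "archive_search")
boosted_score = boosted.calculate("ArchivalFonds", "archive_search")
```

With a base score of 0.75, the two boosts add 0.15 and 0.06 (the latter matching a class two levels below the root at 0.03 per level), giving roughly 0.96. Each layer applies its own cap at 1.0.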
Pattern 3: Observer Pattern for Score Updates
Problem
When scores are updated (manually or via feedback), multiple components need notification:
- RAG pipeline cache
- UML visualization
- Exported JSON files
- Validation reports
Solution
Use the Observer Pattern to notify interested components of score changes.

    from abc import ABC, abstractmethod
    from typing import Dict, List, Optional

    class ScoreObserver(ABC):
        """Interface for components that react to score changes."""

        @abstractmethod
        def on_score_update(self, class_name: str, old_score: float, new_score: float,
                            template_id: Optional[str] = None):
            """Called when a class's specificity score is updated."""

    class ScoreSubject:
        """Manages score data and notifies observers of changes."""

        def __init__(self):
            self._observers: List[ScoreObserver] = []
            self._scores: Dict[str, Dict] = {}  # class_name -> {general: float, templates: {}}

        def attach(self, observer: ScoreObserver):
            self._observers.append(observer)

        def detach(self, observer: ScoreObserver):
            self._observers.remove(observer)

        def notify(self, class_name: str, old_score: float, new_score: float,
                   template_id: Optional[str] = None):
            for observer in self._observers:
                observer.on_score_update(class_name, old_score, new_score, template_id)

        def update_score(self, class_name: str, new_score: float,
                         template_id: Optional[str] = None):
            """Update a score and notify observers."""
            if class_name not in self._scores:
                self._scores[class_name] = {"general": 0.5, "templates": {}}
            if template_id:
                old_score = self._scores[class_name]["templates"].get(template_id, 0.5)
                self._scores[class_name]["templates"][template_id] = new_score
            else:
                old_score = self._scores[class_name]["general"]
                self._scores[class_name]["general"] = new_score
            self.notify(class_name, old_score, new_score, template_id)

    # Concrete observers
    class RAGCacheInvalidator(ScoreObserver):
        """Invalidates RAG cache when scores change."""

        def __init__(self, cache_manager):
            self.cache_manager = cache_manager

        def on_score_update(self, class_name: str, old_score: float, new_score: float,
                            template_id: Optional[str] = None):
            # Invalidate cached class rankings for affected template
            if template_id:
                self.cache_manager.invalidate(f"class_rankings_{template_id}")
            else:
                # A general score change affects the rankings of every template
                self.cache_manager.invalidate_all("class_rankings_*")

    class UMLVisualizationUpdater(ScoreObserver):
        """Triggers UML refresh when scores change."""

        def __init__(self, websocket_manager):
            self.ws_manager = websocket_manager

        def on_score_update(self, class_name: str, old_score: float, new_score: float,
                            template_id: Optional[str] = None):
            # Push update to connected frontend clients
            self.ws_manager.broadcast({
                "type": "score_update",
                "class": class_name,
                "old_score": old_score,
                "new_score": new_score,
                "template": template_id,
            })

    class ScoreChangeLogger(ScoreObserver):
        """Logs score changes for audit trail."""

        def __init__(self, logger):
            self.logger = logger

        def on_score_update(self, class_name: str, old_score: float, new_score: float,
                            template_id: Optional[str] = None):
            self.logger.info(
                f"Score updated: {class_name} "
                f"[{template_id or 'general'}] {old_score:.2f} -> {new_score:.2f}"
            )

    # Usage (cache_manager, websocket_manager and logger are provided by the application)
    score_subject = ScoreSubject()
    score_subject.attach(RAGCacheInvalidator(cache_manager))
    score_subject.attach(UMLVisualizationUpdater(websocket_manager))
    score_subject.attach(ScoreChangeLogger(logger))
    # When a score is updated, all observers are notified
    score_subject.update_score("Archive", 0.80, template_id="collection_discovery")

Benefits
- Loose coupling: Score management doesn't know about consumers
- Extensibility: Add new observers without modifying core logic
- Consistency: All components stay synchronized
Pattern 4: Repository Pattern for Score Persistence
Problem
Scores are stored as LinkML annotations in YAML files. We need a clean abstraction for reading/writing scores without coupling to file format.
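For orientation, the sketch below shows the shape a class file might have once parsed, and how a score lookup walks it. The annotation keys (`specificity_score`, `template_specificity`, `custodian_types`) are the ones used by the code in this document; the concrete values and the `lookup_score` helper are illustrative assumptions, and real class files may carry more fields.

```python
from typing import Any, Dict, Optional

# Illustrative result of parsing a class file such as Archive.yaml.
# Assumption: the values below are made up for this example.
archive_file: Dict[str, Any] = {
    "classes": {
        "Archive": {
            "annotations": {
                "specificity_score": 0.7,
                "template_specificity": {"archive_search": 0.9},
                "custodian_types": ["A"],
            },
        },
    },
}


def lookup_score(data: Dict[str, Any], class_name: str,
                 template_id: Optional[str] = None) -> float:
    """Read a score from parsed class data, defaulting to 0.5 when absent."""
    annotations = data.get("classes", {}).get(class_name, {}).get("annotations", {})
    if template_id is not None:
        return annotations.get("template_specificity", {}).get(template_id, 0.5)
    return annotations.get("specificity_score", 0.5)


template_score = lookup_score(archive_file, "Archive", "archive_search")
general_score = lookup_score(archive_file, "Archive")
missing_score = lookup_score(archive_file, "Archive", "museum_search")
```

Note the design choice in this sketch: a missing template-specific score yields the 0.5 default rather than the class's general score. Whichever fallback the project prefers, it is worth applying the same rule in every component that reads scores.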
Solution
Use the Repository Pattern to abstract score persistence.

    from abc import ABC, abstractmethod
    from pathlib import Path
    from typing import Dict, Optional

    import yaml  # PyYAML

    class ScoreRepository(ABC):
        """Abstract repository for specificity scores."""

        @abstractmethod
        def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
            """Get specificity score for a class."""

        @abstractmethod
        def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
            """Set specificity score for a class."""

        @abstractmethod
        def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
            """Get all class scores for a template (or general scores)."""

        @abstractmethod
        def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
            """Update multiple scores at once."""

    class LinkMLScoreRepository(ScoreRepository):
        """Repository that reads/writes scores from LinkML YAML files."""

        def __init__(self, schema_dir: Path):
            self.schema_dir = schema_dir
            self.classes_dir = schema_dir / "modules" / "classes"
            self._cache: Dict[str, Dict] = {}

        def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
            class_data = self._load_class(class_name)
            annotations = class_data.get("annotations", {})
            if template_id:
                return annotations.get("template_specificity", {}).get(template_id, 0.5)
            return annotations.get("specificity_score", 0.5)

        def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
            class_data = self._load_class(class_name)
            if "annotations" not in class_data:
                class_data["annotations"] = {}
            if template_id:
                if "template_specificity" not in class_data["annotations"]:
                    class_data["annotations"]["template_specificity"] = {}
                class_data["annotations"]["template_specificity"][template_id] = score
            else:
                class_data["annotations"]["specificity_score"] = score
            self._save_class(class_name, class_data)

        def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
            scores = {}
            # Assumes one class per file, with the file named after the class
            for yaml_file in self.classes_dir.glob("*.yaml"):
                class_name = yaml_file.stem
                scores[class_name] = self.get_score(class_name, template_id)
            return scores

        def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
            for class_name, score in scores.items():
                self.set_score(class_name, score, template_id)

        def _load_class(self, class_name: str) -> Dict:
            if class_name in self._cache:
                return self._cache[class_name]
            yaml_path = self.classes_dir / f"{class_name}.yaml"
            if not yaml_path.exists():
                raise ValueError(f"Class file not found: {yaml_path}")
            with open(yaml_path, encoding="utf-8") as f:
                data = yaml.safe_load(f) or {}  # an empty file loads as None
            # Extract the class definition (may be nested under 'classes' key)
            if "classes" in data:
                class_data = data["classes"].get(class_name, {})
            else:
                class_data = data
            self._cache[class_name] = class_data
            return class_data

        def _save_class(self, class_name: str, class_data: Dict):
            yaml_path = self.classes_dir / f"{class_name}.yaml"
            # Preserve original file structure
            with open(yaml_path, encoding="utf-8") as f:
                original = yaml.safe_load(f) or {}
            if "classes" in original:
                original["classes"][class_name] = class_data
            else:
                original = class_data
            with open(yaml_path, "w", encoding="utf-8") as f:
                # sort_keys=False keeps the key order; YAML comments are still lost
                yaml.safe_dump(original, f, default_flow_style=False,
                               allow_unicode=True, sort_keys=False)
            # Update cache
            self._cache[class_name] = class_data

    class InMemoryScoreRepository(ScoreRepository):
        """In-memory repository for testing."""

        def __init__(self):
            self._scores: Dict[str, Dict] = {}

        def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
            if class_name not in self._scores:
                return 0.5
            if template_id:
                return self._scores[class_name].get("templates", {}).get(template_id, 0.5)
            return self._scores[class_name].get("general", 0.5)

        def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
            if class_name not in self._scores:
                self._scores[class_name] = {"general": 0.5, "templates": {}}
            if template_id:
                self._scores[class_name]["templates"][template_id] = score
            else:
                self._scores[class_name]["general"] = score

        def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
            return {
                name: self.get_score(name, template_id)
                for name in self._scores
            }

        def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
            for class_name, score in scores.items():
                self.set_score(class_name, score, template_id)

Benefits
- Abstraction: Business logic doesn't depend on file format
- Testability: Use in-memory repository for tests
- Flexibility: Easy to add new persistence backends (database, API, etc.)
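The abstraction and testability benefits can be illustrated with a piece of business logic that only sees the repository interface. In this sketch, `ScoreReader`, `FakeRepository`, and `top_classes` are hypothetical names; the fake plays the role that an in-memory repository would play in a test suite.

```python
from typing import Dict, List, Optional, Protocol


class ScoreReader(Protocol):
    """The only repository capability that the ranking logic depends on."""

    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        ...


class FakeRepository:
    """Dict-backed test double; no YAML files are touched."""

    def __init__(self, scores_by_template: Dict[str, Dict[str, float]]):
        self._scores = scores_by_template

    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        return dict(self._scores.get(template_id or "general", {}))


def top_classes(repo: ScoreReader, template_id: str, limit: int = 3) -> List[str]:
    """Return the highest-scoring class names for a template."""
    scores = repo.get_all_scores(template_id)
    # Sort by score descending, then by name so ties have a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


repo = FakeRepository(
    {"museum_search": {"Museum": 0.90, "Gallery": 0.85, "Archive": 0.30}}
)
best_two = top_classes(repo, "museum_search", limit=2)
```

Swapping the fake for a file-backed or database-backed repository would not require any change to `top_classes`.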
Pattern 5: Command Pattern for Score Updates
Problem
Score updates may need to be:
- Undone (revert accidental changes)
- Batched (apply multiple changes atomically)
- Audited (track who changed what)
Solution
Use the Command Pattern to encapsulate score changes as objects.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List, Optional, Union

    @dataclass
    class ScoreChange:
        """Represents a single score change."""
        class_name: str
        old_score: float
        new_score: float
        template_id: Optional[str]
        timestamp: datetime
        author: str
        rationale: str

    class ScoreCommand(ABC):
        """Abstract command for score operations."""

        @abstractmethod
        def execute(self) -> Union[ScoreChange, List[ScoreChange]]:
            """Execute the command and return the change record(s)."""

        @abstractmethod
        def undo(self):
            """Undo the command."""

    class UpdateScoreCommand(ScoreCommand):
        """Command to update a single score."""

        def __init__(
            self,
            repository: ScoreRepository,  # see Pattern 4
            class_name: str,
            new_score: float,
            template_id: Optional[str] = None,
            author: str = "system",
            rationale: str = ""
        ):
            self.repository = repository
            self.class_name = class_name
            self.new_score = new_score
            self.template_id = template_id
            self.author = author
            self.rationale = rationale
            self._old_score: Optional[float] = None

        def execute(self) -> ScoreChange:
            # Remember the previous value so the change can be undone
            self._old_score = self.repository.get_score(self.class_name, self.template_id)
            self.repository.set_score(self.class_name, self.new_score, self.template_id)
            return ScoreChange(
                class_name=self.class_name,
                old_score=self._old_score,
                new_score=self.new_score,
                template_id=self.template_id,
                timestamp=datetime.now(),
                author=self.author,
                rationale=self.rationale
            )

        def undo(self):
            if self._old_score is not None:
                self.repository.set_score(self.class_name, self._old_score, self.template_id)

    class BatchScoreCommand(ScoreCommand):
        """Command to update multiple scores atomically."""

        def __init__(self, commands: List[UpdateScoreCommand]):
            self.commands = commands
            self._executed: List[UpdateScoreCommand] = []

        def execute(self) -> List[ScoreChange]:
            changes = []
            try:
                for cmd in self.commands:
                    change = cmd.execute()
                    changes.append(change)
                    self._executed.append(cmd)
            except Exception:
                # Roll back the commands that already ran, then re-raise
                self.undo()
                raise
            return changes

        def undo(self):
            # Undo in reverse order of execution
            for cmd in reversed(self._executed):
                cmd.undo()
            self._executed.clear()

    class ScoreCommandInvoker:
        """Manages command execution and history."""

        def __init__(self):
            self._history: List[ScoreCommand] = []
            self._redo_stack: List[ScoreCommand] = []

        def execute(self, command: ScoreCommand):
            result = command.execute()
            self._history.append(command)
            self._redo_stack.clear()  # a new command invalidates pending redos
            return result

        def undo(self):
            if not self._history:
                return
            command = self._history.pop()
            command.undo()
            self._redo_stack.append(command)

        def redo(self):
            if not self._redo_stack:
                return
            command = self._redo_stack.pop()
            command.execute()
            self._history.append(command)

        def get_history(self) -> List[ScoreCommand]:
            return self._history.copy()

    # Usage (repo is any ScoreRepository, e.g. InMemoryScoreRepository())
    invoker = ScoreCommandInvoker()
    # Single update
    cmd = UpdateScoreCommand(
        repository=repo,
        class_name="Archive",
        new_score=0.85,
        template_id="archive_search",
        author="kempersc",
        rationale="Increased based on user feedback"
    )
    change = invoker.execute(cmd)
    # Batch update
    batch = BatchScoreCommand([
        UpdateScoreCommand(repo, "Museum", 0.90, "museum_search"),
        UpdateScoreCommand(repo, "Gallery", 0.85, "museum_search"),
        UpdateScoreCommand(repo, "Archive", 0.30, "museum_search"),
    ])
    changes = invoker.execute(batch)
    # Undo last change
    invoker.undo()

Benefits
- Undo/Redo: Easy to revert changes
- Audit trail: Track all changes with metadata
- Atomicity: Batch changes succeed or fail together
- Testability: Test commands in isolation
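One way the audit-trail benefit could be realised is to serialise each change record as one JSON line. The sketch below repeats the `ScoreChange` fields from this document so it runs on its own; `to_audit_line`, the JSON Lines format, and the sample values (including the old score of 0.5 and the timestamp) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ScoreChange:
    """Same fields as the change record defined above."""
    class_name: str
    old_score: float
    new_score: float
    template_id: Optional[str]
    timestamp: datetime
    author: str
    rationale: str


def to_audit_line(change: ScoreChange) -> str:
    """Serialise a change as a single JSON line suitable for an append-only log."""
    record = asdict(change)
    # datetime is not JSON-serialisable, so store it as an ISO 8601 string.
    record["timestamp"] = change.timestamp.isoformat()
    return json.dumps(record, sort_keys=True)


change = ScoreChange(
    class_name="Archive",
    old_score=0.5,
    new_score=0.85,
    template_id="archive_search",
    timestamp=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),  # sample value
    author="kempersc",
    rationale="Increased based on user feedback",
)
audit_line = to_audit_line(change)
```

Using a timezone-aware timestamp, as in the sample, keeps log entries comparable across machines; the command code above uses `datetime.now()`, which returns local time without a timezone.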
Pattern Summary
| Pattern | Purpose | Key Benefit |
|---|---|---|
| Strategy | Different scoring algorithms per template | Open/Closed Principle |
| Decorator | Layer score modifications | Flexible composition |
| Observer | Notify components of changes | Loose coupling |
| Repository | Abstract score persistence | Testability |
| Command | Encapsulate score updates | Undo/Redo, Audit |
Implementation Priority
- Repository Pattern - Foundation for score storage (Week 1)
- Strategy Pattern - Template-specific scoring (Week 1)
- Command Pattern - Score updates with audit (Week 2)
- Observer Pattern - Cross-component updates (Week 2)
- Decorator Pattern - Score modifiers (Week 3)
References
- Gang of Four: Design Patterns
- Martin Fowler: Patterns of Enterprise Application Architecture
- LinkML Documentation: Annotations
- Project: docs/plan/prompt-query_template_mapping/design-patterns.md