# Specificity Score System - Design Patterns

## Overview

This document describes the software design patterns used in the specificity scoring system. The patterns ensure maintainability, testability, and extensibility.

> **INTEGRATION NOTE**: The design patterns in this document build upon **existing infrastructure** in the codebase. Specifically:
>
> - The **existing** `TemplateClassifier` at `backend/rag/template_sparql.py:1104` handles question → SPARQL template classification
> - The **existing** `TemplateClassifierSignature` at `backend/rag/template_sparql.py:634` defines the DSPy signature
> - New code should **wrap** the existing classifier using the **Decorator Pattern** (see Pattern 2 below)
> - **DO NOT** recreate template classification logic - extend the existing implementation

---

## Existing Infrastructure Reference

Before implementing new patterns, understand what already exists:

| Component | Location | Purpose |
|-----------|----------|---------|
| `TemplateClassifier` | `backend/rag/template_sparql.py:1104` | DSPy module for question classification |
| `TemplateClassifierSignature` | `backend/rag/template_sparql.py:634` | Input/output signature definition |
| `SlotExtractor` | `backend/rag/template_sparql.py` | Extracts slots (institution_type, location, etc.) |
| `ConversationContextResolver` | `backend/rag/template_sparql.py:745` | Manages conversation context |
| `sparql_templates.yaml` | `data/sparql_templates.yaml` | Template definitions |

**Key Insight**: The Strategy Pattern and Decorator Pattern below should **extend** these existing components, not replace them.

---

## Pattern 1: Strategy Pattern for Score Calculation

### Problem

Different conversation templates require different scoring logic. We need to calculate template-specific scores without hardcoding logic for each template.

### Solution

Use the **Strategy Pattern** to encapsulate scoring algorithms for each template type.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ScoringStrategy(ABC):
    """Abstract base class for template-specific scoring strategies."""

    @abstractmethod
    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        """Calculate specificity score for a class in this template context."""
        pass

    @abstractmethod
    def get_template_id(self) -> str:
        """Return the template ID this strategy handles."""
        pass


class ArchiveSearchStrategy(ScoringStrategy):
    """Scoring strategy for archive-related queries."""

    # Classes highly relevant to archive searches
    HIGH_RELEVANCE = {"Archive", "RecordSet", "Collection", "Fonds", "Series"}
    MEDIUM_RELEVANCE = {"HeritageCustodian", "Location", "GHCID", "Identifier"}
    LOW_RELEVANCE = {"Museum", "Library", "Gallery", "PersonProfile"}

    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        if class_name in self.HIGH_RELEVANCE:
            return 0.90 + (0.05 * self._has_archival_properties(class_metadata))
        elif class_name in self.MEDIUM_RELEVANCE:
            return 0.60
        elif class_name in self.LOW_RELEVANCE:
            return 0.20
        else:
            return 0.40  # Default moderate relevance

    def _has_archival_properties(self, metadata: Dict) -> int:
        """Return 1 if the class has archival-specific slots (boosts the score), else 0."""
        archival_props = {"record_type", "finding_aid", "extent", "date_range"}
        return 1 if any(p in metadata.get("slots", []) for p in archival_props) else 0

    def get_template_id(self) -> str:
        return "archive_search"


class DefaultScoringStrategy(ScoringStrategy):
    """Fallback for templates without a registered strategy.

    Assumption: the fallback returns the same moderate default (0.40)
    that ArchiveSearchStrategy uses for unlisted classes.
    """

    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        return 0.40

    def get_template_id(self) -> str:
        return "default"


class ScoringStrategyFactory:
    """Factory for looking up scoring strategies by template ID."""

    _strategies: Dict[str, ScoringStrategy] = {}

    @classmethod
    def register(cls, strategy: ScoringStrategy):
        """Register a scoring strategy under its template ID."""
        cls._strategies[strategy.get_template_id()] = strategy

    @classmethod
    def get_strategy(cls, template_id: str) -> ScoringStrategy:
        """Get the scoring strategy for a template, or the default fallback."""
        if template_id not in cls._strategies:
            return DefaultScoringStrategy()
        return cls._strategies[template_id]


# Register strategies at module load
ScoringStrategyFactory.register(ArchiveSearchStrategy())
# MuseumSearchStrategy, LocationBrowseStrategy, and the remaining strategies
# are not shown here; they follow the same shape and are registered the same way:
# ScoringStrategyFactory.register(MuseumSearchStrategy())
# ScoringStrategyFactory.register(LocationBrowseStrategy())
```

### Benefits

- **Open/Closed Principle**: Add new templates without modifying existing code
- **Single Responsibility**: Each strategy handles one template's logic
- **Testability**: Test each strategy in isolation

---

## Pattern 2: Decorator Pattern for Score Modifiers

### Problem

Scores may need adjustment based on multiple factors:

- Custodian type annotations (GLAMORCUBESFIXPHDNT)
- Inheritance depth in class hierarchy
- Slot count (more complex classes may be more specific)

### Solution

Use the **Decorator Pattern** to layer score modifications.

```python
from abc import ABC, abstractmethod


class ScoreCalculator(ABC):
    """Base interface for score calculation."""

    @abstractmethod
    def calculate(self, class_name: str, template_id: str) -> float:
        pass


class BaseScoreCalculator(ScoreCalculator):
    """Base calculator using stored annotation scores."""

    def __init__(self, schema_loader):
        self.schema_loader = schema_loader

    def calculate(self, class_name: str, template_id: str) -> float:
        class_def = self.schema_loader.get_class(class_name)
        annotations = class_def.get("annotations", {})

        # Try template-specific score first
        template_scores = annotations.get("template_specificity", {})
        if template_id in template_scores:
            return template_scores[template_id]

        # Fall back to general score
        return annotations.get("specificity_score", 0.5)


class ScoreDecorator(ScoreCalculator):
    """Base decorator class; delegates to the wrapped calculator."""

    def __init__(self, wrapped: ScoreCalculator):
        self._wrapped = wrapped

    def calculate(self, class_name: str, template_id: str) -> float:
        return self._wrapped.calculate(class_name, template_id)


class CustodianTypeBoostDecorator(ScoreDecorator):
    """Boost scores for classes matching custodian type context."""

    def __init__(self, wrapped: ScoreCalculator, schema_loader, custodian_type: str):
        super().__init__(wrapped)
        # The decorator reads class annotations itself, so it needs the loader too
        self.schema_loader = schema_loader
        self.custodian_type = custodian_type

    def calculate(self, class_name: str, template_id: str) -> float:
        base_score = self._wrapped.calculate(class_name, template_id)

        # Check if class has matching custodian_types annotation
        class_def = self.schema_loader.get_class(class_name)
        custodian_types = class_def.get("annotations", {}).get("custodian_types", [])

        if self.custodian_type in custodian_types or "*" in custodian_types:
            return min(1.0, base_score + 0.15)  # Boost by 0.15, cap at 1.0
        return base_score


class InheritanceDepthDecorator(ScoreDecorator):
    """Adjust scores based on class hierarchy depth."""

    def __init__(self, wrapped: ScoreCalculator, schema_loader):
        super().__init__(wrapped)
        self.schema_loader = schema_loader

    def calculate(self, class_name: str, template_id: str) -> float:
        base_score = self._wrapped.calculate(class_name, template_id)
        depth = self._get_inheritance_depth(class_name)

        # Deeper classes are more specific (higher score):
        # +0.03 per level, capped at +0.10 (the cap is reached at depth 4)
        depth_boost = min(0.10, depth * 0.03)
        return min(1.0, base_score + depth_boost)

    def _get_inheritance_depth(self, class_name: str) -> int:
        """Count `is_a` links from this class up to its root class."""
        depth = 0
        current = class_name
        while True:
            class_def = self.schema_loader.get_class(current)
            parent = class_def.get("is_a")
            if not parent:
                break
            depth += 1
            current = parent
        return depth


# Usage: Compose decorators
# schema_loader is any object exposing get_class(name) -> dict
calculator = BaseScoreCalculator(schema_loader)
calculator = CustodianTypeBoostDecorator(calculator, schema_loader, custodian_type="A")  # Archive context
calculator = InheritanceDepthDecorator(calculator, schema_loader)

score = calculator.calculate("ArchivalFonds", "archive_search")
```

### Benefits

- **Flexible composition**: Mix and match score modifiers
- **Separation of concerns**: Each decorator handles one modification
- **Runtime configuration**: Add/remove decorators based on context

---

## Pattern 3: Observer Pattern for Score Updates

### Problem

When
scores are updated (manually or via feedback), multiple components need notification:

- RAG pipeline cache
- UML visualization
- Exported JSON files
- Validation reports

### Solution

Use the **Observer Pattern** to notify interested components of score changes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ScoreObserver(ABC):
    """Interface for components that react to score changes."""

    @abstractmethod
    def on_score_update(
        self,
        class_name: str,
        old_score: float,
        new_score: float,
        template_id: Optional[str] = None,
    ):
        """Called when a class's specificity score is updated."""
        pass


class ScoreSubject:
    """Manages score data and notifies observers of changes."""

    def __init__(self):
        self._observers: List[ScoreObserver] = []
        # class_name -> {"general": float, "templates": {template_id: float}}
        self._scores: Dict[str, Dict] = {}

    def attach(self, observer: ScoreObserver):
        self._observers.append(observer)

    def detach(self, observer: ScoreObserver):
        self._observers.remove(observer)

    def notify(
        self,
        class_name: str,
        old_score: float,
        new_score: float,
        template_id: Optional[str] = None,
    ):
        for observer in self._observers:
            observer.on_score_update(class_name, old_score, new_score, template_id)

    def update_score(self, class_name: str, new_score: float, template_id: Optional[str] = None):
        """Update a score and notify observers."""
        if class_name not in self._scores:
            self._scores[class_name] = {"general": 0.5, "templates": {}}

        if template_id:
            old_score = self._scores[class_name]["templates"].get(template_id, 0.5)
            self._scores[class_name]["templates"][template_id] = new_score
        else:
            old_score = self._scores[class_name]["general"]
            self._scores[class_name]["general"] = new_score

        self.notify(class_name, old_score, new_score, template_id)


# Concrete observers
class RAGCacheInvalidator(ScoreObserver):
    """Invalidates RAG cache when scores change."""

    def __init__(self, cache_manager):
        self.cache_manager = cache_manager

    def on_score_update(
        self,
        class_name: str,
        old_score: float,
        new_score: float,
        template_id: Optional[str] = None,
    ):
        # Invalidate cached class rankings for the affected template,
        # or for all templates when a general score changes
        if template_id:
            self.cache_manager.invalidate(f"class_rankings_{template_id}")
        else:
            self.cache_manager.invalidate_all("class_rankings_*")


class UMLVisualizationUpdater(ScoreObserver):
    """Triggers UML refresh when scores change."""

    def __init__(self, websocket_manager):
        self.ws_manager = websocket_manager

    def on_score_update(
        self,
        class_name: str,
        old_score: float,
        new_score: float,
        template_id: Optional[str] = None,
    ):
        # Push update to connected frontend clients
        self.ws_manager.broadcast({
            "type": "score_update",
            "class": class_name,
            "old_score": old_score,
            "new_score": new_score,
            "template": template_id,
        })


class ScoreChangeLogger(ScoreObserver):
    """Logs score changes for audit trail."""

    def __init__(self, logger):
        self.logger = logger

    def on_score_update(
        self,
        class_name: str,
        old_score: float,
        new_score: float,
        template_id: Optional[str] = None,
    ):
        self.logger.info(
            f"Score updated: {class_name} "
            f"[{template_id or 'general'}] {old_score:.2f} -> {new_score:.2f}"
        )


# Usage (cache_manager, websocket_manager, and logger are supplied by the application)
score_subject = ScoreSubject()
score_subject.attach(RAGCacheInvalidator(cache_manager))
score_subject.attach(UMLVisualizationUpdater(websocket_manager))
score_subject.attach(ScoreChangeLogger(logger))

# When a score is updated, all observers are notified
score_subject.update_score("Archive", 0.80, template_id="collection_discovery")
```

### Benefits

- **Loose coupling**: Score management doesn't know about consumers
- **Extensibility**: Add new observers without modifying core logic
- **Consistency**: All components stay synchronized

---

## Pattern 4: Repository Pattern for Score Persistence

### Problem

Scores are stored as LinkML annotations in YAML files. We need a clean abstraction for reading/writing scores without coupling to file format.

### Solution

Use the **Repository Pattern** to abstract score persistence.

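
To make the storage format concrete, the sketch below shows how a loaded class file might look and how the two annotation keys are read. It is a minimal illustration, not the project's loader: the dictionary stands in for the result of `yaml.safe_load`, the score values are invented, and `read_score` is a hypothetical helper. Only the key names (`classes`, `annotations`, `specificity_score`, `template_specificity`) and the 0.5 fallback are taken from the code in this document.

```python
from typing import Any, Dict, Optional

# Stand-in for a parsed class file (what yaml.safe_load would return).
# Key names follow this document; the score values are invented.
ARCHIVE_CLASS_FILE: Dict[str, Any] = {
    "classes": {
        "Archive": {
            "annotations": {
                "specificity_score": 0.75,
                "template_specificity": {"archive_search": 0.95},
            },
        }
    }
}

DEFAULT_SCORE = 0.5  # fallback used when no score is annotated


def read_score(
    data: Dict[str, Any],
    class_name: str,
    template_id: Optional[str] = None,
) -> float:
    """Hypothetical helper: look up a general or template-specific score."""
    class_def = data.get("classes", {}).get(class_name, {})
    annotations = class_def.get("annotations", {})
    if template_id:
        # Template-specific scores live one level deeper, keyed by template ID
        return annotations.get("template_specificity", {}).get(template_id, DEFAULT_SCORE)
    return annotations.get("specificity_score", DEFAULT_SCORE)


general = read_score(ARCHIVE_CLASS_FILE, "Archive")
per_template = read_score(ARCHIVE_CLASS_FILE, "Archive", "archive_search")
missing = read_score(ARCHIVE_CLASS_FILE, "Archive", "museum_search")
```

A repository hides this nested lookup, so callers ask for a score by class name and template ID and never touch the file layout directly.
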
```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Optional

import yaml


class ScoreRepository(ABC):
    """Abstract repository for specificity scores."""

    @abstractmethod
    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        """Get specificity score for a class."""
        pass

    @abstractmethod
    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        """Set specificity score for a class."""
        pass

    @abstractmethod
    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        """Get all class scores for a template (or general scores)."""
        pass

    @abstractmethod
    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        """Update multiple scores at once."""
        pass


class LinkMLScoreRepository(ScoreRepository):
    """Repository that reads/writes scores from LinkML YAML files."""

    def __init__(self, schema_dir: Path):
        self.schema_dir = schema_dir
        self.classes_dir = schema_dir / "modules" / "classes"
        self._cache: Dict[str, Dict] = {}

    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        class_data = self._load_class(class_name)
        annotations = class_data.get("annotations", {})

        if template_id:
            return annotations.get("template_specificity", {}).get(template_id, 0.5)
        return annotations.get("specificity_score", 0.5)

    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        class_data = self._load_class(class_name)

        if "annotations" not in class_data:
            class_data["annotations"] = {}

        if template_id:
            if "template_specificity" not in class_data["annotations"]:
                class_data["annotations"]["template_specificity"] = {}
            class_data["annotations"]["template_specificity"][template_id] = score
        else:
            class_data["annotations"]["specificity_score"] = score

        self._save_class(class_name, class_data)

    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        # Assumes one class per file, with the file stem equal to the class name
        scores = {}
        for yaml_file in self.classes_dir.glob("*.yaml"):
            class_name = yaml_file.stem
            scores[class_name] = self.get_score(class_name, template_id)
        return scores

    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        for class_name, score in scores.items():
            self.set_score(class_name, score, template_id)

    def _load_class(self, class_name: str) -> Dict:
        if class_name in self._cache:
            return self._cache[class_name]

        yaml_path = self.classes_dir / f"{class_name}.yaml"
        if not yaml_path.exists():
            raise ValueError(f"Class file not found: {yaml_path}")

        with open(yaml_path, encoding="utf-8") as f:
            data = yaml.safe_load(f) or {}

        # Extract the class definition (may be nested under 'classes' key)
        if "classes" in data:
            class_data = data["classes"].get(class_name, {})
        else:
            class_data = data

        self._cache[class_name] = class_data
        return class_data

    def _save_class(self, class_name: str, class_data: Dict):
        yaml_path = self.classes_dir / f"{class_name}.yaml"

        # Preserve original file structure (YAML comments are not preserved)
        with open(yaml_path, encoding="utf-8") as f:
            original = yaml.safe_load(f) or {}

        if "classes" in original:
            original["classes"][class_name] = class_data
        else:
            original = class_data

        with open(yaml_path, "w", encoding="utf-8") as f:
            # sort_keys=False keeps the original key order instead of sorting alphabetically
            yaml.dump(original, f, default_flow_style=False, allow_unicode=True, sort_keys=False)

        # Update cache
        self._cache[class_name] = class_data


class InMemoryScoreRepository(ScoreRepository):
    """In-memory repository for testing."""

    def __init__(self):
        self._scores: Dict[str, Dict] = {}

    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        if class_name not in self._scores:
            return 0.5
        if template_id:
            return self._scores[class_name].get("templates", {}).get(template_id, 0.5)
        return self._scores[class_name].get("general", 0.5)

    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        if class_name not in self._scores:
            self._scores[class_name] = {"general": 0.5, "templates": {}}

        if template_id:
            self._scores[class_name]["templates"][template_id] = score
        else:
            self._scores[class_name]["general"] = score

    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        return {name: self.get_score(name, template_id) for name in self._scores}

    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        for class_name, score in scores.items():
            self.set_score(class_name, score, template_id)
```

### Benefits

- **Abstraction**: Business logic doesn't depend on file format
- **Testability**: Use in-memory repository for tests
- **Flexibility**: Easy to add new persistence backends (database, API, etc.)

---

## Pattern 5: Command Pattern for Score Updates

### Problem

Score updates may need to be:

- Undone (revert accidental changes)
- Batched (apply multiple changes atomically)
- Audited (track who changed what)

### Solution

Use the **Command Pattern** to encapsulate score changes as objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Union

# ScoreRepository and InMemoryScoreRepository are defined in Pattern 4 above


@dataclass
class ScoreChange:
    """Represents a single score change."""
    class_name: str
    old_score: float
    new_score: float
    template_id: Optional[str]
    timestamp: datetime
    author: str
    rationale: str


class ScoreCommand(ABC):
    """Abstract command for score operations."""

    @abstractmethod
    def execute(self) -> Union[ScoreChange, List[ScoreChange]]:
        """Execute the command and return the change record(s)."""
        pass

    @abstractmethod
    def undo(self):
        """Undo the command."""
        pass


class UpdateScoreCommand(ScoreCommand):
    """Command to update a single score."""

    def __init__(
        self,
        repository: ScoreRepository,
        class_name: str,
        new_score: float,
        template_id: Optional[str] = None,
        author: str = "system",
        rationale: str = "",
    ):
        self.repository = repository
        self.class_name = class_name
        self.new_score = new_score
        self.template_id = template_id
        self.author = author
        self.rationale = rationale
        self._old_score: Optional[float] = None

    def execute(self) -> ScoreChange:
        # Remember the previous value so undo() can restore it
        self._old_score = self.repository.get_score(self.class_name, self.template_id)
        self.repository.set_score(self.class_name, self.new_score, self.template_id)

        return ScoreChange(
            class_name=self.class_name,
            old_score=self._old_score,
            new_score=self.new_score,
            template_id=self.template_id,
            timestamp=datetime.now(),
            author=self.author,
            rationale=self.rationale,
        )

    def undo(self):
        if self._old_score is not None:
            self.repository.set_score(self.class_name, self._old_score, self.template_id)


class BatchScoreCommand(ScoreCommand):
    """Command to update multiple scores atomically."""

    def __init__(self, commands: List[UpdateScoreCommand]):
        self.commands = commands
        self._executed: List[UpdateScoreCommand] = []

    def execute(self) -> List[ScoreChange]:
        changes = []
        try:
            for cmd in self.commands:
                change = cmd.execute()
                changes.append(change)
                self._executed.append(cmd)
        except Exception:
            # Roll back the commands that already ran, then re-raise
            self.undo()
            raise
        return changes

    def undo(self):
        for cmd in reversed(self._executed):
            cmd.undo()
        self._executed.clear()


class ScoreCommandInvoker:
    """Manages command execution and history."""

    def __init__(self):
        self._history: List[ScoreCommand] = []
        self._redo_stack: List[ScoreCommand] = []

    def execute(self, command: ScoreCommand):
        result = command.execute()
        self._history.append(command)
        self._redo_stack.clear()
        return result

    def undo(self):
        if not self._history:
            return
        command = self._history.pop()
        command.undo()
        self._redo_stack.append(command)

    def redo(self):
        if not self._redo_stack:
            return
        command = self._redo_stack.pop()
        command.execute()
        self._history.append(command)

    def get_history(self) -> List[ScoreCommand]:
        return self._history.copy()


# Usage
repo = InMemoryScoreRepository()  # any ScoreRepository works here
invoker = ScoreCommandInvoker()

# Single update
cmd = UpdateScoreCommand(
    repository=repo,
    class_name="Archive",
    new_score=0.85,
    template_id="archive_search",
    author="kempersc",
    rationale="Increased based on user feedback",
)
change = invoker.execute(cmd)

# Batch update
batch = BatchScoreCommand([
    UpdateScoreCommand(repo, "Museum", 0.90, "museum_search"),
    UpdateScoreCommand(repo, "Gallery", 0.85, "museum_search"),
    UpdateScoreCommand(repo, "Archive", 0.30, "museum_search"),
])
changes = invoker.execute(batch)

# Undo last change (reverts the whole batch)
invoker.undo()
```

### Benefits

- **Undo/Redo**: Easy to revert changes
- **Audit trail**: Track all changes with metadata
- **Atomicity**: Batch changes succeed or fail together
- **Testability**: Test commands in isolation

---

## Pattern Summary

| Pattern | Purpose | Key Benefit |
|---------|---------|-------------|
| **Strategy** | Different scoring algorithms per template | Open/Closed Principle |
| **Decorator** | Layer score modifications | Flexible composition |
| **Observer** | Notify components of changes | Loose coupling |
| **Repository** | Abstract score persistence | Testability |
| **Command** | Encapsulate score updates | Undo/Redo, Audit |

---

## Implementation Priority

1. **Repository Pattern** - Foundation for score storage (Week 1)
2. **Strategy Pattern** - Template-specific scoring (Week 1)
3. **Command Pattern** - Score updates with audit (Week 2)
4. **Observer Pattern** - Cross-component updates (Week 2)
5. **Decorator Pattern** - Score modifiers (Week 3)

---

## References

- Gang of Four: Design Patterns
- Martin Fowler: Patterns of Enterprise Application Architecture
- LinkML Documentation: Annotations
- Project: `docs/plan/prompt-query_template_mapping/design-patterns.md`