# Specificity Score System - Design Patterns

## Overview

This document describes the software design patterns used in the specificity scoring system and how each one supports maintainability, testability, and extensibility.

> **INTEGRATION NOTE**: The design patterns in this document build upon **existing infrastructure** in the codebase. Specifically:
> - The **existing** `TemplateClassifier` at `backend/rag/template_sparql.py:1104` handles question → SPARQL template classification
> - The **existing** `TemplateClassifierSignature` at `backend/rag/template_sparql.py:634` defines the DSPy signature
> - New code should **wrap** the existing classifier using the **Decorator Pattern** (see Pattern 2 below)
> - **DO NOT** recreate template classification logic - extend the existing implementation

---

## Existing Infrastructure Reference

Before implementing new patterns, understand what already exists:

| Component | Location | Purpose |
|-----------|----------|---------|
| `TemplateClassifier` | `backend/rag/template_sparql.py:1104` | DSPy module for question classification |
| `TemplateClassifierSignature` | `backend/rag/template_sparql.py:634` | Input/output signature definition |
| `SlotExtractor` | `backend/rag/template_sparql.py` | Extracts slots (institution_type, location, etc.) |
| `ConversationContextResolver` | `backend/rag/template_sparql.py:745` | Manages conversation context |
| `sparql_templates.yaml` | `data/sparql_templates.yaml` | Template definitions |

**Key Insight**: The Strategy Pattern and Decorator Pattern below should **extend** these existing components, not replace them.
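
One way to read "extend, not replace" is a thin wrapper that delegates classification to the existing component and only adds scoring on top. The sketch below assumes the classifier can be treated as a callable from question text to a template ID. That interface, and the names `ScoredClassification`, `SpecificityAwareClassifier`, `fake_classifier`, and `fake_score_lookup`, are hypothetical and not taken from `template_sparql.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScoredClassification:
    """Template ID plus per-class specificity scores (hypothetical result type)."""
    template_id: str
    class_scores: Dict[str, float]


class SpecificityAwareClassifier:
    """Wraps an existing classifier without reimplementing its logic.

    ASSUMPTION: the wrapped classifier is callable as classifier(question)
    and returns a template ID string. The real TemplateClassifier interface
    may differ; adapt the call in classify() accordingly.
    """

    def __init__(
        self,
        classifier: Callable[[str], str],
        score_lookup: Callable[[str, str], float],
        class_names: List[str],
    ):
        self._classifier = classifier
        self._score_lookup = score_lookup
        self._class_names = class_names

    def classify(self, question: str) -> ScoredClassification:
        # Delegate template classification to the existing component.
        template_id = self._classifier(question)
        # Add specificity scores on top of the unchanged classification.
        scores = {
            name: self._score_lookup(name, template_id)
            for name in self._class_names
        }
        return ScoredClassification(template_id, scores)


# Stand-ins for illustration only.
def fake_classifier(question: str) -> str:
    return "archive_search" if "archive" in question.lower() else "default"


def fake_score_lookup(class_name: str, template_id: str) -> float:
    table = {("Archive", "archive_search"): 0.9}
    return table.get((class_name, template_id), 0.5)


wrapper = SpecificityAwareClassifier(
    fake_classifier, fake_score_lookup, ["Archive", "Museum"]
)
result = wrapper.classify("Which archives are in Utrecht?")
```

The wrapper never inspects how the template was chosen, so changes inside the existing classifier do not ripple into the scoring code.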

---

## Pattern 1: Strategy Pattern for Score Calculation

### Problem

Different conversation templates require different scoring logic. We need to calculate template-specific scores without hardcoding logic for each template.

### Solution

Use the **Strategy Pattern** to encapsulate scoring algorithms for each template type.

```python
from abc import ABC, abstractmethod
from typing import Dict

class ScoringStrategy(ABC):
    """Abstract base class for template-specific scoring strategies."""
    
    @abstractmethod
    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        """Calculate specificity score for a class in this template context."""
        pass
    
    @abstractmethod
    def get_template_id(self) -> str:
        """Return the template ID this strategy handles."""
        pass


class ArchiveSearchStrategy(ScoringStrategy):
    """Scoring strategy for archive-related queries."""
    
    # Classes highly relevant to archive searches
    HIGH_RELEVANCE = {"Archive", "RecordSet", "Collection", "Fonds", "Series"}
    MEDIUM_RELEVANCE = {"HeritageCustodian", "Location", "GHCID", "Identifier"}
    LOW_RELEVANCE = {"Museum", "Library", "Gallery", "PersonProfile"}
    
    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        if class_name in self.HIGH_RELEVANCE:
            return 0.90 + (0.05 * self._has_archival_properties(class_metadata))
        elif class_name in self.MEDIUM_RELEVANCE:
            return 0.60
        elif class_name in self.LOW_RELEVANCE:
            return 0.20
        else:
            return 0.40  # Default moderate relevance
    
    def _has_archival_properties(self, metadata: Dict) -> int:
        """Boost score if class has archival-specific properties."""
        archival_props = {"record_type", "finding_aid", "extent", "date_range"}
        return 1 if any(p in metadata.get("slots", []) for p in archival_props) else 0
    
    def get_template_id(self) -> str:
        return "archive_search"


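class DefaultScoringStrategy(ScoringStrategy):
    """Fallback strategy for templates without a registered strategy.

    Assumption: a neutral 0.5, matching the default used elsewhere in this document.
    """

    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        return 0.5

    def get_template_id(self) -> str:
        return "default"


class ScoringStrategyFactory: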
    """Factory for creating scoring strategies based on template ID."""
    
    _strategies: Dict[str, ScoringStrategy] = {}
    
    @classmethod
    def register(cls, strategy: ScoringStrategy):
        """Register a scoring strategy."""
        cls._strategies[strategy.get_template_id()] = strategy
    
    @classmethod
    def get_strategy(cls, template_id: str) -> ScoringStrategy:
        """Get the scoring strategy for a template."""
        if template_id not in cls._strategies:
            return DefaultScoringStrategy()
        return cls._strategies[template_id]


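# Register strategies at module load
ScoringStrategyFactory.register(ArchiveSearchStrategy())
# MuseumSearchStrategy, LocationBrowseStrategy, and the remaining strategies are
# not defined in this document; register them the same way once implemented:
# ScoringStrategyFactory.register(MuseumSearchStrategy())
# ScoringStrategyFactory.register(LocationBrowseStrategy())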
```

### Benefits

- **Open/Closed Principle**: Add new templates without modifying existing code
- **Single Responsibility**: Each strategy handles one template's logic
- **Testability**: Test each strategy in isolation
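
Testing a strategy in isolation needs no factory, schema loader, or classifier. The sketch below restates a trimmed copy of `ArchiveSearchStrategy` (the medium-relevance tier is left out) so that the block runs on its own; the test function names are illustrative.

```python
from typing import Dict


class ArchiveSearchStrategy:
    """Trimmed copy of the strategy above, repeated so this block is self-contained."""

    HIGH_RELEVANCE = {"Archive", "RecordSet", "Collection", "Fonds", "Series"}
    LOW_RELEVANCE = {"Museum", "Library", "Gallery", "PersonProfile"}
    ARCHIVAL_PROPS = {"record_type", "finding_aid", "extent", "date_range"}

    def calculate_score(self, class_name: str, class_metadata: Dict) -> float:
        if class_name in self.HIGH_RELEVANCE:
            slots = class_metadata.get("slots", [])
            boost = 0.05 if any(p in slots for p in self.ARCHIVAL_PROPS) else 0.0
            return 0.90 + boost
        if class_name in self.LOW_RELEVANCE:
            return 0.20
        return 0.40


def test_high_relevance_gets_boost_for_archival_slots():
    strategy = ArchiveSearchStrategy()
    plain = strategy.calculate_score("Archive", {"slots": []})
    boosted = strategy.calculate_score("Archive", {"slots": ["finding_aid"]})
    # Compare with a tolerance because the scores are floats.
    assert abs(plain - 0.90) < 1e-9
    assert abs(boosted - 0.95) < 1e-9


def test_unlisted_class_gets_default():
    strategy = ArchiveSearchStrategy()
    assert strategy.calculate_score("UnknownClass", {}) == 0.40


# Run directly; a test runner such as pytest would also discover these functions.
test_high_relevance_gets_boost_for_archival_slots()
test_unlisted_class_gets_default()
```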

---

## Pattern 2: Decorator Pattern for Score Modifiers

### Problem

Scores may need adjustment based on multiple factors:
- Custodian type annotations (GLAMORCUBESFIXPHDNT)
- Inheritance depth in class hierarchy
- Slot count (more complex classes may be more specific)

### Solution

Use the **Decorator Pattern** to layer score modifications.

```python
from abc import ABC, abstractmethod

class ScoreCalculator(ABC):
    """Base interface for score calculation."""
    
    @abstractmethod
    def calculate(self, class_name: str, template_id: str) -> float:
        pass


class BaseScoreCalculator(ScoreCalculator):
    """Base calculator using stored annotation scores."""
    
    def __init__(self, schema_loader):
        self.schema_loader = schema_loader
    
    def calculate(self, class_name: str, template_id: str) -> float:
        class_def = self.schema_loader.get_class(class_name)
        annotations = class_def.get("annotations", {})
        
        # Try template-specific score first
        template_scores = annotations.get("template_specificity", {})
        if template_id in template_scores:
            return template_scores[template_id]
        
        # Fall back to general score
        return annotations.get("specificity_score", 0.5)


class ScoreDecorator(ScoreCalculator):
    """Base decorator class."""
    
    def __init__(self, wrapped: ScoreCalculator):
        self._wrapped = wrapped
    
    def calculate(self, class_name: str, template_id: str) -> float:
        return self._wrapped.calculate(class_name, template_id)


class CustodianTypeBoostDecorator(ScoreDecorator):
    """Boost scores for classes matching custodian type context."""
    
    def __init__(self, wrapped: ScoreCalculator, schema_loader, custodian_type: str):
        super().__init__(wrapped)
        # The loader is required to read the custodian_types annotation in calculate()
        self.schema_loader = schema_loader
        self.custodian_type = custodian_type
    
    def calculate(self, class_name: str, template_id: str) -> float:
        base_score = self._wrapped.calculate(class_name, template_id)
        
        # Check if class has matching custodian_types annotation
        class_def = self.schema_loader.get_class(class_name)
        custodian_types = class_def.get("annotations", {}).get("custodian_types", [])
        
        if self.custodian_type in custodian_types or "*" in custodian_types:
            return min(1.0, base_score + 0.15)  # Boost by 0.15, cap at 1.0
        
        return base_score


class InheritanceDepthDecorator(ScoreDecorator):
    """Adjust scores based on class hierarchy depth."""
    
    def __init__(self, wrapped: ScoreCalculator, schema_loader):
        super().__init__(wrapped)
        self.schema_loader = schema_loader
    
    def calculate(self, class_name: str, template_id: str) -> float:
        base_score = self._wrapped.calculate(class_name, template_id)
        depth = self._get_inheritance_depth(class_name)
        
        # Deeper classes are more specific (higher score)
        # Depth 0 (root) = no change, +0.03 per level, capped at +0.10 (reached at depth 4)
        depth_boost = min(0.10, depth * 0.03)
        
        return min(1.0, base_score + depth_boost)
    
    def _get_inheritance_depth(self, class_name: str) -> int:
        """Calculate inheritance depth from root class."""
        depth = 0
        current = class_name
        seen = {current}  # Guard against cyclic is_a chains
        while True:
            class_def = self.schema_loader.get_class(current)
            parent = class_def.get("is_a")
            if not parent or parent in seen:
                break
            seen.add(parent)
            depth += 1
            current = parent
        return depth


# Usage: Compose decorators
calculator = BaseScoreCalculator(schema_loader)
calculator = CustodianTypeBoostDecorator(calculator, schema_loader, custodian_type="A")  # Archive context
calculator = InheritanceDepthDecorator(calculator, schema_loader)

score = calculator.calculate("ArchivalFonds", "archive_search")
```
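
To see how the layers combine, the arithmetic of the chain above can be traced by hand. The sketch below restates the two adjustments as a plain function; the function name `composed_score` and the input values (base score 0.80, depth 2) are made up for illustration.

```python
def composed_score(base: float, custodian_match: bool, depth: int) -> float:
    """Mirror the decorator chain: custodian boost first, then depth boost."""
    score = base
    if custodian_match:
        score = min(1.0, score + 0.15)       # CustodianTypeBoostDecorator
    depth_boost = min(0.10, depth * 0.03)    # InheritanceDepthDecorator
    return min(1.0, score + depth_boost)


# 0.80 + 0.15 = 0.95, then + 0.06 exceeds the cap, so the result is 1.0.
capped = composed_score(0.80, custodian_match=True, depth=2)

# Without a custodian match: 0.80 + 0.06 = 0.86 (up to float rounding).
uncapped = composed_score(0.80, custodian_match=False, depth=2)

# The depth boost saturates: depth 4 and depth 10 both add 0.10.
assert composed_score(0.5, False, 4) == composed_score(0.5, False, 10)
```

Because each layer caps at 1.0, stacking boosts on an already high base score compresses differences between classes near the top of the range.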

### Benefits

- **Flexible composition**: Mix and match score modifiers
- **Separation of concerns**: Each decorator handles one modification
- **Runtime configuration**: Add/remove decorators based on context
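
Runtime configuration can be as simple as building the chain from a per-request context. The sketch below uses stripped-down calculators with the same `calculate(class_name, template_id)` shape; `FixedScore`, `FlatBoost`, `build_calculator`, and the context keys are hypothetical names for illustration.

```python
from typing import Any, Dict


class FixedScore:
    """Stand-in for BaseScoreCalculator: returns a constant score."""

    def __init__(self, score: float):
        self._score = score

    def calculate(self, class_name: str, template_id: str) -> float:
        return self._score


class FlatBoost:
    """Stand-in decorator: adds a fixed boost, capped at 1.0."""

    def __init__(self, wrapped, boost: float):
        self._wrapped = wrapped
        self._boost = boost

    def calculate(self, class_name: str, template_id: str) -> float:
        inner = self._wrapped.calculate(class_name, template_id)
        return min(1.0, inner + self._boost)


def build_calculator(context: Dict[str, Any]):
    """Assemble a decorator chain from context flags (hypothetical keys)."""
    calculator = FixedScore(0.5)
    if context.get("custodian_type"):
        calculator = FlatBoost(calculator, 0.15)
    if context.get("use_depth_boost"):
        calculator = FlatBoost(calculator, 0.10)
    return calculator


plain = build_calculator({})
boosted = build_calculator({"custodian_type": "A", "use_depth_boost": True})
```

Callers only ever see an object with a `calculate` method, so the set of active modifiers can change per request without touching call sites.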

---

## Pattern 3: Observer Pattern for Score Updates

### Problem

When scores are updated (manually or via feedback), multiple components need notification:
- RAG pipeline cache
- UML visualization
- Exported JSON files
- Validation reports

### Solution

Use the **Observer Pattern** to notify interested components of score changes.

```python
from abc import ABC, abstractmethod
from typing import List, Dict

class ScoreObserver(ABC):
    """Interface for components that react to score changes."""
    
    @abstractmethod
    def on_score_update(self, class_name: str, old_score: float, new_score: float, template_id: str = None):
        """Called when a class's specificity score is updated."""
        pass


class ScoreSubject:
    """Manages score data and notifies observers of changes."""
    
    def __init__(self):
        self._observers: List[ScoreObserver] = []
        self._scores: Dict[str, Dict] = {}  # class_name -> {general: float, templates: {}}
    
    def attach(self, observer: ScoreObserver):
        self._observers.append(observer)
    
    def detach(self, observer: ScoreObserver):
        self._observers.remove(observer)
    
    def notify(self, class_name: str, old_score: float, new_score: float, template_id: str = None):
        for observer in self._observers:
            observer.on_score_update(class_name, old_score, new_score, template_id)
    
    def update_score(self, class_name: str, new_score: float, template_id: str = None):
        """Update a score and notify observers."""
        if class_name not in self._scores:
            self._scores[class_name] = {"general": 0.5, "templates": {}}
        
        if template_id:
            old_score = self._scores[class_name]["templates"].get(template_id, 0.5)
            self._scores[class_name]["templates"][template_id] = new_score
        else:
            old_score = self._scores[class_name]["general"]
            self._scores[class_name]["general"] = new_score
        
        self.notify(class_name, old_score, new_score, template_id)


# Concrete observers
class RAGCacheInvalidator(ScoreObserver):
    """Invalidates RAG cache when scores change."""
    
    def __init__(self, cache_manager):
        self.cache_manager = cache_manager
    
    def on_score_update(self, class_name: str, old_score: float, new_score: float, template_id: str = None):
        # Invalidate cached class rankings for affected template
        if template_id:
            self.cache_manager.invalidate(f"class_rankings_{template_id}")
        else:
            self.cache_manager.invalidate_all("class_rankings_*")


class UMLVisualizationUpdater(ScoreObserver):
    """Triggers UML refresh when scores change."""
    
    def __init__(self, websocket_manager):
        self.ws_manager = websocket_manager
    
    def on_score_update(self, class_name: str, old_score: float, new_score: float, template_id: str = None):
        # Push update to connected frontend clients
        self.ws_manager.broadcast({
            "type": "score_update",
            "class": class_name,
            "old_score": old_score,
            "new_score": new_score,
            "template": template_id
        })


class ScoreChangeLogger(ScoreObserver):
    """Logs score changes for audit trail."""
    
    def __init__(self, logger):
        self.logger = logger
    
    def on_score_update(self, class_name: str, old_score: float, new_score: float, template_id: str = None):
        self.logger.info(
            f"Score updated: {class_name} "
            f"[{template_id or 'general'}] {old_score:.2f} -> {new_score:.2f}"
        )


# Usage
score_subject = ScoreSubject()
score_subject.attach(RAGCacheInvalidator(cache_manager))
score_subject.attach(UMLVisualizationUpdater(websocket_manager))
score_subject.attach(ScoreChangeLogger(logger))

# When a score is updated, all observers are notified
score_subject.update_score("Archive", 0.80, template_id="collection_discovery")
```

### Benefits

- **Loose coupling**: Score management doesn't know about consumers
- **Extensibility**: Add new observers without modifying core logic
- **Consistency**: Every attached observer is notified of each change made through `update_score`
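
One caveat with the `notify` loop above: an exception raised by one observer stops the remaining observers from being called. A possible hardening, sketched below under the assumption that observer failures should be logged rather than propagated, catches errors per observer. `SafeScoreSubject` and the two demo observers are hypothetical.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("score_subject")

# For brevity, an observer here is any callable with the on_score_update signature.
Observer = Callable[[str, float, float, Optional[str]], None]


class SafeScoreSubject:
    """Notifies every observer even if an earlier one raises."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self, class_name: str, old_score: float, new_score: float,
               template_id: Optional[str] = None) -> int:
        """Call each observer and return how many of them failed."""
        failures = 0
        # Iterate over a copy so observers may attach or detach during the loop.
        for observer in list(self._observers):
            try:
                observer(class_name, old_score, new_score, template_id)
            except Exception:
                failures += 1
                logger.exception("Observer %r failed for %s", observer, class_name)
        return failures


received = []


def broken_observer(class_name, old_score, new_score, template_id):
    raise RuntimeError("cache backend unavailable")


def recording_observer(class_name, old_score, new_score, template_id):
    received.append((class_name, new_score, template_id))


subject = SafeScoreSubject()
subject.attach(broken_observer)
subject.attach(recording_observer)
failed = subject.notify("Archive", 0.5, 0.8, "collection_discovery")
```

Whether to swallow, collect, or re-raise observer errors is a design decision; returning the failure count keeps the caller informed without aborting the loop.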

---

## Pattern 4: Repository Pattern for Score Persistence

### Problem

Scores are stored as LinkML annotations in YAML files. We need a clean abstraction for reading/writing scores without coupling to file format.

### Solution

Use the **Repository Pattern** to abstract score persistence.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional
from pathlib import Path
import yaml

class ScoreRepository(ABC):
    """Abstract repository for specificity scores."""
    
    @abstractmethod
    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        """Get specificity score for a class."""
        pass
    
    @abstractmethod
    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        """Set specificity score for a class."""
        pass
    
    @abstractmethod
    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        """Get all class scores for a template (or general scores)."""
        pass
    
    @abstractmethod
    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        """Update multiple scores at once."""
        pass


class LinkMLScoreRepository(ScoreRepository):
    """Repository that reads/writes scores from LinkML YAML files."""
    
    def __init__(self, schema_dir: Path):
        self.schema_dir = schema_dir
        self.classes_dir = schema_dir / "modules" / "classes"
        self._cache: Dict[str, Dict] = {}
    
    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        class_data = self._load_class(class_name)
        annotations = class_data.get("annotations", {})
        
        if template_id:
            return annotations.get("template_specificity", {}).get(template_id, 0.5)
        return annotations.get("specificity_score", 0.5)
    
    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        class_data = self._load_class(class_name)
        
        if "annotations" not in class_data:
            class_data["annotations"] = {}
        
        if template_id:
            if "template_specificity" not in class_data["annotations"]:
                class_data["annotations"]["template_specificity"] = {}
            class_data["annotations"]["template_specificity"][template_id] = score
        else:
            class_data["annotations"]["specificity_score"] = score
        
        self._save_class(class_name, class_data)
    
    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        scores = {}
        for yaml_file in self.classes_dir.glob("*.yaml"):
            class_name = yaml_file.stem
            scores[class_name] = self.get_score(class_name, template_id)
        return scores
    
    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        for class_name, score in scores.items():
            self.set_score(class_name, score, template_id)
    
    def _load_class(self, class_name: str) -> Dict:
        if class_name in self._cache:
            return self._cache[class_name]
        
        yaml_path = self.classes_dir / f"{class_name}.yaml"
        if not yaml_path.exists():
            raise ValueError(f"Class file not found: {yaml_path}")
        
        with open(yaml_path) as f:
            data = yaml.safe_load(f)
        
        # Extract the class definition (may be nested under 'classes' key)
        if "classes" in data:
            class_data = data["classes"].get(class_name, {})
        else:
            class_data = data
        
        self._cache[class_name] = class_data
        return class_data
    
    def _save_class(self, class_name: str, class_data: Dict):
        yaml_path = self.classes_dir / f"{class_name}.yaml"
        
        # Preserve original file structure
        with open(yaml_path) as f:
            original = yaml.safe_load(f)
        
        if "classes" in original:
            original["classes"][class_name] = class_data
        else:
            original = class_data
        
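        with open(yaml_path, 'w') as f:
            # sort_keys=False keeps the original key order; YAML comments are still lost on rewrite
            yaml.dump(original, f, default_flow_style=False, allow_unicode=True, sort_keys=False)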
        
        # Update cache
        self._cache[class_name] = class_data


class InMemoryScoreRepository(ScoreRepository):
    """In-memory repository for testing."""
    
    def __init__(self):
        self._scores: Dict[str, Dict] = {}
    
    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        if class_name not in self._scores:
            return 0.5
        
        if template_id:
            return self._scores[class_name].get("templates", {}).get(template_id, 0.5)
        return self._scores[class_name].get("general", 0.5)
    
    def set_score(self, class_name: str, score: float, template_id: Optional[str] = None):
        if class_name not in self._scores:
            self._scores[class_name] = {"general": 0.5, "templates": {}}
        
        if template_id:
            self._scores[class_name]["templates"][template_id] = score
        else:
            self._scores[class_name]["general"] = score
    
    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        return {
            name: self.get_score(name, template_id)
            for name in self._scores
        }
    
    def bulk_update(self, scores: Dict[str, float], template_id: Optional[str] = None):
        for class_name, score in scores.items():
            self.set_score(class_name, score, template_id)
```
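
`LinkMLScoreRepository` above implies a particular layout for the score annotations. As a sketch, here is that layout as the Python dict that `yaml.safe_load` would produce, together with the same lookup-with-default logic. The class name, the `is_a` value, and the score values are invented; only the keys `classes`, `annotations`, `specificity_score`, and `template_specificity` come from this document.

```python
from typing import Dict, Optional

# Parsed form of a hypothetical modules/classes/Archive.yaml
archive_file: Dict = {
    "classes": {
        "Archive": {
            "is_a": "HeritageCustodian",
            "annotations": {
                "specificity_score": 0.70,
                "template_specificity": {
                    "archive_search": 0.95,
                    "museum_search": 0.30,
                },
            },
        }
    }
}


def lookup_score(file_data: Dict, class_name: str,
                 template_id: Optional[str] = None, default: float = 0.5) -> float:
    """Resolve a score the same way LinkMLScoreRepository.get_score does."""
    class_data = file_data.get("classes", {}).get(class_name, {})
    annotations = class_data.get("annotations", {})
    if template_id:
        return annotations.get("template_specificity", {}).get(template_id, default)
    return annotations.get("specificity_score", default)


general = lookup_score(archive_file, "Archive")                     # general score
specific = lookup_score(archive_file, "Archive", "archive_search")  # template score
missing = lookup_score(archive_file, "Archive", "location_browse")  # default
```

Note that, as written, the repository returns the default 0.5 for a missing template entry, whereas `BaseScoreCalculator` in Pattern 2 falls back to the general `specificity_score`. The two should be aligned before both are used on the same data.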

### Benefits

- **Abstraction**: Business logic doesn't depend on file format
- **Testability**: Use in-memory repository for tests
- **Flexibility**: Easy to add new persistence backends (database, API, etc.)
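
As an illustration of a second backend, the sketch below stores scores in SQLite through the standard-library `sqlite3` module. The table layout, the empty-string sentinel for general scores, and the class name `SQLiteScoreRepository` are assumptions. The class mirrors the four `ScoreRepository` methods but omits the ABC so the block runs standalone.

```python
import sqlite3
from typing import Dict, Optional

GENERAL = ""  # Sentinel template_id for the general (non-template) score


class SQLiteScoreRepository:
    """Hypothetical SQLite-backed variant of ScoreRepository."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS scores ("
            " class_name TEXT NOT NULL,"
            " template_id TEXT NOT NULL,"
            " score REAL NOT NULL,"
            " PRIMARY KEY (class_name, template_id))"
        )

    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        row = self._conn.execute(
            "SELECT score FROM scores WHERE class_name = ? AND template_id = ?",
            (class_name, template_id or GENERAL),
        ).fetchone()
        return row[0] if row else 0.5

    def set_score(self, class_name: str, score: float,
                  template_id: Optional[str] = None) -> None:
        with self._conn:  # Commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO scores VALUES (?, ?, ?)",
                (class_name, template_id or GENERAL, score),
            )

    def get_all_scores(self, template_id: Optional[str] = None) -> Dict[str, float]:
        rows = self._conn.execute(
            "SELECT class_name, score FROM scores WHERE template_id = ?",
            (template_id or GENERAL,),
        ).fetchall()
        return dict(rows)

    def bulk_update(self, scores: Dict[str, float],
                    template_id: Optional[str] = None) -> None:
        with self._conn:  # One transaction for the whole batch
            self._conn.executemany(
                "INSERT OR REPLACE INTO scores VALUES (?, ?, ?)",
                [(name, template_id or GENERAL, s) for name, s in scores.items()],
            )


repo = SQLiteScoreRepository()
repo.set_score("Archive", 0.85, "archive_search")
repo.bulk_update({"Museum": 0.9, "Gallery": 0.8}, "museum_search")
```

Unlike the YAML-backed repository, `bulk_update` here runs in a single transaction, and `get_all_scores` returns only classes with a stored entry instead of filling in defaults for every class file.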

---

## Pattern 5: Command Pattern for Score Updates

### Problem

Score updates may need to be:
- Undone (revert accidental changes)
- Batched (apply multiple changes atomically)
- Audited (track who changed what)

### Solution

Use the **Command Pattern** to encapsulate score changes as objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class ScoreChange:
    """Represents a single score change."""
    class_name: str
    old_score: float
    new_score: float
    template_id: Optional[str]
    timestamp: datetime
    author: str
    rationale: str


class ScoreCommand(ABC):
    """Abstract command for score operations."""
    
    @abstractmethod
    def execute(self) -> ScoreChange:
        """Execute the command and return the change record."""
        pass
    
    @abstractmethod
    def undo(self):
        """Undo the command."""
        pass


class UpdateScoreCommand(ScoreCommand):
    """Command to update a single score."""
    
    def __init__(
        self,
        repository: ScoreRepository,
        class_name: str,
        new_score: float,
        template_id: Optional[str] = None,
        author: str = "system",
        rationale: str = ""
    ):
        self.repository = repository
        self.class_name = class_name
        self.new_score = new_score
        self.template_id = template_id
        self.author = author
        self.rationale = rationale
        self._old_score: Optional[float] = None
    
    def execute(self) -> ScoreChange:
        self._old_score = self.repository.get_score(self.class_name, self.template_id)
        self.repository.set_score(self.class_name, self.new_score, self.template_id)
        
        return ScoreChange(
            class_name=self.class_name,
            old_score=self._old_score,
            new_score=self.new_score,
            template_id=self.template_id,
            timestamp=datetime.now(),
            author=self.author,
            rationale=self.rationale
        )
    
    def undo(self):
        if self._old_score is not None:
            self.repository.set_score(self.class_name, self._old_score, self.template_id)


class BatchScoreCommand(ScoreCommand):
    """Command to update multiple scores atomically."""
    
    def __init__(self, commands: List[UpdateScoreCommand]):
        self.commands = commands
        self._executed: List[UpdateScoreCommand] = []
    
    def execute(self) -> List[ScoreChange]:
        changes = []
        try:
            for cmd in self.commands:
                change = cmd.execute()
                changes.append(change)
                self._executed.append(cmd)
        except Exception:
            # Roll back the commands that already ran, then re-raise with the original traceback
            self.undo()
            raise
        return changes
    
    def undo(self):
        for cmd in reversed(self._executed):
            cmd.undo()
        self._executed.clear()


class ScoreCommandInvoker:
    """Manages command execution and history."""
    
    def __init__(self):
        self._history: List[ScoreCommand] = []
        self._redo_stack: List[ScoreCommand] = []
    
    def execute(self, command: ScoreCommand):
        result = command.execute()
        self._history.append(command)
        self._redo_stack.clear()
        return result
    
    def undo(self):
        if not self._history:
            return
        
        command = self._history.pop()
        command.undo()
        self._redo_stack.append(command)
    
    def redo(self):
        if not self._redo_stack:
            return
        
        command = self._redo_stack.pop()
        command.execute()
        self._history.append(command)
    
    def get_history(self) -> List[ScoreCommand]:
        return self._history.copy()


# Usage
invoker = ScoreCommandInvoker()

# Single update
cmd = UpdateScoreCommand(
    repository=repo,
    class_name="Archive",
    new_score=0.85,
    template_id="archive_search",
    author="kempersc",
    rationale="Increased based on user feedback"
)
change = invoker.execute(cmd)

# Batch update
batch = BatchScoreCommand([
    UpdateScoreCommand(repo, "Museum", 0.90, "museum_search"),
    UpdateScoreCommand(repo, "Gallery", 0.85, "museum_search"),
    UpdateScoreCommand(repo, "Archive", 0.30, "museum_search"),
])
changes = invoker.execute(batch)

# Undo last change
invoker.undo()
```
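
The rollback path of `BatchScoreCommand` is easiest to see with a repository that rejects one of the updates. The sketch below is a condensed, self-contained version of that flow; `ValidatingRepository`, `UpdateScore`, and `run_batch` are hypothetical stand-ins, and the range check is an assumption added only to trigger a failure.

```python
from typing import Dict, List, Optional, Tuple


class ValidatingRepository:
    """In-memory store that rejects scores outside [0, 1] (for the demo only)."""

    def __init__(self) -> None:
        self._scores: Dict[Tuple[str, Optional[str]], float] = {}

    def get_score(self, class_name: str, template_id: Optional[str] = None) -> float:
        return self._scores.get((class_name, template_id), 0.5)

    def set_score(self, class_name: str, score: float,
                  template_id: Optional[str] = None) -> None:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        self._scores[(class_name, template_id)] = score


class UpdateScore:
    """Condensed UpdateScoreCommand: remembers the old score for undo."""

    def __init__(self, repo, class_name: str, new_score: float, template_id: str):
        self.repo, self.class_name = repo, class_name
        self.new_score, self.template_id = new_score, template_id
        self._old: Optional[float] = None

    def execute(self) -> None:
        self._old = self.repo.get_score(self.class_name, self.template_id)
        self.repo.set_score(self.class_name, self.new_score, self.template_id)

    def undo(self) -> None:
        if self._old is not None:
            self.repo.set_score(self.class_name, self._old, self.template_id)


def run_batch(commands: List[UpdateScore]) -> None:
    """Execute all commands; undo the completed ones in reverse if any fails."""
    done: List[UpdateScore] = []
    try:
        for cmd in commands:
            cmd.execute()
            done.append(cmd)
    except Exception:
        for cmd in reversed(done):
            cmd.undo()
        raise


repo = ValidatingRepository()
repo.set_score("Museum", 0.70, "museum_search")
rolled_back = False
try:
    run_batch([
        UpdateScore(repo, "Museum", 0.90, "museum_search"),
        UpdateScore(repo, "Gallery", 1.50, "museum_search"),  # Invalid, triggers rollback
    ])
except ValueError:
    rolled_back = True
```

Rollback restores values, not absence: a class with no stored score before the batch ends up with an explicit 0.5 after undo, which matters for a file-backed repository.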

### Benefits

- **Undo/Redo**: Easy to revert changes
- **Audit trail**: Track all changes with metadata
- **Atomicity**: A failed batch undoes the commands that already ran (best effort, since the YAML-backed repository writes each file separately)
- **Testability**: Test commands in isolation
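
The `ScoreChange` records returned by `execute` can feed an audit log directly. A minimal sketch, assuming a JSON Lines format and a UTC timestamp (both choices, and the helper name `append_audit_record`, are illustrative and not part of the design above):

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import IO, Optional


@dataclass
class ScoreChange:
    """Same fields as the ScoreChange record above."""
    class_name: str
    old_score: float
    new_score: float
    template_id: Optional[str]
    timestamp: datetime
    author: str
    rationale: str


def append_audit_record(stream: IO[str], change: ScoreChange) -> None:
    """Write one change as a single JSON line."""
    record = asdict(change)
    # datetime is not JSON-serializable, so store it as an ISO 8601 string.
    record["timestamp"] = change.timestamp.isoformat()
    stream.write(json.dumps(record, sort_keys=True) + "\n")


# io.StringIO stands in for an append-mode log file.
log = io.StringIO()
append_audit_record(log, ScoreChange(
    class_name="Archive",
    old_score=0.5,
    new_score=0.85,
    template_id="archive_search",
    timestamp=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    author="kempersc",
    rationale="Increased based on user feedback",
))
parsed = json.loads(log.getvalue())
```

One line per change keeps the log append-only and easy to filter by class, template, or author.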

---

## Pattern Summary

| Pattern | Purpose | Key Benefit |
|---------|---------|-------------|
| **Strategy** | Different scoring algorithms per template | Open/Closed Principle |
| **Decorator** | Layer score modifications | Flexible composition |
| **Observer** | Notify components of changes | Loose coupling |
| **Repository** | Abstract score persistence | Testability |
| **Command** | Encapsulate score updates | Undo/Redo, Audit |

---

## Implementation Priority

1. **Repository Pattern** - Foundation for score storage (Week 1)
2. **Strategy Pattern** - Template-specific scoring (Week 1)
3. **Command Pattern** - Score updates with audit (Week 2)
4. **Observer Pattern** - Cross-component updates (Week 2)
5. **Decorator Pattern** - Score modifiers (Week 3)

---

## References

- Gang of Four: Design Patterns
- Martin Fowler: Patterns of Enterprise Application Architecture
- LinkML Documentation: Annotations
- Project: `docs/plan/prompt-query_template_mapping/design-patterns.md`