# Specificity Score System - UML Visualization Integration

## Overview

This document describes how specificity scores integrate with UML diagram generation to create filtered, readable visualizations of the Heritage Custodian Ontology's 304+ classes.

> **INTEGRATION NOTE**: This document references **context templates** (e.g., `archive_search`, `museum_search`) used for UML filtering. These context templates are **mapped from** the existing SPARQL templates in `backend/rag/template_sparql.py`. See the mapping table below.

---

## SPARQL → Context Template Mapping (for UML Views)

The existing `TemplateClassifier` in `backend/rag/template_sparql.py:1104` maps questions to SPARQL template IDs. For UML visualization, these IDs are in turn mapped to context templates:

| SPARQL Template ID | Context Template | UML View Focus |
|--------------------|------------------|----------------|
| `list_institutions_by_type_city` | `location_browse` | Institution + Location classes |
| `list_institutions_by_type_region` | `location_browse` | Institution + Region classes |
| `find_institution_by_identifier` | `identifier_lookup` | Identifier + GHCID classes |
| `find_institutions_by_founding_date` | `organizational_change` | ChangeEvent + Timeline classes |
| `list_institutions_by_collection_type` | `collection_discovery` | Collection + Subject classes |
| `find_person_by_role` | `person_research` | Person + Staff + Role classes |
| `none` (default) | `general_heritage` | Core heritage classes |

**Institution Type Refinement**: When a SPARQL template extracts an `institution_type` slot, the context template is refined further:
- `institution_type = A` → refine to `archive_search`
- `institution_type = M` → refine to `museum_search`
- `institution_type = L` → refine to `library_search`

---

## Problem Statement

### Current UML Challenges

The Heritage Custodian Ontology contains **304+ classes**, making full UML diagrams:

1. **Visually overwhelming** - Too many nodes to comprehend
2. **Difficult to navigate** - No clear entry point or hierarchy
3. **Context-blind** - Shows everything regardless of current task
4. **Slow to render** - Large graphs take seconds to generate

### Solution: Specificity-Based Filtering

Use specificity scores to:
- Filter classes by relevance threshold
- Adjust visual prominence (opacity, size, position)
- Create template-specific focused views
- Generate progressive disclosure diagrams

---

## Filtering Strategies

### Strategy 1: Threshold Filtering

Show only classes above a specificity threshold:

```python
def filter_classes_by_threshold(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> list[str]:
    """Return class names meeting specificity threshold."""
    included = []
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included.append(class_name)
    
    return included


# Example: archive_search with threshold 0.6
# Returns: ["Archive", "RecordSet", "Fonds", "FindingAid", 
#           "Collection", "Location", "HeritageCustodian", ...]
```
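The snippets in this document call `get_template_score` without defining it. To try them in isolation, a stand-in along the following lines may help. It is only a sketch: the toy score tables, the fallback to a general score, and the `default` parameter are all assumptions made for illustration, and the real function may read its scores from elsewhere (for example, from schema annotations).

```python
from types import SimpleNamespace

# Hypothetical toy data: {class name: {template id: score}}.
TEMPLATE_SCORES = {
    "Archive": {"archive_search": 0.95, "museum_search": 0.20},
    "Fonds": {"archive_search": 0.85},
}

# Hypothetical general scores, used when no template-specific score exists.
GENERAL_SCORES = {"Archive": 0.70, "Fonds": 0.40, "Location": 0.60}


def get_template_score(cls, template_id: str, default: float = 0.0) -> float:
    """Stand-in lookup: template score, else general score, else default."""
    per_template = TEMPLATE_SCORES.get(cls.name, {})
    if template_id in per_template:
        return per_template[template_id]
    return GENERAL_SCORES.get(cls.name, default)


# Any object with a `name` attribute works as a class definition here.
archive = SimpleNamespace(name="Archive")
print(get_template_score(archive, "archive_search"))
```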

**Threshold Guidelines:**

| Threshold | Result | Use Case |
|-----------|--------|----------|
| 0.8+ | 5-10 classes | Focused overview, quick reference |
| 0.6+ | 15-30 classes | Detailed task view |
| 0.4+ | 40-60 classes | Comprehensive view |
| 0.2+ | 80-150 classes | Near-complete (excludes technical) |
| 0.0+ | All 304+ classes | Full ontology (overwhelming) |
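The class counts in the guideline table depend on the actual score distribution, so it can be worth measuring them before picking a default. A minimal sketch, assuming the scores for one template have already been collected into a plain dictionary; `count_classes_per_threshold` and the toy scores are hypothetical.

```python
def count_classes_per_threshold(
    scores: dict[str, float],
    thresholds: list[float],
) -> dict[float, int]:
    """Count how many classes each candidate threshold would keep."""
    return {
        threshold: sum(1 for score in scores.values() if score >= threshold)
        for threshold in thresholds
    }


# Toy scores for illustration only.
toy_scores = {
    "Archive": 0.95,
    "Fonds": 0.85,
    "Collection": 0.65,
    "Location": 0.45,
    "ApiEndpoint": 0.10,
}

counts = count_classes_per_threshold(toy_scores, [0.8, 0.6, 0.4, 0.2, 0.0])
for threshold, count in counts.items():
    print(f"threshold {threshold:.1f}: {count} classes")
```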

---

### Strategy 2: Top-N Filtering

Show the N most relevant classes for a template:

```python
def top_n_classes(
    schema: SchemaView,
    template_id: str,
    n: int = 20
) -> list[str]:
    """Return top N classes by specificity for template."""
    class_scores = []
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        class_scores.append((class_name, score))
    
    # Sort by score descending
    class_scores.sort(key=lambda x: x[1], reverse=True)
    
    return [name for name, _ in class_scores[:n]]


# Example: top_n_classes(schema, "person_research", n=10)
# Returns: ["PersonProfile", "Staff", "Role", "Affiliation", 
#           "Director", "HeritageCustodian", "ContactPoint", ...]
```
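When several classes share the same score, the order in which they appear in the top-N list depends on iteration order. If a reproducible diagram matters, ties can be broken by class name. The sketch below assumes the scores are already available as a dictionary; `top_n_from_scores` is a hypothetical helper, not part of the existing code.

```python
import heapq


def top_n_from_scores(scores: dict[str, float], n: int = 20) -> list[str]:
    """Return the n highest-scoring names, breaking ties alphabetically."""
    # Sorting on (-score, name) puts high scores first and orders ties by name.
    best = heapq.nsmallest(
        n, scores.items(), key=lambda item: (-item[1], item[0])
    )
    return [name for name, _ in best]


toy_scores = {"Staff": 0.9, "Role": 0.9, "PersonProfile": 0.95, "Location": 0.3}
print(top_n_from_scores(toy_scores, n=3))
```

`heapq.nsmallest` avoids sorting the full list when `n` is much smaller than the number of classes, although for a few hundred classes the difference is negligible.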

---

### Strategy 3: Tier-Based Grouping

Group classes into visual tiers based on score ranges:

```python
from dataclasses import dataclass
from enum import Enum

class VisualTier(Enum):
    PRIMARY = "primary"        # score >= 0.8
    SECONDARY = "secondary"    # 0.5 <= score < 0.8
    TERTIARY = "tertiary"      # 0.3 <= score < 0.5
    BACKGROUND = "background"  # score < 0.3

@dataclass
class TieredClass:
    name: str
    score: float
    tier: VisualTier

def tier_classes(
    schema: SchemaView,
    template_id: str
) -> dict[VisualTier, list[TieredClass]]:
    """Group classes into visual tiers."""
    tiers = {tier: [] for tier in VisualTier}
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= 0.8:
            tier = VisualTier.PRIMARY
        elif score >= 0.5:
            tier = VisualTier.SECONDARY
        elif score >= 0.3:
            tier = VisualTier.TERTIARY
        else:
            tier = VisualTier.BACKGROUND
        
        tiers[tier].append(TieredClass(class_name, score, tier))
    
    return tiers
```
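Once classes are grouped into tiers, each tier can carry a fixed set of Graphviz node attributes instead of a continuously varying style. The following is a sketch under that assumption; the `TIER_STYLES` values are illustrative choices, and `score_to_tier` and `style_for_score` are hypothetical helpers that repeat the boundaries used above.

```python
from enum import Enum


class VisualTier(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"
    BACKGROUND = "background"


# Illustrative Graphviz node attributes per tier (values are assumptions).
TIER_STYLES = {
    VisualTier.PRIMARY: {"style": "filled,bold", "penwidth": "3", "fontsize": "16"},
    VisualTier.SECONDARY: {"style": "filled", "penwidth": "2", "fontsize": "14"},
    VisualTier.TERTIARY: {"style": "filled", "penwidth": "1", "fontsize": "12"},
    VisualTier.BACKGROUND: {"style": "dashed", "penwidth": "1", "fontsize": "10"},
}


def score_to_tier(score: float) -> VisualTier:
    """Map a score to its tier using the same cut-offs as tier_classes."""
    if score >= 0.8:
        return VisualTier.PRIMARY
    if score >= 0.5:
        return VisualTier.SECONDARY
    if score >= 0.3:
        return VisualTier.TERTIARY
    return VisualTier.BACKGROUND


def style_for_score(score: float) -> dict[str, str]:
    """Return a copy of the node attributes for the score's tier."""
    return dict(TIER_STYLES[score_to_tier(score)])
```

Discrete tiers tend to be easier to explain in a legend than a continuous gradient, at the cost of hiding small score differences within a tier.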

---

## Visual Styling

### Style 1: Opacity Mapping

Map specificity score to node opacity:

```python
def score_to_opacity(score: float) -> str:
    """Convert score to a two-digit hex alpha value (20-FF)."""
    # Score 1.0 -> fully opaque (FF)
    # Score 0.0 -> nearly transparent (20)
    score = min(max(score, 0.0), 1.0)  # Clamp so the result stays two hex digits
    opacity = int(32 + (score * 223))  # Range: 32-255
    return f"{opacity:02X}"

def style_node_by_score(
    class_name: str,
    score: float,
    base_color: str = "#4A90D9"
) -> dict:
    """Generate node styling based on specificity score."""
    opacity = score_to_opacity(score)
    
    return {
        "fillcolor": f"{base_color}{opacity}",
        "style": "filled",
        "fontcolor": "#333333" if score > 0.5 else "#888888",
        "penwidth": str(1 + score * 2),  # 1-3px border
    }
```

**Visual Result** (opacity computed from `32 + score * 223`, out of 255):

| Score | Opacity | Appearance |
|-------|---------|------------|
| 0.95 | ~95% | Solid, prominent |
| 0.75 | ~78% | Clear, visible |
| 0.50 | ~56% | Semi-transparent |
| 0.25 | ~34% | Faded, background |
| 0.10 | ~21% | Faint |

---

### Style 2: Size Mapping

Adjust node size based on importance:

```python
def score_to_size(score: float) -> tuple[float, float]:
    """Convert score to width and height."""
    # Base size 1.0, max size 2.5
    size = 1.0 + (score * 1.5)
    return (size, size * 0.6)  # Width, height

def style_node_with_size(
    class_name: str,
    score: float
) -> dict:
    """Generate node styling with size variation."""
    width, height = score_to_size(score)
    
    return {
        "width": str(width),
        "height": str(height),
        "fontsize": str(int(10 + score * 6)),  # 10-16pt
    }
```

---

### Style 3: Color Gradients

Use color to indicate relevance:

```python
from colorsys import hsv_to_rgb

def score_to_color(score: float) -> str:
    """Map score to color gradient (red -> yellow -> green)."""
    # Hue: 0.0 (red) -> 0.33 (green)
    hue = score * 0.33
    saturation = 0.6
    value = 0.9
    
    r, g, b = hsv_to_rgb(hue, saturation, value)
    return f"#{int(r*255):02X}{int(g*255):02X}{int(b*255):02X}"


def style_node_with_color(class_name: str, score: float) -> dict:
    """Generate node styling with score-based color."""
    return {
        "fillcolor": score_to_color(score),
        "style": "filled",
    }
```

**Color Mapping:**

| Score | Color | Meaning |
|-------|-------|---------|
| 0.9+ | Green | Highly relevant |
| 0.6-0.9 | Yellow-Green | Relevant |
| 0.3-0.6 | Yellow | Somewhat relevant |
| 0.1-0.3 | Orange | Low relevance |
| <0.1 | Red | Not relevant |

---

## Diagram Generation

### Graphviz DOT Generation

```python
from graphviz import Digraph

def generate_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
    show_edges: bool = True
) -> Digraph:
    """Generate UML diagram filtered by specificity."""
    
    dot = Digraph(
        name=f"Heritage Ontology - {template_id}",
        comment=f"Classes with specificity >= {threshold}",
    )
    
    # Graph attributes
    dot.attr(
        rankdir="TB",  # Top to bottom
        splines="ortho",
        nodesep="0.5",
        ranksep="0.8",
    )
    
    # Node defaults
    dot.attr("node", shape="record", fontname="Helvetica")
    
    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included_classes.add(class_name)
            
            # Add node with styling
            style = style_node_by_score(class_name, score)
            label = create_uml_label(cls)
            dot.node(class_name, label=label, **style)
    
    # Add edges (inheritance, relationships)
    if show_edges:
        for class_name in included_classes:
            cls = schema.get_class(class_name)
            
            # Inheritance (is_a)
            if cls.is_a and cls.is_a in included_classes:
                dot.edge(cls.is_a, class_name, arrowhead="empty")
            
            # Associations (slot ranges)
            for slot in cls.slots or []:
                slot_def = schema.get_slot(slot)
                if slot_def and slot_def.range in included_classes:
                    dot.edge(
                        class_name,
                        slot_def.range,
                        # Graphviz does not place plain edge labels when
                        # splines="ortho", so use xlabel instead
                        xlabel=slot,
                        arrowhead="open"
                    )
    
    return dot


def create_uml_label(cls) -> str:
    """Create UML class label with attributes."""
    slots = []
    for slot_name in cls.slots or []:
        slots.append(f"+ {slot_name}")
    
    slot_str = "\\l".join(slots) if slots else ""
    
    return f"{{{cls.name}|{slot_str}\\l}}"
```
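Graphviz record labels treat `{`, `}`, `|`, `<`, and `>` as structural characters, so a class or slot name containing any of them would corrupt the node layout unless it is escaped with a backslash. The sketch below shows one way to build the label defensively. `escape_record_text` and `create_safe_uml_label` are hypothetical helpers, and the function takes plain strings rather than a class definition so that it runs without a schema.

```python
import re

# Characters that have structural meaning inside a record label.
_RECORD_SPECIAL = re.compile(r"([{}|<>])")


def escape_record_text(text: str) -> str:
    """Prefix record-label metacharacters with a backslash."""
    return _RECORD_SPECIAL.sub(r"\\\1", text)


def create_safe_uml_label(class_name: str, slot_names: list[str]) -> str:
    """Build a two-compartment record label with escaped names."""
    name = escape_record_text(class_name)
    # "\l" left-justifies each attribute line in Graphviz.
    slot_str = "".join(f"+ {escape_record_text(slot)}\\l" for slot in slot_names)
    return f"{{{name}|{slot_str}}}"


print(create_safe_uml_label("Archive", ["name", "has<part>"]))
```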

---

### PlantUML Generation

```python
def generate_plantuml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> str:
    """Generate PlantUML diagram filtered by specificity."""
    
    lines = [
        "@startuml",
        f"title Heritage Ontology - {template_id}",
        "skinparam classAttributeIconSize 0",
        "skinparam shadowing false",
        "",
    ]
    
    # Define color scale
    lines.append("skinparam class {")
    lines.append("  BackgroundColor<<high>> #90EE90")
    lines.append("  BackgroundColor<<medium>> #FFFACD")
    lines.append("  BackgroundColor<<low>> #FFB6C1")
    lines.append("}")
    lines.append("")
    
    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included_classes.add(class_name)
            
            # Determine stereotype
            if score >= 0.8:
                stereotype = "<<high>>"
            elif score >= 0.5:
                stereotype = "<<medium>>"
            else:
                stereotype = "<<low>>"
            
            # Class definition
            lines.append(f"class {class_name} {stereotype} {{")
            for slot in cls.slots or []:
                lines.append(f"  +{slot}")
            lines.append("}")
    
    lines.append("")
    
    # Add relationships
    for class_name in included_classes:
        cls = schema.get_class(class_name)
        
        # Inheritance
        if cls.is_a and cls.is_a in included_classes:
            lines.append(f"{cls.is_a} <|-- {class_name}")
    
    lines.append("@enduml")
    
    return "\n".join(lines)
```
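The PlantUML generator above emits inheritance only, whereas the Graphviz version also draws associations from slot ranges. If parity is wanted, association lines could be produced along these lines. This is a sketch that assumes slot ranges have been extracted into a nested dictionary first; `plantuml_association_lines` and the toy data are hypothetical.

```python
def plantuml_association_lines(
    slot_ranges: dict[str, dict[str, str]],
    included: set[str],
) -> list[str]:
    """Emit 'A --> B : slot' lines for associations between included classes.

    slot_ranges maps class name -> {slot name: range class name}.
    """
    lines = []
    for class_name in sorted(included):
        for slot_name, range_name in sorted(slot_ranges.get(class_name, {}).items()):
            # Skip ranges that were filtered out or are not classes.
            if range_name in included:
                lines.append(f"{class_name} --> {range_name} : {slot_name}")
    return lines


toy_ranges = {
    "Archive": {"has_collection": "Collection", "located_at": "Location"},
    "Collection": {"title": "string"},
}
print(plantuml_association_lines(toy_ranges, {"Archive", "Collection"}))
```

Sorting the class and slot names keeps the generated text stable between runs, which makes diffs of pre-rendered diagrams easier to review.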

---

## Interactive Features

### Feature 1: Progressive Disclosure

Start with high-threshold view, allow drilling down:

```python
class ProgressiveUMLViewer:
    """UML viewer with progressive disclosure based on specificity."""
    
    def __init__(self, schema: SchemaView, template_id: str):
        self.schema = schema
        self.template_id = template_id
        self.current_threshold = 0.8  # Start focused
    
    def render(self) -> Digraph:
        """Render current view."""
        return generate_filtered_uml(
            self.schema,
            self.template_id,
            self.current_threshold
        )
    
    def expand(self, step: float = 0.1):
        """Show more classes by lowering threshold."""
        self.current_threshold = max(0.1, self.current_threshold - step)
        return self.render()
    
    def focus(self, step: float = 0.1):
        """Show fewer classes by raising threshold."""
        self.current_threshold = min(0.95, self.current_threshold + step)
        return self.render()
    
    def expand_around(self, class_name: str, depth: int = 1):
        """Expand to show neighbors of a specific class."""
        # Intended behavior: find the classes connected to class_name
        # within `depth` hops and temporarily lower the threshold for them.
        # The details depend on the visualization framework, so this
        # method is left as a stub.
        raise NotImplementedError("expand_around is framework-specific")
```

---

### Feature 2: Template Switching

Quick view switching between conversation templates:

```python
class MultiTemplateViewer:
    """UML viewer supporting multiple template perspectives."""
    
    TEMPLATES = [
        "archive_search",
        "museum_search", 
        "library_search",
        "collection_discovery",
        "person_research",
        "location_browse",
        "identifier_lookup",
        "organizational_change",
        "digital_platform",
        "general_heritage",
    ]
    
    def __init__(self, schema: SchemaView):
        self.schema = schema
        self.current_template = "general_heritage"
        self.threshold = 0.5
    
    def switch_template(self, template_id: str) -> Digraph:
        """Switch to a different template perspective."""
        if template_id not in self.TEMPLATES:
            raise ValueError(f"Unknown template: {template_id}")
        
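        self.current_template = template_id
        return self.render()

    def render(self) -> Digraph:
        """Render the current template at the current threshold."""
        return generate_filtered_uml(
            self.schema, self.current_template, self.threshold
        )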
    
    def compare_templates(
        self,
        template_a: str,
        template_b: str
    ) -> tuple[Digraph, Digraph]:
        """Generate side-by-side comparison of two templates."""
        return (
            generate_filtered_uml(self.schema, template_a, self.threshold),
            generate_filtered_uml(self.schema, template_b, self.threshold),
        )
```
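Alongside two rendered diagrams, a comparison is often easier to read as a set difference: which classes appear only in one template's view and which are shared. The sketch below assumes per-template scores are available as dictionaries; `diff_template_views` and the toy scores are hypothetical.

```python
def diff_template_views(
    scores_a: dict[str, float],
    scores_b: dict[str, float],
    threshold: float = 0.5,
) -> dict[str, set[str]]:
    """Split classes into those visible in only one view or in both."""
    visible_a = {name for name, score in scores_a.items() if score >= threshold}
    visible_b = {name for name, score in scores_b.items() if score >= threshold}
    return {
        "only_a": visible_a - visible_b,
        "shared": visible_a & visible_b,
        "only_b": visible_b - visible_a,
    }


archive_scores = {"Archive": 0.95, "Fonds": 0.85, "Location": 0.6, "Exhibition": 0.2}
museum_scores = {"Museum": 0.95, "Exhibition": 0.8, "Location": 0.6, "Fonds": 0.1}

diff = diff_template_views(archive_scores, museum_scores)
print(diff)
```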

---

### Feature 3: Hover Information

Add score metadata to tooltips:

```python
def add_tooltip_info(
    dot: Digraph,
    class_name: str,
    cls,
    template_id: str
) -> None:
    """Add tooltip with specificity information."""
    
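    general_score = get_general_score(cls)
    template_score = get_template_score(cls, template_id)
    # The annotation may be missing, so avoid calling .value on a default
    annotation = cls.annotations.get("specificity_rationale")
    rationale = annotation.value if annotation is not None else "n/a"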
    
    tooltip = f"""Class: {class_name}
General Specificity: {general_score:.2f}
{template_id} Specificity: {template_score:.2f}

Rationale: {rationale}"""
    
    dot.node(
        class_name,
        tooltip=tooltip,
        URL=f"#class-{class_name}",  # Link to documentation
    )
```

---

## Pre-generated Views

### Standard View Set

Generate a set of pre-computed views for common use cases:

```python
from pathlib import Path

STANDARD_VIEWS = {
    # Overview views
    "overview_core": {
        "template": "general_heritage",
        "threshold": 0.7,
        "description": "Core classes for heritage custodian modeling"
    },
    "overview_full": {
        "template": "general_heritage", 
        "threshold": 0.3,
        "description": "Comprehensive view of all semantic classes"
    },
    
    # Task-specific views
    "task_archive_research": {
        "template": "archive_search",
        "threshold": 0.5,
        "description": "Classes relevant for archive research"
    },
    "task_person_lookup": {
        "template": "person_research",
        "threshold": 0.5,
        "description": "Classes for finding people in heritage institutions"
    },
    "task_collection_discovery": {
        "template": "collection_discovery",
        "threshold": 0.5,
        "description": "Classes for exploring collections"
    },
    
    # Technical views
    "technical_identifiers": {
        "template": "identifier_lookup",
        "threshold": 0.6,
        "description": "Identifier and linking classes"
    },
    "technical_platforms": {
        "template": "digital_platform",
        "threshold": 0.6,
        "description": "Digital platform and API classes"
    },
}


def generate_all_standard_views(schema: SchemaView, output_dir: Path):
    """Generate all standard views as SVG files."""
    
    for view_id, config in STANDARD_VIEWS.items():
        dot = generate_filtered_uml(
            schema,
            config["template"],
            config["threshold"]
        )
        
        # Add title and description
        dot.attr(label=config["description"])
        
        # Render to SVG
        output_path = output_dir / f"uml_{view_id}"
        dot.render(output_path, format="svg", cleanup=True)
        
        print(f"Generated: {output_path}.svg")
```

---

## Integration with Frontend

### API Endpoint for Dynamic Diagrams

```python
from fastapi import FastAPI, Query
from fastapi.responses import Response

app = FastAPI()

@app.get("/api/uml/filtered")
async def get_filtered_uml(
    template: str = Query("general_heritage"),
    threshold: float = Query(0.5, ge=0.0, le=1.0),
    format: str = Query("svg", pattern="^(svg|png|dot)$"),  # FastAPI < 0.100 uses regex=
):
    """Generate filtered UML diagram."""
    
    schema = load_schema()
    dot = generate_filtered_uml(schema, template, threshold)
    
    if format == "dot":
        return Response(
            content=dot.source,
            media_type="text/plain"
        )
    else:
        rendered = dot.pipe(format=format)
        media_type = "image/svg+xml" if format == "svg" else "image/png"
        return Response(content=rendered, media_type=media_type)


@app.get("/api/uml/templates")
async def list_available_templates():
    """List available template perspectives."""
    return {
        "templates": [
            {"id": "archive_search", "name": "Archive Search"},
            {"id": "museum_search", "name": "Museum Search"},
            {"id": "library_search", "name": "Library Search"},
            {"id": "collection_discovery", "name": "Collection Discovery"},
            {"id": "person_research", "name": "Person Research"},
            {"id": "location_browse", "name": "Location Browse"},
            {"id": "identifier_lookup", "name": "Identifier Lookup"},
            {"id": "organizational_change", "name": "Organizational Change"},
            {"id": "digital_platform", "name": "Digital Platform"},
            {"id": "general_heritage", "name": "General (Default)"},
        ]
    }
```

### React Component Example

```tsx
// components/FilteredUMLViewer.tsx
import { useState } from 'react';

interface Props {
  initialTemplate?: string;
  initialThreshold?: number;
}

export function FilteredUMLViewer({
  initialTemplate = 'general_heritage',
  initialThreshold = 0.5,
}: Props) {
  const [template, setTemplate] = useState(initialTemplate);
  const [threshold, setThreshold] = useState(initialThreshold);
  
  const umlUrl = `/api/uml/filtered?template=${template}&threshold=${threshold}&format=svg`;
  
  return (
    <div className="uml-viewer">
      <div className="controls">
        <select
          value={template}
          onChange={(e) => setTemplate(e.target.value)}
        >
          <option value="archive_search">Archive Search</option>
          <option value="museum_search">Museum Search</option>
          <option value="person_research">Person Research</option>
          <option value="general_heritage">General</option>
          {/* ... more templates */}
        </select>
        
        <input
          type="range"
          min="0"
          max="1"
          step="0.1"
          value={threshold}
          onChange={(e) => setThreshold(parseFloat(e.target.value))}
        />
        <span>Threshold: {threshold.toFixed(1)}</span>
      </div>
      
      <div className="diagram">
        <img src={umlUrl} alt={`UML diagram for ${template}`} />
      </div>
    </div>
  );
}
```

---

## Performance Considerations

### Caching Strategy

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def get_cached_uml(
    template_id: str,
    threshold: float,
    format: str
) -> bytes:
    """Cache rendered UML diagrams."""
    schema = load_schema()
    dot = generate_filtered_uml(schema, template_id, threshold)
    return dot.pipe(format=format)


def invalidate_uml_cache():
    """Clear cache when schema changes."""
    get_cached_uml.cache_clear()
```
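Because `lru_cache` keys on the exact argument values, thresholds such as `0.5` and `0.52` occupy separate cache entries even though they usually produce near-identical diagrams. Snapping the threshold to a fixed step before the lookup is one way to raise the hit rate. This is a sketch only: `quantize_threshold` and `get_uml` are hypothetical, and `_render_cached` uses a placeholder byte string in place of the real `dot.pipe(...)` call.

```python
from functools import lru_cache

render_calls = []  # Records cache misses, for demonstration only.


def quantize_threshold(threshold: float, step: float = 0.1) -> float:
    """Snap a threshold to the nearest multiple of step."""
    return round(round(threshold / step) * step, 2)


@lru_cache(maxsize=100)
def _render_cached(template_id: str, threshold: float, fmt: str) -> bytes:
    # Placeholder for the expensive render step (dot.pipe in the real code).
    render_calls.append((template_id, threshold, fmt))
    return f"{template_id}@{threshold}.{fmt}".encode()


def get_uml(template_id: str, threshold: float, fmt: str = "svg") -> bytes:
    """Look up a rendered diagram using a quantized threshold as cache key."""
    return _render_cached(template_id, quantize_threshold(threshold), fmt)


first = get_uml("archive_search", 0.52)
second = get_uml("archive_search", 0.48)
print(len(render_calls))  # Both requests map to threshold 0.5
```

The trade-off is that the diagram shown may differ slightly from the exact threshold requested, which seems acceptable when the frontend slider already moves in steps of 0.1.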

### Pre-rendering for Production

```bash
#!/bin/bash
# scripts/prerender_uml_views.sh
# Pre-render all standard UML views for production

TEMPLATES="archive_search museum_search library_search collection_discovery person_research location_browse identifier_lookup organizational_change digital_platform general_heritage"
THRESHOLDS="0.3 0.5 0.7 0.9"

for template in $TEMPLATES; do
  for threshold in $THRESHOLDS; do
    echo "Rendering: $template @ $threshold"
    python scripts/render_uml.py \
      --template "$template" \
      --threshold "$threshold" \
      --output "static/uml/${template}_${threshold}.svg"
  done
done

echo "Pre-rendering complete!"
```

---

## Validation Checklist

- [ ] All templates have corresponding view generation
- [ ] Threshold range validated (0.0-1.0)
- [ ] Edge cases handled (no classes meet threshold)
- [ ] Large diagrams render within timeout
- [ ] SVG output is valid XML
- [ ] Interactive features work in target browsers
- [ ] Cache invalidation triggers on schema change
- [ ] Standard views regenerated on deployment
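
Two of the checklist items, threshold range validation and the case where no class meets the threshold, can be covered by a small guard in front of the generator. The following is a sketch under the assumption that an empty diagram should be avoided by relaxing the threshold step by step; `validate_threshold` and `select_with_fallback` are hypothetical helpers.

```python
def validate_threshold(threshold: float) -> float:
    """Reject thresholds outside the 0.0-1.0 range."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"Threshold must be within [0.0, 1.0], got {threshold}")
    return threshold


def select_with_fallback(
    scores: dict[str, float],
    threshold: float,
    step: float = 0.1,
    floor: float = 0.0,
) -> tuple[list[str], float]:
    """Lower the threshold until at least one class qualifies.

    Returns the selected class names and the threshold actually used.
    """
    current = validate_threshold(threshold)
    while True:
        selected = sorted(name for name, score in scores.items() if score >= current)
        if selected or current <= floor:
            return selected, current
        # Round to avoid floating-point drift such as 0.7000000000000001.
        current = max(floor, round(current - step, 2))


names, used = select_with_fallback({"Archive": 0.45, "ApiEndpoint": 0.2}, 0.8)
print(names, used)
```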

---

## References

- `docs/plan/specificity_score/04-prompt-conversation-templates.md` - Template definitions
- `docs/plan/specificity_score/05-dependencies.md` - Visualization dependencies
- `frontend/src/components/` - Existing frontend components
- `scripts/generate_uml.py` - Current UML generation script