# Specificity Score System - UML Visualization Integration

## Overview

This document describes how specificity scores integrate with UML diagram generation to create filtered, readable visualizations of the Heritage Custodian Ontology's 304+ classes.

> **INTEGRATION NOTE**: This document references **context templates** (e.g., `archive_search`, `museum_search`) used for UML filtering. These context templates are **mapped from** the existing SPARQL templates in `backend/rag/template_sparql.py`. See the mapping table below.

---

## SPARQL → Context Template Mapping (for UML Views)

The existing `TemplateClassifier` in `backend/rag/template_sparql.py:1104` classifies questions into SPARQL template IDs. For UML visualization, each ID is mapped to a context template:

| SPARQL Template ID | Context Template | UML View Focus |
|--------------------|------------------|----------------|
| `list_institutions_by_type_city` | `location_browse` | Institution + Location classes |
| `list_institutions_by_type_region` | `location_browse` | Institution + Region classes |
| `find_institution_by_identifier` | `identifier_lookup` | Identifier + GHCID classes |
| `find_institutions_by_founding_date` | `organizational_change` | ChangeEvent + Timeline classes |
| `list_institutions_by_collection_type` | `collection_discovery` | Collection + Subject classes |
| `find_person_by_role` | `person_research` | Person + Staff + Role classes |
| `none` (default) | `general_heritage` | Core heritage classes |

**Institution Type Refinement**: When a SPARQL template extracts the `institution_type` slot, the context template is refined further:

- `institution_type = A` → refine to `archive_search`
- `institution_type = M` → refine to `museum_search`
- `institution_type = L` → refine to `library_search`

---

## Problem Statement

### Current UML Challenges

The Heritage Custodian Ontology contains **304+ classes**, which makes full UML diagrams:

1. **Visually overwhelming** - Too many nodes to comprehend
2. **Difficult to navigate** - No clear entry point or hierarchy
3. **Context-blind** - Shows everything regardless of current task
4. **Slow to render** - Large graphs take seconds to generate

### Solution: Specificity-Based Filtering

Use specificity scores to:

- Filter classes by relevance threshold
- Adjust visual prominence (opacity, size, position)
- Create template-specific focused views
- Generate progressive disclosure diagrams

---

## Filtering Strategies

### Strategy 1: Threshold Filtering

Show only classes whose score is at or above a specificity threshold:

```python
from linkml_runtime.utils.schemaview import SchemaView  # assumed: LinkML's SchemaView

# get_template_score(cls, template_id) -> float is assumed to be defined elsewhere.


def filter_classes_by_threshold(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
) -> list[str]:
    """Return class names meeting specificity threshold."""
    included = []
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included.append(class_name)
    return included


# Example: archive_search with threshold 0.6
# Returns: ["Archive", "RecordSet", "Fonds", "FindingAid",
#           "Collection", "Location", "HeritageCustodian", ...]
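
# --- Illustrative sketch, not part of the original module ------------------
# The functions in this document call get_template_score() without showing it.
# One possible shape, ASSUMING scores are stored as class annotations named
# "specificity_<template_id>" (per template) and "specificity_score" (general).
# Both annotation names and sketch_template_score itself are hypothetical.
from types import SimpleNamespace


def sketch_template_score(cls, template_id: str, default: float = 0.0) -> float:
    """Look up a per-template score, falling back to the general score."""
    annotations = getattr(cls, "annotations", None) or {}
    for key in (f"specificity_{template_id}", "specificity_score"):
        annotation = annotations.get(key)
        if annotation is not None:
            # Clamp to [0, 1] so threshold comparisons behave predictably.
            return min(1.0, max(0.0, float(annotation.value)))
    return default


# Stand-in for a class definition, used only to exercise the sketch.
_demo_cls = SimpleNamespace(
    annotations={"specificity_archive_search": SimpleNamespace(value="0.85")}
)
_demo_score = sketch_template_score(_demo_cls, "archive_search")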
```

**Threshold Guidelines:**

| Threshold | Result | Use Case |
|-----------|--------|----------|
| 0.8+ | 5-10 classes | Focused overview, quick reference |
| 0.6+ | 15-30 classes | Detailed task view |
| 0.4+ | 40-60 classes | Comprehensive view |
| 0.2+ | 80-150 classes | Near-complete (excludes technical) |
| 0.0+ | 304 classes | Full ontology (overwhelming) |

---

### Strategy 2: Top-N Filtering

Show the N most relevant classes for a template:

```python
def top_n_classes(
    schema: SchemaView,
    template_id: str,
    n: int = 20,
) -> list[str]:
    """Return top N classes by specificity for template."""
    class_scores = []
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        class_scores.append((class_name, score))

    # Sort by score descending
    class_scores.sort(key=lambda x: x[1], reverse=True)
    return [name for name, _ in class_scores[:n]]


# Example: top_n_classes(schema, "person_research", n=10)
# Returns: ["PersonProfile", "Staff", "Role", "Affiliation",
#           "Director", "HeritageCustodian", "ContactPoint", ...]
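
# --- Illustrative sketch, not part of the original module ------------------
# The sort above leaves tied scores in schema order. If a reproducible ranking
# matters, ties can be broken by class name. top_n_from_scores is a
# hypothetical helper that works on a plain {class_name: score} mapping;
# the scores in _demo_scores are invented for illustration.
import heapq


def top_n_from_scores(scores: dict[str, float], n: int = 20) -> list[str]:
    """Return the n highest-scoring names; ties are ordered alphabetically."""
    best = heapq.nsmallest(
        n, scores.items(), key=lambda item: (-item[1], item[0])
    )
    return [name for name, _ in best]


_demo_scores = {"Staff": 0.9, "Role": 0.9, "PersonProfile": 0.95, "Location": 0.2}
_demo_top = top_n_from_scores(_demo_scores, n=3)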
```

---

### Strategy 3: Tier-Based Grouping

Group classes into visual tiers based on score ranges:

```python
from dataclasses import dataclass
from enum import Enum


class VisualTier(Enum):
    PRIMARY = "primary"        # Score >= 0.8
    SECONDARY = "secondary"    # 0.5 <= score < 0.8
    TERTIARY = "tertiary"      # 0.3 <= score < 0.5
    BACKGROUND = "background"  # Score < 0.3


@dataclass
class TieredClass:
    name: str
    score: float
    tier: VisualTier


def tier_classes(
    schema: SchemaView,
    template_id: str,
) -> dict[VisualTier, list[TieredClass]]:
    """Group classes into visual tiers."""
    tiers: dict[VisualTier, list[TieredClass]] = {tier: [] for tier in VisualTier}

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)

        if score >= 0.8:
            tier = VisualTier.PRIMARY
        elif score >= 0.5:
            tier = VisualTier.SECONDARY
        elif score >= 0.3:
            tier = VisualTier.TERTIARY
        else:
            tier = VisualTier.BACKGROUND

        tiers[tier].append(TieredClass(class_name, score, tier))

    return tiers
```

---

## Visual Styling

### Style 1: Opacity Mapping

Map specificity score to node opacity:

```python
def score_to_opacity(score: float) -> str:
    """Convert score to a two-digit hex alpha value (20-FF)."""
    # Score 1.0 -> fully opaque (FF)
    # Score 0.0 -> nearly transparent (20)
    score = min(1.0, max(0.0, score))  # keep the result within 20-FF
    opacity = int(32 + (score * 223))  # Range: 32-255
    return f"{opacity:02X}"


def style_node_by_score(
    class_name: str,
    score: float,
    base_color: str = "#4A90D9",
) -> dict:
    """Generate node styling based on specificity score."""
    opacity = score_to_opacity(score)
    return {
        "fillcolor": f"{base_color}{opacity}",  # #RRGGBBAA
        "style": "filled",
        "fontcolor": "#333333" if score > 0.5 else "#888888",
        "penwidth": str(1 + score * 2),  # 1-3px border
    }
```

**Visual Result:**

| Score | Alpha (hex) | Opacity | Appearance |
|-------|-------------|---------|------------|
| 0.95 | `F3` | ~95% | Solid, prominent |
| 0.75 | `C7` | ~78% | Clear, visible |
| 0.50 | `8F` | ~56% | Semi-transparent |
| 0.25 | `57` | ~34% | Faded, background |
| 0.10 | `36` | ~21% | Faint |

Because the alpha channel has a floor of `0x20`, low-scoring classes stay faintly visible instead of disappearing.

---

### Style 2: Size Mapping

Adjust node size based on importance:

```python
def score_to_size(score: float) -> tuple[float, float]:
    """Convert score to width and height."""
    # Base size 1.0, max size 2.5
    size = 1.0 + (score * 1.5)
    return (size, size * 0.6)  # Width, height


def style_node_with_size(
    class_name: str,
    score: float,
) -> dict:
    """Generate node styling with size variation."""
    width, height = score_to_size(score)
    return {
        "width": str(width),
        "height": str(height),
        "fontsize": str(int(10 + score * 6)),  # 10-16pt
    }
```

---

### Style 3: Color Gradients

Use color to indicate relevance:

```python
from colorsys import hsv_to_rgb


def score_to_color(score: float) -> str:
    """Map score to color gradient (red -> yellow -> green)."""
    # Hue: 0.0 (red) -> 0.33 (green)
    hue = score * 0.33
    saturation = 0.6
    value = 0.9

    r, g, b = hsv_to_rgb(hue, saturation, value)
    return f"#{int(r*255):02X}{int(g*255):02X}{int(b*255):02X}"


def style_node_with_color(class_name: str, score: float) -> dict:
    """Generate node styling with score-based color."""
    return {
        "fillcolor": score_to_color(score),
        "style": "filled",
    }
```

**Color Mapping:**

| Score | Color | Meaning |
|-------|-------|---------|
| 0.9+ | Green | Highly relevant |
| 0.6-0.9 | Yellow-Green | Relevant |
| 0.3-0.6 | Yellow | Somewhat relevant |
| 0.1-0.3 | Orange | Low relevance |
| <0.1 | Red | Not relevant |

---

## Diagram Generation

### Graphviz DOT Generation

```python
from graphviz import Digraph


def generate_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
    show_edges: bool = True,
) -> Digraph:
    """Generate UML diagram filtered by specificity."""
    dot = Digraph(
        name=f"Heritage Ontology - {template_id}",
        comment=f"Classes with specificity >= {threshold}",
    )

    # Graph attributes
    dot.attr(
        rankdir="TB",  # Top to bottom
        splines="ortho",
        nodesep="0.5",
        ranksep="0.8",
    )

    # Node defaults
    dot.attr("node", shape="record", fontname="Helvetica")

    # Get filtered classes
    included_classes: set[str] = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included_classes.add(class_name)

            # Add node with styling
            style = style_node_by_score(class_name, score)
            label = create_uml_label(cls)
            dot.node(class_name, label=label, **style)

    # Add edges (inheritance, relationships)
    if show_edges:
        # Sorted so edge order, and therefore layout, is stable across runs
        for class_name in sorted(included_classes):
            cls = schema.get_class(class_name)

            # Inheritance (is_a): the edge runs parent -> child so parents rank
            # above children; dir="back" draws the hollow triangle at the
            # parent end, as UML generalization requires.
            if cls.is_a and cls.is_a in included_classes:
                dot.edge(cls.is_a, class_name, dir="back", arrowtail="empty")

            # Associations (slot ranges)
            for slot in cls.slots or []:
                slot_def = schema.get_slot(slot)
                if slot_def is None:
                    continue  # skip slots the schema view cannot resolve
                if slot_def.range and slot_def.range in included_classes:
                    dot.edge(
                        class_name,
                        slot_def.range,
                        label=slot,
                        arrowhead="open",
                    )

    return dot


def create_uml_label(cls) -> str:
    """Create UML class label with attributes."""
    slots = []
    for slot_name in cls.slots or []:
        slots.append(f"+ {slot_name}")

    slot_str = "\\l".join(slots) if slots else ""
    return f"{{{cls.name}|{slot_str}\\l}}"
```

---

### PlantUML Generation

```python
def generate_plantuml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
) -> str:
    """Generate PlantUML diagram filtered by specificity."""
    lines = [
        "@startuml",
        f"title Heritage Ontology - {template_id}",
        "skinparam classAttributeIconSize 0",
        "skinparam shadowing false",
        "",
    ]

    # Define color scale. The stereotype names (high/medium/low) are
    # placeholders: the names used in the source text did not survive.
    lines.append("skinparam class {")
    lines.append("  BackgroundColor<<high>> #90EE90")
    lines.append("  BackgroundColor<<medium>> #FFFACD")
    lines.append("  BackgroundColor<<low>> #FFB6C1")
    lines.append("}")
    lines.append("")

    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included_classes.add(class_name)

            # Determine stereotype
            if score >= 0.8:
                stereotype = "<<high>>"
            elif score >= 0.5:
                stereotype = "<<medium>>"
            else:
                stereotype = "<<low>>"

            # Class definition
            lines.append(f"class {class_name} {stereotype} {{")
            for slot in cls.slots or []:
                lines.append(f"  +{slot}")
            lines.append("}")
            lines.append("")

    # Add relationships (sorted so the output is stable across runs)
    for class_name in sorted(included_classes):
        cls = schema.get_class(class_name)

        # Inheritance
        if cls.is_a and cls.is_a in included_classes:
            lines.append(f"{cls.is_a} <|-- {class_name}")

    lines.append("@enduml")
    return "\n".join(lines)
```

---

## Interactive Features

### Feature 1: Progressive Disclosure

Start with a high-threshold view and let the user drill down:

```python
class ProgressiveUMLViewer:
    """UML viewer with progressive disclosure based on specificity."""

    def __init__(self, schema: SchemaView, template_id: str):
        self.schema = schema
        self.template_id = template_id
        self.current_threshold = 0.8  # Start focused

    def render(self) -> Digraph:
        """Render current view."""
        return generate_filtered_uml(
            self.schema,
            self.template_id,
            self.current_threshold,
        )

    def expand(self, step: float = 0.1):
        """Show more classes by lowering threshold."""
        self.current_threshold = max(0.1, self.current_threshold - step)
        return self.render()

    def focus(self, step: float = 0.1):
        """Show fewer classes by raising threshold."""
        self.current_threshold = min(0.95, self.current_threshold + step)
        return self.render()

    def expand_around(self, class_name: str, depth: int = 1):
        """Expand to show neighbors of a specific class."""
        # Intended flow: find the classes connected to class_name within
        # `depth` hops (a _find_neighbors helper, not defined here), then
        # temporarily lower the threshold for those neighbors.
        # The implementation depends on the visualization framework.
        raise NotImplementedError(
            "expand_around depends on the visualization framework"
        )
```

---

### Feature 2: Template Switching

Switch quickly between template perspectives:

```python
class MultiTemplateViewer:
    """UML viewer supporting multiple template perspectives."""

    TEMPLATES = [
        "archive_search",
        "museum_search",
        "library_search",
        "collection_discovery",
        "person_research",
        "location_browse",
        "identifier_lookup",
        "organizational_change",
        "digital_platform",
        "general_heritage",
    ]

    def __init__(self, schema: SchemaView):
        self.schema = schema
        self.current_template = "general_heritage"
        self.threshold = 0.5

    def switch_template(
        self,
        template_id: str,
    ) -> Digraph:
        """Switch to a different template perspective."""
        if template_id not in self.TEMPLATES:
            raise ValueError(f"Unknown template: {template_id}")
        self.current_template = template_id
        return self.render()

    def render(self) -> Digraph:
        """Render the current template at the current threshold."""
        return generate_filtered_uml(
            self.schema, self.current_template, self.threshold
        )

    def compare_templates(
        self,
        template_a: str,
        template_b: str,
    ) -> tuple[Digraph, Digraph]:
        """Generate side-by-side comparison of two templates."""
        return (
            generate_filtered_uml(self.schema, template_a, self.threshold),
            generate_filtered_uml(self.schema, template_b, self.threshold),
        )
```

---

### Feature 3: Hover Information

Add score metadata to tooltips:

```python
def add_tooltip_info(
    dot: Digraph,
    class_name: str,
    cls,
    template_id: str,
) -> None:
    """Add tooltip with specificity information."""
    # get_general_score is assumed to be defined alongside get_template_score.
    general_score = get_general_score(cls)
    template_score = get_template_score(cls, template_id)

    # Guard against a missing annotation: a bare {} default has no .value.
    annotation = cls.annotations.get("specificity_rationale")
    rationale = annotation.value if annotation is not None else "n/a"

    tooltip = "\n".join([
        f"Class: {class_name}",
        f"General Specificity: {general_score:.2f}",
        f"{template_id} Specificity: {template_score:.2f}",
        f"Rationale: {rationale}",
    ])

    dot.node(
        class_name,
        tooltip=tooltip,
        URL=f"#class-{class_name}",  # Link to documentation
    )
```

---

## Pre-generated Views

### Standard View Set

Generate a set of pre-computed views for common use cases:

```python
from pathlib import Path

STANDARD_VIEWS = {
    # Overview views
    "overview_core": {
        "template": "general_heritage",
        "threshold": 0.7,
        "description": "Core classes for heritage custodian modeling",
    },
    "overview_full": {
        "template": "general_heritage",
        "threshold": 0.3,
        "description": "Comprehensive view of all semantic classes",
    },

    # Task-specific views
    "task_archive_research": {
        "template": "archive_search",
        "threshold": 0.5,
        "description": "Classes relevant for archive research",
    },
    "task_person_lookup": {
        "template": "person_research",
        "threshold": 0.5,
        "description": "Classes for finding people in heritage institutions",
    },
    "task_collection_discovery": {
        "template": "collection_discovery",
        "threshold": 0.5,
        "description": "Classes for exploring collections",
    },

    # Technical views
    "technical_identifiers": {
        "template": "identifier_lookup",
        "threshold": 0.6,
        "description": "Identifier and linking classes",
    },
    "technical_platforms": {
        "template": "digital_platform",
        "threshold": 0.6,
        "description": "Digital platform and API classes",
    },
}


def generate_all_standard_views(schema: SchemaView, output_dir: Path):
    """Generate all standard views as SVG files."""
    for view_id, config in STANDARD_VIEWS.items():
        dot = generate_filtered_uml(
            schema,
            config["template"],
            config["threshold"],
        )

        # Add title and description
        dot.attr(label=config["description"])

        # Render to SVG (graphviz appends the ".svg" extension)
        output_path = output_dir / f"uml_{view_id}"
        dot.render(output_path, format="svg", cleanup=True)

        print(f"Generated: {output_path}.svg")
```

---

## Integration with Frontend

### API Endpoint for Dynamic Diagrams

```python
from fastapi import FastAPI, Query
from fastapi.responses import Response

app = FastAPI()


@app.get("/api/uml/filtered")
async def get_filtered_uml(
    template: str = Query("general_heritage"),
    threshold: float = Query(0.5, ge=0.0, le=1.0),
    # Newer FastAPI releases prefer pattern= over the deprecated regex=.
    format: str = Query("svg", regex="^(svg|png|dot)$"),
):
    """Generate filtered UML diagram."""
    schema = load_schema()  # assumed helper, defined elsewhere
    dot = generate_filtered_uml(schema, template, threshold)

    if format == "dot":
        return Response(
            content=dot.source,
            media_type="text/plain",
        )

    rendered = dot.pipe(format=format)
    media_type = "image/svg+xml" if format == "svg" else "image/png"
    return Response(content=rendered, media_type=media_type)


@app.get("/api/uml/templates")
async def list_available_templates():
    """List available template perspectives."""
    return {
        "templates": [
            {"id": "archive_search", "name": "Archive Search"},
            {"id": "museum_search", "name": "Museum Search"},
            {"id": "library_search", "name": "Library Search"},
            {"id": "collection_discovery", "name": "Collection Discovery"},
            {"id": "person_research", "name": "Person Research"},
            {"id": "location_browse", "name": "Location Browse"},
            {"id": "identifier_lookup", "name": "Identifier Lookup"},
            {"id": "organizational_change", "name": "Organizational Change"},
            {"id": "digital_platform", "name": "Digital Platform"},
            {"id": "general_heritage", "name": "General (Default)"},
        ]
    }
```

### React Component Example

```tsx
// components/FilteredUMLViewer.tsx
import { useState } from 'react';

interface Props {
  initialTemplate?: string;
  initialThreshold?: number;
}

export function FilteredUMLViewer({
  initialTemplate = 'general_heritage',
  initialThreshold = 0.5,
}: Props) {
  const [template, setTemplate] = useState(initialTemplate);
  const [threshold, setThreshold] = useState(initialThreshold);

  const umlUrl = `/api/uml/filtered?template=${encodeURIComponent(template)}&threshold=${threshold}&format=svg`;

  // NOTE: the elements below are a minimal reconstruction around the
  // surviving handler, label text, and image; element choices and the alt
  // text are assumptions. A template selector wired to setTemplate would
  // also belong here.
  return (
    <div>
      <input
        type="range"
        min="0"
        max="1"
        step="0.1"
        value={threshold}
        onChange={(e) => setThreshold(parseFloat(e.target.value))}
      />
      <span>Threshold: {threshold.toFixed(1)}</span>
      <img src={umlUrl} alt={`UML diagram for ${template}`} />
    </div>
  );
}
```

---

## Performance Considerations

### Caching Strategy

```python
from functools import lru_cache


@lru_cache(maxsize=100)
def get_cached_uml(
    template_id: str,
    threshold: float,
    format: str,
) -> bytes:
    """Cache rendered UML diagrams."""
    schema = load_schema()
    dot = generate_filtered_uml(schema, template_id, threshold)
    return dot.pipe(format=format)


def invalidate_uml_cache():
    """Clear cache when schema changes."""
    get_cached_uml.cache_clear()
```

### Pre-rendering for Production

```bash
#!/bin/bash
# scripts/prerender_uml_views.sh
# Pre-render all standard UML views for production
set -euo pipefail

TEMPLATES="archive_search museum_search library_search collection_discovery
           person_research location_browse identifier_lookup
           organizational_change digital_platform general_heritage"
THRESHOLDS="0.3 0.5 0.7 0.9"

for template in $TEMPLATES; do
    for threshold in $THRESHOLDS; do
        echo "Rendering: $template @ $threshold"
        python scripts/render_uml.py \
            --template "$template" \
            --threshold "$threshold" \
            --output "static/uml/${template}_${threshold}.svg"
    done
done

echo "Pre-rendering complete!"
```

---

## Validation Checklist

- [ ] All templates have corresponding view generation
- [ ] Threshold range validated (0.0-1.0)
- [ ] Edge cases handled (no classes meet threshold)
- [ ] Large diagrams render within timeout
- [ ] SVG output is valid XML
- [ ] Interactive features work in target browsers
- [ ] Cache invalidation triggers on schema change
- [ ] Standard views regenerated on deployment

---

## References

- `docs/plan/specificity_score/04-prompt-conversation-templates.md` - Template definitions
- `docs/plan/specificity_score/05-dependencies.md` - Visualization dependencies
- `frontend/src/components/` - Existing frontend components
- `scripts/generate_uml.py` - Current UML generation script
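
---

## Appendix: Threshold Normalization Sketch

The "threshold range validated" checklist item and the caching strategy interact: `lru_cache` keys on the exact float it receives, so `0.5` and `0.5000001` would be rendered and cached separately. The following is a minimal sketch of one way to validate a threshold and snap it to a fixed grid before it is used as a cache key. The helper name `normalize_threshold` and the default step of `0.1` are assumptions made for illustration, not part of the existing code.

```python
def normalize_threshold(threshold: float, step: float = 0.1) -> float:
    """Validate a threshold and snap it to the nearest multiple of `step`.

    Hypothetical helper: the name and the default step are illustrative.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be within [0.0, 1.0], got {threshold}")
    if step <= 0:
        raise ValueError("step must be positive")

    # Count whole steps, then round away floating-point noise
    # (for example 6 * 0.1 == 0.6000000000000001).
    steps = round(threshold / step)
    return min(1.0, round(steps * step, 10))


# Nearby inputs collapse onto one value, and therefore one cache entry.
keys = {normalize_threshold(t) for t in (0.5, 0.5000001, 0.52)}
```

Under this sketch, a caller would pass `normalize_threshold(threshold)` rather than the raw value to a cached renderer such as `get_cached_uml`. A coarser step means fewer cache entries at the cost of less precise filtering.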