glam/docs/plan/specificity_score/06-uml-visualization.md
kempersc 242bc8bb35 Add new slots for heritage custodian entities
2026-01-05 00:49:05 +01:00


Specificity Score System - UML Visualization Integration

Overview

This document describes how specificity scores integrate with UML diagram generation to create filtered, readable visualizations of the Heritage Custodian Ontology's 304+ classes.


Problem Statement

Current UML Challenges

The Heritage Custodian Ontology contains 304+ classes, making full UML diagrams:

  1. Visually overwhelming - Too many nodes to comprehend
  2. Difficult to navigate - No clear entry point or hierarchy
  3. Context-blind - Shows everything regardless of current task
  4. Slow to render - Large graphs take seconds to generate

Solution: Specificity-Based Filtering

Use specificity scores to:

  • Filter classes by relevance threshold
  • Adjust visual prominence (opacity, size, position)
  • Create template-specific focused views
  • Generate progressive disclosure diagrams

Filtering Strategies

Strategy 1: Threshold Filtering

Show only classes above a specificity threshold:

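from linkml_runtime.utils.schemaview import SchemaView

# get_template_score() is assumed to be provided by the scoring module.
def filter_classes_by_threshold(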
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> list[str]:
    """Return class names meeting specificity threshold."""
    included = []
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included.append(class_name)
    
    return included


# Example: archive_search with threshold 0.6
# Returns: ["Archive", "RecordSet", "Fonds", "FindingAid", 
#           "Collection", "Location", "HeritageCustodian", ...]

Threshold Guidelines:

| Threshold | Result | Use Case |
|-----------|--------|----------|
| 0.8+ | 5-10 classes | Focused overview, quick reference |
| 0.6+ | 15-30 classes | Detailed task view |
| 0.4+ | 40-60 classes | Comprehensive view |
| 0.2+ | 80-150 classes | Near-complete (excludes technical) |
| 0.0+ | 304 classes | Full ontology (overwhelming) |
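
The class counts per threshold depend on the actual score distribution, so they are worth re-checking against real data. A minimal sketch, assuming the scores for one template have already been collected into a plain dict; `count_by_threshold` is a hypothetical helper and the example scores are made up for illustration:

```python
def count_by_threshold(
    scores: dict[str, float],
    thresholds: tuple[float, ...] = (0.8, 0.6, 0.4, 0.2, 0.0),
) -> dict[float, int]:
    """Count how many classes would survive each candidate threshold."""
    return {
        t: sum(1 for score in scores.values() if score >= t)
        for t in thresholds
    }


# Illustrative scores only; real values come from the schema annotations.
example_scores = {
    "Archive": 0.95,
    "Fonds": 0.85,
    "Location": 0.65,
    "Staff": 0.35,
    "ApiEndpoint": 0.10,
}
counts = count_by_threshold(example_scores)
```

Running this over the real per-template scores would show whether the guideline ranges still hold after scores are re-tuned.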

Strategy 2: Top-N Filtering

Show the N most relevant classes for a template:

def top_n_classes(
    schema: SchemaView,
    template_id: str,
    n: int = 20
) -> list[str]:
    """Return top N classes by specificity for template."""
    class_scores = []
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        class_scores.append((class_name, score))
    
    # Sort by score descending
    class_scores.sort(key=lambda x: x[1], reverse=True)
    
    return [name for name, _ in class_scores[:n]]


# Example: top_n_classes(schema, "person_research", n=10)
# Returns: ["PersonProfile", "Staff", "Role", "Affiliation", 
#           "Director", "HeritageCustodian", "ContactPoint", ...]
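
When several classes share a score, the order produced by a plain sort on score depends on schema iteration order. A possible variant, assuming scores are available as a dict, breaks ties alphabetically so repeated runs give the same list; `top_n_from_scores` is a hypothetical name:

```python
import heapq


def top_n_from_scores(scores: dict[str, float], n: int = 20) -> list[str]:
    """Return the n highest-scoring names; ties are broken alphabetically."""
    # Key sorts by score descending first, then by name ascending.
    ranked = heapq.nsmallest(
        n, scores.items(), key=lambda item: (-item[1], item[0])
    )
    return [name for name, _ in ranked]


# Illustrative scores only.
top_three = top_n_from_scores({"A": 0.5, "B": 0.9, "C": 0.5, "D": 0.1}, n=3)
```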

Strategy 3: Tier-Based Grouping

Group classes into visual tiers based on score ranges:

from dataclasses import dataclass
from enum import Enum

class VisualTier(Enum):
    PRIMARY = "primary"      # Score >= 0.8
    SECONDARY = "secondary"  # Score 0.5-0.8
    TERTIARY = "tertiary"    # Score 0.3-0.5
    BACKGROUND = "background"  # Score < 0.3

@dataclass
class TieredClass:
    name: str
    score: float
    tier: VisualTier

def tier_classes(
    schema: SchemaView,
    template_id: str
) -> dict[VisualTier, list[TieredClass]]:
    """Group classes into visual tiers."""
    tiers = {tier: [] for tier in VisualTier}
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= 0.8:
            tier = VisualTier.PRIMARY
        elif score >= 0.5:
            tier = VisualTier.SECONDARY
        elif score >= 0.3:
            tier = VisualTier.TERTIARY
        else:
            tier = VisualTier.BACKGROUND
        
        tiers[tier].append(TieredClass(class_name, score, tier))
    
    return tiers
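
The tier boundaries are inclusive at their lower bound (a score of exactly 0.8 is primary, exactly 0.5 is secondary). A table-driven sketch of the same mapping, using plain strings instead of the `VisualTier` enum to stay self-contained, makes those boundaries easy to test; `score_to_tier_name` is a hypothetical helper:

```python
from bisect import bisect_right

# Lower bounds of the tertiary, secondary, and primary tiers.
TIER_BOUNDS = [0.3, 0.5, 0.8]
TIER_NAMES = ["background", "tertiary", "secondary", "primary"]


def score_to_tier_name(score: float) -> str:
    """Map a score to its tier name; each lower bound is inclusive."""
    # bisect_right counts how many bounds are <= score.
    return TIER_NAMES[bisect_right(TIER_BOUNDS, score)]
```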

Visual Styling

Style 1: Opacity Mapping

Map specificity score to node opacity:

def score_to_opacity(score: float) -> str:
    """Convert score to hex opacity (00-FF)."""
    # Score 1.0 -> fully opaque (FF)
    # Score 0.0 -> nearly transparent (20)
    opacity = int(32 + (score * 223))  # Range: 32-255
    return f"{opacity:02X}"

def style_node_by_score(
    class_name: str,
    score: float,
    base_color: str = "#4A90D9"
) -> dict:
    """Generate node styling based on specificity score."""
    opacity = score_to_opacity(score)
    
    return {
        "fillcolor": f"{base_color}{opacity}",
        "style": "filled",
        "fontcolor": "#333333" if score > 0.5 else "#888888",
        "penwidth": str(1 + score * 2),  # 1-3px border
    }

Visual Result:

| Score | Opacity | Appearance |
|-------|---------|------------|
| 0.95 | ~95% | Solid, prominent |
| 0.75 | ~78% | Clear, visible |
| 0.50 | ~56% | Semi-transparent |
| 0.25 | ~34% | Faded, background |
| 0.10 | ~21% | Faint but still visible |
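
The effective opacity for a score follows directly from the 32-255 alpha range used in `score_to_opacity`. Because the floor is 32 rather than 0, low scores stay somewhat more visible than the score alone suggests. A small check, where `opacity_percent` is a hypothetical helper that repeats the same formula:

```python
def opacity_percent(score: float) -> int:
    """Effective alpha, as a rounded percentage, for the 32-255 mapping."""
    alpha = int(32 + score * 223)  # same formula as score_to_opacity
    return round(alpha / 255 * 100)


for score in (0.95, 0.75, 0.50, 0.25, 0.10):
    print(f"{score:.2f} -> {opacity_percent(score)}%")
```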

Style 2: Size Mapping

Adjust node size based on importance:

def score_to_size(score: float) -> tuple[float, float]:
    """Convert score to width and height."""
    # Base size 1.0, max size 2.5
    size = 1.0 + (score * 1.5)
    return (size, size * 0.6)  # Width, height

def style_node_with_size(
    class_name: str,
    score: float
) -> dict:
    """Generate node styling with size variation."""
    width, height = score_to_size(score)
    
    return {
        "width": str(width),
        "height": str(height),
        "fontsize": str(int(10 + score * 6)),  # 10-16pt
    }

Style 3: Color Gradients

Use color to indicate relevance:

from colorsys import hsv_to_rgb

def score_to_color(score: float) -> str:
    """Map score to color gradient (red -> yellow -> green)."""
    # Hue: 0.0 (red) -> 0.33 (green)
    hue = score * 0.33
    saturation = 0.6
    value = 0.9
    
    r, g, b = hsv_to_rgb(hue, saturation, value)
    return f"#{int(r*255):02X}{int(g*255):02X}{int(b*255):02X}"


def style_node_with_color(class_name: str, score: float) -> dict:
    """Generate node styling with score-based color."""
    return {
        "fillcolor": score_to_color(score),
        "style": "filled",
    }

Color Mapping:

| Score | Color | Meaning |
|-------|-------|---------|
| 0.9+ | Green | Highly relevant |
| 0.6-0.9 | Yellow-Green | Relevant |
| 0.3-0.6 | Yellow | Somewhat relevant |
| 0.1-0.3 | Orange | Low relevance |
| <0.1 | Red | Not relevant |
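
The gradient assumes scores in the range 0.0-1.0. A defensive variant, sketched here as an assumption rather than part of the original design, clamps out-of-range scores before computing the hue; `score_to_color_clamped` is a hypothetical name:

```python
from colorsys import hsv_to_rgb


def score_to_color_clamped(score: float) -> str:
    """Red-to-green gradient with the score clamped to [0.0, 1.0]."""
    score = min(1.0, max(0.0, score))
    # Hue 0.0 is red, hue 0.33 is green; saturation and value are fixed.
    r, g, b = hsv_to_rgb(score * 0.33, 0.6, 0.9)
    return f"#{int(r * 255):02X}{int(g * 255):02X}{int(b * 255):02X}"


low = score_to_color_clamped(0.0)
high = score_to_color_clamped(1.0)
```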

Diagram Generation

Graphviz DOT Generation

from graphviz import Digraph

def generate_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
    show_edges: bool = True
) -> Digraph:
    """Generate UML diagram filtered by specificity."""
    
    dot = Digraph(
        name=f"Heritage Ontology - {template_id}",
        comment=f"Classes with specificity >= {threshold}",
    )
    
    # Graph attributes
    dot.attr(
        rankdir="TB",  # Top to bottom
        splines="ortho",
        nodesep="0.5",
        ranksep="0.8",
    )
    
    # Node defaults
    dot.attr("node", shape="record", fontname="Helvetica")
    
    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included_classes.add(class_name)
            
            # Add node with styling
            style = style_node_by_score(class_name, score)
            label = create_uml_label(cls)
            dot.node(class_name, label=label, **style)
    
    # Add edges (inheritance, relationships)
    if show_edges:
        for class_name in sorted(included_classes):  # sorted for reproducible output
            cls = schema.get_class(class_name)
            
            # Inheritance (is_a)
            if cls.is_a and cls.is_a in included_classes:
                dot.edge(cls.is_a, class_name, arrowhead="empty")
            
            # Associations (slot ranges)
            for slot in cls.slots or []:
                slot_def = schema.get_slot(slot)
                if slot_def.range and slot_def.range in included_classes:
                    dot.edge(
                        class_name,
                        slot_def.range,
                        label=slot,
                        arrowhead="open"
                    )
    
    return dot


def create_uml_label(cls) -> str:
    """Create UML class label with attributes."""
    slots = []
    for slot_name in cls.slots or []:
        slots.append(f"+ {slot_name}")
    
    slot_str = "\\l".join(slots) if slots else ""
    
    return f"{{{cls.name}|{slot_str}\\l}}"
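
Graphviz record labels treat braces, vertical bars, and angle brackets as structure, so a class or slot name containing one of them would corrupt the label. A minimal sketch of an escaping helper that could be applied to names before they are joined into the label; `escape_record_text` is a hypothetical function, not part of the `graphviz` package:

```python
def escape_record_text(text: str) -> str:
    """Backslash-escape characters that are special in record labels."""
    # The backslash goes first so escapes added later are not doubled.
    for char in ("\\", "{", "}", "|", "<", ">"):
        text = text.replace(char, "\\" + char)
    return text


escaped = escape_record_text("List<T>|None")
```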

PlantUML Generation

def generate_plantuml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> str:
    """Generate PlantUML diagram filtered by specificity."""
    
    lines = [
        "@startuml",
        f"title Heritage Ontology - {template_id}",
        "skinparam classAttributeIconSize 0",
        "skinparam shadowing false",
        "",
    ]
    
    # Define color scale
    lines.append("skinparam class {")
    lines.append("  BackgroundColor<<high>> #90EE90")
    lines.append("  BackgroundColor<<medium>> #FFFACD")
    lines.append("  BackgroundColor<<low>> #FFB6C1")
    lines.append("}")
    lines.append("")
    
    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        
        if score >= threshold:
            included_classes.add(class_name)
            
            # Determine stereotype
            if score >= 0.8:
                stereotype = "<<high>>"
            elif score >= 0.5:
                stereotype = "<<medium>>"
            else:
                stereotype = "<<low>>"
            
            # Class definition
            lines.append(f"class {class_name} {stereotype} {{")
            for slot in cls.slots or []:
                lines.append(f"  +{slot}")
            lines.append("}")
    
    lines.append("")
    
    # Add relationships
    for class_name in sorted(included_classes):  # sorted for reproducible output
        cls = schema.get_class(class_name)
        
        # Inheritance
        if cls.is_a and cls.is_a in included_classes:
            lines.append(f"{cls.is_a} <|-- {class_name}")
    
    lines.append("@enduml")
    
    return "\n".join(lines)
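
The PlantUML generator above emits inheritance only. Associations could be added in the same way as in the Graphviz version. A sketch, assuming the slot ranges have already been extracted into `(source, slot, target)` triples; the helper name and the example triples are hypothetical:

```python
def plantuml_association_lines(
    associations: list[tuple[str, str, str]],
    included_classes: set[str],
) -> list[str]:
    """Render (source, slot, target) triples as PlantUML association lines."""
    lines = []
    for source, slot, target in sorted(associations):
        # Skip edges that would point at a filtered-out class.
        if source in included_classes and target in included_classes:
            lines.append(f"{source} --> {target} : {slot}")
    return lines


# Illustrative triples only; slot names are made up.
example_lines = plantuml_association_lines(
    [("Archive", "located_at", "Location"), ("Archive", "holds", "Fonds")],
    {"Archive", "Fonds"},
)
```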

Interactive Features

Feature 1: Progressive Disclosure

Start with high-threshold view, allow drilling down:

class ProgressiveUMLViewer:
    """UML viewer with progressive disclosure based on specificity."""
    
    def __init__(self, schema: SchemaView, template_id: str):
        self.schema = schema
        self.template_id = template_id
        self.current_threshold = 0.8  # Start focused
    
    def render(self) -> Digraph:
        """Render current view."""
        return generate_filtered_uml(
            self.schema,
            self.template_id,
            self.current_threshold
        )
    
    def expand(self, step: float = 0.1):
        """Show more classes by lowering threshold."""
        self.current_threshold = max(0.1, self.current_threshold - step)
        return self.render()
    
    def focus(self, step: float = 0.1):
        """Show fewer classes by raising threshold."""
        self.current_threshold = min(0.95, self.current_threshold + step)
        return self.render()
    
    def expand_around(self, class_name: str, depth: int = 1):
        """Expand to show neighbors of a specific class."""
        # Not implemented yet. This needs a neighbor lookup (is_a parents,
        # subclasses, and slot ranges within `depth` hops) plus a way to
        # exempt those neighbors from the threshold, which depends on the
        # visualization framework.
        raise NotImplementedError("expand_around is not implemented yet")
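
The neighbor lookup that `expand_around` needs can be sketched as a breadth-first search. This version assumes the class relationships (inheritance and slot ranges) have already been flattened into an undirected adjacency dict; `find_neighbors` and the adjacency format are assumptions, not existing code:

```python
from collections import deque


def find_neighbors(
    adjacency: dict[str, set[str]],
    start: str,
    depth: int = 1,
) -> set[str]:
    """Return classes reachable from start within depth hops (start excluded)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen - {start}


# Illustrative graph: A - B - C
example_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
```

The returned set could then be added to the included classes regardless of their score.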

Feature 2: Template Switching

Quick view switching between conversation templates:

class MultiTemplateViewer:
    """UML viewer supporting multiple template perspectives."""
    
    TEMPLATES = [
        "archive_search",
        "museum_search", 
        "library_search",
        "collection_discovery",
        "person_research",
        "location_browse",
        "identifier_lookup",
        "organizational_change",
        "digital_platform",
        "general_heritage",
    ]
    
    def __init__(self, schema: SchemaView):
        self.schema = schema
        self.current_template = "general_heritage"
        self.threshold = 0.5
    
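    def render(self) -> Digraph:
        """Render the current template at the current threshold."""
        return generate_filtered_uml(
            self.schema, self.current_template, self.threshold
        )
    
    def switch_template(self, template_id: str) -> Digraph:
        """Switch to a different template perspective."""
        if template_id not in self.TEMPLATES:
            raise ValueError(f"Unknown template: {template_id}")
        
        self.current_template = template_id
        return self.render()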
    
    def compare_templates(
        self,
        template_a: str,
        template_b: str
    ) -> tuple[Digraph, Digraph]:
        """Generate side-by-side comparison of two templates."""
        return (
            generate_filtered_uml(self.schema, template_a, self.threshold),
            generate_filtered_uml(self.schema, template_b, self.threshold),
        )
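
Two diagrams side by side can be hard to compare by eye. A complementary sketch, assuming the included class names for each template are available as sets (for example from `filter_classes_by_threshold`), reports which classes are unique to each view; `diff_class_sets` is a hypothetical helper:

```python
def diff_class_sets(
    classes_a: set[str],
    classes_b: set[str],
) -> dict[str, list[str]]:
    """Summarize which classes appear in only one view or in both."""
    return {
        "only_a": sorted(classes_a - classes_b),
        "shared": sorted(classes_a & classes_b),
        "only_b": sorted(classes_b - classes_a),
    }


# Illustrative class sets only.
example_diff = diff_class_sets(
    {"Archive", "Fonds", "HeritageCustodian"},
    {"Staff", "Role", "HeritageCustodian"},
)
```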

Feature 3: Hover Information

Add score metadata to tooltips:

def add_tooltip_info(
    dot: Digraph,
    class_name: str,
    cls,
    template_id: str
) -> None:
    """Add tooltip with specificity information."""
    
    general_score = get_general_score(cls)
    template_score = get_template_score(cls, template_id)
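    # The annotation may be absent; fall back to a placeholder instead of
    # reading .value from a missing entry.
    annotation = cls.annotations.get("specificity_rationale")
    rationale = annotation.value if annotation is not None else "n/a"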
    
    tooltip = f"""Class: {class_name}
General Specificity: {general_score:.2f}
{template_id} Specificity: {template_score:.2f}

Rationale: {rationale}"""
    
    dot.node(
        class_name,
        tooltip=tooltip,
        URL=f"#class-{class_name}",  # Link to documentation
    )

Pre-generated Views

Standard View Set

Generate a set of pre-computed views for common use cases:

from pathlib import Path

STANDARD_VIEWS = {
    # Overview views
    "overview_core": {
        "template": "general_heritage",
        "threshold": 0.7,
        "description": "Core classes for heritage custodian modeling"
    },
    "overview_full": {
        "template": "general_heritage", 
        "threshold": 0.3,
        "description": "Comprehensive view of all semantic classes"
    },
    
    # Task-specific views
    "task_archive_research": {
        "template": "archive_search",
        "threshold": 0.5,
        "description": "Classes relevant for archive research"
    },
    "task_person_lookup": {
        "template": "person_research",
        "threshold": 0.5,
        "description": "Classes for finding people in heritage institutions"
    },
    "task_collection_discovery": {
        "template": "collection_discovery",
        "threshold": 0.5,
        "description": "Classes for exploring collections"
    },
    
    # Technical views
    "technical_identifiers": {
        "template": "identifier_lookup",
        "threshold": 0.6,
        "description": "Identifier and linking classes"
    },
    "technical_platforms": {
        "template": "digital_platform",
        "threshold": 0.6,
        "description": "Digital platform and API classes"
    },
}


def generate_all_standard_views(schema: SchemaView, output_dir: Path):
    """Generate all standard views as SVG files."""
    
    for view_id, config in STANDARD_VIEWS.items():
        dot = generate_filtered_uml(
            schema,
            config["template"],
            config["threshold"]
        )
        
        # Add title and description
        dot.attr(label=config["description"])
        
        # Render to SVG
        output_path = output_dir / f"uml_{view_id}"
        dot.render(output_path, format="svg", cleanup=True)
        
        print(f"Generated: {output_path}.svg")

Integration with Frontend

API Endpoint for Dynamic Diagrams

from fastapi import FastAPI, Query
from fastapi.responses import Response

app = FastAPI()

@app.get("/api/uml/filtered")
async def get_filtered_uml(
    template: str = Query("general_heritage"),
    threshold: float = Query(0.5, ge=0.0, le=1.0),
    # `pattern` replaces the `regex` argument, which recent FastAPI releases deprecate.
    format: str = Query("svg", pattern="^(svg|png|dot)$"),
):
    """Generate filtered UML diagram."""
    
    schema = load_schema()
    dot = generate_filtered_uml(schema, template, threshold)
    
    if format == "dot":
        return Response(
            content=dot.source,
            media_type="text/plain"
        )
    else:
        rendered = dot.pipe(format=format)
        media_type = "image/svg+xml" if format == "svg" else "image/png"
        return Response(content=rendered, media_type=media_type)


@app.get("/api/uml/templates")
async def list_available_templates():
    """List available template perspectives."""
    return {
        "templates": [
            {"id": "archive_search", "name": "Archive Search"},
            {"id": "museum_search", "name": "Museum Search"},
            {"id": "library_search", "name": "Library Search"},
            {"id": "collection_discovery", "name": "Collection Discovery"},
            {"id": "person_research", "name": "Person Research"},
            {"id": "location_browse", "name": "Location Browse"},
            {"id": "identifier_lookup", "name": "Identifier Lookup"},
            {"id": "organizational_change", "name": "Organizational Change"},
            {"id": "digital_platform", "name": "Digital Platform"},
            {"id": "general_heritage", "name": "General (Default)"},
        ]
    }

React Component Example

// components/FilteredUMLViewer.tsx
import { useState } from 'react';

interface Props {
  initialTemplate?: string;
  initialThreshold?: number;
}

export function FilteredUMLViewer({
  initialTemplate = 'general_heritage',
  initialThreshold = 0.5,
}: Props) {
  const [template, setTemplate] = useState(initialTemplate);
  const [threshold, setThreshold] = useState(initialThreshold);
  
  const umlUrl = `/api/uml/filtered?template=${template}&threshold=${threshold}&format=svg`;
  
  return (
    <div className="uml-viewer">
      <div className="controls">
        <select
          value={template}
          onChange={(e) => setTemplate(e.target.value)}
        >
          <option value="archive_search">Archive Search</option>
          <option value="museum_search">Museum Search</option>
          <option value="person_research">Person Research</option>
          <option value="general_heritage">General</option>
          {/* ... more templates */}
        </select>
        
        <input
          type="range"
          min="0"
          max="1"
          step="0.1"
          value={threshold}
          onChange={(e) => setThreshold(parseFloat(e.target.value))}
        />
        <span>Threshold: {threshold.toFixed(1)}</span>
      </div>
      
      <div className="diagram">
        <img src={umlUrl} alt={`UML diagram for ${template}`} />
      </div>
    </div>
  );
}

Performance Considerations

Caching Strategy

from functools import lru_cache

@lru_cache(maxsize=100)
def get_cached_uml(
    template_id: str,
    threshold: float,
    format: str
) -> bytes:
    """Cache rendered UML diagrams."""
    schema = load_schema()
    dot = generate_filtered_uml(schema, template_id, threshold)
    return dot.pipe(format=format)


def invalidate_uml_cache():
    """Clear cache when schema changes."""
    get_cached_uml.cache_clear()
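
Because the threshold is a float, requests for 0.5 and 0.501 occupy separate cache entries even though they usually produce the same diagram. One possible mitigation, offered as a suggestion rather than part of the current design, is to snap the threshold to a fixed step before it reaches the cached function; `quantize_threshold` is a hypothetical helper:

```python
def quantize_threshold(threshold: float, step: float = 0.05) -> float:
    """Snap a threshold to the nearest step so close requests share a key."""
    clamped = min(1.0, max(0.0, threshold))
    # Rounding to 2 decimals assumes step has at most 2 decimal places.
    return round(round(clamped / step) * step, 2)


same_key = quantize_threshold(0.501) == quantize_threshold(0.5)
```

With the default step, at most 21 distinct thresholds exist per template and format, which also makes the `maxsize` of the cache easier to reason about.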

Pre-rendering for Production

#!/bin/bash
# scripts/prerender_uml_views.sh
# Pre-render all standard UML views for production

TEMPLATES="archive_search museum_search library_search collection_discovery person_research location_browse identifier_lookup organizational_change digital_platform general_heritage"
THRESHOLDS="0.3 0.5 0.7 0.9"

for template in $TEMPLATES; do
  for threshold in $THRESHOLDS; do
    echo "Rendering: $template @ $threshold"
    python scripts/render_uml.py \
      --template "$template" \
      --threshold "$threshold" \
      --output "static/uml/${template}_${threshold}.svg"
  done
done

echo "Pre-rendering complete!"
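
The loop above produces one file per template and threshold pair, 40 in total. A small sketch for checking a deployment against that expectation, using the same naming scheme as the script; `planned_outputs` and `missing_outputs` are hypothetical helpers:

```python
from itertools import product

TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]
# Kept as strings so file names match the shell script exactly.
THRESHOLDS = ["0.3", "0.5", "0.7", "0.9"]


def planned_outputs(out_dir: str = "static/uml") -> list[str]:
    """List every SVG path the pre-render script is expected to write."""
    return [
        f"{out_dir}/{template}_{threshold}.svg"
        for template, threshold in product(TEMPLATES, THRESHOLDS)
    ]


def missing_outputs(existing: set[str]) -> list[str]:
    """Return planned paths that are not in the set of existing paths."""
    return sorted(set(planned_outputs()) - existing)
```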

Validation Checklist

  • All templates have corresponding view generation
  • Threshold range validated (0.0-1.0)
  • Edge cases handled (no classes meet threshold)
  • Large diagrams render within timeout
  • SVG output is valid XML
  • Interactive features work in target browsers
  • Cache invalidation triggers on schema change
  • Standard views regenerated on deployment
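
The "SVG output is valid XML" item can be checked with the standard library alone. A minimal sketch that tests well-formedness and the root element only, not full SVG validity; `is_well_formed_svg` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def is_well_formed_svg(data: bytes) -> bool:
    """Return True if data parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # Namespaced roots look like "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

The bytes returned by `dot.pipe(format="svg")` could be passed to this function in a test for each standard view.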

References

  • docs/plan/specificity_score/04-prompt-conversation-templates.md - Template definitions
  • docs/plan/specificity_score/05-dependencies.md - Visualization dependencies
  • frontend/src/components/ - Existing frontend components
  • scripts/generate_uml.py - Current UML generation script