Specificity Score System - UML Visualization Integration
Overview
This document describes how specificity scores integrate with UML diagram generation to create filtered, readable visualizations of the Heritage Custodian Ontology's 304+ classes.
INTEGRATION NOTE: This document references context templates (e.g., archive_search, museum_search) used for UML filtering. These context templates are mapped from the existing SPARQL templates in backend/rag/template_sparql.py. See the mapping table below.
SPARQL → Context Template Mapping (for UML Views)
The existing TemplateClassifier in backend/rag/template_sparql.py:1104 classifies questions to SPARQL template IDs. For UML visualization, these are mapped to context templates:
| SPARQL Template ID | Context Template | UML View Focus |
|---|---|---|
| list_institutions_by_type_city | location_browse | Institution + Location classes |
| list_institutions_by_type_region | location_browse | Institution + Region classes |
| find_institution_by_identifier | identifier_lookup | Identifier + GHCID classes |
| find_institutions_by_founding_date | organizational_change | ChangeEvent + Timeline classes |
| list_institutions_by_collection_type | collection_discovery | Collection + Subject classes |
| find_person_by_role | person_research | Person + Staff + Role classes |
| none (default) | general_heritage | Core heritage classes |
Institution Type Refinement: When a SPARQL template extracts the institution_type slot, the context template is refined further:
- institution_type = A → refine to archive_search
- institution_type = M → refine to museum_search
- institution_type = L → refine to library_search
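The mapping table and the refinement rule can be sketched as a small lookup. This is only an illustration: `to_context_template`, the constant names, and the choice to let a recognized institution_type override the table entry are assumptions, not the behavior of the existing code in backend/rag/template_sparql.py.

```python
from typing import Optional

# Copied from the mapping table in this section.
SPARQL_TO_CONTEXT = {
    "list_institutions_by_type_city": "location_browse",
    "list_institutions_by_type_region": "location_browse",
    "find_institution_by_identifier": "identifier_lookup",
    "find_institutions_by_founding_date": "organizational_change",
    "list_institutions_by_collection_type": "collection_discovery",
    "find_person_by_role": "person_research",
}

# Institution-type refinements listed in this section (A, M, L only).
TYPE_REFINEMENTS = {
    "A": "archive_search",
    "M": "museum_search",
    "L": "library_search",
}

DEFAULT_CONTEXT = "general_heritage"


def to_context_template(
    sparql_template_id: Optional[str],
    slots: Optional[dict] = None,
) -> str:
    """Hypothetical helper: pick the context template for a UML view."""
    institution_type = (slots or {}).get("institution_type")
    # Assumption: a recognized institution type takes priority over the table.
    if institution_type in TYPE_REFINEMENTS:
        return TYPE_REFINEMENTS[institution_type]
    # Unknown or missing template IDs fall back to the default row.
    return SPARQL_TO_CONTEXT.get(sparql_template_id, DEFAULT_CONTEXT)
```

For example, `to_context_template("list_institutions_by_type_city", {"institution_type": "M"})` resolves to `museum_search`, while the same template without a recognized type stays at `location_browse`.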
Problem Statement
Current UML Challenges
The Heritage Custodian Ontology contains 304+ classes, making full UML diagrams:
- Visually overwhelming - Too many nodes to comprehend
- Difficult to navigate - No clear entry point or hierarchy
- Context-blind - Shows everything regardless of current task
- Slow to render - Large graphs take seconds to generate
Solution: Specificity-Based Filtering
Use specificity scores to:
- Filter classes by relevance threshold
- Adjust visual prominence (opacity, size, position)
- Create template-specific focused views
- Generate progressive disclosure diagrams
Filtering Strategies
Strategy 1: Threshold Filtering
Show only classes above a specificity threshold:
# Assumption: SchemaView is the LinkML runtime class
from linkml_runtime.utils.schemaview import SchemaView

def filter_classes_by_threshold(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> list[str]:
    """Return class names meeting specificity threshold."""
    included = []
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        # get_template_score is not defined in this document
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included.append(class_name)
    return included

# Example: archive_search with threshold 0.6
# Returns: ["Archive", "RecordSet", "Fonds", "FindingAid",
#           "Collection", "Location", "HeritageCustodian", ...]
Threshold Guidelines:
| Threshold | Result | Use Case |
|---|---|---|
| 0.8+ | 5-10 classes | Focused overview, quick reference |
| 0.6+ | 15-30 classes | Detailed task view |
| 0.4+ | 40-60 classes | Comprehensive view |
| 0.2+ | 80-150 classes | Near-complete (excludes technical) |
| 0.0+ | 304+ classes | Full ontology (overwhelming) |
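The class counts in the guideline table depend on how scores are actually distributed, so it may help to recompute them from real data. A minimal sketch, assuming the scores for one template have already been collected into a plain dict; `count_by_threshold` and the sample scores are made up for illustration.

```python
def count_by_threshold(
    scores: dict[str, float],
    thresholds: list[float],
) -> dict[float, int]:
    """Count how many classes would survive each threshold."""
    return {
        t: sum(1 for score in scores.values() if score >= t)
        for t in thresholds
    }


# Illustrative scores only; real values come from the schema annotations.
sample_scores = {
    "Archive": 0.95,
    "RecordSet": 0.85,
    "Collection": 0.65,
    "Location": 0.45,
    "ApiEndpoint": 0.15,
}

counts = count_by_threshold(sample_scores, [0.8, 0.6, 0.4, 0.2, 0.0])
for threshold, count in counts.items():
    print(f"{threshold:.1f}+ -> {count} classes")
```

Running this against each template would show whether the ranges in the table still hold after scores are revised.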
Strategy 2: Top-N Filtering
Show the N most relevant classes for a template:
def top_n_classes(
    schema: SchemaView,
    template_id: str,
    n: int = 20
) -> list[str]:
    """Return top N classes by specificity for template."""
    class_scores = []
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        class_scores.append((class_name, score))

    # Sort by score descending
    class_scores.sort(key=lambda x: x[1], reverse=True)
    return [name for name, _ in class_scores[:n]]

# Example: top_n_classes(schema, "person_research", n=10)
# Returns: ["PersonProfile", "Staff", "Role", "Affiliation",
#           "Director", "HeritageCustodian", "ContactPoint", ...]
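When several classes share the same score, the cut at N depends on schema iteration order. One possible variant, sketched here over a plain score dict rather than a SchemaView, breaks ties by class name so that repeated renders pick the same classes. `top_n_from_scores` is a hypothetical name.

```python
import heapq


def top_n_from_scores(scores: dict[str, float], n: int = 20) -> list[str]:
    """Return the n highest-scoring class names, ties broken alphabetically."""
    # Key sorts by descending score first, then by name for determinism.
    best = heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in best]


# Illustrative scores only.
print(top_n_from_scores({"Staff": 0.5, "Role": 0.5, "PersonProfile": 0.9}, n=2))
```

`heapq.nsmallest` avoids sorting the full list, although with a few hundred classes the difference is negligible; the deterministic tie-break is the main point.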
Strategy 3: Tier-Based Grouping
Group classes into visual tiers based on score ranges:
from dataclasses import dataclass
from enum import Enum

class VisualTier(Enum):
    PRIMARY = "primary"        # Score >= 0.8
    SECONDARY = "secondary"    # Score 0.5-0.8
    TERTIARY = "tertiary"      # Score 0.3-0.5
    BACKGROUND = "background"  # Score < 0.3

@dataclass
class TieredClass:
    name: str
    score: float
    tier: VisualTier

def tier_classes(
    schema: SchemaView,
    template_id: str
) -> dict[VisualTier, list[TieredClass]]:
    """Group classes into visual tiers."""
    tiers = {tier: [] for tier in VisualTier}
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= 0.8:
            tier = VisualTier.PRIMARY
        elif score >= 0.5:
            tier = VisualTier.SECONDARY
        elif score >= 0.3:
            tier = VisualTier.TERTIARY
        else:
            tier = VisualTier.BACKGROUND
        tiers[tier].append(TieredClass(class_name, score, tier))
    return tiers
Visual Styling
Style 1: Opacity Mapping
Map specificity score to node opacity:
def score_to_opacity(score: float) -> str:
    """Convert score to hex opacity (20-FF)."""
    # Score 1.0 -> fully opaque (FF)
    # Score 0.0 -> nearly transparent (20)
    opacity = int(32 + (score * 223))  # Range: 32-255
    return f"{opacity:02X}"

def style_node_by_score(
    class_name: str,
    score: float,
    base_color: str = "#4A90D9"
) -> dict:
    """Generate node styling based on specificity score."""
    opacity = score_to_opacity(score)
    return {
        "fillcolor": f"{base_color}{opacity}",
        "style": "filled",
        "fontcolor": "#333333" if score > 0.5 else "#888888",
        "penwidth": str(1 + score * 2),  # 1-3px border
    }
Visual Result:
| Score | Opacity | Appearance |
|---|---|---|
| 0.95 | ~95% | Solid, prominent |
| 0.75 | ~78% | Clear, visible |
| 0.50 | ~56% | Semi-transparent |
| 0.25 | ~34% | Faded, background |
| 0.10 | ~21% | Faint |
Style 2: Size Mapping
Adjust node size based on importance:
def score_to_size(score: float) -> tuple[float, float]:
    """Convert score to width and height."""
    # Base size 1.0, max size 2.5
    size = 1.0 + (score * 1.5)
    return (size, size * 0.6)  # Width, height

def style_node_with_size(
    class_name: str,
    score: float
) -> dict:
    """Generate node styling with size variation."""
    width, height = score_to_size(score)
    return {
        "width": str(width),
        "height": str(height),
        "fontsize": str(int(10 + score * 6)),  # 10-16pt
    }
Style 3: Color Gradients
Use color to indicate relevance:
from colorsys import hsv_to_rgb

def score_to_color(score: float) -> str:
    """Map score to color gradient (red -> yellow -> green)."""
    # Hue: 0.0 (red) -> 0.33 (green)
    hue = score * 0.33
    saturation = 0.6
    value = 0.9
    r, g, b = hsv_to_rgb(hue, saturation, value)
    return f"#{int(r*255):02X}{int(g*255):02X}{int(b*255):02X}"

def style_node_with_color(class_name: str, score: float) -> dict:
    """Generate node styling with score-based color."""
    return {
        "fillcolor": score_to_color(score),
        "style": "filled",
    }
Color Mapping:
| Score | Color | Meaning |
|---|---|---|
| 0.9+ | Green | Highly relevant |
| 0.6-0.9 | Yellow-Green | Relevant |
| 0.3-0.6 | Yellow | Somewhat relevant |
| 0.1-0.3 | Orange | Low relevance |
| <0.1 | Red | Not relevant |
Diagram Generation
Graphviz DOT Generation
from graphviz import Digraph

def generate_filtered_uml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5,
    show_edges: bool = True
) -> Digraph:
    """Generate UML diagram filtered by specificity."""
    dot = Digraph(
        name=f"Heritage Ontology - {template_id}",
        comment=f"Classes with specificity >= {threshold}",
    )

    # Graph attributes
    dot.attr(
        rankdir="TB",  # Top to bottom
        splines="ortho",
        nodesep="0.5",
        ranksep="0.8",
    )

    # Node defaults
    dot.attr("node", shape="record", fontname="Helvetica")

    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included_classes.add(class_name)
            # Add node with styling
            style = style_node_by_score(class_name, score)
            label = create_uml_label(cls)
            dot.node(class_name, label=label, **style)

    # Add edges (inheritance, relationships)
    if show_edges:
        for class_name in included_classes:
            cls = schema.get_class(class_name)
            # Inheritance (is_a)
            if cls.is_a and cls.is_a in included_classes:
                dot.edge(cls.is_a, class_name, arrowhead="empty")
            # Associations (slot ranges)
            for slot in cls.slots or []:
                slot_def = schema.get_slot(slot)
                # Skip slots that cannot be resolved or point outside the view
                if slot_def is not None and slot_def.range in included_classes:
                    dot.edge(
                        class_name,
                        slot_def.range,
                        # xlabel, since Graphviz does not place ordinary
                        # edge labels on orthogonal splines
                        xlabel=slot,
                        arrowhead="open"
                    )
    return dot

def create_uml_label(cls) -> str:
    """Create UML class label with attributes."""
    slots = []
    for slot_name in cls.slots or []:
        slots.append(f"+ {slot_name}")
    slot_str = "\\l".join(slots) if slots else ""
    return f"{{{cls.name}|{slot_str}\\l}}"
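Record-shaped nodes treat braces, vertical bars and angle brackets as layout syntax, so a class or slot name containing one of them would corrupt the label built by create_uml_label. A minimal sketch of an escaping step that could be applied to each name before it is joined into the label; `escape_record_text` is a hypothetical helper, not part of the existing script.

```python
# Characters that are structural inside Graphviz record labels.
RECORD_SPECIALS = "{}|<>"


def escape_record_text(text: str) -> str:
    """Backslash-escape record-label syntax characters in a name."""
    escaped = text
    for ch in RECORD_SPECIALS:
        escaped = escaped.replace(ch, "\\" + ch)
    return escaped


# Illustrative names only.
print(escape_record_text("Archive"))       # plain names pass through unchanged
print(escape_record_text("Map<Key|Val>"))  # special characters get a backslash
```

Names drawn from a typical schema are unlikely to contain these characters, so this is a defensive measure rather than a fix for a known failure.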
PlantUML Generation
def generate_plantuml(
    schema: SchemaView,
    template_id: str,
    threshold: float = 0.5
) -> str:
    """Generate PlantUML diagram filtered by specificity."""
    lines = [
        "@startuml",
        f"title Heritage Ontology - {template_id}",
        "skinparam classAttributeIconSize 0",
        "skinparam shadowing false",
        "",
    ]

    # Define color scale
    lines.append("skinparam class {")
    lines.append("  BackgroundColor<<high>> #90EE90")
    lines.append("  BackgroundColor<<medium>> #FFFACD")
    lines.append("  BackgroundColor<<low>> #FFB6C1")
    lines.append("}")
    lines.append("")

    # Get filtered classes
    included_classes = set()
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        score = get_template_score(cls, template_id)
        if score >= threshold:
            included_classes.add(class_name)
            # Determine stereotype
            if score >= 0.8:
                stereotype = "<<high>>"
            elif score >= 0.5:
                stereotype = "<<medium>>"
            else:
                stereotype = "<<low>>"
            # Class definition
            lines.append(f"class {class_name} {stereotype} {{")
            for slot in cls.slots or []:
                lines.append(f"  +{slot}")
            lines.append("}")
            lines.append("")

    # Add relationships
    for class_name in included_classes:
        cls = schema.get_class(class_name)
        # Inheritance
        if cls.is_a and cls.is_a in included_classes:
            lines.append(f"{cls.is_a} <|-- {class_name}")

    lines.append("@enduml")
    return "\n".join(lines)
Interactive Features
Feature 1: Progressive Disclosure
Start with high-threshold view, allow drilling down:
class ProgressiveUMLViewer:
    """UML viewer with progressive disclosure based on specificity."""

    def __init__(self, schema: SchemaView, template_id: str):
        self.schema = schema
        self.template_id = template_id
        self.current_threshold = 0.8  # Start focused

    def render(self) -> Digraph:
        """Render current view."""
        return generate_filtered_uml(
            self.schema,
            self.template_id,
            self.current_threshold
        )

    def expand(self, step: float = 0.1):
        """Show more classes by lowering threshold."""
        self.current_threshold = max(0.1, self.current_threshold - step)
        return self.render()

    def focus(self, step: float = 0.1):
        """Show fewer classes by raising threshold."""
        self.current_threshold = min(0.95, self.current_threshold + step)
        return self.render()

    def expand_around(self, class_name: str, depth: int = 1):
        """Expand to show neighbors of a specific class."""
        # Intended behavior: find classes connected to the given class
        # (within `depth` hops) and temporarily lower the threshold for them.
        # Implementation depends on visualization framework.
        raise NotImplementedError("expand_around is not implemented yet")
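The neighbor lookup that expand_around needs could be a bounded breadth-first search. The sketch below works on a plain adjacency dict rather than a SchemaView; building that dict from is_a links and slot ranges is left out, and `find_neighbors` and the sample graph are illustrative assumptions.

```python
from collections import deque


def find_neighbors(
    adjacency: dict[str, set[str]],
    start: str,
    depth: int = 1,
) -> set[str]:
    """Return classes reachable from start within depth hops (start excluded)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    seen.discard(start)
    return seen


# Made-up undirected adjacency (inheritance and associations merged).
adjacency = {
    "Archive": {"HeritageCustodian", "RecordSet"},
    "HeritageCustodian": {"Archive", "Location"},
    "RecordSet": {"Archive", "Fonds"},
    "Location": {"HeritageCustodian"},
    "Fonds": {"RecordSet"},
}

print(sorted(find_neighbors(adjacency, "Archive", depth=1)))
```

The returned set could then be unioned with the threshold-filtered classes before rendering, which keeps the focused view intact while pulling in the local context of one class.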
Feature 2: Template Switching
Quick view switching between conversation templates:
class MultiTemplateViewer:
    """UML viewer supporting multiple template perspectives."""

    TEMPLATES = [
        "archive_search",
        "museum_search",
        "library_search",
        "collection_discovery",
        "person_research",
        "location_browse",
        "identifier_lookup",
        "organizational_change",
        "digital_platform",
        "general_heritage",
    ]

    def __init__(self, schema: SchemaView):
        self.schema = schema
        self.current_template = "general_heritage"
        self.threshold = 0.5

    def render(self) -> Digraph:
        """Render the current template at the current threshold."""
        return generate_filtered_uml(
            self.schema, self.current_template, self.threshold
        )

    def switch_template(self, template_id: str) -> Digraph:
        """Switch to a different template perspective."""
        if template_id not in self.TEMPLATES:
            raise ValueError(f"Unknown template: {template_id}")
        self.current_template = template_id
        return self.render()

    def compare_templates(
        self,
        template_a: str,
        template_b: str
    ) -> tuple[Digraph, Digraph]:
        """Generate side-by-side comparison of two templates."""
        return (
            generate_filtered_uml(self.schema, template_a, self.threshold),
            generate_filtered_uml(self.schema, template_b, self.threshold),
        )
Feature 3: Hover Information
Add score metadata to tooltips:
def add_tooltip_info(
    dot: Digraph,
    class_name: str,
    cls,
    template_id: str
) -> None:
    """Add tooltip with specificity information."""
    general_score = get_general_score(cls)
    template_score = get_template_score(cls, template_id)
    # The annotation may be missing, so avoid calling .value on a default
    annotation = cls.annotations.get("specificity_rationale")
    rationale = annotation.value if annotation is not None else "n/a"
    tooltip = "\n".join([
        f"Class: {class_name}",
        f"General Specificity: {general_score:.2f}",
        f"{template_id} Specificity: {template_score:.2f}",
        f"Rationale: {rationale}",
    ])
    dot.node(
        class_name,
        tooltip=tooltip,
        URL=f"#class-{class_name}",  # Link to documentation
    )
Pre-generated Views
Standard View Set
Generate a set of pre-computed views for common use cases:
from pathlib import Path

STANDARD_VIEWS = {
    # Overview views
    "overview_core": {
        "template": "general_heritage",
        "threshold": 0.7,
        "description": "Core classes for heritage custodian modeling"
    },
    "overview_full": {
        "template": "general_heritage",
        "threshold": 0.3,
        "description": "Comprehensive view of all semantic classes"
    },
    # Task-specific views
    "task_archive_research": {
        "template": "archive_search",
        "threshold": 0.5,
        "description": "Classes relevant for archive research"
    },
    "task_person_lookup": {
        "template": "person_research",
        "threshold": 0.5,
        "description": "Classes for finding people in heritage institutions"
    },
    "task_collection_discovery": {
        "template": "collection_discovery",
        "threshold": 0.5,
        "description": "Classes for exploring collections"
    },
    # Technical views
    "technical_identifiers": {
        "template": "identifier_lookup",
        "threshold": 0.6,
        "description": "Identifier and linking classes"
    },
    "technical_platforms": {
        "template": "digital_platform",
        "threshold": 0.6,
        "description": "Digital platform and API classes"
    },
}

def generate_all_standard_views(schema: SchemaView, output_dir: Path):
    """Generate all standard views as SVG files."""
    for view_id, config in STANDARD_VIEWS.items():
        dot = generate_filtered_uml(
            schema,
            config["template"],
            config["threshold"]
        )
        # Add title and description
        dot.attr(label=config["description"])
        # Render to SVG
        output_path = output_dir / f"uml_{view_id}"
        dot.render(output_path, format="svg", cleanup=True)
        print(f"Generated: {output_path}.svg")
Integration with Frontend
API Endpoint for Dynamic Diagrams
from fastapi import FastAPI, Query
from fastapi.responses import Response

app = FastAPI()

@app.get("/api/uml/filtered")
async def get_filtered_uml(
    template: str = Query("general_heritage"),
    threshold: float = Query(0.5, ge=0.0, le=1.0),
    # Note: recent FastAPI releases prefer pattern= over regex=
    format: str = Query("svg", regex="^(svg|png|dot)$"),
):
    """Generate filtered UML diagram."""
    schema = load_schema()
    dot = generate_filtered_uml(schema, template, threshold)
    if format == "dot":
        return Response(
            content=dot.source,
            media_type="text/plain"
        )
    else:
        rendered = dot.pipe(format=format)
        media_type = "image/svg+xml" if format == "svg" else "image/png"
        return Response(content=rendered, media_type=media_type)

@app.get("/api/uml/templates")
async def list_available_templates():
    """List available template perspectives."""
    return {
        "templates": [
            {"id": "archive_search", "name": "Archive Search"},
            {"id": "museum_search", "name": "Museum Search"},
            {"id": "library_search", "name": "Library Search"},
            {"id": "collection_discovery", "name": "Collection Discovery"},
            {"id": "person_research", "name": "Person Research"},
            {"id": "location_browse", "name": "Location Browse"},
            {"id": "identifier_lookup", "name": "Identifier Lookup"},
            {"id": "organizational_change", "name": "Organizational Change"},
            {"id": "digital_platform", "name": "Digital Platform"},
            {"id": "general_heritage", "name": "General (Default)"},
        ]
    }
React Component Example
// components/FilteredUMLViewer.tsx
import { useState } from 'react';

interface Props {
  initialTemplate?: string;
  initialThreshold?: number;
}

export function FilteredUMLViewer({
  initialTemplate = 'general_heritage',
  initialThreshold = 0.5,
}: Props) {
  const [template, setTemplate] = useState(initialTemplate);
  const [threshold, setThreshold] = useState(initialThreshold);
  const umlUrl = `/api/uml/filtered?template=${template}&threshold=${threshold}&format=svg`;

  return (
    <div className="uml-viewer">
      <div className="controls">
        <select
          value={template}
          onChange={(e) => setTemplate(e.target.value)}
        >
          <option value="archive_search">Archive Search</option>
          <option value="museum_search">Museum Search</option>
          <option value="person_research">Person Research</option>
          <option value="general_heritage">General</option>
          {/* ... more templates */}
        </select>
        <input
          type="range"
          min="0"
          max="1"
          step="0.1"
          value={threshold}
          onChange={(e) => setThreshold(parseFloat(e.target.value))}
        />
        <span>Threshold: {threshold.toFixed(1)}</span>
      </div>
      <div className="diagram">
        <img src={umlUrl} alt={`UML diagram for ${template}`} />
      </div>
    </div>
  );
}
Performance Considerations
Caching Strategy
from functools import lru_cache

@lru_cache(maxsize=100)
def get_cached_uml(
    template_id: str,
    threshold: float,
    format: str
) -> bytes:
    """Cache rendered UML diagrams."""
    schema = load_schema()
    dot = generate_filtered_uml(schema, template_id, threshold)
    return dot.pipe(format=format)

def invalidate_uml_cache():
    """Clear cache when schema changes."""
    get_cached_uml.cache_clear()
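Because the cache key includes the raw float threshold, requests for 0.5 and 0.52 occupy separate entries even though they would usually produce near-identical diagrams. One option is to snap the threshold to a fixed step before the cached call. The sketch below uses a stand-in renderer so it runs on its own; `normalize_threshold`, `get_uml`, and `_render_stub` are hypothetical names.

```python
from functools import lru_cache


def normalize_threshold(threshold: float, step: float = 0.1) -> float:
    """Clamp to [0, 1] and snap to the nearest multiple of step."""
    clamped = min(1.0, max(0.0, threshold))
    # The outer round removes float noise such as 0.30000000000000004.
    return round(round(clamped / step) * step, 2)


render_calls = []  # records how often the expensive path actually runs


@lru_cache(maxsize=100)
def _render_stub(template_id: str, threshold: float, fmt: str) -> bytes:
    """Stand-in for the real rendering call, for illustration only."""
    render_calls.append((template_id, threshold, fmt))
    return f"{template_id}@{threshold}.{fmt}".encode()


def get_uml(template_id: str, threshold: float, fmt: str = "svg") -> bytes:
    """Normalize the threshold first so nearby values share a cache entry."""
    return _render_stub(template_id, normalize_threshold(threshold), fmt)
```

With a step of 0.1 this also lines up with the slider increments in the React component, so interactive use would hit at most eleven cache entries per template and format.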
Pre-rendering for Production
#!/bin/bash
# scripts/prerender_uml_views.sh
# Pre-render all standard UML views for production

TEMPLATES="archive_search museum_search library_search collection_discovery person_research location_browse identifier_lookup organizational_change digital_platform general_heritage"
THRESHOLDS="0.3 0.5 0.7 0.9"

for template in $TEMPLATES; do
  for threshold in $THRESHOLDS; do
    echo "Rendering: $template @ $threshold"
    python scripts/render_uml.py \
      --template "$template" \
      --threshold "$threshold" \
      --output "static/uml/${template}_${threshold}.svg"
  done
done

echo "Pre-rendering complete!"
Validation Checklist
- All templates have corresponding view generation
- Threshold range validated (0.0-1.0)
- Edge cases handled (no classes meet threshold)
- Large diagrams render within timeout
- SVG output is valid XML
- Interactive features work in target browsers
- Cache invalidation triggers on schema change
- Standard views regenerated on deployment
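For the checklist item about no classes meeting the threshold, one possible behavior is to lower the threshold step by step until something qualifies, and report the threshold that was actually used. A minimal sketch over a plain score dict; `select_with_fallback` and its parameters are assumptions, and returning an empty diagram with a message would be an equally valid choice.

```python
def select_with_fallback(
    scores: dict[str, float],
    threshold: float,
    min_classes: int = 1,
    step: float = 0.1,
    floor: float = 0.0,
) -> tuple[list[str], float]:
    """Lower the threshold stepwise until at least min_classes qualify."""
    if step <= 0:
        raise ValueError("step must be positive")
    current = threshold
    while True:
        selected = sorted(name for name, s in scores.items() if s >= current)
        # Stop when enough classes qualify or the floor has been reached.
        if len(selected) >= min_classes or current <= floor:
            return selected, current
        # Rounding keeps repeated subtraction from accumulating float noise.
        current = max(floor, round(current - step, 10))


# Illustrative scores only.
classes, used = select_with_fallback({"Archive": 0.55, "Location": 0.35}, 0.9)
print(classes, used)
```

Returning the effective threshold lets the caller show it in the UI, so the user can see that the view was widened automatically.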
References
- docs/plan/specificity_score/04-prompt-conversation-templates.md - Template definitions
- docs/plan/specificity_score/05-dependencies.md - Visualization dependencies
- frontend/src/components/ - Existing frontend components
- scripts/generate_uml.py - Current UML generation script