
Specificity Score System for Heritage Custodian Ontology

Overview

This plan documents the implementation of a specificity scoring system for all classes in the GLAM Heritage Custodian Ontology. The system assigns numerical scores (0.0-1.0) to each class indicating:

General Specificity Score: Context-free relevance indicating whether a class is highly specific or of general relevance
Template-Based Specificity Scores: Multiple scores keyed by prompt/conversation template IDs indicating the likelihood of a class being relevant for follow-up questions

Problem Statement

The Heritage Custodian Ontology contains 304+ classes across multiple modules. When users interact with the RAG system or view UML visualizations, they face information overload:

UML Overwhelm: Visualizations showing all classes are too dense to comprehend
RAG Retrieval Noise: Follow-up questions retrieve irrelevant classes
No Context Sensitivity: Same classes shown regardless of conversation topic
Missing Relevance Signals: No way to filter or highlight based on topic

Example Scenario

Initial Question: "What archives are in Drenthe?"

Current RAG Response: Returns 50+ classes including:

CustodianName (highly relevant for follow-up)
Location (highly relevant)
WebObservation (low relevance for this context)
PersonProfileExtraction (low relevance for this context)

Desired Behavior: Use specificity scores to rank Archive, Location, GHCID, and Collection ahead of low-relevance classes such as WebObservation for follow-up questions.

Solution: Dual-Layer Specificity Scoring

Layer 1: General Specificity Score

A single context-free score (0.0-1.0) stored as a LinkML annotation on each class:

Score Range	Interpretation	Examples
0.9-1.0	Highly specific, rarely needed	`LinkedInConnectionExtraction`, `GHCIDHistoryEntry`
0.7-0.9	Domain-specific	`Archive`, `Museum`, `Collection`
0.5-0.7	Moderately general	`DigitalPlatform`, `ChangeEvent`
0.3-0.5	General utility	`Location`, `Identifier`, `Provenance`
0.0-0.3	Core/foundational	`HeritageCustodian`, `CustodianName`

Lower scores = more generally relevant (always useful).
Higher scores = more specific (only useful in specialized contexts).
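The score bands in the table can be encoded as a small lookup. A minimal sketch, assuming each band includes its lower bound and excludes its upper bound (the table's ranges share endpoints, so this boundary rule is an assumption) and that 1.0 belongs to the top band; `interpret_general_score` is a hypothetical helper name.

```python
# Hypothetical helper: map a general specificity score to the bands in the table.
# Assumption: each band is [lower, upper); 1.0 falls in the top band.
BANDS = [
    (0.9, "Highly specific, rarely needed"),
    (0.7, "Domain-specific"),
    (0.5, "Moderately general"),
    (0.3, "General utility"),
    (0.0, "Core/foundational"),
]


def interpret_general_score(score: float) -> str:
    """Return the interpretation label for a general specificity score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"specificity_score must be in [0.0, 1.0], got {score}")
    # Bands are ordered from the highest lower bound down, so the first match wins.
    for lower, label in BANDS:
        if score >= lower:
            return label
    return BANDS[-1][1]  # not reached for valid input; the 0.0 band matches everything


print(interpret_general_score(0.75))  # Domain-specific
```

With this boundary rule, `Archive` (0.75) lands in the domain-specific band, matching the table's examples.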

Layer 2: Template-Based Specificity Scores

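Multiple scores per class, keyed by conversation template IDs. Unlike the general score, a template score reads as relevance: a higher value means the class is more likely to be needed for follow-up questions in that template.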

```yaml
# Example: Archive class
annotations:
  specificity_score: 0.75  # General score
  template_specificity:
    archive_search: 0.95        # Highly relevant for archive queries
    museum_search: 0.10         # Not relevant for museum queries
    collection_discovery: 0.70  # Moderately relevant
    person_research: 0.20       # Low relevance
    location_browse: 0.60       # Somewhat relevant
```
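Because both layers use the 0.0-1.0 range, a validation step can check a class's annotation block before it is merged into the schema. A minimal sketch over the annotations as a plain Python dict (for example, the parsed form of the YAML above); `validate_annotations` and `KNOWN_TEMPLATES` are hypothetical names, and the template IDs are only the ones that appear in this README's examples.

```python
from typing import Any

# Hypothetical subset of template IDs, taken from this README's examples.
# The authoritative list belongs in 04-prompt-conversation-templates.md.
KNOWN_TEMPLATES = {
    "archive_search", "museum_search", "library_search", "collection_discovery",
    "person_research", "location_browse", "identifier_lookup",
    "organizational_change",
}


def validate_annotations(annotations: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a class's specificity annotations."""
    problems: list[str] = []

    def check_range(name: str, value: Any) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {value!r}")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{name}: {value} is outside 0.0-1.0")

    if "specificity_score" not in annotations:
        problems.append("specificity_score: missing")
    else:
        check_range("specificity_score", annotations["specificity_score"])

    for template_id, value in annotations.get("template_specificity", {}).items():
        if template_id not in KNOWN_TEMPLATES:
            problems.append(f"template_specificity.{template_id}: unknown template")
        check_range(f"template_specificity.{template_id}", value)

    return problems
```

An empty list means the block passes; each string names the offending key, which keeps the output readable when the check runs over many class files.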

Architecture

```
User Question
     |
     v
+----------------------+
| Template Classifier  |  <-- DSPy Signature identifies conversation template
+----------------------+
     |
     v
+----------------------+
| Specificity Lookup   |  <-- Retrieves template-specific scores for all classes
+----------------------+
     |
     v
+----------------------+
| Class Filter/Ranker  |  <-- Filters classes below threshold, ranks by score
+----------------------+
     |
     v
+----------------------+
| RAG Context Builder  |  <-- Builds context from high-specificity classes
+----------------------+
     |
     v
+----------------------+
| UML View Renderer    |  <-- Filters/highlights UML based on specificity
+----------------------+
```
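The first stage is planned as a DSPy Signature (see 03-rag-dspy-integration.md). To exercise the later stages before that classifier exists, a deterministic stand-in can help. The sketch below is a plain keyword matcher, not DSPy and not the planned classifier; `classify_template`, the keyword lists, and the fallback template are all hypothetical, and the template IDs are the ones used in this README's examples.

```python
import re

# Hypothetical keywords per template ID: a keyword stand-in for the DSPy
# classifier, meant only for testing downstream stages.
TEMPLATE_KEYWORDS: dict[str, tuple[str, ...]] = {
    "archive_search": ("archive", "archives", "archival"),
    "museum_search": ("museum", "museums"),
    "library_search": ("library", "libraries"),
    "collection_discovery": ("collection", "collections"),
    "person_research": ("who", "person", "staff", "director"),
    "location_browse": ("where", "near", "located"),
    "identifier_lookup": ("ghcid", "isil", "identifier"),
    "organizational_change": ("merged", "merger", "founded", "closed"),
}


def classify_template(question: str, default: str = "collection_discovery") -> str:
    """Return the template whose keywords occur most often in the question."""
    words = re.findall(r"[a-z]+", question.lower())
    counts = {
        template_id: sum(words.count(keyword) for keyword in keywords)
        for template_id, keywords in TEMPLATE_KEYWORDS.items()
    }
    # max() returns the first maximal key, so ties go to the template listed first.
    best = max(counts, key=lambda template_id: counts[template_id])
    # Fall back to the default template when no keyword matched at all.
    return best if counts[best] > 0 else default


print(classify_template("Which archives in Drenthe have photo collections?"))
```

For the Drenthe question, "archives" and "collections" each score one hit, and the tie-break by listing order selects `archive_search`. A learned classifier would not need such an ordering rule, which is one reason this stays a test fixture.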

Documentation Index

Document	Description
00-master-checklist.md	Implementation checklist with phases and tasks
01-design-patterns.md	Software patterns (Strategy, Decorator, Observer)
02-tdd.md	Test-driven development approach with test cases
03-rag-dspy-integration.md	DSPy integration for template classification
04-prompt-conversation-templates.md	Template definitions and scoring guidelines
05-dependencies.md	Required libraries and services
06-uml-visualization.md	UML filtering and highlighting based on scores

Quick Start

1. Schema Annotation Format

```yaml
# schemas/20251121/linkml/modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    class_uri: hc:Archive
    description: An archive holding historical records and documents
    annotations:
      # General specificity score (context-free)
      specificity_score: 0.75
      specificity_rationale: >-
        Domain-specific class for archival institutions. High relevance
        for record management, genealogy, and historical research queries.

      # Template-based specificity scores
      template_specificity:
        archive_search: 0.95
        museum_search: 0.10
        library_search: 0.30
        collection_discovery: 0.70
        person_research: 0.40
        location_browse: 0.60
        identifier_lookup: 0.50
        organizational_change: 0.65
```
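The lookup stage needs these annotations gathered from every class file. A minimal sketch, assuming each file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`) with the same shape as the `Archive.yaml` example above; `collect_specificity` is a hypothetical name, and parsing is kept outside the function so it works on plain dicts.

```python
from typing import Any


def collect_specificity(schema_doc: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map class name -> {"general": float | None, "templates": {template_id: float}}."""
    result: dict[str, dict[str, Any]] = {}
    for class_name, class_def in (schema_doc.get("classes") or {}).items():
        # A bare "ClassName:" entry parses to None, so guard with "or {}".
        annotations = (class_def or {}).get("annotations") or {}
        result[class_name] = {
            # None marks a class that has not been scored yet.
            "general": annotations.get("specificity_score"),
            "templates": dict(annotations.get("template_specificity") or {}),
        }
    return result


# The Archive.yaml fragment above, trimmed, as yaml.safe_load would return it
archive_doc = {
    "classes": {
        "Archive": {
            "is_a": "HeritageCustodian",
            "annotations": {
                "specificity_score": 0.75,
                "template_specificity": {"archive_search": 0.95, "museum_search": 0.10},
            },
        }
    }
}
print(collect_specificity(archive_doc)["Archive"]["templates"]["archive_search"])  # 0.95
```

Returning `None` for unscored classes, rather than a default number, makes the "Classes with specificity scores" metric easy to count during the rollout.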

2. Usage in RAG Pipeline

```python
# `classifier`, `specificity_lookup`, and `build_context` stand for the
# pipeline stages in the Architecture diagram above.
SPECIFICITY_THRESHOLD = 0.5

# 1. Classify user question into template
template_id = classifier.classify("Which archives in Drenthe have photo collections?")
# -> "archive_search"

# 2. Retrieve template-specific scores for all classes
scores = specificity_lookup.get_scores(template_id)
# -> {"Archive": 0.95, "Collection": 0.85, "Location": 0.80, ...}

# 3. Drop classes below the threshold and rank the rest, most relevant first
relevant_classes = sorted(
    (cls for cls, score in scores.items() if score >= SPECIFICITY_THRESHOLD),
    key=lambda cls: scores[cls],
    reverse=True,
)
# -> ["Archive", "Collection", "Location", "GHCID", "CustodianName"]

# 4. Build RAG context with relevant classes only
context = build_context(relevant_classes)
```
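The `specificity_lookup.get_scores(template_id)` call above needs scores indexed by template rather than by class. A minimal sketch of such an object, assuming per-class scores shaped as `{class_name: {template_id: score}}`; `SpecificityLookup` and its `rank` method are hypothetical. A class with no score for a template is simply absent from that template's map; a default score would be an equally valid policy. The threshold comparison is inclusive, matching the UML snippet's `>=`.

```python
from collections import defaultdict


class SpecificityLookup:
    """Index template scores by template ID for per-question lookup."""

    def __init__(self, class_scores: dict[str, dict[str, float]]) -> None:
        # Invert {class: {template: score}} into {template: {class: score}}.
        self._by_template: dict[str, dict[str, float]] = defaultdict(dict)
        for class_name, templates in class_scores.items():
            for template_id, score in templates.items():
                self._by_template[template_id][class_name] = score

    def get_scores(self, template_id: str) -> dict[str, float]:
        # .get() does not insert into the defaultdict; return a copy so
        # callers cannot mutate the index.
        return dict(self._by_template.get(template_id, {}))

    def rank(self, template_id: str, threshold: float = 0.5) -> list[str]:
        """Classes scoring at or above the threshold, most relevant first."""
        scores = self.get_scores(template_id)
        kept = [cls for cls, score in scores.items() if score >= threshold]
        return sorted(kept, key=lambda cls: scores[cls], reverse=True)


# Illustrative scores only, not the project's actual annotations
lookup = SpecificityLookup({
    "Archive": {"archive_search": 0.95, "museum_search": 0.10},
    "Location": {"archive_search": 0.80, "museum_search": 0.80},
    "WebObservation": {"archive_search": 0.15},
})
print(lookup.rank("archive_search"))  # ['Archive', 'Location']
```

Inverting once at startup keeps each question's lookup to a single dictionary access, which suits a schema of a few hundred classes and a handful of templates.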

3. Usage in UML Visualization

```javascript
// Filter nodes by specificity for cleaner visualization
const visibleNodes = nodes.filter(node => {
  const score = getSpecificityScore(node.class, currentTemplate);
  return score >= specificityThreshold;
});

// Or highlight by specificity (opacity/size based on score)
nodes.forEach(node => {
  const score = getSpecificityScore(node.class, currentTemplate);
  node.opacity = 0.3 + (score * 0.7);  // 0.3-1.0 opacity range
  node.radius = 10 + (score * 20);     // 10-30 radius range
});
```

Scope

In Scope

304 class files in schemas/20251121/linkml/modules/classes/
General specificity score (0.0-1.0) for each class
Template-based scores for 10-15 conversation templates
RAG integration for class filtering
UML visualization filtering/highlighting
Validation tooling

Out of Scope

Slot-level specificity scores (future enhancement)
Dynamic score learning (future ML enhancement)
User preference customization (future feature)

Key Metrics

Metric	Current	Target
Classes with specificity scores	0	304
Conversation templates defined	0	10-15
RAG retrieval precision	Unknown	+20% improvement
UML node count (filtered view)	304	<50 per template
Follow-up question relevance	Unknown	>80% precision
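Two of these targets can be computed from a score table and reviewer-labelled evaluation questions. A minimal sketch: retrieval precision as the fraction of retrieved classes judged relevant, and the filtered UML node count as the number of classes at or above the display threshold. The function names, the 0.5 threshold, and the sample data are illustrative assumptions, not project results.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved classes that a reviewer marked as relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for cls in retrieved if cls in relevant) / len(retrieved)


def filtered_node_count(scores: dict[str, float], threshold: float = 0.5) -> int:
    """Number of UML nodes left visible for one template at the given threshold."""
    return sum(1 for score in scores.values() if score >= threshold)


def meets_node_budget(scores: dict[str, float], threshold: float = 0.5,
                      budget: int = 50) -> bool:
    """Check the '<50 per template' target for one template's score map."""
    return filtered_node_count(scores, threshold) < budget


# Illustrative data only
retrieved = ["Archive", "Location", "WebObservation", "Collection"]
relevant = {"Archive", "Location", "Collection", "GHCID"}
print(precision(retrieved, relevant))  # 0.75
```

Running `meets_node_budget` over every template's score map gives a quick pass/fail for the UML target without rendering anything.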

Next Steps After Planning

Define conversation templates (Task 4) - Identify 10-15 common query patterns
Score foundational classes - Start with core classes (HeritageCustodian, Location, etc.)
Build scoring tool - Create script to add annotations to all 304 classes
Integrate with RAG - Modify DSPy pipeline to use scores
Integrate with UML - Add filtering/highlighting to frontend
Validate with users - Test retrieval quality improvements
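For the scoring tool, the core step is emitting an annotation block in the layout of the `Archive.yaml` example. A minimal sketch that renders the fragment as text, assuming class-level indentation of two spaces (so `annotations:` sits at four) and alphabetical template order; `render_annotations` is hypothetical, and writing into existing files while keeping their comments would need a round-trip YAML library rather than string output.

```python
def render_annotations(general: float, templates: dict[str, float],
                       indent: int = 4) -> str:
    """Render a class's specificity annotations as a YAML fragment."""
    pad = " " * indent
    lines = [f"{pad}annotations:", f"{pad}  specificity_score: {general:.2f}"]
    if templates:
        lines.append(f"{pad}  template_specificity:")
        # Sorted so regenerated files produce stable diffs.
        for template_id, score in sorted(templates.items()):
            lines.append(f"{pad}    {template_id}: {score:.2f}")
    return "\n".join(lines)


print(render_annotations(0.75, {"museum_search": 0.10, "archive_search": 0.95}))
```

Formatting every score to two decimals mirrors the examples in this README and avoids float artefacts such as `0.30000000000000004` appearing in the schema.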

References

AGENTS.md - Rule 37: Specificity Score Convention
.opencode/rules/specificity-score-convention.md - Full scoring rules
schemas/20251121/linkml/ - Target schema files
docs/plan/prompt-query_template_mapping/ - Related template-based query system

Version: 0.1.0
Last Updated: 2025-01-04
Status: Planning Phase