Rule: Specificity Score Convention for LinkML Schema Annotations

Version: 1.0.0
Created: 2025-01-04
Status: Active
Applies to: schemas/20251121/linkml/modules/classes/*.yaml


Rule Statement

Every class in the Heritage Custodian Ontology MUST have specificity score annotations to enable intelligent filtering for RAG retrieval and UML visualization.


Annotation Schema

Required Annotations

Every class YAML file MUST include these annotations:

classes:
  ClassName:
    annotations:
      specificity_score: 0.75          # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."     # Required: Why this score was assigned

Optional Annotations

Template-specific scores for context-aware filtering:

classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:            # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30

Score Semantics

General Specificity Score

The specificity_score measures how context-dependent a class is:

| Score Range | Meaning | Example Classes |
|---|---|---|
| 0.00-0.20 | Universal - relevant in almost all contexts | HeritageCustodian, CustodianName, Location |
| 0.20-0.40 | Broadly useful - relevant in most contexts | Collection, Identifier, GHCID |
| 0.40-0.60 | Moderately specific - relevant in several contexts | ChangeEvent, PersonProfile, DigitalPlatform |
| 0.60-0.80 | Fairly specific - relevant in limited contexts | Archive, Museum, Library, FindingAid |
| 0.80-1.00 | Highly specific - relevant only in specialized contexts | LinkedInConnectionExtraction, GHCIDHistoryEntry |

Key Insight: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).
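
The following is a minimal sketch of how a consumer might apply the key insight, assuming the scores have already been read into a plain dictionary. The function name `select_classes` and the `max_specificity` cutoff are illustrative assumptions, not part of this rule.

```python
def select_classes(scores: dict[str, float], max_specificity: float) -> list[str]:
    """Return class names whose specificity_score is at or below the cutoff.

    Lower scores mean broader relevance, so a low cutoff keeps only the
    generally useful classes. Results are ordered from most general to
    most specific (ties broken by name).
    """
    kept = [name for name, score in scores.items() if score <= max_specificity]
    return sorted(kept, key=lambda name: (scores[name], name))


# Scores taken from the examples in this rule.
example_scores = {
    "HeritageCustodian": 0.15,
    "Archive": 0.70,
    "LinkedInConnectionExtraction": 0.95,
}

# Hypothetical cutoffs: a general context keeps only broad classes,
# a specialized context admits everything.
general_context = select_classes(example_scores, max_specificity=0.40)
specialized_context = select_classes(example_scores, max_specificity=1.0)
```

With these inputs, `general_context` contains only `HeritageCustodian`, while `specialized_context` lists all three classes in ascending score order.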


Template Specificity Scores

The template_specificity annotation scores a class against each of 10 conversation templates:

| Template ID | Focus Area | Example High-Score Classes |
|---|---|---|
| archive_search | Archives and archival holdings | Archive, RecordSet, Fonds |
| museum_search | Museums and exhibitions | Museum, Gallery, Exhibition |
| library_search | Libraries and catalogs | Library, Catalog, BibliographicCollection |
| collection_discovery | Collections and holdings | Collection, Accession, Extent |
| person_research | People and staff | PersonProfile, Staff, Role |
| location_browse | Geographic information | Location, Address, GeoCoordinates |
| identifier_lookup | Identifiers (ISIL, Wikidata) | Identifier, GHCID, ISIL |
| organizational_change | History and changes | ChangeEvent, Founding, Merger |
| digital_platform | Online resources | DigitalPlatform, Website, API |
| general_heritage | Fallback/general | Uses specificity_score directly |
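
A small sketch of the lookup this table implies, assuming a class's annotations are available as a plain dictionary. The function name `effective_score` is hypothetical. Falling back to `specificity_score` when a template entry is absent is an assumption; the rule states that fallback only for `general_heritage`.

```python
def effective_score(annotations: dict, template_id: str) -> float:
    """Pick the score to use for a class under a given conversation template."""
    general = float(annotations["specificity_score"])

    # general_heritage uses the general score directly, per the table above.
    if template_id == "general_heritage":
        return general

    # template_specificity is optional, so tolerate a missing or empty mapping.
    template_scores = annotations.get("template_specificity") or {}

    # Assumption: a template without its own entry falls back to the general score.
    return float(template_scores.get(template_id, general))


# Values copied from the Archive example in this rule (abridged).
archive_annotations = {
    "specificity_score": 0.70,
    "template_specificity": {"archive_search": 0.95, "museum_search": 0.20},
}

archive_in_archive_search = effective_score(archive_annotations, "archive_search")
archive_in_general = effective_score(archive_annotations, "general_heritage")
archive_in_unlisted = effective_score(archive_annotations, "location_browse")
```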

Examples

Example 1: Universal Class (Low Specificity)

# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.      
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.        
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15

Example 2: Domain-Specific Class (High Specificity)

# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.      
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.        
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70

Example 3: Technical Class (Very High Specificity)

# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.      
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.        
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95

Score Assignment Guidelines

Factors That LOWER Specificity Score

| Factor | Impact | Example |
|---|---|---|
| Base/parent class | -0.20 to -0.30 | HeritageCustodian is parent of all |
| Used in identifiers | -0.10 to -0.15 | CustodianName used in GHCID |
| Geographic component | -0.10 to -0.15 | Location needed for all institutions |
| Universal attribute | -0.10 to -0.15 | Provenance applies to all data |

Factors That RAISE Specificity Score

| Factor | Impact | Example |
|---|---|---|
| Institution type | +0.30 to +0.40 | Archive, Museum, Library |
| Technical/extraction | +0.30 to +0.40 | LinkedInConnectionExtraction |
| Event subtype | +0.20 to +0.30 | Merger, Founding, Closure |
| Domain terminology | +0.15 to +0.25 | Fonds, FindingAid, RecordSet |
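
The impact ranges above can be read as adjustments to a starting value. This rule does not state what that starting value is, so the sketch below takes it as an explicit parameter. The 0.40 baseline in the usage lines is purely an assumption for illustration, and `adjusted_range` is a hypothetical helper name.

```python
def adjusted_range(
    baseline: float, adjustments: list[tuple[float, float]]
) -> tuple[float, float]:
    """Apply (low, high) impact ranges to a baseline and clamp to [0.0, 1.0].

    Each adjustment is given as (smallest, largest) signed change, e.g.
    (0.30, 0.40) for "+0.30 to +0.40" or (-0.30, -0.20) for "-0.20 to -0.30".
    """
    low = baseline + sum(lo for lo, _ in adjustments)
    high = baseline + sum(hi for _, hi in adjustments)

    def clamp(value: float) -> float:
        # Round to two decimals to hide floating-point noise in the sums.
        return round(min(1.0, max(0.0, value)), 2)

    return clamp(low), clamp(high)


ASSUMED_BASELINE = 0.40  # Assumption: not defined anywhere in this rule.

# "Institution type": +0.30 to +0.40
institution_range = adjusted_range(ASSUMED_BASELINE, [(0.30, 0.40)])

# "Base/parent class": -0.20 to -0.30
base_class_range = adjusted_range(ASSUMED_BASELINE, [(-0.30, -0.20)])
```

Under that assumed baseline, the institution-type range covers the 0.70 assigned to Archive and the base-class range covers the 0.15 assigned to HeritageCustodian. Other classes in this rule (for example LinkedInConnectionExtraction at 0.95) fall outside the corresponding range, so treat the arithmetic as a starting point for judgment, not a formula.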

Cross-Class Consistency Rules

  1. Inheritance: Child classes should have equal or higher specificity than parents
  2. Siblings: Classes at same hierarchy level should have similar base scores
  3. Competing types: An institution type should score low on the templates of other institution types (e.g. Archive scores 0.20 for museum_search)

# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.70  # Child: 0.70 >= 0.15 ✓

# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.10  # Child: 0.10 < 0.15 ✗
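
The inheritance rule is covered by the validation script later in this document. The sibling rule is not, so here is a rough sketch of one way to check it, assuming the hierarchy and scores are available as plain dictionaries. The helper name `sibling_spread` and the 0.20 tolerance are assumptions; the rule only says sibling scores should be "similar".

```python
from collections import defaultdict


def sibling_spread(
    parents: dict[str, str], scores: dict[str, float], tolerance: float = 0.20
) -> list[str]:
    """Warn when classes that share a parent differ by more than `tolerance`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for child, parent in parents.items():
        if child in scores:  # unscored classes are reported by other checks
            groups[parent].append(child)

    warnings = []
    for parent, children in sorted(groups.items()):
        if len(children) < 2:
            continue
        values = [scores[child] for child in children]
        if max(values) - min(values) > tolerance:
            warnings.append(
                f"{parent}: sibling scores span {min(values):.2f}-{max(values):.2f}"
            )
    return warnings


hierarchy = {
    "Archive": "HeritageCustodian",
    "Museum": "HeritageCustodian",
    "Library": "HeritageCustodian",
}

# Archive's 0.70 comes from this rule; the Museum and Library values are
# hypothetical, chosen only to exercise the check.
consistent = sibling_spread(hierarchy, {"Archive": 0.70, "Museum": 0.70, "Library": 0.65})
inconsistent = sibling_spread(hierarchy, {"Archive": 0.70, "Museum": 0.30, "Library": 0.65})
```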

Validation Rules

Required Validations

  1. Range Check: 0.0 <= specificity_score <= 1.0
  2. Rationale Present: specificity_rationale must not be empty
  3. Inheritance Consistency: Child score >= parent score
  4. Template Score Range: All template scores must be 0.0-1.0

Recommended Validations

  1. No Orphan Scores: Every class should have annotations (warn if missing)
  2. Score Distribution: Flag if >50% of classes have same score (lack of differentiation)
  3. Template Coverage: Warn if template_specificity omits common templates
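
The template range, score distribution, and template coverage checks can be sketched on plain dictionaries as follows. The function names are hypothetical. The rule does not define which templates count as "common", so this sketch assumes all 10 template IDs.

```python
from collections import Counter

TEMPLATE_IDS = (
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
)


def check_template_scores(class_name: str, template_scores: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one class's template_specificity mapping."""
    errors, warnings = [], []
    for template_id, value in template_scores.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append(
                f"{class_name}: template score {template_id}={value!r} not in [0.0, 1.0]"
            )
    # Assumption: "common templates" means every ID in TEMPLATE_IDS.
    missing = [t for t in TEMPLATE_IDS if t not in template_scores]
    if missing:
        warnings.append(f"{class_name}: template_specificity omits {', '.join(missing)}")
    return errors, warnings


def check_score_distribution(scores: dict[str, float]) -> list[str]:
    """Flag the schema when more than 50% of classes share one score."""
    if not scores:
        return []
    value, count = Counter(scores.values()).most_common(1)[0]
    if count / len(scores) > 0.5:
        return [f"{count} of {len(scores)} classes share score {value}"]
    return []


sparse_errors, sparse_warnings = check_template_scores("Archive", {"archive_search": 0.95})
range_errors, _ = check_template_scores("Archive", {"archive_search": 1.5})
flat_distribution = check_score_distribution({"A": 0.5, "B": 0.5, "C": 0.7})
```

Range violations are returned as errors and coverage gaps as warnings, mirroring the split between required and warning-level checks in the lists above.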

Validation Script

# scripts/validate_specificity_scores.py

import sys
from pathlib import Path

from linkml_runtime.utils.schemaview import SchemaView

# The 10 conversation template IDs from the Template Specificity Scores table
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage"
]

def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate all specificity score annotations."""
    errors = []
    schema = SchemaView(str(schema_path))

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)

        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")

        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
            continue

        # Validate score range; score_val stays None if the value is not numeric
        score_val = None
        try:
            score_val = float(score.value)
            if not 0.0 <= score_val <= 1.0:
                errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid score value: {score.value}")

        # Check rationale (str() guards against non-string annotation values)
        if rationale is None or not str(rationale.value).strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Check inheritance consistency, only when both scores are numeric
        if cls.is_a and score_val is not None:
            parent = schema.get_class(cls.is_a)
            parent_score = parent.annotations.get("specificity_score") if parent else None
            try:
                parent_val = float(parent_score.value) if parent_score is not None else None
            except (ValueError, TypeError):
                parent_val = None  # reported when the parent itself is validated
            if parent_val is not None and score_val < parent_val:
                errors.append(
                    f"{class_name}: Score {score_val} < parent {cls.is_a} score {parent_val}"
                )

    return errors

if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All specificity scores valid!")
        sys.exit(0)

Anti-Patterns

What NOT to Do

| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| Score without rationale | No audit trail for decisions | Always include rationale |
| All scores = 0.5 | No differentiation, useless for filtering | Differentiate based on semantics |
| Child < parent score | Violates specificity inheritance | Child should be equal or more specific |
| Template score > 1.0 | Invalid score value | Keep all scores in [0.0, 1.0] |
| Empty rationale | Fails validation, no documentation | Write meaningful rationale |

Example of Incorrect Annotation

# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5        # Out of range!
      specificity_rationale: ""      # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage

Example of Correct Annotation

# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or 
        library-focused questions.        
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70

Migration Checklist

When adding specificity scores to existing classes:

Phase 1: Assessment

  • Count classes without annotations
  • Identify class hierarchy (parents → children order)
  • Review existing descriptions for scoring hints

Phase 2: Annotation

  • Start with root classes (lowest specificity)
  • Work down hierarchy (increasing specificity)
  • Assign template scores based on domain alignment
  • Write rationale explaining score decisions
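
One way to obtain the parents-before-children order for this phase is a topological sort. This is a sketch that assumes the hierarchy has been extracted into a mapping from class name to its `is_a` parent. `annotation_order` is a hypothetical helper; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter
from typing import Optional


def annotation_order(parents: dict[str, Optional[str]]) -> list[str]:
    """Order classes so that every parent appears before its children."""
    # TopologicalSorter expects node -> set of predecessors; a class's only
    # predecessor here is its is_a parent (root classes have none).
    graph = {name: ({parent} if parent else set()) for name, parent in parents.items()}
    return list(TopologicalSorter(graph).static_order())


order = annotation_order({
    "Archive": "HeritageCustodian",
    "Museum": "HeritageCustodian",
    "HeritageCustodian": None,
})
```

Here `HeritageCustodian` is guaranteed to come before `Archive` and `Museum`; the relative order of the two siblings is not significant.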

Phase 3: Validation

  • Run validation script
  • Check inheritance consistency
  • Verify score distribution (not all same value)
  • Review edge cases (technical classes, mixins)

Phase 4: Documentation

  • Update class count in plan documents
  • Document any scoring decisions that were difficult
  • Create PR with all changes

Related Rules

  • Rule 0: LinkML Schemas Are the Single Source of Truth
  • Rule 4: Technical Classes Are Excluded from Visualizations
  • Rule 13: Custodian Type Annotations on LinkML Schema Elements

References

  • docs/plan/specificity_score/README.md - System overview
  • docs/plan/specificity_score/04-prompt-conversation-templates.md - Template definitions
  • docs/plan/specificity_score/06-uml-visualization.md - UML filtering integration

Changelog

| Date | Version | Change |
|---|---|---|
| 2025-01-04 | 1.0.0 | Initial rule created for specificity score system |