kempersc 6e63465196 Add ImageTilingServiceEndpoint class and archive ID class

- Introduced the ImageTilingServiceEndpoint class for tiled high-resolution image delivery, including deep-zoom and transformation capabilities, with multilingual descriptions and structured aliases.
- Archived the ID class as a backwards-compatible alias for Identifier, marking it as deprecated to enforce the use of the canonical Identifier model.

2026-02-15 21:40:13 +01:00


Rule: Specificity Score Convention for LinkML Schema Annotations

Version: 1.0.0
Created: 2025-01-04
Status: Active
Applies to: schemas/20251121/linkml/modules/classes/*.yaml

Rule Statement

Every class in the Heritage Custodian Ontology MUST carry specificity score annotations so that RAG retrieval and UML visualization can filter classes by relevance.

Annotation Schema

Required Annotations

Every class YAML file MUST include these annotations:

classes:
  ClassName:
    annotations:
      specificity_score: 0.75          # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."     # Required: Why this score was assigned

Optional Annotations

Template-specific scores for context-aware filtering:

classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:            # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30

Score Semantics

General Specificity Score

The specificity_score measures how context-dependent a class is:

Score Range	Meaning	Example Classes
0.00-0.20	Universal - relevant in almost all contexts	`HeritageCustodian`, `CustodianName`, `Location`
0.20-0.40	Broadly useful - relevant in most contexts	`Collection`, `Identifier`, `GHCID`
0.40-0.60	Moderately specific - relevant in several contexts	`ChangeEvent`, `PersonProfile`, `DigitalPlatform`
0.60-0.80	Fairly specific - relevant in limited contexts	`Archive`, `Museum`, `Library`, `FindingAid`
0.80-1.00	Highly specific - relevant only in specialized contexts	`LinkedInConnectionExtraction`, `GHCIDHistoryEntry`

Key Insight: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).
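
The bands in the table above can be turned into a small lookup. This is a minimal sketch, not part of the rule: the function name `specificity_band` is hypothetical, and because the table's ranges overlap at their edges, the sketch assumes a boundary value such as 0.20 belongs to the higher band.

```python
# Band upper bounds and labels copied from the table above.
# Assumption: a boundary value (0.20, 0.40, ...) belongs to the higher band,
# because the table's ranges overlap at their edges.
BANDS = [
    (0.20, "Universal"),
    (0.40, "Broadly useful"),
    (0.60, "Moderately specific"),
    (0.80, "Fairly specific"),
    (1.00, "Highly specific"),
]


def specificity_band(score: float) -> str:
    """Return the band label for a specificity_score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} out of range [0.0, 1.0]")
    for upper, label in BANDS:
        if score < upper:
            return label
    return BANDS[-1][1]  # only reached when score == 1.0


print(specificity_band(0.15))  # HeritageCustodian -> Universal
print(specificity_band(0.70))  # Archive -> Fairly specific
print(specificity_band(0.95))  # LinkedInConnectionExtraction -> Highly specific
```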

Template Specificity Scores

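The template_specificity annotation scores a class against each of 10 conversation templates. For the nine focused templates, the classes most relevant to a template receive its highest scores: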

Template ID	Focus Area	Example High-Score Classes
`archive_search`	Archives and archival holdings	`Archive`, `RecordSet`, `Fonds`
`museum_search`	Museums and exhibitions	`Museum`, `Gallery`, `Exhibition`
`library_search`	Libraries and catalogs	`Library`, `Catalog`, `BibliographicCollection`
`collection_discovery`	Collections and holdings	`Collection`, `Accession`, `Extent`
`person_research`	People and staff	`PersonProfile`, `Staff`, `Role`
`location_browse`	Geographic information	`Location`, `Address`, `GeoCoordinates`
`identifier_lookup`	Identifiers (ISIL, Wikidata)	`Identifier`, `GHCID`, `ISIL`
`organizational_change`	History and changes	`ChangeEvent`, `Founding`, `Merger`
`digital_platform`	Online resources	`DigitalPlatform`, `Website`, `API`
`general_heritage`	Fallback/general	Uses `specificity_score` directly
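
Reading a score for one template could look like the following. This is a sketch under stated assumptions: the function name `score_for_template` is hypothetical, the annotations are taken as a plain dict, and a template that is not annotated yields `None` because the rule does not define a fallback for that case. Only the `general_heritage` behaviour comes from the table above.

```python
from typing import Mapping, Optional

GENERAL_TEMPLATE = "general_heritage"


def score_for_template(annotations: Mapping, template_id: str) -> Optional[float]:
    """Look up a class's score for one conversation template.

    Returns None when the template is not annotated; the rule does not
    define a fallback for that case, so the caller has to decide.
    """
    if template_id == GENERAL_TEMPLATE:
        # Per the table: general_heritage uses specificity_score directly.
        return float(annotations["specificity_score"])
    per_template = annotations.get("template_specificity") or {}
    raw = per_template.get(template_id)
    return None if raw is None else float(raw)


archive = {
    "specificity_score": 0.70,
    "specificity_rationale": "...",
    "template_specificity": {"archive_search": 0.95, "museum_search": 0.20},
}
print(score_for_template(archive, "archive_search"))    # 0.95
print(score_for_template(archive, "general_heritage"))  # 0.7
print(score_for_template(archive, "location_browse"))   # None
```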

Examples

Example 1: Universal Class (Low Specificity)

# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15

Example 2: Domain-Specific Class (High Specificity)

# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70

Example 3: Technical Class (Very High Specificity)

# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95

Score Assignment Guidelines

Factors That LOWER Specificity Score

Factor	Impact	Example
Base/parent class	-0.20 to -0.30	`HeritageCustodian` is parent of all
Used in identifiers	-0.10 to -0.15	`CustodianName` used in GHCID
Geographic component	-0.10 to -0.15	`Location` needed for all institutions
Universal attribute	-0.10 to -0.15	`Provenance` applies to all data

Factors That RAISE Specificity Score

Factor	Impact	Example
Institution type	+0.30 to +0.40	`Archive`, `Museum`, `Library`
Technical/extraction	+0.30 to +0.40	`LinkedInConnectionExtraction`
Event subtype	+0.20 to +0.30	`Merger`, `Founding`, `Closure`
Domain terminology	+0.15 to +0.25	`Fonds`, `FindingAid`, `RecordSet`
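
The adjustment tables above can be read as simple arithmetic on a starting value. The sketch below is illustrative only: the rule does not define a baseline, so `BASELINE = 0.40` and the function `draft_score` are hypothetical, chosen here because they happen to reproduce the `HeritageCustodian` (0.15) and `Archive` (0.70) scores used in the examples. The result is a first draft for a human to review, not a prescribed formula.

```python
BASELINE = 0.40  # hypothetical neutral starting point; NOT defined by the rule


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Keep a score inside the valid range."""
    return max(low, min(high, value))


def draft_score(adjustments: list[float], baseline: float = BASELINE) -> float:
    """Add the chosen adjustments to the baseline, clamp, round to 2 decimals."""
    return round(clamp(baseline + sum(adjustments)), 2)


# Base/parent class: -0.20 to -0.30 (midpoint -0.25 chosen here)
print(draft_score([-0.25]))        # 0.15
# Institution type: +0.30 to +0.40 (lower bound chosen here)
print(draft_score([+0.30]))        # 0.7
# Stacked factors can overshoot; the clamp keeps the result valid
print(draft_score([+0.40, +0.25])) # 1.0
```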

Cross-Class Consistency Rules

Inheritance: A child class should have a specificity score equal to or higher than its parent's
Siblings: Classes at the same hierarchy level should have similar base scores
Competing types: An institution type should score low on the templates of competing institution types (for example, Archive scores 0.20 on museum_search)

# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.70  # Child: 0.70 >= 0.15 ✓

# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.10  # Child: 0.10 < 0.15 ✗

Validation Rules

Required Validations

Range Check: 0.0 <= specificity_score <= 1.0
Rationale Present: specificity_rationale must not be empty
Inheritance Consistency: Child score >= parent score
Template Score Range: All template scores must be 0.0-1.0
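
The Template Score Range check can be sketched on a plain dict, independent of how the annotations are loaded. The function name `check_template_scores` and the error message wording are hypothetical; only the [0.0, 1.0] bound comes from the rule.

```python
def check_template_scores(class_name: str, template_specificity: dict) -> list[str]:
    """Return one error string per template score outside [0.0, 1.0]."""
    errors = []
    for template_id, raw in template_specificity.items():
        try:
            value = float(raw)
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid {template_id} score: {raw!r}")
            continue
        if not 0.0 <= value <= 1.0:
            errors.append(
                f"{class_name}: {template_id} score {value} out of range [0.0, 1.0]"
            )
    return errors


print(check_template_scores("Archive", {"archive_search": 0.95, "museum_search": 1.5}))
# ['Archive: museum_search score 1.5 out of range [0.0, 1.0]']
```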

Recommended Validations

No Orphan Scores: Every class should have annotations (warn if missing)
Score Distribution: Flag if >50% of classes share the same score (lack of differentiation)
Template Coverage: Warn if a template_specificity block omits any of the 10 conversation templates
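
The distribution and coverage checks could be sketched as below, working on plain dicts. The helper names `dominant_score` and `missing_templates` are hypothetical; the 50% threshold and the template IDs come from this rule.

```python
from collections import Counter
from typing import Optional

ALL_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def dominant_score(scores: dict[str, float],
                   threshold: float = 0.5) -> Optional[tuple[float, float]]:
    """Return (score, share) if more than `threshold` of classes share one score."""
    if not scores:
        return None
    value, count = Counter(scores.values()).most_common(1)[0]
    share = count / len(scores)
    return (value, share) if share > threshold else None


def missing_templates(template_specificity: dict) -> list[str]:
    """List the template IDs that a template_specificity block leaves out."""
    return [t for t in ALL_TEMPLATES if t not in template_specificity]


scores = {"Archive": 0.5, "Museum": 0.5, "Library": 0.5, "Location": 0.15}
print(dominant_score(scores))                        # (0.5, 0.75)
print(missing_templates({"archive_search": 0.95}))   # the other nine template IDs
```

Both helpers return data instead of printing warnings, so a caller can decide whether a finding is a warning or an error.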

Validation Script

# scripts/validate_specificity_scores.py

from linkml_runtime import SchemaView
from pathlib import Path
import sys

REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage"
]

def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate all specificity score annotations."""
    errors = []
    schema = SchemaView(str(schema_path))
    
    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)
        
        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")
        
        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
            continue
        
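        # Validate score range; keep the parsed value for the inheritance check
        score_val = None
        try:
            score_val = float(score.value)
            if not 0.0 <= score_val <= 1.0:
                errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid score value: {score.value}")

        # Check rationale (must be present and non-blank)
        if rationale is None or not rationale.value or not str(rationale.value).strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Check inheritance consistency. Skipped when either score cannot be
        # parsed, so an invalid value is reported once instead of raising here.
        if cls.is_a and score_val is not None:
            parent = schema.get_class(cls.is_a)
            parent_score = parent.annotations.get("specificity_score")
            try:
                parent_val = float(parent_score.value) if parent_score else None
            except (ValueError, TypeError):
                parent_val = None
            if parent_val is not None and score_val < parent_val:
                errors.append(
                    f"{class_name}: Score {score_val} < parent {cls.is_a} score {parent_val}"
                )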
    
    return errors

if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)
    
    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All specificity scores valid!")
        sys.exit(0)

Anti-Patterns

What NOT to Do

Anti-Pattern	Why It's Wrong	Correct Approach
Score without rationale	No audit trail for decisions	Always include rationale
All scores = 0.5	No differentiation, useless for filtering	Differentiate based on semantics
Child < parent score	Violates specificity inheritance	Child should be equal or more specific
Template score > 1.0	Invalid score value	Keep all scores in [0.0, 1.0]
Empty rationale	Fails validation, no documentation	Write meaningful rationale

Example of Incorrect Annotation

# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5        # Out of range!
      specificity_rationale: ""      # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage

Example of Correct Annotation

# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or
        library-focused questions.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70

Migration Checklist

When adding specificity scores to existing classes:

Phase 1: Assessment

Count classes without annotations
Identify class hierarchy (parents → children order)
Review existing descriptions for scoring hints
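
The first two assessment steps can be sketched on a dict of class definitions, such as the `classes:` mapping of a loaded YAML module. The helper names `unannotated` and `parents_first` are hypothetical, and the sketch follows `is_a` only; mixins are not considered.

```python
def unannotated(classes: dict[str, dict]) -> list[str]:
    """Names of classes that have no specificity_score annotation."""
    return sorted(
        name for name, cls in classes.items()
        if "specificity_score" not in (cls.get("annotations") or {})
    )


def parents_first(classes: dict[str, dict]) -> list[str]:
    """Order class names so every class appears after its is_a parent."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(name: str, trail: tuple = ()) -> None:
        if name in seen or name not in classes:
            return
        if name in trail:
            raise ValueError(f"is_a cycle through {name}")
        parent = classes[name].get("is_a")
        if parent:
            visit(parent, trail + (name,))
        seen.add(name)
        ordered.append(name)

    for name in classes:
        visit(name)
    return ordered


classes = {
    "Archive": {"is_a": "HeritageCustodian"},
    "HeritageCustodian": {"annotations": {"specificity_score": 0.15}},
    "Location": {},
}
print(unannotated(classes))    # ['Archive', 'Location']
print(parents_first(classes))  # ['HeritageCustodian', 'Archive', 'Location']
```

Annotating in the order returned by `parents_first` means each parent score is already fixed when its children are scored, which makes the inheritance rule easy to respect.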

Phase 2: Annotation

Start with root classes (lowest specificity)
Work down hierarchy (increasing specificity)
Assign template scores based on domain alignment
Write rationale explaining score decisions

Phase 3: Validation

Run validation script
Check inheritance consistency
Verify score distribution (not all same value)
Review edge cases (technical classes, mixins)

Phase 4: Documentation

Update class count in plan documents
Document any scoring decisions that were difficult
Create PR with all changes

Related Rules

Rule 0: LinkML Schemas Are the Single Source of Truth
Rule 4: Technical Classes Are Excluded from Visualizations
Rule 13: Custodian Type Annotations on LinkML Schema Elements

References

docs/plan/specificity_score/README.md - System overview
docs/plan/specificity_score/04-prompt-conversation-templates.md - Template definitions
docs/plan/specificity_score/06-uml-visualization.md - UML filtering integration

Changelog

Date	Version	Change
2025-01-04	1.0.0	Initial rule created for specificity score system
