Rule: Specificity Score Convention for LinkML Schema Annotations
Version: 1.0.0
Created: 2025-01-04
Status: Active
Applies to: schemas/20251121/linkml/modules/classes/*.yaml
Rule Statement
Every class in the Heritage Custodian Ontology MUST have specificity score annotations. These annotations let classes be filtered by contextual relevance when assembling RAG retrieval context and when rendering UML visualizations.
Annotation Schema
Required Annotations
Every class YAML file MUST include these annotations:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75        # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."   # Required: Why this score was assigned
```
Optional Annotations
Template-specific scores for context-aware filtering:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:          # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30
```
Score Semantics
General Specificity Score
The specificity_score measures how context-dependent a class is:
| Score Range | Meaning | Example Classes |
|---|---|---|
| 0.00-0.20 | Universal - relevant in almost all contexts | HeritageCustodian, CustodianName, Location |
| 0.20-0.40 | Broadly useful - relevant in most contexts | Collection, Identifier, GHCID |
| 0.40-0.60 | Moderately specific - relevant in several contexts | ChangeEvent, PersonProfile, DigitalPlatform |
| 0.60-0.80 | Fairly specific - relevant in limited contexts | Archive, Museum, Library, FindingAid |
| 0.80-1.00 | Highly specific - relevant only in specialized contexts | LinkedInConnectionExtraction, GHCIDHistoryEntry |
Key Insight: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).
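This is a minimal sketch of how the bands and the "lower is more general" rule could drive RAG filtering. The band bounds come from the table above. The table's ranges share their endpoints, so this sketch assumes that a boundary value such as 0.20 belongs to the lower band. The function names and the `max_specificity` cutoff are hypothetical and not part of this rule.

```python
# Upper bound and label of each band in the Score Semantics table.
# Assumption: the table's ranges share endpoints, so a score exactly on a
# boundary (e.g. 0.20) is placed in the lower band here.
BANDS = [
    (0.20, "Universal"),
    (0.40, "Broadly useful"),
    (0.60, "Moderately specific"),
    (0.80, "Fairly specific"),
    (1.00, "Highly specific"),
]


def band(score: float) -> str:
    """Return the Score Semantics band label for a specificity_score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"specificity_score {score} is outside [0.0, 1.0]")
    return next(label for upper, label in BANDS if score <= upper)


def filter_for_rag(scores: dict[str, float], max_specificity: float) -> list[str]:
    """Keep classes whose score is at or below a cutoff, most general first.

    Lower scores are more generally relevant, so raising max_specificity admits
    progressively more specialized classes. The cutoff itself is hypothetical.
    """
    kept = [name for name, s in scores.items() if s <= max_specificity]
    return sorted(kept, key=lambda name: scores[name])


# Scores taken from the three examples in this rule.
scores = {"HeritageCustodian": 0.15, "Archive": 0.70, "LinkedInConnectionExtraction": 0.95}
print(band(scores["Archive"]))       # Fairly specific
print(filter_for_rag(scores, 0.75))  # ['HeritageCustodian', 'Archive']
```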
Template Specificity Scores
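The template_specificity annotation gives the class a score for each of the 10 conversation templates below. A high template score marks the class as strongly relevant to that template (for example, Archive scores 0.95 for archive_search). The general_heritage fallback instead uses the specificity_score value directly: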
| Template ID | Focus Area | Example High-Score Classes |
|---|---|---|
| archive_search | Archives and archival holdings | Archive, RecordSet, Fonds |
| museum_search | Museums and exhibitions | Museum, Gallery, Exhibition |
| library_search | Libraries and catalogs | Library, Catalog, BibliographicCollection |
| collection_discovery | Collections and holdings | Collection, Accession, Extent |
| person_research | People and staff | PersonProfile, Staff, Role |
| location_browse | Geographic information | Location, Address, GeoCoordinates |
| identifier_lookup | Identifiers (ISIL, Wikidata) | Identifier, GHCID, ISIL |
| organizational_change | History and changes | ChangeEvent, Founding, Merger |
| digital_platform | Online resources | DigitalPlatform, Website, API |
| general_heritage | Fallback/general | Uses specificity_score directly |
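One possible way to use both score types at query time is sketched below. It makes two assumptions. First, for every template except `general_heritage`, a higher template score means stronger relevance, which is how the Example High-Score Classes column reads. Second, `general_heritage` ranks by `specificity_score` ascending, as the table specifies. The function name is hypothetical, and so is the choice to skip classes that have no score for the requested template.

```python
def rank_for_template(annotations: dict[str, dict], template: str) -> list[str]:
    """Order class names from most to least relevant for one conversation template.

    annotations maps class name -> that class's annotations (specificity_score,
    template_specificity), e.g. as loaded from the class YAML files.
    """
    if template == "general_heritage":
        # Fallback template: rank by specificity_score; lower = more generally relevant.
        return sorted(annotations, key=lambda c: annotations[c]["specificity_score"])

    # Assumption: for the other templates a higher score means stronger relevance.
    # Classes that have no score for this template are left out (also an assumption).
    scored = {
        c: a["template_specificity"][template]
        for c, a in annotations.items()
        if template in a.get("template_specificity", {})
    }
    return sorted(scored, key=lambda c: scored[c], reverse=True)


# Values from the HeritageCustodian, Archive and LinkedInConnectionExtraction examples.
annotations = {
    "HeritageCustodian": {"specificity_score": 0.15, "template_specificity": {"archive_search": 0.65}},
    "Archive": {"specificity_score": 0.70, "template_specificity": {"archive_search": 0.95}},
    "LinkedInConnectionExtraction": {"specificity_score": 0.95, "template_specificity": {"archive_search": 0.05}},
}
print(rank_for_template(annotations, "archive_search"))
# ['Archive', 'HeritageCustodian', 'LinkedInConnectionExtraction']
print(rank_for_template(annotations, "general_heritage"))
# ['HeritageCustodian', 'Archive', 'LinkedInConnectionExtraction']
```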
Examples
Example 1: Universal Class (Low Specificity)
```yaml
# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15
```
Example 2: Domain-Specific Class (High Specificity)
```yaml
# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
Example 3: Technical Class (Very High Specificity)
```yaml
# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95
```
Score Assignment Guidelines
Factors That LOWER Specificity Score
| Factor | Impact | Example |
|---|---|---|
| Base/parent class | -0.20 to -0.30 | HeritageCustodian is parent of all |
| Used in identifiers | -0.10 to -0.15 | CustodianName used in GHCID |
| Geographic component | -0.10 to -0.15 | Location needed for all institutions |
| Universal attribute | -0.10 to -0.15 | Provenance applies to all data |
Factors That RAISE Specificity Score
| Factor | Impact | Example |
|---|---|---|
| Institution type | +0.30 to +0.40 | Archive, Museum, Library |
| Technical/extraction | +0.30 to +0.40 | LinkedInConnectionExtraction |
| Event subtype | +0.20 to +0.30 | Merger, Founding, Closure |
| Domain terminology | +0.15 to +0.25 | Fonds, FindingAid, RecordSet |
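The two factor tables can be read as adjustments applied to a starting value, which gives a first-pass estimate range. This rule does not define that starting value, so this sketch assumes a neutral 0.50. The factor keys are hypothetical shorthands for the table rows. Treat the result as a starting point to refine and justify in specificity_rationale, not as a prescribed score.

```python
# (low, high) impact ranges from the two factor tables; keys are hypothetical
# shorthands for the table rows.
FACTORS = {
    "base_parent_class": (-0.30, -0.20),
    "used_in_identifiers": (-0.15, -0.10),
    "geographic_component": (-0.15, -0.10),
    "universal_attribute": (-0.15, -0.10),
    "institution_type": (0.30, 0.40),
    "technical_extraction": (0.30, 0.40),
    "event_subtype": (0.20, 0.30),
    "domain_terminology": (0.15, 0.25),
}


def _clamp(x: float) -> float:
    """Clamp to the valid score range and round away float noise."""
    return round(min(1.0, max(0.0, x)), 2)


def estimate_range(factors: list[str], start: float = 0.50) -> tuple[float, float]:
    """Return a (low, high) first-pass specificity_score estimate.

    start=0.50 is an assumed neutral value; the rule does not define one.
    """
    low = start + sum(FACTORS[f][0] for f in factors)
    high = start + sum(FACTORS[f][1] for f in factors)
    return _clamp(low), _clamp(high)


print(estimate_range(["institution_type"]))                            # (0.8, 0.9)
print(estimate_range(["technical_extraction", "domain_terminology"]))  # (0.95, 1.0)
```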
Cross-Class Consistency Rules
- Inheritance: Child classes should have equal or higher specificity than their parents
- Siblings: Classes at the same hierarchy level should have similar base scores
- Competing types: An institution type should score low on templates that target a competing type (for example, Archive scores 0.20 for museum_search and 0.25 for library_search)
```yaml
# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian        # Parent: 0.15
  annotations:
    specificity_score: 0.70      # Child: 0.70 >= 0.15 ✓
```

```yaml
# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian        # Parent: 0.15
  annotations:
    specificity_score: 0.10      # Child: 0.10 < 0.15 ✗
```
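The Inheritance and Siblings rules can be checked on a plain `{class: (parent, score)}` mapping, with no LinkML tooling involved. This is a sketch. The function name is hypothetical, and so is the `sibling_tolerance` of 0.20, which is one possible reading of "similar base scores" (the rule does not quantify that phrase). Only direct `is_a` parents are considered.

```python
from collections import defaultdict
from typing import Optional


def check_hierarchy(
    classes: dict[str, tuple[Optional[str], float]],
    sibling_tolerance: float = 0.20,
) -> list[str]:
    """Flag child scores below their parent's and sibling groups that spread too far.

    classes maps class name -> (is_a parent or None, specificity_score).
    sibling_tolerance is a hypothetical threshold for "similar base scores".
    """
    issues = []
    children = defaultdict(list)  # parent name -> scores of its direct children
    for name, (parent, score) in classes.items():
        if parent is None:
            continue
        children[parent].append(score)
        if parent in classes and score < classes[parent][1]:
            issues.append(f"{name}: score {score} < parent {parent} score {classes[parent][1]}")
    for parent, sibling_scores in children.items():
        spread = max(sibling_scores) - min(sibling_scores)
        if spread > sibling_tolerance:
            issues.append(f"children of {parent}: scores spread by {spread:.2f}")
    return issues


print(check_hierarchy({"HeritageCustodian": (None, 0.15), "Archive": ("HeritageCustodian", 0.70)}))  # []
print(check_hierarchy({"HeritageCustodian": (None, 0.15), "Archive": ("HeritageCustodian", 0.10)}))
# ['Archive: score 0.1 < parent HeritageCustodian score 0.15']
```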
Validation Rules
Required Validations
- Range Check: 0.0 <= specificity_score <= 1.0
- Rationale Present: specificity_rationale must not be empty
- Inheritance Consistency: Child score >= parent score
- Template Score Range: All template scores must be 0.0-1.0
Recommended Validations
- No Orphan Scores: Every class should have annotations (warn if missing)
- Score Distribution: Flag if >50% of classes have same score (lack of differentiation)
- Template Coverage: Warn if template_specificity omits common templates
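The No Orphan Scores and Score Distribution checks operate on the whole schema rather than on a single class. A stdlib sketch over a hypothetical `{class: annotations-or-None}` mapping follows. It makes one assumption: the ">50%" share is computed over the classes that actually have a score.

```python
from collections import Counter
from typing import Optional


def recommended_warnings(annotations: dict[str, Optional[dict]]) -> list[str]:
    """Schema-wide warnings for orphan classes and poorly differentiated scores.

    annotations maps class name -> its annotations dict, or None if it has none.
    Assumption: the >50% share is computed over classes that have a score.
    """
    warnings = [f"{name}: no specificity annotations" for name, a in annotations.items() if not a]
    scores = [a["specificity_score"] for a in annotations.values() if a and "specificity_score" in a]
    if scores:
        value, count = Counter(scores).most_common(1)[0]
        if count / len(scores) > 0.5:
            warnings.append(f"{count}/{len(scores)} scored classes share specificity_score {value}")
    return warnings


# Hypothetical classes and scores, for illustration only.
print(recommended_warnings({
    "ClassA": {"specificity_score": 0.5},
    "ClassB": {"specificity_score": 0.5},
    "ClassC": {"specificity_score": 0.7},
    "ClassD": None,
}))
# ['ClassD: no specificity annotations', '2/3 scored classes share specificity_score 0.5']
```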
Validation Script
```python
# scripts/validate_specificity_scores.py
import sys
from pathlib import Path

from linkml_runtime import SchemaView

# The ten conversation templates from the Template Specificity Scores table.
# Note: this script does not yet check template_specificity ranges or coverage.
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate specificity_score, specificity_rationale and inheritance for every class."""
    errors = []
    schema = SchemaView(str(schema_path))

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)

        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")

        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
            continue

        # Validate score type and range; keep the parsed value for the inheritance check
        score_val = None
        try:
            score_val = float(score.value)
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid score value: {score.value}")
        else:
            if not 0.0 <= score_val <= 1.0:
                errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")

        # Check rationale (str() also guards against non-string values)
        if rationale is None or not str(rationale.value or "").strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Check inheritance consistency: child score must be >= parent score.
        # get_class() returns None when the parent is not defined in the schema.
        parent = schema.get_class(cls.is_a) if cls.is_a else None
        parent_score = parent.annotations.get("specificity_score") if parent else None
        if score_val is not None and parent_score is not None:
            try:
                parent_val = float(parent_score.value)
            except (ValueError, TypeError):
                parent_val = None  # reported when the parent class itself is validated
            if parent_val is not None and score_val < parent_val:
                errors.append(
                    f"{class_name}: Score {score_val} < parent {cls.is_a} score {parent_val}"
                )

    return errors


if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)

    print("All specificity scores valid!")
    sys.exit(0)
```
Anti-Patterns
What NOT to Do
| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| Score without rationale | No audit trail for decisions | Always include rationale |
| All scores = 0.5 | No differentiation, useless for filtering | Differentiate based on semantics |
| Child < parent score | Violates specificity inheritance | Child should be equal or more specific |
| Template score > 1.0 | Invalid score value | Keep all scores in [0.0, 1.0] |
| Empty rationale | Fails validation, no documentation | Write meaningful rationale |
Example of Incorrect Annotation
```yaml
# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5         # Out of range!
      specificity_rationale: ""      # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage
```
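The required per-class checks and the recommended template-coverage check can be combined into a single function that takes one class's annotations. The sketch below runs it on the incorrect Archive annotation above, written as a Python dict. It assumes that dict is the shape the class YAML would take once parsed (for example by a YAML loader). The function names are hypothetical. Coverage is only checked when template_specificity is present, because that annotation is optional.

```python
# The ten templates from the Template Specificity Scores table.
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def _in_range(value) -> bool:
    """True for a real number (not a bool) in [0.0, 1.0]."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0.0 <= value <= 1.0


def check_class(name: str, ann: dict) -> list[str]:
    """Return the problems found in one class's annotations dict."""
    issues = []
    if not _in_range(ann.get("specificity_score")):
        issues.append(f"{name}: specificity_score {ann.get('specificity_score')!r} not in [0.0, 1.0]")
    if not str(ann.get("specificity_rationale") or "").strip():
        issues.append(f"{name}: missing or empty specificity_rationale")
    templates = ann.get("template_specificity") or {}
    for template, value in templates.items():
        if not _in_range(value):
            issues.append(f"{name}: {template} score {value!r} not in [0.0, 1.0]")
    # Coverage is a recommended check; only applied when template_specificity is present.
    missing = [t for t in REQUIRED_TEMPLATES if t not in templates]
    if templates and missing:
        issues.append(f"{name}: template_specificity omits {len(missing)} templates")
    return issues


# The incorrect Archive annotation above, as a dict.
wrong = {
    "specificity_score": 1.5,
    "specificity_rationale": "",
    "template_specificity": {"archive_search": 0.95},
}
for issue in check_class("Archive", wrong):
    print(issue)
# Archive: specificity_score 1.5 not in [0.0, 1.0]
# Archive: missing or empty specificity_rationale
# Archive: template_specificity omits 9 templates
```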
Example of Correct Annotation
```yaml
# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or
        library-focused questions.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
Migration Checklist
When adding specificity scores to existing classes:
Phase 1: Assessment
- Count classes without annotations
- Identify class hierarchy (parents → children order)
- Review existing descriptions for scoring hints
Phase 2: Annotation
- Start with root classes (lowest specificity)
- Work down hierarchy (increasing specificity)
- Assign template scores based on domain alignment
- Write rationale explaining score decisions
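The parents-before-children order used in Phases 1 and 2 is a topological sort over `is_a`. Below is a stdlib sketch using `graphlib.TopologicalSorter` (Python 3.9+). It takes a hypothetical `{class: parent}` mapping, which could be built from each class's `is_a`. Mixins are ignored here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Optional


def annotation_order(is_a: dict[str, Optional[str]]) -> list[str]:
    """Return class names ordered so that every parent precedes its children.

    is_a maps class name -> its is_a parent (None for root classes). Mixins are
    ignored, and a cycle raises graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for name, parent in is_a.items():
        if parent is None:
            sorter.add(name)          # root class: no predecessors
        else:
            sorter.add(name, parent)  # the parent must be annotated first
    return list(sorter.static_order())


print(annotation_order({"Archive": "HeritageCustodian", "HeritageCustodian": None}))
# ['HeritageCustodian', 'Archive']
```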
Phase 3: Validation
- Run validation script
- Check inheritance consistency
- Verify score distribution (not all same value)
- Review edge cases (technical classes, mixins)
Phase 4: Documentation
- Update class count in plan documents
- Document any scoring decisions that were difficult
- Create PR with all changes
Related Rules
- Rule 0: LinkML Schemas Are the Single Source of Truth
- Rule 4: Technical Classes Are Excluded from Visualizations
- Rule 13: Custodian Type Annotations on LinkML Schema Elements
References
- docs/plan/specificity_score/README.md: System overview
- docs/plan/specificity_score/04-prompt-conversation-templates.md: Template definitions
- docs/plan/specificity_score/06-uml-visualization.md: UML filtering integration
Changelog
| Date | Version | Change |
|---|---|---|
| 2025-01-04 | 1.0.0 | Initial rule created for specificity score system |