glam/.opencode/rules/linkml/specificity-score-convention.md

# Rule: Specificity Score Convention for LinkML Schema Annotations
**Version**: 1.0.0
**Created**: 2025-01-04
**Status**: Active
**Applies to**: `schemas/20251121/linkml/modules/classes/*.yaml`

---
## Rule Statement
Every class in the Heritage Custodian Ontology MUST have specificity score annotations to enable intelligent filtering for RAG retrieval and UML visualization.

---
## Annotation Schema
### Required Annotations
Every class YAML file MUST include these annotations:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75           # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."      # Required: Why this score was assigned
```
### Optional Annotations
Template-specific scores for context-aware filtering:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:             # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30
```
---
## Score Semantics
### General Specificity Score
The `specificity_score` measures how **context-dependent** a class is:
| Score Range | Meaning | Example Classes |
|-------------|---------|-----------------|
| 0.00-0.20 | **Universal** - relevant in almost all contexts | `HeritageCustodian`, `CustodianName`, `Location` |
| 0.20-0.40 | **Broadly useful** - relevant in most contexts | `Collection`, `Identifier`, `GHCID` |
| 0.40-0.60 | **Moderately specific** - relevant in several contexts | `ChangeEvent`, `PersonProfile`, `DigitalPlatform` |
| 0.60-0.80 | **Fairly specific** - relevant in limited contexts | `Archive`, `Museum`, `Library`, `FindingAid` |
| 0.80-1.00 | **Highly specific** - relevant only in specialized contexts | `LinkedInConnectionExtraction`, `GHCIDHistoryEntry` |
**Key Insight**: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).
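
The bands in the table share their boundary values (0.20, for instance, appears in two rows), so mapping a score to a band needs a tie-breaking choice. The sketch below is one way to do it, assuming a boundary value belongs to the higher band; `specificity_band` is a hypothetical helper, not part of the schema tooling.

```python
# Hypothetical helper: map a specificity_score to the band names in the table.
# Assumption: a boundary value (e.g. 0.20) belongs to the higher band.
BANDS = [
    (0.20, "Universal"),
    (0.40, "Broadly useful"),
    (0.60, "Moderately specific"),
    (0.80, "Fairly specific"),
]


def specificity_band(score: float) -> str:
    """Return the band label for a score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} out of range [0.0, 1.0]")
    for upper, label in BANDS:
        if score < upper:
            return label
    return "Highly specific"


print(specificity_band(0.15))  # a score in the lowest band
print(specificity_band(0.70))  # a score in the 0.60-0.80 band
```
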
---
### Template Specificity Scores
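The `template_specificity` annotation records how relevant a class is to each of 10 conversation templates. Unlike `specificity_score`, a higher template score means the class is MORE relevant to that template; the one exception is `general_heritage`, which reuses `specificity_score`: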
| Template ID | Focus Area | Example High-Score Classes |
|-------------|------------|---------------------------|
| `archive_search` | Archives and archival holdings | `Archive`, `RecordSet`, `Fonds` |
| `museum_search` | Museums and exhibitions | `Museum`, `Gallery`, `Exhibition` |
| `library_search` | Libraries and catalogs | `Library`, `Catalog`, `BibliographicCollection` |
| `collection_discovery` | Collections and holdings | `Collection`, `Accession`, `Extent` |
| `person_research` | People and staff | `PersonProfile`, `Staff`, `Role` |
| `location_browse` | Geographic information | `Location`, `Address`, `GeoCoordinates` |
| `identifier_lookup` | Identifiers (ISIL, Wikidata) | `Identifier`, `GHCID`, `ISIL` |
| `organizational_change` | History and changes | `ChangeEvent`, `Founding`, `Merger` |
| `digital_platform` | Online resources | `DigitalPlatform`, `Website`, `API` |
| `general_heritage` | Fallback/general | Uses `specificity_score` directly |
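
To illustrate how a consumer might pick the score for a class under a given template, here is a minimal sketch. It assumes annotations are available as a plain dict and that a class with no entry for the requested template falls back to its general `specificity_score`; both the function name `score_for_template` and the fallback behaviour are assumptions, not something this rule prescribes.

```python
# Hypothetical lookup: which score to use for a class under a given template.
# Assumptions: annotations are a plain dict, and a missing template entry
# falls back to the general specificity_score.
def score_for_template(annotations: dict, template_id: str) -> float:
    general = float(annotations["specificity_score"])
    if template_id == "general_heritage":
        # Per the table, this template uses specificity_score directly.
        return general
    template_scores = annotations.get("template_specificity") or {}
    return float(template_scores.get(template_id, general))


archive = {
    "specificity_score": 0.70,
    "template_specificity": {"archive_search": 0.95, "museum_search": 0.20},
}
print(score_for_template(archive, "archive_search"))
print(score_for_template(archive, "general_heritage"))
print(score_for_template(archive, "person_research"))  # no entry, falls back
```
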
---
## Examples
### Example 1: Universal Class (Low Specificity)
```yaml
# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15
```
### Example 2: Domain-Specific Class (High Specificity)
```yaml
# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
### Example 3: Technical Class (Very High Specificity)
```yaml
# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95
```
---
## Score Assignment Guidelines
### Factors That LOWER Specificity Score
| Factor | Impact | Example |
|--------|--------|---------|
| Base/parent class | -0.20 to -0.30 | `HeritageCustodian` is parent of all |
| Used in identifiers | -0.10 to -0.15 | `CustodianName` used in GHCID |
| Geographic component | -0.10 to -0.15 | `Location` needed for all institutions |
| Universal attribute | -0.10 to -0.15 | `Provenance` applies to all data |
### Factors That RAISE Specificity Score
| Factor | Impact | Example |
|--------|--------|---------|
| Institution type | +0.30 to +0.40 | `Archive`, `Museum`, `Library` |
| Technical/extraction | +0.30 to +0.40 | `LinkedInConnectionExtraction` |
| Event subtype | +0.20 to +0.30 | `Merger`, `Founding`, `Closure` |
| Domain terminology | +0.15 to +0.25 | `Fonds`, `FindingAid`, `RecordSet` |
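
The two factor tables can be read as adjustments applied to some starting value. This rule does not name that starting value, so the sketch below takes it as an explicit `baseline` argument; the helper name `estimate_specificity` and the 0.40 baseline used in the calls are assumptions for illustration only.

```python
# Hypothetical estimator: the rule lists adjustments but does not name a
# starting value, so `baseline` is an assumption supplied by the caller.
def estimate_specificity(baseline: float, adjustments: list[float]) -> float:
    """Sum the baseline and the adjustments, clamped to [0.0, 1.0]."""
    raw = baseline + sum(adjustments)
    return round(min(1.0, max(0.0, raw)), 2)


# Assumed baseline of 0.40; "Institution type" contributes +0.30.
print(estimate_specificity(0.40, [0.30]))
# Assumed baseline of 0.40; "Base/parent class" contributes -0.25.
print(estimate_specificity(0.40, [-0.25]))
```

Treat the result as a starting point for the rationale, not as a replacement for it: the final score still has to satisfy the consistency rules in the next section.
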
### Cross-Class Consistency Rules
1. **Inheritance**: Child classes should have equal or higher specificity than their parents
2. **Siblings**: Classes at the same hierarchy level should have similar base scores
3. **Competing types**: An institution type should score low on the templates of competing types (e.g. `Archive` scores 0.20 on `museum_search`)
```yaml
# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.70  # Child: 0.70 >= 0.15 ✓

# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.10  # Child: 0.10 < 0.15 ✗
```
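
The sibling rule can be checked in a similar spirit. The sketch below is only an illustration: the rule says "similar" without giving a number, so `max_spread=0.20` is an assumed tolerance, and the `Museum` and `Library` scores are made-up values chosen to trigger the check rather than values taken from the schema.

```python
from collections import defaultdict


# Hypothetical check for rule 2 (siblings). max_spread is an assumed
# tolerance, not a project setting.
def sibling_outliers(parents: dict, scores: dict, max_spread: float = 0.20) -> list[str]:
    """Return parents whose direct children differ by more than max_spread."""
    by_parent = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None and child in scores:
            by_parent[parent].append(scores[child])
    return sorted(
        parent
        for parent, values in by_parent.items()
        if max(values) - min(values) > max_spread
    )


# Illustrative data only: Library is deliberately scored far from its siblings.
parents = {
    "HeritageCustodian": None,
    "Archive": "HeritageCustodian",
    "Museum": "HeritageCustodian",
    "Library": "HeritageCustodian",
}
scores = {"HeritageCustodian": 0.15, "Archive": 0.70, "Museum": 0.70, "Library": 0.30}
print(sibling_outliers(parents, scores))
```
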
---
## Validation Rules
### Required Validations
1. **Range Check**: `0.0 <= specificity_score <= 1.0`
2. **Rationale Present**: `specificity_rationale` must not be empty
3. **Inheritance Consistency**: Child score >= parent score
4. **Template Score Range**: All template scores must be 0.0-1.0
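
Validation 4 can be sketched on plain dicts, which avoids depending on how a particular schema loader represents nested annotation values. `template_range_errors` is a hypothetical helper, and the message format is an assumption modelled on the other checks.

```python
# Hypothetical check for validation 4, operating on a plain dict of
# template ID -> score.
def template_range_errors(class_name: str, template_scores: dict) -> list[str]:
    """Return one error message per template score outside [0.0, 1.0]."""
    errors = []
    for template_id, raw in template_scores.items():
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{class_name}: {template_id} is not a number: {raw!r}")
            continue
        if not 0.0 <= value <= 1.0:
            errors.append(f"{class_name}: {template_id}={value} out of range [0.0, 1.0]")
    return errors


print(template_range_errors("Archive", {"archive_search": 0.95, "museum_search": 1.5}))
```
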
### Recommended Validations
1. **No Orphan Scores**: Every class should have annotations (warn if missing)
2. **Score Distribution**: Flag if >50% of classes have the same score (lack of differentiation)
3. **Template Coverage**: Warn if `template_specificity` omits common templates
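
The distribution and coverage checks are simple enough to sketch with the standard library. The helpers below are hypothetical; in particular, "common templates" is not defined in this rule, so the sketch assumes it means all ten template IDs from the Template Specificity Scores table.

```python
from collections import Counter

# Template IDs from the table in "Template Specificity Scores".
# Assumption: all ten count as "common templates".
TEMPLATE_IDS = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def dominant_score(scores: dict, threshold: float = 0.5):
    """Return the most common score if more than `threshold` of classes share it."""
    if not scores:
        return None
    value, count = Counter(scores.values()).most_common(1)[0]
    return value if count / len(scores) > threshold else None


def missing_templates(template_scores: dict) -> list[str]:
    """Return template IDs with no score, in table order."""
    return [t for t in TEMPLATE_IDS if t not in template_scores]


print(dominant_score({"A": 0.5, "B": 0.5, "C": 0.7}))
print(missing_templates({"archive_search": 0.95}))
```
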
### Validation Script
```python
# scripts/validate_specificity_scores.py
"""Validate specificity score annotations on LinkML classes."""
import sys
from pathlib import Path

from linkml_runtime.utils.schemaview import SchemaView

# Template IDs expected under template_specificity.
# Listed for reference; this script does not yet check template scores.
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate all specificity score annotations."""
    errors = []
    schema = SchemaView(str(schema_path))

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)

        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")

        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
            continue

        # Check rationale first so an invalid score does not hide a missing one
        if rationale is None or not str(rationale.value or "").strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Validate score range
        try:
            score_val = float(score.value)
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid score value: {score.value}")
            continue  # an unparsable score cannot be compared with the parent
        if not 0.0 <= score_val <= 1.0:
            errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")

        # Check inheritance consistency against the direct parent
        if cls.is_a:
            parent = schema.get_class(cls.is_a)
            parent_score = parent.annotations.get("specificity_score")
            if parent_score is None:
                continue
            try:
                parent_val = float(parent_score.value)
            except (ValueError, TypeError):
                continue  # reported when the parent itself is validated
            if score_val < parent_val:
                errors.append(
                    f"{class_name}: Score {score_val} < parent {cls.is_a} score {parent_val}"
                )

    return errors


if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All specificity scores valid!")
        sys.exit(0)
```
---
## Anti-Patterns
### What NOT to Do
| Anti-Pattern | Why It's Wrong | Correct Approach |
|--------------|----------------|------------------|
| Score without rationale | No audit trail for decisions | Always include rationale |
| All scores = 0.5 | No differentiation, useless for filtering | Differentiate based on semantics |
| Child < parent score | Violates specificity inheritance | Child should be equal or more specific |
| Template score > 1.0 | Invalid score value | Keep all scores in [0.0, 1.0] |
| Empty rationale | Fails validation, no documentation | Write meaningful rationale |
### Example of Incorrect Annotation
```yaml
# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5            # Out of range!
      specificity_rationale: ""         # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage
```
### Example of Correct Annotation
```yaml
# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or
        library-focused questions.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
---
## Migration Checklist
When adding specificity scores to existing classes:
### Phase 1: Assessment
- [ ] Count classes without annotations
- [ ] Identify class hierarchy (parents → children order)
- [ ] Review existing descriptions for scoring hints
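
The first two assessment steps can be sketched on a plain mapping of class name to parent name. Both helpers below are hypothetical and assume the hierarchy has already been extracted from the schema (for example from each class's `is_a`); they are not part of the existing scripts.

```python
# Hypothetical Phase 1 helpers, working on a plain mapping of
# class name -> parent name (None for root classes).
def unannotated(parents: dict, scores: dict) -> list[str]:
    """Return classes that have no specificity score yet."""
    return [name for name in parents if name not in scores]


def parents_first(parents: dict) -> list[str]:
    """Order classes so that every parent appears before its children."""
    ordered, seen = [], set()

    def visit(name, trail=()):
        if name in seen or name not in parents:
            return
        if name in trail:
            raise ValueError(f"cycle in is_a chain at {name}")
        visit(parents[name], trail + (name,))
        seen.add(name)
        ordered.append(name)

    for name in parents:
        visit(name)
    return ordered


hierarchy = {"Archive": "HeritageCustodian", "HeritageCustodian": None}
print(unannotated(hierarchy, {"HeritageCustodian": 0.15}))
print(parents_first(hierarchy))
```

Annotating in the order returned by `parents_first` means a parent's score is already known when each child is scored, which makes the inheritance rule easy to respect.
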
### Phase 2: Annotation
- [ ] Start with root classes (lowest specificity)
- [ ] Work down hierarchy (increasing specificity)
- [ ] Assign template scores based on domain alignment
- [ ] Write rationale explaining score decisions
### Phase 3: Validation
- [ ] Run validation script
- [ ] Check inheritance consistency
- [ ] Verify score distribution (not all same value)
- [ ] Review edge cases (technical classes, mixins)
### Phase 4: Documentation
- [ ] Update class count in plan documents
- [ ] Document any scoring decisions that were difficult
- [ ] Create PR with all changes
---
## Related Rules
- **Rule 0**: LinkML Schemas Are the Single Source of Truth
- **Rule 4**: Technical Classes Are Excluded from Visualizations
- **Rule 13**: Custodian Type Annotations on LinkML Schema Elements
---
## References
- `docs/plan/specificity_score/README.md` - System overview
- `docs/plan/specificity_score/04-prompt-conversation-templates.md` - Template definitions
- `docs/plan/specificity_score/06-uml-visualization.md` - UML filtering integration
---
## Changelog
| Date | Version | Change |
|------|---------|--------|
| 2025-01-04 | 1.0.0 | Initial rule created for specificity score system |