# Rule: Specificity Score Convention for LinkML Schema Annotations

**Version**: 1.0.0
**Created**: 2025-01-04
**Status**: Active
**Applies to**: `schemas/20251121/linkml/modules/classes/*.yaml`

---

## Rule Statement

Every class in the Heritage Custodian Ontology MUST have specificity score annotations to enable intelligent filtering for RAG retrieval and UML visualization.

---

## Annotation Schema

### Required Annotations

Every class YAML file MUST include these annotations:

```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75         # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."    # Required: Why this score was assigned
```

### Optional Annotations

Template-specific scores for context-aware filtering:

```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:           # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30
```

---

## Score Semantics

### General Specificity Score

The `specificity_score` measures how **context-dependent** a class is:

| Score Range | Meaning | Example Classes |
|-------------|---------|-----------------|
| 0.00-0.20 | **Universal** - relevant in almost all contexts | `HeritageCustodian`, `CustodianName`, `Location` |
| 0.20-0.40 | **Broadly useful** - relevant in most contexts | `Collection`, `Identifier`, `GHCID` |
| 0.40-0.60 | **Moderately specific** - relevant in several contexts | `ChangeEvent`, `PersonProfile`, `DigitalPlatform` |
| 0.60-0.80 | **Fairly specific** - relevant in limited contexts | `Archive`, `Museum`, `Library`, `FindingAid` |
| 0.80-1.00 | **Highly specific** - relevant only in specialized contexts | `LinkedInConnectionExtraction`, `GHCIDHistoryEntry` |

**Key Insight**: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).

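The score bands in the table above can be turned into a small lookup. The following is a minimal sketch, not part of the rule itself: the function name `specificity_band` is hypothetical, and because adjacent ranges in the table share their boundary values (0.20, 0.40, ...), the sketch assumes a boundary value belongs to the lower band.

```python
# Upper bounds and labels follow the score-range table.
# ASSUMPTION: a boundary value such as 0.20 is assigned to the lower band.
SPECIFICITY_BANDS = [
    (0.20, "Universal"),
    (0.40, "Broadly useful"),
    (0.60, "Moderately specific"),
    (0.80, "Fairly specific"),
    (1.00, "Highly specific"),
]


def specificity_band(score: float) -> str:
    """Return the band label for a general specificity score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Score {score} out of range [0.0, 1.0]")
    for upper_bound, label in SPECIFICITY_BANDS:
        if score <= upper_bound:
            return label
    return SPECIFICITY_BANDS[-1][1]  # not reached for in-range scores


if __name__ == "__main__":
    # Scores taken from the example classes in this rule.
    for name, score in [("HeritageCustodian", 0.15), ("Archive", 0.70)]:
        print(name, specificity_band(score))
```
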
---

### Template Specificity Scores

The `template_specificity` annotation assigns the class a score for each of the 10 conversation templates:

| Template ID | Focus Area | Example High-Score Classes |
|-------------|------------|---------------------------|
| `archive_search` | Archives and archival holdings | `Archive`, `RecordSet`, `Fonds` |
| `museum_search` | Museums and exhibitions | `Museum`, `Gallery`, `Exhibition` |
| `library_search` | Libraries and catalogs | `Library`, `Catalog`, `BibliographicCollection` |
| `collection_discovery` | Collections and holdings | `Collection`, `Accession`, `Extent` |
| `person_research` | People and staff | `PersonProfile`, `Staff`, `Role` |
| `location_browse` | Geographic information | `Location`, `Address`, `GeoCoordinates` |
| `identifier_lookup` | Identifiers (ISIL, Wikidata) | `Identifier`, `GHCID`, `ISIL` |
| `organizational_change` | History and changes | `ChangeEvent`, `Founding`, `Merger` |
| `digital_platform` | Online resources | `DigitalPlatform`, `Website`, `API` |
| `general_heritage` | Fallback/general | Uses `specificity_score` directly |

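A consumer of these annotations needs to pick one score per class for the active template. The sketch below shows one way to do that on a plain dictionary of annotations. The function name `effective_score` is hypothetical. The `general_heritage` behavior follows the table above; falling back to `specificity_score` when another template is missing from the optional `template_specificity` mapping is an assumption, not something this rule specifies.

```python
from typing import Any, Mapping


def effective_score(annotations: Mapping[str, Any], template_id: str) -> float:
    """Return the score to use for a class under the given template."""
    general = float(annotations["specificity_score"])
    # Per the template table, general_heritage uses specificity_score directly.
    if template_id == "general_heritage":
        return general
    template_scores = annotations.get("template_specificity") or {}
    # ASSUMPTION: a template without its own entry falls back to the general score.
    return float(template_scores.get(template_id, general))


# Annotation values copied from the optional-annotations example in this rule.
example_annotations = {
    "specificity_score": 0.75,
    "specificity_rationale": "...",
    "template_specificity": {
        "archive_search": 0.95,
        "museum_search": 0.20,
        "person_research": 0.30,
    },
}

if __name__ == "__main__":
    for template in ("archive_search", "library_search", "general_heritage"):
        print(template, effective_score(example_annotations, template))
```
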
---

## Examples

### Example 1: Universal Class (Low Specificity)

```yaml
# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15
```

### Example 2: Domain-Specific Class (High Specificity)

```yaml
# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```

### Example 3: Technical Class (Very High Specificity)

```yaml
# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95
```

---

## Score Assignment Guidelines

### Factors That LOWER Specificity Score

| Factor | Impact | Example |
|--------|--------|---------|
| Base/parent class | -0.20 to -0.30 | `HeritageCustodian` is parent of all |
| Used in identifiers | -0.10 to -0.15 | `CustodianName` used in GHCID |
| Geographic component | -0.10 to -0.15 | `Location` needed for all institutions |
| Universal attribute | -0.10 to -0.15 | `Provenance` applies to all data |

### Factors That RAISE Specificity Score

| Factor | Impact | Example |
|--------|--------|---------|
| Institution type | +0.30 to +0.40 | `Archive`, `Museum`, `Library` |
| Technical/extraction | +0.30 to +0.40 | `LinkedInConnectionExtraction` |
| Event subtype | +0.20 to +0.30 | `Merger`, `Founding`, `Closure` |
| Domain terminology | +0.15 to +0.25 | `Fonds`, `FindingAid`, `RecordSet` |

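The impact ranges in the two tables above can be applied as adjustments to a starting value. The sketch below is illustrative only. This rule does not define a starting value, so `ASSUMED_BASELINE = 0.40` is a hypothetical choice, and the factor keys and the function name `estimate_score` are likewise invented for the example. Only the numeric ranges come from the tables.

```python
# (low, high) impact ranges copied from the two factor tables.
# The dictionary keys are hypothetical labels for the table rows.
FACTOR_RANGES = {
    "base_parent_class": (-0.30, -0.20),
    "used_in_identifiers": (-0.15, -0.10),
    "geographic_component": (-0.15, -0.10),
    "universal_attribute": (-0.15, -0.10),
    "institution_type": (0.30, 0.40),
    "technical_extraction": (0.30, 0.40),
    "event_subtype": (0.20, 0.30),
    "domain_terminology": (0.15, 0.25),
}

# ASSUMPTION: the rule gives no baseline; 0.40 is a hypothetical starting point.
ASSUMED_BASELINE = 0.40


def estimate_score(adjustments: dict[str, float],
                   baseline: float = ASSUMED_BASELINE) -> float:
    """Apply factor adjustments to a baseline and clamp the result to [0.0, 1.0]."""
    total = baseline
    for factor, delta in adjustments.items():
        low, high = FACTOR_RANGES[factor]
        if not low <= delta <= high:
            raise ValueError(f"{factor}: impact {delta} outside [{low}, {high}]")
        total += delta
    # Round to two decimals to match the precision used in the annotations.
    return round(min(1.0, max(0.0, total)), 2)


if __name__ == "__main__":
    print(estimate_score({"institution_type": 0.30}))
    print(estimate_score({"base_parent_class": -0.25}))
```

Under the assumed baseline, an institution-type adjustment of +0.30 lands near the 0.70 used for `Archive`, and a base-class adjustment of -0.25 lands near the 0.15 used for `HeritageCustodian`. The final score still needs a human-written `specificity_rationale`.
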
### Cross-Class Consistency Rules

1. **Inheritance**: Child classes should have equal or higher specificity than parents
2. **Siblings**: Classes at same hierarchy level should have similar base scores
3. **Competing types**: Institution types should reduce each other's template scores

```yaml
# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.70  # Child: 0.70 >= 0.15 ✓

# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.10  # Child: 0.10 < 0.15 ✗
```

---

## Validation Rules

### Required Validations

1. **Range Check**: `0.0 <= specificity_score <= 1.0`
2. **Rationale Present**: `specificity_rationale` must not be empty
3. **Inheritance Consistency**: Child score >= parent score
4. **Template Score Range**: All template scores must be 0.0-1.0

### Recommended Validations

1. **No Orphan Scores**: Every class should have annotations (warn if missing)
2. **Score Distribution**: Flag if >50% of classes have same score (lack of differentiation)
3. **Template Coverage**: Warn if template_specificity omits common templates

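The Score Distribution and Template Coverage checks can be sketched on plain dictionaries, independent of how the schema is loaded. The function names below are hypothetical. The rule does not say which templates count as "common", so the sketch assumes all 10 template IDs are expected.

```python
from collections import Counter
from typing import Optional

# ASSUMPTION: "common templates" is read as all 10 template IDs in this rule.
EXPECTED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def score_distribution_warning(scores: dict[str, float],
                               threshold: float = 0.5) -> Optional[str]:
    """Return a warning if more than `threshold` of classes share one score."""
    if not scores:
        return None
    value, count = Counter(scores.values()).most_common(1)[0]
    if count / len(scores) > threshold:
        return f"{count} of {len(scores)} classes share score {value}"
    return None


def missing_templates(template_scores: dict[str, float]) -> list[str]:
    """Return the expected template IDs absent from a template_specificity mapping."""
    return [t for t in EXPECTED_TEMPLATES if t not in template_scores]


if __name__ == "__main__":
    print(score_distribution_warning({"A": 0.5, "B": 0.5, "C": 0.7}))
    print(missing_templates({"archive_search": 0.95}))
```

Both helpers return warnings rather than raising, which matches the "flag" and "warn" wording of the recommended validations.
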
### Validation Script

```python
# scripts/validate_specificity_scores.py

import sys
from pathlib import Path

from linkml_runtime import SchemaView

# Template IDs defined by this rule. Template scores are not checked by this script.
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage"
]


def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate all specificity score annotations."""
    errors = []
    schema = SchemaView(str(schema_path))

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)

        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")

        # Validate score presence and range; score_val stays None if unusable
        score_val = None
        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
        else:
            try:
                score_val = float(score.value)
            except (ValueError, TypeError):
                errors.append(f"{class_name}: Invalid score value: {score.value}")
            else:
                if not 0.0 <= score_val <= 1.0:
                    errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")

        # Check rationale (also reported when the score is missing or invalid)
        if rationale is None or not str(rationale.value).strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Check inheritance consistency (requires a usable child score)
        if score_val is not None and cls.is_a:
            parent = schema.get_class(cls.is_a)
            parent_score = parent.annotations.get("specificity_score")
            try:
                parent_val = float(parent_score.value) if parent_score is not None else None
            except (ValueError, TypeError):
                parent_val = None  # reported when the parent itself is validated
            if parent_val is not None and score_val < parent_val:
                errors.append(
                    f"{class_name}: Score {score_val} < parent {cls.is_a} score {parent_val}"
                )

    return errors


if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All specificity scores valid!")
        sys.exit(0)
```

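The script above does not cover the Template Score Range validation from the required list. A minimal sketch of that check follows. This rule does not show how nested `template_specificity` values surface through `SchemaView`, so the sketch assumes the mapping has already been loaded into a plain dictionary. The function name `template_score_errors` is hypothetical.

```python
from typing import Any, Mapping


def template_score_errors(class_name: str,
                          template_scores: Mapping[str, Any]) -> list[str]:
    """Return one error per template score that is non-numeric or outside [0.0, 1.0]."""
    errors = []
    for template_id, raw_value in template_scores.items():
        try:
            value = float(raw_value)
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid {template_id} score: {raw_value}")
            continue
        if not 0.0 <= value <= 1.0:
            errors.append(
                f"{class_name}: {template_id} score {value} out of range [0.0, 1.0]"
            )
    return errors


if __name__ == "__main__":
    # ASSUMPTION: template_specificity was loaded into a plain dict beforehand.
    sample = {"archive_search": 0.95, "museum_search": 1.5, "library_search": "high"}
    for error in template_score_errors("Archive", sample):
        print(error)
```
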
---

## Anti-Patterns

### What NOT to Do

| Anti-Pattern | Why It's Wrong | Correct Approach |
|--------------|----------------|------------------|
| Score without rationale | No audit trail for decisions | Always include rationale |
| All scores = 0.5 | No differentiation, useless for filtering | Differentiate based on semantics |
| Child < parent score | Violates specificity inheritance | Child should be equal or more specific |
| Template score > 1.0 | Invalid score value | Keep all scores in [0.0, 1.0] |
| Empty rationale | Fails validation, no documentation | Write meaningful rationale |

### Example of Incorrect Annotation

```yaml
# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5           # Out of range!
      specificity_rationale: ""        # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage
```

### Example of Correct Annotation

```yaml
# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or
        library-focused questions.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```

---

## Migration Checklist

When adding specificity scores to existing classes:

### Phase 1: Assessment

- [ ] Count classes without annotations
- [ ] Identify class hierarchy (parents → children order)
- [ ] Review existing descriptions for scoring hints

### Phase 2: Annotation

- [ ] Start with root classes (lowest specificity)
- [ ] Work down hierarchy (increasing specificity)
- [ ] Assign template scores based on domain alignment
- [ ] Write rationale explaining score decisions

### Phase 3: Validation

- [ ] Run validation script
- [ ] Check inheritance consistency
- [ ] Verify score distribution (not all same value)
- [ ] Review edge cases (technical classes, mixins)

### Phase 4: Documentation

- [ ] Update class count in plan documents
- [ ] Document any scoring decisions that were difficult
- [ ] Create PR with all changes

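The parents-before-children order used in Phases 1 and 2 can be derived from the `is_a` links with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+). The function name `annotation_order` and the input shape (a mapping from class name to parent name, or `None` for roots) are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Optional


def annotation_order(parents: dict[str, Optional[str]]) -> list[str]:
    """Return class names ordered so every parent precedes its children."""
    sorter = TopologicalSorter()
    for class_name, parent in parents.items():
        if parent is None:
            sorter.add(class_name)          # root class, no predecessor
        else:
            sorter.add(class_name, parent)  # parent must be annotated first
    return list(sorter.static_order())


if __name__ == "__main__":
    # Hypothetical hierarchy using class names that appear in this rule.
    hierarchy = {
        "Archive": "HeritageCustodian",
        "Museum": "HeritageCustodian",
        "HeritageCustodian": None,
    }
    print(annotation_order(hierarchy))
```
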
---

## Related Rules

- **Rule 0**: LinkML Schemas Are the Single Source of Truth
- **Rule 4**: Technical Classes Are Excluded from Visualizations
- **Rule 13**: Custodian Type Annotations on LinkML Schema Elements

---

## References

- `docs/plan/specificity_score/README.md` - System overview
- `docs/plan/specificity_score/04-prompt-conversation-templates.md` - Template definitions
- `docs/plan/specificity_score/06-uml-visualization.md` - UML filtering integration

---

## Changelog

| Date | Version | Change |
|------|---------|--------|
| 2025-01-04 | 1.0.0 | Initial rule created for specificity score system |