Rule: LinkML "Types" Classes Define SPARQL Template Variables
Created: 2025-01-08
Status: Active
Applies to: SPARQL template design, RAG pipeline slot extraction
Core Principle
LinkML classes following the *Type / *Types naming pattern (Rule 0b) serve as the single source of truth for valid values in SPARQL template slot variables.
When designing SPARQL templates, extract variables from the schema rather than hardcoding values. This enables:
- Flexibility: Same template works across all institution types
- Extensibility: Adding new types to schema automatically extends templates
- Consistency: Variable values always align with ontology
- Multilingual support: Type labels in multiple languages available from schema
Template Variable Sources
1. Institution Type Variable (institution_type)
Schema Source: CustodianType abstract class and its 19 subclasses
| Subclass | Code | Description |
|---|---|---|
| ArchiveOrganizationType | A | Archives |
| BioCustodianType | B | Botanical gardens, zoos |
| CommercialOrganizationType | C | Corporations |
| DigitalPlatformType | D | Digital platforms |
| EducationProviderType | E | Universities, schools |
| FeatureCustodianType | F | Geographic features |
| GalleryType | G | Art galleries |
| HolySacredSiteType | H | Religious sites |
| IntangibleHeritageGroupType | I | Folklore organizations |
| LibraryType | L | Libraries |
| MuseumType | M | Museums |
| NonProfitType | N | NGOs |
| OfficialInstitutionType | O | Government agencies |
| PersonalCollectionType | P | Private collectors |
| ResearchOrganizationType | R | Research centers |
| HeritageSocietyType | S | Historical societies |
| TasteScentHeritageType | T | Culinary heritage |
| UnspecifiedType | U | Unknown |
| MixedCustodianType | X | Multiple types |
Template Slot Definition:
slots:
  institution_type:
    type: institution_type
    required: true
    schema_source: "modules/classes/CustodianType.yaml"
    # Valid values derived from CustodianType subclasses
2. Geographic Scope Variable (location)
Geographic scope is a hierarchical variable with three levels:
| Level | Schema Source | SPARQL Property | Example |
|---|---|---|---|
| Country | ISO 3166-1 alpha-2 | hc:countryCode | NL, DE, BE |
| Subregion | ISO 3166-2 | hc:subregionCode | NL-NH, DE-BY |
| Settlement | GeoNames | hc:settlementName | Amsterdam, Berlin |
Template Slot Definition:
slots:
  location:
    type: location
    required: true
    schema_source:
      - "modules/enums/CountryCodeEnum.yaml"  # if it exists
      - "data/reference/geonames.db"
    resolution_order: [settlement, subregion, country]
    # SlotExtractor detects which level the user specified
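The resolution_order in this slot definition could be applied roughly as follows. This is a minimal sketch, not the project's SlotExtractor: the tables `SETTLEMENTS`, `SUBREGIONS`, and `COUNTRIES` and the function `resolve_location` are hypothetical stand-ins for lookups that would be backed by the reference files listed in schema_source.

```python
from typing import Optional, Tuple

# Hypothetical in-memory stand-ins for the reference data; a real
# implementation would load these from the files listed in schema_source.
SETTLEMENTS = {"amsterdam": "Amsterdam", "berlin": "Berlin"}
SUBREGIONS = {"noord-holland": "NL-NH", "bayern": "DE-BY"}
COUNTRIES = {"nederland": "NL", "deutschland": "DE"}

# Same order as resolution_order: [settlement, subregion, country].
RESOLUTION_ORDER = [
    ("settlement", SETTLEMENTS),
    ("subregion", SUBREGIONS),
    ("country", COUNTRIES),
]


def resolve_location(text: str) -> Optional[Tuple[str, str]]:
    """Return (level, normalized value) for the first level that matches."""
    key = text.strip().lower()
    for level, table in RESOLUTION_ORDER:
        if key in table:
            return level, table[key]
    return None  # no level recognized the input
```

Trying the most specific level first means that an ambiguous name resolves to a settlement before it is considered a subregion or country.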
3. Digital Platform Type Variable (platform_type)
Schema Source: DigitalPlatformType abstract class and 69+ subclasses in DigitalPlatformTypes.yaml
Categories include:
- REPOSITORY: DigitalLibrary, DigitalArchivePlatform, OpenAccessRepository
- AGGREGATOR: Europeana-type aggregators, BibliographicDatabasePlatform
- DISCOVERY: WebPortal, OnlineDatabase, OpenDataPortal
- VIRTUAL_HERITAGE: VirtualMuseum, VirtualLibrary, OnlineArtGallery
- RESEARCH: DisciplinaryRepository, PrePrintServer, GenealogyDatabase
- ...and many more
Template Slot Definition:
slots:
  platform_type:
    type: platform_type
    required: false
    schema_source: "modules/classes/DigitalPlatformTypes.yaml"
Template Design Pattern
Before (Hardcoded - WRONG)
# Separate templates for each institution type - DO NOT DO THIS
templates:
  count_museums_in_region:
    sparql: |
      SELECT (COUNT(?s) AS ?count) WHERE {
        ?s hc:institutionType "M" ;
           hc:subregionCode "{{ region }}" .
      }
  count_archives_in_region:
    sparql: |
      SELECT (COUNT(?s) AS ?count) WHERE {
        ?s hc:institutionType "A" ;
           hc:subregionCode "{{ region }}" .
      }
After (Parameterized - CORRECT)
# Single template with institution_type as variable
templates:
  count_institutions_by_type_location:
    description: "Count heritage institutions by type and location"
    slots:
      institution_type:
        type: institution_type
        required: true
        schema_source: "modules/classes/CustodianType.yaml"
      location:
        type: location
        required: true
        resolution_order: [settlement, subregion, country]
    # Multiple SPARQL variants based on location resolution
    sparql_template: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
          hc:institutionType "{{ institution_type }}" ;
          hc:settlementName "{{ location }}" .
      }
    sparql_template_region: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
          hc:institutionType "{{ institution_type }}" ;
          hc:subregionCode "{{ location }}" .
      }
    sparql_template_country: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
          hc:institutionType "{{ institution_type }}" ;
          hc:countryCode "{{ location }}" .
      }
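One way the three variants might be selected and filled at query time is sketched below. The names `VARIANT_BY_LEVEL` and `render_query` are hypothetical, and the regex substitution is only a stand-in: the project may use a templating library for the `{{ ... }}` placeholders instead. The quote check is an assumption about what a safe slot value looks like, added because values are placed inside SPARQL string literals.

```python
import re

# Template keys as named in the YAML, one per location level.
VARIANT_BY_LEVEL = {
    "settlement": "sparql_template",
    "subregion": "sparql_template_region",
    "country": "sparql_template_country",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_query(template: dict, level: str, slots: dict) -> str:
    """Pick the SPARQL variant for `level` and fill its {{ slot }} placeholders."""
    sparql = template[VARIANT_BY_LEVEL[level]]
    for value in slots.values():
        # Reject characters that could break out of a SPARQL string literal.
        if '"' in value or "\\" in value or "\n" in value:
            raise ValueError(f"unsafe slot value: {value!r}")
    return PLACEHOLDER.sub(lambda m: slots[m.group(1)], sparql)


# Usage with the subregion variant only (other variants omitted for brevity).
template = {
    "sparql_template_region": (
        "SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {\n"
        "  ?institution a hcc:Custodian ;\n"
        '    hc:institutionType "{{ institution_type }}" ;\n'
        '    hc:subregionCode "{{ location }}" .\n'
        "}"
    ),
}
query = render_query(
    template, "subregion", {"institution_type": "M", "location": "NL-NH"}
)
```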
SlotExtractor Responsibilities
The SlotExtractor module must:
- Detect institution type from user query:
  - "musea" → M (Dutch plural)
  - "archives" → A (English)
  - "bibliotheken" → L (Dutch)
  - Use synonyms from _slot_types.institution_type.synonyms
- Detect location level from user query:
  - "Amsterdam" → settlement level → use sparql_template
  - "Noord-Holland" → subregion level → use sparql_template_region
  - "Nederland" → country level → use sparql_template_country
- Normalize values to schema-compliant codes:
  - "Noord-Holland" → "NL-NH"
  - "museum" → "M"
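Institution type detection from synonyms could look something like the sketch below. The `SYNONYMS` table and `detect_institution_type` are hypothetical: per this rule, the real synonyms come from _slot_types.institution_type.synonyms, and the few entries here are only the examples from the list plus obvious singular and plural forms.

```python
import re
from typing import Optional

# Hypothetical synonym table; in practice this would be loaded from
# _slot_types.institution_type.synonyms rather than written inline.
SYNONYMS = {
    "museum": "M", "musea": "M", "museums": "M",
    "archive": "A", "archives": "A",
    "bibliotheek": "L", "bibliotheken": "L", "library": "L",
}


def detect_institution_type(query: str) -> Optional[str]:
    """Return the type code of the first known synonym found in the query."""
    for token in re.findall(r"\w+", query.lower()):
        code = SYNONYMS.get(token)
        if code is not None:
            return code
    return None  # no institution type mentioned
```

Token-level matching keeps the sketch short; multi-word synonyms would need phrase matching instead.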
Dynamic Label Resolution (NO HARDCODING)
CRITICAL: Labels MUST be resolved at runtime from schema/reference files, NOT hardcoded in templates or code.
Institution Type Labels
The CustodianType classes contain multilingual labels via type_label slot:
MuseumType:
  type_label:
    - "Museum"@en
    - "museum"@nl
    - "Museum"@de
    - "museo"@es
Label Resolution Chain:
- Load CustodianType.yaml and subclass files
- Parse type_label slot for each type code (M, L, A, etc.)
- Build runtime label dictionary keyed by code + language
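The final step of this chain, building the dictionary keyed by code and language, might be sketched as follows. It assumes the type_label values have already been read from the schema files and grouped by type code; that loading step is not shown, and `build_label_index` is a hypothetical name.

```python
import re
from typing import Dict, List

# Matches entries written like "Museum"@en, as in the MuseumType example.
LABEL_PATTERN = re.compile(r'^"(?P<text>.*)"@(?P<lang>[A-Za-z-]+)$')


def build_label_index(raw: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Turn {code: ['"label"@lang', ...]} into {code: {lang: label}}."""
    index: Dict[str, Dict[str, str]] = {}
    for code, entries in raw.items():
        for entry in entries:
            match = LABEL_PATTERN.match(entry.strip())
            if match is None:
                continue  # skip entries that are not language-tagged
            index.setdefault(code, {})[match["lang"]] = match["text"]
    return index


# Assumed input shape: type_label values grouped by type code.
raw_labels = {"M": ['"Museum"@en', '"museum"@nl', '"Museum"@de', '"museo"@es']}
labels = build_label_index(raw_labels)
```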
Geographic Labels
Subregion/settlement names come from reference data files, not hardcoded:
label_sources:
- "data/reference/iso_3166_2_{country}.json" # e.g., iso_3166_2_nl.json
- "data/reference/geonames.db" # GeoNames database
- "data/reference/admin1CodesASCII.txt" # GeoNames fallback
Example: iso_3166_2_nl.json contains the following, where "North Holland" is an English synonym that maps to the same code:
{
  "provinces": {
    "Noord-Holland": "NH",
    "Zuid-Holland": "ZH",
    "North Holland": "NH"
  }
}
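A reference file with this structure could be turned into a name-to-code lookup along these lines. This is a sketch under assumptions: `build_subregion_lookup` is a hypothetical helper, the country prefix is taken from the caller rather than the file, and the sample is passed as an inline string instead of being read from data/reference.

```python
import json
from typing import Dict


def build_subregion_lookup(country: str, raw_json: str) -> Dict[str, str]:
    """Map lowercased province names to full ISO 3166-2 codes, e.g. NL-NH."""
    data = json.loads(raw_json)
    return {
        name.lower(): f"{country.upper()}-{suffix}"
        for name, suffix in data["provinces"].items()
    }


# Inline sample mirroring the iso_3166_2_nl.json structure.
sample = (
    '{"provinces": {"Noord-Holland": "NH", "Zuid-Holland": "ZH", '
    '"North Holland": "NH"}}'
)
lookup = build_subregion_lookup("nl", sample)
```

Because synonyms are plain extra keys in the file, the Dutch and English names resolve to the same code without any special handling.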
SlotExtractor Label Loading
class SlotExtractor:
    def __init__(self, schema_path: str, reference_path: str):
        # Load institution type labels from schema
        self.type_labels = self._load_custodian_type_labels(schema_path)
        # Load geographic labels from reference files
        self.subregion_labels = self._load_subregion_labels(reference_path)

    def _load_custodian_type_labels(self, schema_path: str) -> dict:
        """Load multilingual labels from CustodianType schema files."""
        # Parse YAML, extract type_label slots
        # Return: {"M": {"nl": "musea", "en": "museums"}, ...}

    def _load_subregion_labels(self, reference_path: str) -> dict:
        """Load subregion labels from ISO 3166-2 JSON files."""
        # Load iso_3166_2_nl.json, iso_3166_2_de.json, etc.
        # Return: {"NL-NH": {"nl": "Noord-Holland", "en": "North Holland"}, ...}
UI Template Interpolation
ui_template:
  nl: "Er zijn {{ count }} {{ institution_type_nl }} in {{ location }}."
  en: "There are {{ count }} {{ institution_type_en }} in {{ location }}."
The RAG pipeline populates institution_type_nl / institution_type_en from dynamically loaded labels:
# At runtime, NOT hardcoded
template_context["institution_type_nl"] = slot_extractor.type_labels[type_code]["nl"]
template_context["institution_type_en"] = slot_extractor.type_labels[type_code]["en"]
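Putting the pieces together, the interpolation step might look like this sketch. `render_answer` is a hypothetical helper, the regex substitution stands in for whatever templating the pipeline actually uses, and the label dictionary is passed in to keep the example self-contained (in the pipeline it would come from slot_extractor.type_labels).

```python
import re

UI_TEMPLATE = {
    "nl": "Er zijn {{ count }} {{ institution_type_nl }} in {{ location }}.",
    "en": "There are {{ count }} {{ institution_type_en }} in {{ location }}.",
}


def render_answer(lang: str, count: int, type_code: str,
                  location: str, type_labels: dict) -> str:
    """Fill the UI template for `lang` using labels resolved at runtime."""
    context = {
        "count": str(count),
        "location": location,
        # The label comes from the loaded dictionary, never a literal here.
        f"institution_type_{lang}": type_labels[type_code][lang],
    }
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: context[m.group(1)],
        UI_TEMPLATE[lang],
    )


# Sample labels in the shape described for SlotExtractor.type_labels.
type_labels = {"M": {"nl": "musea", "en": "museums"}}
answer_nl = render_answer("nl", 42, "M", "Noord-Holland", type_labels)
answer_en = render_answer("en", 42, "M", "North Holland", type_labels)
```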
Adding New Types
When the schema gains new institution types:
- No template changes needed - parameterized templates automatically support new types
- Update synonyms in _slot_types.institution_type.synonyms for NLP recognition
- Labels auto-discovered from schema files - no code changes needed
Anti-Patterns (FORBIDDEN)
Hardcoded Labels in Templates
# WRONG - Hardcoded labels
labels:
  NL-NH: {nl: "Noord-Holland", en: "North Holland"}
  NL-ZH: {nl: "Zuid-Holland", en: "South Holland"}
# WRONG - Hardcoded labels in code
INSTITUTION_TYPE_LABELS_NL = {
    "M": "musea", "L": "bibliotheken",  # ...
}
Correct Approach
# CORRECT - Reference to schema/data source
label_sources:
- "schemas/20251121/linkml/modules/classes/CustodianType.yaml"
- "data/reference/iso_3166_2_{country}.json"
# CORRECT - Load labels at runtime
type_labels = load_labels_from_schema("CustodianType.yaml")
region_labels = load_labels_from_reference("iso_3166_2_nl.json")
Why?
- Single source of truth - Labels defined once in schema/reference files
- Automatic sync - Schema changes automatically propagate to UI
- Extensibility - Adding new countries/types doesn't require code changes
- Multilingual - All language variants come from same source
Validation
Templates MUST validate slot values against schema:
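def validate_institution_type(value: str) -> bool:
    """Validate institution_type against CustodianType schema."""
    # Codes are listed inline for illustration only; per this rule they
    # should be derived from the CustodianType subclasses at runtime.
    valid_codes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
                   'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'X']
    return value in valid_codes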
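To keep validation consistent with the no-hardcoding principle, the set of valid codes could be taken from the labels loaded at runtime instead of a literal list. The sketch below assumes a dictionary shaped like the one described for SlotExtractor.type_labels; `make_type_validator` is a hypothetical helper.

```python
from typing import Callable, Dict


def make_type_validator(
    type_labels: Dict[str, Dict[str, str]]
) -> Callable[[str], bool]:
    """Build a validator whose valid codes are the keys of the loaded labels."""
    valid_codes = frozenset(type_labels)

    def validate(value: str) -> bool:
        return value in valid_codes

    return validate


# Assumed sample data in the {code: {lang: label}} shape; a real run
# would pass the dictionary loaded from the schema files.
loaded = {"M": {"en": "museums"}, "A": {"en": "archives"}}
validate_institution_type = make_type_validator(loaded)
```

With this shape, a type added to the schema becomes valid as soon as its labels are loaded, with no change to the validation code.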
Related Rules
- Rule 0b: Type/Types file naming convention
- Rule 13: Custodian type annotations on LinkML schema elements
- Rule 37: Specificity score annotations for template filtering
References
- Schema: schemas/20251121/linkml/modules/classes/CustodianType.yaml
- Types: schemas/20251121/linkml/modules/classes/*Types.yaml
- Enums: schemas/20251121/linkml/modules/enums/InstitutionTypeCodeEnum.yaml
- Templates: data/sparql_templates.yaml