LinkML Schema Validation Patterns
This document captures critical patterns discovered during the GLAM LinkML schema validation effort against 33,444+ custodian YAML data files.
Overview
The GLAM project uses LinkML schemas to define the structure of heritage custodian data. Validation ensures data files conform to the schema before ingestion into databases.
Key Files:
- Main schema entry: schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml
- Classes: schemas/20251121/linkml/modules/classes/
- Slots: schemas/20251121/linkml/modules/slots/
- Enums: schemas/20251121/linkml/modules/enums/
- Data: data/custodian/*.yaml (33,444+ files)
Validation Command
# Single file
linkml-validate -s schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml data/custodian/FILE.yaml
# Batch test
for f in data/custodian/A*.yaml; do
  linkml-validate -s schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml "$f" 2>&1 | grep -q ERROR && echo "FAIL: $f"
done
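The batch loop above can also be driven from Python when the list of failures needs further processing. The following is a minimal sketch, not the project's tooling: it assumes linkml-validate is on PATH and, like the shell loop, treats any output containing ERROR as a failure. The function names run_linkml_validate and find_failures are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

SCHEMA = "schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml"


def run_linkml_validate(path: str) -> str:
    """Run linkml-validate on one file and return stdout and stderr combined."""
    proc = subprocess.run(
        ["linkml-validate", "-s", SCHEMA, path],
        capture_output=True,
        text=True,
    )
    return proc.stdout + proc.stderr


def find_failures(
    paths: Iterable[str],
    validate: Callable[[str], str] = run_linkml_validate,
) -> list[str]:
    """Return the paths whose validator output contains the string ERROR."""
    return [p for p in paths if "ERROR" in validate(p)]


if __name__ == "__main__":
    # Same file selection as the shell loop: data/custodian/A*.yaml
    files = sorted(str(p) for p in Path("data/custodian").glob("A*.yaml"))
    for failed in find_failures(files):
        print(f"FAIL: {failed}")
```

The validate parameter is injectable so the filtering logic can be exercised without invoking the real command.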
Critical Patterns
Pattern 1: Union Types Require range: Any
Priority: 🚨 CRITICAL
When using any_of to define union types (e.g., string OR integer), you MUST specify range: Any.
# CORRECT
attributes:
  identifier_value:
    range: Any  # ← REQUIRED
    any_of:
      - range: string
      - range: integer

# WRONG (validation silently fails)
attributes:
  identifier_value:
    any_of:
      - range: string
      - range: integer
Affected Fields:
- identifier_value (string | integer)
- geonames_id (string | integer)
- youtube_channel_id (string | array)
- facebook_id (string | array)
- identifiers (dict | array)
See: Rule 59 in AGENTS.md, .opencode/rules/linkml-union-type-range-any-rule.md
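Because the wrong form fails silently, it can help to scan class definitions for attributes that declare any_of without range: Any. The sketch below is illustrative only: it works on a class definition already loaded into a Python dict (for example by a YAML parser), and the helper name find_union_slots_missing_any is hypothetical.

```python
def find_union_slots_missing_any(class_def: dict) -> list[str]:
    """List attributes that declare any_of but do not set range: Any."""
    missing = []
    for name, spec in (class_def.get("attributes") or {}).items():
        spec = spec or {}  # an attribute with no body loads as None
        if "any_of" in spec and spec.get("range") != "Any":
            missing.append(name)
    return missing


# A class definition as it would look after loading the YAML into a dict.
example_class = {
    "attributes": {
        "identifier_value": {
            "range": "Any",
            "any_of": [{"range": "string"}, {"range": "integer"}],
        },
        "geonames_id": {
            "any_of": [{"range": "string"}, {"range": "integer"}],
        },
        "name": {"range": "string"},
    }
}

print(find_union_slots_missing_any(example_class))  # ['geonames_id']
```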
Pattern 2: Flexible Data Structures (dict OR array)
Some fields accept either dict or array format depending on data source.
# Dict format (common in manual entry)
identifiers:
  isil: "NL-123"
  wikidata: "Q12345"

# Array format (common in API responses)
identifiers:
  - scheme: isil
    value: "NL-123"
  - scheme: wikidata
    value: "Q12345"
Schema Solution:
attributes:
  identifiers:
    range: Any  # Accept both formats
    description: >-
      Identifiers from source. Accepts dict format {scheme: value}
      or array format [{scheme: X, value: Y}].
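Since range: Any lets both shapes through validation, code that consumes the data has to handle both. A minimal sketch of one way to do that, assuming the two formats shown above are the only ones in use (normalize_identifiers is a hypothetical helper, not part of the schema tooling):

```python
def normalize_identifiers(raw) -> list[dict]:
    """Convert dict-format or array-format identifiers into a list of
    {"scheme": ..., "value": ...} dicts."""
    if raw is None:
        return []
    if isinstance(raw, dict):
        # Dict format: {scheme: value}
        return [{"scheme": k, "value": v} for k, v in raw.items()]
    if isinstance(raw, list):
        # Array format: [{scheme: X, value: Y}]
        return [{"scheme": i["scheme"], "value": i["value"]} for i in raw]
    raise TypeError(f"unsupported identifiers format: {type(raw).__name__}")


dict_form = {"isil": "NL-123", "wikidata": "Q12345"}
array_form = [
    {"scheme": "isil", "value": "NL-123"},
    {"scheme": "wikidata", "value": "Q12345"},
]

# Both inputs normalize to the same list.
print(normalize_identifiers(dict_form) == normalize_identifiers(array_form))  # True
```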
Pattern 3: Adding Missing Fields to Classes
When data contains fields not in schema, add them to the appropriate class:
# In the class YAML file (e.g., CustodianSourceFile.yaml)
attributes:
  new_field_name:
    range: string  # or appropriate type
    description: Description of the field
    # Optional properties:
    required: false
    multivalued: false
    inlined: true  # For complex types
Validation Error Example:
Additional properties are not allowed ('new_field' was unexpected)
Fix: Add the field to the class's attributes section.
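When many files fail with this error, extracting the field names from the messages gives a list of attributes to add. The sketch below parses the message format shown above; the multi-field variant with "were unexpected" is an assumption about how the underlying JSON Schema validator words it, and unexpected_fields is a hypothetical helper.

```python
import re

_UNEXPECTED = re.compile(
    r"Additional properties are not allowed \((.+?) (?:was|were) unexpected\)"
)


def unexpected_fields(message: str) -> list[str]:
    """Extract the quoted field names from an 'Additional properties' error."""
    match = _UNEXPECTED.search(message)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group(1))


msg = "Additional properties are not allowed ('new_field' was unexpected)"
print(unexpected_fields(msg))  # ['new_field']
```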
Pattern 4: Nested Class Inlining
When a field references another class, use inlined: true:
attributes:
  location:
    range: Location  # References Location class
    inlined: true  # Embed the object, don't use reference
  locations:
    range: Location
    multivalued: true
    inlined_as_list: true  # For arrays of objects
Pattern 5: Optional vs Required Fields
Most fields should be optional to handle incomplete data:
attributes:
  field_name:
    range: string
    required: false  # Default, but explicit is clearer
Required fields cause validation failures when missing:
attributes:
  name:
    range: string
    required: true  # Data MUST have this field
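Before marking a field as required, it may be worth counting how many existing records would fail. A minimal sketch, assuming the records have already been loaded into Python dicts (count_missing is a hypothetical helper):

```python
def count_missing(records: list[dict], field: str) -> int:
    """Count records where the field is absent or explicitly null."""
    return sum(1 for record in records if record.get(field) is None)


records = [
    {"name": "Rijksmuseum"},
    {"name": None},
    {"location": "Amsterdam"},
]

print(count_missing(records, "name"))  # 2
```

A non-zero count means required: true would turn those records into validation failures.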
Common Validation Errors and Solutions
Error: Additional properties not allowed
Cause: Data has field not defined in schema
Fix: Add field to class's attributes section
Error: None is not of type 'string'
Cause: Field defined as string but data has null
Fix: Make field optional or use any_of with null
Error: 123 is not of type 'string'
Cause: Data has integer, schema expects string
Fix: Use union type with range: Any and any_of
Error: [...] is not of type 'object'
Cause: Data has array, schema expects single object
Fix: Add multivalued: true or use range: Any for flexibility
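The error-to-fix mapping above can be encoded as a small lookup so that validator output is annotated with the suggested fix. This is a sketch under the assumption that messages match the wording listed here; the patterns and the name suggest_fix are illustrative.

```python
import re
from typing import Optional

# (pattern, suggested fix) pairs, mirroring the list above.
KNOWN_ERRORS = [
    (re.compile(r"Additional properties are not allowed"),
     "Add field to class's attributes section"),
    (re.compile(r"None is not of type 'string'"),
     "Make field optional or use any_of with null"),
    (re.compile(r"-?\d+ is not of type 'string'"),
     "Use union type with range: Any and any_of"),
    (re.compile(r"\[.*\] is not of type 'object'", re.DOTALL),
     "Add multivalued: true or use range: Any for flexibility"),
]


def suggest_fix(message: str) -> Optional[str]:
    """Return the suggested fix for a known error message, else None."""
    for pattern, fix in KNOWN_ERRORS:
        if pattern.search(message):
            return fix
    return None


print(suggest_fix("123 is not of type 'string'"))
# Use union type with range: Any and any_of
```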
Validation Session Tracking
When performing validation sessions, track fixes systematically:
| Fix # | File Modified | Change Made | Error Fixed |
|---|---|---|---|
| 1 | ClassName.yaml | Added field X | "X was unexpected" |
| 2 | ClassName.yaml | Added range: Any | Type mismatch |
This table format allows continuation across sessions.
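If fixes are recorded programmatically, the same table can be regenerated at the end of each session. A minimal sketch; the Fix dataclass and render_fix_table are hypothetical names, and the columns follow the table above.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    file_modified: str
    change_made: str
    error_fixed: str


def render_fix_table(fixes: list[Fix]) -> str:
    """Render fixes as a markdown table, numbering rows from 1."""
    lines = [
        "| Fix # | File Modified | Change Made | Error Fixed |",
        "|---|---|---|---|",
    ]
    for number, fix in enumerate(fixes, start=1):
        lines.append(
            f"| {number} | {fix.file_modified} | {fix.change_made} | {fix.error_fixed} |"
        )
    return "\n".join(lines)


session = [
    Fix("ClassName.yaml", "Added field X", '"X was unexpected"'),
    Fix("ClassName.yaml", "Added range: Any", "Type mismatch"),
]
print(render_fix_table(session))
```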
References
- LinkML Documentation
- Union Types
- AGENTS.md Rules 48-59
- .opencode/rules/ for detailed rule documentation