
LinkML Constraints and Validation

Version: 1.0
Date: 2025-11-22
Status: Phase 8 Complete

This document describes the LinkML-level validation approach for the Heritage Custodian Ontology, including built-in constraints, custom validators, and integration patterns.


Table of Contents

  1. Overview
  2. Three-Layer Validation Strategy
  3. LinkML Built-in Constraints
  4. Custom Python Validators
  5. Usage Examples
  6. Validation Test Suite
  7. Integration Patterns
  8. Comparison with Other Approaches
  9. Troubleshooting

Overview

Goal: Validate heritage custodian data at the YAML instance level BEFORE converting to RDF.

Why Validate at LinkML Level?

  • Early Detection: Catch errors before expensive RDF conversion
  • Fast Feedback: YAML validation is faster than RDF/SHACL validation
  • Developer-Friendly: Error messages reference YAML structure (not RDF triples)
  • CI/CD Integration: Validate data pipelines before publishing

What LinkML Validates:

  1. Schema Compliance: Data types, required fields, cardinality
  2. Format Constraints: Date formats, regex patterns, enumerations
  3. Custom Business Rules: Temporal consistency, bidirectional relationships (via Python validators)

Three-Layer Validation Strategy

The Heritage Custodian Ontology uses complementary validation at three levels:

| Layer | Technology | When | Purpose | Speed |
|-------|------------|------|---------|-------|
| Layer 1: LinkML | Python validators | YAML loading | Validate BEFORE RDF conversion | Fast (ms) |
| Layer 2: SHACL | RDF shapes | RDF ingestion | Validate DURING triple store loading | 🐢 Moderate (sec) |
| Layer 3: SPARQL | Query-based | Runtime | Validate AFTER data is stored | 🐢 Slow (sec–min) |

Recommended Workflow:

1. Create YAML instance
   ↓
2. Validate with LinkML (Layer 1) ← THIS DOCUMENT
   ↓
3. If valid → Convert to RDF
   ↓
4. Validate with SHACL (Layer 2)
   ↓
5. If valid → Load into triple store
   ↓
6. Monitor with SPARQL (Layer 3)

See Also:

  • Layer 2: docs/SHACL_VALIDATION_SHAPES.md
  • Layer 3: docs/SPARQL_QUERIES_ORGANIZATIONAL.md

LinkML Built-in Constraints

LinkML provides declarative constraints that can be embedded directly in schema YAML files.

1. Required Fields

Schema Syntax:

# schemas/20251121/linkml/modules/classes/HeritageCustodian.yaml
slots:
  - name
  - custodian_aspect  # ← Required

slot_definitions:
  name:
    required: true  # ← Must be present

Validation:

from linkml_runtime.loaders import yaml_loader
# HeritageCustodian is the Python class generated from the schema (via gen-python)

# ❌ This will fail validation (missing required field)
data = {"id": "test", "description": "No name provided"}
try:
    instance = yaml_loader.load(data, target_class=HeritageCustodian)
except ValueError as e:
    print(f"Error: {e}")  # "Missing required field: name"

2. Data Type Constraints

Schema Syntax:

slots:
  valid_from:
    range: date  # ← Must be a valid date
  
  latitude:
    range: float  # ← Must be a float
  
  institution_type:
    range: InstitutionTypeEnum  # ← Must be one of enum values

Validation:

# ❌ This will fail (invalid date format)
data = {
  "valid_from": "not-a-date"  # Should be "YYYY-MM-DD"
}
# Error: "Expected date, got string 'not-a-date'"

# ❌ This will fail (invalid enum value)
data = {
  "institution_type": "FAKE_TYPE"  # Should be MUSEUM, LIBRARY, etc.
}
# Error: "Value 'FAKE_TYPE' not in InstitutionTypeEnum"

3. Pattern Constraints (Regex)

Schema Syntax:

# schemas/20251121/linkml/modules/slots/valid_from.yaml
valid_from:
  description: Start date of temporal validity (ISO 8601 format)
  range: date
  pattern: "^\\d{4}-\\d{2}-\\d{2}$"  # ← Regex pattern for YYYY-MM-DD
  examples:
    - value: "2000-01-01"
    - value: "1923-05-15"

Validation:

# ✅ Valid dates
"2000-01-01"  # Pass
"1923-05-15"  # Pass

# ❌ Invalid dates
"2000/01/01"  # Fail (wrong separator)
"Jan 1, 2000"  # Fail (wrong format)
"2000-1-1"    # Fail (missing leading zeros)
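The pass/fail behavior above can be checked directly in Python by applying the same regex from the schema to each example:

```python
import re

# Same pattern as in valid_from.yaml
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Valid per the pattern
assert DATE_PATTERN.match("2000-01-01")
assert DATE_PATTERN.match("1923-05-15")

# Invalid per the pattern
assert not DATE_PATTERN.match("2000/01/01")   # wrong separator
assert not DATE_PATTERN.match("Jan 1, 2000")  # wrong format
assert not DATE_PATTERN.match("2000-1-1")     # missing leading zeros
```

Note that the regex only checks the shape of the string; `range: date` is what rejects impossible dates such as `2000-13-45`.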

4. Cardinality Constraints

Schema Syntax:

slots:
  locations:
    multivalued: true   # ← Can have multiple values
    required: false     # ← But list can be empty
  
  custodian_aspect:
    multivalued: false  # ← Only one value allowed
    required: true      # ← Must be present

Validation:

# ✅ Valid: Multiple locations
data = {
  "locations": [
    {"city": "Amsterdam", "country": "NL"},
    {"city": "The Hague", "country": "NL"}
  ]
}

# ❌ Invalid: Multiple custodian_aspect (should be single)
data = {
  "custodian_aspect": [
    {"name": "Museum A"},
    {"name": "Museum B"}
  ]
}
# Error: "custodian_aspect must be single-valued"

5. Minimum/Maximum Value Constraints

Schema Syntax (example for future use):

latitude:
  range: float
  minimum_value: -90.0   # ← Latitude bounds
  maximum_value: 90.0

longitude:
  range: float
  minimum_value: -180.0
  maximum_value: 180.0

confidence_score:
  range: float
  minimum_value: 0.0     # ← Confidence between 0.0 and 1.0
  maximum_value: 1.0

Custom Python Validators

For complex business rules that can't be expressed with built-in constraints, use custom Python validators.

Location: scripts/linkml_validators.py

This script provides 5 custom validation functions implementing organizational structure rules:
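Each validator returns a list of ValidationError records. A minimal definition consistent with the usage below (the authoritative one lives in scripts/linkml_validators.py) looks like:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ValidationError:
    """One rule violation found in a YAML instance."""
    rule: str       # machine-readable rule id, e.g. "COLLECTION_UNIT_TEMPORAL"
    severity: str   # "ERROR" or "WARNING"
    message: str    # human-readable summary
    context: Dict[str, Any] = field(default_factory=dict)  # offending ids/dates
```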


Validator 1: Collection-Unit Temporal Consistency

Rule: A collection's valid_from date must be >= its managing unit's valid_from date.

Rationale: A collection cannot be managed by a unit that doesn't yet exist.

Function:

def validate_collection_unit_temporal(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Validate that collections are not founded before their managing units.
    
    Rule 1: collection.valid_from >= unit.valid_from
    """
    errors = []
    
    # Extract organizational units
    units = data.get('organizational_structure', [])
    unit_dates = {unit['id']: unit.get('valid_from') for unit in units}
    
    # Extract collections
    collections = data.get('collections_aspect', [])
    
    for collection in collections:
        collection_valid_from = collection.get('valid_from')
        managing_units = collection.get('managed_by_unit', [])
        
        for unit_id in managing_units:
            unit_valid_from = unit_dates.get(unit_id)
            
            if collection_valid_from and unit_valid_from:
                if collection_valid_from < unit_valid_from:
                    errors.append(ValidationError(
                        rule="COLLECTION_UNIT_TEMPORAL",
                        severity="ERROR",
                        message=f"Collection founded before its managing unit",
                        context={
                            "collection_id": collection.get('id'),
                            "collection_valid_from": collection_valid_from,
                            "unit_id": unit_id,
                            "unit_valid_from": unit_valid_from
                        }
                    ))
    
    return errors

Example Violation:

# ❌ Collection founded in 2002, but unit not established until 2005
organizational_structure:
  - id: unit-001
    valid_from: "2005-01-01"  # Unit founded 2005

collections_aspect:
  - id: collection-001
    valid_from: "2002-03-15"  # ❌ Collection founded 2002 (before unit!)
    managed_by_unit:
      - unit-001

Expected Error:

ERROR: Collection founded before its managing unit
  Collection: collection-001 (valid_from: 2002-03-15)
  Managing Unit: unit-001 (valid_from: 2005-01-01)
  Violation: 2002-03-15 < 2005-01-01
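Note that the validator compares `valid_from` values as strings. This is safe because the pattern constraint forces zero-padded YYYY-MM-DD, so lexicographic order coincides with chronological order:

```python
from datetime import date

a, b = "2002-03-15", "2005-01-01"

# String comparison agrees with real date comparison for zero-padded ISO dates
assert (a < b) == (date.fromisoformat(a) < date.fromisoformat(b))
assert a < b  # collection predates unit, so the validator reports an error
```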

Validator 2: Collection-Unit Bidirectional Consistency

Rule: If a collection references a unit via managed_by_unit, the unit must reference the collection back via manages_collections.

Rationale: Bidirectional relationships keep the graph consistent and are needed for alignment with the W3C Organization Ontology.

Function:

def validate_collection_unit_bidirectional(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Validate bidirectional relationships between collections and units.
    
    Rule 2: If collection → unit, then unit → collection (inverse).
    """
    errors = []
    
    # Build inverse mapping: unit_id → collections managed by unit
    units = data.get('organizational_structure', [])
    unit_collections = {unit['id']: unit.get('manages_collections', []) for unit in units}
    
    # Check collections
    collections = data.get('collections_aspect', [])
    
    for collection in collections:
        collection_id = collection.get('id')
        managing_units = collection.get('managed_by_unit', [])
        
        for unit_id in managing_units:
            # Check if unit references collection back
            if collection_id not in unit_collections.get(unit_id, []):
                errors.append(ValidationError(
                    rule="COLLECTION_UNIT_BIDIRECTIONAL",
                    severity="ERROR",
                    message=f"Collection references unit, but unit doesn't reference collection",
                    context={
                        "collection_id": collection_id,
                        "unit_id": unit_id,
                        "unit_manages_collections": unit_collections.get(unit_id, [])
                    }
                ))
    
    return errors

Example Violation:

# ❌ Collection → Unit exists, but Unit → Collection missing
organizational_structure:
  - id: unit-001
    # Missing: manages_collections: [collection-001]

collections_aspect:
  - id: collection-001
    managed_by_unit:
      - unit-001  # ✓ Forward reference exists

Expected Error:

ERROR: Collection references unit, but unit doesn't reference collection
  Collection: collection-001
  Unit: unit-001
  Unit's manages_collections: [] (empty - should include collection-001)

Validator 3: Staff-Unit Temporal Consistency

Rule: A staff member's valid_from date must be >= their employing unit's valid_from date.

Rationale: A person cannot be employed by a unit that doesn't yet exist.

Function:

def validate_staff_unit_temporal(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Validate that staff employment dates are consistent with unit founding dates.
    
    Rule 4: staff.valid_from >= unit.valid_from
    """
    errors = []
    
    # Extract organizational units
    units = data.get('organizational_structure', [])
    unit_dates = {unit['id']: unit.get('valid_from') for unit in units}
    
    # Extract staff
    staff = data.get('staff_aspect', [])
    
    for person in staff:
        person_obs = person.get('person_observation', {})
        person_valid_from = person_obs.get('valid_from')
        employing_units = person.get('employed_by_unit', [])
        
        for unit_id in employing_units:
            unit_valid_from = unit_dates.get(unit_id)
            
            if person_valid_from and unit_valid_from:
                if person_valid_from < unit_valid_from:
                    errors.append(ValidationError(
                        rule="STAFF_UNIT_TEMPORAL",
                        severity="ERROR",
                        message=f"Staff employment started before unit existed",
                        context={
                            "staff_id": person.get('id'),
                            "staff_valid_from": person_valid_from,
                            "unit_id": unit_id,
                            "unit_valid_from": unit_valid_from
                        }
                    ))
    
    return errors

Validator 4: Staff-Unit Bidirectional Consistency

Rule: If staff references a unit via employed_by_unit, the unit must reference the staff back via employs_staff.

Function: Similar structure to Validator 2 (see scripts/linkml_validators.py for implementation).
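For reference, a sketch of that validator, mirroring Validator 2 with the staff fields swapped in. The ValidationError stub below is a minimal stand-in for the class defined in scripts/linkml_validators.py, whose implementation is authoritative:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationError:  # minimal stand-in for the class in scripts/linkml_validators.py
    rule: str
    severity: str
    message: str
    context: Dict[str, Any] = field(default_factory=dict)


def validate_staff_unit_bidirectional(data: Dict[str, Any]) -> List[ValidationError]:
    """Rule 5: if staff -> unit via employed_by_unit, then unit -> staff via employs_staff."""
    errors = []

    # Build mapping: unit_id -> staff employed by unit
    units = data.get('organizational_structure', [])
    unit_staff = {unit['id']: unit.get('employs_staff', []) for unit in units}

    for person in data.get('staff_aspect', []):
        person_id = person.get('id')
        for unit_id in person.get('employed_by_unit', []):
            # Check if the unit references the staff member back
            if person_id not in unit_staff.get(unit_id, []):
                errors.append(ValidationError(
                    rule="STAFF_UNIT_BIDIRECTIONAL",
                    severity="ERROR",
                    message="Staff references unit, but unit doesn't reference staff",
                    context={"staff_id": person_id, "unit_id": unit_id},
                ))
    return errors
```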


Validator 5: Batch Validation

Function: Run all validators at once and return combined results.

def validate_all(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Run all validation rules and return combined results.
    """
    errors = []
    errors.extend(validate_collection_unit_temporal(data))
    errors.extend(validate_collection_unit_bidirectional(data))
    errors.extend(validate_staff_unit_temporal(data))
    errors.extend(validate_staff_unit_bidirectional(data))
    return errors

Usage Examples

Command-Line Interface

The linkml_validators.py script provides a CLI for standalone validation:

# Validate a single YAML file
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/valid_complete_example.yaml

# ✅ Output (valid file):
# Validation successful! No errors found.
# File: valid_complete_example.yaml

# Validate an invalid file
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml

# ❌ Output (invalid file):
# Validation failed with 4 errors:
# 
# ERROR: Collection founded before its managing unit
#   Collection: early-collection (valid_from: 2002-03-15)
#   Unit: curatorial-dept-002 (valid_from: 2005-01-01)
# 
# ERROR: Collection founded before its managing unit
#   Collection: another-early-collection (valid_from: 2008-09-01)
#   Unit: research-dept-002 (valid_from: 2010-06-01)
# 
# ERROR: Staff employment started before unit existed
#   Staff: early-curator (valid_from: 2003-01-15)
#   Unit: curatorial-dept-002 (valid_from: 2005-01-01)
# 
# ERROR: Staff employment started before unit existed
#   Staff: early-researcher (valid_from: 2009-03-01)
#   Unit: research-dept-002 (valid_from: 2010-06-01)

Python API

Import and use validators in your Python code:

from linkml_validators import validate_all, ValidationError
import yaml

# Load YAML data
with open('data/instance.yaml', 'r') as f:
    data = yaml.safe_load(f)

# Run validation
errors = validate_all(data)

if errors:
    print(f"Validation failed with {len(errors)} errors:")
    for error in errors:
        print(f"  {error.severity}: {error.message}")
        print(f"    Rule: {error.rule}")
        print(f"    Context: {error.context}")
else:
    print("Validation successful!")

Integration with Data Pipelines

Pattern 1: Validate Before Conversion

from linkml_validators import validate_all
from linkml_runtime.dumpers import rdflib_dumper
import yaml

def convert_yaml_to_rdf(yaml_path, rdf_path):
    """Convert YAML to RDF with validation."""
    
    # Load YAML
    with open(yaml_path, 'r') as f:
        data = yaml.safe_load(f)
    
    # Validate FIRST (Layer 1)
    errors = validate_all(data)
    if errors:
        print(f"❌ Validation failed: {len(errors)} errors")
        for error in errors:
            print(f"  - {error.message}")
        return False
    
    # Convert to RDF (only if validation passed)
    print("✅ Validation passed, converting to RDF...")
    graph = rdflib_dumper.dump(data, target_class=HeritageCustodian)
    graph.serialize(rdf_path, format='turtle')
    print(f"✅ RDF written to {rdf_path}")
    return True

Integration with CI/CD

GitHub Actions Example:

# .github/workflows/validate-data.yml
name: Validate Heritage Custodian Data

on:
  push:
    paths:
      - 'data/instances/**/*.yaml'
  pull_request:
    paths:
      - 'data/instances/**/*.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      
      - name: Install dependencies
        run: |
          pip install pyyaml linkml-runtime          
      
      - name: Validate YAML instances
        run: |
          shopt -s globstar  # enable ** recursive globbing in bash
          # Validate all YAML files in data/instances/
          for file in data/instances/**/*.yaml; do
            echo "Validating $file..."
            if ! python scripts/linkml_validators.py "$file"; then
              echo "❌ Validation failed for $file"
              exit 1
            fi
          done
          echo "✅ All files validated successfully"          

Exit Codes:

  • 0: Validation successful
  • 1: Validation failed (errors found)
  • 2: Script error (file not found, invalid YAML syntax)

Validation Test Suite

The project includes 3 comprehensive test examples demonstrating validation behavior:

Test 1: Valid Complete Example

File: schemas/20251121/examples/validation_tests/valid_complete_example.yaml

Description: A fictional heritage museum with:

  • 3 organizational units (departments)
  • 2 collections (properly aligned temporally)
  • 3 staff members (properly aligned temporally)
  • All bidirectional relationships correct

Expected Result: PASS (no validation errors)

Key Features:

  • All valid_from dates are consistent (collections/staff after units)
  • All inverse relationships present (manages_collections ↔ managed_by_unit)
  • Demonstrates best practices for data modeling

Test 2: Invalid Temporal Violation

File: schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml

Description: A museum with temporal inconsistencies:

  • Collection founded in 2002, but managing unit not established until 2005
  • Collection founded in 2008, but managing unit not established until 2010
  • Staff employed in 2003, but employing unit not established until 2005
  • Staff employed in 2009, but employing unit not established until 2010

Expected Result: FAIL with 4 errors

Violations:

  1. Collection early-collection: valid_from: 2002-03-15 < Unit valid_from: 2005-01-01
  2. Collection another-early-collection: valid_from: 2008-09-01 < Unit valid_from: 2010-06-01
  3. Staff early-curator: valid_from: 2003-01-15 < Unit valid_from: 2005-01-01
  4. Staff early-researcher: valid_from: 2009-03-01 < Unit valid_from: 2010-06-01

Test 3: Invalid Bidirectional Violation

File: schemas/20251121/examples/validation_tests/invalid_bidirectional_violation.yaml

Description: A museum with missing inverse relationships:

  • Collection references managing unit, but unit doesn't reference collection back
  • Staff references employing unit, but unit doesn't reference staff back

Expected Result: FAIL with 2 errors

Violations:

  1. Collection paintings-collection-003 → Unit curatorial-dept-003 (forward exists), but Unit → Collection (inverse missing)
  2. Staff researcher-001-003 → Unit research-dept-003 (forward exists), but Unit → Staff (inverse missing)

Running Tests

# Test 1: Valid example (should pass)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/valid_complete_example.yaml
# ✅ Expected: "Validation successful! No errors found."

# Test 2: Temporal violations (should fail)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml
# ❌ Expected: "Validation failed with 4 errors"

# Test 3: Bidirectional violations (should fail)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_bidirectional_violation.yaml
# ❌ Expected: "Validation failed with 2 errors"

Integration Patterns

Pattern 1: Validate on Data Import

def import_heritage_custodian(yaml_path):
    """Import and validate a heritage custodian YAML file."""
    import yaml
    from linkml_validators import validate_all
    
    # Load YAML
    with open(yaml_path, 'r') as f:
        data = yaml.safe_load(f)
    
    # Validate FIRST
    errors = validate_all(data)
    if errors:
        raise ValueError(f"Validation failed: {errors}")
    
    # Process data (convert to RDF, store in database, etc.)
    process_data(data)

Pattern 2: Pre-commit Hook

File: .git/hooks/pre-commit

#!/bin/bash
# Validate all staged YAML files before commit

echo "Validating heritage custodian YAML files..."

# Find all staged YAML files in data/instances/
staged_files=$(git diff --cached --name-only --diff-filter=ACM | grep "data/instances/.*\.yaml$")

if [ -z "$staged_files" ]; then
  echo "No YAML files staged, skipping validation."
  exit 0
fi

# Validate each file
for file in $staged_files; do
  echo "  Validating $file..."
  python scripts/linkml_validators.py "$file"
  if [ $? -ne 0 ]; then
    echo "❌ Validation failed for $file"
    echo "Commit aborted. Fix validation errors and try again."
    exit 1
  fi
done

echo "✅ All YAML files validated successfully."
exit 0

Installation:

chmod +x .git/hooks/pre-commit

Pattern 3: Batch Validation

def validate_directory(directory_path):
    """Validate all YAML files in a directory."""
    import os
    import yaml
    from linkml_validators import validate_all
    
    results = {"passed": [], "failed": []}
    
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            if file.endswith('.yaml'):
                yaml_path = os.path.join(root, file)
                
                with open(yaml_path, 'r') as f:
                    data = yaml.safe_load(f)
                
                errors = validate_all(data)
                if errors:
                    results["failed"].append({
                        "file": yaml_path,
                        "errors": errors
                    })
                else:
                    results["passed"].append(yaml_path)
    
    # Report results
    print(f"✅ Passed: {len(results['passed'])} files")
    print(f"❌ Failed: {len(results['failed'])} files")
    
    for failure in results["failed"]:
        print(f"\n{failure['file']}:")
        for error in failure["errors"]:
            print(f"  - {error.message}")
    
    return results

Comparison with Other Approaches

LinkML vs. Python Validator (Phase 5)

| Feature | LinkML Validators | Phase 5 Python Validator |
|---------|-------------------|--------------------------|
| Input | YAML instances | RDF triples (after conversion) |
| Speed | Fast (ms) | 🐢 Moderate (sec) |
| Error Location | YAML field names | RDF triple patterns |
| Use Case | Development, CI/CD | Post-conversion validation |
| Integration | Data pipeline ingestion | RDF quality assurance |

Recommendation: Use both for defense-in-depth validation.


LinkML vs. SHACL (Phase 7)

| Feature | LinkML Validators | SHACL Shapes |
|---------|-------------------|--------------|
| Input | YAML instances | RDF graphs |
| Validation Time | Before RDF conversion | During RDF ingestion |
| Error Messages | Python-friendly | RDF-centric |
| Extensibility | Python code | SPARQL-based constraints |
| Standards | LinkML metamodel | W3C SHACL standard |
| Use Case | Development | Triple store ingestion |

Recommendation:

  • Use LinkML for early validation (development phase)
  • Use SHACL for production validation (RDF ingestion)

LinkML vs. SPARQL Queries (Phase 6)

| Feature | LinkML Validators | SPARQL Queries |
|---------|-------------------|----------------|
| Input | YAML instances | RDF triple store |
| Timing | Before RDF conversion | After data is stored |
| Purpose | Prevention | Detection |
| Speed | Fast | 🐢 Slow (depends on data size) |
| Use Case | Data quality gates | Monitoring, auditing |

Recommendation:

  • Use LinkML to prevent invalid data from entering system
  • Use SPARQL to detect existing violations in production data

Troubleshooting

Issue 1: "Missing required field" Error

Symptom:

ValueError: Missing required field: name

Cause: YAML instance is missing a required field defined in the schema.

Solution:

# ❌ Missing required field
id: https://example.org/custodian/001
description: Some museum

# ✅ Add required field
id: https://example.org/custodian/001
name: Example Museum  # ← Add this
description: Some museum

Issue 2: "Expected date, got string" Error

Symptom:

ValueError: Expected date, got string '2000/01/01'

Cause: Date format doesn't match ISO 8601 pattern (YYYY-MM-DD).

Solution:

# ❌ Wrong date format
valid_from: "2000/01/01"  # Slashes instead of hyphens

# ✅ Correct date format
valid_from: "2000-01-01"  # ISO 8601: YYYY-MM-DD

Issue 3: Validation Passes but SHACL Fails

Symptom: LinkML validation passes, but SHACL validation fails with the same data.

Cause: LinkML validators check YAML structure, SHACL validates RDF graph patterns. Some constraints (e.g., inverse relationships) may be implicit in YAML but explicit in RDF.

Solution: Ensure YAML data includes all required inverse relationships:

# ✅ Explicit bidirectional relationships in YAML
organizational_structure:
  - id: unit-001
    manages_collections:  # ← Inverse relationship
      - collection-001

collections_aspect:
  - id: collection-001
    managed_by_unit:  # ← Forward relationship
      - unit-001

Issue 4: "List index out of range" or "KeyError"

Symptom: Python exception during validation.

Cause: YAML structure doesn't match expected schema (e.g., missing nested fields).

Solution: Use defensive programming in custom validators:

# ❌ Unsafe access
unit_valid_from = data['organizational_structure'][0]['valid_from']

# ✅ Safe access with defaults
units = data.get('organizational_structure', [])
unit_valid_from = units[0].get('valid_from') if units else None

Issue 5: Slow Validation Performance

Symptom: Validation takes a long time for large datasets.

Cause: Custom validators may have O(n²) complexity when checking relationships.

Solution: Use indexed lookups:

# ❌ Slow: O(n²) nested loops
for collection in collections:
    for unit in units:
        if unit['id'] in collection['managed_by_unit']:
            # Check relationship

# ✅ Fast: O(n) with dict lookup
unit_dates = {unit['id']: unit['valid_from'] for unit in units}
for collection in collections:
    for unit_id in collection['managed_by_unit']:
        unit_date = unit_dates.get(unit_id)  # O(1) lookup

Summary

LinkML Constraints Capabilities:

Built-in Constraints (declarative):

  • Required fields (required: true)
  • Data types (range: date, range: float)
  • Regex patterns (pattern: "^\\d{4}-\\d{2}-\\d{2}$")
  • Cardinality (multivalued: true/false)
  • Min/max values (minimum_value, maximum_value)

Custom Validators (programmatic):

  • Temporal consistency (collections/staff before units)
  • Bidirectional relationships (forward ↔ inverse)
  • Complex business rules (Python functions)

Integration:

  • Command-line interface (linkml_validators.py)
  • Python API (import linkml_validators)
  • CI/CD workflows (GitHub Actions, pre-commit hooks)
  • Data pipelines (validate before RDF conversion)

Test Suite:

  • Valid example (passes all rules)
  • Temporal violations (fails Rules 1 & 4)
  • Bidirectional violations (fails Rules 2 & 5)

Next Steps:

  1. Phase 8 Complete: LinkML constraints documented
  2. Phase 9: Apply validators to real-world heritage institution data
  3. Performance Testing: Benchmark validation speed on large datasets (10K+ institutions)
  4. Additional Rules: Extend validators for custody transfer events, legal form constraints

References

  • Phase 5: docs/VALIDATION_RULES.md (Python validator)
  • Phase 6: docs/SPARQL_QUERIES_ORGANIZATIONAL.md (SPARQL queries)
  • Phase 7: docs/SHACL_VALIDATION_SHAPES.md (SHACL shapes)
  • Phase 8: This document (LinkML constraints)
  • Schema: schemas/20251121/linkml/01_custodian_name_modular.yaml
  • Validators: scripts/linkml_validators.py
  • Test Suite: schemas/20251121/examples/validation_tests/
  • LinkML Documentation: https://linkml.io/

Version: 1.0
Phase: 8 (Complete)
Date: 2025-11-22