# LinkML Constraints and Validation

**Version**: 1.0
**Date**: 2025-11-22
**Status**: Phase 8 Complete

This document describes the LinkML-level validation approach for the Heritage Custodian Ontology, including built-in constraints, custom validators, and integration patterns.

---

## Table of Contents

1. [Overview](#overview)
2. [Three-Layer Validation Strategy](#three-layer-validation-strategy)
3. [LinkML Built-in Constraints](#linkml-built-in-constraints)
4. [Custom Python Validators](#custom-python-validators)
5. [Usage Examples](#usage-examples)
6. [Validation Test Suite](#validation-test-suite)
7. [Integration Patterns](#integration-patterns)
8. [Comparison with Other Approaches](#comparison-with-other-approaches)
9. [Troubleshooting](#troubleshooting)

---

## Overview

**Goal**: Validate heritage custodian data at the **YAML instance level** BEFORE converting to RDF.

**Why Validate at LinkML Level?**

- ✅ **Early Detection**: Catch errors before expensive RDF conversion
- ✅ **Fast Feedback**: YAML validation is faster than RDF/SHACL validation
- ✅ **Developer-Friendly**: Error messages reference YAML structure (not RDF triples)
- ✅ **CI/CD Integration**: Validate data pipelines before publishing

**What LinkML Validates**:

1. **Schema Compliance**: Data types, required fields, cardinality
2. **Format Constraints**: Date formats, regex patterns, enumerations
3. **Custom Business Rules**: Temporal consistency, bidirectional relationships (via Python validators)
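The gatekeeping idea behind this list can be shown in miniature. The sketch below is illustrative plain Python, not the project's actual code; the check function and field name are hypothetical:

```python
def validate_before_convert(instance: dict, checks) -> bool:
    """Run cheap YAML-level checks; only hand off to RDF conversion if all pass."""
    errors = [msg for check in checks for msg in check(instance)]
    for msg in errors:
        print(f"ERROR: {msg}")
    return not errors  # True → safe to convert to RDF

# Placeholder for a real check (here: one required field)
def has_name(instance: dict) -> list:
    return [] if "name" in instance else ["missing required field: name"]

validate_before_convert({"id": "x"}, [has_name])         # prints the error, returns False
validate_before_convert({"name": "Museum"}, [has_name])  # returns True
```

The sections below replace these placeholder checks with LinkML's declarative constraints and the project's custom validators.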
---

## Three-Layer Validation Strategy

The Heritage Custodian Ontology uses **complementary validation at three levels**:

| Layer | Technology | When | Purpose | Speed |
|-------|------------|------|---------|-------|
| **Layer 1: LinkML** | Python validators | YAML loading | Validate BEFORE RDF conversion | ⚡ Fast (ms) |
| **Layer 2: SHACL** | RDF shapes | RDF ingestion | Validate DURING triple store loading | 🐢 Moderate (sec) |
| **Layer 3: SPARQL** | Query-based | Runtime | Validate AFTER data is stored | 🐢 Slow (sec-min) |

**Recommended Workflow**:

```
1. Create YAML instance
   ↓
2. Validate with LinkML (Layer 1)  ← THIS DOCUMENT
   ↓
3. If valid → Convert to RDF
   ↓
4. Validate with SHACL (Layer 2)
   ↓
5. If valid → Load into triple store
   ↓
6. Monitor with SPARQL (Layer 3)
```

**See Also**:

- Layer 2: `docs/SHACL_VALIDATION_SHAPES.md`
- Layer 3: `docs/SPARQL_QUERIES_ORGANIZATIONAL.md`

---

## LinkML Built-in Constraints

LinkML provides **declarative constraints** that can be embedded directly in schema YAML files.

### 1. Required Fields

**Schema Syntax**:

```yaml
# schemas/20251121/linkml/modules/classes/HeritageCustodian.yaml
slots:
  - name
  - custodian_aspect   # ← Required

slot_definitions:
  name:
    required: true     # ← Must be present
```

**Validation**:

```python
from linkml_runtime.loaders import yaml_loader

# HeritageCustodian is the Python class generated from the schema

# ❌ This will fail validation (missing required field)
data = {"id": "test", "description": "No name provided"}

try:
    instance = yaml_loader.load(data, target_class=HeritageCustodian)
except ValueError as e:
    print(f"Error: {e}")  # "Missing required field: name"
```
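Under the hood, a required-slot check is just a presence test driven by the slot definitions. The following stand-alone sketch is a hypothetical checker (not LinkML's implementation); the slot names mirror the schema snippet above:

```python
# Slot definitions mirroring the schema snippet above (illustrative)
SLOT_DEFINITIONS = {
    "name": {"required": True},
    "custodian_aspect": {"required": True},
    "description": {"required": False},
}

def check_required(instance: dict) -> list:
    """Return one message per missing required slot."""
    return [
        f"Missing required field: {slot}"
        for slot, spec in SLOT_DEFINITIONS.items()
        if spec.get("required") and slot not in instance
    ]

check_required({"id": "test", "description": "No name provided"})
# → ['Missing required field: name', 'Missing required field: custodian_aspect']
```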
---

### 2. Data Type Constraints

**Schema Syntax**:

```yaml
slots:
  valid_from:
    range: date                  # ← Must be a valid date
  latitude:
    range: float                 # ← Must be a float
  institution_type:
    range: InstitutionTypeEnum   # ← Must be one of the enum values
```

**Validation**:

```python
# ❌ This will fail (invalid date format)
data = {
    "valid_from": "not-a-date"    # Should be "YYYY-MM-DD"
}
# Error: "Expected date, got string 'not-a-date'"

# ❌ This will fail (invalid enum value)
data = {
    "institution_type": "FAKE_TYPE"    # Should be MUSEUM, LIBRARY, etc.
}
# Error: "Value 'FAKE_TYPE' not in InstitutionTypeEnum"
```

---

### 3. Pattern Constraints (Regex)

**Schema Syntax**:

```yaml
# schemas/20251121/linkml/modules/slots/valid_from.yaml
valid_from:
  description: Start date of temporal validity (ISO 8601 format)
  range: date
  pattern: "^\\d{4}-\\d{2}-\\d{2}$"    # ← Regex pattern for YYYY-MM-DD
  examples:
    - value: "2000-01-01"
    - value: "1923-05-15"
```

**Validation**:

```python
# ✅ Valid dates
"2000-01-01"   # Pass
"1923-05-15"   # Pass

# ❌ Invalid dates
"2000/01/01"   # Fail (wrong separator)
"Jan 1, 2000"  # Fail (wrong format)
"2000-1-1"     # Fail (missing leading zeros)
```

---

### 4. Cardinality Constraints

**Schema Syntax**:

```yaml
slots:
  locations:
    multivalued: true     # ← Can have multiple values
    required: false       # ← But list can be empty

  custodian_aspect:
    multivalued: false    # ← Only one value allowed
    required: true        # ← Must be present
```

**Validation**:

```python
# ✅ Valid: Multiple locations
data = {
    "locations": [
        {"city": "Amsterdam", "country": "NL"},
        {"city": "The Hague", "country": "NL"}
    ]
}

# ❌ Invalid: Multiple custodian_aspect (should be single)
data = {
    "custodian_aspect": [
        {"name": "Museum A"},
        {"name": "Museum B"}
    ]
}
# Error: "custodian_aspect must be single-valued"
```
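Pattern and cardinality constraints reduce to a regex test plus a type test. A minimal stand-alone sketch (hypothetical helper, not LinkML's own code; the date regex is the one from the `valid_from` slot above):

```python
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD, as in the schema

def check_slot(value, pattern=None, multivalued=False) -> list:
    """Illustrative pattern + cardinality check for one slot value."""
    errors = []
    # Cardinality: a single-valued slot must not hold a list
    if not multivalued and isinstance(value, list):
        errors.append("must be single-valued")
    values = value if isinstance(value, list) else [value]
    # Pattern: every value must match the declared regex
    if pattern is not None:
        errors += [f"pattern mismatch: {v!r}"
                   for v in values if not pattern.fullmatch(str(v))]
    return errors

check_slot("2000-01-01", pattern=DATE_PATTERN)  # → []
check_slot("2000-1-1", pattern=DATE_PATTERN)    # → ["pattern mismatch: '2000-1-1'"]
check_slot(["a", "b"], multivalued=False)       # → ['must be single-valued']
```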
---

### 5. Minimum/Maximum Value Constraints

**Schema Syntax** (example for future use):

```yaml
latitude:
  range: float
  minimum_value: -90.0    # ← Latitude bounds
  maximum_value: 90.0

longitude:
  range: float
  minimum_value: -180.0
  maximum_value: 180.0

confidence_score:
  range: float
  minimum_value: 0.0      # ← Confidence between 0.0 and 1.0
  maximum_value: 1.0
```

---

## Custom Python Validators

For **complex business rules** that can't be expressed with built-in constraints, use **custom Python validators**.

### Location: `scripts/linkml_validators.py`

This script provides 5 custom validation functions implementing organizational structure rules:

---

### Validator 1: Collection-Unit Temporal Consistency

**Rule**: A collection's `valid_from` date must be >= its managing unit's `valid_from` date.

**Rationale**: A collection cannot be managed by a unit that doesn't yet exist.

**Function**:

```python
from typing import Any, Dict, List

def validate_collection_unit_temporal(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Validate that collections are not founded before their managing units.
    Rule 1: collection.valid_from >= unit.valid_from
    """
    errors = []

    # Extract organizational units
    units = data.get('organizational_structure', [])
    unit_dates = {unit['id']: unit.get('valid_from') for unit in units}

    # Extract collections
    collections = data.get('collections_aspect', [])

    for collection in collections:
        collection_valid_from = collection.get('valid_from')
        managing_units = collection.get('managed_by_unit', [])

        for unit_id in managing_units:
            unit_valid_from = unit_dates.get(unit_id)

            if collection_valid_from and unit_valid_from:
                if collection_valid_from < unit_valid_from:
                    errors.append(ValidationError(
                        rule="COLLECTION_UNIT_TEMPORAL",
                        severity="ERROR",
                        message="Collection founded before its managing unit",
                        context={
                            "collection_id": collection.get('id'),
                            "collection_valid_from": collection_valid_from,
                            "unit_id": unit_id,
                            "unit_valid_from": unit_valid_from
                        }
                    ))

    return errors
```

**Example Violation**:

```yaml
# ❌ Collection founded in 2002, but unit not established until 2005
organizational_structure:
  - id: unit-001
    valid_from: "2005-01-01"    # Unit founded 2005

collections_aspect:
  - id: collection-001
    valid_from: "2002-03-15"    # ❌ Collection founded 2002 (before unit!)
    managed_by_unit:
      - unit-001
```

**Expected Error**:

```
ERROR: Collection founded before its managing unit
  Collection: collection-001 (valid_from: 2002-03-15)
  Managing Unit: unit-001 (valid_from: 2005-01-01)
  Violation: 2002-03-15 < 2005-01-01
```

---

### Validator 2: Collection-Unit Bidirectional Consistency

**Rule**: If a collection references a unit via `managed_by_unit`, the unit must reference the collection back via `manages_collections`.

**Rationale**: Bidirectional relationships ensure graph consistency (required for W3C Org Ontology).

**Function**:

```python
def validate_collection_unit_bidirectional(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Validate bidirectional relationships between collections and units.

    Rule 2: If collection → unit, then unit → collection (inverse).
""" errors = [] # Build inverse mapping: unit_id → collections managed by unit units = data.get('organizational_structure', []) unit_collections = {unit['id']: unit.get('manages_collections', []) for unit in units} # Check collections collections = data.get('collections_aspect', []) for collection in collections: collection_id = collection.get('id') managing_units = collection.get('managed_by_unit', []) for unit_id in managing_units: # Check if unit references collection back if collection_id not in unit_collections.get(unit_id, []): errors.append(ValidationError( rule="COLLECTION_UNIT_BIDIRECTIONAL", severity="ERROR", message=f"Collection references unit, but unit doesn't reference collection", context={ "collection_id": collection_id, "unit_id": unit_id, "unit_manages_collections": unit_collections.get(unit_id, []) } )) return errors ``` **Example Violation**: ```yaml # ❌ Collection → Unit exists, but Unit → Collection missing organizational_structure: - id: unit-001 # Missing: manages_collections: [collection-001] collections_aspect: - id: collection-001 managed_by_unit: - unit-001 # ✓ Forward reference exists ``` **Expected Error**: ``` ERROR: Collection references unit, but unit doesn't reference collection Collection: collection-001 Unit: unit-001 Unit's manages_collections: [] (empty - should include collection-001) ``` --- ### Validator 3: Staff-Unit Temporal Consistency **Rule**: A staff member's `valid_from` date must be >= their employing unit's `valid_from` date. **Rationale**: A person cannot be employed by a unit that doesn't yet exist. **Function**: ```python def validate_staff_unit_temporal(data: Dict[str, Any]) -> List[ValidationError]: """ Validate that staff employment dates are consistent with unit founding dates. 
    Rule 4: staff.valid_from >= unit.valid_from
    """
    errors = []

    # Extract organizational units
    units = data.get('organizational_structure', [])
    unit_dates = {unit['id']: unit.get('valid_from') for unit in units}

    # Extract staff
    staff = data.get('staff_aspect', [])

    for person in staff:
        person_obs = person.get('person_observation', {})
        person_valid_from = person_obs.get('valid_from')
        employing_units = person.get('employed_by_unit', [])

        for unit_id in employing_units:
            unit_valid_from = unit_dates.get(unit_id)

            if person_valid_from and unit_valid_from:
                if person_valid_from < unit_valid_from:
                    errors.append(ValidationError(
                        rule="STAFF_UNIT_TEMPORAL",
                        severity="ERROR",
                        message="Staff employment started before unit existed",
                        context={
                            "staff_id": person.get('id'),
                            "staff_valid_from": person_valid_from,
                            "unit_id": unit_id,
                            "unit_valid_from": unit_valid_from
                        }
                    ))

    return errors
```

---

### Validator 4: Staff-Unit Bidirectional Consistency

**Rule**: If staff references a unit via `employed_by_unit`, the unit must reference the staff back via `employs_staff`.

**Function**: Similar structure to Validator 2 (see `scripts/linkml_validators.py` for the implementation).

---

### Validator 5: Batch Validation

**Function**: Run all validators at once and return combined results.

```python
def validate_all(data: Dict[str, Any]) -> List[ValidationError]:
    """
    Run all validation rules and return combined results.
    """
    errors = []
    errors.extend(validate_collection_unit_temporal(data))
    errors.extend(validate_collection_unit_bidirectional(data))
    errors.extend(validate_staff_unit_temporal(data))
    errors.extend(validate_staff_unit_bidirectional(data))
    return errors
```
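Each validator above constructs `ValidationError` records. The real class is defined in `scripts/linkml_validators.py`; a minimal stand-in consistent with the constructor calls shown above (field names inferred from the examples, so treat it as a sketch) would be:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ValidationError:
    """One rule violation: which rule fired, how severe, what happened, and where."""
    rule: str         # e.g. "COLLECTION_UNIT_TEMPORAL"
    severity: str     # "ERROR" or "WARNING"
    message: str
    context: Dict[str, Any] = field(default_factory=dict)

err = ValidationError(
    rule="COLLECTION_UNIT_TEMPORAL",
    severity="ERROR",
    message="Collection founded before its managing unit",
    context={"collection_id": "collection-001", "unit_id": "unit-001"},
)
print(err.rule, err.severity)  # → COLLECTION_UNIT_TEMPORAL ERROR
```

A plain `dataclass` keeps the error record easy to log and serialize; the actual implementation may differ in detail.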
---

## Usage Examples

### Command-Line Interface

The `linkml_validators.py` script provides a CLI for standalone validation:

```bash
# Validate a single YAML file
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/valid_complete_example.yaml

# ✅ Output (valid file):
# Validation successful! No errors found.
# File: valid_complete_example.yaml

# Validate an invalid file
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml

# ❌ Output (invalid file):
# Validation failed with 4 errors:
#
# ERROR: Collection founded before its managing unit
#   Collection: early-collection (valid_from: 2002-03-15)
#   Unit: curatorial-dept-002 (valid_from: 2005-01-01)
#
# ERROR: Collection founded before its managing unit
#   Collection: another-early-collection (valid_from: 2008-09-01)
#   Unit: research-dept-002 (valid_from: 2010-06-01)
#
# ERROR: Staff employment started before unit existed
#   Staff: early-curator (valid_from: 2003-01-15)
#   Unit: curatorial-dept-002 (valid_from: 2005-01-01)
#
# ERROR: Staff employment started before unit existed
#   Staff: early-researcher (valid_from: 2009-03-01)
#   Unit: research-dept-002 (valid_from: 2010-06-01)
```

---

### Python API

Import and use validators in your Python code (this assumes `scripts/` is on your `PYTHONPATH`):

```python
from linkml_validators import validate_all, ValidationError
import yaml

# Load YAML data
with open('data/instance.yaml', 'r') as f:
    data = yaml.safe_load(f)

# Run validation
errors = validate_all(data)

if errors:
    print(f"Validation failed with {len(errors)} errors:")
    for error in errors:
        print(f"  {error.severity}: {error.message}")
        print(f"    Rule: {error.rule}")
        print(f"    Context: {error.context}")
else:
    print("Validation successful!")
```

---

### Integration with Data Pipelines

**Pattern 1: Validate Before Conversion**

```python
from linkml_validators import validate_all
from linkml_runtime.dumpers import rdflib_dumper
import yaml

def convert_yaml_to_rdf(yaml_path, rdf_path):
    """Convert YAML to RDF with validation."""
    # Load YAML
    with open(yaml_path, 'r') as f:
        data = yaml.safe_load(f)

    # Validate FIRST (Layer 1)
    errors = validate_all(data)
    if errors:
        print(f"❌ Validation failed: {len(errors)} errors")
        for error in errors:
            print(f"  - {error.message}")
        return False

    # Convert to RDF (only if validation passed)
    print("✅ Validation passed, converting to RDF...")
    graph = rdflib_dumper.dump(data, target_class=HeritageCustodian)
    graph.serialize(rdf_path, format='turtle')
    print(f"✅ RDF written to {rdf_path}")
    return True
```

---

### Integration with CI/CD

**GitHub Actions Example**:

```yaml
# .github/workflows/validate-data.yml
name: Validate Heritage Custodian Data

on:
  push:
    paths:
      - 'data/instances/**/*.yaml'
  pull_request:
    paths:
      - 'data/instances/**/*.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install pyyaml linkml-runtime

      - name: Validate YAML instances
        run: |
          # Validate all YAML files in data/instances/
          shopt -s globstar  # enable recursive ** globbing in bash
          for file in data/instances/**/*.yaml; do
            echo "Validating $file..."
            python scripts/linkml_validators.py "$file"
            if [ $? -ne 0 ]; then
              echo "❌ Validation failed for $file"
              exit 1
            fi
          done
          echo "✅ All files validated successfully"
```

**Exit Codes**:

- `0`: Validation successful
- `1`: Validation failed (errors found)
- `2`: Script error (file not found, invalid YAML syntax)

---

## Validation Test Suite

The project includes **3 comprehensive test examples** demonstrating validation behavior:

### Test 1: Valid Complete Example

**File**: `schemas/20251121/examples/validation_tests/valid_complete_example.yaml`

**Description**: A fictional heritage museum with:

- 3 organizational units (departments)
- 2 collections (properly aligned temporally)
- 3 staff members (properly aligned temporally)
- All bidirectional relationships correct

**Expected Result**: ✅ **PASS** (no validation errors)

**Key Features**:

- All `valid_from` dates are consistent (collections/staff after units)
- All inverse relationships present (`manages_collections` ↔ `managed_by_unit`)
- Demonstrates best practices for data modeling

---
### Test 2: Invalid Temporal Violation

**File**: `schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml`

**Description**: A museum with **temporal inconsistencies**:

- Collection founded in 2002, but managing unit not established until 2005
- Collection founded in 2008, but managing unit not established until 2010
- Staff employed in 2003, but employing unit not established until 2005
- Staff employed in 2009, but employing unit not established until 2010

**Expected Result**: ❌ **FAIL** with 4 errors

**Violations**:

1. Collection `early-collection`: `valid_from: 2002-03-15` < Unit `valid_from: 2005-01-01`
2. Collection `another-early-collection`: `valid_from: 2008-09-01` < Unit `valid_from: 2010-06-01`
3. Staff `early-curator`: `valid_from: 2003-01-15` < Unit `valid_from: 2005-01-01`
4. Staff `early-researcher`: `valid_from: 2009-03-01` < Unit `valid_from: 2010-06-01`

---

### Test 3: Invalid Bidirectional Violation

**File**: `schemas/20251121/examples/validation_tests/invalid_bidirectional_violation.yaml`

**Description**: A museum with **missing inverse relationships**:

- Collection references managing unit, but unit doesn't reference collection back
- Staff references employing unit, but unit doesn't reference staff back

**Expected Result**: ❌ **FAIL** with 2 errors

**Violations**:

1. Collection `paintings-collection-003` → Unit `curatorial-dept-003` (forward exists), but Unit → Collection (inverse missing)
2. Staff `researcher-001-003` → Unit `research-dept-003` (forward exists), but Unit → Staff (inverse missing)

---

### Running Tests

```bash
# Test 1: Valid example (should pass)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/valid_complete_example.yaml
# ✅ Expected: "Validation successful! No errors found."
# Test 2: Temporal violations (should fail)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_temporal_violation.yaml
# ❌ Expected: "Validation failed with 4 errors"

# Test 3: Bidirectional violations (should fail)
python scripts/linkml_validators.py \
  schemas/20251121/examples/validation_tests/invalid_bidirectional_violation.yaml
# ❌ Expected: "Validation failed with 2 errors"
```

---

## Integration Patterns

### Pattern 1: Validate on Data Import

```python
def import_heritage_custodian(yaml_path):
    """Import and validate a heritage custodian YAML file."""
    import yaml
    from linkml_validators import validate_all

    # Load YAML
    with open(yaml_path, 'r') as f:
        data = yaml.safe_load(f)

    # Validate FIRST
    errors = validate_all(data)
    if errors:
        raise ValueError(f"Validation failed: {errors}")

    # Process data (convert to RDF, store in database, etc.)
    process_data(data)
```

---

### Pattern 2: Pre-commit Hook

**File**: `.git/hooks/pre-commit`

```bash
#!/bin/bash
# Validate all staged YAML files before commit

echo "Validating heritage custodian YAML files..."

# Find all staged YAML files in data/instances/
staged_files=$(git diff --cached --name-only --diff-filter=ACM | grep "data/instances/.*\.yaml$")

if [ -z "$staged_files" ]; then
    echo "No YAML files staged, skipping validation."
    exit 0
fi

# Validate each file
for file in $staged_files; do
    echo "  Validating $file..."
    python scripts/linkml_validators.py "$file"
    if [ $? -ne 0 ]; then
        echo "❌ Validation failed for $file"
        echo "Commit aborted. Fix validation errors and try again."
        exit 1
    fi
done

echo "✅ All YAML files validated successfully."
exit 0
```

**Installation**:

```bash
chmod +x .git/hooks/pre-commit
```

---

### Pattern 3: Batch Validation

```python
def validate_directory(directory_path):
    """Validate all YAML files in a directory."""
    import os
    import yaml
    from linkml_validators import validate_all

    results = {"passed": [], "failed": []}

    for root, dirs, files in os.walk(directory_path):
        for file in files:
            if file.endswith('.yaml'):
                yaml_path = os.path.join(root, file)

                with open(yaml_path, 'r') as f:
                    data = yaml.safe_load(f)

                errors = validate_all(data)
                if errors:
                    results["failed"].append({
                        "file": yaml_path,
                        "errors": errors
                    })
                else:
                    results["passed"].append(yaml_path)

    # Report results
    print(f"✅ Passed: {len(results['passed'])} files")
    print(f"❌ Failed: {len(results['failed'])} files")

    for failure in results["failed"]:
        print(f"\n{failure['file']}:")
        for error in failure["errors"]:
            print(f"  - {error.message}")

    return results
```

---

## Comparison with Other Approaches

### LinkML vs. Python Validator (Phase 5)

| Feature | LinkML Validators | Phase 5 Python Validator |
|---------|-------------------|--------------------------|
| **Input** | YAML instances | RDF triples (after conversion) |
| **Speed** | ⚡ Fast (ms) | 🐢 Moderate (sec) |
| **Error Location** | YAML field names | RDF triple patterns |
| **Use Case** | Development, CI/CD | Post-conversion validation |
| **Integration** | Data pipeline ingestion | RDF quality assurance |

**Recommendation**: Use **both** for defense-in-depth validation.

---
### LinkML vs. SHACL (Phase 7)

| Feature | LinkML Validators | SHACL Shapes |
|---------|-------------------|--------------|
| **Input** | YAML instances | RDF graphs |
| **Validation Time** | Before RDF conversion | During RDF ingestion |
| **Error Messages** | Python-friendly | RDF-centric |
| **Extensibility** | Python code | SPARQL-based constraints |
| **Standards** | LinkML metamodel | W3C SHACL standard |
| **Use Case** | Development | Triple store ingestion |

**Recommendation**:

- Use **LinkML** for early validation (development phase)
- Use **SHACL** for production validation (RDF ingestion)

---

### LinkML vs. SPARQL Queries (Phase 6)

| Feature | LinkML Validators | SPARQL Queries |
|---------|-------------------|----------------|
| **Input** | YAML instances | RDF triple store |
| **Timing** | Before RDF conversion | After data is stored |
| **Purpose** | **Prevention** | **Detection** |
| **Speed** | ⚡ Fast | 🐢 Slow (depends on data size) |
| **Use Case** | Data quality gates | Monitoring, auditing |

**Recommendation**:

- Use **LinkML** to **prevent** invalid data from entering the system
- Use **SPARQL** to **detect** existing violations in production data

---

## Troubleshooting

### Issue 1: "Missing required field" Error

**Symptom**:

```
ValueError: Missing required field: name
```

**Cause**: The YAML instance is missing a required field defined in the schema.

**Solution**:

```yaml
# ❌ Missing required field
id: https://example.org/custodian/001
description: Some museum

# ✅ Add required field
id: https://example.org/custodian/001
name: Example Museum    # ← Add this
description: Some museum
```

---

### Issue 2: "Expected date, got string" Error

**Symptom**:

```
ValueError: Expected date, got string '2000/01/01'
```

**Cause**: The date format doesn't match the ISO 8601 pattern (`YYYY-MM-DD`).
**Solution**:

```yaml
# ❌ Wrong date format
valid_from: "2000/01/01"    # Slashes instead of hyphens

# ✅ Correct date format
valid_from: "2000-01-01"    # ISO 8601: YYYY-MM-DD
```

---

### Issue 3: Validation Passes but SHACL Fails

**Symptom**: LinkML validation passes, but SHACL validation fails on the same data.

**Cause**: LinkML validators check the **YAML structure**, while SHACL validates **RDF graph patterns**. Some constraints (e.g., inverse relationships) may be implicit in YAML but explicit in RDF.

**Solution**: Ensure the YAML data includes **all required inverse relationships**:

```yaml
# ✅ Explicit bidirectional relationships in YAML
organizational_structure:
  - id: unit-001
    manages_collections:    # ← Inverse relationship
      - collection-001

collections_aspect:
  - id: collection-001
    managed_by_unit:        # ← Forward relationship
      - unit-001
```

---

### Issue 4: "List index out of range" or "KeyError"

**Symptom**: A Python exception is raised during validation.

**Cause**: The YAML structure doesn't match the expected schema (e.g., missing nested fields).

**Solution**: Use defensive programming in custom validators:

```python
# ❌ Unsafe access
unit_valid_from = data['organizational_structure'][0]['valid_from']

# ✅ Safe access with defaults
units = data.get('organizational_structure', [])
unit_valid_from = units[0].get('valid_from') if units else None
```

---

### Issue 5: Slow Validation Performance

**Symptom**: Validation takes a long time on large datasets.

**Cause**: Custom validators may have O(n²) complexity when checking relationships.
**Solution**: Use indexed lookups:

```python
# ❌ Slow: O(n²) nested loops
for collection in collections:
    for unit in units:
        if unit['id'] in collection['managed_by_unit']:
            ...  # check relationship

# ✅ Fast: O(n) with dict lookup
unit_dates = {unit['id']: unit['valid_from'] for unit in units}
for collection in collections:
    for unit_id in collection['managed_by_unit']:
        unit_date = unit_dates.get(unit_id)    # O(1) lookup
```

---

## Summary

**LinkML Constraints Capabilities**:

✅ **Built-in Constraints** (declarative):

- Required fields (`required: true`)
- Data types (`range: date`, `range: float`)
- Regex patterns (`pattern: "^\\d{4}-\\d{2}-\\d{2}$"`)
- Cardinality (`multivalued: true/false`)
- Min/max values (`minimum_value`, `maximum_value`)

✅ **Custom Validators** (programmatic):

- Temporal consistency (collections/staff before units)
- Bidirectional relationships (forward ↔ inverse)
- Complex business rules (Python functions)

✅ **Integration**:

- Command-line interface (`linkml_validators.py`)
- Python API (`import linkml_validators`)
- CI/CD workflows (GitHub Actions, pre-commit hooks)
- Data pipelines (validate before RDF conversion)

✅ **Test Suite**:

- Valid example (passes all rules)
- Temporal violations (fails Rules 1 & 4)
- Bidirectional violations (fails Rules 2 & 5)

**Next Steps**:

1. ✅ **Phase 8 Complete**: LinkML constraints documented
2. ⏳ **Phase 9**: Apply validators to real-world heritage institution data
3. ⏳ **Performance Testing**: Benchmark validation speed on large datasets (10K+ institutions)
4. ⏳ **Additional Rules**: Extend validators for custody transfer events and legal form constraints

---

## References

- **Phase 5**: `docs/VALIDATION_RULES.md` (Python validator)
- **Phase 6**: `docs/SPARQL_QUERIES_ORGANIZATIONAL.md` (SPARQL queries)
- **Phase 7**: `docs/SHACL_VALIDATION_SHAPES.md` (SHACL shapes)
- **Phase 8**: This document (LinkML constraints)
- **Schema**: `schemas/20251121/linkml/01_custodian_name_modular.yaml`
- **Validators**: `scripts/linkml_validators.py`
- **Test Suite**: `schemas/20251121/examples/validation_tests/`
- **LinkML Documentation**: https://linkml.io/

---

**Version**: 1.0
**Phase**: 8 (Complete)
**Date**: 2025-11-22