Validation Rules for Heritage Custodian Ontology (v0.7.0)

Date: 2025-11-22
Schema Version: v0.7.0 (Phase 5: Validation Framework)
Validator: scripts/validate_temporal_consistency.py
Test Suite: tests/test_temporal_validation.py


Overview

This document defines the validation rules for temporal consistency and bidirectional relationships in the Heritage Custodian Ontology. These rules ensure data quality across organizational structures, collections, and staff relationships.

Validation Categories:

  1. Collection-Unit Temporal Consistency (Phase 4: Collection-Department Integration)
  2. Collection-Unit Bidirectional Consistency (Phase 4)
  3. Custody Transfer Continuity (Phase 4: Organizational changes)
  4. Staff-Unit Temporal Consistency (Phase 3: Staff Role Tracking)
  5. Staff-Unit Bidirectional Consistency (Phase 3)

Rule 1: Collection-Unit Temporal Consistency

Description

Collection custody dates must fall within the managing organizational unit's validity period. A unit cannot hold custody of a collection before the unit is founded or after it is dissolved.

Rule ID

COLLECTION_UNIT_TEMPORAL

Constraints

Constraint 1.1: Collection custody start date must be on or after unit founding date

CustodianCollection.valid_from >= OrganizationalStructure.valid_from

Constraint 1.2: Collection custody end date must be on or before unit dissolution date (if unit dissolved)

IF OrganizationalStructure.valid_to IS NOT NULL
THEN CustodianCollection.valid_to <= OrganizationalStructure.valid_to

Warning Condition: Collection custody ongoing but unit dissolved

IF OrganizationalStructure.valid_to IS NOT NULL
AND CustodianCollection.valid_to IS NULL
THEN WARN (missing custody transfer)
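
The two constraints and the warning condition above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in scripts/validate_temporal_consistency.py: the helper names `parse_iso` and `check_within_unit` and the `(severity, message)` return shape are hypothetical. Because Rule 4 has the same structure, the same helper would also work with role_start_date and role_end_date.

```python
from datetime import date
from typing import Optional


def parse_iso(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; None stands for an open-ended bound."""
    return date.fromisoformat(value) if value else None


def check_within_unit(rule_id, start, end, unit_from, unit_to):
    """Return (severity, message) findings for one record against its unit.

    Hypothetical helper: all four dates are ISO strings or None.
    """
    start, end = parse_iso(start), parse_iso(end)
    unit_from, unit_to = parse_iso(unit_from), parse_iso(unit_to)
    findings = []
    # Constraint 1.1: the record may not start before the unit exists.
    if start and unit_from and start < unit_from:
        findings.append(
            ("ERROR", f"{rule_id}: starts ({start}) before unit exists ({unit_from})")
        )
    if unit_to is not None:
        if end is None:
            # Warning condition: record still open although the unit was dissolved.
            findings.append(
                ("WARNING", f"{rule_id}: ongoing but unit dissolved ({unit_to})")
            )
        elif end > unit_to:
            # Constraint 1.2: the record may not outlive the unit.
            findings.append(
                ("ERROR", f"{rule_id}: extends ({end}) beyond unit validity ({unit_to})")
            )
    return findings
```

Called as `check_within_unit("COLLECTION_UNIT_TEMPORAL", "1995-01-01", None, "1985-01-01", None)`, the sketch returns an empty list, since the collection starts after the unit was founded and the unit is still active.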

Examples

Valid Example 1: Collection within unit lifetime

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
unit_type: DEPARTMENT
valid_from: "1985-01-01"  # Unit founded
valid_to: null  # Still active
managed_collections:
  - "https://example.org/collection/dutch-paintings"

---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1"
valid_from: "1995-01-01"  # ✅ After unit founding (1985)
valid_to: null  # Ongoing

Result: PASS - Collection starts after unit founded


Invalid Example 1: Collection before unit exists

---
id: "https://example.org/unit/dept-1"
unit_name: "Special Collections Division"
unit_type: DIVISION
valid_from: "1982-01-01"  # Unit founded in 1982
managed_collections:
  - "https://example.org/collection/medieval-manuscripts"

---
id: "https://example.org/collection/medieval-manuscripts"
collection_name: "Medieval Manuscripts"
managing_unit: "https://example.org/unit/dept-1"
valid_from: "1798-01-01"  # ❌ Before unit exists (1982)!

Error:

[ERROR] COLLECTION_UNIT_TEMPORAL: Collection custody starts (1798-01-01) 
before managing unit exists (1982-01-01). 
Managing unit: Special Collections Division

Fix: Adjust collection valid_from to match or postdate unit founding:

valid_from: "1982-01-01"  # ✅ Custody starts when unit founded

Invalid Example 2: Collection extends beyond unit dissolution

---
id: "https://example.org/unit/dept-old"
unit_name: "Old Department"
unit_type: DEPARTMENT
valid_from: "1950-01-01"
valid_to: "2013-02-28"  # Unit dissolved in 2013
managed_collections:
  - "https://example.org/collection/coll-1"

---
id: "https://example.org/collection/coll-1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-old"
valid_from: "1960-01-01"
valid_to: "2020-12-31"  # ❌ Extends beyond unit dissolution (2013-02-28)!

Error:

[ERROR] COLLECTION_UNIT_TEMPORAL: Collection custody extends (2020-12-31) 
beyond managing unit validity (2013-02-28). 
Managing unit: Old Department

Fix: End collection custody when the unit dissolves, and create a new version for the successor unit:

---
id: "https://example.org/collection/coll-1-v1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-old"
valid_from: "1960-01-01"
valid_to: "2013-02-28"  # ✅ Custody ends with unit dissolution

---
id: "https://example.org/collection/coll-1-v2"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-new"
valid_from: "2013-03-01"  # ✅ Custody transferred to new unit
valid_to: "2020-12-31"

Warning Example: Collection ongoing after unit dissolved

---
id: "https://example.org/unit/dept-dissolved"
unit_name: "Dissolved Department"
unit_type: DEPARTMENT
valid_from: "1950-01-01"
valid_to: "2013-02-28"  # Unit dissolved
managed_collections:
  - "https://example.org/collection/coll-1"

---
id: "https://example.org/collection/coll-1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-dissolved"
valid_from: "1960-01-01"
valid_to: null  # ⚠️ Custody ongoing but unit dissolved!

Warning:

[WARNING] COLLECTION_UNIT_TEMPORAL: Collection custody ongoing but 
managing unit dissolved (2013-02-28). Missing custody transfer? 
Managing unit: Dissolved Department

Recommended Fix: Transfer custody to successor unit:

---
id: "https://example.org/collection/coll-1-v1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-dissolved"
valid_from: "1960-01-01"
valid_to: "2013-02-28"  # End custody with unit dissolution
provenance_note: "Custody transferred to New Department during 2013 reorganization"

---
id: "https://example.org/collection/coll-1-v2"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-new"
valid_from: "2013-03-01"  # Transfer custody to new unit
valid_to: null
provenance_note: "Custody assumed from Dissolved Department"

Rule 2: Collection-Unit Bidirectional Consistency

Description

Bidirectional collection-unit relationships must be consistent. If a collection references a unit, the unit must list the collection, and vice versa.

Rule ID

COLLECTION_UNIT_BIDIRECTIONAL

Constraints

Constraint 2.1: Forward consistency (collection → unit)

IF CustodianCollection.managing_unit = unit_id
THEN OrganizationalStructure[unit_id].managed_collections MUST INCLUDE collection_id

Constraint 2.2: Reverse consistency (unit → collection)

IF OrganizationalStructure.managed_collections INCLUDES collection_id
THEN CustodianCollection[collection_id].managing_unit MUST EQUAL unit_id
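
Both directions can be checked in a single pass over each side. The sketch below is an assumption about how such a check might look, not the validator's actual code: `check_bidirectional` is a hypothetical helper, records are assumed to be plain dicts keyed by id, and the message wording is illustrative. It also reports a child that references an unknown unit, a case the constraints above do not spell out. The field names are parameters, so the same function would apply to Rule 5 with staff_members and unit_affiliation.

```python
def check_bidirectional(parents, children, list_field, ref_field, rule_id):
    """Check that parent lists and child references agree in both directions.

    parents and children map record id -> record dict (hypothetical layout).
    Returns a list of error strings; an empty list means the rule passed.
    """
    errors = []
    # Forward consistency: a child that references a parent must be listed by it.
    for child_id, child in children.items():
        parent_id = child.get(ref_field)
        if parent_id is None:
            continue
        parent = parents.get(parent_id)
        if parent is None:
            errors.append(f"[ERROR] {rule_id}: {child_id} references non-existent unit {parent_id}")
        elif child_id not in (parent.get(list_field) or []):
            errors.append(f"[ERROR] {rule_id}: {parent_id} does not list {child_id} in {list_field}")
    # Reverse consistency: a listed child must exist and point back at this parent.
    for parent_id, parent in parents.items():
        for child_id in parent.get(list_field) or []:
            child = children.get(child_id)
            if child is None:
                errors.append(f"[ERROR] {rule_id}: {parent_id} references non-existent record {child_id}")
            elif child.get(ref_field) != parent_id:
                errors.append(f"[ERROR] {rule_id}: {child_id} does not reference {parent_id} in {ref_field}")
    return errors


# A unit with an empty list and a collection that points at it: one forward error.
units = {"unit/dept-1": {"managed_collections": []}}
collections = {"collection/dutch-paintings": {"managing_unit": "unit/dept-1"}}
errors = check_bidirectional(
    units, collections, "managed_collections", "managing_unit",
    "COLLECTION_UNIT_BIDIRECTIONAL",
)
print(len(errors))  # 1
```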

Examples

Valid Example: Bidirectional relationship

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
  - "https://example.org/collection/dutch-paintings"  # ✅ Lists collection

---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1"  # ✅ References unit

Result: PASS - Bidirectional relationship consistent


Invalid Example 1: Collection missing from unit.managed_collections

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections: []  # ❌ Empty list, doesn't include collection

---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1"  # Collection references unit

Error:

[ERROR] COLLECTION_UNIT_BIDIRECTIONAL: Collection references unit 
'Paintings Department' as managing_unit, but unit does not list collection 
in managed_collections. Add collection to unit.managed_collections.

Fix: Add collection to unit's managed_collections:

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
  - "https://example.org/collection/dutch-paintings"  # ✅ Added

Invalid Example 2: Unit references non-existent collection

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
  - "https://example.org/collection/nonexistent"  # ❌ Collection doesn't exist

Error:

[ERROR] COLLECTION_UNIT_BIDIRECTIONAL: Unit references non-existent collection: 
https://example.org/collection/nonexistent. 
Remove from unit.managed_collections or create collection.

Fix: Either create the collection or remove the reference:

# Option 1: Create collection
---
id: "https://example.org/collection/nonexistent"
collection_name: "New Collection"
managing_unit: "https://example.org/unit/dept-1"

# Option 2: Remove reference
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections: []  # Removed non-existent reference

Rule 3: Custody Transfer Continuity

Description

Collection custody transfers must be continuous: consecutive versions of a collection may not overlap, and a gap of more than one day between them triggers a warning. Collections don't disappear; custody must transfer during organizational changes.

Rule ID

CUSTODY_CONTINUITY

Constraints

Constraint 3.1: Continuous custody (no gaps > 1 day)

IF CustodianCollection version 1 ends (valid_to = T1)
AND CustodianCollection version 2 exists with same collection_name
THEN version 2 must start at T1 OR T1+1 day

Gap = version2.valid_from - version1.valid_to
IF Gap > 1 day THEN WARN

Constraint 3.2: No overlapping custody

IF CustodianCollection version 1 ends (valid_to = T1)
AND CustodianCollection version 2 starts (valid_from = T2)
AND T2 < T1
THEN ERROR (overlapping custody)
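
The gap arithmetic is plain date subtraction. The following sketch shows one way to apply both constraints to a list of versions; `check_custody_continuity` and its `(severity, collection_name, days)` return shape are hypothetical, and the case of an open-ended version followed by a later one is skipped because the constraints above do not define it.

```python
from collections import defaultdict
from datetime import date


def check_custody_continuity(versions):
    """Return (severity, collection_name, days) for each gap or overlap.

    versions is a list of dicts with collection_name, valid_from and valid_to
    (ISO date strings; valid_to may be None). Hypothetical helper.
    """
    by_name = defaultdict(list)
    for version in versions:
        by_name[version["collection_name"]].append(version)

    findings = []
    for name, group in by_name.items():
        # ISO 8601 date strings sort chronologically as plain strings.
        ordered = sorted(group, key=lambda v: v["valid_from"])
        for prev, nxt in zip(ordered, ordered[1:]):
            if prev["valid_to"] is None:
                continue  # open-ended predecessor: not covered by the constraints
            gap = (date.fromisoformat(nxt["valid_from"])
                   - date.fromisoformat(prev["valid_to"])).days
            if gap < 0:
                findings.append(("ERROR", name, -gap))   # Constraint 3.2: overlap
            elif gap > 1:
                findings.append(("WARNING", name, gap))  # Constraint 3.1: gap
    return findings


versions = [
    {"collection_name": "Paintings Collection", "valid_from": "1995-01-01", "valid_to": "2013-02-28"},
    {"collection_name": "Paintings Collection", "valid_from": "2013-03-15", "valid_to": None},
]
print(check_custody_continuity(versions))
# [('WARNING', 'Paintings Collection', 15)]
```

Grouping by collection_name mirrors Constraint 3.1, which identifies versions of the same collection by name rather than by id.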

Examples

Valid Example: Continuous custody transfer

# Version 1: Before merger
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-02-28"  # Custody ends

# Version 2: After merger (next day)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-03-01"  # ✅ Custody starts next day (1 day gap OK)
valid_to: null

Result: PASS - Continuous custody (1-day gap acceptable)


Warning Example: Custody gap

# Version 1: Ends 2013-02-28
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-02-28"

# Version 2: Starts 2013-05-01 (62-day gap!)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-05-01"  # ⚠️ 62-day gap!
valid_to: null

Warning:

[WARNING] CUSTODY_CONTINUITY: Collection 'Paintings Collection' has custody gap: 
version ending 2013-02-28, next version starting 2013-05-01 (gap: 62 days). 
Expected continuous custody transfer.

Fix: Adjust dates to eliminate gap:

valid_from: "2013-03-01"  # ✅ Next day after previous version

Error Example: Overlapping custody

# Version 1: Ends 2013-12-31
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-12-31"

# Version 2: Starts 2013-06-01 (overlaps by about 7 months!)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-06-01"  # ❌ Overlapping custody!
valid_to: null

Error:

[ERROR] CUSTODY_CONTINUITY: Collection 'Paintings Collection' has overlapping 
custody periods: version ending 2013-12-31 overlaps with version starting 
2013-06-01 (overlap: 213 days).

Fix: Align dates so custody ends before new version starts:

# Version 1: End on merger date
valid_to: "2013-05-31"

# Version 2: Start day after merger
valid_from: "2013-06-01"  # ✅ Continuous, no overlap

Rule 4: Staff-Unit Temporal Consistency

Description

Staff role dates must fall within the organizational unit's validity period. A person cannot hold a role in a unit before the unit is founded or after it is dissolved.

Rule ID

STAFF_UNIT_TEMPORAL

Constraints

Constraint 4.1: Role start date must be on or after unit founding date

PersonObservation.role_start_date >= OrganizationalStructure.valid_from

Constraint 4.2: Role end date must be on or before unit dissolution date (if unit dissolved)

IF OrganizationalStructure.valid_to IS NOT NULL
THEN PersonObservation.role_end_date <= OrganizationalStructure.valid_to

Warning Condition: Role ongoing but unit dissolved

IF OrganizationalStructure.valid_to IS NOT NULL
AND PersonObservation.role_end_date IS NULL
THEN WARN (missing staff reassignment)

Examples

Valid Example: Staff role within unit lifetime

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
unit_type: DEPARTMENT
valid_from: "1985-01-01"
valid_to: null
staff_members:
  - "https://example.org/person/curator-001"

---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"
role_start_date: "2010-01-01"  # ✅ After unit founding (1985)
role_end_date: null

Result: PASS - Role starts after unit founded


Invalid Example: Role before unit exists

---
id: "https://example.org/unit/dept-1"
unit_name: "Special Collections"
unit_type: DIVISION
valid_from: "1982-01-01"
staff_members:
  - "https://example.org/person/curator-001"

---
id: "https://example.org/person/curator-001"
person_name: "Dr. Smith"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"
role_start_date: "1975-01-01"  # ❌ Before unit exists (1982)!
role_end_date: null

Error:

[ERROR] STAFF_UNIT_TEMPORAL: Staff role starts (1975-01-01) before unit exists (1982-01-01). 
Unit: Special Collections, Person: Dr. Smith

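Fix: Correct whichever date is wrong. Either move role_start_date to on or after the unit's valid_from (1982-01-01), or correct the unit's valid_from if the unit was in fact founded earlier.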


Rule 5: Staff-Unit Bidirectional Consistency

Description

Bidirectional staff-unit relationships must be consistent. If a person references a unit, the unit must list the person, and vice versa.

Rule ID

STAFF_UNIT_BIDIRECTIONAL

Constraints

Constraint 5.1: Forward consistency (person → unit)

IF PersonObservation.unit_affiliation = unit_id
THEN OrganizationalStructure[unit_id].staff_members MUST INCLUDE person_id

Constraint 5.2: Reverse consistency (unit → person)

IF OrganizationalStructure.staff_members INCLUDES person_id
THEN PersonObservation[person_id].unit_affiliation MUST EQUAL unit_id

Examples

Valid Example: Bidirectional staff-unit relationship

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members:
  - "https://example.org/person/curator-001"  # ✅ Lists person

---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"  # ✅ References unit

Result: PASS - Bidirectional relationship consistent


Invalid Example: Person missing from unit.staff_members

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members: []  # ❌ Empty, doesn't include person

---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"  # Person references unit

Error:

[ERROR] STAFF_UNIT_BIDIRECTIONAL: Person references unit 'Paintings Department' 
as unit_affiliation, but unit does not list person in staff_members. 
Add person to unit.staff_members. Person: Dr. Jan Vermeer

Fix: Add person to unit's staff_members:

---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members:
  - "https://example.org/person/curator-001"  # ✅ Added

Validation Workflow

Using the Validator

Command:

python scripts/validate_temporal_consistency.py <yaml_file>

Example:

python scripts/validate_temporal_consistency.py \
  schemas/20251121/examples/collection_department_integration_examples.yaml

Output:

================================================================================
HERITAGE CUSTODIAN ONTOLOGY - TEMPORAL CONSISTENCY VALIDATOR
Schema Version: v0.7.0 (Phase 5)
================================================================================

🔍 Validating collection_department_integration_examples.yaml...
  - Organizational units: 5
  - Collections: 10
  - Person observations: 0
  - Change events: 0

================================================================================
VALIDATION SUMMARY
================================================================================
Entities validated: 15
Rules checked: 5
Errors: 0
Warnings: 0
Status: ✅ PASS
================================================================================

✅ All validation rules passed!

Exit Codes

  • 0: No errors found (warnings may still be present)
  • 1: Validation failed (errors present)
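
In automation such as a CI job or a pre-commit hook, the exit code is the simplest signal to act on. The sketch below runs a command and reports whether it exited with 0. `validator_passed` is a hypothetical helper name, and the YAML path in the usage note is a placeholder.

```python
import subprocess
import sys


def validator_passed(command):
    """Run the command and return True when it exits with code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the captured report so the errors can be fixed.
        print(result.stdout, end="")
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0
```

A caller might use it as `validator_passed([sys.executable, "scripts/validate_temporal_consistency.py", "data.yaml"])` and stop the pipeline when it returns False. Since warnings do not change the exit code, a stricter gate would have to inspect the printed summary as well.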

Interpreting Results

Errors (🔴):

  • Severity: High (must fix)
  • Impact: Data integrity violation
  • Action: Fix immediately before using data

Warnings (🟡):

  • Severity: Medium (should fix)
  • Impact: Potential data quality issue
  • Action: Review and fix if appropriate

SHACL Shapes (RDF Validation)

Future work will include SHACL shapes for RDF triple store validation. Preview:

# Collection-Unit Temporal Constraint (SHACL)
:CollectionUnitTemporalConstraint
    a sh:NodeShape ;
    sh:targetClass custodian:CustodianCollection ;
    sh:sparql [
        sh:message "Collection custody starts before managing unit exists" ;
        sh:select """
            PREFIX custodian: <https://w3id.org/heritage/custodian/>
            PREFIX schema: <http://schema.org/>
            
            SELECT $this
            WHERE {
                $this custodian:managing_unit ?unit ;
                      schema:startDate ?coll_start .
                ?unit schema:startDate ?unit_start .
                FILTER (?coll_start < ?unit_start)
            }
        """ ;
    ] .

Integration with LinkML Schema

Validation rules are currently enforced at runtime by the Python validator. Future versions will also express them as LinkML schema constraints:

slots:
  managing_unit:
    range: OrganizationalStructure
    # Future: Add LinkML validation expression
    # validation:
    #   rule: "valid_from >= managing_unit.valid_from"

Testing

Test Suite: tests/test_temporal_validation.py

Coverage:

  • 19 test cases
  • 100% rule coverage (all 5 rules tested)
  • Valid cases (should pass)
  • Invalid cases (should fail with specific errors)
  • Warning cases (should generate warnings)
  • Integration tests (multiple rules together)

Run Tests:

python -m pytest tests/test_temporal_validation.py -v

Expected Output:

tests/test_temporal_validation.py::TestDateUtilities::test_parse_date_iso_string PASSED
tests/test_temporal_validation.py::TestDateUtilities::test_parse_date_iso_with_time PASSED
... (17 more tests)

============================== 19 passed in 0.20s ==============================

References

Schema Files

  • Main schema: schemas/20251121/linkml/01_custodian_name_modular.yaml (v0.7.0)
  • CustodianCollection: schemas/20251121/linkml/modules/classes/CustodianCollection.yaml
  • OrganizationalStructure: schemas/20251121/linkml/modules/classes/OrganizationalStructure.yaml
  • PersonObservation: schemas/20251121/linkml/modules/classes/PersonObservation.yaml

Implementation

  • Validator: scripts/validate_temporal_consistency.py (534 lines)
  • Test suite: tests/test_temporal_validation.py (19 tests)
  • Examples: schemas/20251121/examples/collection_department_integration_examples.yaml

Documentation

  • Phase 4 Completion: COLLECTION_DEPARTMENT_INTEGRATION_COMPLETE_20251122.md
  • Phase 3 Completion: PICO_STAFF_ROLES_COMPLETE_20251122.md
  • Phase 5 Completion: (to be created)

Version: 1.0
Date: 2025-11-22
Schema Version: v0.7.0
Validator Version: 1.0
Status: Complete