# Validation Rules for Heritage Custodian Ontology (v0.7.0)
**Date**: 2025-11-22
**Schema Version**: v0.7.0 (Phase 5: Validation Framework)
**Validator**: `scripts/validate_temporal_consistency.py`
**Test Suite**: `tests/test_temporal_validation.py`
---
## Overview
This document defines the validation rules for temporal consistency and bidirectional relationships in the Heritage Custodian Ontology. These rules ensure data quality across organizational structures, collections, and staff relationships.
**Validation Categories**:
1. **Collection-Unit Temporal Consistency** (Phase 4: Collection-Department Integration)
2. **Collection-Unit Bidirectional Consistency** (Phase 4)
3. **Custody Transfer Continuity** (Phase 4: Organizational changes)
4. **Staff-Unit Temporal Consistency** (Phase 3: Staff Role Tracking)
5. **Staff-Unit Bidirectional Consistency** (Phase 3)
---
## Rule 1: Collection-Unit Temporal Consistency
### Description
Collection custody dates must fit within the managing organizational unit's validity period. A collection cannot be managed by a unit that doesn't exist.
### Rule ID
`COLLECTION_UNIT_TEMPORAL`
### Constraints
**Constraint 1.1**: Collection custody start date must be **on or after** unit founding date
```
CustodianCollection.valid_from >= OrganizationalStructure.valid_from
```
**Constraint 1.2**: Collection custody end date must be **on or before** unit dissolution date (if unit dissolved)
```
IF OrganizationalStructure.valid_to IS NOT NULL
THEN CustodianCollection.valid_to <= OrganizationalStructure.valid_to
```
**Warning Condition**: Collection custody ongoing but unit dissolved
```
IF OrganizationalStructure.valid_to IS NOT NULL
AND CustodianCollection.valid_to IS NULL
THEN WARN (missing custody transfer)
```
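The three conditions above can be read as one small pure function. The following is a minimal sketch for illustration only, not the logic in `scripts/validate_temporal_consistency.py`: the function name `check_collection_unit_temporal`, its arguments, and the `(severity, message)` return shape are all assumptions.

```python
from datetime import date
from typing import List, Optional, Tuple

RULE_ID = "COLLECTION_UNIT_TEMPORAL"


def check_collection_unit_temporal(
    coll_from: date,
    coll_to: Optional[date],
    unit_from: date,
    unit_to: Optional[date],
) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for one collection/unit pair.

    Hypothetical helper: None stands for a null valid_to (still ongoing).
    """
    findings: List[Tuple[str, str]] = []

    # Constraint 1.1: custody may not start before the unit was founded.
    if coll_from < unit_from:
        findings.append(
            ("ERROR", f"{RULE_ID}: custody starts ({coll_from}) "
                      f"before managing unit exists ({unit_from})")
        )

    if unit_to is not None:  # the unit has been dissolved
        if coll_to is None:
            # Warning condition: custody is still open although the unit is gone.
            findings.append(
                ("WARNING", f"{RULE_ID}: custody ongoing but managing unit "
                            f"dissolved ({unit_to})")
            )
        elif coll_to > unit_to:
            # Constraint 1.2: custody may not outlive the unit.
            findings.append(
                ("ERROR", f"{RULE_ID}: custody extends ({coll_to}) "
                          f"beyond managing unit validity ({unit_to})")
            )
    return findings


# Usage: custody dated 1798 under a unit founded in 1982.
for severity, message in check_collection_unit_temporal(
    date(1798, 1, 1), None, date(1982, 1, 1), None
):
    print(f"[{severity}] {message}")
```

Keeping the check free of I/O makes each constraint easy to unit-test in isolation; the warning and the 1.2 error are mutually exclusive because one requires a null `valid_to` and the other a non-null one.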
---
### Examples
#### Valid Example 1: Collection within unit lifetime
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
unit_type: DEPARTMENT
valid_from: "1985-01-01" # Unit founded
valid_to: null # Still active
managed_collections:
- "https://example.org/collection/dutch-paintings"
---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1"
valid_from: "1995-01-01" # ✅ After unit founding (1985)
valid_to: null # Ongoing
```
**Result**: ✅ **PASS** - Collection starts after unit founded
---
#### Invalid Example 1: Collection before unit exists
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Special Collections Division"
unit_type: DIVISION
valid_from: "1982-01-01" # Unit founded in 1982
managed_collections:
- "https://example.org/collection/medieval-manuscripts"
---
id: "https://example.org/collection/medieval-manuscripts"
collection_name: "Medieval Manuscripts"
managing_unit: "https://example.org/unit/dept-1"
valid_from: "1798-01-01" # ❌ Before unit exists (1982)!
```
**Error**:
```
[ERROR] COLLECTION_UNIT_TEMPORAL: Collection custody starts (1798-01-01)
before managing unit exists (1982-01-01).
Managing unit: Special Collections Division
```
**Fix**: Adjust collection `valid_from` to match or postdate unit founding:
```yaml
valid_from: "1982-01-01" # ✅ Custody starts when unit founded
```
---
#### Invalid Example 2: Collection extends beyond unit dissolution
```yaml
---
id: "https://example.org/unit/dept-old"
unit_name: "Old Department"
unit_type: DEPARTMENT
valid_from: "1950-01-01"
valid_to: "2013-02-28" # Unit dissolved in 2013
managed_collections:
- "https://example.org/collection/coll-1"
---
id: "https://example.org/collection/coll-1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-old"
valid_from: "1960-01-01"
valid_to: "2020-12-31" # ❌ Extends beyond unit dissolution (2013-02-28)!
```
**Error**:
```
[ERROR] COLLECTION_UNIT_TEMPORAL: Collection custody extends (2020-12-31)
beyond managing unit validity (2013-02-28).
Managing unit: Old Department
```
**Fix**: End collection custody when the unit dissolves and create a new version for the successor unit:
```yaml
---
id: "https://example.org/collection/coll-1-v1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-old"
valid_from: "1960-01-01"
valid_to: "2013-02-28" # ✅ Custody ends with unit dissolution
---
id: "https://example.org/collection/coll-1-v2"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-new"
valid_from: "2013-03-01" # ✅ Custody transferred to new unit
valid_to: "2020-12-31"
```
---
#### Warning Example: Collection ongoing after unit dissolved
```yaml
---
id: "https://example.org/unit/dept-dissolved"
unit_name: "Dissolved Department"
unit_type: DEPARTMENT
valid_from: "1950-01-01"
valid_to: "2013-02-28" # Unit dissolved
managed_collections:
- "https://example.org/collection/coll-1"
---
id: "https://example.org/collection/coll-1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-dissolved"
valid_from: "1960-01-01"
valid_to: null # ⚠️ Custody ongoing but unit dissolved!
```
**Warning**:
```
[WARNING] COLLECTION_UNIT_TEMPORAL: Collection custody ongoing but
managing unit dissolved (2013-02-28). Missing custody transfer?
Managing unit: Dissolved Department
```
**Recommended Fix**: Transfer custody to successor unit:
```yaml
---
id: "https://example.org/collection/coll-1-v1"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-dissolved"
valid_from: "1960-01-01"
valid_to: "2013-02-28" # End custody with unit dissolution
provenance_note: "Custody transferred to New Department during 2013 reorganization"
---
id: "https://example.org/collection/coll-1-v2"
collection_name: "Test Collection"
managing_unit: "https://example.org/unit/dept-new"
valid_from: "2013-03-01" # Transfer custody to new unit
valid_to: null
provenance_note: "Custody assumed from Dissolved Department"
```
---
## Rule 2: Collection-Unit Bidirectional Consistency
### Description
Bidirectional collection-unit relationships must be consistent. If a collection references a unit, the unit must list the collection, and vice versa.
### Rule ID
`COLLECTION_UNIT_BIDIRECTIONAL`
### Constraints
**Constraint 2.1**: Forward consistency (collection → unit)
```
IF CustodianCollection.managing_unit = unit_id
THEN OrganizationalStructure[unit_id].managed_collections MUST INCLUDE collection_id
```
**Constraint 2.2**: Reverse consistency (unit → collection)
```
IF OrganizationalStructure.managed_collections INCLUDES collection_id
THEN CustodianCollection[collection_id].managing_unit MUST EQUAL unit_id
```
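A sketch of how both directions could be checked over records keyed by `id`. This is an illustration under stated assumptions, not the validator's code: the function name, the plain-dict record layout, and the message wording are hypothetical. The extra branch for a `managing_unit` that points at an unknown unit is also an assumption; the constraints above do not cover that case explicitly.

```python
from typing import Dict, List

RULE_ID = "COLLECTION_UNIT_BIDIRECTIONAL"


def check_collection_unit_bidirectional(
    units: Dict[str, dict], collections: Dict[str, dict]
) -> List[str]:
    """Return error messages for inconsistent collection/unit links.

    Hypothetical helper: both arguments map a record id to its parsed fields.
    """
    errors: List[str] = []

    # Constraint 2.1 (collection -> unit): the unit must list the collection.
    for coll_id, coll in collections.items():
        unit_id = coll.get("managing_unit")
        if unit_id is None:
            continue
        unit = units.get(unit_id)
        if unit is None:
            errors.append(f"{RULE_ID}: {coll_id} references unknown unit {unit_id}")
        elif coll_id not in (unit.get("managed_collections") or []):
            errors.append(f"{RULE_ID}: {unit_id} does not list {coll_id}")

    # Constraint 2.2 (unit -> collection): the collection must point back.
    for unit_id, unit in units.items():
        for coll_id in unit.get("managed_collections") or []:
            coll = collections.get(coll_id)
            if coll is None:
                errors.append(
                    f"{RULE_ID}: {unit_id} references non-existent collection {coll_id}"
                )
            elif coll.get("managing_unit") != unit_id:
                errors.append(f"{RULE_ID}: {coll_id} does not reference {unit_id}")
    return errors


# Usage: the unit's list is empty although the collection points at it.
units = {"https://example.org/unit/dept-1": {"managed_collections": []}}
collections = {
    "https://example.org/collection/dutch-paintings": {
        "managing_unit": "https://example.org/unit/dept-1"
    }
}
for message in check_collection_unit_bidirectional(units, collections):
    print(f"[ERROR] {message}")
```

Indexing records by `id` first keeps each lookup constant-time, so the whole check is linear in the number of links.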
---
### Examples
#### Valid Example: Bidirectional relationship
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
- "https://example.org/collection/dutch-paintings" # ✅ Lists collection
---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1" # ✅ References unit
```
**Result**: ✅ **PASS** - Bidirectional relationship consistent
---
#### Invalid Example 1: Collection missing from unit.managed_collections
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections: [] # ❌ Empty list, doesn't include collection
---
id: "https://example.org/collection/dutch-paintings"
collection_name: "Dutch Paintings Collection"
managing_unit: "https://example.org/unit/dept-1" # Collection references unit
```
**Error**:
```
[ERROR] COLLECTION_UNIT_BIDIRECTIONAL: Collection references unit
'Paintings Department' as managing_unit, but unit does not list collection
in managed_collections. Add collection to unit.managed_collections.
```
**Fix**: Add collection to unit's `managed_collections`:
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
- "https://example.org/collection/dutch-paintings" # ✅ Added
```
---
#### Invalid Example 2: Unit references non-existent collection
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections:
- "https://example.org/collection/nonexistent" # ❌ Collection doesn't exist
```
**Error**:
```
[ERROR] COLLECTION_UNIT_BIDIRECTIONAL: Unit references non-existent collection:
https://example.org/collection/nonexistent.
Remove from unit.managed_collections or create collection.
```
**Fix**: Either create the collection or remove the reference:
```yaml
# Option 1: Create collection
---
id: "https://example.org/collection/nonexistent"
collection_name: "New Collection"
managing_unit: "https://example.org/unit/dept-1"
# Option 2: Remove reference
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
managed_collections: [] # Removed non-existent reference
```
---
## Rule 3: Custody Transfer Continuity
### Description
Collection custody transfers must be continuous, with no gaps or overlaps between versions. Collections do not disappear during organizational changes, so custody must pass from one unit to the next.
### Rule ID
`CUSTODY_CONTINUITY`
### Constraints
**Constraint 3.1**: Continuous custody (no gaps > 1 day)
```
IF CustodianCollection version 1 ends (valid_to = T1)
AND CustodianCollection version 2 exists with same collection_name
THEN version 2 must start at T1 OR T1+1 day
Gap = version2.valid_from - version1.valid_to
IF Gap > 1 day THEN WARN
```
**Constraint 3.2**: No overlapping custody
```
IF CustodianCollection version 1 ends (valid_to = T1)
AND CustodianCollection version 2 starts (valid_from = T2)
AND T2 < T1
THEN ERROR (overlapping custody)
```
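Both constraints reduce to the sign and size of one date difference. The sketch below shows that arithmetic with Python's `datetime.date`; the function name `check_custody_transition` and its return shape are assumptions for illustration, and it returns the raw signed difference rather than guessing how the validator counts overlap length.

```python
from datetime import date
from typing import Optional, Tuple


def check_custody_transition(
    prev_valid_to: date, next_valid_from: date
) -> Optional[Tuple[str, int]]:
    """Classify the hand-over between two consecutive custody versions.

    Hypothetical helper. Returns None when custody is continuous, otherwise
    (severity, signed_days) where signed_days = next_valid_from - prev_valid_to.
    """
    signed_days = (next_valid_from - prev_valid_to).days

    if signed_days < 0:
        # Constraint 3.2: the next version starts before the previous one ends.
        return ("ERROR", signed_days)
    if signed_days > 1:
        # Constraint 3.1: more than one day passes with no recorded custodian.
        return ("WARNING", signed_days)
    # 0 (same day) or 1 (next day) both count as continuous.
    return None


# Usage: a hand-over on the following day is treated as continuous.
print(check_custody_transition(date(2013, 2, 28), date(2013, 3, 1)))
```

To apply this to a whole collection history, sort the versions that share a `collection_name` by `valid_from` and run the check on each adjacent pair.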
---
### Examples
#### Valid Example: Continuous custody transfer
```yaml
# Version 1: Before merger
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-02-28" # Custody ends
# Version 2: After merger (next day)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-03-01" # ✅ Custody starts next day (1 day gap OK)
valid_to: null
```
**Result**: ✅ **PASS** - Continuous custody (1-day gap acceptable)
---
#### Warning Example: Custody gap
```yaml
# Version 1: Ends 2013-02-28
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-02-28"
# Version 2: Starts 2013-05-01 (62-day gap!)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-05-01" # ⚠️ 62-day gap!
valid_to: null
```
**Warning**:
```
[WARNING] CUSTODY_CONTINUITY: Collection 'Paintings Collection' has custody gap:
version ending 2013-02-28, next version starting 2013-05-01 (gap: 62 days).
Expected continuous custody transfer.
```
**Fix**: Adjust dates to eliminate gap:
```yaml
valid_from: "2013-03-01" # ✅ Next day after previous version
```
---
#### Error Example: Overlapping custody
```yaml
# Version 1: Ends 2013-12-31
---
id: "https://example.org/collection/paintings-v1"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/old-dept"
valid_from: "1995-01-01"
valid_to: "2013-12-31"
# Version 2: Starts 2013-06-01 (overlaps by 7 months!)
---
id: "https://example.org/collection/paintings-v2"
collection_name: "Paintings Collection"
managing_unit: "https://example.org/unit/new-dept"
valid_from: "2013-06-01" # ❌ Overlapping custody!
valid_to: null
```
**Error**:
```
[ERROR] CUSTODY_CONTINUITY: Collection 'Paintings Collection' has overlapping
custody periods: version ending 2013-12-31 overlaps with version starting
2013-06-01 (overlap: 214 days).
```
**Fix**: Align dates so custody ends before new version starts:
```yaml
# Version 1: End on merger date
valid_to: "2013-05-31"
# Version 2: Start day after merger
valid_from: "2013-06-01" # ✅ Continuous, no overlap
```
---
## Rule 4: Staff-Unit Temporal Consistency
### Description
Staff role dates must fit within the organizational unit's validity period. A person cannot work for a unit that doesn't exist.
### Rule ID
`STAFF_UNIT_TEMPORAL`
### Constraints
**Constraint 4.1**: Role start date must be **on or after** unit founding date
```
PersonObservation.role_start_date >= OrganizationalStructure.valid_from
```
**Constraint 4.2**: Role end date must be **on or before** unit dissolution date (if unit dissolved)
```
IF OrganizationalStructure.valid_to IS NOT NULL
THEN PersonObservation.role_end_date <= OrganizationalStructure.valid_to
```
**Warning Condition**: Role ongoing but unit dissolved
```
IF OrganizationalStructure.valid_to IS NOT NULL
AND PersonObservation.role_end_date IS NULL
THEN WARN (missing staff reassignment)
```
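The staff rule has the same shape as Rule 1 but reads different fields. As a sketch, the function below works directly on dict records with ISO date strings, as they would appear after loading the quoted YAML values used in this document. The function name, the `_parse` helper, and the message text are assumptions for illustration, not the validator's implementation.

```python
from datetime import date
from typing import List, Optional

RULE_ID = "STAFF_UNIT_TEMPORAL"


def _parse(value: Optional[str]) -> Optional[date]:
    """Turn an ISO 'YYYY-MM-DD' string into a date; None and '' mean unset."""
    return date.fromisoformat(value) if value else None


def check_staff_unit_temporal(person: dict, unit: dict) -> List[str]:
    """Return '[SEVERITY] RULE_ID: ...' strings for one person/unit pair."""
    role_start = _parse(person.get("role_start_date"))
    role_end = _parse(person.get("role_end_date"))
    unit_start = _parse(unit.get("valid_from"))
    unit_end = _parse(unit.get("valid_to"))
    findings: List[str] = []

    # Constraint 4.1: the role may not start before the unit was founded.
    if role_start and unit_start and role_start < unit_start:
        findings.append(
            f"[ERROR] {RULE_ID}: role starts ({role_start}) "
            f"before unit exists ({unit_start})"
        )

    if unit_end is not None:  # the unit has been dissolved
        if role_end is None:
            # Warning condition: role still open although the unit is gone.
            findings.append(
                f"[WARNING] {RULE_ID}: role ongoing but unit dissolved ({unit_end})"
            )
        elif role_end > unit_end:
            # Constraint 4.2: the role may not outlive the unit.
            findings.append(
                f"[ERROR] {RULE_ID}: role ends ({role_end}) "
                f"after unit dissolved ({unit_end})"
            )
    return findings


# Usage: a role dated 1975 in a unit founded in 1982.
person = {"role_start_date": "1975-01-01", "role_end_date": None}
unit = {"valid_from": "1982-01-01"}
print(check_staff_unit_temporal(person, unit))
```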
---
### Examples
#### Valid Example: Staff role within unit lifetime
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
unit_type: DEPARTMENT
valid_from: "1985-01-01"
valid_to: null
staff_members:
- "https://example.org/person/curator-001"
---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"
role_start_date: "2010-01-01" # ✅ After unit founding (1985)
role_end_date: null
```
**Result**: ✅ **PASS** - Role starts after unit founded
---
#### Invalid Example: Role before unit exists
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Special Collections"
unit_type: DIVISION
valid_from: "1982-01-01"
staff_members:
- "https://example.org/person/curator-001"
---
id: "https://example.org/person/curator-001"
person_name: "Dr. Smith"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1"
role_start_date: "1975-01-01" # ❌ Before unit exists (1982)!
role_end_date: null
```
**Error**:
```
[ERROR] STAFF_UNIT_TEMPORAL: Staff role starts (1975-01-01) before unit exists (1982-01-01).
Unit: Special Collections, Person: Dr. Smith
```
**Fix**: Adjust role start date or unit founding date.
---
## Rule 5: Staff-Unit Bidirectional Consistency
### Description
Bidirectional staff-unit relationships must be consistent. If a person references a unit, the unit must list the person, and vice versa.
### Rule ID
`STAFF_UNIT_BIDIRECTIONAL`
### Constraints
**Constraint 5.1**: Forward consistency (person → unit)
```
IF PersonObservation.unit_affiliation = unit_id
THEN OrganizationalStructure[unit_id].staff_members MUST INCLUDE person_id
```
**Constraint 5.2**: Reverse consistency (unit → person)
```
IF OrganizationalStructure.staff_members INCLUDES person_id
THEN PersonObservation[person_id].unit_affiliation MUST EQUAL unit_id
```
---
### Examples
#### Valid Example: Bidirectional staff-unit relationship
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members:
- "https://example.org/person/curator-001" # ✅ Lists person
---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1" # ✅ References unit
```
**Result**: ✅ **PASS** - Bidirectional relationship consistent
---
#### Invalid Example: Person missing from unit.staff_members
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members: [] # ❌ Empty, doesn't include person
---
id: "https://example.org/person/curator-001"
person_name: "Dr. Jan Vermeer"
staff_role: CURATOR
unit_affiliation: "https://example.org/unit/dept-1" # Person references unit
```
**Error**:
```
[ERROR] STAFF_UNIT_BIDIRECTIONAL: Person references unit 'Paintings Department'
as unit_affiliation, but unit does not list person in staff_members.
Add person to unit.staff_members. Person: Dr. Jan Vermeer
```
**Fix**: Add person to unit's `staff_members`:
```yaml
---
id: "https://example.org/unit/dept-1"
unit_name: "Paintings Department"
staff_members:
- "https://example.org/person/curator-001" # ✅ Added
```
---
## Validation Workflow
### Using the Validator
**Command**:
```bash
python scripts/validate_temporal_consistency.py <yaml_file>
```
**Example**:
```bash
python scripts/validate_temporal_consistency.py \
schemas/20251121/examples/collection_department_integration_examples.yaml
```
**Output**:
```
================================================================================
HERITAGE CUSTODIAN ONTOLOGY - TEMPORAL CONSISTENCY VALIDATOR
Schema Version: v0.7.0 (Phase 5)
================================================================================
🔍 Validating collection_department_integration_examples.yaml...
- Organizational units: 5
- Collections: 10
- Person observations: 0
- Change events: 0
================================================================================
VALIDATION SUMMARY
================================================================================
Entities validated: 15
Rules checked: 5
Errors: 0
Warnings: 0
Status: ✅ PASS
================================================================================
✅ All validation rules passed!
```
---
### Exit Codes
- **0**: All validation rules passed (may have warnings)
- **1**: Validation failed (errors present)
---
### Interpreting Results
**Errors (🔴)**:
- **Severity**: High (must fix)
- **Impact**: Data integrity violation
- **Action**: Fix immediately before using data
**Warnings (🟡)**:
- **Severity**: Medium (should fix)
- **Impact**: Potential data quality issue
- **Action**: Review and fix if appropriate
---
## SHACL Shapes (RDF Validation)
Future work will include SHACL shapes for RDF triple store validation. Preview:
```turtle
# Collection-Unit Temporal Constraint (SHACL)
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix custodian: <https://w3id.org/heritage/custodian/> .
@prefix : <https://example.org/shapes/> . # placeholder namespace for shape IRIs (assumption)

:CollectionUnitTemporalConstraint
    a sh:NodeShape ;
    sh:targetClass custodian:CustodianCollection ;
    sh:sparql [
        sh:message "Collection custody starts before managing unit exists" ;
        sh:select """
            PREFIX custodian: <https://w3id.org/heritage/custodian/>
            PREFIX schema: <http://schema.org/>
            SELECT $this
            WHERE {
                $this custodian:managing_unit ?unit ;
                      schema:startDate ?coll_start .
                ?unit schema:startDate ?unit_start .
                FILTER (?coll_start < ?unit_start)
            }
        """ ;
    ] .
```
---
## Integration with LinkML Schema
Validation rules are currently enforced at runtime by the Python validator; future versions will also express them as LinkML schema constraints:
```yaml
slots:
managing_unit:
range: OrganizationalStructure
# Future: Add LinkML validation expression
# validation:
# rule: "valid_from >= managing_unit.valid_from"
```
---
## Testing
**Test Suite**: `tests/test_temporal_validation.py`
**Coverage**:
- 19 test cases
- 100% rule coverage (all 5 rules tested)
- Valid cases (should pass)
- Invalid cases (should fail with specific errors)
- Warning cases (should generate warnings)
- Integration tests (multiple rules together)
**Run Tests**:
```bash
python -m pytest tests/test_temporal_validation.py -v
```
**Expected Output**:
```
tests/test_temporal_validation.py::TestDateUtilities::test_parse_date_iso_string PASSED
tests/test_temporal_validation.py::TestDateUtilities::test_parse_date_iso_with_time PASSED
... (17 more tests)
============================== 19 passed in 0.20s ==============================
```
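The test IDs above point to a date-parsing utility that accepts ISO strings with and without a time component. The sketch below shows what such a helper could look like. The name `parse_date` is inferred from the test names only; its signature and behavior here are assumptions, not the validator's actual code.

```python
from datetime import date, datetime
from typing import Optional, Union


def parse_date(value: Union[str, date, datetime, None]) -> Optional[date]:
    """Normalize a YAML date value to a datetime.date (hypothetical helper).

    Accepts ISO strings ('2013-02-28' or '2013-02-28T12:30:00'), date or
    datetime objects (some YAML loaders produce these for unquoted dates),
    and None for a null valid_to.
    """
    if value is None:
        return None
    # datetime is a subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    # datetime.fromisoformat handles both date-only and date-time strings.
    return datetime.fromisoformat(value).date()


# Usage: both forms normalize to the same calendar day.
print(parse_date("2013-02-28") == parse_date("2013-02-28T12:30:00"))
```

Normalizing every value to `date` before comparing avoids the `TypeError` Python raises when a `date` is compared with a `datetime`.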
---
## References
### Schema Files
- Main schema: `schemas/20251121/linkml/01_custodian_name_modular.yaml` (v0.7.0)
- CustodianCollection: `schemas/20251121/linkml/modules/classes/CustodianCollection.yaml`
- OrganizationalStructure: `schemas/20251121/linkml/modules/classes/OrganizationalStructure.yaml`
- PersonObservation: `schemas/20251121/linkml/modules/classes/PersonObservation.yaml`
### Implementation
- Validator: `scripts/validate_temporal_consistency.py` (534 lines)
- Test suite: `tests/test_temporal_validation.py` (19 tests)
- Examples: `schemas/20251121/examples/collection_department_integration_examples.yaml`
### Documentation
- Phase 4 Completion: `COLLECTION_DEPARTMENT_INTEGRATION_COMPLETE_20251122.md`
- Phase 3 Completion: `PICO_STAFF_ROLES_COMPLETE_20251122.md`
- Phase 5 Completion: (to be created)
---
**Version**: 1.0
**Date**: 2025-11-22
**Schema Version**: v0.7.0
**Validator Version**: 1.0
**Status**: ✅ Complete