# SHACL Validation Shapes for Heritage Custodian Ontology

**Version**: 1.0.0
**Schema Version**: v0.7.0
**Created**: 2025-11-22
**SHACL Spec**: https://www.w3.org/TR/shacl/

---

## Table of Contents

1. [Overview](#overview)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Validation Rules](#validation-rules)
5. [Shape Definitions](#shape-definitions)
6. [Examples](#examples)
7. [Integration](#integration)
8. [Comparison with Python Validator](#comparison-with-python-validator)
9. [Advanced Usage](#advanced-usage)
10. [Troubleshooting](#troubleshooting)
11. [References](#references)
12. [Next Steps](#next-steps)

---

## Overview

This document describes the **SHACL (Shapes Constraint Language)** validation shapes for the Heritage Custodian Ontology. SHACL shapes enforce data quality constraints at RDF ingestion time, preventing invalid data from entering triple stores.

### What is SHACL?

**SHACL** is a W3C recommendation for validating RDF graphs against a set of conditions (shapes). Unlike SPARQL queries that **detect** violations after data is stored, SHACL shapes **prevent** violations during data loading.

### Benefits of SHACL Validation

✅ **Prevention over Detection**: Reject invalid data before storage
✅ **Standardized Reports**: Machine-readable validation results
✅ **Triple Store Integration**: Native support in GraphDB, Jena, Virtuoso
✅ **Declarative Constraints**: Express rules in RDF (no external scripts)
✅ **Detailed Error Messages**: Precise identification of failing triples

---

## Installation

### Prerequisites

Install Python dependencies:

```bash
pip install pyshacl rdflib
```

**Libraries**:
- **pyshacl** (v0.25.0+): SHACL validator for Python
- **rdflib** (v7.0.0+): RDF graph library

### Verify Installation

```bash
python3 -c "import pyshacl; print(pyshacl.__version__)"
# Expected output: 0.25.0 (or later)
```

---

## Usage

### Command Line Validation

**Basic Usage**:
```bash
python scripts/validate_with_shacl.py data.ttl
```

**With Custom Shapes**:
```bash
python scripts/validate_with_shacl.py data.ttl --shapes custom_shapes.ttl
```

**Different RDF Formats**:
```bash
# JSON-LD data
python scripts/validate_with_shacl.py data.jsonld --format jsonld

# N-Triples data
python scripts/validate_with_shacl.py data.nt --format nt
```

**Save Validation Report**:
```bash
python scripts/validate_with_shacl.py data.ttl --output report.ttl
```

**Verbose Output**:
```bash
python scripts/validate_with_shacl.py data.ttl --verbose
```

### Python Library Usage

```python
from scripts.validate_with_shacl import validate_file

# Validate with default shapes
if validate_file("data.ttl"):
    print("✅ Data is valid")
else:
    print("❌ Data has violations")

# Validate with custom shapes
if validate_file("data.ttl", shapes_file="custom_shapes.ttl"):
    print("✅ Valid")
```
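
For readers curious what a helper like `validate_file` involves, the following is a minimal sketch of a wrapper around `pyshacl.validate`. It is an assumption about how such a helper could be structured, not a copy of `scripts/validate_with_shacl.py`: the names `guess_rdf_format` and `run_validation` and the extension table are hypothetical. `pyshacl` is imported inside the function so that the format helper stays usable without the dependency installed.

```python
from pathlib import Path

# Hypothetical mapping from file extension to rdflib parser name.
RDF_FORMATS = {".ttl": "turtle", ".nt": "nt", ".jsonld": "json-ld", ".rdf": "xml"}


def guess_rdf_format(path: str) -> str:
    """Return an rdflib parser name for a file path, defaulting to Turtle."""
    return RDF_FORMATS.get(Path(path).suffix.lower(), "turtle")


def run_validation(data_file: str, shapes_file: str) -> bool:
    """Validate data_file against shapes_file; return True if it conforms."""
    from pyshacl import validate  # third-party: pip install pyshacl

    conforms, _report_graph, report_text = validate(
        data_file,
        shacl_graph=shapes_file,
        data_graph_format=guess_rdf_format(data_file),
        shacl_graph_format=guess_rdf_format(shapes_file),
    )
    if not conforms:
        print(report_text)  # human-readable validation report
    return conforms
```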

### Triple Store Integration

**Apache Jena Fuseki**:
```bash
# Load shapes into Fuseki dataset
tdbloader2 --loc=/path/to/tdb custodian_validation_shapes.ttl

# Note: Fuseki does not validate SPARQL UPDATE requests against loaded shapes
# automatically. Validation is requested explicitly through the dataset's
# SHACL service endpoint, if one is configured (see the Jena SHACL documentation).
```

**GraphDB**:
1. Create repository with SHACL validation enabled
2. Import the shapes file into the repository's shapes graph. By default this is the reserved RDF4J graph `http://rdf4j.org/schema/rdf4j#SHACLShapeGraph`; a custom context such as `http://shacl/shapes` has to be configured as a shapes graph first
3. GraphDB then validates each data change against the loaded shapes

---

## Validation Rules

This SHACL shapes file implements **5 core validation rules** from Phase 5:

| Rule ID | Name | Severity | Description |
|---------|------|----------|-------------|
| **Rule 1** | Collection-Unit Temporal Consistency | ERROR | Collection custody dates must fall within managing unit's validity period |
| **Rule 2** | Collection-Unit Bidirectional | ERROR | Collection → unit must have inverse unit → collection |
| **Rule 3** | Custody Transfer Continuity | WARNING | Custody transfers must be continuous (no gaps/overlaps) |
| **Rule 4** | Staff-Unit Temporal Consistency | ERROR | Staff employment dates must fall within unit's validity period |
| **Rule 5** | Staff-Unit Bidirectional | ERROR | Person → unit must have inverse unit → person |

Plus **3 additional shapes** for type and format constraints.

---

## Shape Definitions

### Rule 1: Collection-Unit Temporal Consistency

**Shape ID**: `custodian:CollectionUnitTemporalConsistencyShape`

**Target**: All instances of `custodian:CustodianCollection`

**Constraints**:

#### Constraint 1.1: Collection Starts After Unit Founding

```turtle
sh:sparql [
    sh:message "Collection valid_from ({?collectionStart}) must be >= managing unit valid_from ({?unitStart})" ;
    sh:select """
        SELECT $this ?collectionStart ?unitStart ?managingUnit
        WHERE {
            $this custodian:managing_unit ?managingUnit ;
                  custodian:valid_from ?collectionStart .

            ?managingUnit custodian:valid_from ?unitStart .

            # VIOLATION: Collection starts before unit exists
            FILTER(?collectionStart < ?unitStart)
        }
    """ ;
] .
```

**Example Violation**:
```turtle
# Unit founded 2010
<https://example.org/unit/dept-1>
    a custodian:OrganizationalStructure ;
    custodian:valid_from "2010-01-01"^^xsd:date .

# Collection started 2005 (INVALID!)
<https://example.org/collection/col-1>
    a custodian:CustodianCollection ;
    custodian:managing_unit <https://example.org/unit/dept-1> ;
    custodian:valid_from "2005-01-01"^^xsd:date .
```

**Violation Report**:
```
❌ Validation Result [Constraint Component: sh:SPARQLConstraintComponent]
   Severity: sh:Violation
   Message: Collection valid_from (2005-01-01) must be >= managing unit valid_from (2010-01-01)
   Focus Node: https://example.org/collection/col-1
```
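
The comparison at the heart of Constraint 1.1 can be restated outside SPARQL. The sketch below is only an illustration of the rule's logic in plain Python, using the dates from the example violation; the function name `starts_before_unit` is hypothetical and is not part of the shapes file or the validation script. A unit without a start date is treated as "no violation", mirroring the fact that the SPARQL pattern only matches when `valid_from` is present on the unit.

```python
from datetime import date
from typing import Optional


def starts_before_unit(collection_start: date, unit_start: Optional[date]) -> bool:
    """Return True when the collection start date violates Constraint 1.1."""
    # Like the SPARQL pattern, the check only applies when the unit has a start date.
    if unit_start is None:
        return False
    return collection_start < unit_start


# Dates from the example violation: collection 2005, unit 2010.
violation = starts_before_unit(date(2005, 1, 1), date(2010, 1, 1))
print(violation)
```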

---

#### Constraint 1.2: Collection Ends Before Unit Dissolution

```turtle
sh:sparql [
    sh:message "Collection valid_to ({?collectionEnd}) must be <= managing unit valid_to ({?unitEnd})" ;
    sh:select """
        SELECT $this ?collectionEnd ?unitEnd ?managingUnit
        WHERE {
            $this custodian:managing_unit ?managingUnit ;
                  custodian:valid_to ?collectionEnd .

            ?managingUnit custodian:valid_to ?unitEnd .

            # Unit is dissolved
            FILTER(BOUND(?unitEnd))

            # VIOLATION: Collection custody ends after unit dissolution
            FILTER(?collectionEnd > ?unitEnd)
        }
    """ ;
] .
```

**Example Violation**:
```turtle
# Unit dissolved 2020
<https://example.org/unit/dept-1>
    a custodian:OrganizationalStructure ;
    custodian:valid_from "2010-01-01"^^xsd:date ;
    custodian:valid_to "2020-12-31"^^xsd:date .

# Collection custody ended 2023 (INVALID!)
<https://example.org/collection/col-1>
    a custodian:CustodianCollection ;
    custodian:managing_unit <https://example.org/unit/dept-1> ;
    custodian:valid_from "2015-01-01"^^xsd:date ;
    custodian:valid_to "2023-06-01"^^xsd:date .
```

---

#### Warning: Ongoing Custody After Unit Dissolution

```turtle
sh:sparql [
    sh:severity sh:Warning ;
    sh:message "Collection has ongoing custody but managing unit was dissolved" ;
    sh:select """
        SELECT $this ?managingUnit ?unitEnd
        WHERE {
            $this custodian:managing_unit ?managingUnit .

            # Collection has no end date (ongoing)
            FILTER NOT EXISTS { $this custodian:valid_to ?collectionEnd }

            # But unit is dissolved
            ?managingUnit custodian:valid_to ?unitEnd .
        }
    """ ;
] .
```

**Example Warning**:
```turtle
# Unit dissolved 2020
<https://example.org/unit/dept-1>
    custodian:valid_to "2020-12-31"^^xsd:date .

# Collection custody ongoing (WARNING!)
<https://example.org/collection/col-1>
    custodian:managing_unit <https://example.org/unit/dept-1> ;
    custodian:valid_from "2015-01-01"^^xsd:date .
    # No valid_to → custody still active
```

**Interpretation**: Collection likely transferred to another unit but custody history not updated.
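
Constraint 1.2 and the ongoing-custody warning partition the same situation (a dissolved unit) by whether the collection has an end date. The following sketch restates that decision in plain Python as an aid to reading the two SPARQL patterns; the function `check_custody_end` and its string return values are hypothetical and chosen only for illustration.

```python
from datetime import date
from typing import Optional


def check_custody_end(collection_end: Optional[date], unit_end: Optional[date]) -> Optional[str]:
    """Classify a collection/unit pair as 'violation', 'warning', or None."""
    if unit_end is None:
        return None  # unit still active: neither pattern matches
    if collection_end is None:
        return "warning"  # ongoing custody under a dissolved unit
    if collection_end > unit_end:
        return "violation"  # custody ends after unit dissolution
    return None


# Dates from the two examples in this section.
print(check_custody_end(date(2023, 6, 1), date(2020, 12, 31)))
print(check_custody_end(None, date(2020, 12, 31)))
```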

---

### Rule 2: Collection-Unit Bidirectional Relationships

**Shape ID**: `custodian:CollectionUnitBidirectionalShape`

**Target**: All instances of `custodian:CustodianCollection`

**Constraint**: If collection references `managing_unit`, unit must reference collection in `managed_collections`.

```turtle
sh:sparql [
    sh:message "Collection references managing_unit {?unit} but unit does not list collection in managed_collections" ;
    sh:select """
        SELECT $this ?unit
        WHERE {
            $this custodian:managing_unit ?unit .

            # VIOLATION: Unit does not reference collection back
            FILTER NOT EXISTS {
                ?unit custodian:managed_collections $this
            }
        }
    """ ;
] .
```

**Example Violation**:
```turtle
# Collection references unit
<https://example.org/collection/col-1>
    custodian:managing_unit <https://example.org/unit/dept-1> .

# But unit does NOT reference collection (INVALID!)
<https://example.org/unit/dept-1>
    a custodian:OrganizationalStructure .
    # Missing: custodian:managed_collections <https://example.org/collection/col-1>
```

**Fix**:
```turtle
# Add inverse relationship
<https://example.org/unit/dept-1>
    custodian:managed_collections <https://example.org/collection/col-1> .
```
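
The bidirectional check amounts to asking, for each forward edge, whether the reversed edge exists. A minimal sketch of that idea in plain Python follows; it works on pairs of identifiers rather than an RDF graph, and the function name `missing_inverses` and the short identifiers are hypothetical stand-ins for the IRIs used in the examples.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (subject identifier, object identifier)


def missing_inverses(forward: Iterable[Edge], inverse: Iterable[Edge]) -> List[Edge]:
    """Return forward edges (collection, unit) with no matching (unit, collection)."""
    inverse_set = set(inverse)
    return [(s, o) for s, o in forward if (o, s) not in inverse_set]


# managing_unit edges: collection -> unit
managing_unit = [("col-1", "dept-1"), ("col-2", "dept-1")]
# managed_collections edges: unit -> collection (col-1 is missing)
managed_collections = [("dept-1", "col-2")]

print(missing_inverses(managing_unit, managed_collections))
```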

---

### Rule 3: Custody Transfer Continuity

**Shape ID**: `custodian:CustodyTransferContinuityShape`

**Target**: All instances of `custodian:CustodianCollection`

**Constraints**:

#### Check for Gaps in Custody Chain

```turtle
sh:sparql [
    sh:severity sh:Warning ;
    sh:message "Custody gap detected: previous custody ended on {?prevEnd} but next custody started on {?nextStart}" ;
    sh:select """
        SELECT $this ?prevEnd ?nextStart ?gapDays
        WHERE {
            $this custodian:custody_history ?event1 ;
                  custodian:custody_history ?event2 .

            ?event1 custodian:transfer_date ?prevEnd .
            ?event2 custodian:transfer_date ?nextStart .

            FILTER(?nextStart > ?prevEnd)
            # Date subtraction is an engine extension, not standard SPARQL 1.1;
            # check how your engine types ?gapDays before comparing it to a number.
            BIND((xsd:date(?nextStart) - xsd:date(?prevEnd)) AS ?gapDays)

            # WARNING: Gap > 1 day
            FILTER(?gapDays > 1)
        }
    """ ;
] .
```

**Example Warning**:
```turtle
<https://example.org/collection/col-1>
    custodian:custody_history <https://example.org/event/transfer-1> ;
    custodian:custody_history <https://example.org/event/transfer-2> .

<https://example.org/event/transfer-1>
    custodian:transfer_date "2010-01-01"^^xsd:date .

<https://example.org/event/transfer-2>
    custodian:transfer_date "2010-02-01"^^xsd:date .
    # Gap of 31 days between transfers
```
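
To make the gap arithmetic concrete, here is a plain-Python sketch of the same comparison. It mirrors the query as written, which compares every pair of transfer dates on a collection rather than only consecutive ones. The function name `custody_gaps` and the `max_gap_days` parameter are hypothetical.

```python
from datetime import date
from itertools import combinations
from typing import Iterable, List, Tuple


def custody_gaps(transfer_dates: Iterable[date], max_gap_days: int = 1) -> List[Tuple[date, date, int]]:
    """Return (earlier, later, gap_days) for each pair more than max_gap_days apart."""
    gaps = []
    # Sorting first guarantees that each pair is ordered (earlier, later).
    for earlier, later in combinations(sorted(transfer_dates), 2):
        gap_days = (later - earlier).days
        if gap_days > max_gap_days:
            gaps.append((earlier, later, gap_days))
    return gaps


# Dates from the example warning: 2010-01-01 and 2010-02-01.
print(custody_gaps([date(2010, 1, 1), date(2010, 2, 1)]))
```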

---

#### Check for Overlaps in Custody Chain

```turtle
sh:sparql [
    sh:message "Custody overlap detected: collection managed by {?custodian1} until {?end1} and simultaneously by {?custodian2} from {?start2}" ;
    sh:select """
        SELECT $this ?custodian1 ?end1 ?custodian2 ?start2
        WHERE {
            $this custodian:custody_history ?event1 ;
                  custodian:custody_history ?event2 .

            ?event1 custodian:new_custodian ?custodian1 ;
                    custodian:custody_end_date ?end1 .

            ?event2 custodian:new_custodian ?custodian2 ;
                    custodian:transfer_date ?start2 .

            FILTER(?custodian1 != ?custodian2)
            FILTER(?start2 < ?end1)  # Overlap!
        }
    """ ;
] .
```
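
The overlap pattern can be read as a nested loop over custody events. The sketch below restates the query in plain Python, following the filters exactly as written (different custodians, and the second event's transfer date earlier than the first event's end date); it does not add any check that the second custody began after the first. The `CustodyEvent` class and the `custody_overlaps` function are hypothetical illustrations, not part of the validation script.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CustodyEvent:
    new_custodian: str
    transfer_date: date
    custody_end_date: Optional[date] = None


def custody_overlaps(events: List[CustodyEvent]) -> List[Tuple[str, str]]:
    """Return (custodian1, custodian2) pairs matched by the overlap pattern."""
    overlaps = []
    for first in events:
        if first.custody_end_date is None:
            continue  # the pattern requires an end date on the first event
        for second in events:
            if (first.new_custodian != second.new_custodian
                    and second.transfer_date < first.custody_end_date):
                overlaps.append((first.new_custodian, second.new_custodian))
    return overlaps


events = [
    CustodyEvent("unit-a", date(2000, 1, 1), date(2010, 6, 1)),
    CustodyEvent("unit-b", date(2010, 1, 1)),  # starts before unit-a ends
]
print(custody_overlaps(events))
```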

---

### Rule 4: Staff-Unit Temporal Consistency

**Shape ID**: `custodian:StaffUnitTemporalConsistencyShape`

**Target**: All instances of `custodian:PersonObservation`

**Constraints**: Same as Rule 1, but for staff employment dates vs. unit validity period.

#### Constraint 4.1: Employment Starts After Unit Founding

```turtle
sh:sparql [
    sh:message "Staff employment_start_date ({?employmentStart}) must be >= unit valid_from ({?unitStart})" ;
    sh:select """
        SELECT $this ?employmentStart ?unitStart ?unit
        WHERE {
            $this custodian:unit_affiliation ?unit ;
                  custodian:employment_start_date ?employmentStart .

            ?unit custodian:valid_from ?unitStart .

            FILTER(?employmentStart < ?unitStart)
        }
    """ ;
] .
```

**Example Violation**:
```turtle
# Unit founded 2015
<https://example.org/unit/dept-1>
    custodian:valid_from "2015-01-01"^^xsd:date .

# Staff employed 2010 (INVALID!)
<https://example.org/person/john-doe>
    custodian:unit_affiliation <https://example.org/unit/dept-1> ;
    custodian:employment_start_date "2010-01-01"^^xsd:date .
```

---

### Rule 5: Staff-Unit Bidirectional Relationships

**Shape ID**: `custodian:StaffUnitBidirectionalShape`

**Target**: All instances of `custodian:PersonObservation`

**Constraint**: If person references `unit_affiliation`, unit must reference person in `staff_members` or `org:hasMember`.

```turtle
sh:sparql [
    sh:message "Person references unit_affiliation {?unit} but unit does not list person in staff_members" ;
    sh:select """
        SELECT $this ?unit
        WHERE {
            $this custodian:unit_affiliation ?unit .

            # VIOLATION: Unit does not reference person back
            FILTER NOT EXISTS {
                { ?unit custodian:staff_members $this }
                UNION
                { ?unit org:hasMember $this }
            }
        }
    """ ;
] .
```

---

### Additional Shapes: Type and Format Constraints

#### Type Constraint: managing_unit Must Be OrganizationalStructure

```turtle
custodian:CollectionManagingUnitTypeShape
    sh:property [
        sh:path custodian:managing_unit ;
        sh:class custodian:OrganizationalStructure ;
        sh:message "managing_unit must be an instance of OrganizationalStructure" ;
    ] .
```

#### Type Constraint: unit_affiliation Must Be OrganizationalStructure

```turtle
custodian:PersonUnitAffiliationTypeShape
    sh:property [
        sh:path custodian:unit_affiliation ;
        sh:class custodian:OrganizationalStructure ;
        sh:message "unit_affiliation must be an instance of OrganizationalStructure" ;
    ] .
```

#### Format Constraint: Dates Must Be xsd:date or xsd:dateTime

```turtle
custodian:DatetimeFormatShape
    sh:property [
        sh:path custodian:valid_from ;
        sh:or (
            [ sh:datatype xsd:date ]
            [ sh:datatype xsd:dateTime ]
        ) ;
    ] .
```
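
As a rough illustration of what the format constraint accepts, the sketch below checks a literal given as a (lexical form, datatype IRI) pair. It is an approximation only: Python's ISO parsers do not match the XSD lexical spaces exactly (for instance, `datetime.fromisoformat` also accepts a bare date), and the names `PARSERS` and `is_valid_temporal_literal` are hypothetical.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# One parser per datatype allowed by the shape's sh:or list.
PARSERS = {
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def is_valid_temporal_literal(lexical: str, datatype_iri: str) -> bool:
    """Return True if the datatype is allowed and the lexical form parses."""
    parser = PARSERS.get(datatype_iri)
    if parser is None:
        return False  # e.g. plain strings or xsd:gYear are rejected
    try:
        parser(lexical)
    except ValueError:
        return False  # ill-formed value such as month 13
    return True


print(is_valid_temporal_literal("2010-01-01", XSD + "date"))
print(is_valid_temporal_literal("2010-01-01", XSD + "string"))
```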

---

## Examples

### Example 1: Valid Collection-Unit Relationship

**Valid RDF Data**:
```turtle
@prefix custodian: <https://nde.nl/ontology/hc/custodian/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.org/unit/paintings-dept>
    a custodian:OrganizationalStructure ;
    custodian:unit_name "Paintings Department" ;
    custodian:valid_from "1985-01-01"^^xsd:date ;
    custodian:managed_collections <https://example.org/collection/dutch-paintings> .

<https://example.org/collection/dutch-paintings>
    a custodian:CustodianCollection ;
    custodian:collection_name "Dutch Paintings" ;
    custodian:managing_unit <https://example.org/unit/paintings-dept> ;
    custodian:valid_from "1995-01-01"^^xsd:date .
```

**Validation**:
```bash
python scripts/validate_with_shacl.py valid_data.ttl
# ✅ VALIDATION PASSED
# No constraint violations found.
```

---

### Example 2: Invalid - Temporal Violation

**Invalid RDF Data**:
```turtle
<https://example.org/unit/paintings-dept>
    a custodian:OrganizationalStructure ;
    custodian:valid_from "1985-01-01"^^xsd:date .

<https://example.org/collection/dutch-paintings>
    a custodian:CustodianCollection ;  # class-targeted shapes only apply to typed nodes
    custodian:managing_unit <https://example.org/unit/paintings-dept> ;
    custodian:valid_from "1970-01-01"^^xsd:date .  # Before unit exists!
```

**Validation**:
```bash
python scripts/validate_with_shacl.py invalid_data.ttl
# ❌ VALIDATION FAILED
#
# Constraint Violations:
# --------------------------------------------------------------------------------
# Validation Result [Constraint Component: sh:SPARQLConstraintComponent]:
#   Severity: sh:Violation
#   Message: Collection valid_from (1970-01-01) must be >= managing unit valid_from (1985-01-01)
#   Focus Node: https://example.org/collection/dutch-paintings
#   Result Path: -
#   Source Shape: custodian:CollectionUnitTemporalConsistencyShape
```

---

### Example 3: Invalid - Missing Bidirectional Relationship

**Invalid RDF Data**:
```turtle
<https://example.org/collection/dutch-paintings>
    a custodian:CustodianCollection ;  # class-targeted shapes only apply to typed nodes
    custodian:managing_unit <https://example.org/unit/paintings-dept> .

<https://example.org/unit/paintings-dept>
    a custodian:OrganizationalStructure .
    # Missing: custodian:managed_collections <https://example.org/collection/dutch-paintings>
```

**Validation**:
```bash
python scripts/validate_with_shacl.py invalid_data.ttl
# ❌ VALIDATION FAILED
#
# Constraint Violations:
# --------------------------------------------------------------------------------
# Validation Result:
#   Severity: sh:Violation
#   Message: Collection references managing_unit https://example.org/unit/paintings-dept
#            but unit does not list collection in managed_collections
#   Focus Node: https://example.org/collection/dutch-paintings
```

---

## Integration

### CI/CD Pipeline Integration

**GitHub Actions Example**:
```yaml
name: SHACL Validation

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install pyshacl rdflib

      - name: Validate RDF data
        run: |
          python scripts/validate_with_shacl.py data/instances/*.ttl

      - name: Upload validation report
        if: failure()
        # upload-artifact v3 is deprecated and no longer runs on GitHub-hosted runners
        uses: actions/upload-artifact@v4
        with:
          name: validation-report
          path: validation_report.ttl
```

---

### Pre-commit Hook

**`.git/hooks/pre-commit`**:
```bash
#!/bin/bash
# Validate RDF files before commit

echo "Running SHACL validation..."

for file in data/instances/*.ttl; do
    if ! python scripts/validate_with_shacl.py "$file" --quiet; then
        echo "❌ SHACL validation failed for $file"
        echo "Fix violations before committing."
        exit 1
    fi
done

echo "✅ All files pass SHACL validation"
exit 0
```

---

## Comparison with Python Validator

### Phase 5 Python Validator vs. Phase 7 SHACL Shapes

| Aspect | Python Validator (Phase 5) | SHACL Shapes (Phase 7) |
|--------|---------------------------|------------------------|
| **Input Format** | YAML (LinkML instances) | RDF (Turtle, JSON-LD, etc.) |
| **Execution** | Standalone script | Triple store integrated OR pyshacl |
| **Performance** | Fast for <1,000 records | Optimized for >10,000 records |
| **Deployment** | Python runtime required | RDF triple store native |
| **Error Messages** | Custom CLI output | Standardized SHACL reports |
| **CI/CD** | Exit codes (0/1/2) | Exit codes (0/1/2) + RDF report |
| **Use Case** | Development validation | Production runtime validation |

### When to Use Which?

**Use Python Validator** (`validate_temporal_consistency.py`):
- ✅ During schema development (fast feedback on YAML instances)
- ✅ Pre-commit hooks for LinkML files
- ✅ Unit testing LinkML examples
- ✅ Before RDF conversion

**Use SHACL Shapes** (`validate_with_shacl.py`):
- ✅ Production RDF triple stores (GraphDB, Fuseki)
- ✅ Data ingestion pipelines
- ✅ Continuous monitoring (real-time validation)
- ✅ After RDF conversion (final quality gate)

**Best Practice**: Use **both**:
1. Python validator during development (YAML → validate → RDF)
2. SHACL shapes in production (RDF → validate → store)

---

## Advanced Usage

### Generate Validation Report

```bash
python scripts/validate_with_shacl.py data.ttl --output report.ttl
```

**Report Format** (Turtle):
```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix custodian: <https://nde.nl/ontology/hc/custodian/> .

[ a sh:ValidationReport ;
  sh:conforms false ;
  sh:result [
    a sh:ValidationResult ;
    sh:focusNode <https://example.org/collection/col-1> ;
    sh:resultMessage "Collection valid_from (1970-01-01) must be >= ..." ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
    sh:sourceShape custodian:CollectionUnitTemporalConsistencyShape
  ]
] .
```
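
Because the report is itself RDF, it can be post-processed with `rdflib`. The sketch below is one possible way to tally results by severity; it assumes a Turtle report such as the `report.ttl` written by `--output`, and the function names `load_results` and `count_by_severity` are hypothetical. `rdflib` is imported inside the loader so the counting helper works on plain tuples without it.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

SH = "http://www.w3.org/ns/shacl#"

Result = Tuple[str, str, str]  # (severity IRI, focus node, message)


def load_results(report_path: str) -> Iterator[Result]:
    """Yield one tuple per sh:ValidationResult in a Turtle report file."""
    from rdflib import RDF, Graph, Namespace  # third-party: pip install rdflib

    sh = Namespace(SH)
    graph = Graph().parse(report_path, format="turtle")
    for result in graph.subjects(RDF.type, sh.ValidationResult):
        yield (
            str(graph.value(result, sh.resultSeverity)),
            str(graph.value(result, sh.focusNode)),
            str(graph.value(result, sh.resultMessage)),
        )


def count_by_severity(results: Iterable[Result]) -> Counter:
    """Count results per severity, keyed by local name (Violation, Warning, Info)."""
    return Counter(severity.rsplit("#", 1)[-1] for severity, _focus, _message in results)
```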

---

### Custom Severity Levels

SHACL supports three severity levels:

```turtle
sh:severity sh:Violation ;  # ERROR (blocks data loading)
sh:severity sh:Warning ;    # WARNING (logged but allowed)
sh:severity sh:Info ;       # INFO (informational only)
```

**Example**: Custody gap is a **warning** (data quality issue but not invalid):
```turtle
custodian:CustodyTransferContinuityShape
    sh:sparql [
        sh:severity sh:Warning ;  # Allow data but log warning
        sh:message "Custody gap detected..." ;
        ...
    ] .
```
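
One practical consequence is worth spelling out. Under the SHACL specification a report conforms only when it contains no validation results at all, so a `sh:Warning` result still produces `sh:conforms false` unless the validator is told to tolerate it. pySHACL provides `allow_warnings` and `allow_infos` options for this purpose. The sketch below models that decision in plain Python; the function `report_conforms` is a hypothetical illustration of the logic, not pySHACL's implementation.

```python
from typing import Iterable


def report_conforms(severities: Iterable[str], allow_warnings: bool = False,
                    allow_infos: bool = False) -> bool:
    """Return True when every result has a tolerated severity (or there are none)."""
    tolerated = set()
    if allow_infos or allow_warnings:
        tolerated.add("Info")  # tolerating warnings also tolerates infos
    if allow_warnings:
        tolerated.add("Warning")
    return all(severity in tolerated for severity in severities)


print(report_conforms(["Warning"]))
print(report_conforms(["Warning"], allow_warnings=True))
```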

---

### Extending Shapes

Add custom validation rules by creating new shapes:

```turtle
# Custom rule: Collection name must not be empty
custodian:CollectionNameNotEmptyShape
    a sh:NodeShape ;
    sh:targetClass custodian:CustodianCollection ;
    sh:property [
        sh:path custodian:collection_name ;
        sh:minLength 1 ;
        sh:message "Collection name must not be empty" ;
    ] .
```

---

## Troubleshooting

### Common Issues

#### Issue 1: "pyshacl not found"

**Solution**:
```bash
pip install pyshacl rdflib
```

#### Issue 2: "Parse error: Invalid Turtle syntax"

**Solution**: Validate RDF syntax first:
```bash
rdfpipe -i turtle data.ttl > /dev/null
# If errors, fix syntax before SHACL validation
```

#### Issue 3: "No violations found but data is clearly invalid"

**Solution**: Check namespace prefixes match between shapes and data:
```turtle
# Shapes file uses:
@prefix custodian: <https://nde.nl/ontology/hc/custodian/> .

# Data file must use same namespace:
<https://nde.nl/ontology/hc/custodian/CustodianCollection>
```
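
A namespace mismatch of this kind (for example a trailing `#` instead of `/`) is easy to miss by eye. The sketch below shows one way to spot it by comparing the namespace part of IRIs used in the shapes with those found in the data. It operates on plain IRI strings, however they were collected, and the helper names `namespace_of` and `unmatched_namespaces` are hypothetical.

```python
from typing import Iterable, Set


def namespace_of(iri: str) -> str:
    """Split an IRI at its last '#' or '/' and return the namespace part."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def unmatched_namespaces(shape_iris: Iterable[str], data_iris: Iterable[str]) -> Set[str]:
    """Return namespaces referenced by the shapes that never occur in the data."""
    return {namespace_of(i) for i in shape_iris} - {namespace_of(i) for i in data_iris}


shape_iris = ["https://nde.nl/ontology/hc/custodian/CustodianCollection"]
data_iris = ["https://nde.nl/ontology/hc/custodian#CustodianCollection"]  # '#' instead of '/'
print(unmatched_namespaces(shape_iris, data_iris))
```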

---

## References

- **SHACL Specification**: https://www.w3.org/TR/shacl/
- **pyshacl Documentation**: https://github.com/RDFLib/pySHACL
- **SHACL Advanced Features**: https://www.w3.org/TR/shacl-af/
- **Python Validator (Phase 5)**: `scripts/validate_temporal_consistency.py`
- **SPARQL Queries (Phase 6)**: `docs/SPARQL_QUERIES_ORGANIZATIONAL.md`
- **Schema (v0.7.0)**: `schemas/20251121/linkml/01_custodian_name_modular.yaml`

---

## Next Steps

### Phase 8: LinkML Schema Constraints

Embed validation rules directly into LinkML schema using:
- `minimum_value` / `maximum_value` for date comparisons
- `pattern` for format validation
- Custom validators with Python functions
- Slot-level constraints

**Goal**: Validate at **schema definition** level, not just RDF level.

---

**Document Version**: 1.0.0
**Schema Version**: v0.7.0
**Last Updated**: 2025-11-22
**SHACL Shapes File**: `schemas/20251121/shacl/custodian_validation_shapes.ttl` (474 lines)
**Validation Script**: `scripts/validate_with_shacl.py` (289 lines)