# Critical Architectural Fix - Observation-Reconstruction Relationships

**Date**: 2025-11-22
**Status**: 🚨 CRITICAL FIX NEEDED
**Priority**: HIGH

## Problem Identified

The current schema has **incorrect relationships** between key classes:

### Current Issues

1. **ConfidenceMeasure attached to wrong class**
   - ❌ Current: `CustodianReconstruction.confidence_score` → `ConfidenceMeasure`
   - ✅ Correct: `ReconstructionActivity.confidence_score` → `ConfidenceMeasure`
   - **Rationale**: Confidence measures the quality of the PROCESS, not the RESULT

2. **CustodianObservation not linked to ReconstructionActivity**
   - ❌ Current: No direct link from `CustodianObservation` → `ReconstructionActivity`
   - ✅ Correct: `ReconstructionActivity.used` → `CustodianObservation` (multivalued)
   - **Rationale**: A PROV-O Activity uses Entities as inputs

3. **CustodianName incorrectly subclasses CustodianObservation**
   - ❌ Current: `CustodianName is_a CustodianObservation`
   - ✅ Correct: `CustodianName` is **DERIVED FROM** `CustodianObservation` (not inheritance!)
   - **Rationale**: CustodianName is the OUTPUT of interpretation, not a type of observation

4. **CustodianName not directly linked to Custodian hub**
   - ❌ Current: No direct link from `Custodian` → `CustodianName`
   - ✅ Correct: `Custodian.preferred_label` → `CustodianName`
   - **Rationale**: The hub needs a direct reference to its standardized emic name

---

## Correct Architecture Pattern

### PROV-O Activity Pattern

```
CustodianObservation (Entity - Input)
    ↓ input, linked by prov:used (Activity → Entity)
ReconstructionActivity (Activity - Process)
    ├── confidence_score → ConfidenceMeasure (quality of process)
    ├── prov:wasAssociatedWith → Agent
    ↓ output, linked by prov:wasGeneratedBy (Entity → Activity)
CustodianReconstruction OR CustodianName (Entity - Output)
```

### Key Principles

1. **Activity Uses Entities** (`prov:used`)
   - ReconstructionActivity consumes CustodianObservation(s)
   - Multiple observations can feed into one activity

2. **Activity Generates Entities** (`prov:wasGeneratedBy`)
   - ReconstructionActivity generates CustodianReconstruction (success case)
   - OR ReconstructionActivity generates CustodianName (partial success)
   - Activity can fail without generating any output

3. **Derivation Tracks Lineage** (`prov:wasDerivedFrom`)
   - CustodianName derives from CustodianObservation(s)
   - CustodianReconstruction derives from CustodianObservation(s)
   - Derivation is separate from Activity (tracks transformation)

4. **Confidence Measures Activity Quality**
   - Confidence attached to ReconstructionActivity (process quality)
   - NOT attached to CustodianReconstruction (result)
   - Represents: "How confident are we in this reconstruction process?"

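The four principles above can be sketched as plain Python dataclasses. This is only an illustration under stated assumptions, not the LinkML schema itself: the class and field names mirror the slots proposed in this document, while the dataclass layout and the sample values are illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustodianObservation:
    """Input entity: evidence found in a source."""
    id: str
    observed_name: str


@dataclass
class ConfidenceMeasure:
    confidence_value: float
    confidence_method: str


@dataclass
class ReconstructionActivity:
    """Process that consumes observations (prov:used)."""
    id: str
    used: List[CustodianObservation]
    # Principle 4: confidence describes the process, so it lives here.
    confidence_score: Optional[ConfidenceMeasure] = None


@dataclass
class CustodianName:
    """Output entity, derived from observations (prov:wasDerivedFrom)."""
    id: str
    emic_name: str
    was_derived_from: List[CustodianObservation]
    # Optional: a name may exist without a formal reconstruction activity.
    was_generated_by: Optional[ReconstructionActivity] = None


observations = [
    CustodianObservation("obs-001", "Rijks"),
    CustodianObservation("obs-002", "Rijksmuseum Amsterdam"),
]
activity = ReconstructionActivity(
    "act-001",
    used=observations,
    confidence_score=ConfidenceMeasure(0.95, "Manual expert curation"),
)
name = CustodianName(
    "name-001",
    "Rijksmuseum",
    was_derived_from=observations,
    was_generated_by=activity,
)

# The confidence behind a name is reached through the activity that generated it.
print(name.was_generated_by.confidence_score.confidence_value)
```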

---

## Required Changes

### 1. Move ConfidenceMeasure to ReconstructionActivity ✅

**File**: `modules/classes/ReconstructionActivity.yaml`

**Add slot**:
```yaml
slots:
  - id
  - activity_type
  - method
  - responsible_agent
  - temporal_extent
  - used              # NEW - links to CustodianObservation(s)
  - confidence_score  # NEW - moved from CustodianReconstruction
  - justification
```

**Add slot_usage**:
```yaml
slot_usage:
  used:
    slot_uri: prov:used
    description: >-
      CustodianObservation(s) used as input for this reconstruction activity.
      PROV-O: used links Activity to consumed Entities.
      Multiple observations can contribute to a single reconstruction.
    range: CustodianObservation
    multivalued: true
    required: true
  confidence_score:
    slot_uri: prov:confidence  # project-defined extension; PROV-O itself has no such term
    description: >-
      Confidence in the reconstruction activity's process and methodology.
      Measures quality of the reconstruction PROCESS, not the result.
      Range: 0.0 (low confidence) to 1.0 (high confidence).
    range: ConfidenceMeasure
    required: false
```

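The 0.0 to 1.0 range stated in the `confidence_score` description could be enforced when instances are loaded. The following is a minimal sketch, assuming the numeric value is stored in a `confidence_value` field (the name used in the scenario examples in this document); the function name `validate_confidence_value` is hypothetical.

```python
import math


def validate_confidence_value(value):
    """Return the value as a float if it lies in [0.0, 1.0], else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"confidence_value must be a number, got {value!r}")
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence_value must be within 0.0-1.0, got {value!r}")
    return float(value)


print(validate_confidence_value(0.92))
```

Rejecting out-of-range values at load time keeps the check close to the data; the same constraint could instead be expressed in the schema if the project prefers declarative validation.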

**Remove from**: `modules/classes/CustodianReconstruction.yaml`
- Delete `confidence_score` slot

---

### 2. Change CustodianName from Inheritance to Derivation ✅

**File**: `modules/classes/CustodianName.yaml`

**Current** (WRONG):
```yaml
CustodianName:
  is_a: CustodianObservation  # ❌ Inheritance implies "is a type of"
```

**Corrected** (RIGHT):
```yaml
CustodianName:
  # Remove is_a relationship
  class_uri: skos:Concept  # schema:name is a property, not a class, so it cannot be used here
  description: >-
    Standardized emic name derived from CustodianObservation(s).

    NOT a subclass of CustodianObservation - rather, a CustodianName is
    DERIVED FROM observation(s) through interpretation and standardization.

    Can be generated by ReconstructionActivity (successful interpretation)
    or remain standalone (direct extraction without full entity resolution).

  slots:
    - emic_name
    - name_language
    - standardized_name
    - endorsement_source
    - was_derived_from     # NEW - links to CustodianObservation(s)
    - was_generated_by     # NEW - links to ReconstructionActivity (optional)
    - refers_to_custodian  # NEW - links to Custodian hub
    - valid_from
    - valid_to
    - supersedes
    - superseded_by
```

**Add slot_usage**:
```yaml
slot_usage:
  was_derived_from:
    slot_uri: prov:wasDerivedFrom
    description: >-
      CustodianObservation(s) from which this name was derived.
      PROV-O: wasDerivedFrom establishes observation→name derivation.
      A name can be derived from multiple observations (consolidation).
    range: CustodianObservation
    multivalued: true
    required: true
  was_generated_by:
    slot_uri: prov:wasGeneratedBy
    description: >-
      ReconstructionActivity that generated this standardized name (optional).
      If null, name was directly extracted without formal reconstruction activity.
    range: ReconstructionActivity
    required: false
  refers_to_custodian:
    slot_uri: dcterms:references
    description: >-
      The Custodian hub that this name identifies.
      Links the standardized name back to the hub.
    range: Custodian
    required: true
```

---

### 3. Add Preferred Label Link from Custodian to CustodianName ✅

**File**: `modules/classes/Custodian.yaml`

**Add slot**:
```yaml
slots:
  - hc_id
  - preferred_label  # NEW - links to primary CustodianName
  - appellations
  - identifiers
  - created
  - modified
```

**Add slot_usage**:
```yaml
slot_usage:
  preferred_label:
    slot_uri: skos:prefLabel
    description: >-
      The primary standardized emic name for this custodian.
      SKOS: prefLabel for the preferred lexical label.

      This is the CANONICAL name - the standardized label accepted by the
      custodian itself for public representation.

      Distinct from:
      - Legal name (formal registered name in CustodianReconstruction)
      - Alternative names (in appellations)
      - Historical names (superseded CustodianNames)
    range: CustodianName
    required: false  # May be null if name not yet established
    examples:
      - value: "Rijksmuseum"
        description: "Primary emic name (not 'Stichting Rijksmuseum' legal name)"
```

---

### 4. Update CustodianObservation Documentation ✅

**File**: `modules/classes/CustodianObservation.yaml`

**Update description**:
```yaml
CustodianObservation:
  class_uri: heritage:CustodianObservation
  description: >-
    Source-based evidence of a heritage custodian's existence.

    CustodianObservations are INPUT ENTITIES for ReconstructionActivity:
    - Multiple observations can be reconciled into a CustodianReconstruction
    - Multiple observations can be standardized into a CustodianName
    - Observations remain independent even after reconstruction

    PROV-O Pattern (subject → property → object):
      ReconstructionActivity → prov:used → CustodianObservation
      CustodianReconstruction → prov:wasGeneratedBy → ReconstructionActivity
      CustodianName → prov:wasDerivedFrom → CustodianObservation
```

---

## Revised Class Relationships

### Complete PROV-O Flow

```mermaid
graph TB
    %% Each edge points from the subject to the object of the property
    subgraph Sources
        Obs1[CustodianObservation 1<br/>ISIL Registry]
        Obs2[CustodianObservation 2<br/>Museum Website]
        Obs3[CustodianObservation 3<br/>Archival Document]
    end

    subgraph Activity
        Act[ReconstructionActivity<br/>Entity Resolution]
        Conf[ConfidenceMeasure<br/>Score: 0.92]
    end

    subgraph Outputs
        Rec[CustodianReconstruction<br/>Legal Entity]
        Name[CustodianName<br/>Standardized Emic Name]
    end

    subgraph Hub
        Cust[Custodian Hub<br/>hc_id]
    end

    Act -->|prov:used| Obs1
    Act -->|prov:used| Obs2
    Act -->|prov:used| Obs3

    Act -->|has confidence_score| Conf

    Rec -->|prov:wasGeneratedBy| Act
    Name -->|prov:wasGeneratedBy| Act

    Rec -->|prov:wasDerivedFrom| Obs1
    Rec -->|prov:wasDerivedFrom| Obs2

    Name -->|prov:wasDerivedFrom| Obs1
    Name -->|prov:wasDerivedFrom| Obs3

    Rec -->|refers_to_custodian| Cust
    Name -->|refers_to_custodian| Cust
    Obs1 -->|refers_to_custodian| Cust
    Obs2 -->|refers_to_custodian| Cust
    Obs3 -->|refers_to_custodian| Cust

    Cust -->|skos:prefLabel| Name
```

---

## Success vs. Failure Scenarios

### Scenario 1: Successful Full Reconstruction ✅

```yaml
# INPUT: Multiple observations
CustodianObservation:
  - id: obs-001
    observed_name: "Rijks"
    source: letterhead
  - id: obs-002
    observed_name: "Rijksmuseum Amsterdam"
    source: ISIL registry

# PROCESS: Reconstruction activity
ReconstructionActivity:
  id: act-001
  used:
    - obs-001
    - obs-002
  confidence_score:
    confidence_value: 0.95
    confidence_method: "Manual expert curation"

# OUTPUT: Both reconstruction AND standardized name
CustodianReconstruction:
  id: rec-001
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

CustodianName:
  id: name-001
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# HUB: Links to preferred name
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: name-001
```

### Scenario 2: Partial Success - Name Only 🔸

```yaml
# INPUT: Single observation
CustodianObservation:
  - id: obs-003
    observed_name: "Museum van de Twintigste Eeuw"
    source: archival document

# PROCESS: Attempted reconstruction
ReconstructionActivity:
  id: act-002
  used: [obs-003]
  confidence_score:
    confidence_value: 0.45
    confidence_method: "Algorithmic matching - insufficient data"

# OUTPUT: Name only (reconstruction failed due to low confidence)
CustodianName:
  id: name-002
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-003]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe

# NO CustodianReconstruction created (insufficient evidence)
# HUB: Still links to name
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: name-002
```

### Scenario 3: Complete Failure ❌

```yaml
# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-004
    observed_name: "Stedelijk Museum"
    source: ambiguous reference

# PROCESS: Failed reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-004]
  confidence_score:
    confidence_value: 0.15
    confidence_method: "Multiple candidate matches found"

# OUTPUT: Nothing generated (activity failed)
# NO CustodianReconstruction
# NO CustodianName
# Observation remains unresolved
```

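The three scenarios differ only in which entities the activity generated, so an outcome can be read off the outputs rather than off a confidence threshold (this document does not define one). A minimal sketch, assuming outputs are identified by class name; the function `classify_outcome` and its labels are hypothetical:

```python
from typing import Iterable


def classify_outcome(generated_types: Iterable[str]) -> str:
    """Map the class names an activity generated to a scenario label."""
    generated = set(generated_types)
    if "CustodianReconstruction" in generated:
        return "full_reconstruction"  # Scenario 1
    if "CustodianName" in generated:
        return "name_only"            # Scenario 2
    return "failed"                   # Scenario 3: nothing was generated


print(classify_outcome(["CustodianReconstruction", "CustodianName"]))
print(classify_outcome(["CustodianName"]))
print(classify_outcome([]))
```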

---

## Implementation Checklist

### Phase 1: Update Core Classes

- [ ] **ReconstructionActivity.yaml**
  - [ ] Add `used` slot (CustodianObservation, multivalued)
  - [ ] Add `confidence_score` slot (ConfidenceMeasure)
  - [ ] Update documentation with PROV-O patterns

- [ ] **CustodianName.yaml**
  - [ ] Remove `is_a: CustodianObservation`
  - [ ] Add `was_derived_from` slot (CustodianObservation, multivalued)
  - [ ] Add `was_generated_by` slot (ReconstructionActivity, optional)
  - [ ] Add `refers_to_custodian` slot (Custodian)
  - [ ] Update class_uri to `skos:Concept`
  - [ ] Update documentation explaining derivation vs. inheritance

- [ ] **Custodian.yaml**
  - [ ] Add `preferred_label` slot (CustodianName)
  - [ ] Update documentation explaining preferred label usage

- [ ] **CustodianReconstruction.yaml**
  - [ ] Remove `confidence_score` slot (moved to ReconstructionActivity)
  - [ ] Update documentation clarifying it's generated by Activity

- [ ] **CustodianObservation.yaml**
  - [ ] Update documentation explaining role as Activity input
  - [ ] Add examples showing PROV-O flow

### Phase 2: Create/Update Slots

- [ ] **modules/slots/used.yaml** (NEW)
  - Create slot for `prov:used` property

- [ ] **modules/slots/preferred_label.yaml** (NEW)
  - Create slot for `skos:prefLabel` property

- [ ] **modules/slots/confidence_score.yaml** (UPDATE)
  - Move from CustodianReconstruction to ReconstructionActivity

### Phase 3: Update Main Schema

- [ ] **01_custodian_name_modular.yaml**
  - Add imports: `modules/slots/used`, `modules/slots/preferred_label`
  - Update comments explaining PROV-O pattern

### Phase 4: Documentation

- [ ] Update `HUB_ARCHITECTURE_DIAGRAM.md` with correct flow
- [ ] Create examples showing all three scenarios
- [ ] Update PROV-O alignment documentation

### Phase 5: Validation

- [ ] Run `gen-owl` to validate schema
- [ ] Create test instances for all three scenarios
- [ ] Validate RDF output
- [ ] Update UML diagrams

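When test instances for the scenarios are created in Phase 5, a small consistency check can catch broken provenance links before RDF generation. The sketch below works on plain dictionaries shaped like the scenario examples. Two points are assumptions rather than rules stated in this document: that `was_derived_from` is required on every output (it is only declared required for CustodianName), and that an output should not be derived from an observation its generating activity did not use. The function name `check_provenance` is hypothetical.

```python
from typing import Dict, List


def check_provenance(activities: List[Dict], outputs: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the links are consistent."""
    problems: List[str] = []
    used_by_activity = {a["id"]: set(a.get("used", [])) for a in activities}

    # `used` is declared required and multivalued on ReconstructionActivity.
    for act_id, used in used_by_activity.items():
        if not used:
            problems.append(f"{act_id}: 'used' must list at least one observation")

    for out in outputs:
        derived = set(out.get("was_derived_from", []))
        if not derived:
            problems.append(f"{out['id']}: 'was_derived_from' is required")
        act_id = out.get("was_generated_by")
        if act_id is None:
            continue  # optional: name extracted without a formal activity
        if act_id not in used_by_activity:
            problems.append(f"{out['id']}: unknown activity {act_id}")
        elif not derived <= used_by_activity[act_id]:
            # Assumed rule: sources of an output are a subset of the activity's inputs.
            extra = sorted(derived - used_by_activity[act_id])
            problems.append(f"{out['id']}: derived from {extra}, which {act_id} did not use")
    return problems


# Scenario 1 instances, reduced to the fields the check needs.
activities = [{"id": "act-001", "used": ["obs-001", "obs-002"]}]
outputs = [
    {"id": "rec-001", "was_derived_from": ["obs-001", "obs-002"], "was_generated_by": "act-001"},
    {"id": "name-001", "was_derived_from": ["obs-001", "obs-002"], "was_generated_by": "act-001"},
]
print(check_provenance(activities, outputs))
```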

---

## Ontology Properties Used

### PROV-O Properties

**prov:used** (Activity → Entity):
> "A prov:Entity that was used by this prov:Activity."
- Domain: prov:Activity
- Range: prov:Entity
- Use: ReconstructionActivity uses CustodianObservation(s) as input

**prov:wasGeneratedBy** (Entity → Activity):
> "Generation is the completion of production of a new entity by an activity."
- Domain: prov:Entity
- Range: prov:Activity
- Use: CustodianReconstruction/CustodianName generated by ReconstructionActivity

**prov:wasDerivedFrom** (Entity → Entity):
> "A derivation is a transformation of an entity into another."
- Domain: prov:Entity
- Range: prov:Entity
- Use: CustodianName/CustodianReconstruction derived from CustodianObservation(s)

**prov:confidence** (Activity → ConfidenceMeasure):
> "Confidence in the activity's process or methodology."
- Project-specific extension; not defined in the W3C PROV-O vocabulary
- Domain: prov:Activity
- Range: ConfidenceMeasure (its `confidence_value` runs from 0.0 to 1.0)

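### SKOS Properties

**skos:prefLabel** (Resource → Literal):
> "The preferred lexical label for a resource, in a given language."
- Domain: none stated by SKOS (usable on any resource; applied here to the Custodian hub)
- Range: plain literal in SKOS; this schema points it at a CustodianName (as structured value), which departs from SKOS. SKOS-XL's `skosxl:prefLabel` is the standard property for labels modelled as resources.
- Use: Custodian.preferred_label → CustodianName

---
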
## Rationale

### Why CustodianName is NOT a subclass of CustodianObservation

**Conceptual Distinction**:
- **CustodianObservation**: Evidence seen in a source (emic or etic)
- **CustodianName**: Standardized interpretation of observations

**Temporal Distinction**:
- **Observation**: Records historical state ("what was written in 1920")
- **Name**: Current standardized form ("what we call it now")

**Ontological Distinction**:
- **Observation**: `pico:PersonObservation`, `crm:E73_Information_Object`
- **Name**: `skos:Concept`, `schema:name`, `rdfs:label`

**Example**:
```
Observation 1: "Rijks" (seen on letterhead, 2015)
Observation 2: "Rijksmuseum Amsterdam" (seen in ISIL registry, 2020)
Observation 3: "The Rijksmuseum" (seen in guidebook, 2018)

        ↓ DERIVATION (not inheritance)

CustodianName: "Rijksmuseum" (standardized emic name, 2025)
```

The name is **derived from** observations through interpretation; it is not a **type of** observation.

---

## References

**PROV-O Specification**:
- [PROV-O: The PROV Ontology](https://www.w3.org/TR/prov-o/)
- [PROV-O Usage Examples](https://www.w3.org/TR/prov-o/#examples)
- Local file: `/data/ontology/prov-o.rdf`

**SKOS Specification**:
- [SKOS Simple Knowledge Organization System](https://www.w3.org/TR/skos-reference/)
- prefLabel: https://www.w3.org/TR/skos-reference/#labels
- Local file: `/data/ontology/skos.rdf`

**PiCo Pattern**:
- [PiCo: Persons in Context](https://github.com/FICLIT/PiCo)
- Inspiration for observation-reconstruction pattern

---

**Status**: 🚨 AWAITING IMPLEMENTATION
**Priority**: HIGH - Fundamental architectural fix
**Impact**: Changes class relationships, moves properties, removes inheritance