# Critical Architectural Fix - Observation-Reconstruction Relationships
**Date**: 2025-11-22
**Status**: 🚨 CRITICAL FIX NEEDED
**Priority**: HIGH
## Problem Identified
The current schema has **incorrect relationships** between key classes:
### Current Issues
1. **ConfidenceMeasure attached to wrong class**
   - ❌ Current: `CustodianReconstruction.confidence_score` → `ConfidenceMeasure`
   - ✅ Correct: `ReconstructionActivity.confidence_score` → `ConfidenceMeasure`
   - **Rationale**: Confidence measures the quality of the PROCESS, not the RESULT
2. **CustodianObservation not linked to ReconstructionActivity**
   - ❌ Current: No direct link between `CustodianObservation` and `ReconstructionActivity`
   - ✅ Correct: `ReconstructionActivity.used` → `CustodianObservation` (multivalued)
   - **Rationale**: A PROV-O Activity uses Entities as inputs
3. **CustodianName incorrectly subclasses CustodianObservation**
   - ❌ Current: `CustodianName is_a CustodianObservation`
   - ✅ Correct: `CustodianName` is **DERIVED FROM** `CustodianObservation` (not inheritance!)
   - **Rationale**: CustodianName is the OUTPUT of interpretation, not a type of observation
4. **CustodianName not directly linked to Custodian hub**
   - ❌ Current: No direct link from `Custodian` → `CustodianName`
   - ✅ Correct: `Custodian.preferred_label` → `CustodianName`
   - **Rationale**: The hub needs a direct reference to its standardized emic name
---
## Correct Architecture Pattern
### PROV-O Activity Pattern
```
CustodianObservation (Entity - Input)
    ↑ prov:used (Activity → Entity)
ReconstructionActivity (Activity - Process)
    ├── confidence_score → ConfidenceMeasure (quality of process)
    └── prov:wasAssociatedWith → Agent
    ↑ prov:wasGeneratedBy (Entity → Activity)
CustodianReconstruction OR CustodianName (Entity - Output)
```
### Key Principles
1. **Activity Uses Entities** (`prov:used`)
   - ReconstructionActivity consumes CustodianObservation(s)
   - Multiple observations can feed into one activity
2. **Activity Generates Entities** (`prov:wasGeneratedBy`)
   - The property is stated on the output: the generated entity points back to its activity (the PROV-O inverse is `prov:generated`)
   - ReconstructionActivity generates CustodianReconstruction (success case)
   - OR ReconstructionActivity generates only a CustodianName (partial success)
   - An activity can fail without generating any output
3. **Derivation Tracks Lineage** (`prov:wasDerivedFrom`)
   - CustodianName derives from CustodianObservation(s)
   - CustodianReconstruction derives from CustodianObservation(s)
   - Derivation is recorded separately from the activity: it links the output entity directly to its source entities
4. **Confidence Measures Activity Quality**
   - Confidence is attached to ReconstructionActivity (process quality)
   - NOT attached to CustodianReconstruction (result)
   - Represents: "How confident are we in this reconstruction process?"
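
The four principles can be sketched as plain data structures. This is only an illustration in Python, not the project's generated LinkML classes: the dataclasses below are hypothetical, reuse the slot names from this document, and flatten `ConfidenceMeasure` to a single `confidence_value` field for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustodianObservation:
    id: str
    observed_name: str


@dataclass
class ReconstructionActivity:
    id: str
    used: List[CustodianObservation]          # prov:used (Activity -> Entity)
    confidence_value: Optional[float] = None  # quality of the process, 0.0-1.0


@dataclass
class CustodianName:
    # Note: no inheritance from CustodianObservation.
    id: str
    emic_name: str
    was_derived_from: List[CustodianObservation]               # prov:wasDerivedFrom
    was_generated_by: Optional[ReconstructionActivity] = None  # prov:wasGeneratedBy


obs = [
    CustodianObservation("obs-001", "Rijks"),
    CustodianObservation("obs-002", "Rijksmuseum Amsterdam"),
]
act = ReconstructionActivity("act-001", used=obs, confidence_value=0.95)
name = CustodianName("name-001", "Rijksmuseum", was_derived_from=obs, was_generated_by=act)

# Confidence is reached through the activity, never stored on the output entity.
print(name.was_generated_by.confidence_value)
```

Keeping `was_derived_from` and `was_generated_by` as separate fields mirrors principle 3: lineage stays available even when no activity is recorded.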
---
## Required Changes
### 1. Move ConfidenceMeasure to ReconstructionActivity
**File**: `modules/classes/ReconstructionActivity.yaml`
**Add slot**:
```yaml
slots:
- id
- activity_type
- method
- responsible_agent
- temporal_extent
- used # NEW - links to CustodianObservation(s)
- confidence_score # NEW - moved from CustodianReconstruction
- justification
```
**Add slot_usage**:
```yaml
slot_usage:
  used:
    slot_uri: prov:used
    description: >-
      CustodianObservation(s) used as input for this reconstruction activity.
      PROV-O: used links Activity to consumed Entities.
      Multiple observations can contribute to a single reconstruction.
    range: CustodianObservation
    multivalued: true
    required: true
  confidence_score:
    # NOTE: PROV-O defines no prov:confidence term; this URI is a project extension.
    slot_uri: prov:confidence
    description: >-
      Confidence in the reconstruction activity's process and methodology.
      Measures quality of the reconstruction PROCESS, not the result.
      Range: 0.0 (low confidence) to 1.0 (high confidence).
    range: ConfidenceMeasure
    required: false
```
**Remove from**: `modules/classes/CustodianReconstruction.yaml`
- Delete `confidence_score` slot
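
The two constraints introduced here (`used` is required and multivalued, and the confidence value lies between 0.0 and 1.0) can be checked on instance data before schema validation. The function below is a hypothetical helper written for illustration; it assumes records are plain dicts shaped like the scenario examples in this document, with `confidence_value` nested under `confidence_score`.

```python
def check_activity(activity: dict) -> list:
    """Return a list of problems found in a ReconstructionActivity record."""
    problems = []

    # `used` is required: an activity with no input observations is invalid.
    if not activity.get("used"):
        problems.append("used: at least one CustodianObservation is required")

    # `confidence_score` is optional, but when present its value must be in range.
    score = activity.get("confidence_score")
    if score is not None:
        value = score.get("confidence_value")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append("confidence_value: expected a number from 0.0 to 1.0")

    return problems


ok = {"id": "act-001", "used": ["obs-001", "obs-002"],
      "confidence_score": {"confidence_value": 0.95}}
bad = {"id": "act-x", "used": [], "confidence_score": {"confidence_value": 1.5}}

print(check_activity(ok))   # []
print(check_activity(bad))  # two problems: missing inputs, value out of range
```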
---
### 2. Change CustodianName from Inheritance to Derivation
**File**: `modules/classes/CustodianName.yaml`
**Current** (WRONG):
```yaml
CustodianName:
  is_a: CustodianObservation  # ❌ Inheritance implies "is a type of"
```
**Corrected** (RIGHT):
```yaml
CustodianName:
  # Remove is_a relationship
  class_uri: skos:Concept  # Not schema:name, which is a property rather than a class
  description: >-
    Standardized emic name derived from CustodianObservation(s).
    NOT a subclass of CustodianObservation - rather, a CustodianName is
    DERIVED FROM observation(s) through interpretation and standardization.
    Can be generated by ReconstructionActivity (successful interpretation)
    or remain standalone (direct extraction without full entity resolution).
  slots:
    - emic_name
    - name_language
    - standardized_name
    - endorsement_source
    - was_derived_from     # NEW - links to CustodianObservation(s)
    - was_generated_by     # NEW - links to ReconstructionActivity (optional)
    - refers_to_custodian  # NEW - links to Custodian hub
    - valid_from
    - valid_to
    - supersedes
    - superseded_by
```
**Add slot_usage**:
```yaml
slot_usage:
  was_derived_from:
    slot_uri: prov:wasDerivedFrom
    description: >-
      CustodianObservation(s) from which this name was derived.
      PROV-O: wasDerivedFrom establishes observation→name derivation.
      A name can be derived from multiple observations (consolidation).
    range: CustodianObservation
    multivalued: true
    required: true
  was_generated_by:
    slot_uri: prov:wasGeneratedBy
    description: >-
      ReconstructionActivity that generated this standardized name (optional).
      If null, name was directly extracted without formal reconstruction activity.
    range: ReconstructionActivity
    required: false
  refers_to_custodian:
    slot_uri: dcterms:references
    description: >-
      The Custodian hub that this name identifies.
      Links the standardized name back to the hub.
    range: Custodian
    required: true
```
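
Because derivation and generation are recorded separately, the two can drift apart in instance data. The sketch below checks one plausible consistency rule: a name generated by an activity should only be derived from observations that activity used. That rule is an assumption made for this example, not something the schema states, and `check_name` is a hypothetical helper operating on dict records keyed by id.

```python
def check_name(name: dict, activities: dict) -> list:
    """Return a list of provenance problems for a CustodianName record."""
    problems = []

    # was_derived_from is required and multivalued.
    derived = set(name.get("was_derived_from") or [])
    if not derived:
        problems.append("was_derived_from: at least one observation is required")

    # was_generated_by is optional; a standalone name skips the remaining checks.
    act_id = name.get("was_generated_by")
    if act_id is None:
        return problems

    activity = activities.get(act_id)
    if activity is None:
        problems.append(f"was_generated_by: unknown activity {act_id}")
        return problems

    # ASSUMED rule: derivation sources must be among the activity's inputs.
    extra = derived - set(activity.get("used") or [])
    if extra:
        problems.append(f"derived from observations not used by {act_id}: {sorted(extra)}")
    return problems


activities = {"act-002": {"id": "act-002", "used": ["obs-003"]}}
name = {"id": "name-002", "was_derived_from": ["obs-003"], "was_generated_by": "act-002"}
print(check_name(name, activities))  # []
```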
---
### 3. Add Preferred Label Link from Custodian to CustodianName
**File**: `modules/classes/Custodian.yaml`
**Add slot**:
```yaml
slots:
- hc_id
- preferred_label # NEW - links to primary CustodianName
- appellations
- identifiers
- created
- modified
```
**Add slot_usage**:
```yaml
slot_usage:
  preferred_label:
    slot_uri: skos:prefLabel
    description: >-
      The primary standardized emic name for this custodian.
      SKOS: prefLabel for the preferred lexical label.
      This is the CANONICAL name - the standardized label accepted by the
      custodian itself for public representation.
      Distinct from:
        - Legal name (formal registered name in CustodianReconstruction)
        - Alternative names (in appellations)
        - Historical names (superseded CustodianNames)
    range: CustodianName
    required: false  # May be null if name not yet established
    examples:
      - value: "Rijksmuseum"
        description: "Primary emic name (not 'Stichting Rijksmuseum' legal name)"
```
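
The hub-to-name link and the name-to-hub link (`refers_to_custodian`) should agree with each other. A minimal sketch of that round-trip check follows; `check_preferred_label` is a hypothetical helper, and records are assumed to be plain dicts with names indexed by id.

```python
def check_preferred_label(custodian: dict, names: dict) -> list:
    """Check that a hub's preferred_label resolves to a name pointing back at it."""
    label_id = custodian.get("preferred_label")
    if label_id is None:
        return []  # allowed: preferred_label is optional until a name is established

    name = names.get(label_id)
    if name is None:
        return [f"preferred_label: no CustodianName with id {label_id}"]

    if name.get("refers_to_custodian") != custodian["hc_id"]:
        return [f"preferred_label: {label_id} refers to a different custodian"]
    return []


names = {
    "name-001": {"id": "name-001", "emic_name": "Rijksmuseum",
                 "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804"},
}
hub = {"hc_id": "hc:nl-nh-ams-m-rm-q190804", "preferred_label": "name-001"}
print(check_preferred_label(hub, names))  # []
```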
---
### 4. Update CustodianObservation Documentation
**File**: `modules/classes/CustodianObservation.yaml`
**Update description**:
```yaml
CustodianObservation:
  class_uri: heritage:CustodianObservation
  description: >-
    Source-based evidence of a heritage custodian's existence.
    CustodianObservations are INPUT ENTITIES for ReconstructionActivity:
      - Multiple observations can be reconciled into a CustodianReconstruction
      - Multiple observations can be standardized into a CustodianName
      - Observations remain independent even after reconstruction
    PROV-O Pattern (subject, property, object):
      ReconstructionActivity prov:used CustodianObservation
      CustodianReconstruction prov:wasGeneratedBy ReconstructionActivity
      CustodianName prov:wasDerivedFrom CustodianObservation
```
---
## Revised Class Relationships
### Complete PROV-O Flow
```mermaid
graph TB
    subgraph Sources
        Obs1[CustodianObservation 1<br/>ISIL Registry]
        Obs2[CustodianObservation 2<br/>Museum Website]
        Obs3[CustodianObservation 3<br/>Archival Document]
    end
    subgraph Activity
        Act[ReconstructionActivity<br/>Entity Resolution]
        Conf[ConfidenceMeasure<br/>Score: 0.92]
    end
    subgraph Outputs
        Rec[CustodianReconstruction<br/>Legal Entity]
        Name[CustodianName<br/>Standardized Emic Name]
    end
    subgraph Hub
        Cust[Custodian Hub<br/>hc_id]
    end
    Act -->|prov:used| Obs1
    Act -->|prov:used| Obs2
    Act -->|prov:used| Obs3
    Act -->|has confidence_score| Conf
    Rec -->|prov:wasGeneratedBy| Act
    Name -->|prov:wasGeneratedBy| Act
    Rec -->|prov:wasDerivedFrom| Obs1
    Rec -->|prov:wasDerivedFrom| Obs2
    Name -->|prov:wasDerivedFrom| Obs1
    Name -->|prov:wasDerivedFrom| Obs3
    Rec -->|refers_to_custodian| Cust
    Name -->|refers_to_custodian| Cust
    Obs1 -->|refers_to_custodian| Cust
    Obs2 -->|refers_to_custodian| Cust
    Obs3 -->|refers_to_custodian| Cust
    Cust -->|skos:prefLabel| Name
```
---
## Success vs. Failure Scenarios
### Scenario 1: Successful Full Reconstruction ✅
```yaml
# INPUT: Multiple observations
CustodianObservation:
  - id: obs-001
    observed_name: "Rijks"
    source: letterhead
  - id: obs-002
    observed_name: "Rijksmuseum Amsterdam"
    source: ISIL registry

# PROCESS: Reconstruction activity
ReconstructionActivity:
  id: act-001
  used:
    - obs-001
    - obs-002
  confidence_score:
    confidence_value: 0.95
    confidence_method: "Manual expert curation"

# OUTPUT: Both reconstruction AND standardized name
CustodianReconstruction:
  id: rec-001
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

CustodianName:
  id: name-001
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# HUB: Links to preferred name
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: name-001
```
### Scenario 2: Partial Success - Name Only 🔸
```yaml
# INPUT: Single observation
CustodianObservation:
  - id: obs-003
    observed_name: "Museum van de Twintigste Eeuw"
    source: archival document

# PROCESS: Attempted reconstruction
ReconstructionActivity:
  id: act-002
  used: [obs-003]
  confidence_score:
    confidence_value: 0.45
    confidence_method: "Algorithmic matching - insufficient data"

# OUTPUT: Name only (reconstruction failed due to low confidence)
CustodianName:
  id: name-002
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-003]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe

# NO CustodianReconstruction created (insufficient evidence)

# HUB: Still links to name
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: name-002
```
### Scenario 3: Complete Failure ❌
```yaml
# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-004
    observed_name: "Stedelijk Museum"
    source: ambiguous reference

# PROCESS: Failed reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-004]
  confidence_score:
    confidence_value: 0.15
    confidence_method: "Multiple candidate matches found"

# OUTPUT: Nothing generated (activity failed)
# NO CustodianReconstruction
# NO CustodianName
# Observation remains unresolved
```
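
The three scenarios differ only in which outputs point back to the activity through `was_generated_by`. A rough sketch of classifying an activity on that basis is shown below; `classify_outcome` and its return labels are hypothetical, and the fourth label covers a combination this document does not describe.

```python
def classify_outcome(activity_id: str, reconstructions: list, names: list) -> str:
    """Classify an activity by the outputs that reference it via was_generated_by."""
    has_rec = any(r.get("was_generated_by") == activity_id for r in reconstructions)
    has_name = any(n.get("was_generated_by") == activity_id for n in names)

    if has_rec and has_name:
        return "full_reconstruction"   # Scenario 1
    if has_name:
        return "name_only"             # Scenario 2
    if not has_rec:
        return "failure"               # Scenario 3
    return "reconstruction_only"       # not one of the three documented scenarios


reconstructions = [{"id": "rec-001", "was_generated_by": "act-001"}]
names = [
    {"id": "name-001", "was_generated_by": "act-001"},
    {"id": "name-002", "was_generated_by": "act-002"},
]

for act_id in ("act-001", "act-002", "act-003"):
    print(act_id, classify_outcome(act_id, reconstructions, names))
```

The classification deliberately ignores `confidence_value`: the document gives no numeric thresholds, so the outcome is read from the generated entities alone.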
---
## Implementation Checklist
### Phase 1: Update Core Classes
- [ ] **ReconstructionActivity.yaml**
  - [ ] Add `used` slot (CustodianObservation, multivalued)
  - [ ] Add `confidence_score` slot (ConfidenceMeasure)
  - [ ] Update documentation with PROV-O patterns
- [ ] **CustodianName.yaml**
  - [ ] Remove `is_a: CustodianObservation`
  - [ ] Add `was_derived_from` slot (CustodianObservation, multivalued)
  - [ ] Add `was_generated_by` slot (ReconstructionActivity, optional)
  - [ ] Add `refers_to_custodian` slot (Custodian)
  - [ ] Update class_uri to `skos:Concept`
  - [ ] Update documentation explaining derivation vs. inheritance
- [ ] **Custodian.yaml**
  - [ ] Add `preferred_label` slot (CustodianName)
  - [ ] Update documentation explaining preferred label usage
- [ ] **CustodianReconstruction.yaml**
  - [ ] Remove `confidence_score` slot (moved to ReconstructionActivity)
  - [ ] Update documentation clarifying it's generated by Activity
- [ ] **CustodianObservation.yaml**
  - [ ] Update documentation explaining role as Activity input
  - [ ] Add examples showing PROV-O flow
### Phase 2: Create/Update Slots
- [ ] **modules/slots/used.yaml** (NEW)
  - Create slot for `prov:used` property
- [ ] **modules/slots/preferred_label.yaml** (NEW)
  - Create slot for `skos:prefLabel` property
- [ ] **modules/slots/confidence_score.yaml** (UPDATE)
  - Move from CustodianReconstruction to ReconstructionActivity
### Phase 3: Update Main Schema
- [ ] **01_custodian_name_modular.yaml**
  - Add imports: `modules/slots/used`, `modules/slots/preferred_label`
  - Update comments explaining PROV-O pattern
### Phase 4: Documentation
- [ ] Update `HUB_ARCHITECTURE_DIAGRAM.md` with correct flow
- [ ] Create examples showing all three scenarios
- [ ] Update PROV-O alignment documentation
### Phase 5: Validation
- [ ] Run `gen-owl` to validate schema
- [ ] Create test instances for all three scenarios
- [ ] Validate RDF output
- [ ] Update UML diagrams
---
## Ontology Properties Used
### PROV-O Properties
**prov:used** (Activity → Entity):
> "A prov:Entity that was used by this prov:Activity."

- Domain: prov:Activity
- Range: prov:Entity
- Use: ReconstructionActivity uses CustodianObservation(s) as input

**prov:wasGeneratedBy** (Entity → Activity):
> "Generation is the completion of production of a new entity by an activity."

- Domain: prov:Entity
- Range: prov:Activity
- Use: CustodianReconstruction/CustodianName generated by ReconstructionActivity

**prov:wasDerivedFrom** (Entity → Entity):
> "A derivation is a transformation of an entity into another."

- Domain: prov:Entity
- Range: prov:Entity
- Use: CustodianName/CustodianReconstruction derived from CustodianObservation(s)

**prov:confidence** (Activity → ConfidenceMeasure):
- Project extension: the PROV-O specification defines no `prov:confidence` term, so the definition below is this schema's own
- Meaning: confidence in the activity's process or methodology
- Domain: prov:Activity
- Range: ConfidenceMeasure (carrying a value from 0.0 to 1.0)
### SKOS Properties
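**skos:prefLabel** (Resource → Literal):
> "The preferred lexical label for a resource, in a given language."

- Domain: none stated in SKOS (usable on any resource, not only skos:Concept)
- Range: RDF plain literals. Pointing it at a structured `CustodianName` goes beyond that range; SKOS-XL's `skosxl:prefLabel` (range `skosxl:Label`) is the standard property for labels modeled as resources
- Use: Custodian.preferred_label → CustodianName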
---
## Rationale
### Why CustodianName is NOT a subclass of CustodianObservation
**Conceptual Distinction**:
- **CustodianObservation**: Evidence seen in a source (emic or etic)
- **CustodianName**: Standardized interpretation of observations
**Temporal Distinction**:
- **Observation**: Records historical state ("what was written in 1920")
- **Name**: Current standardized form ("what we call it now")
**Ontological Distinction**:
- **Observation**: `pico:PersonObservation`, `crm:E73_Information_Object`
- **Name**: `skos:Concept`, `schema:name`, `rdfs:label`
**Example**:
```
Observation 1: "Rijks" (seen on letterhead, 2015)
Observation 2: "Rijksmuseum Amsterdam" (seen in ISIL registry, 2020)
Observation 3: "The Rijksmuseum" (seen in guidebook, 2018)
        ↓ DERIVATION (not inheritance)
CustodianName: "Rijksmuseum" (standardized emic name, 2025)
```
The name is **derived from** observations through interpretation, not a **type of** observation.
---
## References
**PROV-O Specification**:
- [PROV-O: The PROV Ontology](https://www.w3.org/TR/prov-o/)
- [PROV-O Usage Examples](https://www.w3.org/TR/prov-o/#examples)
- Local file: `/data/ontology/prov-o.rdf`
**SKOS Specification**:
- [SKOS Simple Knowledge Organization System](https://www.w3.org/TR/skos-reference/)
- prefLabel: https://www.w3.org/TR/skos-reference/#labels
- Local file: `/data/ontology/skos.rdf`
**PiCo Pattern**:
- [PiCo: Persons in Context](https://github.com/FICLIT/PiCo)
- Inspiration for observation-reconstruction pattern
---
**Status**: 🚨 AWAITING IMPLEMENTATION
**Priority**: HIGH - Fundamental architectural fix
**Impact**: Changes class relationships, moves properties, removes inheritance