Critical Architectural Fix - Observation-Reconstruction Relationships
Date: 2025-11-22
Status: 🚨 CRITICAL FIX NEEDED
Priority: HIGH
Problem Identified
The current schema has incorrect relationships between key classes:
Current Issues
- ConfidenceMeasure attached to the wrong class
  - ❌ Current: CustodianReconstruction.confidence_score → ConfidenceMeasure
  - ✅ Correct: ReconstructionActivity.confidence_score → ConfidenceMeasure
  - Rationale: Confidence measures the quality of the PROCESS, not the RESULT
- CustodianObservation not linked to ReconstructionActivity
  - ❌ Current: No direct link from CustodianObservation → ReconstructionActivity
  - ✅ Correct: ReconstructionActivity.used → CustodianObservation (multivalued)
  - Rationale: In PROV-O, an Activity uses Entities as inputs
- CustodianName incorrectly subclasses CustodianObservation
  - ❌ Current: CustodianName is_a CustodianObservation
  - ✅ Correct: CustodianName is DERIVED FROM CustodianObservation (not inheritance!)
  - Rationale: CustodianName is the OUTPUT of interpretation, not a type of observation
- CustodianName not directly linked to the Custodian hub
  - ❌ Current: No direct link from Custodian → CustodianName
  - ✅ Correct: Custodian.preferred_label → CustodianName
  - Rationale: The hub needs a direct reference to its standardized emic name
Correct Architecture Pattern
PROV-O Activity Pattern
CustodianObservation (Entity - Input)
    ↑ prov:used
ReconstructionActivity (Activity - Process)
    ├── confidence_score → ConfidenceMeasure (quality of process)
    ├── prov:wasAssociatedWith → Agent
    ↑ prov:wasGeneratedBy
CustodianReconstruction OR CustodianName (Entity - Output)
Data flows top to bottom; each ↑ arrow points from the subject of the PROV-O property to its object.
Key Principles
- Activity Uses Entities (prov:used)
  - ReconstructionActivity consumes CustodianObservation(s)
  - Multiple observations can feed into one activity
- Activity Generates Entities (Entity prov:wasGeneratedBy Activity)
  - ReconstructionActivity generates CustodianReconstruction and CustodianName (full success)
  - OR ReconstructionActivity generates only a CustodianName (partial success)
  - An activity can fail without generating any output
- Derivation Tracks Lineage (prov:wasDerivedFrom)
  - CustodianName derives from CustodianObservation(s)
  - CustodianReconstruction derives from CustodianObservation(s)
  - Derivation is recorded separately from the Activity (it tracks the transformation from input to output)
- Confidence Measures Activity Quality
  - Confidence is attached to ReconstructionActivity (process quality)
  - NOT attached to CustodianReconstruction (result)
  - Represents: "How confident are we in this reconstruction process?"
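A minimal Python sketch of these principles follows. It assumes that plain dataclasses stand in for the LinkML classes; the class and slot names come from this document, but the in-memory representation is hypothetical. Confidence hangs off the activity, so the name reaches it only through was_generated_by:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustodianObservation:
    id: str
    observed_name: str


@dataclass
class ConfidenceMeasure:
    confidence_value: float  # 0.0 (low) to 1.0 (high)
    confidence_method: str


@dataclass
class ReconstructionActivity:
    id: str
    used: List[CustodianObservation]  # prov:used (Activity -> Entity)
    confidence_score: Optional[ConfidenceMeasure] = None  # quality of the process


@dataclass
class CustodianName:
    id: str
    emic_name: str
    was_derived_from: List[CustodianObservation]  # prov:wasDerivedFrom (Entity -> Entity)
    was_generated_by: Optional[ReconstructionActivity] = None  # prov:wasGeneratedBy


obs = [CustodianObservation("obs-001", "Rijks"),
       CustodianObservation("obs-002", "Rijksmuseum Amsterdam")]
act = ReconstructionActivity(
    "act-001", used=obs,
    confidence_score=ConfidenceMeasure(0.95, "Manual expert curation"))
name = CustodianName("name-001", "Rijksmuseum", was_derived_from=obs, was_generated_by=act)

# The output carries no confidence of its own; it is read from the activity.
print(name.was_generated_by.confidence_score.confidence_value)  # 0.95
```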
Required Changes
1. Move ConfidenceMeasure to ReconstructionActivity ✅
File: modules/classes/ReconstructionActivity.yaml
Add slots:
slots:
- id
- activity_type
- method
- responsible_agent
- temporal_extent
- used # NEW - links to CustodianObservation(s)
- confidence_score # NEW - moved from CustodianReconstruction
- justification
Add slot_usage:
slot_usage:
  used:
    slot_uri: prov:used
    description: >-
      CustodianObservation(s) used as input for this reconstruction activity.
      PROV-O: prov:used links an Activity to the Entities it consumed.
      Multiple observations can contribute to a single reconstruction.
    range: CustodianObservation
    multivalued: true
    required: true
  confidence_score:
    slot_uri: prov:confidence  # NOTE: not a term of W3C PROV-O; project-specific extension
    description: >-
      Confidence in the reconstruction activity's process and methodology.
      Measures the quality of the reconstruction PROCESS, not the result.
      Value range: 0.0 (low confidence) to 1.0 (high confidence).
    range: ConfidenceMeasure
    required: false
Remove from: modules/classes/CustodianReconstruction.yaml
- Delete the confidence_score slot
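The new constraints can be spot-checked on plain instance data before the LinkML tooling runs. This is a hedged sketch. It assumes that instances are dicts shaped like the scenario examples in this document, with confidence_value nested under confidence_score. validate_reconstruction_activity is a hypothetical helper and is not part of LinkML:

```python
from __future__ import annotations

from typing import Any


def validate_reconstruction_activity(activity: dict[str, Any]) -> list[str]:
    """Return constraint violations for one ReconstructionActivity (empty list = valid)."""
    errors: list[str] = []

    # `used` is required and multivalued: expect a non-empty list of observation ids.
    used = activity.get("used")
    if not isinstance(used, list) or not used:
        errors.append("used: at least one CustodianObservation is required")

    # `confidence_score` is optional; when present, its value must lie in [0.0, 1.0].
    score = activity.get("confidence_score")
    if score is not None:
        value = score.get("confidence_value")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append("confidence_score.confidence_value must be in [0.0, 1.0]")

    return errors


print(validate_reconstruction_activity(
    {"id": "act-001", "used": ["obs-001", "obs-002"],
     "confidence_score": {"confidence_value": 0.95}}))  # []
print(validate_reconstruction_activity({"id": "act-x", "used": []}))
```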
2. Change CustodianName from Inheritance to Derivation ✅
File: modules/classes/CustodianName.yaml
Current (WRONG):
CustodianName:
  is_a: CustodianObservation  # ❌ Inheritance implies "is a type of"
Corrected (RIGHT):
CustodianName:
  # Remove is_a relationship
  class_uri: skos:Concept  # schema:name is a property, not a class, so it does not fit as class_uri
  description: >-
    Standardized emic name derived from CustodianObservation(s).
    NOT a subclass of CustodianObservation; rather, a CustodianName is
    DERIVED FROM observation(s) through interpretation and standardization.
    Can be generated by a ReconstructionActivity (successful interpretation)
    or stand alone (direct extraction without full entity resolution).
  slots:
    - emic_name
    - name_language
    - standardized_name
    - endorsement_source
    - was_derived_from     # NEW - links to CustodianObservation(s)
    - was_generated_by     # NEW - links to ReconstructionActivity (optional)
    - refers_to_custodian  # NEW - links to Custodian hub
    - valid_from
    - valid_to
    - supersedes
    - superseded_by
Add slot_usage:
slot_usage:
  was_derived_from:
    slot_uri: prov:wasDerivedFrom
    description: >-
      CustodianObservation(s) from which this name was derived.
      PROV-O: wasDerivedFrom establishes observation→name derivation.
      A name can be derived from multiple observations (consolidation).
    range: CustodianObservation
    multivalued: true
    required: true
  was_generated_by:
    slot_uri: prov:wasGeneratedBy
    description: >-
      ReconstructionActivity that generated this standardized name (optional).
      If null, the name was extracted directly without a formal reconstruction activity.
    range: ReconstructionActivity
    required: false
  refers_to_custodian:
    slot_uri: dcterms:references
    description: >-
      The Custodian hub that this name identifies.
      Links the standardized name back to the hub.
    range: Custodian
    required: true
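Separating derivation from generation means that the two links can disagree. The sketch below assumes a consistency rule that this document does not state explicitly: when was_generated_by is set, every observation in was_derived_from should also appear in that activity's used list. check_custodian_name and the dict layout are hypothetical:

```python
from __future__ import annotations

from typing import Any


def check_custodian_name(name: dict[str, Any],
                         activities: dict[str, dict[str, Any]]) -> list[str]:
    """Return violations of the CustodianName slot_usage (plus one assumed rule)."""
    errors: list[str] = []
    derived = name.get("was_derived_from") or []
    if not derived:
        errors.append("was_derived_from: at least one CustodianObservation is required")
    if not name.get("refers_to_custodian"):
        errors.append("refers_to_custodian is required")

    activity_id = name.get("was_generated_by")
    if activity_id is None:
        return errors  # optional: the name was extracted without a formal activity
    activity = activities.get(activity_id)
    if activity is None:
        errors.append(f"was_generated_by: unknown activity {activity_id}")
        return errors
    # Assumed rule: a name derives only from observations its generating activity used.
    missing = sorted(set(derived) - set(activity.get("used", [])))
    if missing:
        errors.append(f"derived from observations not used by {activity_id}: {missing}")
    return errors


activities = {"act-001": {"used": ["obs-001", "obs-002"]}}
name_001 = {"id": "name-001", "emic_name": "Rijksmuseum",
            "was_derived_from": ["obs-001", "obs-002"],
            "was_generated_by": "act-001",
            "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804"}
print(check_custodian_name(name_001, activities))  # []
```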
3. Add Preferred Label Link from Custodian to CustodianName ✅
File: modules/classes/Custodian.yaml
Add slot:
slots:
- hc_id
- preferred_label # NEW - links to primary CustodianName
- appellations
- identifiers
- created
- modified
Add slot_usage:
slot_usage:
  preferred_label:
    slot_uri: skos:prefLabel
    description: |-
      The primary standardized emic name for this custodian.
      SKOS: prefLabel for the preferred lexical label.
      This is the CANONICAL name: the standardized label accepted by the
      custodian itself for public representation.
      Distinct from:
      - Legal name (formal registered name in CustodianReconstruction)
      - Alternative names (in appellations)
      - Historical names (superseded CustodianNames)
    range: CustodianName
    required: false  # May be null if name not yet established
    examples:
      - value: "Rijksmuseum"
        description: "Primary emic name (not 'Stichting Rijksmuseum' legal name)"
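This is a hedged sketch of how a consumer might resolve preferred_label. It assumes that CustodianNames are stored in a dict keyed by id. It also enforces an assumed integrity rule: the referenced CustodianName must point back to the same hub via refers_to_custodian. resolve_preferred_label is a hypothetical helper:

```python
from __future__ import annotations

from typing import Any, Optional


def resolve_preferred_label(custodian: dict[str, Any],
                            names: dict[str, dict[str, Any]]) -> Optional[str]:
    """Return the emic name behind Custodian.preferred_label, or None if unset."""
    label_id = custodian.get("preferred_label")
    if label_id is None:
        return None  # allowed: required is false until a name is established
    name = names[label_id]  # raises KeyError if the CustodianName is missing
    # Assumed integrity rule: the name must refer back to this same hub.
    if name["refers_to_custodian"] != custodian["hc_id"]:
        raise ValueError(
            f"{label_id} refers to {name['refers_to_custodian']}, not {custodian['hc_id']}")
    return name["emic_name"]


names = {"name-001": {"emic_name": "Rijksmuseum",
                      "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804"}}
hub = {"hc_id": "hc:nl-nh-ams-m-rm-q190804", "preferred_label": "name-001"}
print(resolve_preferred_label(hub, names))  # Rijksmuseum
```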
4. Update CustodianObservation Documentation ✅
File: modules/classes/CustodianObservation.yaml
Update description:
CustodianObservation:
  class_uri: heritage:CustodianObservation
  description: |-
    Source-based evidence of a heritage custodian's existence.
    CustodianObservations are INPUT ENTITIES for ReconstructionActivity:
    - Multiple observations can be reconciled into a CustodianReconstruction
    - Multiple observations can be standardized into a CustodianName
    - Observations remain independent even after reconstruction
    PROV-O Pattern:
    ReconstructionActivity → prov:used → CustodianObservation
    CustodianReconstruction → prov:wasGeneratedBy → ReconstructionActivity
    CustodianName → prov:wasDerivedFrom → CustodianObservation
Revised Class Relationships
Complete PROV-O Flow
graph TB
subgraph Sources
Obs1[CustodianObservation 1<br/>ISIL Registry]
Obs2[CustodianObservation 2<br/>Museum Website]
Obs3[CustodianObservation 3<br/>Archival Document]
end
subgraph Activity
Act[ReconstructionActivity<br/>Entity Resolution]
Conf[ConfidenceMeasure<br/>Score: 0.92]
end
subgraph Outputs
Rec[CustodianReconstruction<br/>Legal Entity]
Name[CustodianName<br/>Standardized Emic Name]
end
subgraph Hub
Cust[Custodian Hub<br/>hc_id]
end
Act -->|prov:used| Obs1
Act -->|prov:used| Obs2
Act -->|prov:used| Obs3
Act -->|has confidence_score| Conf
Rec -->|prov:wasGeneratedBy| Act
Name -->|prov:wasGeneratedBy| Act
Rec -->|prov:wasDerivedFrom| Obs1
Rec -->|prov:wasDerivedFrom| Obs2
Name -->|prov:wasDerivedFrom| Obs1
Name -->|prov:wasDerivedFrom| Obs3
Rec -->|refers_to_custodian| Cust
Name -->|refers_to_custodian| Cust
Obs1 -->|refers_to_custodian| Cust
Obs2 -->|refers_to_custodian| Cust
Obs3 -->|refers_to_custodian| Cust
Cust -->|skos:prefLabel| Name
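The same flow can be listed as subject-predicate-object triples. In this listing, each PROV-O property points in its defined direction: prov:used runs from Activity to Entity, while prov:wasGeneratedBy and prov:wasDerivedFrom run from the output Entity. A minimal Python sketch follows; it reuses the diagram's node names as illustrative identifiers:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Node names mirror the diagram; they are illustrative ids, not real records.
TRIPLES: List[Triple] = [
    ("Act", "prov:used", "Obs1"),
    ("Act", "prov:used", "Obs2"),
    ("Act", "prov:used", "Obs3"),
    ("Act", "confidence_score", "Conf"),
    ("Rec", "prov:wasGeneratedBy", "Act"),
    ("Name", "prov:wasGeneratedBy", "Act"),
    ("Rec", "prov:wasDerivedFrom", "Obs1"),
    ("Rec", "prov:wasDerivedFrom", "Obs2"),
    ("Name", "prov:wasDerivedFrom", "Obs1"),
    ("Name", "prov:wasDerivedFrom", "Obs3"),
    ("Rec", "refers_to_custodian", "Cust"),
    ("Name", "refers_to_custodian", "Cust"),
    ("Cust", "skos:prefLabel", "Name"),
]


def objects(subject: str, predicate: str) -> List[str]:
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> List[str]:
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


inputs = objects("Act", "prov:used")              # ['Obs1', 'Obs2', 'Obs3']
outputs = subjects("prov:wasGeneratedBy", "Act")  # ['Rec', 'Name']

# Every observation an output was derived from is among the activity's inputs.
for entity in outputs:
    assert set(objects(entity, "prov:wasDerivedFrom")) <= set(inputs)
```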
Success vs. Failure Scenarios
Scenario 1: Successful Full Reconstruction ✅
# INPUT: Multiple observations
CustodianObservation:
  - id: obs-001
    observed_name: "Rijks"
    source: letterhead
  - id: obs-002
    observed_name: "Rijksmuseum Amsterdam"
    source: ISIL registry
# PROCESS: Reconstruction activity
ReconstructionActivity:
  id: act-001
  used:
    - obs-001
    - obs-002
  confidence_score:
    confidence_value: 0.95
    confidence_method: "Manual expert curation"
# OUTPUT: Both reconstruction AND standardized name
CustodianReconstruction:
  id: rec-001
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
CustodianName:
  id: name-001
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
# HUB: Links to preferred name
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: name-001
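Confidence now lives on the activity, so reading the confidence of an output means following was_generated_by. The sketch below shows that lookup for Scenario 1. It assumes an in-memory store of dicts keyed by id; STORE and lineage are hypothetical names:

```python
from typing import Any, Dict

# Scenario 1 as plain dicts keyed by id (hypothetical in-memory store).
STORE: Dict[str, Dict[str, Any]] = {
    "obs-001": {"observed_name": "Rijks", "source": "letterhead"},
    "obs-002": {"observed_name": "Rijksmuseum Amsterdam", "source": "ISIL registry"},
    "act-001": {"used": ["obs-001", "obs-002"],
                "confidence_score": {"confidence_value": 0.95,
                                     "confidence_method": "Manual expert curation"}},
    "rec-001": {"legal_name": "Stichting Rijksmuseum",
                "was_derived_from": ["obs-001", "obs-002"], "was_generated_by": "act-001"},
    "name-001": {"emic_name": "Rijksmuseum",
                 "was_derived_from": ["obs-001", "obs-002"], "was_generated_by": "act-001"},
}


def lineage(entity_id: str) -> Dict[str, Any]:
    """Summarise the sources, generating activity and process confidence of an output."""
    entity = STORE[entity_id]
    activity_id = entity.get("was_generated_by")
    activity = STORE[activity_id] if activity_id else {}
    score = activity.get("confidence_score") or {}
    return {
        "sources": [STORE[obs]["source"] for obs in entity.get("was_derived_from", [])],
        "activity": activity_id,
        # Confidence is read from the activity; the output entity carries none itself.
        "confidence": score.get("confidence_value"),
    }


print(lineage("name-001"))
# {'sources': ['letterhead', 'ISIL registry'], 'activity': 'act-001', 'confidence': 0.95}
```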
Scenario 2: Partial Success - Name Only 🔸
# INPUT: Single observation
CustodianObservation:
  - id: obs-003
    observed_name: "Museum van de Twintigste Eeuw"
    source: archival document
# PROCESS: Attempted reconstruction
ReconstructionActivity:
  id: act-002
  used: [obs-003]
  confidence_score:
    confidence_value: 0.45
    confidence_method: "Algorithmic matching - insufficient data"
# OUTPUT: Name only (reconstruction failed due to low confidence)
CustodianName:
  id: name-002
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-003]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe
# NO CustodianReconstruction created (insufficient evidence)
# HUB: Still links to name
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: name-002
Scenario 3: Complete Failure ❌
# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-004
    observed_name: "Stedelijk Museum"
    source: ambiguous reference
# PROCESS: Failed reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-004]
  confidence_score:
    confidence_value: 0.15
    confidence_method: "Multiple candidate matches found"
# OUTPUT: Nothing generated (activity failed)
# NO CustodianReconstruction
# NO CustodianName
# Observation remains unresolved
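The three scenarios can be distinguished purely from what the activity generated. This sketch assumes that each output record carries a type and a was_generated_by id. classify_outcome is hypothetical, and it applies no confidence threshold because this document does not define one:

```python
from typing import Dict, List


def classify_outcome(activity_id: str, outputs: List[Dict[str, str]]) -> str:
    """Classify an activity as 'full', 'name_only' or 'failed' from what it generated."""
    generated = {o["type"] for o in outputs if o.get("was_generated_by") == activity_id}
    if "CustodianReconstruction" in generated:
        return "full"       # Scenario 1: reconstruction (plus name) generated
    if "CustodianName" in generated:
        return "name_only"  # Scenario 2: only a standardized name
    return "failed"         # Scenario 3: nothing generated; observation stays unresolved


outputs = [
    {"id": "rec-001", "type": "CustodianReconstruction", "was_generated_by": "act-001"},
    {"id": "name-001", "type": "CustodianName", "was_generated_by": "act-001"},
    {"id": "name-002", "type": "CustodianName", "was_generated_by": "act-002"},
]
for act in ("act-001", "act-002", "act-003"):
    print(act, classify_outcome(act, outputs))
```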
Implementation Checklist
Phase 1: Update Core Classes
- ReconstructionActivity.yaml
  - Add used slot (CustodianObservation, multivalued)
  - Add confidence_score slot (ConfidenceMeasure)
  - Update documentation with PROV-O patterns
- CustodianName.yaml
  - Remove is_a: CustodianObservation
  - Add was_derived_from slot (CustodianObservation, multivalued)
  - Add was_generated_by slot (ReconstructionActivity, optional)
  - Add refers_to_custodian slot (Custodian)
  - Update class_uri to skos:Concept
  - Update documentation explaining derivation vs. inheritance
- Custodian.yaml
  - Add preferred_label slot (CustodianName)
  - Update documentation explaining preferred label usage
- CustodianReconstruction.yaml
  - Remove confidence_score slot (moved to ReconstructionActivity)
  - Update documentation clarifying that it is generated by the Activity
- CustodianObservation.yaml
  - Update documentation explaining its role as Activity input
  - Add examples showing PROV-O flow
Phase 2: Create/Update Slots
- modules/slots/used.yaml (NEW)
  - Create slot for the prov:used property
- modules/slots/preferred_label.yaml (NEW)
  - Create slot for the skos:prefLabel property
- modules/slots/confidence_score.yaml (UPDATE)
  - Move from CustodianReconstruction to ReconstructionActivity
Phase 3: Update Main Schema
- 01_custodian_name_modular.yaml
  - Add imports: modules/slots/used, modules/slots/preferred_label
  - Update comments explaining PROV-O pattern
Phase 4: Documentation
- Update HUB_ARCHITECTURE_DIAGRAM.md with correct flow
- Create examples showing all three scenarios
- Update PROV-O alignment documentation
Phase 5: Validation
- Run gen-owl to validate schema
- Create test instances for all three scenarios
- Validate RDF output
- Update UML diagrams
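The Phase 1 items also lend themselves to a quick structural check alongside gen-owl. This sketch assumes that each modules/classes/*.yaml file has already been parsed into a plain dict; YAML loading is not shown. check_phase1 is a hypothetical helper, not a LinkML command:

```python
from typing import Any, Dict, List

# Hypothetical: each class definition from modules/classes/*.yaml parsed into a dict.
Classes = Dict[str, Dict[str, Any]]


def check_phase1(classes: Classes) -> List[str]:
    """List the Phase 1 checklist items that the parsed schema does not yet satisfy."""
    problems: List[str] = []

    def require(cls: str, slot: str) -> None:
        if slot not in classes[cls].get("slots", []):
            problems.append(f"{cls} is missing slot '{slot}'")

    require("ReconstructionActivity", "used")
    require("ReconstructionActivity", "confidence_score")
    if "confidence_score" in classes["CustodianReconstruction"].get("slots", []):
        problems.append("CustodianReconstruction still declares 'confidence_score'")
    if classes["CustodianName"].get("is_a") == "CustodianObservation":
        problems.append("CustodianName still inherits from CustodianObservation")
    for slot in ("was_derived_from", "was_generated_by", "refers_to_custodian"):
        require("CustodianName", slot)
    require("Custodian", "preferred_label")
    return problems


after = {
    "ReconstructionActivity": {"slots": ["id", "used", "confidence_score"]},
    "CustodianReconstruction": {"slots": ["id"]},
    "CustodianName": {"class_uri": "skos:Concept",
                      "slots": ["emic_name", "was_derived_from",
                                "was_generated_by", "refers_to_custodian"]},
    "Custodian": {"slots": ["hc_id", "preferred_label"]},
}
print(check_phase1(after))  # []
```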
Ontology Properties Used
PROV-O Properties
prov:used (Activity → Entity):
"A prov:Entity that was used by this prov:Activity."
- Domain: prov:Activity
- Range: prov:Entity
- Use: ReconstructionActivity uses CustodianObservation(s) as input
prov:wasGeneratedBy (Entity → Activity):
"Generation is the completion of production of a new entity by an activity."
- Domain: prov:Entity
- Range: prov:Activity
- Use: CustodianReconstruction/CustodianName generated by ReconstructionActivity
prov:wasDerivedFrom (Entity → Entity):
"A derivation is a transformation of an entity into another."
- Domain: prov:Entity
- Range: prov:Entity
- Use: CustodianName/CustodianReconstruction derived from CustodianObservation(s)
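prov:confidence (Activity → ConfidenceMeasure):
"Confidence in the activity's process or methodology."
- Project-specific extension: this term is not defined in W3C PROV-O, even though it is written in the prov: namespace
- Domain: prov:Activity
- Range: ConfidenceMeasure (whose confidence_value is a float from 0.0 to 1.0)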
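SKOS Properties
skos:prefLabel (Resource → Literal):
"The preferred and emphasized lexical label for a resource."
- At most one prefLabel per language tag
- Domain: not restricted by SKOS (any resource may carry a prefLabel)
- Range: plain literal in SKOS (skos:prefLabel is a sub-property of rdfs:label); using a CustodianName object as the value is a project-specific deviation
- Use: Custodian.preferred_label → CustodianName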
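The PROV-O domains and ranges listed above are enough to catch reversed edges such as "Observation prov:used Activity". This is a hedged sketch. It assumes a simple mapping of this schema's classes onto prov:Activity and prov:Entity, and it uses no RDF library; misdirected is a hypothetical name. skos:prefLabel is left out because its value here is a CustodianName object rather than the plain literal that SKOS specifies:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# PROV-O domain and range of each property, as listed above.
SIGNATURES: Dict[str, Tuple[str, str]] = {
    "prov:used": ("Activity", "Entity"),
    "prov:wasGeneratedBy": ("Entity", "Activity"),
    "prov:wasDerivedFrom": ("Entity", "Entity"),
}

# Assumed mapping of this schema's classes onto the PROV-O core classes.
PROV_CLASS: Dict[str, str] = {
    "ReconstructionActivity": "Activity",
    "CustodianObservation": "Entity",
    "CustodianReconstruction": "Entity",
    "CustodianName": "Entity",
}


def misdirected(triples: List[Triple], types: Dict[str, str]) -> List[Triple]:
    """Return PROV-O triples whose subject or object violates the domain/range."""
    bad: List[Triple] = []
    for s, p, o in triples:
        if p not in SIGNATURES:
            continue  # only the three PROV-O properties are checked
        domain, rng = SIGNATURES[p]
        if PROV_CLASS[types[s]] != domain or PROV_CLASS[types[o]] != rng:
            bad.append((s, p, o))
    return bad


types = {"obs-001": "CustodianObservation", "act-001": "ReconstructionActivity",
         "name-001": "CustodianName"}
triples = [
    ("act-001", "prov:used", "obs-001"),
    ("name-001", "prov:wasGeneratedBy", "act-001"),
    ("name-001", "prov:wasDerivedFrom", "obs-001"),
    ("obs-001", "prov:used", "act-001"),  # reversed edge
]
print(misdirected(triples, types))  # [('obs-001', 'prov:used', 'act-001')]
```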
Rationale
Why CustodianName is NOT a subclass of CustodianObservation
Conceptual Distinction:
- CustodianObservation: Evidence seen in a source (emic or etic)
- CustodianName: Standardized interpretation of observations
Temporal Distinction:
- Observation: Records historical state ("what was written in 1920")
- Name: Current standardized form ("what we call it now")
Ontological Distinction:
- Observation: pico:PersonObservation, crm:E73_Information_Object
- Name: skos:Concept, schema:name, rdfs:label
Example:
Observation 1: "Rijks" (seen on letterhead, 2015)
Observation 2: "Rijksmuseum Amsterdam" (seen in ISIL registry, 2020)
Observation 3: "The Rijksmuseum" (seen in guidebook, 2018)
↓ DERIVATION (not inheritance)
CustodianName: "Rijksmuseum" (standardized emic name, 2025)
The name is derived from observations through interpretation, not a type of observation.
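The distinction can be made concrete in Python: inheritance shows up in isinstance, while derivation is just a reference. The following minimal sketch reuses the three observations above; the dataclasses are illustrative stand-ins for the schema classes:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustodianObservation:
    observed_name: str
    source: str
    year: int


# Derivation: CustodianName is its own class and *points to* its sources.
@dataclass
class CustodianName:
    emic_name: str
    was_derived_from: List[CustodianObservation]


observations = [
    CustodianObservation("Rijks", "letterhead", 2015),
    CustodianObservation("Rijksmuseum Amsterdam", "ISIL registry", 2020),
    CustodianObservation("The Rijksmuseum", "guidebook", 2018),
]
name = CustodianName("Rijksmuseum", was_derived_from=observations)

# A name is not a kind of observation, so it has no source or year of its own...
print(isinstance(name, CustodianObservation))     # False
# ...but its lineage back to the evidence is preserved.
print([o.source for o in name.was_derived_from])  # ['letterhead', 'ISIL registry', 'guidebook']
```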
References
PROV-O Specification:
- PROV-O: The PROV Ontology
- PROV-O Usage Examples
- Local file: /data/ontology/prov-o.rdf
SKOS Specification:
- SKOS Simple Knowledge Organization System
- prefLabel: https://www.w3.org/TR/skos-reference/#labels
- Local file: /data/ontology/skos.rdf
PiCo Pattern:
- PiCo: Persons in Context
- Inspiration for observation-reconstruction pattern
Status: 🚨 AWAITING IMPLEMENTATION
Priority: HIGH - Fundamental architectural fix
Impact: Changes class relationships, moves properties, removes inheritance