# Critical Architectural Fix - Observation-Reconstruction Relationships

**Date**: 2025-11-22

**Status**: 🚨 CRITICAL FIX NEEDED

**Priority**: HIGH

## Problem Identified

The current schema has **incorrect relationships** between key classes:

### Current Issues

1. **ConfidenceMeasure attached to wrong class**
   - ❌ Current: `CustodianReconstruction.confidence_score` → `ConfidenceMeasure`
   - ✅ Correct: `ReconstructionActivity.confidence_score` → `ConfidenceMeasure`
   - **Rationale**: Confidence measures the quality of the PROCESS, not the RESULT

2. **CustodianObservation not linked to ReconstructionActivity**
   - ❌ Current: No direct link between `CustodianObservation` and `ReconstructionActivity`
   - ✅ Correct: `ReconstructionActivity.used` → `CustodianObservation` (multivalued)
   - **Rationale**: In PROV-O, an Activity uses Entities as inputs

3. **CustodianName incorrectly subclasses CustodianObservation**
   - ❌ Current: `CustodianName is_a CustodianObservation`
   - ✅ Correct: `CustodianName` is **DERIVED FROM** `CustodianObservation` (not inheritance!)
   - **Rationale**: CustodianName is the OUTPUT of interpretation, not a type of observation

4. **CustodianName not directly linked to Custodian hub**
   - ❌ Current: No direct link from `Custodian` → `CustodianName`
   - ✅ Correct: `Custodian.preferred_label` → `CustodianName`
   - **Rationale**: The hub needs a direct reference to its standardized emic name

---

## Correct Architecture Pattern

### PROV-O Activity Pattern

Arrows point from the subject to the object of each property:

```
CustodianObservation (Entity - Input)
        ↑ prov:used
ReconstructionActivity (Activity - Process)
  ├── confidence_score → ConfidenceMeasure (quality of process)
  └── prov:wasAssociatedWith → Agent
        ↑ prov:wasGeneratedBy
CustodianReconstruction and/or CustodianName (Entity - Output)
```

### Key Principles

1. **Activity Uses Entities** (`prov:used`)
   - ReconstructionActivity consumes one or more CustodianObservations
   - Multiple observations can feed into one activity

2. **Activity Generates Entities** (`prov:wasGeneratedBy`)
   - Each generated Entity points back to its Activity via `prov:wasGeneratedBy`
   - Full success: ReconstructionActivity generates both a CustodianReconstruction and a CustodianName
   - Partial success: ReconstructionActivity generates only a CustodianName
   - Failure: the activity generates no output at all

3. **Derivation Tracks Lineage** (`prov:wasDerivedFrom`)
   - CustodianName derives from CustodianObservation(s)
   - CustodianReconstruction derives from CustodianObservation(s)
   - Derivation is recorded separately from the Activity (it tracks the transformation itself)

4. **Confidence Measures Activity Quality**
   - Confidence is attached to ReconstructionActivity (process quality)
   - It is NOT attached to CustodianReconstruction (result)
   - It answers: "How confident are we in this reconstruction process?"

---

## Required Changes

### 1. Move ConfidenceMeasure to ReconstructionActivity ✅

**File**: `modules/classes/ReconstructionActivity.yaml`

**Add slots**:

```yaml
slots:
  - id
  - activity_type
  - method
  - responsible_agent
  - temporal_extent
  - used              # NEW - links to CustodianObservation(s)
  - confidence_score  # NEW - moved from CustodianReconstruction
  - justification
```

**Add slot_usage**:

```yaml
slot_usage:
  used:
    slot_uri: prov:used
    description: >-
      CustodianObservation(s) used as input for this reconstruction activity.
      PROV-O: used links an Activity to the Entities it consumed.
      Multiple observations can contribute to a single reconstruction.
    range: CustodianObservation
    multivalued: true
    required: true
  confidence_score:
    slot_uri: prov:confidence
    description: >-
      Confidence in the reconstruction activity's process and methodology.
      Measures the quality of the reconstruction PROCESS, not the result.
      Expressed as a ConfidenceMeasure whose confidence_value ranges from
      0.0 (low confidence) to 1.0 (high confidence).
    range: ConfidenceMeasure
    required: false
```

**Remove from**: `modules/classes/CustodianReconstruction.yaml`

- Delete the `confidence_score` slot

---

### 2. Change CustodianName from Inheritance to Derivation ✅

**File**: `modules/classes/CustodianName.yaml`

**Current** (WRONG):

```yaml
CustodianName:
  is_a: CustodianObservation  # ❌ Inheritance implies "is a type of"
```

**Corrected** (RIGHT):

```yaml
CustodianName:
  # is_a relationship removed
  class_uri: skos:Concept  # Or schema:name
  description: >-
    Standardized emic name derived from CustodianObservation(s).
    NOT a subclass of CustodianObservation; rather, a CustodianName is
    DERIVED FROM observation(s) through interpretation and standardization.
    Can be generated by a ReconstructionActivity (successful interpretation)
    or remain standalone (direct extraction without full entity resolution).
  slots:
    - emic_name
    - name_language
    - standardized_name
    - endorsement_source
    - was_derived_from     # NEW - links to CustodianObservation(s)
    - was_generated_by     # NEW - links to ReconstructionActivity (optional)
    - refers_to_custodian  # NEW - links to Custodian hub
    - valid_from
    - valid_to
    - supersedes
    - superseded_by
```

**Add slot_usage**:

```yaml
slot_usage:
  was_derived_from:
    slot_uri: prov:wasDerivedFrom
    description: >-
      CustodianObservation(s) from which this name was derived.
      PROV-O: wasDerivedFrom establishes the observation→name derivation.
      A name can be derived from multiple observations (consolidation).
    range: CustodianObservation
    multivalued: true
    required: true
  was_generated_by:
    slot_uri: prov:wasGeneratedBy
    description: >-
      ReconstructionActivity that generated this standardized name (optional).
      If absent, the name was extracted directly, without a formal
      reconstruction activity.
    range: ReconstructionActivity
    required: false
  refers_to_custodian:
    slot_uri: dcterms:references
    description: >-
      The Custodian hub that this name identifies.
      Links the standardized name back to the hub.
    range: Custodian
    required: true
```

---

### 3. Add Preferred Label Link from Custodian to CustodianName ✅

**File**: `modules/classes/Custodian.yaml`

**Add slot**:

```yaml
slots:
  - hc_id
  - preferred_label  # NEW - links to primary CustodianName
  - appellations
  - identifiers
  - created
  - modified
```

**Add slot_usage**:

```yaml
slot_usage:
  preferred_label:
    slot_uri: skos:prefLabel
    description: >-
      The primary standardized emic name for this custodian.
      SKOS: prefLabel for the preferred lexical label.
      This is the CANONICAL name: the standardized label the custodian
      itself accepts for public representation.

      Distinct from:
        - Legal name (formal registered name in CustodianReconstruction)
        - Alternative names (in appellations)
        - Historical names (superseded CustodianNames)
    range: CustodianName
    required: false  # May be null if no name has been established yet
    examples:
      - value: "Rijksmuseum"
        description: "Primary emic name (not the legal name 'Stichting Rijksmuseum')"
```

---

### 4. Update CustodianObservation Documentation ✅

**File**: `modules/classes/CustodianObservation.yaml`

**Update description**:

```yaml
CustodianObservation:
  class_uri: heritage:CustodianObservation
  description: >-
    Source-based evidence of a heritage custodian's existence.

    CustodianObservations are INPUT ENTITIES for ReconstructionActivity:
      - Multiple observations can be reconciled into a CustodianReconstruction
      - Multiple observations can be standardized into a CustodianName
      - Observations remain independent even after reconstruction

    PROV-O pattern (subject → property → object):
      ReconstructionActivity → prov:used → CustodianObservation
      CustodianReconstruction → prov:wasGeneratedBy → ReconstructionActivity
      CustodianName → prov:wasDerivedFrom → CustodianObservation
```

---

## Revised Class Relationships

### Complete PROV-O Flow

Edges point from the subject to the object of each property.

```mermaid
graph TB
    subgraph Sources
        Obs1[CustodianObservation 1<br/>ISIL Registry]
        Obs2[CustodianObservation 2<br/>Museum Website]
        Obs3[CustodianObservation 3<br/>Archival Document]
    end

    subgraph Activity
        Act[ReconstructionActivity<br/>Entity Resolution]
        Conf[ConfidenceMeasure<br/>Score: 0.92]
    end

    subgraph Outputs
        Rec[CustodianReconstruction<br/>Legal Entity]
        Name[CustodianName<br/>Standardized Emic Name]
    end

    subgraph Hub
        Cust[Custodian Hub<br/>hc_id]
    end

    Act -->|prov:used| Obs1
    Act -->|prov:used| Obs2
    Act -->|prov:used| Obs3

    Act -->|confidence_score| Conf

    Rec -->|prov:wasGeneratedBy| Act
    Name -->|prov:wasGeneratedBy| Act

    Rec -->|prov:wasDerivedFrom| Obs1
    Rec -->|prov:wasDerivedFrom| Obs2
    Name -->|prov:wasDerivedFrom| Obs1
    Name -->|prov:wasDerivedFrom| Obs3

    Rec -->|refers_to_custodian| Cust
    Name -->|refers_to_custodian| Cust
    Obs1 -->|refers_to_custodian| Cust
    Obs2 -->|refers_to_custodian| Cust
    Obs3 -->|refers_to_custodian| Cust

    Cust -->|skos:prefLabel| Name
```

---

## Success vs. Failure Scenarios

### Scenario 1: Successful Full Reconstruction ✅

```yaml
# INPUT: Multiple observations
CustodianObservation:
  - id: obs-001
    observed_name: "Rijks"
    source: letterhead
  - id: obs-002
    observed_name: "Rijksmuseum Amsterdam"
    source: ISIL registry

# PROCESS: Reconstruction activity
ReconstructionActivity:
  id: act-001
  used:
    - obs-001
    - obs-002
  confidence_score:
    confidence_value: 0.95
    confidence_method: "Manual expert curation"

# OUTPUT: Both reconstruction AND standardized name
CustodianReconstruction:
  id: rec-001
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

CustodianName:
  id: name-001
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# HUB: Links to preferred name
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: name-001
```

### Scenario 2: Partial Success - Name Only 🔸

```yaml
# INPUT: Single observation
CustodianObservation:
  - id: obs-003
    observed_name: "Museum van de Twintigste Eeuw"
    source: archival document

# PROCESS: Attempted reconstruction
ReconstructionActivity:
  id: act-002
  used: [obs-003]
  confidence_score:
    confidence_value: 0.45
    confidence_method: "Algorithmic matching - insufficient data"

# OUTPUT: Name only (reconstruction failed due to low confidence)
CustodianName:
  id: name-002
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-003]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe

# NO CustodianReconstruction created (insufficient evidence)

# HUB: Still links to name
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: name-002
```

### Scenario 3: Complete Failure ❌

```yaml
# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-004
    observed_name: "Stedelijk Museum"
    source: ambiguous reference

# PROCESS: Failed reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-004]
  confidence_score:
    confidence_value: 0.15
    confidence_method: "Multiple candidate matches found"

# OUTPUT: Nothing generated (activity failed)
# NO CustodianReconstruction
# NO CustodianName
# Observation remains unresolved
```

---

## Implementation Checklist

### Phase 1: Update Core Classes

- [ ] **ReconstructionActivity.yaml**
  - [ ] Add `used` slot (CustodianObservation, multivalued)
  - [ ] Add `confidence_score` slot (ConfidenceMeasure)
  - [ ] Update documentation with PROV-O patterns
- [ ] **CustodianName.yaml**
  - [ ] Remove `is_a: CustodianObservation`
  - [ ] Add `was_derived_from` slot (CustodianObservation, multivalued)
  - [ ] Add `was_generated_by` slot (ReconstructionActivity, optional)
  - [ ] Add `refers_to_custodian` slot (Custodian)
  - [ ] Update class_uri to `skos:Concept`
  - [ ] Update documentation explaining derivation vs. inheritance
- [ ] **Custodian.yaml**
  - [ ] Add `preferred_label` slot (CustodianName)
  - [ ] Update documentation explaining preferred label usage
- [ ] **CustodianReconstruction.yaml**
  - [ ] Remove `confidence_score` slot (moved to ReconstructionActivity)
  - [ ] Update documentation clarifying that it is generated by an Activity
- [ ] **CustodianObservation.yaml**
  - [ ] Update documentation explaining its role as Activity input
  - [ ] Add examples showing the PROV-O flow

### Phase 2: Create/Update Slots

- [ ] **modules/slots/used.yaml** (NEW): Create slot for the `prov:used` property
- [ ] **modules/slots/preferred_label.yaml** (NEW): Create slot for the `skos:prefLabel` property
- [ ] **modules/slots/confidence_score.yaml** (UPDATE): Move from CustodianReconstruction to ReconstructionActivity

### Phase 3: Update Main Schema

- [ ] **01_custodian_name_modular.yaml**
  - Add imports: `modules/slots/used`, `modules/slots/preferred_label`
  - Update comments explaining the PROV-O pattern

### Phase 4: Documentation

- [ ] Update `HUB_ARCHITECTURE_DIAGRAM.md` with the correct flow
- [ ] Create examples showing all three scenarios
- [ ] Update PROV-O alignment documentation

### Phase 5: Validation

- [ ] Run `gen-owl` to validate the schema
- [ ] Create test instances for all three scenarios
- [ ] Validate RDF output
- [ ] Update UML diagrams

---

## Ontology Properties Used

### PROV-O Properties

**prov:used** (Activity → Entity):

> "A prov:Entity that was used by this prov:Activity."

- Domain: prov:Activity
- Range: prov:Entity
- Use: ReconstructionActivity uses CustodianObservation(s) as input

**prov:wasGeneratedBy** (Entity → Activity):

> "Generation is the completion of production of a new entity by an activity."

- Domain: prov:Entity
- Range: prov:Activity
- Use: CustodianReconstruction/CustodianName generated by ReconstructionActivity

**prov:wasDerivedFrom** (Entity → Entity):

> "A derivation is a transformation of an entity into another."
- Domain: prov:Entity
- Range: prov:Entity
- Use: CustodianName/CustodianReconstruction derived from CustodianObservation(s)

**prov:confidence** (Activity → ConfidenceMeasure):

Confidence in the activity's process or methodology.

- Project-specific extension: W3C PROV-O itself does not define a `confidence` property
- Domain: prov:Activity
- Range: ConfidenceMeasure (its `confidence_value` runs from 0.0 to 1.0)

### SKOS Properties

**skos:prefLabel** (Resource → Literal):

> "The preferred lexical label for a resource, in a given language."

- Domain: none declared (applies to any resource); sub-property of `rdfs:label`
- Range: RDF plain literals
- Use: `Custodian.preferred_label` → `CustodianName`
- Note: `CustodianName` is a structured resource rather than a literal, so this usage corresponds more closely to SKOS-XL's `skosxl:prefLabel` (range `skosxl:Label`) than to plain `skos:prefLabel`

---

## Rationale

### Why CustodianName is NOT a subclass of CustodianObservation

**Conceptual Distinction**:

- **CustodianObservation**: Evidence seen in a source (emic or etic)
- **CustodianName**: Standardized interpretation of observations

**Temporal Distinction**:

- **Observation**: Records a historical state ("what was written in 1920")
- **Name**: Current standardized form ("what we call it now")

**Ontological Distinction**:

- **Observation**: `pico:PersonObservation`, `crm:E73_Information_Object`
- **Name**: `skos:Concept`, `schema:name`, `rdfs:label`

**Example**:

```
Observation 1: "Rijks" (seen on letterhead, 2015)
Observation 2: "Rijksmuseum Amsterdam" (seen in ISIL registry, 2020)
Observation 3: "The Rijksmuseum" (seen in guidebook, 2018)
        ↓ DERIVATION (not inheritance)
CustodianName: "Rijksmuseum" (standardized emic name, 2025)
```

The name is **derived from** observations through interpretation; it is not a **type of** observation.
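To make the derivation-versus-inheritance distinction concrete, here is a minimal sketch in Python. It assumes plain stdlib dataclasses as stand-ins for the schema classes (this is not LinkML-generated code and omits most slots), flattens `ConfidenceMeasure` into a single `confidence_value` float, and uses a hypothetical `prov_triples` helper that lists the implied statements as (subject, predicate, object) tuples in property direction. The instance data mirrors Scenario 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CustodianObservation:
    id: str
    observed_name: str
    source: str


@dataclass
class ReconstructionActivity:
    id: str
    used: List[CustodianObservation]  # prov:used (Activity -> Entity), required
    confidence_value: float           # simplification: stands in for ConfidenceMeasure

    def __post_init__(self) -> None:
        # Mirror the slot_usage constraints: `used` is required, confidence is 0.0-1.0
        if not self.used:
            raise ValueError("used is required: at least one CustodianObservation")
        if not 0.0 <= self.confidence_value <= 1.0:
            raise ValueError("confidence_value must lie between 0.0 and 1.0")


@dataclass
class CustodianName:
    # Deliberately NOT a subclass of CustodianObservation: observations are
    # referenced (prov:wasDerivedFrom), not inherited from.
    id: str
    emic_name: str
    was_derived_from: List[CustodianObservation]
    refers_to_custodian: str
    was_generated_by: Optional[ReconstructionActivity] = None  # optional per slot_usage


Triple = Tuple[str, str, str]


def prov_triples(name: CustodianName) -> List[Triple]:
    """Hypothetical helper: the statements implied by one CustodianName."""
    triples = [(name.id, "prov:wasDerivedFrom", obs.id) for obs in name.was_derived_from]
    if name.was_generated_by is not None:
        activity = name.was_generated_by
        triples.append((name.id, "prov:wasGeneratedBy", activity.id))
        triples.extend((activity.id, "prov:used", obs.id) for obs in activity.used)
    triples.append((name.id, "dcterms:references", name.refers_to_custodian))
    return triples


# Scenario 1 instance data
obs1 = CustodianObservation("obs-001", "Rijks", "letterhead")
obs2 = CustodianObservation("obs-002", "Rijksmuseum Amsterdam", "ISIL registry")
act1 = ReconstructionActivity("act-001", used=[obs1, obs2], confidence_value=0.95)
name1 = CustodianName(
    "name-001",
    "Rijksmuseum",
    was_derived_from=[obs1, obs2],
    refers_to_custodian="hc:nl-nh-ams-m-rm-q190804",
    was_generated_by=act1,
)

for subject, predicate, obj in prov_triples(name1):
    print(subject, predicate, obj)
```

Because `CustodianName` only holds references, `isinstance(name1, CustodianObservation)` is `False`. A name created without `was_generated_by` (direct extraction) still yields its `prov:wasDerivedFrom` statements, which matches the optional `was_generated_by` slot.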
---

## References

**PROV-O Specification**:

- [PROV-O: The PROV Ontology](https://www.w3.org/TR/prov-o/)
- [PROV-O Usage Examples](https://www.w3.org/TR/prov-o/#examples)
- Local file: `/data/ontology/prov-o.rdf`

**SKOS Specification**:

- [SKOS Simple Knowledge Organization System](https://www.w3.org/TR/skos-reference/)
- prefLabel: https://www.w3.org/TR/skos-reference/#labels
- Local file: `/data/ontology/skos.rdf`

**PiCo Pattern**:

- [PiCo: Persons in Context](https://github.com/FICLIT/PiCo)
- Inspiration for the observation-reconstruction pattern

---

**Status**: 🚨 AWAITING IMPLEMENTATION

**Priority**: HIGH - Fundamental architectural fix

**Impact**: Changes class relationships, moves properties, removes inheritance