Critical Architectural Fix - Observation-Reconstruction Relationships

Date: 2025-11-22
Status: 🚨 CRITICAL FIX NEEDED
Priority: HIGH

Problem Identified

The current schema has incorrect relationships between key classes:

Current Issues

  1. ConfidenceMeasure attached to wrong class

    • Current: CustodianReconstruction.confidence_score → ConfidenceMeasure
    • Correct: ReconstructionActivity.confidence_score → ConfidenceMeasure
    • Rationale: Confidence measures the PROCESS quality, not the RESULT
  2. CustodianObservation not linked to ReconstructionActivity

    • Current: No direct link between CustodianObservation and ReconstructionActivity
    • Correct: ReconstructionActivity.used → CustodianObservation (multivalued)
    • Rationale: PROV-O Activity uses Entities as inputs
  3. CustodianName incorrectly subclasses CustodianObservation

    • Current: CustodianName is_a CustodianObservation
    • Correct: CustodianName is DERIVED FROM CustodianObservation (not inheritance!)
    • Rationale: CustodianName is the OUTPUT of interpretation, not a type of observation
  4. CustodianName not directly linked to Custodian hub

    • Current: No direct link from Custodian → CustodianName
    • Correct: Custodian.preferred_label → CustodianName
    • Rationale: The hub needs a direct reference to its standardized emic name

Correct Architecture Pattern

PROV-O Activity Pattern

CustodianObservation (Entity - Input)
    ↓ data flow (ReconstructionActivity prov:used CustodianObservation)
ReconstructionActivity (Activity - Process)
    ├── confidence_score → ConfidenceMeasure (quality of process)
    ├── prov:wasAssociatedWith → Agent
    ↓ data flow (output prov:wasGeneratedBy ReconstructionActivity)
CustodianReconstruction OR CustodianName (Entity - Output)

Key Principles

  1. Activity Uses Entities (prov:used)

    • ReconstructionActivity consumes CustodianObservation(s)
    • Multiple observations can feed into one activity
  2. Activity Generates Entities (prov:wasGeneratedBy)

    • ReconstructionActivity generates CustodianReconstruction (success case)
    • OR ReconstructionActivity generates CustodianName (partial success)
    • Activity can fail without generating any output
  3. Derivation Tracks Lineage (prov:wasDerivedFrom)

    • CustodianName derives from CustodianObservation(s)
    • CustodianReconstruction derives from CustodianObservation(s)
    • Derivation is recorded separately from generation: prov:wasDerivedFrom links Entity to Entity, while prov:wasGeneratedBy links Entity to Activity
  4. Confidence Measures Activity Quality

    • Confidence attached to ReconstructionActivity (process quality)
    • NOT attached to CustodianReconstruction (result)
    • Represents: "How confident are we in this reconstruction process?"
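
The four principles can be sketched as plain Python dataclasses. This is only an illustration of the intended shape, not the generated LinkML classes: the class and field names mirror the slots proposed in this document, and the sample values are taken from Scenario 1 below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustodianObservation:
    """Input entity: evidence of a custodian as seen in one source."""
    id: str
    observed_name: str
    source: str


@dataclass
class ConfidenceMeasure:
    """Quality of the reconstruction process, not of its result."""
    confidence_value: float
    confidence_method: str


@dataclass
class ReconstructionActivity:
    """Activity: consumes observations (prov:used) and carries the confidence."""
    id: str
    used: List[CustodianObservation]
    confidence_score: Optional[ConfidenceMeasure] = None


@dataclass
class CustodianName:
    """Output entity: derived from observations, not a kind of observation."""
    id: str
    emic_name: str
    was_derived_from: List[CustodianObservation]
    was_generated_by: Optional[ReconstructionActivity] = None


obs = [
    CustodianObservation("obs-001", "Rijks", "letterhead"),
    CustodianObservation("obs-002", "Rijksmuseum Amsterdam", "ISIL registry"),
]
activity = ReconstructionActivity(
    id="act-001",
    used=obs,
    confidence_score=ConfidenceMeasure(0.95, "Manual expert curation"),
)
name = CustodianName(
    "name-001", "Rijksmuseum", was_derived_from=obs, was_generated_by=activity
)

# Confidence is reached through the activity, never stored on the output.
print(name.was_generated_by.confidence_score.confidence_value)
# CustodianName is a separate class, not a subclass of CustodianObservation.
print(issubclass(CustodianName, CustodianObservation))
```

Keeping the confidence on the activity means two outputs of the same activity share one confidence value instead of each carrying its own copy.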

Required Changes

1. Move ConfidenceMeasure to ReconstructionActivity

File: modules/classes/ReconstructionActivity.yaml

Add slots:

slots:
  - id
  - activity_type
  - method
  - responsible_agent
  - temporal_extent
  - used  # NEW - links to CustodianObservation(s)
  - confidence_score  # NEW - moved from CustodianReconstruction
  - justification

Add slot_usage:

slot_usage:
  used:
    slot_uri: prov:used
    description: >-
      CustodianObservation(s) used as input for this reconstruction activity.
      PROV-O: used links Activity to consumed Entities.
      Multiple observations can contribute to a single reconstruction.      
    range: CustodianObservation
    multivalued: true
    required: true
  confidence_score:
    slot_uri: prov:confidence  # project extension; PROV-O itself defines no prov:confidence
    description: >-
      Confidence in the reconstruction activity's process and methodology.
      Measures quality of the reconstruction PROCESS, not the result.
      Range: 0.0 (low confidence) to 1.0 (high confidence).      
    range: ConfidenceMeasure
    required: false

Remove from: modules/classes/CustodianReconstruction.yaml

  • Delete confidence_score slot
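
The constraints in the slot_usage above (at least one used observation, an optional confidence between 0.0 and 1.0) could be checked on instance data roughly as follows. This is a sketch that assumes records are plain dicts as loaded from YAML; check_activity is a hypothetical helper, not part of LinkML, and the confidence_value key is taken from the scenarios later in this document.

```python
from typing import Any, Dict, List


def check_activity(activity: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in one ReconstructionActivity record."""
    problems = []

    # used is required and multivalued, so an empty or missing list is an error.
    used = activity.get("used") or []
    if not used:
        problems.append("used: at least one CustodianObservation is required")

    # confidence_score is optional; when present its value must lie in [0.0, 1.0].
    score = activity.get("confidence_score")
    if score is not None:
        value = score.get("confidence_value")
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append("confidence_score: confidence_value must be within 0.0-1.0")

    return problems


ok = {
    "id": "act-001",
    "used": ["obs-001", "obs-002"],
    "confidence_score": {
        "confidence_value": 0.95,
        "confidence_method": "Manual expert curation",
    },
}
print(check_activity(ok))
print(check_activity({"id": "act-x", "used": []}))
```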

2. Change CustodianName from Inheritance to Derivation

File: modules/classes/CustodianName.yaml

Current (WRONG):

CustodianName:
  is_a: CustodianObservation  # ❌ Inheritance implies "is a type of"

Corrected (RIGHT):

CustodianName:
  # Remove is_a relationship
  class_uri: skos:Concept  # schema:name is a property, so it cannot serve as a class_uri
  description: >-
    Standardized emic name derived from CustodianObservation(s).
    
    NOT a subclass of CustodianObservation - rather, a CustodianName is 
    DERIVED FROM observation(s) through interpretation and standardization.
    
    Can be generated by ReconstructionActivity (successful interpretation)
    or remain standalone (direct extraction without full entity resolution).    
  
  slots:
    - emic_name
    - name_language
    - standardized_name
    - endorsement_source
    - was_derived_from  # NEW - links to CustodianObservation(s)
    - was_generated_by  # NEW - links to ReconstructionActivity (optional)
    - refers_to_custodian  # NEW - links to Custodian hub
    - valid_from
    - valid_to
    - supersedes
    - superseded_by

Add slot_usage:

slot_usage:
  was_derived_from:
    slot_uri: prov:wasDerivedFrom
    description: >-
      CustodianObservation(s) from which this name was derived.
      PROV-O: wasDerivedFrom establishes observation→name derivation.
      A name can be derived from multiple observations (consolidation).      
    range: CustodianObservation
    multivalued: true
    required: true
  was_generated_by:
    slot_uri: prov:wasGeneratedBy
    description: >-
      ReconstructionActivity that generated this standardized name (optional).
      If null, name was directly extracted without formal reconstruction activity.      
    range: ReconstructionActivity
    required: false
  refers_to_custodian:
    slot_uri: dcterms:references
    description: >-
      The Custodian hub that this name identifies.
      Links the standardized name back to the hub.      
    range: Custodian
    required: true
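
One consistency rule that follows naturally from combining was_derived_from with was_generated_by is that an output should only derive from observations its generating activity actually used. The document does not state this rule explicitly, so treat the following as an assumed check; derivation_is_consistent is a hypothetical helper working on plain dicts.

```python
from typing import Any, Dict


def derivation_is_consistent(
    output: Dict[str, Any], activities: Dict[str, Dict[str, Any]]
) -> bool:
    """True if every source observation was used by the generating activity.

    Outputs without was_generated_by (direct extraction) pass trivially.
    """
    activity_id = output.get("was_generated_by")
    if activity_id is None:
        return True
    used = set(activities[activity_id]["used"])
    return set(output["was_derived_from"]) <= used


activities = {"act-001": {"id": "act-001", "used": ["obs-001", "obs-002"]}}
name = {
    "id": "name-001",
    "was_derived_from": ["obs-001", "obs-002"],
    "was_generated_by": "act-001",
}
print(derivation_is_consistent(name, activities))
```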

3. Add preferred_label to Custodian

File: modules/classes/Custodian.yaml

Add slot:

slots:
  - hc_id
  - preferred_label  # NEW - links to primary CustodianName
  - appellations
  - identifiers
  - created
  - modified

Add slot_usage:

slot_usage:
  preferred_label:
    slot_uri: skos:prefLabel
    description: >-
      The primary standardized emic name for this custodian.
      SKOS: prefLabel for the preferred lexical label.
      
      This is the CANONICAL name - the standardized label accepted by the 
      custodian itself for public representation.
      
      Distinct from:
      - Legal name (formal registered name in CustodianReconstruction)
      - Alternative names (in appellations)
      - Historical names (superseded CustodianNames)      
    range: CustodianName
    required: false  # May be null if name not yet established
    examples:
      - value: "Rijksmuseum"
        description: "Primary emic name (not 'Stichting Rijksmuseum' legal name)"
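
Because the hub points at a name (preferred_label) and the name points back at the hub (refers_to_custodian), the two links can disagree. A minimal sketch of a check for that, assuming dict records keyed by the slot names used in this document; preferred_label_problems is a hypothetical helper, and treating a superseded name as a problem is an assumption based on the "Historical names" distinction above.

```python
from typing import Any, Dict, List


def preferred_label_problems(
    custodian: Dict[str, Any], names: Dict[str, Dict[str, Any]]
) -> List[str]:
    """Check that the hub's preferred_label resolves to a current name of that hub."""
    label_id = custodian.get("preferred_label")
    if label_id is None:
        return []  # allowed: the name may not be established yet

    name = names.get(label_id)
    if name is None:
        return [f"preferred_label {label_id!r} does not resolve to a CustodianName"]

    problems = []
    if name.get("refers_to_custodian") != custodian["hc_id"]:
        problems.append(f"{label_id} refers to a different custodian")
    if name.get("superseded_by"):
        problems.append(f"{label_id} is superseded and should not be preferred")
    return problems


names = {
    "name-001": {
        "id": "name-001",
        "emic_name": "Rijksmuseum",
        "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804",
    }
}
hub = {"hc_id": "hc:nl-nh-ams-m-rm-q190804", "preferred_label": "name-001"}
print(preferred_label_problems(hub, names))
```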

4. Update CustodianObservation Documentation

File: modules/classes/CustodianObservation.yaml

Update description:

CustodianObservation:
  class_uri: heritage:CustodianObservation
  description: >-
    Source-based evidence of a heritage custodian's existence.
    
    CustodianObservations are INPUT ENTITIES for ReconstructionActivity:
    - Multiple observations can be reconciled into a CustodianReconstruction
    - Multiple observations can be standardized into a CustodianName
    - Observations remain independent even after reconstruction
    
    PROV-O Pattern:
      ReconstructionActivity → prov:used → CustodianObservation
      CustodianReconstruction → prov:wasGeneratedBy → ReconstructionActivity
      CustodianName → prov:wasDerivedFrom → CustodianObservation

Revised Class Relationships

Complete PROV-O Flow

graph TB
    subgraph Sources
        Obs1[CustodianObservation 1<br/>ISIL Registry]
        Obs2[CustodianObservation 2<br/>Museum Website]
        Obs3[CustodianObservation 3<br/>Archival Document]
    end
    
    subgraph Activity
        Act[ReconstructionActivity<br/>Entity Resolution]
        Conf[ConfidenceMeasure<br/>Score: 0.92]
    end
    
    subgraph Outputs
        Rec[CustodianReconstruction<br/>Legal Entity]
        Name[CustodianName<br/>Standardized Emic Name]
    end
    
    subgraph Hub
        Cust[Custodian Hub<br/>hc_id]
    end
    
    Act -->|prov:used| Obs1
    Act -->|prov:used| Obs2
    Act -->|prov:used| Obs3
    
    Act -->|has confidence_score| Conf
    
    Rec -->|prov:wasGeneratedBy| Act
    Name -->|prov:wasGeneratedBy| Act
    
    Rec -->|prov:wasDerivedFrom| Obs1
    Rec -->|prov:wasDerivedFrom| Obs2
    
    Name -->|prov:wasDerivedFrom| Obs1
    Name -->|prov:wasDerivedFrom| Obs3
    
    Rec -->|refers_to_custodian| Cust
    Name -->|refers_to_custodian| Cust
    Obs1 -->|refers_to_custodian| Cust
    Obs2 -->|refers_to_custodian| Cust
    Obs3 -->|refers_to_custodian| Cust
    
    Cust -->|skos:prefLabel| Name
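
When the diagram is read as RDF triples, direction matters: prov:used has an Activity as subject, while prov:wasGeneratedBy and prov:wasDerivedFrom have an Entity as subject. The sketch below checks a handful of subject-predicate-object tuples against those signatures. The KIND table and the misdirected helper are illustrative assumptions; a real check would read the types from the schema.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Which PROV-O kind each sample node belongs to (illustrative data).
KIND = {
    "obs-001": "Entity",
    "obs-002": "Entity",
    "act-001": "Activity",
    "rec-001": "Entity",
    "name-001": "Entity",
}

# (subject kind, object kind) per property, following PROV-O domain and range.
SIGNATURE = {
    "prov:used": ("Activity", "Entity"),
    "prov:wasGeneratedBy": ("Entity", "Activity"),
    "prov:wasDerivedFrom": ("Entity", "Entity"),
}


def misdirected(triples: List[Triple]) -> List[Triple]:
    """Return the triples whose subject/object kinds do not match the property."""
    bad = []
    for subject, predicate, obj in triples:
        expected = SIGNATURE.get(predicate)
        if expected and (KIND[subject], KIND[obj]) != expected:
            bad.append((subject, predicate, obj))
    return bad


triples = [
    ("act-001", "prov:used", "obs-001"),
    ("rec-001", "prov:wasGeneratedBy", "act-001"),
    ("name-001", "prov:wasDerivedFrom", "obs-001"),
    ("obs-002", "prov:used", "act-001"),  # reversed on purpose
]
print(misdirected(triples))
```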

Success vs. Failure Scenarios

Scenario 1: Successful Full Reconstruction

# INPUT: Multiple observations
CustodianObservation:
  - id: obs-001
    observed_name: "Rijks"
    source: letterhead
  - id: obs-002
    observed_name: "Rijksmuseum Amsterdam"
    source: ISIL registry

# PROCESS: Reconstruction activity
ReconstructionActivity:
  id: act-001
  used:
    - obs-001
    - obs-002
  confidence_score:
    confidence_value: 0.95
    confidence_method: "Manual expert curation"

# OUTPUT: Both reconstruction AND standardized name
CustodianReconstruction:
  id: rec-001
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

CustodianName:
  id: name-001
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-001, obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# HUB: Links to preferred name
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: name-001

Scenario 2: Partial Success - Name Only 🔸

# INPUT: Single observation
CustodianObservation:
  - id: obs-003
    observed_name: "Museum van de Twintigste Eeuw"
    source: archival document

# PROCESS: Attempted reconstruction
ReconstructionActivity:
  id: act-002
  used: [obs-003]
  confidence_score:
    confidence_value: 0.45
    confidence_method: "Algorithmic matching - insufficient data"

# OUTPUT: Name only (reconstruction failed due to low confidence)
CustodianName:
  id: name-002
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-003]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe

# NO CustodianReconstruction created (insufficient evidence)
# HUB: Still links to name
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: name-002

Scenario 3: Complete Failure

# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-004
    observed_name: "Stedelijk Museum"
    source: ambiguous reference

# PROCESS: Failed reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-004]
  confidence_score:
    confidence_value: 0.15
    confidence_method: "Multiple candidate matches found"

# OUTPUT: Nothing generated (activity failed)
# NO CustodianReconstruction
# NO CustodianName
# Observation remains unresolved
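
The three scenarios differ only in which outputs point back at the activity through was_generated_by. A small sketch of that classification, assuming dict records; classify_outcome and its return labels are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List


def classify_outcome(
    activity_id: str,
    reconstructions: List[Dict[str, Any]],
    names: List[Dict[str, Any]],
) -> str:
    """Classify an activity by the outputs that name it in was_generated_by."""
    has_reconstruction = any(
        r.get("was_generated_by") == activity_id for r in reconstructions
    )
    has_name = any(n.get("was_generated_by") == activity_id for n in names)

    if has_reconstruction:
        return "full"       # Scenario 1: a CustodianReconstruction was generated
    if has_name:
        return "name_only"  # Scenario 2: only a CustodianName was generated
    return "failed"         # Scenario 3: nothing was generated


reconstructions = [{"id": "rec-001", "was_generated_by": "act-001"}]
names = [
    {"id": "name-001", "was_generated_by": "act-001"},
    {"id": "name-002", "was_generated_by": "act-002"},
]
for activity_id in ("act-001", "act-002", "act-003"):
    print(activity_id, classify_outcome(activity_id, reconstructions, names))
```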

Implementation Checklist

Phase 1: Update Core Classes

  • ReconstructionActivity.yaml

    • Add used slot (CustodianObservation, multivalued)
    • Add confidence_score slot (ConfidenceMeasure)
    • Update documentation with PROV-O patterns
  • CustodianName.yaml

    • Remove is_a: CustodianObservation
    • Add was_derived_from slot (CustodianObservation, multivalued)
    • Add was_generated_by slot (ReconstructionActivity, optional)
    • Add refers_to_custodian slot (Custodian)
    • Update class_uri to skos:Concept
    • Update documentation explaining derivation vs. inheritance
  • Custodian.yaml

    • Add preferred_label slot (CustodianName)
    • Update documentation explaining preferred label usage
  • CustodianReconstruction.yaml

    • Remove confidence_score slot (moved to ReconstructionActivity)
    • Update documentation clarifying it's generated by Activity
  • CustodianObservation.yaml

    • Update documentation explaining role as Activity input
    • Add examples showing PROV-O flow

Phase 2: Create/Update Slots

  • modules/slots/used.yaml (NEW)

    • Create slot for prov:used property
  • modules/slots/preferred_label.yaml (NEW)

    • Create slot for skos:prefLabel property
  • modules/slots/confidence_score.yaml (UPDATE)

    • Move from CustodianReconstruction to ReconstructionActivity

Phase 3: Update Main Schema

  • 01_custodian_name_modular.yaml
    • Add imports: modules/slots/used, modules/slots/preferred_label
    • Update comments explaining PROV-O pattern

Phase 4: Documentation

  • Update HUB_ARCHITECTURE_DIAGRAM.md with correct flow
  • Create examples showing all three scenarios
  • Update PROV-O alignment documentation

Phase 5: Validation

  • Run gen-owl to validate schema
  • Create test instances for all three scenarios
  • Validate RDF output
  • Update UML diagrams
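
For the test instances in Phase 5, one cheap check before generating RDF is that every identifier referenced by a linking slot actually exists. The sketch below assumes all records of a test file are collected in one list of dicts; dangling_references and REFERENCE_SLOTS are hypothetical names, and the sample data is taken from Scenario 2.

```python
from typing import Any, Dict, List, Set

# Slots from this document whose values are identifiers of other records.
REFERENCE_SLOTS = (
    "used",
    "was_derived_from",
    "was_generated_by",
    "refers_to_custodian",
    "preferred_label",
)


def dangling_references(records: List[Dict[str, Any]]) -> Set[str]:
    """Return referenced identifiers that no record in the list defines."""
    # Hubs are identified by hc_id, every other record by id.
    known = {r.get("id") or r.get("hc_id") for r in records}
    missing: Set[str] = set()
    for record in records:
        for slot in REFERENCE_SLOTS:
            value = record.get(slot)
            if value is None:
                continue
            targets = value if isinstance(value, list) else [value]
            missing.update(t for t in targets if t not in known)
    return missing


scenario_2 = [
    {"id": "obs-003", "observed_name": "Museum van de Twintigste Eeuw"},
    {"id": "act-002", "used": ["obs-003"]},
    {
        "id": "name-002",
        "was_derived_from": ["obs-003"],
        "was_generated_by": "act-002",
        "refers_to_custodian": "hc:nl-ut-utr-m-mtwe",
    },
    {"hc_id": "hc:nl-ut-utr-m-mtwe", "preferred_label": "name-002"},
]
print(dangling_references(scenario_2))
```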

Ontology Properties Used

PROV-O Properties

prov:used (Activity → Entity):

"A prov:Entity that was used by this prov:Activity."

  • Domain: prov:Activity
  • Range: prov:Entity
  • Use: ReconstructionActivity uses CustodianObservation(s) as input

prov:wasGeneratedBy (Entity → Activity):

"Generation is the completion of production of a new entity by an activity."

  • Domain: prov:Entity
  • Range: prov:Activity
  • Use: CustodianReconstruction/CustodianName generated by ReconstructionActivity

prov:wasDerivedFrom (Entity → Entity):

"A derivation is a transformation of an entity into another."

  • Domain: prov:Entity
  • Range: prov:Entity
  • Use: CustodianName/CustodianReconstruction derived from CustodianObservation(s)

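prov:confidence (Activity → ConfidenceMeasure):

"Confidence in the activity's process or methodology."

  • Project-specific extension: the W3C PROV-O vocabulary defines no confidence property, so this term must be defined by the project itself
  • Domain: prov:Activity
  • Range: ConfidenceMeasure (wraps a confidence_value between 0.0 and 1.0)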

SKOS Properties

skos:prefLabel (Concept → Literal):

"The preferred lexical label for a resource, in a given language."

  • Domain: none declared in SKOS (usable on any resource, typically a skos:Concept)
  • Range: plain literal in SKOS; this schema uses CustodianName as a structured value instead
  • Use: Custodian.preferred_label → CustodianName

Rationale

Why CustodianName is NOT a subclass of CustodianObservation

Conceptual Distinction:

  • CustodianObservation: Evidence seen in a source (emic or etic)
  • CustodianName: Standardized interpretation of observations

Temporal Distinction:

  • Observation: Records historical state ("what was written in 1920")
  • Name: Current standardized form ("what we call it now")

Ontological Distinction:

  • Observation: pico:PersonObservation, crm:E73_Information_Object
  • Name: skos:Concept, schema:name, rdfs:label

Example:

Observation 1: "Rijks" (seen on letterhead, 2015)
Observation 2: "Rijksmuseum Amsterdam" (seen in ISIL registry, 2020)
Observation 3: "The Rijksmuseum" (seen in guidebook, 2018)

↓ DERIVATION (not inheritance)

CustodianName: "Rijksmuseum" (standardized emic name, 2025)

The name is derived from observations through interpretation, not a type of observation.


References

  • PROV-O Specification
  • SKOS Specification
  • PiCo Pattern


Status: 🚨 AWAITING IMPLEMENTATION
Priority: HIGH - Fundamental architectural fix
Impact: Changes class relationships, moves properties, removes inheritance