glam/CUSTODIAN_MULTI_ASPECT_REFACTORING.md

Custodian Multi-Aspect Refactoring

Date: 2025-11-22
Status: 🚨 CRITICAL ARCHITECTURE REFINEMENT
Priority: HIGH - Multi-aspect entity modeling

Summary

Refine the observation-reconstruction pattern to properly model heritage custodians as multi-aspect entities with three independent facets:

  1. CustodianLegalStatus - Formal legal entity (precise, registered)
  2. CustodianName - Emic label (ambiguous, contextual)
  3. CustodianPlace - Nominal place designation (not coordinates!)

All three aspects are possible outputs of ReconstructionActivity and independently identify the Custodian hub.


Architectural Principles

1. Custodian as Multi-Aspect Hub

The Custodian class is an aggregation hub for three independent aspects:

CustodianObservation (Evidence)
        ↓ prov:used
ReconstructionActivity (Process)
        ↓ prov:wasGeneratedBy (multiple possible outputs)
        ├─→ CustodianLegalStatus (formal legal entity)
        ├─→ CustodianName (emic label)
        └─→ CustodianPlace (nominal place reference)
                ↓ refers_to_custodian
            Custodian (hub)

Key Insight: ReconstructionActivity MAY generate 0, 1, 2, or all 3 aspects depending on available evidence.

2. CustodianLegalStatus (formerly CustodianReconstruction)

Purpose: Represent the FORMAL LEGAL ENTITY with precise definition.

Characteristics:

  • Precisely defined through legal registration
  • Has formal legal form (ISO 20275 codes)
  • Has registered legal name
  • Has KvK/company registration number
  • Less ambiguous than CustodianName

Rename Rationale:

  • "Reconstruction" implies the entire process, not just legal status
  • "LegalStatus" clarifies this is about FORMAL REGISTRATION
  • Distinguishes from other aspects (name, place)

Example:

CustodianLegalStatus:
  legal_name: "Stichting Rijksmuseum"
  legal_form: "http://purl.org/legal/LegalForm/Stichting"  # ISO 20275
  registration_number: "KvK 41215100"
  registration_date: "1995-01-01"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

3. CustodianName (refined definition)

Purpose: Represent the EMIC LABEL - how the custodian identifies itself publicly.

Characteristics:

  • Ambiguous (context-dependent)
  • May vary by audience/medium
  • NOT the legal name
  • Preferred public-facing label

Example:

CustodianName:
  emic_name: "Rijksmuseum"
  name_language: "nl"
  endorsement_source: "Museum website, signage"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

4. CustodianPlace (NEW CLASS)

Purpose: Represent NOMINAL PLACE DESIGNATION - how the custodian is identified by place reference.

CRITICAL: This is NOT geographic coordinates! This is a nominal reference to a place as a way of identifying the custodian.

Characteristics:

  • Nominal (name-based) place reference
  • May be vague or contextual
  • Historical place names
  • Different levels of specificity

Examples:

# Example 1: Building identified by its neighborhood
CustodianPlace:
  place_name: "het herenhuis in de Schilderswijk"
  place_specificity: NEIGHBORHOOD
  place_language: "nl"
  refers_to_custodian: hc:nl-zh-hag-m-xyz

# Example 2: Just "the mansion"
CustodianPlace:
  place_name: "the mansion"
  place_specificity: BUILDING
  place_language: "en"
  refers_to_custodian: hc:gb-lon-lon-m-abc

# Example 3: Museum as place designation
CustodianPlace:
  place_name: "Rijksmuseum"
  place_specificity: BUILDING
  place_language: "nl"
  place_note: "Used as place reference, not institution name"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

Ontology Alignment:

  • crm:E53_Place - CIDOC-CRM place entity
  • schema:Place - Schema.org place
  • NOT geo:Point (that's for coordinates in separate Location class!)

Distinction from Location Class:

| CustodianPlace | Location |
|---|---|
| Nominal reference | Geographic coordinates |
| "the mansion in the Schilderswijk" | lat: 52.0705, lon: 4.2894 |
| Emic/contextual | Precise/measured |
| May be ambiguous | Unambiguous |
| Identifies custodian | Locates custodian |

Observation Linking - CRITICAL CHANGE

Current (WRONG)

CustodianObservation:
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ❌ Direct link!

Problem: An observation by itself CANNOT guarantee that a Custodian is successfully identified. Only the ReconstructionActivity can determine whether identification succeeds.

Corrected (RIGHT)

CustodianObservation:
  # ❌ NO refers_to_custodian link!
  # Observation must go through ReconstructionActivity first
  
ReconstructionActivity:
  used: [obs-001, obs-002]
  # Activity attempts to generate outputs...
  
CustodianLegalStatus:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub
  
CustodianName:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub
  
CustodianPlace:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub

Rationale:

  • Observation is RAW EVIDENCE (input)
  • Only AFTER ReconstructionActivity can we know if custodian is identified
  • Activity may fail → No custodian identification
  • Activity may succeed → Generated aspects link to custodian
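
The lookup implied by this rationale can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: the dataclasses only mirror the class names used above, and the helper `custodians_for_observation` and the in-memory lists are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Aspect:
    """A generated output (legal status, name, or place) of an activity."""
    kind: str                 # "legal_status", "name", or "place"
    was_generated_by: str     # id of the ReconstructionActivity
    refers_to_custodian: str  # hub identifier


@dataclass(frozen=True)
class ReconstructionActivity:
    id: str
    used: Tuple[str, ...]     # ids of the observations consumed as evidence


def custodians_for_observation(obs_id, activities, aspects):
    """Return hub ids reachable from an observation via activity outputs.

    The observation carries no hub link of its own; an empty set means no
    activity has generated an aspect from it.
    """
    activity_ids = {a.id for a in activities if obs_id in a.used}
    return {
        asp.refers_to_custodian
        for asp in aspects
        if asp.was_generated_by in activity_ids
    }


# Hypothetical in-memory data reusing identifiers from the examples above.
activities = [
    ReconstructionActivity("activity-001", ("obs-001", "obs-002")),
    ReconstructionActivity("act-004", ("obs-006",)),  # failed: no outputs
]
aspects = [
    Aspect("legal_status", "activity-001", "hc:nl-nh-ams-m-rm-q190804"),
    Aspect("name", "activity-001", "hc:nl-nh-ams-m-rm-q190804"),
]
```

Calling `custodians_for_observation("obs-001", activities, aspects)` yields the Rijksmuseum hub id, while `"obs-006"` yields an empty set because its activity generated nothing.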

ReconstructionActivity Outcomes

Scenario 1: Full Success - All Three Aspects

# INPUT: Rich evidence
CustodianObservation:
  - id: obs-001
    observed_name: "Stichting Rijksmuseum"
    observation_source: "KvK registration"
  - id: obs-002
    observed_name: "Rijksmuseum"
    observation_source: "Museum website"
  - id: obs-003
    observed_name: "the museum on Museumplein"
    observation_source: "Archival letter, 1920"

# PROCESS: High-confidence reconstruction
ReconstructionActivity:
  id: act-001
  used: [obs-001, obs-002, obs-003]
  confidence_score: 0.95

# OUTPUT 1: Legal status
CustodianLegalStatus:
  legal_name: "Stichting Rijksmuseum"
  legal_form: "Stichting"
  was_derived_from: [obs-001]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# OUTPUT 2: Emic name
CustodianName:
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# OUTPUT 3: Place designation
CustodianPlace:
  place_name: "het museum op het Museumplein"
  was_derived_from: [obs-003]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804

# HUB: All three aspects identify the same custodian
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: <link to CustodianName>
  legal_status: <link to CustodianLegalStatus>
  place_designation: <link to CustodianPlace>

Scenario 2: Partial Success - Name Only

# INPUT: Limited evidence
CustodianObservation:
  - id: obs-004
    observed_name: "Museum van de Twintigste Eeuw"
    observation_source: "Exhibition catalog"

# PROCESS: Low confidence
ReconstructionActivity:
  id: act-002
  used: [obs-004]
  confidence_score: 0.45

# OUTPUT: Only name (no legal status, no place)
CustodianName:
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-004]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe

# HUB: Only name aspect available
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: <link to CustodianName>
  legal_status: null  # Unknown
  place_designation: null  # Unknown

Scenario 3: Place-Only Success

# INPUT: Archival reference to place
CustodianObservation:
  - id: obs-005
    observed_name: "het herenhuis in de Schilderswijk"
    observation_source: "Notarial deed, 1850"

# PROCESS: Place-focused reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-005]
  confidence_score: 0.75

# OUTPUT: Only place designation
CustodianPlace:
  place_name: "het herenhuis in de Schilderswijk"
  place_language: "nl"
  place_specificity: NEIGHBORHOOD
  was_derived_from: [obs-005]
  was_generated_by: act-003
  refers_to_custodian: hc:nl-zh-hag-m-xyz

# HUB: Only place aspect available
Custodian:
  hc_id: hc:nl-zh-hag-m-xyz
  preferred_label: null
  legal_status: null
  place_designation: <link to CustodianPlace>

Scenario 4: Complete Failure

# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-006
    observed_name: "Stedelijk Museum"
    observation_source: "Vague reference"

# PROCESS: Failed disambiguation
ReconstructionActivity:
  id: act-004
  used: [obs-006]
  confidence_score: 0.15
  justification: "Cannot determine which Stedelijk Museum - requires manual review"

# OUTPUT: Nothing (activity failed)
# - No CustodianLegalStatus
# - No CustodianName
# - No CustodianPlace
# - No Custodian identified
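
The four scenarios differ only in which aspects the activity generated, so the outcome can be derived from the outputs alone. The sketch below assumes exactly that; the `Outcome` enum and `classify_outcome` function are hypothetical names, and no confidence threshold is assumed because the scenarios above do not define one.

```python
from enum import Enum

ASPECT_KINDS = ("CustodianLegalStatus", "CustodianName", "CustodianPlace")


class Outcome(Enum):
    FULL_SUCCESS = "all three aspects generated"
    PARTIAL_SUCCESS = "one or two aspects generated"
    FAILURE = "no aspect generated, no custodian identified"


def classify_outcome(generated):
    """Classify an activity by the aspect class names it generated."""
    kinds = set(generated)
    unknown = kinds - set(ASPECT_KINDS)
    if unknown:
        raise ValueError(f"unknown aspect kinds: {sorted(unknown)}")
    if len(kinds) == len(ASPECT_KINDS):
        return Outcome.FULL_SUCCESS
    if kinds:
        return Outcome.PARTIAL_SUCCESS
    return Outcome.FAILURE
```

Under this reading, Scenario 1 maps to `FULL_SUCCESS`, Scenarios 2 and 3 to `PARTIAL_SUCCESS`, and Scenario 4 to `FAILURE`.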

Required Changes

1. Rename CustodianReconstruction → CustodianLegalStatus

Files to modify:

  • modules/classes/CustodianReconstruction.yaml → modules/classes/CustodianLegalStatus.yaml
  • Update class_uri to reflect legal status focus
  • Update documentation emphasizing formal legal entity

New description:

CustodianLegalStatus:
  class_uri: org:FormalOrganization
  description: >-
    Formal legal entity representing a heritage custodian.
    
    CRITICAL: CustodianLegalStatus is ONE ASPECT of a custodian - the LEGAL dimension.
    
    Characteristics:
    - Precisely defined through legal registration
    - Has formal legal form (ISO 20275 codes)
    - Has registered legal name
    - Has KvK/company registration number
    - LESS AMBIGUOUS than CustodianName
    
    Example distinction:
    - CustodianLegalStatus: "Stichting Rijksmuseum" (legal entity)
    - CustodianName: "Rijksmuseum" (emic label)
    - CustodianPlace: "het museum op het Museumplein" (place reference)
    
    All three aspects refer to the SAME Custodian hub.    

2. Create CustodianPlace Class

New file: modules/classes/CustodianPlace.yaml

id: https://nde.nl/ontology/hc/class/custodian-place
name: CustodianPlace
title: Custodian Place Class

imports:
  - linkml:types
  - Custodian
  - ReconstructionActivity
  - TimeSpan

classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    description: >-
      Nominal place designation used to identify a heritage custodian.
      
      CRITICAL: This is NOT geographic coordinates! This is a NOMINAL REFERENCE
      to a place as a way of identifying the custodian.
      
      CustodianPlace represents how people refer to a custodian through place:
      - "het herenhuis in de Schilderswijk" (neighborhood reference)
      - "the mansion" (generic building reference)
      - "Rijksmuseum" (building name as place, not institution name)
      
      Distinction from Location class:
      - CustodianPlace: Nominal, contextual, may be ambiguous
      - Location: Geographic coordinates, precise, unambiguous
      
      Example:
      - CustodianPlace: "the mansion in the Schilderswijk, Den Haag"
      - Location: lat 52.0705, lon 4.2894, city "Den Haag"
      
      Ontology alignment:
      - crm:E53_Place (CIDOC-CRM place entity)
      - schema:Place (Schema.org place)
      
      Generated by ReconstructionActivity, refers to Custodian hub.      
    
    exact_mappings:
      - crm:E53_Place
      - schema:Place
    
    close_mappings:
      - dcterms:Location
    
    slots:
      - place_name
      - place_language
      - place_specificity
      - place_note
      - was_derived_from
      - was_generated_by
      - refers_to_custodian
      - valid_from
      - valid_to
    
    slot_usage:
      place_name:
        slot_uri: crm:P87_is_identified_by
        description: "Nominal place designation"
        range: string
        required: true
      
      place_language:
        slot_uri: dcterms:language
        description: "Language of place name"
        range: string
        required: false
      
      place_specificity:
        description: "Level of place specificity"
        range: PlaceSpecificityEnum
        required: false
      
      place_note:
        slot_uri: skos:note
        description: "Contextual notes about place reference"
        range: string
        required: false
      
      was_derived_from:
        slot_uri: prov:wasDerivedFrom
        description: "CustodianObservation(s) from which this place designation was derived"
        range: CustodianObservation
        multivalued: true
        required: true
      
      was_generated_by:
        slot_uri: prov:wasGeneratedBy
        description: "ReconstructionActivity that generated this place designation"
        range: ReconstructionActivity
        required: false
      
      refers_to_custodian:
        slot_uri: dcterms:references
        description: "The Custodian hub that this place designation identifies"
        range: Custodian
        required: true
      
      valid_from:
        slot_uri: schema:validFrom
        description: "Start of validity period for this place designation"
        range: date
        required: false
      
      valid_to:
        slot_uri: schema:validThrough
        description: "End of validity period for this place designation"
        range: date
        required: false

New enum: PlaceSpecificityEnum

# modules/enums/PlaceSpecificityEnum.yaml
id: https://nde.nl/ontology/hc/enum/place-specificity
name: PlaceSpecificityEnum
title: Place Specificity Enumeration

enums:
  PlaceSpecificityEnum:
    description: "Level of specificity for place designations"
    permissible_values:
      BUILDING:
        description: "Specific building reference"
        meaning: crm:E24_Physical_Human-Made_Thing
      STREET:
        description: "Street-level reference"
      NEIGHBORHOOD:
        description: "Neighborhood or district reference"
      CITY:
        description: "City-level reference"
      REGION:
        description: "Regional reference"
      VAGUE:
        description: "Vague or unspecified location"
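
The constraints in the CustodianPlace definition and its enum (three required slots, a multivalued was_derived_from, and a closed set of specificity values) can be checked with a few lines of Python. This is only an illustrative sketch: in practice a LinkML validator would normally enforce these rules from the schema itself, and the function name `validate_custodian_place` and the plain-dict record shape are assumptions.

```python
# Slots marked `required: true` in the CustodianPlace slot_usage.
REQUIRED_SLOTS = ("place_name", "was_derived_from", "refers_to_custodian")

# Permissible values of PlaceSpecificityEnum.
PLACE_SPECIFICITY = {"BUILDING", "STREET", "NEIGHBORHOOD", "CITY", "REGION", "VAGUE"}


def validate_custodian_place(record):
    """Return a list of problems found in a CustodianPlace-like dict."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if not record.get(slot):
            problems.append(f"missing required slot: {slot}")
    derived = record.get("was_derived_from")
    if derived and not isinstance(derived, list):
        problems.append("was_derived_from must be a list (multivalued)")
    specificity = record.get("place_specificity")
    if specificity is not None and specificity not in PLACE_SPECIFICITY:
        problems.append(f"unknown place_specificity: {specificity}")
    return problems


# Record taken from the place-only scenario in this document.
place = {
    "place_name": "het herenhuis in de Schilderswijk",
    "place_language": "nl",
    "place_specificity": "NEIGHBORHOOD",
    "was_derived_from": ["obs-005"],
    "was_generated_by": "act-003",
    "refers_to_custodian": "hc:nl-zh-hag-m-xyz",
}
```

An empty list from `validate_custodian_place(place)` means the record satisfies the constraints checked here; it says nothing about whether the referenced observation or hub exists.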

3. Remove refers_to_custodian from CustodianObservation

File: modules/classes/CustodianObservation.yaml

Change:

# REMOVE this slot from CustodianObservation:
slots:
  - refers_to_custodian  # ❌ DELETE

slot_usage:
  refers_to_custodian:  # ❌ DELETE entire slot_usage
    ...

Update description:

CustodianObservation:
  description: >-
    Source-based evidence of a heritage custodian's existence.
    
    CRITICAL: CustodianObservation does NOT directly link to Custodian!
    - Observations are RAW EVIDENCE (input to ReconstructionActivity)
    - Only ReconstructionActivity can determine if custodian is successfully identified
    - Generated outputs (LegalStatus/Name/Place) link to Custodian, not observations
    
    PROV-O Flow:
      ReconstructionActivity → prov:used → CustodianObservation
      CustodianLegalStatus/Name/Place → prov:wasGeneratedBy → ReconstructionActivity
      CustodianLegalStatus/Name/Place → refers_to_custodian → Custodian

4. Update Custodian Hub

File: modules/classes/Custodian.yaml

Add slots:

slots:
  - hc_id
  - preferred_label  # → CustodianName (already added)
  - legal_status  # NEW → CustodianLegalStatus
  - place_designation  # NEW → CustodianPlace
  - appellations
  - identifiers
  - created
  - modified

slot_usage:
  legal_status:
    slot_uri: org:hasRegisteredOrganization
    description: >-
      The formal legal entity representing this custodian.
      
      Links to CustodianLegalStatus with legal name, legal form, registration number.
      
      May be null if legal status not yet reconstructed.      
    range: CustodianLegalStatus
    required: false
  
  place_designation:
    slot_uri: crm:P53_has_former_or_current_location
    description: >-
      Nominal place designation used to identify this custodian.
      
      Links to CustodianPlace with contextual place reference.
      
      Example: "het herenhuis in de Schilderswijk" (not coordinates!)
      
      May be null if place designation not yet reconstructed.      
    range: CustodianPlace
    required: false
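
How the hub's optional slots get filled from generated aspects can be sketched as follows. This is a hypothetical illustration, not the schema's generated code: the `Custodian` dataclass keeps only the three aspect slots, the aspect ids such as `place-001` are invented for the example, and `attach_aspects` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Optional

# Which hub slot each aspect class populates.
SLOT_FOR_ASPECT = {
    "CustodianName": "preferred_label",
    "CustodianLegalStatus": "legal_status",
    "CustodianPlace": "place_designation",
}


@dataclass
class Custodian:
    hc_id: str
    preferred_label: Optional[str] = None    # id of a CustodianName
    legal_status: Optional[str] = None       # id of a CustodianLegalStatus
    place_designation: Optional[str] = None  # id of a CustodianPlace


def attach_aspects(hub, aspects):
    """Fill hub slots from (aspect_class, aspect_id, refers_to_custodian) tuples."""
    for aspect_class, aspect_id, target in aspects:
        if target != hub.hc_id:
            continue  # this aspect identifies a different custodian
        setattr(hub, SLOT_FOR_ASPECT[aspect_class], aspect_id)
    return hub


hub = attach_aspects(
    Custodian("hc:nl-zh-hag-m-xyz"),
    [
        ("CustodianPlace", "place-001", "hc:nl-zh-hag-m-xyz"),
        ("CustodianName", "name-009", "hc:nl-ut-utr-m-mtwe"),
    ],
)
```

Here only `place_designation` is set on the hub; the other two slots stay `None`, matching the place-only case where legal status and name have not been reconstructed.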

Implementation Checklist

Phase 1: Rename CustodianReconstruction

  • Rename file: CustodianReconstruction.yaml → CustodianLegalStatus.yaml
  • Update class name throughout file
  • Update class_uri to org:FormalOrganization
  • Update description emphasizing legal dimension
  • Find and replace all references in other files
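
For the last item, a small scan can list where the old class name still appears before any replacement is made. The sketch below is an assumption-laden helper, not part of the repository: `find_stale_references` is a hypothetical name, and it assumes the schema files are `.yaml` files under a single root directory.

```python
import re
from pathlib import Path

# Whole-word match, so "ReconstructionActivity" is not reported.
OLD_NAME = re.compile(r"\bCustodianReconstruction\b")


def find_stale_references(root):
    """Yield (path, line_number, line) for lines still using the old class name."""
    for path in sorted(Path(root).rglob("*.yaml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if OLD_NAME.search(line):
                yield path, number, line.strip()
```

Iterating over `find_stale_references("modules")` and printing each tuple gives a review list; the rename is complete for those files once the generator yields nothing.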

Phase 2: Create CustodianPlace

  • Create modules/classes/CustodianPlace.yaml
  • Create modules/enums/PlaceSpecificityEnum.yaml
  • Add imports to main schema

Phase 3: Update CustodianObservation

  • Remove refers_to_custodian slot from CustodianObservation
  • Update CustodianObservation documentation
  • Verify no other files reference this link

Phase 4: Update Custodian Hub

  • Add legal_status slot (→ CustodianLegalStatus)
  • Add place_designation slot (→ CustodianPlace)
  • Update hub documentation

Phase 5: Update Examples

  • Create multi-aspect success example (all 3 outputs)
  • Create partial success examples (1-2 outputs)
  • Create failure example (no outputs)
  • Update UML diagrams

Phase 6: Documentation

  • Update PROV-O flow documentation
  • Create multi-aspect modeling guide
  • Update ontology alignment documentation
  • Create CustodianPlace vs Location distinction guide

Key Ontology Alignments

CustodianLegalStatus

  • org:FormalOrganization - W3C Organization Ontology
  • cpov:RegisteredOrganization - CPOV
  • tooi:Overheidsorganisatie - TOOI (Dutch)

CustodianPlace

  • crm:E53_Place - CIDOC-CRM place
  • schema:Place - Schema.org place
  • dcterms:Location - Dublin Core location

CustodianName

  • skos:Concept - SKOS concept
  • schema:name - Schema.org name
  • foaf:name - FOAF name


Status: 🔄 Ready for implementation
Priority: HIGH - Fundamental multi-aspect modeling
Impact: Renames class, adds new class, removes observation link, updates hub