# Custodian Multi-Aspect Refactoring
**Date**: 2025-11-22
**Status**: 🚨 CRITICAL ARCHITECTURE REFINEMENT
**Priority**: HIGH - Multi-aspect entity modeling
## Summary
Refine the observation-reconstruction pattern to properly model heritage custodians as **multi-aspect entities** with three independent facets:
1. **CustodianLegalStatus** - Formal legal entity (precise, registered)
2. **CustodianName** - Emic label (ambiguous, contextual)
3. **CustodianPlace** - Nominal place designation (not coordinates!)
All three aspects are **possible outputs** of ReconstructionActivity and **independently identify** the Custodian hub.
---
## Architectural Principles
### 1. Custodian as Multi-Aspect Hub
The Custodian class is an **aggregation hub** for three independent aspects:
```
CustodianObservation (Evidence)
    ↓ used by the activity (activity prov:used observation)
ReconstructionActivity (Process)
    ↓ generates (each output prov:wasGeneratedBy activity; multiple possible outputs)
    ├─→ CustodianLegalStatus (formal legal entity)
    ├─→ CustodianName (emic label)
    └─→ CustodianPlace (nominal place reference)
            ↓ refers_to_custodian
        Custodian (hub)
```
**Key Insight**: ReconstructionActivity MAY generate 0, 1, 2, or all 3 aspects depending on available evidence.
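The "0 to 3 outputs" rule can be sketched in Python as a result record whose three aspect fields are all optional. This is a minimal illustration, not the project's implementation: `ReconstructionResult` and `aspects_generated` are hypothetical names introduced here; only the aspect names come from the model above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconstructionResult:
    """Hypothetical container for what one ReconstructionActivity produced."""

    activity_id: str
    legal_status: Optional[dict] = None  # CustodianLegalStatus, if generated
    name: Optional[dict] = None          # CustodianName, if generated
    place: Optional[dict] = None         # CustodianPlace, if generated

    def aspects_generated(self) -> list[str]:
        """Return the names of the aspects this activity actually generated."""
        candidates = {
            "CustodianLegalStatus": self.legal_status,
            "CustodianName": self.name,
            "CustodianPlace": self.place,
        }
        return [label for label, value in candidates.items() if value is not None]


# A name-only outcome: one aspect out of the three possible ones.
result = ReconstructionResult(
    activity_id="act-002",
    name={"emic_name": "Museum van de Twintigste Eeuw"},
)
print(result.aspects_generated())
```

Because every aspect field defaults to `None`, an activity that produced nothing is still a valid record with an empty aspect list.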
### 2. CustodianLegalStatus (formerly CustodianReconstruction)
**Purpose**: Represent the FORMAL LEGAL ENTITY with precise definition.
**Characteristics**:
- Precisely defined through legal registration
- Has formal legal form (ISO 20275 codes)
- Has registered legal name
- Has KvK/company registration number
- **Less ambiguous** than CustodianName
**Rename Rationale**:
- "Reconstruction" implies the entire process, not just legal status
- "LegalStatus" clarifies this is about FORMAL REGISTRATION
- Distinguishes from other aspects (name, place)
**Example**:
```yaml
CustodianLegalStatus:
  legal_name: "Stichting Rijksmuseum"
  legal_form: "http://purl.org/legal/LegalForm/Stichting"  # ISO 20275
  registration_number: "KvK 41215100"
  registration_date: "1995-01-01"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
```
### 3. CustodianName (refined definition)
**Purpose**: Represent the EMIC LABEL - how the custodian identifies itself publicly.
**Characteristics**:
- Ambiguous (context-dependent)
- May vary by audience/medium
- NOT the legal name
- Preferred public-facing label
**Example**:
```yaml
CustodianName:
  emic_name: "Rijksmuseum"
  name_language: "nl"
  endorsement_source: "Museum website, signage"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
```
### 4. CustodianPlace (NEW CLASS)
**Purpose**: Represent NOMINAL PLACE DESIGNATION - how the custodian is identified by place reference.
**CRITICAL**: This is NOT geographic coordinates! This is a **nominal reference** to a place as a way of identifying the custodian.
**Characteristics**:
- Nominal (name-based) place reference
- May be vague or contextual
- Historical place names
- Different levels of specificity
**Examples**:
```yaml
# Example 1: Building nickname as place reference
CustodianPlace:
  place_name: "het herenhuis in de Schilderswijk"
  place_specificity: NEIGHBORHOOD
  place_language: "nl"
  refers_to_custodian: hc:nl-zh-hag-m-xyz
---
# Example 2: Just "the mansion"
CustodianPlace:
  place_name: "the mansion"
  place_specificity: BUILDING
  place_language: "en"
  refers_to_custodian: hc:gb-lon-lon-m-abc
---
# Example 3: Museum as place designation
CustodianPlace:
  place_name: "Rijksmuseum"
  place_specificity: BUILDING
  place_language: "nl"
  place_note: "Used as place reference, not institution name"
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
```
**Ontology Alignment**:
- `crm:E53_Place` - CIDOC-CRM place entity
- `schema:Place` - Schema.org place
- NOT `geo:Point` (that's for coordinates in separate Location class!)
**Distinction from Location Class**:
| CustodianPlace | Location |
|----------------|----------|
| Nominal reference | Geographic coordinates |
| "the mansion in the Schilderswijk" | lat: 52.0705, lon: 4.2894 |
| Emic/contextual | Precise/measured |
| May be ambiguous | Unambiguous |
| Identifies custodian | Locates custodian |
---
## Observation Linking - CRITICAL CHANGE
### Current (WRONG)
```yaml
CustodianObservation:
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ❌ Direct link!
```
**Problem**: An observation by itself CANNOT guarantee that a Custodian is successfully identified. Only the ReconstructionActivity can determine whether identification succeeds.
### Corrected (RIGHT)
```yaml
CustodianObservation:
  # ❌ NO refers_to_custodian link!
  # Observation must go through ReconstructionActivity first
ReconstructionActivity:
  used: [obs-001, obs-002]
  # Activity attempts to generate outputs...
CustodianLegalStatus:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub
CustodianName:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub
CustodianPlace:
  was_generated_by: activity-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804  # ✅ Generated output links to hub
```
**Rationale**:
- Observation is RAW EVIDENCE (input)
- Only AFTER ReconstructionActivity can we know if custodian is identified
- Activity may fail → No custodian identification
- Activity may succeed → Generated aspects link to custodian
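The indirect path from an observation to a hub can be sketched as a two-step lookup: first find the activities that used the observation, then collect the hubs that their generated aspects refer to. This is only an illustration under assumptions: the function name `custodians_for_observation` and the plain-dict record shape (including the `id` and `type` keys) are hypothetical, chosen to mirror the slot names in the YAML above.

```python
def custodians_for_observation(obs_id, activities, aspects):
    """Return the hub ids reachable from an observation via generated aspects.

    activities: dicts with "id" and "used" (list of observation ids).
    aspects:    dicts with "was_generated_by" and "refers_to_custodian".
    """
    # Step 1: activities that took this observation as input.
    activity_ids = {act["id"] for act in activities if obs_id in act["used"]}
    # Step 2: hubs referenced by the aspects those activities generated.
    return {
        asp["refers_to_custodian"]
        for asp in aspects
        if asp["was_generated_by"] in activity_ids
    }


activities = [{"id": "activity-001", "used": ["obs-001", "obs-002"]}]
aspects = [
    {
        "type": "CustodianLegalStatus",
        "was_generated_by": "activity-001",
        "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804",
    },
    {
        "type": "CustodianName",
        "was_generated_by": "activity-001",
        "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804",
    },
]

print(custodians_for_observation("obs-001", activities, aspects))
print(custodians_for_observation("obs-999", activities, aspects))
```

An observation that no activity has used, or whose activity generated no aspects, yields an empty set: no custodian has been identified for it.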
---
## ReconstructionActivity Outcomes
### Scenario 1: Full Success - All Three Aspects ✅✅✅
```yaml
# INPUT: Rich evidence
CustodianObservation:
  - id: obs-001
    observed_name: "Stichting Rijksmuseum"
    observation_source: "KvK registration"
  - id: obs-002
    observed_name: "Rijksmuseum"
    observation_source: "Museum website"
  - id: obs-003
    observed_name: "the museum on Museumplein"
    observation_source: "Archival letter, 1920"
# PROCESS: High-confidence reconstruction
ReconstructionActivity:
  id: act-001
  used: [obs-001, obs-002, obs-003]
  confidence_score: 0.95
# OUTPUT 1: Legal status
CustodianLegalStatus:
  legal_name: "Stichting Rijksmuseum"
  legal_form: "Stichting"
  was_derived_from: [obs-001]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
# OUTPUT 2: Emic name
CustodianName:
  emic_name: "Rijksmuseum"
  was_derived_from: [obs-002]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
# OUTPUT 3: Place designation
CustodianPlace:
  place_name: "het museum op het Museumplein"
  was_derived_from: [obs-003]
  was_generated_by: act-001
  refers_to_custodian: hc:nl-nh-ams-m-rm-q190804
# HUB: All three aspects identify the same custodian
Custodian:
  hc_id: hc:nl-nh-ams-m-rm-q190804
  preferred_label: <link to CustodianName>
  legal_status: <link to CustodianLegalStatus>
  place_designation: <link to CustodianPlace>
```
### Scenario 2: Partial Success - Name Only ✅
```yaml
# INPUT: Limited evidence
CustodianObservation:
  - id: obs-004
    observed_name: "Museum van de Twintigste Eeuw"
    observation_source: "Exhibition catalog"
# PROCESS: Low confidence
ReconstructionActivity:
  id: act-002
  used: [obs-004]
  confidence_score: 0.45
# OUTPUT: Only name (no legal status, no place)
CustodianName:
  emic_name: "Museum van de Twintigste Eeuw"
  was_derived_from: [obs-004]
  was_generated_by: act-002
  refers_to_custodian: hc:nl-ut-utr-m-mtwe
# HUB: Only name aspect available
Custodian:
  hc_id: hc:nl-ut-utr-m-mtwe
  preferred_label: <link to CustodianName>
  legal_status: null  # Unknown
  place_designation: null  # Unknown
```
### Scenario 3: Place-Only Success ✅
```yaml
# INPUT: Archival reference to place
CustodianObservation:
  - id: obs-005
    observed_name: "het herenhuis in de Schilderswijk"
    observation_source: "Notarial deed, 1850"
# PROCESS: Place-focused reconstruction
ReconstructionActivity:
  id: act-003
  used: [obs-005]
  confidence_score: 0.75
# OUTPUT: Only place designation
CustodianPlace:
  place_name: "het herenhuis in de Schilderswijk"
  place_language: "nl"
  place_specificity: NEIGHBORHOOD
  was_derived_from: [obs-005]
  was_generated_by: act-003
  refers_to_custodian: hc:nl-zh-hag-m-xyz
# HUB: Only place aspect available
Custodian:
  hc_id: hc:nl-zh-hag-m-xyz
  preferred_label: null
  legal_status: null
  place_designation: <link to CustodianPlace>
```
### Scenario 4: Complete Failure ❌
```yaml
# INPUT: Ambiguous observation
CustodianObservation:
  - id: obs-006
    observed_name: "Stedelijk Museum"
    observation_source: "Vague reference"
# PROCESS: Failed disambiguation
ReconstructionActivity:
  id: act-004
  used: [obs-006]
  confidence_score: 0.15
  justification: "Cannot determine which Stedelijk Museum - requires manual review"
# OUTPUT: Nothing (activity failed)
# - No CustodianLegalStatus
# - No CustodianName
# - No CustodianPlace
# - No Custodian identified
```
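The four scenarios differ only in which aspects the activity produced, so an outcome label can be derived from the set of generated aspect types. The sketch below is a hedged illustration: the function `classify_outcome` and the labels `"full"`, `"partial"`, and `"failure"` are shorthand introduced here for the scenario headings, not names defined by the schema.

```python
ASPECTS = frozenset({"CustodianLegalStatus", "CustodianName", "CustodianPlace"})


def classify_outcome(generated):
    """Label an activity by the aspect types it generated.

    generated: iterable of aspect type names (duplicates are ignored).
    """
    produced = set(generated)
    unknown = produced - ASPECTS
    if unknown:
        raise ValueError(f"unknown aspect type(s): {sorted(unknown)}")
    if not produced:
        return "failure"   # Scenario 4: nothing generated
    if produced == ASPECTS:
        return "full"      # Scenario 1: all three aspects
    return "partial"       # Scenarios 2 and 3: one or two aspects


print(classify_outcome(["CustodianLegalStatus", "CustodianName", "CustodianPlace"]))
print(classify_outcome(["CustodianPlace"]))
print(classify_outcome([]))
```

The classification deliberately ignores `confidence_score`: the scenarios above do not state a threshold, so none is assumed here.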
---
## Required Changes
### 1. Rename CustodianReconstruction → CustodianLegalStatus ✅
**Files to modify**:
- `modules/classes/CustodianReconstruction.yaml` → `modules/classes/CustodianLegalStatus.yaml`
- Update `class_uri` to reflect legal status focus
- Update documentation emphasizing formal legal entity
**New description**:
```yaml
CustodianLegalStatus:
  class_uri: org:FormalOrganization
  description: >-
    Formal legal entity representing a heritage custodian.

    CRITICAL: CustodianLegalStatus is ONE ASPECT of a custodian - the LEGAL dimension.

    Characteristics:
      - Precisely defined through legal registration
      - Has formal legal form (ISO 20275 codes)
      - Has registered legal name
      - Has KvK/company registration number
      - LESS AMBIGUOUS than CustodianName

    Example distinction:
      - CustodianLegalStatus: "Stichting Rijksmuseum" (legal entity)
      - CustodianName: "Rijksmuseum" (emic label)
      - CustodianPlace: "het museum op het Museumplein" (place reference)

    All three aspects refer to the SAME Custodian hub.
```
### 2. Create CustodianPlace Class ✅
**New file**: `modules/classes/CustodianPlace.yaml`
```yaml
id: https://nde.nl/ontology/hc/class/custodian-place
name: CustodianPlace
title: Custodian Place Class
imports:
  - linkml:types
  - Custodian
  - ReconstructionActivity
  - TimeSpan
classes:
  CustodianPlace:
    class_uri: crm:E53_Place
    description: >-
      Nominal place designation used to identify a heritage custodian.

      CRITICAL: This is NOT geographic coordinates! This is a NOMINAL REFERENCE
      to a place as a way of identifying the custodian.

      CustodianPlace represents how people refer to a custodian through place:
        - "het herenhuis in de Schilderswijk" (neighborhood reference)
        - "the mansion" (generic building reference)
        - "Rijksmuseum" (building name as place, not institution name)

      Distinction from Location class:
        - CustodianPlace: Nominal, contextual, may be ambiguous
        - Location: Geographic coordinates, precise, unambiguous

      Example:
        - CustodianPlace: "the mansion in the Schilderswijk, Den Haag"
        - Location: lat 52.0705, lon 4.2894, city "Den Haag"

      Ontology alignment:
        - crm:E53_Place (CIDOC-CRM place entity)
        - schema:Place (Schema.org place)

      Generated by ReconstructionActivity, refers to Custodian hub.
    exact_mappings:
      - crm:E53_Place
      - schema:Place
    close_mappings:
      - dcterms:Location
    slots:
      - place_name
      - place_language
      - place_specificity
      - place_note
      - was_derived_from
      - was_generated_by
      - refers_to_custodian
      - valid_from
      - valid_to
    slot_usage:
      place_name:
        slot_uri: crm:P87_is_identified_by
        description: "Nominal place designation"
        range: string
        required: true
      place_language:
        slot_uri: dcterms:language
        description: "Language of place name"
        range: string
        required: false
      place_specificity:
        description: "Level of place specificity"
        range: PlaceSpecificityEnum
        required: false
      place_note:
        slot_uri: skos:note
        description: "Contextual notes about place reference"
        range: string
        required: false
      was_derived_from:
        slot_uri: prov:wasDerivedFrom
        description: "CustodianObservation(s) from which this place designation was derived"
        range: CustodianObservation
        multivalued: true
        required: true
      was_generated_by:
        slot_uri: prov:wasGeneratedBy
        description: "ReconstructionActivity that generated this place designation"
        range: ReconstructionActivity
        required: false
      refers_to_custodian:
        slot_uri: dcterms:references
        description: "The Custodian hub that this place designation identifies"
        range: Custodian
        required: true
      valid_from:
        slot_uri: schema:validFrom
        description: "Start of validity period for this place designation"
        range: date
        required: false
      valid_to:
        slot_uri: schema:validThrough
        description: "End of validity period for this place designation"
        range: date
        required: false
```
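The `required: true` slots above (`place_name`, `was_derived_from`, `refers_to_custodian`) can be checked on a candidate record before it is written out. The following is a hand-rolled sketch, assuming records are plain dicts keyed by slot name; `missing_required_slots` is a hypothetical helper and not a replacement for validating against the LinkML schema itself.

```python
# Slots marked `required: true` in the CustodianPlace slot_usage above.
REQUIRED_SLOTS = ("place_name", "was_derived_from", "refers_to_custodian")


def missing_required_slots(record: dict) -> list[str]:
    """Return the required CustodianPlace slots that are absent or empty."""
    return [slot for slot in REQUIRED_SLOTS if not record.get(slot)]


complete = {
    "place_name": "het herenhuis in de Schilderswijk",
    "place_language": "nl",
    "place_specificity": "NEIGHBORHOOD",
    "was_derived_from": ["obs-005"],
    "was_generated_by": "act-003",
    "refers_to_custodian": "hc:nl-zh-hag-m-xyz",
}
incomplete = {"place_name": "the mansion", "was_derived_from": []}

print(missing_required_slots(complete))
print(missing_required_slots(incomplete))
```

An empty `was_derived_from` list counts as missing here, on the assumption that a place designation with no supporting observation should not be accepted.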
**New enum**: PlaceSpecificityEnum
```yaml
# modules/enums/PlaceSpecificityEnum.yaml
id: https://nde.nl/ontology/hc/enum/place-specificity
name: PlaceSpecificityEnum
title: Place Specificity Enumeration
enums:
  PlaceSpecificityEnum:
    description: "Level of specificity for place designations"
    permissible_values:
      BUILDING:
        description: "Specific building reference"
        meaning: crm:E24_Physical_Human-Made_Thing
      STREET:
        description: "Street-level reference"
      NEIGHBORHOOD:
        description: "Neighborhood or district reference"
      CITY:
        description: "City-level reference"
      REGION:
        description: "Regional reference"
      VAGUE:
        description: "Vague or unspecified location"
```
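For code that consumes these records, the permissible values can be mirrored as a Python `Enum` so that typos fail early. This is a sketch only: `PlaceSpecificity` and `parse_specificity` are illustrative names, and generated LinkML bindings, if the project uses them, would normally provide an equivalent.

```python
from enum import Enum


class PlaceSpecificity(Enum):
    """Mirror of the permissible values in PlaceSpecificityEnum."""

    BUILDING = "BUILDING"
    STREET = "STREET"
    NEIGHBORHOOD = "NEIGHBORHOOD"
    CITY = "CITY"
    REGION = "REGION"
    VAGUE = "VAGUE"


def parse_specificity(raw: str) -> PlaceSpecificity:
    """Convert a raw string to an enum member, rejecting unknown values."""
    try:
        return PlaceSpecificity(raw)
    except ValueError:
        allowed = ", ".join(member.value for member in PlaceSpecificity)
        raise ValueError(
            f"{raw!r} is not a permissible value; expected one of: {allowed}"
        ) from None


print(parse_specificity("NEIGHBORHOOD"))
```

Unknown values raise instead of silently mapping to `VAGUE`, since `VAGUE` describes the place reference itself rather than a parsing failure.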
### 3. Remove refers_to_custodian from CustodianObservation ✅
**File**: `modules/classes/CustodianObservation.yaml`
**Change**:
```yaml
# REMOVE this slot from CustodianObservation:
slots:
  - refers_to_custodian  # ❌ DELETE
slot_usage:
  refers_to_custodian:  # ❌ DELETE entire slot_usage
    ...
```
**Update description**:
```yaml
CustodianObservation:
  description: >-
    Source-based evidence of a heritage custodian's existence.

    CRITICAL: CustodianObservation does NOT directly link to Custodian!
      - Observations are RAW EVIDENCE (input to ReconstructionActivity)
      - Only ReconstructionActivity can determine if custodian is successfully identified
      - Generated outputs (LegalStatus/Name/Place) link to Custodian, not observations

    PROV-O flow (arrows show data flow; each property is asserted in the direction given in parentheses):
      CustodianObservation → ReconstructionActivity (activity prov:used observation)
      ReconstructionActivity → CustodianLegalStatus/Name/Place (output prov:wasGeneratedBy activity)
      CustodianLegalStatus/Name/Place → refers_to_custodian → Custodian
```
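Once the slot is removed from the schema, existing instance data may still carry the old link. A small check like the one below can list the offending records; it is a sketch assuming observations are loaded as plain dicts, and `observations_with_direct_link` is a hypothetical helper name.

```python
def observations_with_direct_link(observations):
    """Return ids of observation records that still carry refers_to_custodian."""
    return [
        obs.get("id", "<no id>")
        for obs in observations
        if "refers_to_custodian" in obs
    ]


observations = [
    {"id": "obs-001", "observed_name": "Stichting Rijksmuseum"},
    {
        "id": "obs-002",
        "observed_name": "Rijksmuseum",
        "refers_to_custodian": "hc:nl-nh-ams-m-rm-q190804",  # stale direct link
    },
]

print(observations_with_direct_link(observations))
```

The check tests for the key's presence rather than its value, so a record with `refers_to_custodian: null` is also reported.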
### 4. Update Custodian Hub Links ✅
**File**: `modules/classes/Custodian.yaml`
**Add slots**:
```yaml
slots:
  - hc_id
  - preferred_label  # → CustodianName (already added)
  - legal_status  # NEW → CustodianLegalStatus
  - place_designation  # NEW → CustodianPlace
  - appellations
  - identifiers
  - created
  - modified
slot_usage:
  legal_status:
    slot_uri: org:hasRegisteredOrganization
    description: >-
      The formal legal entity representing this custodian.
      Links to CustodianLegalStatus with legal name, legal form, registration number.
      May be null if legal status not yet reconstructed.
    range: CustodianLegalStatus
    required: false
  place_designation:
    slot_uri: crm:P53_has_former_or_current_location
    description: >-
      Nominal place designation used to identify this custodian.
      Links to CustodianPlace with contextual place reference.
      Example: "het herenhuis in de Schilderswijk" (not coordinates!)
      May be null if place designation not yet reconstructed.
    range: CustodianPlace
    required: false
```
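How the hub slots get populated from generated aspects can be sketched as a grouping step keyed on `refers_to_custodian`. This is an assumption-laden illustration: `build_hubs`, the `(aspect_type, record)` pair format, and the choice to let a later aspect of the same type overwrite an earlier one are all simplifications introduced here.

```python
# Which hub slot each aspect type fills, per the slot list above.
SLOT_FOR_ASPECT = {
    "CustodianName": "preferred_label",
    "CustodianLegalStatus": "legal_status",
    "CustodianPlace": "place_designation",
}


def build_hubs(aspects):
    """Group (aspect_type, record) pairs into hub dicts keyed by hc_id."""
    hubs = {}
    for aspect_type, record in aspects:
        hc_id = record["refers_to_custodian"]
        # A new hub starts with all three aspect slots empty (null).
        hub = hubs.setdefault(
            hc_id,
            {
                "hc_id": hc_id,
                "preferred_label": None,
                "legal_status": None,
                "place_designation": None,
            },
        )
        hub[SLOT_FOR_ASPECT[aspect_type]] = record
    return hubs


place = {
    "place_name": "het herenhuis in de Schilderswijk",
    "refers_to_custodian": "hc:nl-zh-hag-m-xyz",
}
hubs = build_hubs([("CustodianPlace", place)])
print(hubs["hc:nl-zh-hag-m-xyz"])
```

A hub built from a single place aspect keeps `preferred_label` and `legal_status` as `None`, matching the "may be null" wording in the slot descriptions.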
---
## Implementation Checklist
### Phase 1: Rename CustodianReconstruction
- [ ] Rename file: `CustodianReconstruction.yaml` → `CustodianLegalStatus.yaml`
- [ ] Update class name throughout file
- [ ] Update `class_uri` to `org:FormalOrganization`
- [ ] Update description emphasizing legal dimension
- [ ] Find and replace all references in other files
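Before running a find-and-replace, it can help to list every remaining mention of the old class name. The sketch below scans `*.yaml` files under a directory; the helper names `stale_lines` and `find_stale_references` are hypothetical, and the root directory passed in is whatever holds the schema modules in your checkout.

```python
from pathlib import Path

OLD_NAME = "CustodianReconstruction"


def stale_lines(text, old_name=OLD_NAME):
    """Return (line_number, stripped_line) pairs that mention old_name."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if old_name in line
    ]


def find_stale_references(root, old_name=OLD_NAME):
    """Scan every *.yaml file under root for mentions of old_name."""
    hits = []
    for path in sorted(Path(root).rglob("*.yaml")):
        text = path.read_text(encoding="utf-8")
        for number, line in stale_lines(text, old_name):
            hits.append((str(path), number, line))
    return hits


sample = "slots:\n  - was_derived_from\nrange: CustodianReconstruction\n"
print(stale_lines(sample))
```

The match is a plain substring test, so it reports mentions in comments and descriptions as well as in `range:` or `imports:` entries; each hit still needs a manual look before replacing.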
### Phase 2: Create CustodianPlace
- [ ] Create `modules/classes/CustodianPlace.yaml`
- [ ] Create `modules/enums/PlaceSpecificityEnum.yaml`
- [ ] Add imports to main schema
### Phase 3: Remove Observation→Custodian Link
- [ ] Remove `refers_to_custodian` slot from CustodianObservation
- [ ] Update CustodianObservation documentation
- [ ] Verify no other files reference this link
### Phase 4: Update Custodian Hub
- [ ] Add `legal_status` slot (→ CustodianLegalStatus)
- [ ] Add `place_designation` slot (→ CustodianPlace)
- [ ] Update hub documentation
### Phase 5: Update Examples
- [ ] Create multi-aspect success example (all 3 outputs)
- [ ] Create partial success examples (1-2 outputs)
- [ ] Create failure example (no outputs)
- [ ] Update UML diagrams
### Phase 6: Documentation
- [ ] Update PROV-O flow documentation
- [ ] Create multi-aspect modeling guide
- [ ] Update ontology alignment documentation
- [ ] Create CustodianPlace vs Location distinction guide
---
## Key Ontology Alignments
### CustodianLegalStatus
- `org:FormalOrganization` - W3C Organization Ontology
- `cpov:RegisteredOrganization` - CPOV
- `tooi:Overheidsorganisatie` - TOOI (Dutch)
### CustodianPlace
- `crm:E53_Place` - CIDOC-CRM place
- `schema:Place` - Schema.org place
- `dcterms:Location` - Dublin Core location
### CustodianName
- `skos:Concept` - SKOS concept
- `schema:name` - Schema.org name
- `foaf:name` - FOAF name
---
## References
- CIDOC-CRM E53 Place: http://www.cidoc-crm.org/html/cidoc_crm_v7.1.3.html#E53
- W3C Org Ontology: https://www.w3.org/TR/vocab-org/
- PROV-O: https://www.w3.org/TR/prov-o/
---
**Status**: 🔄 Ready for implementation
**Priority**: HIGH - Fundamental multi-aspect modeling
**Impact**: Renames class, adds new class, removes observation link, updates hub