glam/CUSTODIAN_MULTI_ASPECT_REFACTORING.md
kempersc 8907aa6213 feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model
- Implemented three independent aspects for custodians: CustodianLegalStatus, CustodianName, and CustodianPlace.
- Renamed CustodianReconstruction to CustodianLegalStatus and updated all references.
- Created new components for CustodianPlace and PlaceSpecificityEnum.
- Removed direct links from CustodianObservation to Custodian, aligning with PROV-O standards.
- Generated comprehensive example instance demonstrating the new architecture.
- Updated documentation to reflect changes and provide guidance on multi-aspect modeling.
- Added React hook for managing IndexedDB operations, including storing and loading transformation results.
- Created complete YAML example for Rijksmuseum, illustrating the integration of all three aspects.
2025-11-22 15:40:17 +01:00

457 lines
14 KiB
Markdown

# Custodian Multi-Aspect Refactoring - Complete Implementation
**Date**: 2025-11-22
**Status**: ✅ COMPLETE
**Schema Version**: 0.1.0 (modular LinkML)
**Impact**: Breaking change - Multi-aspect architecture
---
## Executive Summary
The Heritage Custodian Ontology has been fundamentally refactored to model custodians as **multi-aspect entities** with three independent facets that can change over time:
1. **CustodianLegalStatus** - Formal legal entity (precise, registered)
2. **CustodianName** - Emic label (ambiguous, contextual)
3. **CustodianPlace** - Nominal place designation (NOT coordinates!)
All three aspects are generated through **ReconstructionActivity** from **CustodianObservations** (raw evidence), following proper PROV-O patterns.
---
## Motivation: Why Multi-Aspect Modeling?
### The Problem with Monolithic "Reconstruction"
Previously, we had a single `CustodianReconstruction` class that tried to represent:
- Legal entity (formal registration)
- Operational name (emic label)
- Place reference (nominal location)
This created confusion:
- ❌ Mixed precise (legal) with ambiguous (name) information
- ❌ Implied all custodians have legal status (many don't!)
- ❌ No way to model temporal change in each aspect independently
- ❌ "Reconstruction" was ambiguous (process vs. result?)
### The Multi-Aspect Solution
Now we have **three separate aspects**, each with distinct characteristics:
| Aspect | Characteristic | Example (Rijksmuseum) | Can Exist Without Others? |
|--------|----------------|----------------------|---------------------------|
| **Legal Status** | Precise, registered | "Stichting Rijksmuseum" (KvK 41215422) | ✅ Yes (informal groups lack this) |
| **Name** | Ambiguous, contextual | "Rijksmuseum" (emic label) | ✅ Yes (unregistered groups have names) |
| **Place** | Nominal, may be vague | "het museum op het Museumplein" | ✅ Yes (historic place references) |
**Key insight**: These aspects **change independently over time**:
- Legal entity remains "Stichting Rijksmuseum" (since 1885)
- Name changed over time: "Rijks Museum" → "Rijksmuseum" → "Rijksmuseum Amsterdam"
- Place reference changed: Building moved in 1885 from Trippenhuis to current location
---
## Architectural Changes
### CRITICAL: Observations No Longer Link to Custodian
**Before** (INCORRECT):
```
CustodianObservation → refers_to_custodian → Custodian
```
**After** (CORRECT):
```
CustodianObservation → prov:used → ReconstructionActivity
ReconstructionActivity → prov:wasGeneratedBy → LegalStatus/Name/Place
LegalStatus/Name/Place → refers_to_custodian → Custodian
```
**Rationale**: Only ReconstructionActivity can determine if a custodian is successfully identified. Raw observations are just evidence - they don't directly assert identity.
### Three Independent Aspects
```mermaid
graph TD
O1[CustodianObservation 1: KvK registry]
O2[CustodianObservation 2: Website]
O3[CustodianObservation 3: Guidebook]
A[ReconstructionActivity: Entity Resolution]
L[CustodianLegalStatus: Stichting Rijksmuseum]
N[CustodianName: Rijksmuseum]
P[CustodianPlace: het museum op het Museumplein]
H[Custodian Hub: nl-nh-ams-m-rm-q190804]
O1 -->|prov:used| A
O2 -->|prov:used| A
O3 -->|prov:used| A
A -->|prov:wasGeneratedBy| L
A -->|prov:wasGeneratedBy| N
A -->|prov:wasGeneratedBy| P
L -->|refers_to_custodian| H
N -->|refers_to_custodian| H
P -->|refers_to_custodian| H
H -->|legal_status| L
H -->|preferred_label| N
H -->|place_designation| P
```
---
## What Changed: File-by-File Breakdown
### 1. Renamed: CustodianReconstruction → CustodianLegalStatus
**File**: `modules/classes/CustodianReconstruction.yaml``CustodianLegalStatus.yaml`
**Why**: "Reconstruction" was ambiguous (process vs. result?). "LegalStatus" clearly indicates this is ONE ASPECT - the formal legal dimension.
**Key changes**:
- `class_uri`: Changed to `org:FormalOrganization`
- Description emphasizes formal legal entity
- Only for registered legal entities (individuals, organizations, governments)
- Informal groups WITHOUT legal status don't get this aspect
**Example**:
```yaml
custodian_legal_statuses:
- id: https://w3id.org/heritage/legal/rijksmuseum
refers_to_custodian: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
legal_name:
full_name: "Stichting Rijksmuseum"
legal_form:
elf_code: "8888" # Dutch foundation
registration_numbers:
- number: "41215422"
type: "KvK"
```
### 2. New: CustodianPlace Class
**File**: `modules/classes/CustodianPlace.yaml`
**Purpose**: Nominal place designation used to identify a custodian (NOT geographic coordinates!)
**Critical distinction**: CustodianPlace ≠ Location
- CustodianPlace: "het herenhuis in de Schilderswijk" (nominal, contextual)
- Location: lat 52.0705, lon 4.2894 (precise, geographic)
**class_uri**: `crm:E53_Place` (CIDOC-CRM place entity)
**New enum**: `PlaceSpecificityEnum` (BUILDING, STREET, NEIGHBORHOOD, CITY, REGION, VAGUE)
**Example**:
```yaml
custodian_places:
- id: https://w3id.org/heritage/place/rijks-museumplein-1920
refers_to_custodian: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
place_name: "het museum op het Museumplein"
place_specificity: STREET
valid_from: "1920-01-01"
```
### 3. Modified: CustodianObservation
**File**: `modules/classes/CustodianObservation.yaml`
**REMOVED**: `refers_to_custodian` slot
**Why**: Observations are RAW EVIDENCE, not assertions of identity. Only ReconstructionActivity can determine if custodian is successfully identified.
**Now**:
- Observations feed into ReconstructionActivity via `prov:used`
- ReconstructionActivity generates aspects (LegalStatus/Name/Place)
- Aspects link to Custodian hub via `refers_to_custodian`
### 4. Modified: Custodian Hub
**File**: `modules/classes/Custodian.yaml`
**ADDED slots**:
- `legal_status` → CustodianLegalStatus (may be null)
- `place_designation` → CustodianPlace (may be null)
- `preferred_label` → CustodianName (already existed)
**Hub now aggregates THREE independent aspects**:
```yaml
custodians:
- hc_id: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
legal_status: https://w3id.org/heritage/legal/rijksmuseum
preferred_label: https://w3id.org/heritage/name/rijksmuseum-emic
place_designation: https://w3id.org/heritage/place/rijks-museumplein-1920
```
### 5. Modified: Main Schema
**File**: `01_custodian_name_modular.yaml`
**ADDED imports**:
- `modules/classes/CustodianPlace`
- `modules/enums/PlaceSpecificityEnum`
- `modules/slots/place_designation`
- `modules/slots/place_name`
- `modules/slots/place_language`
- `modules/slots/place_specificity`
- `modules/slots/place_note`
**RENAMED**: All references to `CustodianReconstruction``CustodianLegalStatus`
### 6. Batch Updated: 22+ Module Files
All slot definitions, class references, and mappings updated:
- `CustodianReconstruction``CustodianLegalStatus`
- Updated ontology mappings
- Updated descriptions to reflect multi-aspect architecture
---
## Validation & Generation
### Schema Validation ✅
```bash
gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml \
> schemas/20251121/rdf/01_custodian_multi_aspect.owl.ttl
```
**Result**: 2,630 lines, no critical errors
### RDF Generation ✅
All 4 formats generated from LinkML:
1. OWL/Turtle (160KB) - Primary
2. N-Triples (4KB)
3. JSON-LD (4KB)
4. RDF/XML (4KB)
### UML Generation ✅
```bash
gen-yuml schemas/20251121/linkml/01_custodian_name_modular.yaml \
> schemas/20251121/uml/mermaid/01_custodian_multi_aspect.mmd
```
**Result**: 745B Mermaid diagram
### Example Instance ✅
Complete multi-aspect example: `schemas/20251121/examples/multi_aspect_rijksmuseum_complete.yaml`
Demonstrates:
- 3 CustodianObservations (KvK, website, guidebook)
- 1 ReconstructionActivity (entity resolution)
- 3 generated aspects (LegalStatus, Name, Place)
- 1 Custodian hub aggregating all aspects
- PROV-O flow with confidence measures
---
## Use Cases: When to Use Each Aspect
### CustodianLegalStatus (Formal Legal Entity)
**Use when**:
- Custodian is formally registered (organization, corporation, government)
- You have legal name, registration number, legal form
- Precise legal identity matters (contracts, official records)
**Don't use when**:
- Informal groups (no legal registration)
- Historical entities before legal registration existed
- Unknown legal status
**Example**: "Stichting Rijksmuseum" (KvK 41215422)
### CustodianName (Emic Label)
**Use when**:
- You have how custodian presents itself
- Operational name differs from legal name
- Standardizing names across sources
**Always use** (every custodian has at least one name!)
**Example**: "Rijksmuseum" (emic label, not "Stichting Rijksmuseum")
### CustodianPlace (Nominal Place Designation)
**Use when**:
- Historical documents refer to custodian by place
- Place name identifies the custodian (not just locates it)
- Archival research needs place-based references
**Don't confuse with Location** (lat/lon coordinates)
**Example**: "het museum op het Museumplein" (nominal reference in 1920s guidebooks)
---
## Data Migration Guide
### Step 1: Update Existing CustodianReconstruction Instances
**Before**:
```yaml
custodian_reconstructions:
- id: https://w3id.org/heritage/recon/rijksmuseum
refers_to_custodian: ...
legal_name: "Stichting Rijksmuseum"
```
**After**:
```yaml
custodian_legal_statuses: # ← Renamed key
- id: https://w3id.org/heritage/legal/rijksmuseum # ← New ID pattern
refers_to_custodian: ...
legal_name:
full_name: "Stichting Rijksmuseum" # ← Now structured
```
### Step 2: Remove Direct Observation → Custodian Links
**Before**:
```yaml
custodian_observations:
- id: ...
observed_name: "Rijksmuseum"
refers_to_custodian: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804 # ← REMOVE THIS
```
**After**:
```yaml
custodian_observations:
- id: ...
observed_name: "Rijksmuseum"
# NO refers_to_custodian!
reconstruction_activities:
- id: ...
used:
- observation_id_here # ← Link via activity
```
### Step 3: Add Place Aspects (If Applicable)
If your sources reference custodians by place:
```yaml
custodian_places:
- id: https://w3id.org/heritage/place/your-institution
place_name: "het herenhuis in de Schilderswijk"
place_specificity: NEIGHBORHOOD
refers_to_custodian: ...
was_derived_from:
- observation_id
```
### Step 4: Update Custodian Hubs
Add new slots:
```yaml
custodians:
- hc_id: ...
preferred_label: name_id # Already existed
legal_status: legal_status_id # ← NEW
place_designation: place_id # ← NEW
```
---
## Ontology Alignment
### CustodianLegalStatus
- **Primary**: `org:FormalOrganization` (W3C Organization Ontology)
- **Exact**: `rico:CorporateBody`, `foaf:Organization`
- **Close**: `crm:E40_Legal_Body`, `cpov:PublicOrganisation`
- **For individuals**: `foaf:Person`, `crm:E21_Person`
### CustodianPlace
- **Primary**: `crm:E53_Place` (CIDOC-CRM place entity)
- **Exact**: `schema:Place`
- **Close**: `dcterms:Location`, `geo:Feature`
- **Related**: `crm:E27_Site`
### CustodianName
- **Primary**: `skos:Concept` (preferred label pattern)
- **Exact**: `crm:E41_Appellation`
- **Related**: `pico:PersonObservation` (PiCo emic/etic pattern)
---
## Testing & Validation
### Validation Commands
```bash
# Validate LinkML schema
gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml > /tmp/test.ttl
# Validate example instance
linkml-validate -s schemas/20251121/linkml/01_custodian_name_modular.yaml \
schemas/20251121/examples/multi_aspect_rijksmuseum_complete.yaml
```
### Verification Checklist
- [ ] Schema validates with no critical errors
- [ ] All three aspects present in RDF
- [ ] CustodianReconstruction fully replaced with CustodianLegalStatus
- [ ] No direct observation → custodian links
- [ ] Example instance validates
- [ ] RDF serializations match ontology mappings
### Verification Results (2025-11-22)
- ✅ 34 CustodianLegalStatus references in RDF
- ✅ 15 CustodianPlace references in RDF
- ✅ 21 PlaceSpecificityEnum references in RDF
- ✅ Schema validates (2,630 lines OWL/Turtle)
- ✅ All imports resolved
- ✅ Complete example instance created
---
## Future Work
### Immediate Next Steps
1. Migrate existing example instances to multi-aspect pattern
2. Create data migration scripts
3. Update all documentation
### Additional Aspects (Future Phases)
4. **Collection aspect** - Heritage materials held by custodian
5. **Event aspect** - Organizational change events (mergers, relocations)
6. **Person aspect** - Staff, curators (PiCo pattern for people)
### Long-term Integration
7. Full TOOI alignment (Dutch government organizations)
8. Full CPOV alignment (EU public sector)
9. Full CIDOC-CRM alignment (cultural heritage domain)
10. TypeDB schema generation from LinkML
---
## Key Takeaways
1. **Multi-aspect modeling** provides precision: Legal (precise) ≠ Name (ambiguous) ≠ Place (nominal)
2. **Independent temporal lifecycles**: Each aspect can change over time without affecting others
3. **Source transparency**: All aspects explicitly derived from observations via ReconstructionActivity
4. **PROV-O compliance**: Proper observation → activity → entity flow
5. **Flexibility**: Not all custodians have all aspects (informal groups lack legal status, etc.)
6. **Ontology alignment**: Better mapping to domain ontologies (CIDOC-CRM, PROV-O, W3C Org)
7. **Breaking change**: Requires data migration, but provides foundation for nuanced heritage metadata
---
**Document Version**: 1.0
**Schema Version**: 0.1.0
**Status**: ✅ COMPLETE IMPLEMENTATION
**Next Review**: After data migration + additional examples
---
For questions or clarifications, see:
- `QUICK_STATUS_CUSTODIAN_SCHEMA_MOD_20251122.md` - Quick reference
- `schemas/20251121/examples/multi_aspect_rijksmuseum_complete.yaml` - Complete example
- `schemas/20251121/linkml/modules/classes/` - Individual class definitions