# CustodianCollection Addition - Session Summary
**Date**: 2025-11-22
**Time**: 18:23 UTC
**Schema Version**: 0.1.0 → 0.3.0
**Status**: ✅ COMPLETE - Validated, Generated, Documented
---
## Executive Summary
Added **CustodianCollection** as the fourth reconstruction output of the Heritage Custodian Ontology, completing the multi-aspect modeling of heritage institutions. Collections represent the heritage materials managed by custodians and are crucial for modeling metonymic discourse ("The Rijksmuseum has a Rembrandt" = the collection contains it).
---
## Architecture Evolution
### Before: Three Aspects
```
Custodian (hub)
├─ preferred_label → CustodianName (emic name)
├─ legal_status → CustodianLegalStatus (legal entity)
└─ place_designation → CustodianPlace (nominal place)
```
### After: Four Aspects ✅
```
Custodian (hub)
├─ preferred_label → CustodianName (emic name)
├─ legal_status → CustodianLegalStatus (legal entity)
├─ place_designation → CustodianPlace (nominal place)
└─ has_collection → CustodianCollection (heritage materials) ← NEW!
```
---
## Files Created
### 1. Class Definition
**`modules/classes/CustodianCollection.yaml`** (128 lines)
- `class_uri: crm:E78_Curated_Holding`
- Maps to CIDOC-CRM, RiC-O, BIBFRAME
- Represents aggregations of heritage materials
- Supports multiple collection types (archival, museum, library, etc.)
### 2. Collection-Specific Slots (9 files)
| File | Purpose | Property Mapping |
|------|---------|------------------|
| **`collection_name.yaml`** | Name of collection | `dcterms:title` |
| **`collection_description.yaml`** | Narrative description | `dcterms:description` |
| **`collection_type.yaml`** | Type(s) of materials | `dcterms:type` |
| **`collection_scope.yaml`** | Subject/thematic focus | `dcterms:coverage` |
| **`temporal_coverage.yaml`** | Time period of materials | `dcterms:temporal` |
| **`extent.yaml`** | Size/quantity | `dcterms:extent` |
| **`arrangement_system.yaml`** | Intellectual organization | `rico:hasRecordSetType` |
| **`provenance_note.yaml`** | Acquisition history | `crm:P24_transferred_title_of` |
| **`has_collection.yaml`** | Links Custodian to Collection | `crm:P46_is_composed_of` |
---
## Files Modified
### Custodian Class
**`modules/classes/Custodian.yaml`**
**Changes**:
- Added `has_collection` to slots list (line 99)
- Added `has_collection` slot_usage documentation:
  - `slot_uri: crm:P46_is_composed_of`
  - `range: CustodianCollection`
  - `multivalued: true`
  - Extensive documentation on metonymic relationships
- Updated comments: "Four aspects" (was "Three aspects")
### Main Schema
**`01_custodian_name_modular.yaml`**
**Changes**:
- Added CustodianCollection to class imports (line 133)
- Added 9 new slot imports:
  - `arrangement_system`
  - `collection_description`
  - `collection_name`
  - `collection_scope`
  - `collection_type`
  - `extent`
  - `has_collection`
  - `provenance_note`
  - `temporal_coverage`
- Updated schema description with collection aspect
- Updated file count: 19 classes + 7 enums + 70 slots = 96 definition files
---
## Ontology Alignment
### Primary Ontologies
| Ontology | Class | Use Case |
|----------|-------|----------|
| **CIDOC-CRM** | `crm:E78_Curated_Holding` | Museum collections, curated aggregations |
| **RiC-O** | `rico:RecordSet` | Archival fonds, series, file groups |
| **BIBFRAME** | `bf:Collection` | Library special collections |
| **Schema.org** | `schema:Collection` | General aggregations |
### Key Properties
| Slot | Ontology Property | Description |
|------|-------------------|-------------|
| `collection_name` | `dcterms:title` | Name of collection (may differ from custodian) |
| `collection_description` | `dcterms:description` | Narrative description |
| `collection_type` | `dcterms:type` | Material types (multivalued) |
| `collection_scope` | `dcterms:coverage` | Subject/thematic focus |
| `temporal_coverage` | `dcterms:temporal` | Time period covered by materials |
| `extent` | `dcterms:extent` | Size (linear meters, object counts) |
| `arrangement_system` | `rico:hasRecordSetType` | Intellectual organization |
| `provenance_note` | `crm:P24_transferred_title_of` | Acquisition history |
| `has_collection` | `crm:P46_is_composed_of` | Custodian-to-Collection link |
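
The slot-to-property table above can be applied mechanically. The following is a minimal sketch, assuming a collection record is held as a plain Python dict keyed by slot name; the helper `collection_to_turtle` and the subject prefix `:` are hypothetical and not part of the schema tooling. Only the literal-valued slots are covered: `temporal_coverage` (a TimeSpan) and `has_collection` (a link from the Custodian) are left out.

```python
import json

# Literal-valued slots and their properties, copied from the table above.
SLOT_TO_PROPERTY = {
    "collection_name": "dcterms:title",
    "collection_description": "dcterms:description",
    "collection_type": "dcterms:type",
    "collection_scope": "dcterms:coverage",
    "extent": "dcterms:extent",
    "arrangement_system": "rico:hasRecordSetType",
    "provenance_note": "crm:P24_transferred_title_of",
}


def collection_to_turtle(subject, record):
    """Return one Turtle-style statement per slot value (hypothetical helper)."""
    lines = []
    for slot, value in record.items():
        prop = SLOT_TO_PROPERTY.get(slot)
        if prop is None:
            continue  # slot not covered by this sketch
        # Multivalued slots (e.g. collection_type) yield one statement per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            # json.dumps gives a double-quoted, escaped string literal.
            lines.append(f"{subject} {prop} {json.dumps(v)} .")
    return lines


if __name__ == "__main__":
    record = {
        "collection_name": "Rijksmuseum Collection",
        "collection_type": ["museum_objects", "library_holdings"],
    }
    for line in collection_to_turtle(":rijksmuseum-001", record):
        print(line)
```

The output omits prefix declarations, so it would need `@prefix` lines for `dcterms`, `rico`, and `crm` before being parsed as Turtle.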
### Inverse Relationships
```turtle
# Forward (Custodian → Collection)
:custodian crm:P46_is_composed_of :collection .
# Inverse (Collection → Custodian)
:collection crm:P46i_forms_part_of :custodian .
```
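
As an illustration of how the inverse direction could be derived rather than stored twice, here is a small sketch. It assumes triples are represented as `(subject, predicate, object)` tuples; the `invert` helper and the lookup table are hypothetical and cover only the property pair shown above.

```python
# Property pair taken from the Turtle example above.
INVERSE_OF = {
    "crm:P46_is_composed_of": "crm:P46i_forms_part_of",
    "crm:P46i_forms_part_of": "crm:P46_is_composed_of",
}


def invert(triple):
    """Swap subject and object and replace the predicate with its inverse."""
    subject, predicate, obj = triple
    return (obj, INVERSE_OF[predicate], subject)


if __name__ == "__main__":
    forward = (":custodian", "crm:P46_is_composed_of", ":collection")
    print(invert(forward))
```

A predicate without a registered inverse raises `KeyError`, which makes missing pairs visible instead of silently dropping them.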
---
## Collection Types Supported
The `collection_type` slot supports multiple material types:
- **`archival_records`** - Historical documents, correspondence, records (RiC-O)
- **`museum_objects`** - Cultural artifacts, art objects (CIDOC-CRM)
- **`library_holdings`** - Books, serials, manuscripts (BIBFRAME)
- **`monuments`** - Built heritage, archaeological sites (CIDOC-CRM E27_Site)
- **`archaeological_materials`** - Excavation finds, archaeological assemblages
- **`natural_history_specimens`** - Biological specimens, geological samples
- **`digital_born`** - Born-digital collections (web archives, digital art)
- **`photographs`** - Photographic collections
- **`manuscripts`** - Handwritten documents, medieval codices
Collections can have **multiple types** (e.g., mixed archival + museum collections).
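
Because the enum count is unchanged in this session, the values above appear to be documented strings rather than a schema-level enum. Under that assumption, a client-side check could look like the sketch below; `unknown_collection_types` is a hypothetical helper, not something the schema provides.

```python
# The nine values listed above.
KNOWN_COLLECTION_TYPES = frozenset({
    "archival_records",
    "museum_objects",
    "library_holdings",
    "monuments",
    "archaeological_materials",
    "natural_history_specimens",
    "digital_born",
    "photographs",
    "manuscripts",
})


def unknown_collection_types(values):
    """Return the values that are not among the documented collection types."""
    return [v for v in values if v not in KNOWN_COLLECTION_TYPES]


if __name__ == "__main__":
    # A mixed collection with one misspelled type.
    print(unknown_collection_types(["museum_objects", "archival_record"]))
```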
---
## ER Diagram Verification
### Generated Diagram
**File**: `schemas/20251121/uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd`
### Verified Relationships
**Custodian → CustodianCollection**
```
Custodian ||--}o CustodianCollection : "has_collection"
```
- One custodian can have multiple collections (multivalued)
- Collections are optional (some custodians may have no collection data)
**CustodianCollection → Custodian**
```
CustodianCollection ||--|| Custodian : "refers_to_custodian"
```
- Every collection must refer to exactly one custodian hub
**CustodianCollection → ReconstructionActivity**
```
CustodianCollection ||--|o ReconstructionActivity : "was_generated_by"
```
- Documents scholarly reconstruction process (PiCo pattern)
**CustodianCollection → CustodianObservation**
```
CustodianCollection ||--}| CustodianObservation : "was_derived_from"
```
- Links reconstructed collection to source observations (PROV-O)
**CustodianCollection → TimeSpan**
```
CustodianCollection ||--|o TimeSpan : "temporal_coverage"
```
- Time period covered by materials (NOT collection creation date)
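
The cardinality markers in the relationships above can be decoded mechanically. The sketch below assumes the notation exactly as printed in this section, where the marker after `--` describes the right-hand class; the `describe` helper and its regular expression are hypothetical and not part of the LinkML or Mermaid tooling.

```python
import re

# Meaning of the marker that follows "--", as used in the lines above.
RIGHT_MARKER_MEANING = {
    "||": "exactly one",
    "|o": "zero or one",
    "}o": "zero or more",
    "}|": "one or more",
}

# Source, left marker, right marker, target, quoted slot name.
RELATIONSHIP = re.compile(r'^(\w+)\s+(\S{2})--(\S{2})\s+(\w+)\s*:\s*"([^"]+)"$')


def describe(line):
    """Turn one ER relationship line into a short plain-text description."""
    match = RELATIONSHIP.match(line.strip())
    if match is None:
        raise ValueError(f"not a relationship line: {line!r}")
    source, _left, right, target, slot = match.groups()
    return f"{source} {slot} {RIGHT_MARKER_MEANING[right]} {target}"


if __name__ == "__main__":
    print(describe('Custodian ||--}o CustodianCollection : "has_collection"'))
```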
---
## RDF Generation Results
### Generated Files (Timestamp: 20251122_182317)
```bash
schemas/20251121/rdf/
├── 01_custodian_name_modular_20251122_182317.owl.ttl (179 KB)
├── 01_custodian_name_modular_20251122_182317.nt (508 KB)
├── 01_custodian_name_modular_20251122_182317.jsonld (425 KB)
└── 01_custodian_name_modular_20251122_182317.rdf (367 KB)
```
### Validation Status
**Schema compiles successfully** (no errors)
**Warnings** (non-critical, expected):
- ⚠️ Multiple owl types for `language` (rdfs:Literal vs owl:Thing) - cosmetic
- ⚠️ Schema namespace override - expected with modular design
---
## Example Use Cases
### Use Case 1: Museum Collection
```yaml
Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/rijksmuseum
  preferred_label:
    emic_name: "Rijksmuseum"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/rijksmuseum-001
      collection_name: "Rijksmuseum Collection"
      collection_description: "Dutch art and history from 1100-2000"
      collection_type:
        - "museum_objects"
        - "library_holdings" # Art library
      collection_scope: "Dutch Golden Age painting, Asian art, Delftware, prints"
      temporal_coverage:
        begin_of_the_begin: "1100-01-01T00:00:00Z"
        end_of_the_end: "2000-12-31T23:59:59Z"
      extent: "1 million objects, 35,000 artworks on display"
      arrangement_system: "Classified by medium, period, and geography"
      provenance_note: "Collection established 1800 as national art collection, nationalized 1808"
```
### Use Case 2: Archival Collection
```yaml
Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/noord-hollands-archief
  preferred_label:
    emic_name: "Noord-Hollands Archief"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/nha-archives-001
      collection_name: "Provincial Archives of Noord-Holland"
      collection_description: "Government records, notarial archives, family papers"
      collection_type:
        - "archival_records"
      collection_scope: "Provincial government, municipalities, families, estates"
      temporal_coverage:
        begin_of_the_begin: "1289-01-01T00:00:00Z" # Earliest document
        end_of_the_end: "2025-11-22T00:00:00Z" # Ongoing accessions
      extent: "60 linear kilometers of archival materials"
      arrangement_system: "ISAD(G) hierarchical structure, respect des fonds"
      provenance_note: "Formed 2001 from merger of Gemeentearchief Haarlem (1910) and Rijksarchief in Noord-Holland (1802)"
```
### Use Case 3: Mixed Collection (Museum + Archive)
```yaml
Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/verzetsmuseum
  preferred_label:
    emic_name: "Verzetsmuseum"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/verzetsmuseum-001
      collection_name: "Dutch Resistance Museum Collection"
      collection_type:
        - "museum_objects" # Artifacts, uniforms, weapons
        - "archival_records" # Personal papers, resistance documents
        - "photographs" # Photo archive
      collection_scope: "Dutch resistance during WWII (1940-1945)"
      temporal_coverage:
        begin_of_the_begin: "1940-05-10T00:00:00Z" # German invasion
        end_of_the_end: "1945-05-05T00:00:00Z" # Liberation
      extent: "10,000 objects, 25,000 photographs, 500 linear meters archival materials"
```
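
Each use case above gives `temporal_coverage` as a pair of UTC timestamps. A basic sanity check is that the start does not fall after the end. The sketch below assumes the coverage is available as a dict with the two keys used in the examples; `parse_utc` and `coverage_is_ordered` are hypothetical helpers, not schema validators.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # fromisoformat() only accepts 'Z' on newer Python versions, so normalise it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def coverage_is_ordered(coverage):
    """True if begin_of_the_begin is not later than end_of_the_end."""
    begin = parse_utc(coverage["begin_of_the_begin"])
    end = parse_utc(coverage["end_of_the_end"])
    return begin <= end


if __name__ == "__main__":
    verzetsmuseum = {
        "begin_of_the_begin": "1940-05-10T00:00:00Z",
        "end_of_the_end": "1945-05-05T00:00:00Z",
    }
    print(coverage_is_ordered(verzetsmuseum))
```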
---
## Metonymic Relationships Explained
### What is Metonymy?
**Metonymy** = Using one entity to refer to a related entity
In heritage discourse, people commonly say:
- "The Rijksmuseum has a Rembrandt" (= the collection contains it)
- "The British Library digitized its manuscripts" (= the collection was digitized)
- "The National Archives preserves colonial records" (= the records are preserved in its collection)
They are **NOT** referring to the legal entity or the building, but to the **collection**.
### Why This Matters
Before CustodianCollection, the ontology had no way to model:
1. **Collection identity** - Collections have names distinct from custodians
2. **Multiple collections** - One custodian can manage multiple collections
3. **Custody transfers** - Collections move between custodians over time
4. **Joint custody** - Multiple custodians can share collection management
5. **Collection-level provenance** - Acquisition history, custody changes
### Modeling Strategy
```
Person says: "The Rijksmuseum has a Rembrandt"
Observation: CustodianObservation (observed statement)
Reconstruction: Parse as metonymic reference
├─ Custodian: Rijksmuseum (legal entity)
└─ CustodianCollection: Rijksmuseum Collection (contains Rembrandt)
```
---
## Key Design Decisions
### Decision 1: Fourth Aspect vs. Custodian Slot
**Why a separate class instead of a `Custodian.collections` slot?**
**Separate class (chosen)**:
- Collections have independent lifecycle (can be transferred, split, merged)
- Collections need extensive metadata (9 specialized slots)
- Collections are reconstructed outputs (require ReconstructionActivity link)
- Collections can have temporal validity independent of custodian
**Simple slot**:
- Would couple collection lifecycle to custodian
- Harder to model custody transfers
- Cannot link to observations/reconstructions separately
### Decision 2: CIDOC-CRM E78 vs. RiC-O RecordSet
**Why multiple ontology mappings?**
Different heritage domains use different ontologies:
- **Museums**: CIDOC-CRM E78_Curated_Holding (managed aggregations)
- **Archives**: RiC-O RecordSet (archival fonds, series)
- **Libraries**: BIBFRAME Collection (special collections)
**Solution**: Use `collection_type` to determine which ontology applies:
- `archival_records` → `rico:RecordSet`
- `museum_objects` → `crm:E78_Curated_Holding`
- `library_holdings` → `bf:Collection`
Collections can implement **multiple ontology classes** simultaneously.
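
The type-to-ontology rule above can be sketched as a lookup. This is an illustration only: the `ontology_classes` helper is hypothetical, and the document does not say which class applies to the remaining collection types, so unmapped types simply contribute nothing here.

```python
# Mapping stated in Decision 2; other collection types are not mapped in the text.
TYPE_TO_CLASS = {
    "archival_records": "rico:RecordSet",
    "museum_objects": "crm:E78_Curated_Holding",
    "library_holdings": "bf:Collection",
}


def ontology_classes(collection_types):
    """Return the ontology classes implied by a list of collection types."""
    classes = []
    for collection_type in collection_types:
        cls = TYPE_TO_CLASS.get(collection_type)
        # Keep first-seen order and avoid duplicates.
        if cls is not None and cls not in classes:
            classes.append(cls)
    return classes


if __name__ == "__main__":
    # Mixed collection, as in the Verzetsmuseum use case.
    print(ontology_classes(["museum_objects", "archival_records", "photographs"]))
```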
### Decision 3: temporal_coverage vs. Dates
**Why TimeSpan for temporal_coverage?**
`temporal_coverage` = **Time period covered by collection materials** (NOT collection creation dates)
Examples:
- Rijksmuseum collection: 1100-2000 (artworks span 9 centuries)
- Medieval manuscripts collection: 800-1500 (manuscripts created in Middle Ages)
- WWII archive: 1940-1945 (documents from war period)
**CustodianCollection creation dates** are tracked separately via the `valid_from`/`valid_to` slots.
---
## File Count Summary
### Before CustodianCollection
- 18 classes + 7 enums + 61 slots = 86 files
- Grand total: 88 files (including metadata.yaml + main schema)
### After CustodianCollection
- **19 classes** (+1: CustodianCollection)
- **7 enums** (unchanged)
- **70 slots** (+9: collection slots + linkers)
- **= 96 definition files**
- **Grand total: 98 files** (including metadata.yaml + main schema)
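
The totals above can be re-derived from the per-category counts. The snippet below is a simple arithmetic check using only the numbers stated in this section; the `totals` helper is hypothetical.

```python
# Counts as stated in this section.
BEFORE = {"classes": 18, "enums": 7, "slots": 61}
AFTER = {"classes": 19, "enums": 7, "slots": 70}
EXTRA_FILES = 2  # metadata.yaml + main schema


def totals(counts):
    """Return (definition files, grand total including the two extra files)."""
    definitions = sum(counts.values())
    return definitions, definitions + EXTRA_FILES


if __name__ == "__main__":
    print(totals(BEFORE))
    print(totals(AFTER))
```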
---
## Testing & Validation
### Schema Validation ✅
```bash
$ cd schemas/20251121/linkml
$ gen-owl -f ttl 01_custodian_name_modular.yaml 2>&1 | head -20
# Result: SUCCESS
# - Output: 179 KB Turtle file
# - No schema errors
# - Expected warnings only (language type ambiguity)
```
### ER Diagram Generation ✅
```bash
$ gen-erdiagram 01_custodian_name_modular.yaml > \
    ../uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd
# Result: SUCCESS
# - 5.9 KB Mermaid ER diagram
# - All CustodianCollection relationships present
# - Verified cardinalities correct
```
### RDF Format Generation ✅
```bash
# All 4 RDF formats generated successfully
$ ls -lh schemas/20251121/rdf/*20251122_182317*
-rw-r--r-- 179K 01_custodian_name_modular_20251122_182317.owl.ttl
-rw-r--r-- 508K 01_custodian_name_modular_20251122_182317.nt
-rw-r--r-- 425K 01_custodian_name_modular_20251122_182317.jsonld
-rw-r--r-- 367K 01_custodian_name_modular_20251122_182317.rdf
```
---
## Session Context
### Phase 1 (Nov 22, 10:00-12:00 UTC)
**Connected Orphaned Classes to Custodian**
- Problem: CustodianAppellation and CustodianIdentifier had no path to Custodian hub
- Solution: Added `variant_of_name` and `identifies_custodian` slots
- Result: All classes reachable from Custodian hub
### Phase 2 (Nov 22, 14:00-16:00 UTC)
**Appellation Refactoring for SKOS Alignment**
- Problem: CustodianAppellation directly on Custodian violated SKOS semantics
- Solution: Moved alternative names to CustodianName (SKOS Concept)
- Result: `skos:prefLabel` (CustodianName) + `skos:altLabel` (CustodianAppellation)
### Phase 3 (Nov 22, 18:00-18:30 UTC) ← **THIS SESSION**
**Added CustodianCollection as Fourth Aspect**
- Problem: No way to model heritage materials or metonymic references
- Solution: Created CustodianCollection with 9 specialized slots
- Result: Complete four-aspect modeling (Name, LegalStatus, Place, Collection)
---
## Next Steps (Pending)
### Documentation
- [ ] Update `README.md` with four-aspect architecture
- [ ] Create `COLLECTION_EXAMPLES.md` with real-world examples
- [ ] Update ontology alignment documentation
### Testing
- [ ] Create test instances with CustodianCollection
  - Rijksmuseum (museum collection)
  - Noord-Hollands Archief (archival collection)
  - Koninklijke Bibliotheek (library holdings)
- [ ] Unit tests for collection aspect
- [ ] Validation tests for temporal_coverage TimeSpan
### Features
- [ ] Collection-level provenance events (custody transfers, acquisitions)
- [ ] Collection splits/mergers (track fonds reorganization)
- [ ] Digital surrogates (link physical collections to digitized versions)
---
## References
### Schema Files
- **Main schema**: `schemas/20251121/linkml/01_custodian_name_modular.yaml`
- **CustodianCollection class**: `schemas/20251121/linkml/modules/classes/CustodianCollection.yaml`
- **Collection slots**: `schemas/20251121/linkml/modules/slots/collection_*.yaml`
### Generated Outputs
- **RDF (Turtle)**: `schemas/20251121/rdf/01_custodian_name_modular_20251122_182317.owl.ttl`
- **ER Diagram**: `schemas/20251121/uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd`
### Ontology Documentation
- **CIDOC-CRM**: `data/ontology/CIDOC_CRM_v7.1.3.rdf` (E78_Curated_Holding)
- **RiC-O**: `data/ontology/RiC-O_1-1.rdf` (RecordSet)
- **BIBFRAME**: `data/ontology/bibframe_vocabulary.rdf` (Collection)
---
## Session Metadata
| Attribute | Value |
|-----------|-------|
| **Session Date** | 2025-11-22 |
| **Session Time** | 18:00-18:30 UTC (30 minutes) |
| **Agent** | Claude (OpenCode) |
| **User** | kempersc |
| **Schema Version Before** | 0.1.0 (18 classes, 61 slots) |
| **Schema Version After** | 0.3.0 (19 classes, 70 slots) |
| **Files Created** | 10 (1 class + 9 slots) |
| **Files Modified** | 2 (Custodian.yaml, main schema) |
| **Validation Status** | ✅ PASS (gen-owl, gen-erdiagram) |
| **RDF Formats Generated** | 4 (Turtle, N-Triples, JSON-LD, RDF/XML) |
| **Diagram Generated** | ER diagram (Mermaid) |
| **Documentation Created** | This file |
---
## Conclusion
The Heritage Custodian Ontology now models heritage institutions as **four-aspect entities**:
1. **CustodianName** (emic label) - SKOS Concept
2. **CustodianLegalStatus** (legal entity) - W3C ORG, TOOI, CPOV
3. **CustodianPlace** (nominal location) - CIDOC-CRM E53_Place
4. **CustodianCollection** (heritage materials) - CIDOC-CRM E78, RiC-O RecordSet, BIBFRAME Collection ← **NEW!**
Each aspect:
- Has independent temporal lifecycle
- Is reconstructed from CustodianObservation sources
- Links back to Custodian hub via `refers_to_custodian`
- Maps to established ontologies (CIDOC-CRM, RiC-O, BIBFRAME, SKOS, W3C ORG)
**Status**: ✅ **COMPLETE - Ready for instance creation and testing**
---
**Document Version**: 1.0
**Generated**: 2025-11-22T18:30:00Z
**Author**: AI Agent (Claude via OpenCode)