glam/CUSTODIAN_COLLECTION_ADDITION_20251122.md

CustodianCollection Addition - Session Summary

Date: 2025-11-22
Time: 18:23 UTC
Schema Version: 0.1.0 → 0.3.0
Status: COMPLETE - Validated, Generated, Documented


Executive Summary

Added CustodianCollection as the fourth reconstruction output of the Heritage Custodian Ontology, completing the multi-aspect modeling of heritage institutions. Collections represent the heritage materials managed by custodians and are crucial for modeling metonymic discourse ("The Rijksmuseum has a Rembrandt" = the collection contains it).


Architecture Evolution

Before: Three Aspects

Custodian (hub)
  ├─ preferred_label → CustodianName (emic name)
  ├─ legal_status → CustodianLegalStatus (legal entity)
  └─ place_designation → CustodianPlace (nominal place)

After: Four Aspects

Custodian (hub)
  ├─ preferred_label → CustodianName (emic name)
  ├─ legal_status → CustodianLegalStatus (legal entity)
  ├─ place_designation → CustodianPlace (nominal place)
  └─ has_collection → CustodianCollection (heritage materials) ← NEW!
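
The hub-and-aspect layout above can be sketched with plain dataclasses. This is only an illustration of the shape, not the LinkML-generated model: the aspect values are reduced to strings, and only two of the collection slots are shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustodianCollection:
    """Heritage materials managed by a custodian (the new fourth aspect)."""
    collection_name: str
    collection_type: List[str] = field(default_factory=list)


@dataclass
class Custodian:
    """Hub entity; each attribute mirrors one aspect slot from the tree above."""
    hc_id: str
    preferred_label: Optional[str] = None     # -> CustodianName (simplified to str)
    legal_status: Optional[str] = None        # -> CustodianLegalStatus (simplified)
    place_designation: Optional[str] = None   # -> CustodianPlace (simplified)
    # New in this session: multivalued link to the collection aspect.
    has_collection: List[CustodianCollection] = field(default_factory=list)


rijks = Custodian(
    hc_id="https://nde.nl/ontology/hc/cust/rijksmuseum",
    preferred_label="Rijksmuseum",
)
rijks.has_collection.append(
    CustodianCollection("Rijksmuseum Collection", ["museum_objects"])
)
```

Because has_collection defaults to an empty list, a custodian without collection data remains valid, which matches the optional, multivalued slot described in this summary.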

Files Created

1. Class Definition

modules/classes/CustodianCollection.yaml (128 lines)

  • class_uri: crm:E78_Curated_Holding
  • Maps to CIDOC-CRM, RiC-O, BIBFRAME
  • Represents aggregations of heritage materials
  • Supports multiple collection types (archival, museum, library, etc.)

2. Collection-Specific Slots (9 files)

| File | Purpose | Property Mapping |
|------|---------|------------------|
| collection_name.yaml | Name of collection | dcterms:title |
| collection_description.yaml | Narrative description | dcterms:description |
| collection_type.yaml | Type(s) of materials | dcterms:type |
| collection_scope.yaml | Subject/thematic focus | dcterms:coverage |
| temporal_coverage.yaml | Time period of materials | dcterms:temporal |
| extent.yaml | Size/quantity | dcterms:extent |
| arrangement_system.yaml | Intellectual organization | rico:hasRecordSetType |
| provenance_note.yaml | Acquisition history | crm:P24_transferred_title_of |
| has_collection.yaml | Links Custodian to Collection | crm:P46_is_composed_of |

Files Modified

Custodian Class

modules/classes/Custodian.yaml

Changes:

  • Added has_collection to slots list (line 99)
  • Added has_collection slot_usage documentation:
    • slot_uri: crm:P46_is_composed_of
    • range: CustodianCollection
    • multivalued: true
    • Extensive documentation on metonymic relationships
  • Updated comments: "Four aspects" (was "Three aspects")

Main Schema

01_custodian_name_modular.yaml

Changes:

  • Added CustodianCollection to class imports (line 133)
  • Added 9 new slot imports:
    • arrangement_system
    • collection_description
    • collection_name
    • collection_scope
    • collection_type
    • extent
    • has_collection
    • provenance_note
    • temporal_coverage
  • Updated schema description with collection aspect
  • Updated file count: 19 classes + 7 enums + 70 slots = 96 definition files

Ontology Alignment

Primary Ontologies

| Ontology | Class | Use Case |
|----------|-------|----------|
| CIDOC-CRM | crm:E78_Curated_Holding | Museum collections, curated aggregations |
| RiC-O | rico:RecordSet | Archival fonds, series, file groups |
| BIBFRAME | bf:Collection | Library special collections |
| Schema.org | schema:Collection | General aggregations |

Key Properties

| Slot | Ontology Property | Description |
|------|-------------------|-------------|
| collection_name | dcterms:title | Name of collection (may differ from custodian) |
| collection_description | dcterms:description | Narrative description |
| collection_type | dcterms:type | Material types (multivalued) |
| collection_scope | dcterms:coverage | Subject/thematic focus |
| temporal_coverage | dcterms:temporal | Time period covered by materials |
| extent | dcterms:extent | Size (linear meters, object counts) |
| arrangement_system | rico:hasRecordSetType | Intellectual organization |
| provenance_note | crm:P24_transferred_title_of | Acquisition history |
| has_collection | crm:P46_is_composed_of | Custodian-to-Collection link |
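
The slot-to-property mapping in this table can be applied mechanically. The sketch below is a hypothetical helper (to_statements is not part of the schema tooling) that turns a collection record into (subject, property, value) tuples using only the mappings listed above.

```python
# Slot names and properties copied from the Key Properties table.
SLOT_TO_PROPERTY = {
    "collection_name": "dcterms:title",
    "collection_description": "dcterms:description",
    "collection_type": "dcterms:type",
    "collection_scope": "dcterms:coverage",
    "temporal_coverage": "dcterms:temporal",
    "extent": "dcterms:extent",
    "arrangement_system": "rico:hasRecordSetType",
    "provenance_note": "crm:P24_transferred_title_of",
    "has_collection": "crm:P46_is_composed_of",
}


def to_statements(subject, record):
    """Return (subject, property, value) tuples for every mapped slot in record.

    List values (multivalued slots) yield one statement per item;
    keys without a mapping are skipped.
    """
    statements = []
    for slot, value in record.items():
        prop = SLOT_TO_PROPERTY.get(slot)
        if prop is None:
            continue  # not one of the nine collection slots
        values = value if isinstance(value, list) else [value]
        statements.extend((subject, prop, v) for v in values)
    return statements
```

Nested values such as temporal_coverage are passed through unchanged here; a real serializer would need to expand them into their own nodes.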

Inverse Relationships

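@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix :    <https://nde.nl/ontology/hc/> .  # illustrative base, taken from the example IDs

# Forward (Custodian → Collection)
:custodian crm:P46_is_composed_of :collection .

# Inverse (Collection → Custodian)
:collection crm:P46i_forms_part_of :custodian .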
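
As a minimal sketch of how the inverse statement could be derived rather than stored by hand, the helper below (add_inverses, a hypothetical name) works on plain tuples and knows only the P46/P46i pair shown above.

```python
# The only inverse pair this summary documents.
INVERSE_OF = {
    "crm:P46_is_composed_of": "crm:P46i_forms_part_of",
    "crm:P46i_forms_part_of": "crm:P46_is_composed_of",
}


def add_inverses(triples):
    """Return the input triples plus the inverse of each one that has a known inverse."""
    result = set(triples)
    for s, p, o in triples:
        inverse = INVERSE_OF.get(p)
        if inverse is not None:
            result.add((o, inverse, s))  # swap subject and object
    return result
```

Calling add_inverses on the forward triple alone yields both directions, so only one of the two needs to be asserted in instance data.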

Collection Types Supported

The collection_type slot supports multiple material types:

  • archival_records - Historical documents, correspondence, records (RiC-O)
  • museum_objects - Cultural artifacts, art objects (CIDOC-CRM)
  • library_holdings - Books, serials, manuscripts (BIBFRAME)
  • monuments - Built heritage, archaeological sites (CIDOC-CRM E27_Site)
  • archaeological_materials - Excavation finds, archaeological assemblages
  • natural_history_specimens - Biological specimens, geological samples
  • digital_born - Born-digital collections (web archives, digital art)
  • photographs - Photographic collections
  • manuscripts - Handwritten documents, medieval codices

Collections can have multiple types (e.g., mixed archival + museum collections).
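
A quick way to catch typos in collection_type values is to compare them against the nine names listed above. This summary does not state whether the schema enforces the list as an enum, so the check below is only an assumption-laden sketch with a hypothetical function name.

```python
# The nine material types listed in this section.
COLLECTION_TYPES = frozenset({
    "archival_records",
    "museum_objects",
    "library_holdings",
    "monuments",
    "archaeological_materials",
    "natural_history_specimens",
    "digital_born",
    "photographs",
    "manuscripts",
})


def unknown_collection_types(values):
    """Return the entries of values that are not in the documented list, in input order."""
    return [v for v in values if v not in COLLECTION_TYPES]
```

A mixed collection such as ["museum_objects", "archival_records"] returns an empty list, while a misspelled or undocumented type is returned for review.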


ER Diagram Verification

Generated Diagram

File: schemas/20251121/uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd

Verified Relationships

Custodian → CustodianCollection

Custodian ||--}o CustodianCollection : "has_collection"
  • One custodian can have multiple collections (multivalued)
  • Collections are optional (some custodians may have no collection data)

CustodianCollection → Custodian

CustodianCollection ||--|| Custodian : "refers_to_custodian"
  • Every collection must refer to exactly one custodian hub

CustodianCollection → ReconstructionActivity

CustodianCollection ||--|o ReconstructionActivity : "was_generated_by"
  • Documents scholarly reconstruction process (PiCo pattern)

CustodianCollection → CustodianObservation

CustodianCollection ||--}| CustodianObservation : "was_derived_from"
  • Links reconstructed collection to source observations (PROV-O)

CustodianCollection → TimeSpan

CustodianCollection ||--|o TimeSpan : "temporal_coverage"
  • Time period covered by materials (NOT collection creation date)
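
The relationship lines above can be checked programmatically. The sketch below parses one line and reports the cardinality on the target side. The readings of "}o" and "||" follow the bullets above; reading "|o" as zero-or-one and "}|" as one-or-more is an assumption based on standard crow's-foot notation. parse_relationship is a hypothetical helper, not part of gen-erdiagram.

```python
import re

# Target-side marker -> (minimum, maximum); None means unbounded.
RIGHT_MARKER = {
    "||": (1, 1),      # exactly one
    "|o": (0, 1),      # zero or one (assumed reading)
    "}o": (0, None),   # zero or more
    "}|": (1, None),   # one or more (assumed reading)
}

# Matches lines of the form:  Source ||--}o Target : "slot_name"
LINE = re.compile(r'^(\w+)\s+(\S{2})--(\S{2})\s+(\w+)\s*:\s*"([^"]+)"$')


def parse_relationship(line):
    """Parse one ER relationship line into source, target, slot, and target cardinality."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a relationship line: {line!r}")
    source, _left, right, target, slot = match.groups()
    minimum, maximum = RIGHT_MARKER[right]
    return {"source": source, "target": target, "slot": slot,
            "min": minimum, "max": maximum}
```

Running this over every relationship line in the generated .mmd file gives a simple regression check that the has_collection link stays optional and multivalued.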

RDF Generation Results

Generated Files (Timestamp: 20251122_182317)

schemas/20251121/rdf/
├── 01_custodian_name_modular_20251122_182317.owl.ttl  (179 KB)
├── 01_custodian_name_modular_20251122_182317.nt       (508 KB)
├── 01_custodian_name_modular_20251122_182317.jsonld   (425 KB)
└── 01_custodian_name_modular_20251122_182317.rdf      (367 KB)

Validation Status

Schema compiles successfully (no errors)

Warnings (non-critical, expected):

  • ⚠️ Multiple owl types for language (rdfs:Literal vs owl:Thing) - cosmetic
  • ⚠️ Schema namespace override - expected with modular design

Example Use Cases

Use Case 1: Museum Collection

Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/rijksmuseum
  preferred_label:
    emic_name: "Rijksmuseum"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/rijksmuseum-001
      collection_name: "Rijksmuseum Collection"
      collection_description: "Dutch art and history from 1100-2000"
      collection_type:
        - "museum_objects"
        - "library_holdings"  # Art library
      collection_scope: "Dutch Golden Age painting, Asian art, Delftware, prints"
      temporal_coverage:
        begin_of_the_begin: "1100-01-01T00:00:00Z"
        end_of_the_end: "2000-12-31T23:59:59Z"
      extent: "1 million objects, 35,000 artworks on display"
      arrangement_system: "Classified by medium, period, and geography"
      provenance_note: "Collection established 1800 as national art collection, nationalized 1808"

Use Case 2: Archival Collection

Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/noord-hollands-archief
  preferred_label:
    emic_name: "Noord-Hollands Archief"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/nha-archives-001
      collection_name: "Provincial Archives of Noord-Holland"
      collection_description: "Government records, notarial archives, family papers"
      collection_type:
        - "archival_records"
      collection_scope: "Provincial government, municipalities, families, estates"
      temporal_coverage:
        begin_of_the_begin: "1289-01-01T00:00:00Z"  # Earliest document
        end_of_the_end: "2025-11-22T00:00:00Z"      # Ongoing accessions
      extent: "60 linear kilometers of archival materials"
      arrangement_system: "ISAD(G) hierarchical structure, respect des fonds"
      provenance_note: "Formed 2001 from merger of Gemeentearchief Haarlem (1910) and Rijksarchief in Noord-Holland (1802)"

Use Case 3: Mixed Collection (Museum + Archive)

Custodian:
  hc_id: https://nde.nl/ontology/hc/cust/verzetsmuseum
  preferred_label:
    emic_name: "Verzetsmuseum"
  has_collection:
    - id: https://nde.nl/ontology/hc/collection/verzetsmuseum-001
      collection_name: "Dutch Resistance Museum Collection"
      collection_type:
        - "museum_objects"      # Artifacts, uniforms, weapons
        - "archival_records"    # Personal papers, resistance documents
        - "photographs"         # Photo archive
      collection_scope: "Dutch resistance during WWII (1940-1945)"
      temporal_coverage:
        begin_of_the_begin: "1940-05-10T00:00:00Z"  # German invasion
        end_of_the_end: "1945-05-05T00:00:00Z"      # Liberation
      extent: "10,000 objects, 25,000 photographs, 500 linear meters archival materials"
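
All three use cases carry a temporal_coverage with begin_of_the_begin and end_of_the_end. A basic sanity check is that the first does not fall after the second. The sketch below uses only the standard library; the function names are illustrative and not part of the schema tooling.

```python
from datetime import datetime


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as '+00:00'
    so that Python versions before 3.11 also accept it."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def coverage_is_ordered(temporal_coverage):
    """Return True when begin_of_the_begin is not later than end_of_the_end."""
    begin = parse_utc(temporal_coverage["begin_of_the_begin"])
    end = parse_utc(temporal_coverage["end_of_the_end"])
    return begin <= end
```

For the Verzetsmuseum example, the coverage from 1940-05-10 to 1945-05-05 passes this check; swapping the two values would fail it.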

Metonymic Relationships Explained

What is Metonymy?

Metonymy = Using one entity to refer to a related entity

In heritage discourse, people commonly say:

  • "The Rijksmuseum has a Rembrandt" (= the collection contains it)
  • "The British Library digitized its manuscripts" (= the collection was digitized)
  • "The National Archives preserves colonial records" (= the collection preserves them)

They are NOT referring to the legal entity or the building, but to the collection.

Why This Matters

Before CustodianCollection, the ontology had no way to model:

  1. Collection identity - Collections have names distinct from custodians
  2. Multiple collections - One custodian can manage multiple collections
  3. Custody transfers - Collections move between custodians over time
  4. Joint custody - Multiple custodians can share collection management
  5. Collection-level provenance - Acquisition history, custody changes

Modeling Strategy

Person says: "The Rijksmuseum has a Rembrandt"
              ↓
Observation:  CustodianObservation (observed statement)
              ↓
Reconstruction: Parse as metonymic reference
              ↓
              ├─ Custodian: Rijksmuseum (legal entity)
              └─ CustodianCollection: Rijksmuseum Collection (contains Rembrandt)

Key Design Decisions

Decision 1: Fourth Aspect vs. Custodian Slot

Why a separate class instead of a Custodian.collections slot?

Separate class (chosen):

  • Collections have independent lifecycle (can be transferred, split, merged)
  • Collections need extensive metadata (9 specialized slots)
  • Collections are reconstructed outputs (require ReconstructionActivity link)
  • Collections can have temporal validity independent of custodian

Simple slot (rejected):

  • Would couple collection lifecycle to custodian
  • Harder to model custody transfers
  • Cannot link to observations/reconstructions separately

Decision 2: CIDOC-CRM E78 vs. RiC-O RecordSet

Why multiple ontology mappings?

Different heritage domains use different ontologies:

  • Museums: CIDOC-CRM E78_Curated_Holding (managed aggregations)
  • Archives: RiC-O RecordSet (archival fonds, series)
  • Libraries: BIBFRAME Collection (special collections)

Solution: Use collection_type to determine which ontology applies:

  • archival_records → rico:RecordSet
  • museum_objects → crm:E78_Curated_Holding
  • library_holdings → bf:Collection

Collections can implement multiple ontology classes simultaneously.
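
The type-to-class rule can be sketched as a lookup. The three mappings come from the list above. Falling back to crm:E78_Curated_Holding (the class_uri of CustodianCollection) for types without a listed mapping is an assumption made for this sketch, not something the schema is documented to do.

```python
# Mappings listed under Decision 2.
TYPE_TO_CLASS = {
    "archival_records": "rico:RecordSet",
    "museum_objects": "crm:E78_Curated_Holding",
    "library_holdings": "bf:Collection",
}

# Assumed fallback: the class_uri declared in CustodianCollection.yaml.
DEFAULT_CLASS = "crm:E78_Curated_Holding"


def ontology_classes(collection_types):
    """Return the set of ontology classes implied by a list of collection types."""
    classes = {TYPE_TO_CLASS[t] for t in collection_types if t in TYPE_TO_CLASS}
    return classes or {DEFAULT_CLASS}
```

A mixed museum and archive collection therefore maps to both crm:E78_Curated_Holding and rico:RecordSet, which is the "multiple ontology classes simultaneously" case.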

Decision 3: temporal_coverage vs. Dates

Why TimeSpan for temporal_coverage?

temporal_coverage = Time period covered by collection materials (NOT collection creation dates)

Examples:

  • Rijksmuseum collection: 1100-2000 (artworks span 9 centuries)
  • Medieval manuscripts collection: 800-1500 (manuscripts created in Middle Ages)
  • WWII archive: 1940-1945 (documents from war period)

CustodianCollection creation dates are tracked separately via the valid_from/valid_to slots.


File Count Summary

Before CustodianCollection

  • 18 classes + 7 enums + 61 slots = 86 files
  • Grand total: 88 files (including metadata.yaml + main schema)

After CustodianCollection

  • 19 classes (+1: CustodianCollection)
  • 7 enums (unchanged)
  • 70 slots (+9: collection slots + linkers)
  • = 96 definition files
  • Grand total: 98 files (including metadata.yaml + main schema)
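
The totals above can be re-derived in a few lines, which is handy when the counts change again. The numbers are the ones stated in this section; the helper name is illustrative.

```python
def definition_files(classes, enums, slots):
    """Number of definition files: one file per class, enum, and slot."""
    return classes + enums + slots


EXTRA_FILES = 2  # metadata.yaml + main schema

before = definition_files(classes=18, enums=7, slots=61)
after = definition_files(classes=19, enums=7, slots=70)

print(before, before + EXTRA_FILES)  # definition files and grand total, before
print(after, after + EXTRA_FILES)    # definition files and grand total, after
print(after - before)                # files added: 1 class + 9 slots
```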

Testing & Validation

Schema Validation

$ cd schemas/20251121/linkml
$ gen-owl -f ttl 01_custodian_name_modular.yaml 2>&1 | head -20

# Result: SUCCESS
# - Output: 179 KB Turtle file
# - No schema errors
# - Expected warnings only (language type ambiguity)

ER Diagram Generation

$ gen-erdiagram 01_custodian_name_modular.yaml > \
  ../uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd

# Result: SUCCESS
# - 5.9 KB Mermaid ER diagram
# - All CustodianCollection relationships present
# - Verified cardinalities correct

RDF Format Generation

# All 4 RDF formats generated successfully
$ ls -lh schemas/20251121/rdf/*20251122_182317*
-rw-r--r--  179K  01_custodian_name_modular_20251122_182317.owl.ttl
-rw-r--r--  508K  01_custodian_name_modular_20251122_182317.nt
-rw-r--r--  425K  01_custodian_name_modular_20251122_182317.jsonld
-rw-r--r--  367K  01_custodian_name_modular_20251122_182317.rdf

Session Context

Phase 1 (Nov 22, 10:00-12:00 UTC)

Connected Orphaned Classes to Custodian

  • Problem: CustodianAppellation and CustodianIdentifier had no path to Custodian hub
  • Solution: Added variant_of_name and identifies_custodian slots
  • Result: All classes reachable from Custodian hub

Phase 2 (Nov 22, 14:00-16:00 UTC)

Appellation Refactoring for SKOS Alignment

  • Problem: CustodianAppellation directly on Custodian violated SKOS semantics
  • Solution: Moved alternative names to CustodianName (SKOS Concept)
  • Result: skos:prefLabel (CustodianName) + skos:altLabel (CustodianAppellation)

Phase 3 (Nov 22, 18:00-18:30 UTC) ← THIS SESSION

Added CustodianCollection as Fourth Aspect

  • Problem: No way to model heritage materials or metonymic references
  • Solution: Created CustodianCollection with 9 specialized slots
  • Result: Complete four-aspect modeling (Name, LegalStatus, Place, Collection)

Next Steps (Pending)

Documentation

  • Update README.md with four-aspect architecture
  • Create COLLECTION_EXAMPLES.md with real-world examples
  • Update ontology alignment documentation

Testing

  • Create test instances with CustodianCollection
    • Rijksmuseum (museum collection)
    • Noord-Hollands Archief (archival collection)
    • Koninklijke Bibliotheek (library holdings)
  • Unit tests for collection aspect
  • Validation tests for temporal_coverage TimeSpan

Features

  • Collection-level provenance events (custody transfers, acquisitions)
  • Collection splits/mergers (track fonds reorganization)
  • Digital surrogates (link physical collections to digitized versions)

References

Schema Files

  • Main schema: schemas/20251121/linkml/01_custodian_name_modular.yaml
  • CustodianCollection class: schemas/20251121/linkml/modules/classes/CustodianCollection.yaml
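  • Collection slots: schemas/20251121/linkml/modules/slots/collection_*.yaml (plus temporal_coverage.yaml, extent.yaml, arrangement_system.yaml, provenance_note.yaml, and has_collection.yaml)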

Generated Outputs

  • RDF (Turtle): schemas/20251121/rdf/01_custodian_name_modular_20251122_182317.owl.ttl
  • ER Diagram: schemas/20251121/uml/mermaid/01_custodian_name_modular_20251122_182317_er.mmd

Ontology Documentation

  • CIDOC-CRM: data/ontology/CIDOC_CRM_v7.1.3.rdf (E78_Curated_Holding)
  • RiC-O: data/ontology/RiC-O_1-1.rdf (RecordSet)
  • BIBFRAME: data/ontology/bibframe_vocabulary.rdf (Collection)

Session Metadata

| Attribute | Value |
|-----------|-------|
| Session Date | 2025-11-22 |
| Session Time | 18:00-18:30 UTC (30 minutes) |
| Agent | Claude (OpenCode) |
| User | kempersc |
| Schema Version Before | 0.1.0 (18 classes, 61 slots) |
| Schema Version After | 0.3.0 (19 classes, 70 slots) |
| Files Created | 10 (1 class + 9 slots) |
| Files Modified | 2 (Custodian.yaml, main schema) |
| Validation Status | PASS (gen-owl, gen-erdiagram) |
| RDF Formats Generated | 4 (Turtle, N-Triples, JSON-LD, RDF/XML) |
| Diagram Generated | ER diagram (Mermaid) |
| Documentation Created | This file |

Conclusion

The Heritage Custodian Ontology now models heritage institutions as four-aspect entities:

  1. CustodianName (emic label) - SKOS Concept
  2. CustodianLegalStatus (legal entity) - W3C ORG, TOOI, CPOV
  3. CustodianPlace (nominal location) - CIDOC-CRM E53_Place
  4. CustodianCollection (heritage materials) - CIDOC-CRM E78, RiC-O RecordSet, BIBFRAME Collection ← NEW!

Each aspect:

  • Has independent temporal lifecycle
  • Is reconstructed from CustodianObservation sources
  • Links back to Custodian hub via refers_to_custodian
  • Maps to established ontologies (CIDOC-CRM, RiC-O, BIBFRAME, SKOS, W3C ORG)

Status: COMPLETE - Ready for instance creation and testing


Document Version: 1.0
Generated: 2025-11-22T18:30:00Z
Author: AI Agent (Claude via OpenCode)