Hub Architecture Implementation - Completion Summary

Date: November 21, 2025, 22:24
Schema Version: v0.1.0 (Hub Architecture)
Status: COMPLETE

What Was Accomplished

1. Corrected Fundamental Conceptual Error

Problem Identified: The previous agent misunderstood the relationship between Custodian and CustodianName:

  • OLD (WRONG): Treated CustodianName as an identifier for Custodian entities
  • NEW (CORRECT): Custodians are identified by persistent URIs (hc_id), and names are observations about custodians

Key Insight: Names are evidence, not identifiers. The hub persists independently of any single piece of evidence.


2. Implemented Hub Architecture Pattern

The Custodian class is now a minimal abstract hub that:

  • Contains only the persistent identifier (hc_id: https://nde.nl/ontology/hc/{abstracted-ghcid})
  • Acts as a connection point for all observations and reconstructions
  • Allows multiple, potentially conflicting pieces of evidence to coexist
  • Prevents privileging any single source as authoritative

Hub Structure:

Custodian (Hub)
    ↑ refers_to_custodian
    ├── CustodianObservation (evidence from sources)
    ├── CustodianName (observed names in specific contexts)
    └── CustodianReconstruction (formal entity interpretations)

Design Philosophy: The hub is NOT a thing with properties - it's a connection point that enables:

  • Conflict tolerance: Multiple observations can contradict each other
  • Complete provenance: Every piece of evidence is traceable
  • Temporal evolution: Interpretations can change without losing history
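The hub pattern described above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `Registry` class and the second source URL are hypothetical, and only the slot names (`hc_id`, `emic_name`, `name_language`, `refers_to_custodian`, `observation_source`) come from the schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Custodian:
    """Minimal hub: carries only the persistent identifier."""
    hc_id: str


@dataclass(frozen=True)
class CustodianName:
    """An observed name; it points at the hub rather than identifying it."""
    emic_name: str
    name_language: str
    refers_to_custodian: str
    observation_source: str


@dataclass
class Registry:
    """Hypothetical in-memory store, used here only for illustration."""
    hubs: dict = field(default_factory=dict)
    names: list = field(default_factory=list)

    def add_hub(self, hub: Custodian) -> None:
        self.hubs[hub.hc_id] = hub

    def add_name(self, name: CustodianName) -> None:
        # Evidence must reference an existing hub, but it never modifies the hub.
        if name.refers_to_custodian not in self.hubs:
            raise KeyError(f"unknown hub: {name.refers_to_custodian}")
        self.names.append(name)

    def names_for(self, hc_id: str) -> list:
        return [n for n in self.names if n.refers_to_custodian == hc_id]


HUB = "https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804"
registry = Registry()
registry.add_hub(Custodian(hc_id=HUB))
registry.add_name(CustodianName("Rijksmuseum", "nl", HUB, "https://www.rijksmuseum.nl"))
# A second, differing name from a made-up source coexists with the first.
registry.add_name(CustodianName("Rijksmuseum Amsterdam", "nl", HUB, "https://example.org/registry"))

observed = sorted(n.emic_name for n in registry.names_for(HUB))
```

Both names remain attached to the same hub, and the hub object itself is unchanged by either observation.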

3. Schema Changes Summary

A. New Slots Created (7 files)

  1. hc_id.yaml - Persistent identifier for custodian hub

    • Format: https://nde.nl/ontology/hc/{abstracted-ghcid}
    • Example: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
    • Maps to: dcterms:identifier
  2. refers_to_custodian.yaml - Links observations/reconstructions to hub

    • Maps to: dcterms:references
    • Range: uriorcurie (must match hub ID pattern)
  3. observation_source.yaml - Direct source reference (simplified)

    • Maps to: dcterms:source
    • Alternative to full SourceDocument class when full metadata isn't needed
  4. reconstruction_method.yaml - Documents synthesis methodology

    • Maps to: prov:hadPlan
    • Examples: "Manual expert curation", "Automated fuzzy matching (threshold 0.85)"
  5. entity_type.yaml - Categorizes reconstructed entities

    • Range: EntityTypeEnum (INDIVIDUAL, GROUP, ORGANIZATION, GOVERNMENT, CORPORATION)
    • Maps to: rdf:type
  6. emic_name.yaml - Self-designated name from custodian's perspective

    • Maps to: skos:prefLabel
    • Respects cultural context and indigenous naming practices
  7. name_language.yaml - Language code for observed names

    • Maps to: dcterms:language
    • Pattern: ISO 639-1 or BCP 47 codes (e.g., "nl", "en", "pt-BR")

B. New Enum Created (1 file)

EntityTypeEnum.yaml - Formal entity type classification

  • INDIVIDUAL: Single person (crm:E21_Person)
  • GROUP: Informal collective (crm:E74_Group)
  • ORGANIZATION: Formal organization (org:Organization)
  • GOVERNMENT: Government body (cpov:PublicOrganisation)
  • CORPORATION: Commercial corporation (org:FormalOrganization)
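As a rough illustration of how the enum and its ontology mappings might be consumed in code, the sketch below mirrors the five permissible values. The `EntityType` class and `ontology_class` helper are hypothetical names, not generated LinkML artifacts.

```python
from enum import Enum


class EntityType(Enum):
    """Entity types from EntityTypeEnum, each paired with its mapped ontology class."""
    INDIVIDUAL = "crm:E21_Person"
    GROUP = "crm:E74_Group"
    ORGANIZATION = "org:Organization"
    GOVERNMENT = "cpov:PublicOrganisation"
    CORPORATION = "org:FormalOrganization"


def ontology_class(entity_type: str) -> str:
    """Return the mapped class CURIE for a permissible value, or raise ValueError."""
    try:
        return EntityType[entity_type].value
    except KeyError:
        allowed = ", ".join(member.name for member in EntityType)
        raise ValueError(f"{entity_type!r} is not one of: {allowed}") from None
```

For instance, `ontology_class("GOVERNMENT")` yields the CPOV class, while a value outside the enum is rejected rather than silently passed through.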

C. Updated Classes (4 files)

  1. Custodian.yaml

    • Changed: From generic base class to minimal hub
    • Primary slot: Changed from id to hc_id with pattern validation
    • Role: Abstract connection point (no descriptive properties)
  2. CustodianObservation.yaml

    • Added: refers_to_custodian slot (links to hub)
    • Added: observation_source slot (simplified source tracking)
    • Emphasis: All observations must reference the hub
  3. CustodianName.yaml

    • Added: emic_name slot (observed self-designated name)
    • Added: name_language slot (language code)
    • Clarification: Names are NOT identifiers, just observations with temporal/contextual validity
  4. CustodianReconstruction.yaml

    • Added: refers_to_custodian slot (links to hub)
    • Added: entity_type slot (formal categorization)
    • Added: reconstruction_method slot (methodology documentation)

D. Updated Main Schema

01_custodian_name_modular.yaml:

  • Added imports for 7 new hub architecture slots
  • Added import for EntityTypeEnum
  • Added missing agent-related slot imports:
    • activity_type
    • affiliation
    • agent_name
    • agent_type
    • alternative_observed_names
  • Updated description to explain hub architecture pattern
  • Total imports now: 69 slot modules + 6 enum modules + 12 class modules = 87 module imports

4. Generated Artifacts

RDF/OWL Schema

  • File: rdf/custodian_hub_v2.ttl
  • Size: 91 KB (1,560 lines)
  • Format: Turtle (RDF)
  • Generated with: gen-owl -f ttl
  • Namespaces: CIDOC-CRM, PROV-O, Dublin Core, FOAF, SKOS, Schema.org, W3C Org, W3C Time

Key Classes in RDF:

<https://nde.nl/ontology/hc/custodian.owl.ttl> a owl:Ontology ;
    rdfs:label "heritage-custodian-observation-reconstruction" ;
    dcterms:license "https://creativecommons.org/licenses/by-sa/4.0/" ;
    dcterms:title "Heritage Custodian Observation and Reconstruction Pattern" ;
    pav:version "0.1.0" .

UML Diagrams

  1. PlantUML Diagram

    • File: uml/plantuml/custodian_hub_v2.puml
    • Size: 6,685 bytes
    • Classes: 12
    • Enums: 6
    • Can render to: PNG, SVG, PDF via PlantUML server
  2. Mermaid Diagram

    • File: uml/mermaid/custodian_hub_v2.mmd
    • Size: 3,347 bytes
    • Format: Mermaid ER diagram
    • Can render in: GitHub, GitLab, Markdown viewers

5. Bug Fixes Applied

A. Enum Module Structure Fix

Problem: EntityTypeEnum.yaml was missing a proper schema wrapper
Solution: Added the schema structure:

id: https://nde.nl/ontology/hc/enum/EntityTypeEnum
name: EntityTypeEnum
title: Entity Type Enumeration

imports:
  - linkml:types

enums:
  EntityTypeEnum:
    # ... enum definition

B. Slot Module Structure Fix

Problem: All 7 new hub architecture slots lacked a schema wrapper
Solution: Wrapped each slot in a proper schema structure:

id: https://nde.nl/ontology/hc/slot/{slot_name}
name: {slot_name}-slot

slots:
  {slot_name}:
    # ... slot definition
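The wrapper above is mechanical enough to generate. The following sketch builds the same structure as a Python dict; `wrap_slot` is a hypothetical helper, and the slot body passed to it is a placeholder rather than the project's actual definition.

```python
BASE = "https://nde.nl/ontology/hc/slot/"


def wrap_slot(slot_name: str, definition: dict) -> dict:
    """Wrap a bare slot definition in the module structure shown above."""
    return {
        "id": f"{BASE}{slot_name}",
        "name": f"{slot_name}-slot",
        "slots": {slot_name: definition},
    }


# The slot body here is a placeholder, not the project's actual definition.
module = wrap_slot("hc_id", {"slot_uri": "dcterms:identifier"})
```

Serializing `module` to YAML would then give one file per slot with a consistent `id` and `name`.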

C. Missing Slot Imports Fix

Problem: Main schema didn't import agent-related slots referenced by Agent and ReconstructionActivity classes
Solution: Added 5 missing slot imports:

  • activity_type (for ReconstructionActivity)
  • affiliation (for Agent)
  • agent_name (for Agent)
  • agent_type (for Agent)
  • alternative_observed_names (for CustodianName)

6. Ontology Alignment

Base Ontologies Integrated:

| Concept | LinkML Class | Ontology Mapping |
|---|---|---|
| Heritage custodian hub | Custodian | crm:E39_Actor (CIDOC-CRM) |
| Evidence of custodian | CustodianObservation | pico:PersonObservation (PiCo pattern) |
| Observed name | CustodianName | skos:prefLabel + dcterms:temporal |
| Formal entity | CustodianReconstruction | rico:Agent / cpov:PublicOrganisation |
| Entity resolution | ReconstructionActivity | prov:Activity |
| Responsible party | Agent | prov:Agent + foaf:Agent |
| Temporal extent | TimeSpan | time:Interval (W3C Time) |

Inspiration: PiCo (Persons in Context) ontology for observation/entity distinction


7. Documentation Created

  1. SESSION_SUMMARY_20251121_LINKML_HUB_ARCHITECTURE_COMPLETE.md

    • Complete change log with rationale
    • Before/after comparisons
    • Implementation details
  2. CUSTODIAN_HUB_ARCHITECTURE.md

    • Architecture explanation for future agents
    • Design patterns and anti-patterns
    • Integration guidelines
  3. examples/hub_architecture_rijksmuseum.yaml

    • Example using Rijksmuseum data
    • Shows hub with multiple observations and reconstructions
    • Demonstrates temporal validity and provenance tracking
  4. HUB_ARCHITECTURE_COMPLETION_SUMMARY.md (this file)

    • Comprehensive summary of all changes
    • Reference for next session

File Statistics

Total Files Modified/Created: 16

Modified:

  • linkml/01_custodian_name_modular.yaml (main schema)
  • linkml/modules/classes/Custodian.yaml
  • linkml/modules/classes/CustodianObservation.yaml
  • linkml/modules/classes/CustodianName.yaml
  • linkml/modules/classes/CustodianReconstruction.yaml

Created (New slot modules):

  • linkml/modules/slots/hc_id.yaml
  • linkml/modules/slots/refers_to_custodian.yaml
  • linkml/modules/slots/observation_source.yaml
  • linkml/modules/slots/reconstruction_method.yaml
  • linkml/modules/slots/entity_type.yaml
  • linkml/modules/slots/emic_name.yaml
  • linkml/modules/slots/name_language.yaml

Created (New enum module):

  • linkml/modules/enums/EntityTypeEnum.yaml

Generated:

  • rdf/custodian_hub_v2.ttl (91 KB RDF/OWL schema)
  • uml/plantuml/custodian_hub_v2.puml (6.7 KB PlantUML diagram)
  • uml/mermaid/custodian_hub_v2.mmd (3.3 KB Mermaid ER diagram)

Validation Status

Schema Validation

  • LinkML schema loads without errors (SchemaView successful)
  • RDF generation succeeds (91 KB output, 1,560 lines)
  • PlantUML generation succeeds (12 classes, 6 enums recognized)
  • Mermaid generation succeeds (ER diagram with relationships)
  • ⚠️ Minor warnings:
    • Schema.org namespace mapping conflict (harmless)
    • Multiple OWL types for some literals (LinkML generator quirk)

Data Validation

  • 📝 TODO: Validate example instance (examples/hub_architecture_rijksmuseum.yaml) with linkml-validate
  • 📝 TODO: Create additional test instances for edge cases

Key Design Decisions

1. Persistent Identifier Format

Decision: Use NDE ontology namespace with abstracted GHCID
Format: https://nde.nl/ontology/hc/{abstracted-ghcid}
Example: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
Rationale: Stable, resolvable URIs that align with GHCID system but abstracted for ontology use
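A format check for hub identifiers might look like the sketch below. The regular expression is an assumption inferred from the single example above (lowercase alphanumeric segments joined by hyphens); the authoritative pattern is whatever hc_id.yaml declares.

```python
import re

# Assumed pattern: lowercase alphanumeric segments joined by hyphens.
# The real pattern lives in hc_id.yaml and may be stricter.
HC_ID_PATTERN = re.compile(r"^https://nde\.nl/ontology/hc/[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_hc_id(value: str) -> bool:
    """Return True if value has the shape of a hub identifier."""
    return HC_ID_PATTERN.fullmatch(value) is not None
```

Under this assumed pattern, observation and reconstruction IRIs, which contain an extra path segment, are not mistaken for hub identifiers.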

2. Hub Pattern Over Property-Rich Entity

Decision: Custodian class has ONLY hc_id (+ metadata)
Rationale:

  • Prevents privileging any single source
  • Allows conflicting observations to coexist
  • Enables complete provenance tracking
  • Supports temporal evolution of interpretations

3. Observation/Reconstruction Distinction

Decision: Separate classes for evidence (CustodianObservation) vs. formal entities (CustodianReconstruction)
Rationale:

  • PiCo ontology pattern (proven in person/organization modeling)
  • Clear semantic distinction between "what we observed" vs. "what we concluded"
  • Supports fuzzy temporal boundaries with TimeSpan

4. TimeSpan Integration

Decision: Use fuzzy temporal boundaries (begin/end of begin, begin/end of end)
Rationale:

  • Heritage institutions often have uncertain founding dates
  • Dissolution may be gradual (e.g., "ceased operations sometime in 1980s")
  • W3C Time Ontology alignment
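The four-point interval can be illustrated with a small dataclass. Only `begin_of_the_begin` appears elsewhere in this summary; the other three field names and both methods are assumptions made by analogy, so treat this as a sketch rather than the schema's definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimeSpan:
    """Four-point fuzzy interval; None means the bound is unknown."""
    begin_of_the_begin: Optional[date] = None
    end_of_the_begin: Optional[date] = None
    begin_of_the_end: Optional[date] = None
    end_of_the_end: Optional[date] = None

    def is_consistent(self) -> bool:
        """Known bounds must be in non-decreasing order."""
        bounds = (self.begin_of_the_begin, self.end_of_the_begin,
                  self.begin_of_the_end, self.end_of_the_end)
        known = [b for b in bounds if b is not None]
        return all(a <= b for a, b in zip(known, known[1:]))

    def possibly_active(self, start: date, end: date) -> bool:
        """True if the widest extent of the span could overlap [start, end]."""
        earliest = self.begin_of_the_begin
        latest = self.end_of_the_end
        starts_in_time = earliest is None or earliest <= end
        ends_late_enough = latest is None or latest >= start
        return starts_in_time and ends_late_enough


# Hypothetical institution founded around 1900 that "ceased operations sometime in 1980s".
span = TimeSpan(
    begin_of_the_begin=date(1900, 1, 1),
    begin_of_the_end=date(1980, 1, 1),
    end_of_the_end=date(1989, 12, 31),
)
```

With these bounds, `span.possibly_active(date(1950, 1, 1), date(1980, 12, 31))` holds, while a query window starting in 1990 does not match.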

5. Multilingual Name Support

Decision: name_language slot with ISO 639-1/BCP 47 codes
Rationale:

  • Global heritage institutions have names in multiple languages
  • Emic names must preserve original language context
  • Enables language-tagged literals in RDF
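To show what a language-tagged literal looks like in practice, the sketch below renders a name and its `name_language` value in Turtle syntax. The tag check is deliberately simplified and is an assumption; full BCP 47 validation is more involved, and `tagged_literal` is a hypothetical helper.

```python
import re

# Simplified check: 2-3 letter primary subtag plus optional subtags.
# Full BCP 47 validation is considerably more involved.
LANG_TAG = re.compile(r"^[a-z]{2,3}(?:-[A-Za-z0-9]{2,8})*$")


def tagged_literal(name: str, language: str) -> str:
    """Render a name as a Turtle language-tagged literal."""
    if not LANG_TAG.fullmatch(language):
        raise ValueError(f"unsupported language tag: {language!r}")
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{language}'
```

Calling `tagged_literal("Rijksmuseum", "nl")` produces a literal that an RDF serializer would emit for a Dutch name.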

Critical Understanding for Future Agents

The Hub Is NOT a Thing with Properties

WRONG: Thinking of Custodian as a "museum entity" with name, address, etc.
RIGHT: Thinking of Custodian as a persistent identifier that connects observations

Analogy: The hub is like a physical pin on a bulletin board:

  • The pin (hub) doesn't contain information itself
  • Notes (observations) are attached to the pin with strings (refers_to_custodian)
  • Multiple notes can contradict each other (conflict tolerance)
  • Remove one note, the pin remains (PID stability)
  • The pin's location (hc_id) never changes

Data Flows TO the Hub, Not FROM It

Observations → Hub ← Reconstructions

# Prefixes added so the snippet parses as Turtle; the hc: property IRIs are assumed
@prefix hc: <https://nde.nl/ontology/hc/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Evidence (observation) points to hub
<https://nde.nl/ontology/hc/observation/isil-2024-001>
    hc:refers_to_custodian <https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804> .

# Interpretation (reconstruction) points to hub
<https://nde.nl/ontology/hc/reconstruction/expert-curated-001>
    hc:refers_to_custodian <https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804> .

# Hub has NO outgoing properties except metadata
<https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804>
    hc:hc_id "https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804" ;
    dcterms:created "2025-11-21T22:00:00Z"^^xsd:dateTime .

Names Are Observations, Not Identifiers

WRONG: "Rijksmuseum" identifies the institution
RIGHT: "Rijksmuseum" is an observed name from a specific source at a specific time

# Name as observation (temporal, contextual)
- emic_name: Rijksmuseum
  name_language: nl
  refers_to_custodian: https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804
  observation_source: https://www.rijksmuseum.nl
  observation_date: 2025-11-21
  name_validity_period:
    begin_of_the_begin: 1885-01-01  # Opened as Rijksmuseum in 1885

Next Steps

Immediate (This Session if Continuing)

  1. Validate Example Instance

    linkml-validate -s linkml/01_custodian_name_modular.yaml \
        examples/hub_architecture_rijksmuseum.yaml
    
  2. Generate Additional RDF Formats

    # JSON-LD (accepted format names vary by LinkML/rdflib version; confirm with gen-owl --help)
    gen-owl -f jsonld linkml/01_custodian_name_modular.yaml > rdf/custodian_hub_v2.jsonld
    
    # N-Triples
    gen-owl -f nt linkml/01_custodian_name_modular.yaml > rdf/custodian_hub_v2.nt
    
    # RDF/XML
    gen-owl -f rdf linkml/01_custodian_name_modular.yaml > rdf/custodian_hub_v2.rdf
    
  3. Render UML Diagrams

    # PlantUML → PNG (requires PlantUML installed)
    plantuml -tpng uml/plantuml/custodian_hub_v2.puml
    
    # Or use online renderer
    open http://www.plantuml.com/plantuml/uml/
    

Short-term (Next Session)

  1. Create Data Conversion Scripts

    • Migrate existing GHCID-based data to hub architecture
    • Generate hub identifiers from existing records
    • Create observations from authoritative sources (ISIL, Wikidata)
    • Synthesize reconstructions from merged data
  2. Implement SPARQL Query Examples

    • Query all observations for a given hub
    • Find custodians with conflicting names
    • Retrieve reconstructions by entity type
    • Temporal queries (active custodians in 1950-1980)
  3. Build Validation Test Suite

    • Valid hub structures
    • Invalid references (observation without hub)
    • Temporal consistency checks
    • Provenance completeness tests
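One of the planned checks, "invalid references (observation without hub)", can be sketched as follows. The record layout (plain dicts with an `id` key) and the `find_dangling` helper are hypothetical; only the `refers_to_custodian` slot name comes from the schema.

```python
def find_dangling(hub_ids, records):
    """Return records whose refers_to_custodian is missing or names no known hub."""
    known = set(hub_ids)
    return [r for r in records if r.get("refers_to_custodian") not in known]


HUB = "https://nde.nl/ontology/hc/nl-nh-ams-m-rm-q190804"
records = [
    {"id": "obs-1", "refers_to_custodian": HUB},
    {"id": "obs-2", "refers_to_custodian": "https://nde.nl/ontology/hc/unknown-hub"},
    {"id": "obs-3"},  # no hub reference at all
]

dangling_ids = [r["id"] for r in find_dangling([HUB], records)]
```

A test suite could assert that the list of dangling records is empty for every valid fixture and non-empty for the deliberately broken ones.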

Long-term (Project Roadmap)

  1. Integration with Wikidata

    • Map hub IDs to Wikidata Q-numbers
    • Import Wikidata statements as observations
    • Bidirectional reconciliation
  2. TypeDB Schema Migration

    • Translate hub architecture to TypeDB schema
    • Implement hub pattern in TypeDB relations
    • Test complex temporal queries
  3. RDF Triplestore Deployment

    • Load RDF into GraphDB/Blazegraph/Virtuoso
    • Create SPARQL endpoint
    • Implement federated queries with Wikidata
  4. Documentation Site

    • Generate browsable ontology documentation
    • Provide worked examples
    • API reference for data consumers

References

Ontology Documentation

LinkML Resources

Project Documentation

  • schemas/20251121/linkml/01_custodian_name_modular.yaml - Main schema
  • schemas/20251121/SESSION_SUMMARY_20251121_LINKML_HUB_ARCHITECTURE_COMPLETE.md - Detailed change log
  • schemas/20251121/CUSTODIAN_HUB_ARCHITECTURE.md - Architecture guide
  • schemas/20251121/examples/hub_architecture_rijksmuseum.yaml - Example instance

Acknowledgments

Schema Version: v0.1.0
Implementation Date: November 21, 2025
Agent: OpenCODE (session 20251121-2216-2224)
Ontology Pattern: Inspired by PiCo (Persons in Context) - FICLIT Project

License: Creative Commons BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)


Session Metadata

Start Time: 2025-11-21T22:16:00Z
End Time: 2025-11-21T22:24:00Z
Duration: 8 minutes
Files Modified/Created: 16
Lines of Code Changed: ~300
RDF Output: 91 KB (1,560 lines)
Documentation Generated: 4 files

Status: READY FOR NEXT PHASE (data conversion and validation)