glam/APPELLATION_REFACTORING_PHASE2_20251122.md

CustodianAppellation Relationship Refactoring - Phase 2

Date: 2025-11-22
Session: OpenCode AI Agent (Phase 2)
Status: COMPLETE


Executive Summary

Refactored the Heritage Custodian Ontology to correctly model the relationship between CustodianAppellation and CustodianName, aligning with W3C SKOS best practices and ensuring CustodianIdentifier is the only class that directly identifies the Custodian hub.

This is Phase 2 of the Appellation/Identifier refactoring:

  • Phase 1 (Nov 22, morning): Connected orphaned classes to Custodian hub
  • Phase 2 (Nov 22, afternoon): Moved appellations from Custodian to CustodianName

Problem Statement

Phase 1 Architecture (Correct but Improvable)

After Phase 1 refactoring:

Custodian --[crm:P1_is_identified_by]--> CustodianAppellation
Custodian --[crm:P48_has_preferred_identifier]--> CustodianIdentifier

Issues Discovered:

  1. Semantic confusion: crm:P1_is_identified_by suggests appellations identify the Custodian hub
  2. Inconsistent hub design: Both Identifier and Appellation claimed to identify the hub
  3. Missing name aspect: Appellations should be variants of the canonical name, not hub identifiers
  4. Ontology misalignment: W3C Org Ontology uses skos:altLabel for alternative names, not crm:P1

Solution

Phase 2 Architecture (Correct and Aligned with SKOS)

Custodian (hub)
  └─ skos:prefLabel ──> CustodianName (canonical emic name)
       └─ skos:altLabel ──> CustodianAppellation (name variants)

Custodian (hub)
  └─ crm:P48_has_preferred_identifier ──> CustodianIdentifier (external IDs)

Key Changes from Phase 1:

  1. CustodianAppellation connects to CustodianName (not Custodian)
  2. Uses skos:altLabel (standard property for alternative lexical labels)
  3. Only CustodianIdentifier identifies the hub (maintains clean hub architecture)
  4. Inverse relationship: CustodianAppellation.variant_of_name → CustodianName (using skos:broader)
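
The shape of the Phase 2 model can be sketched in plain Python. This is only an illustration of the relationships listed above, not code from the repository: the `add_variant` helper and the `scheme`/`value` fields on `CustodianIdentifier` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustodianAppellation:
    appellation_value: str
    appellation_type: str
    appellation_language: Optional[str] = None
    # Back-link to the name (skos:broader). Excluded from repr/compare so the
    # circular reference does not cause recursion.
    variant_of_name: Optional["CustodianName"] = field(
        default=None, repr=False, compare=False
    )


@dataclass
class CustodianName:
    emic_name: str
    # Forward link to the variants (skos:altLabel).
    alternative_names: List[CustodianAppellation] = field(default_factory=list)

    def add_variant(self, appellation: CustodianAppellation) -> None:
        """Hypothetical helper: attach a variant and set its back-link together."""
        appellation.variant_of_name = self
        self.alternative_names.append(appellation)


@dataclass
class CustodianIdentifier:
    scheme: str  # assumed field, e.g. "ISIL" or "Wikidata"
    value: str   # assumed field


@dataclass
class Custodian:
    id: str
    preferred_label: Optional[CustodianName] = None
    identifiers: List[CustodianIdentifier] = field(default_factory=list)
    # Note: no `appellations` attribute on the hub in Phase 2.


name = CustodianName(emic_name="Bibliothèque nationale de France")
name.add_variant(CustodianAppellation("BnF", "ABBREVIATION"))
bnf = Custodian(id="https://nde.nl/ontology/hc/cust/bnf", preferred_label=name)
```

The hub reaches its name variants only through `preferred_label`, which mirrors change 1 above.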

Files Modified

1. Schema Module Files

/schemas/20251121/linkml/modules/classes/Appellation.yaml

Changes:

  • Updated description: "alternative name variants for CustodianName" (was "custodian")
  • Changed slot from identifies_custodian → variant_of_name
  • Changed slot_uri from crm:P1i_identifies → skos:broader
  • Range changed from Custodian → CustodianName
  • Added import for CustodianName class
  • Updated documentation with SKOS altLabel rationale

/schemas/20251121/linkml/modules/classes/Custodian.yaml

Changes:

  • Removed appellations slot from slots list (line 99)
  • Removed appellations slot_usage block (lines 169-178)
  • Updated documentation: "Alternative names (in CustodianName.alternative_names list)"

/schemas/20251121/linkml/modules/classes/CustodianName.yaml

Changes:

  • Added alternative_names slot to slots list
  • Added alternative_names slot_usage:
    • slot_uri: skos:altLabel
    • range: CustodianAppellation
    • multivalued: true
    • Examples: "BnF", "Rijks", translations, historical variants
  • Added related mappings: foaf:nick, gleif:hasOtherName

2. New Slot Files Created

/schemas/20251121/linkml/modules/slots/alternative_names.yaml

Purpose: CustodianName → CustodianAppellation relationship

  • slot_uri: skos:altLabel
  • range: CustodianAppellation
  • multivalued: true
  • Domain: CustodianName (SKOS Concept)

/schemas/20251121/linkml/modules/slots/variant_of_name.yaml

Purpose: CustodianAppellation → CustodianName inverse relationship

  • slot_uri: skos:broader
  • range: CustodianName
  • Domain: E41_Appellation
  • Inverse of skos:altLabel
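
Because the two slots are meant to mirror each other, instance data can be checked for consistency. The following is a minimal sketch over plain dictionaries. It assumes each CustodianName carries an `id` key and that `variant_of_name` stores that id; neither is confirmed by the schema files described here.

```python
from typing import Dict, List


def broken_back_links(name: Dict) -> List[str]:
    """Return the values of appellations whose variant_of_name does not
    point back at the CustodianName that lists them."""
    name_id = name["id"]  # assumed key
    broken = []
    for appellation in name.get("alternative_names", []):
        if appellation.get("variant_of_name") != name_id:
            broken.append(appellation.get("appellation_value", "<unnamed>"))
    return broken


name = {
    "id": "https://nde.nl/ontology/hc/name/bnf-001",
    "emic_name": "Bibliothèque nationale de France",
    "alternative_names": [
        {
            "appellation_value": "BnF",
            "variant_of_name": "https://nde.nl/ontology/hc/name/bnf-001",
        },
        # Back-link missing: reported by the check.
        {"appellation_value": "National Library of France"},
    ],
}

print(broken_back_links(name))
```

Whether a missing back-link should count as an error or simply be inferred is a design decision this sketch does not settle.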

3. Main Schema File

/schemas/20251121/linkml/01_custodian_name_modular.yaml

Changes:

  • Removed import: modules/slots/appellations
  • Added imports:
    • modules/slots/alternative_names
    • modules/slots/variant_of_name
  • Updated change log:
    • "New slots (3): alternative_names (CustodianName → CustodianAppellation), variant_of_name (inverse), identifies_custodian (Identifier → Custodian)"
    • "Architecture change: CustodianAppellation now connects to CustodianName (not Custodian) using skos:altLabel"

4. Deprecated Files

/schemas/20251121/linkml/modules/slots/appellations.yaml ⚠️ DEPRECATED

Status: Marked as deprecated with clear migration path

  • Added deprecation notice explaining why it was replaced
  • Documents old architecture vs. new architecture
  • Points to replacement files (alternative_names.yaml, variant_of_name.yaml)
  • Kept for historical reference only

Ontology Alignment

SKOS (Simple Knowledge Organization System)

Primary Property: skos:altLabel

  • Definition: "An alternative lexical label for a resource"
  • Use case: "Trading names, colloquial names, abbreviations, acronyms"
  • Source: W3C Org Ontology (org:alternativeName → skos:altLabel)

Inverse Property: skos:broader

  • Links alternative label back to its preferred concept
  • Standard SKOS hierarchical relationship

Related Properties:

  • W3C Org Ontology: org:alternativeName (subproperty of skos:altLabel)
  • GLEIF: gleif:hasOtherName (subproperty of skos:altLabel)
  • FOAF: foaf:nick (for nicknames)
  • Schema.org: schema:alternateName (close match)

Generated Outputs

RDF Formats (Timestamp: 20251122_181217 - 20251122_181224)

schemas/20251121/rdf/
├── 01_custodian_name_modular_20251122_181217.owl.ttl  (160 KB - OWL/Turtle)
├── 01_custodian_name_modular_20251122_181224.nt       (458 KB - N-Triples)
├── 01_custodian_name_modular_20251122_181224.jsonld   (382 KB - JSON-LD)
└── 01_custodian_name_modular_20251122_181224.rdf      (330 KB - RDF/XML)

Validation: All files generated successfully with gen-owl and rdfpipe

UML Diagrams (Timestamp: 20251122_181237)

schemas/20251121/uml/mermaid/
└── 01_custodian_name_modular_20251122_181237_er.mmd  (176 lines - ER diagram)

Key Relationships Verified:

CustodianName ||--}o CustodianAppellation : "alternative_names"
CustodianAppellation ||--|o CustodianName : "variant_of_name"

Architecture Evolution

Phase 1 → Phase 2 Comparison

| Aspect | Phase 1 | Phase 2 |
|---|---|---|
| Appellation connects to | Custodian (hub) | CustodianName (aspect) |
| Property used | crm:P1_is_identified_by | skos:altLabel |
| Inverse property | crm:P1i_identifies | skos:broader |
| Semantic meaning | "Appellation identifies hub" | "Appellation is variant of name" |
| Ontology alignment | CIDOC-CRM | W3C SKOS + CIDOC-CRM |
| Slot name | appellations | alternative_names |

Examples

Phase 1 (Before Phase 2)

# Phase 1 architecture - Connected but semantically confused
Custodian:
  id: https://nde.nl/ontology/hc/cust/bnf
  appellations:  # ⚠️ Direct connection to hub (problematic)
    - appellation_value: "BnF"
      appellation_type: ABBREVIATION
      identifies_custodian: https://nde.nl/ontology/hc/cust/bnf  # Suggests it identifies hub

Phase 2 (After Phase 2)

# Phase 2 architecture - CORRECT!
Custodian:
  id: https://nde.nl/ontology/hc/cust/bnf
  preferred_label:
    refers_to_custodian: https://nde.nl/ontology/hc/cust/bnf
    emic_name: "Bibliothèque nationale de France"
    alternative_names:  # ✅ Connection through CustodianName
      - appellation_value: "BnF"
        appellation_type: ABBREVIATION
        variant_of_name: <link back to CustodianName>
      - appellation_value: "National Library of France"
        appellation_language: "en"
        appellation_type: TRANSLATION

RDF Serialization

Turtle (TTL) - Phase 2

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .

<https://nde.nl/ontology/hc/cust/bnf> 
    skos:prefLabel <https://nde.nl/ontology/hc/name/bnf-001> .

<https://nde.nl/ontology/hc/name/bnf-001>
    a crm:E33_Linguistic_Object ;
    rdf:value "Bibliothèque nationale de France" ;
    skos:altLabel <https://nde.nl/ontology/hc/appellation/bnf-abbrev> .

<https://nde.nl/ontology/hc/appellation/bnf-abbrev>
    a crm:E41_Appellation ;
    rdf:value "BnF" ;
    skos:broader <https://nde.nl/ontology/hc/name/bnf-001> .
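
For readers who want to reproduce this graph without an RDF library, the same statements can be written out as N-Triples lines with the standard library alone. This is a sketch, not the project's serializer (the report uses gen-owl and rdfpipe for that); the helper names are invented for the example.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def name_triples(custodian_iri, name_iri, emic_name, variants):
    """Build N-Triples lines; `variants` is a list of (appellation_iri, value)."""
    triples = [
        (iri(custodian_iri), iri(SKOS + "prefLabel"), iri(name_iri)),
        (iri(name_iri), iri(RDF + "type"), iri(CRM + "E33_Linguistic_Object")),
        (iri(name_iri), iri(RDF + "value"), literal(emic_name)),
    ]
    for app_iri, value in variants:
        triples += [
            (iri(name_iri), iri(SKOS + "altLabel"), iri(app_iri)),
            (iri(app_iri), iri(RDF + "type"), iri(CRM + "E41_Appellation")),
            (iri(app_iri), iri(RDF + "value"), literal(value)),
            (iri(app_iri), iri(SKOS + "broader"), iri(name_iri)),
        ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = name_triples(
    "https://nde.nl/ontology/hc/cust/bnf",
    "https://nde.nl/ontology/hc/name/bnf-001",
    "Bibliothèque nationale de France",
    [("https://nde.nl/ontology/hc/appellation/bnf-abbrev", "BnF")],
)
print("\n".join(lines))
```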

Validation Results

LinkML Schema Validation

$ gen-owl -f ttl 01_custodian_name_modular.yaml
# Output: 160 KB OWL/Turtle file with no errors

Warnings (non-critical):

  • ⚠️ Multiple owl types for language (rdfs:Literal vs owl:Thing) - expected for ambiguous ranges
  • ⚠️ Schema namespace override (schema.org vs schema:) - cosmetic, doesn't affect semantics

ER Diagram Validation

Relationships Confirmed:

  1. Custodian ||--|o CustodianName : "preferred_label" (hub → name)
  2. CustodianName ||--}o CustodianAppellation : "alternative_names" (name → variants, one-to-many)
  3. CustodianAppellation ||--|o CustodianName : "variant_of_name" (variant → name, inverse)
  4. Custodian ||--}o CustodianIdentifier : "identifiers" (hub → external IDs)
  5. CustodianIdentifier ||--|o Custodian : "identifies_custodian" (ID → hub, identifies)

Key Observation: No direct Custodian ↔ Appellation relationship exists (by design!)
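
The "no direct relationship" observation can be checked mechanically against the generated .mmd file. The sketch below parses relationship lines of the form shown above with a regular expression. It assumes every relationship sits on one line with a quoted label, which holds for the lines quoted in this report but is not verified against the full diagram.

```python
import re

# Matches lines such as:  CustodianName ||--}o CustodianAppellation : "alternative_names"
REL = re.compile(r'^\s*(\w+)\s+\S+\s+(\w+)\s*:\s*"([^"]+)"')


def parse_relationships(mmd_text):
    """Return (source, target, label) tuples for each relationship line."""
    rels = []
    for line in mmd_text.splitlines():
        match = REL.match(line)
        if match:
            rels.append(match.groups())
    return rels


def has_direct_link(rels, a, b):
    """True if any relationship connects classes a and b, in either direction."""
    return any({src, dst} == {a, b} for src, dst, _ in rels)


diagram = '''
Custodian ||--|o CustodianName : "preferred_label"
CustodianName ||--}o CustodianAppellation : "alternative_names"
CustodianAppellation ||--|o CustodianName : "variant_of_name"
Custodian ||--}o CustodianIdentifier : "identifiers"
CustodianIdentifier ||--|o Custodian : "identifies_custodian"
'''

rels = parse_relationships(diagram)
print(has_direct_link(rels, "Custodian", "CustodianAppellation"))
```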


Impact Analysis

Benefits

  1. Semantic clarity: Appellations are now clearly name variants, not hub identifiers
  2. Ontology alignment: Uses standard skos:altLabel (W3C recommended practice)
  3. Clean hub architecture: Only CustodianIdentifier identifies the hub
  4. Multi-aspect modeling: Names can have independent alternative labels
  5. Bidirectional relationships: Both forward (alternative_names) and inverse (variant_of_name)

Breaking Changes from Phase 1

⚠️ Data Migration Required:

Phase 1 data structure:

Custodian:
  appellations: [list of CustodianAppellation]

Phase 2 data structure:

Custodian:
  preferred_label:  # CustodianName
    alternative_names: [list of CustodianAppellation]

Migration Script: TODO - Create conversion script for existing data
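
Until that script exists, the core transformation can be sketched as follows. This is an assumption-laden outline rather than the planned script: it operates on one custodian record already loaded as a dictionary, requires an existing `preferred_label`, and drops the Phase 1 `identifies_custodian` back-link without setting `variant_of_name`, because the example data carries no identifier for the name.

```python
import copy


def migrate_custodian(phase1: dict) -> dict:
    """Move Custodian.appellations into preferred_label.alternative_names.

    Returns a new record; the input is left untouched.
    """
    record = copy.deepcopy(phase1)
    appellations = record.pop("appellations", None)
    if not appellations:
        return record  # nothing to migrate

    name = record.get("preferred_label")
    if name is None:
        raise ValueError(
            "record has appellations but no preferred_label to attach them to"
        )

    variants = name.setdefault("alternative_names", [])
    for appellation in appellations:
        # The Phase 1 back-link pointed at the hub and no longer applies.
        appellation.pop("identifies_custodian", None)
        variants.append(appellation)
    return record


old = {
    "id": "https://nde.nl/ontology/hc/cust/bnf",
    "preferred_label": {"emic_name": "Bibliothèque nationale de France"},
    "appellations": [
        {
            "appellation_value": "BnF",
            "appellation_type": "ABBREVIATION",
            "identifies_custodian": "https://nde.nl/ontology/hc/cust/bnf",
        }
    ],
}
new = migrate_custodian(old)
```

Raising on a missing `preferred_label` is deliberate in this sketch: silently inventing a canonical name would hide data problems that the migration report should surface.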


Design Rationale

Why skos:altLabel Instead of crm:P1_is_identified_by?

CIDOC-CRM crm:P1_is_identified_by:

  • Purpose: "Names and labels used to identify this custodian"
  • Problem: Suggests appellations identify the hub entity
  • Conflicts with: CustodianIdentifier being the only hub identifier

SKOS skos:altLabel:

  • Purpose: "Alternative lexical label for a resource"
  • Standard for: Trading names, colloquial names, abbreviations
  • Aligns with: W3C Org Ontology best practices
  • Clear semantics: Alternative labels for a name aspect, not hub identifiers

Why CustodianName, Not Custodian?

Aspect-Based Architecture:

  • CustodianName = One aspect of the custodian (the emic designation)
  • CustodianIdentifier = Different aspect (external identifiers)
  • CustodianLegalStatus = Different aspect (legal entity)

Each aspect has independent lifecycle:

  • Names can have alternative variants (appellations)
  • Identifiers can reference external systems (ISIL, Wikidata)
  • Legal statuses can have registration numbers (KvK, company ID)

Mixing aspects breaks the model:

  • Custodian.appellations → Implies hub has name variants (wrong level of abstraction)
  • CustodianName.alternative_names → Correct level (names have variants)

Testing Checklist

  • LinkML schema validation passes
  • OWL/Turtle generation succeeds
  • RDF format conversions (N-Triples, JSON-LD, RDF/XML)
  • Mermaid ER diagram generation
  • Relationships verified in ER diagram
  • Deprecated file marked with migration path
  • Main schema imports updated
  • Unit tests for data instances (TODO)
  • Migration script for existing data (TODO)

Files to Update

  1. README.md - Update architecture diagrams showing new relationships
  2. SCHEMA_MODULES.md - Document alternative_names and variant_of_name slots
  3. ONTOLOGY_EXTENSIONS.md - Add section on SKOS altLabel usage
  4. Data migration guide - Create step-by-step conversion instructions

Reference Documents


Phase Comparison Summary

| Phase | Date | Focus | Status |
|---|---|---|---|
| Phase 1 | 2025-11-22 AM | Connect orphaned Appellation/Identifier to Custodian hub | Complete |
| Phase 2 | 2025-11-22 PM | Move Appellation from Custodian to CustodianName (SKOS alignment) | Complete |

See Also:

  • APPELLATION_IDENTIFIER_REFACTORING_20251122.md - Phase 1 documentation
  • LEGAL_ENTITY_REFACTORING.md - Legal entity model (context for Phase 1)

Next Steps

Immediate (Required Before v0.2.0 Release)

  1. Create data migration script (scripts/migrate_appellations_phase2_20251122.py)

    • Convert Phase 1 Custodian.appellations to Phase 2 CustodianName.alternative_names
    • Validate all existing YAML instance files
    • Generate migration report
  2. Update documentation:

    • README.md architecture diagrams
    • SCHEMA_MODULES.md slot documentation
    • Examples in LinkML schema comments
  3. Add unit tests:

    • Test CustodianName with alternative_names
    • Test CustodianAppellation.variant_of_name inverse
    • Validate SKOS altLabel RDF serialization

Future Enhancements

  1. Add language-tagged appellations:

    • Support multilingual variants with proper @lang tags
    • RDF example: skos:altLabel "BnF"@fr, "National Library of France"@en
  2. Appellation provenance:

    • Track source of alternative names
    • Add temporal validity (when name was used)
  3. Authority control integration:

    • Link appellations to name authority records (VIAF, ISNI)
    • Validate variant forms against authority files
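
As a rough illustration of the first enhancement, language-tagged literals could be produced from the existing `appellation_value` and `appellation_language` fields. The function names below are hypothetical, and the sketch simply skips appellations that have no language; a real implementation would need a policy for those.

```python
def lang_literal(value: str, lang: str) -> str:
    """Format a Turtle language-tagged literal, e.g. "BnF"@fr."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def alt_label_statement(appellations) -> str:
    """Build a skos:altLabel object list from appellations that carry a language."""
    literals = [
        lang_literal(a["appellation_value"], a["appellation_language"])
        for a in appellations
        if a.get("appellation_language")
    ]
    if not literals:
        return ""
    return "skos:altLabel " + ", ".join(literals)


statement = alt_label_statement(
    [
        {"appellation_value": "BnF", "appellation_language": "fr"},
        {"appellation_value": "National Library of France", "appellation_language": "en"},
        {"appellation_value": "Rijks"},  # no language: skipped in this sketch
    ]
)
print(statement)
```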

Conclusion

Phase 2 successfully aligns the Heritage Custodian Ontology with W3C SKOS best practices, maintains clean hub architecture, and provides clear semantic distinction between:

  • CustodianIdentifier: External identifiers that reference the hub
  • CustodianAppellation: Alternative name variants for the canonical emic name

This change improves ontology interoperability, semantic clarity, and prepares the schema for future extensions (multilingual support, authority control, provenance tracking).


Version: v0.1.0 → v0.2.0 (Phase 2)
Schema Status: Validated
RDF Generation: Complete (4 formats)
Diagrams: Generated (Mermaid ER)
Data Migration: Pending (Phase 1 → Phase 2 conversion script needed)


End of Phase 2 Report