Session Summary: Schema Metadata Refinement (2025-11-21)

Overview

Refined the LinkML schema to use proper structured metadata fields instead of informal notes, improving machine-readability and semantic precision.

Changes Made

Metadata Field Migration

Migrated informal notes to proper LinkML metadata fields across 5 classes:

| Class | Before | After |
| --- | --- | --- |
| Custodian | Informal notes about PiCo equivalence | Structured comments + see_also with reference URLs |
| CustodianObservation | 5 informal note strings | comments (5 items) + see_also (2 URLs) |
| CustodianName | 4 informal note strings | comments (4 items) + see_also (2 URLs) |
| CustodianReconstruction | 4 informal note strings | comments (4 items) + see_also (3 URLs) |
| ReconstructionActivity | 2 informal note strings | comments (3 items) + see_also (2 URLs) |

LinkML Metadata Fields Used

  1. comments - External-facing explanations for human readers and tools

    • Usage patterns, distinctions between classes, examples
    • Semantic clarifications (emic vs etic, observation vs entity)
  2. see_also - Reference URLs for further documentation

    • CIDOC-CRM class documentation
    • PROV-O specifications
    • PiCo GitHub repository
    • Wikidata concept pages
    • W3C Organization Ontology
    • Schema.org/SKOS specifications
  3. notes - Reserved for slot-level usage guidance (NOT class-level)

    • Kept for legal_name slot (distinguishes legal vs. operational names)
    • Kept for legal_form slot (ISO 20275 reference information)
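The convention above (notes only at slot level, never at class level) can be checked mechanically. The following is a minimal sketch, assuming the schema has already been parsed into a plain dictionary with top-level classes and slots keys. The function name class_level_notes and the example fragment are hypothetical and not part of the schema repository.

```python
from typing import Any


def class_level_notes(schema: dict[str, Any]) -> dict[str, list[str]]:
    """Return {class_name: notes} for every class that still carries `notes`."""
    offenders: dict[str, list[str]] = {}
    for name, definition in (schema.get("classes") or {}).items():
        notes = (definition or {}).get("notes")
        if notes:
            # Tolerate a bare string as well as a list of strings.
            offenders[name] = [notes] if isinstance(notes, str) else list(notes)
    return offenders


# Hypothetical fragment, shaped like a parsed LinkML schema (assumption).
example_schema = {
    "classes": {
        "Custodian": {
            "comments": ["Broader semantic scope than 'organization'..."],
            "see_also": ["https://github.com/FICLIT/PiCo"],
        },
        "ReconstructionActivity": {
            "notes": ["Documents the 'how' and 'who' of entity resolution"],
        },
    },
    "slots": {
        # Slot-level notes are allowed by the convention and are not inspected.
        "legal_name": {"notes": ["Distinguishes legal vs. operational names"]},
    },
}

print(class_level_notes(example_schema))
```

In this fragment only ReconstructionActivity is reported. The legal_name slot is left alone because slot-level notes are intentionally kept.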

Removed Redundancies

  • Removed the duplicate pico:PersonObservation entry from close_mappings in CustodianObservation
    • Already existed in exact_mappings (line 147)
  • Prevented duplicate close_mappings keys in CustodianReconstruction
    • Added pico:PersonReconstruction to existing list instead of creating new key
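The first redundancy, a term listed under both exact_mappings and close_mappings, is easy to detect once a class definition is available as a dictionary. This is a sketch under that assumption. The helper name overlapping_mappings is hypothetical, and prov:Entity appears in the example purely as illustrative data, not as a statement about the actual schema.

```python
from typing import Any


def overlapping_mappings(class_def: dict[str, Any]) -> list[str]:
    """Return the terms that appear in both exact_mappings and close_mappings."""
    exact = set(class_def.get("exact_mappings") or [])
    close = set(class_def.get("close_mappings") or [])
    return sorted(exact & close)


# Hypothetical class definition reproducing the state before the cleanup.
custodian_observation = {
    "exact_mappings": ["pico:PersonObservation"],
    "close_mappings": ["pico:PersonObservation", "prov:Entity"],  # illustrative
}

print(overlapping_mappings(custodian_observation))
```

The second redundancy (a repeated close_mappings key) has to be caught while the YAML is parsed, because a dictionary can only hold one value per key. This sketch does not cover it.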

Before/After Examples

Custodian Class

Before:

notes:
  - "Equivalent to PiCo's pico:Person (abstract base)"
  - "Maps to CIDOC-CRM E39_Actor..."

After:

comments:
  - "Subclasses represent observation vs. entity distinction..."
  - "Broader semantic scope than 'organization'..."
see_also:
  - "https://ontome.net/ontology/c39" # CIDOC-CRM E39_Actor
  - "https://github.com/FICLIT/PiCo" # PiCo pattern
  - "https://www.wikidata.org/wiki/Q115641683" # Wikidata custodian

CustodianObservation Class

Before:

notes:
  - "Equivalent to PiCo's pico:PersonObservation"
  - "Can represent BOTH emic (insider) and etic (outsider) perspectives"
  - "Multiple observations (emic + etic) can link to one reconstruction (entity)"
  - "Examples: 'Rijks' (emic observation), 'The Rijksmuseum in Amsterdam' (etic observation)"
  - "See CustodianName subclass for standardized emic name specifically"

After:

comments:
  - "Can represent BOTH emic (insider) and etic (outsider) perspectives, depending on source type"
  - "Multiple observations (emic + etic, from different sources) can link to one reconstruction (entity)"
  - "Example emic observation: 'Rijks' (letterhead), 'John Smith Collection' (private collector's self-identification)"
  - "Example etic observation: 'The Rijksmuseum in Amsterdam' (guidebook), 'Smith's holdings' (auction catalog)"
  - "See CustodianName subclass for standardized emic name specifically"
see_also:
  - "https://github.com/FICLIT/PiCo" # PiCo ontology observation pattern
  - "https://www.w3.org/TR/prov-o/#Entity" # PROV-O Entity documentation

CustodianName Class

Before:

notes:
  - "Subclass of CustodianObservation, NOT a separate entity type"
  - "Captures the specific emic perspective among many possible observations"
  - "Used for standardization and consistent reference"
  - "Examples: Official website header, legal statutes, custodianal letterhead"

After:

comments:
  - "Subclass of CustodianObservation, NOT a separate entity type - inherits all observation properties"
  - "Captures the specific emic perspective among many possible observations recorded in sources"
  - "Used for standardization and consistent reference across datasets and citations"
  - "Typical sources: official website header, board resolution, legal statutes, organizational letterhead, self-identification documents"
see_also:
  - "https://www.w3.org/2004/02/skos/core#prefLabel" # SKOS preferred label pattern
  - "https://schema.org/name" # Schema.org naming conventions

CustodianReconstruction Class

Before:

notes:
  - "Equivalent to PiCo's pico:PersonReconstruction"
  - "Etic perspective: 'what is the formal legal entity'"
  - "One reconstruction can have many observations (emic names)"
  - "Examples: Stichting Rijksmuseum (legal entity) ← 'Rijks', 'Rijksmuseum Amsterdam' (observations)"

After:

comments:
  - "Represents the etic (outsider) perspective: 'what is the formal entity after analysis and reconciliation?'"
  - "One reconstruction can have many observations from different sources (emic names, translations, historical spellings, third-party references)"
  - "Example: 'Stichting Rijksmuseum' (legal entity/reconstruction) ← 'Rijks' (letterhead), 'Rijksmuseum Amsterdam' (signage), 'The Rijksmuseum' (guidebook)"
  - "Reconstruction process documented via prov:wasGeneratedBy → ReconstructionActivity (entity resolution, reconciliation, expert review)"
see_also:
  - "https://github.com/FICLIT/PiCo" # PiCo reconstruction pattern
  - "https://www.w3.org/TR/prov-o/#wasGeneratedBy" # PROV-O generation documentation
  - "https://lov.linkeddata.es/dataset/lov/vocabs/org" # W3C Organization Ontology

ReconstructionActivity Class

Before:

notes:
  - "Equivalent to PiCo's prov:Activity for PersonReconstruction"
  - "Documents the 'how' and 'who' of entity resolution"

After:

comments:
  - "Documents the 'how' (method) and 'who' (responsible_agent) of entity resolution and reconstruction"
  - "Enables provenance tracking for data quality assessment and citation purposes"
  - "Example methods: manual expert curation, algorithmic fuzzy matching, hybrid semi-automated with human review"
see_also:
  - "https://www.w3.org/TR/prov-o/#Activity" # PROV-O Activity documentation
  - "https://github.com/FICLIT/PiCo" # PiCo activity pattern for person reconstruction

Benefits of This Refactoring

1. Machine-Readability

  • Structured fields enable automated documentation generation (Sphinx, MkDocs)
  • Tools can parse see_also URLs for reference linking
  • comments can be extracted for tooltips in data entry interfaces
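As an illustration of the tooltip use case, the sketch below turns the comments and see_also of one class into a short plain-text tooltip. It assumes the class definition is a dictionary as produced by a YAML parser. The function tooltip_for and its max_comments parameter are hypothetical names introduced for this example.

```python
from typing import Any


def tooltip_for(class_name: str, class_def: dict[str, Any], max_comments: int = 2) -> str:
    """Build a plain-text tooltip from a class's comments and see_also entries."""
    comments = class_def.get("comments") or []
    links = class_def.get("see_also") or []
    lines = [class_name]
    # Keep the tooltip short: only the first few comments are shown.
    lines += [f"- {comment}" for comment in comments[:max_comments]]
    if links:
        lines.append("See also: " + ", ".join(links))
    return "\n".join(lines)


# Values copied from the ReconstructionActivity "After" example in this summary.
reconstruction_activity = {
    "comments": [
        "Documents the 'how' (method) and 'who' (responsible_agent) of entity resolution and reconstruction",
        "Enables provenance tracking for data quality assessment and citation purposes",
        "Example methods: manual expert curation, algorithmic fuzzy matching, hybrid semi-automated with human review",
    ],
    "see_also": [
        "https://www.w3.org/TR/prov-o/#Activity",
        "https://github.com/FICLIT/PiCo",
    ],
}

print(tooltip_for("ReconstructionActivity", reconstruction_activity))
```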

2. Semantic Precision

  • Clear separation between equivalence (exact_mappings, close_mappings) and references (see_also)
  • Replaced informal statements like "Equivalent to X" in notes with formal mappings
  • Consistent use of proper LinkML metamodel fields

3. Documentation Quality

  • More detailed comments provide better context for human readers
  • Reference URLs make it easy to verify ontology alignment
  • Examples now cover multiple actor types (individuals, groups, organizations)

4. Maintainability

  • Easier to update reference URLs centrally
  • Clearer distinction between class-level metadata and slot-level notes
  • Reduced redundancy (ontology mappings not repeated in notes)

Files Modified

  • schemas/20251121/linkml/01_custodian_name.yaml (845 lines)
    • 5 classes updated with structured metadata
    • 13 see_also URLs added
    • 19 comments items refined and expanded
    • YAML syntax validated
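Counts like the ones above can be recomputed from the schema instead of tallied by hand. The sketch below assumes a parsed schema dictionary with a top-level classes key. The function metadata_totals and the two-class example are hypothetical, so the totals it prints describe the example only, not the real schema file.

```python
from typing import Any


def metadata_totals(schema: dict[str, Any]) -> dict[str, int]:
    """Count classes carrying structured metadata and their comments/see_also items."""
    totals = {"classes_with_metadata": 0, "comments": 0, "see_also": 0}
    for definition in (schema.get("classes") or {}).values():
        definition = definition or {}
        comments = definition.get("comments") or []
        links = definition.get("see_also") or []
        if comments or links:
            totals["classes_with_metadata"] += 1
        totals["comments"] += len(comments)
        totals["see_also"] += len(links)
    return totals


# Hypothetical two-class fragment; the real schema has more classes.
example_schema = {
    "classes": {
        "CustodianName": {
            "comments": ["first", "second", "third", "fourth"],
            "see_also": [
                "https://www.w3.org/2004/02/skos/core#prefLabel",
                "https://schema.org/name",
            ],
        },
        "Identifier": {},  # no structured metadata, so not counted
    }
}

print(metadata_totals(example_schema))
```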

Validation

$ python3 -c "import yaml; yaml.safe_load(open('schemas/20251121/linkml/01_custodian_name.yaml'))"
✅ YAML syntax valid

Next Steps

  1. Metadata refinement - COMPLETE
  2. Regenerate RDF formats from updated LinkML schema:
    gen-owl -f ttl schemas/20251121/linkml/01_custodian_name.yaml > schemas/20251121/rdf/01_custodian_name.owl.ttl
    rdfpipe schemas/20251121/rdf/01_custodian_name.owl.ttl -o nt > schemas/20251121/rdf/01_custodian_name.nt
    rdfpipe schemas/20251121/rdf/01_custodian_name.owl.ttl -o jsonld > schemas/20251121/rdf/01_custodian_name.jsonld
    # ... repeat for all 8 RDF formats
    
  3. Validate with LinkML tools:
    linkml-validate -s schemas/20251121/linkml/01_custodian_name.yaml
    linkml-validate -s schemas/20251121/linkml/01_custodian_name.yaml schemas/20251121/examples/*.yaml
    
  4. Update TypeDB schema (manual translation)
  5. Update UML/Mermaid diagrams
  6. Create example instances demonstrating multi-ontology usage

Impact on Documentation

AGENTS.md

  • Already updated with Rule 0 emphasizing LinkML as single source of truth
  • Metadata refinement aligns with existing guidance

Ontology Mappings Documentation

  • schemas/20251121/ONTOLOGY_MAPPINGS.md - No changes needed
  • Reference URLs in see_also provide direct links to ontology documentation

Session Context

Part of: Custodian Schema Renaming + Ontology Alignment session (2025-11-21)

Related Sessions:

  • SESSION_SUMMARY_20251121_CUSTODIAN_RENAMING.md - Organization → Custodian renaming
  • schemas/20251121/ONTOLOGY_MAPPINGS.md - Comprehensive ontology alignment

Status: Complete


Version: v0.2.2 (metadata refinement)
Last Updated: 2025-11-21
Schema File: schemas/20251121/linkml/01_custodian_name.yaml