Diagram Generation from LinkML Schema - Proper Workflow

Date: 2025-11-21
Status: WORKING CORRECTLY


The Correct Workflow

Mermaid diagrams MUST be generated from the LinkML schema files, not manually edited.

Step 1: Edit LinkML Schema Files

All changes start with the source schema files:

schemas/20251121/linkml/01_custodian_name_modular.yaml         # Main schema
schemas/20251121/linkml/modules/slots/refers_to_custodian.yaml  # Slot definitions
schemas/20251121/linkml/modules/classes/Custodian.yaml          # Class definitions

Step 2: Regenerate Diagrams

Use the generation script (NOT manual editing):

cd /Users/kempersc/apps/glam

# Generate Mermaid diagram
python scripts/generate_mermaid_modular.py \
    schemas/20251121/linkml/01_custodian_name_modular.yaml \
    schemas/20251121/uml/mermaid/custodian_name_v5_final.mmd

# Generate PlantUML diagram
python scripts/generate_plantuml_modular.py \
    schemas/20251121/linkml/01_custodian_name_modular.yaml \
    schemas/20251121/uml/plantuml/custodian_name_final.puml

Step 3: Verify Hub Connections

Run the test script to verify:

cd /Users/kempersc/apps/glam
python tests/test_mermaid_generation.py

Expected output:

🎉 ALL TESTS PASSED
  ✅ Base slot defined correctly (range: Custodian, required: True)
  ✅ Induced slots in all 3 classes correct
  ✅ Mermaid diagram generated successfully
  ✅ All 3 hub connections present in diagram
  ✅ Custodian class properly defined

Step 4: Generate RDF/OWL Formats

cd /Users/kempersc/apps/glam/schemas/20251121

# Generate OWL/Turtle
gen-owl -f ttl linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.owl.ttl

# Generate JSON-LD
gen-owl -f jsonld linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.jsonld

# Generate N-Triples
gen-owl -f nt linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.nt

Why This Workflow Matters

WRONG Approach: Manual Editing

# DON'T do this!
vim schemas/20251121/uml/mermaid/custodian_name_v5_final.mmd  # ← Editing diagram directly!

Problems:

  • Changes are lost on next generation
  • Diagram diverges from schema (source of truth)
  • No way to verify correctness
  • Breaks reproducibility

CORRECT Approach: Schema-First

# DO this instead!
vim schemas/20251121/linkml/modules/slots/refers_to_custodian.yaml  # ← Edit schema!
python scripts/generate_mermaid_modular.py schema.yaml output.mmd   # ← Regenerate
python tests/test_mermaid_generation.py                             # ← Verify

Benefits:

  • Schema is single source of truth
  • Diagrams always match schema
  • Changes are reproducible
  • Automated testing catches errors
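
One way to back the "diagrams always match schema" benefit is a drift check: regenerate the diagram and diff it against the committed file. The sketch below shows only the comparison step, using the standard library. `diagram_drift` is a hypothetical helper, not part of the repository's scripts, and in practice the regenerated text would come from generate_mermaid_modular.py.

```python
import difflib


def diagram_drift(committed, regenerated):
    """Return unified-diff lines; an empty list means the two texts match."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="committed",
            tofile="regenerated",
            lineterm="",
        )
    )


# Illustrative inputs: a hand-edited diagram versus freshly generated output.
regenerated = 'CustodianName ||--|| Custodian : "refers_to_custodian"\n'
hand_edited = 'CustodianName ||--o| Custodian : "refers_to_custodian"\n'

clean = diagram_drift(regenerated, regenerated)
drift = diagram_drift(hand_edited, regenerated)

print(clean)  # [] because nothing differs
for line in drift:
    print(line)
```

A non-empty result means the committed diagram was edited by hand or is out of date with the schema.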

How the Generator Works

The Fixed generate_mermaid_modular.py Script

# Key fix: Use induced_slot() instead of raw slot_usage
# Excerpt: sv is a SchemaView; class_name, cardinality, and lines are
# set by the surrounding code, which is not shown here.
for slot_name in sv.class_slots(class_name):
    # This properly merges base slot with slot_usage overrides
    slot = sv.induced_slot(slot_name, class_name)  # ← CRITICAL!

    # Only slots whose range is another class become relationships
    if slot and slot.range and slot.range in sv.all_classes():
        # Generate relationship line
        lines.append(f'{class_name} {cardinality} {slot.range} : "{slot_name}"')

Why induced_slot() is critical:

  • It merges the base slot definition with class-specific slot_usage overrides
  • Returns the final effective slot as it appears in that class
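  • Without it, a generator that reads raw slot_usage sees only the class-level overrides. A class that does not restate the range there appears to have no range, so the range: Custodian inherited from the base slot never reaches the diagram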
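
The merge behaviour can be illustrated with plain dictionaries. This is a simplified model for intuition only, not LinkML's implementation: `induce_slot` and the dictionary layout are hypothetical, and the real `SchemaView.induced_slot()` handles more cases than this.

```python
# Simplified model of slot induction; NOT the LinkML implementation.
# `induce_slot` and the dict layout are hypothetical, for illustration only.

def induce_slot(base_slots, class_def, slot_name):
    """Return the effective slot: base definition overlaid with slot_usage."""
    effective = dict(base_slots[slot_name])  # start from the base slot
    overrides = class_def.get("slot_usage", {}).get(slot_name, {})
    effective.update(overrides)  # class-level overrides win
    return effective


base_slots = {"refers_to_custodian": {"range": "Custodian", "required": True}}

# A class whose slot_usage adds a description but does not restate the range.
custodian_name = {
    "slots": ["refers_to_custodian"],
    "slot_usage": {"refers_to_custodian": {"description": "Hub link"}},
}

raw = custodian_name["slot_usage"]["refers_to_custodian"]
induced = induce_slot(base_slots, custodian_name, "refers_to_custodian")

print(raw.get("range"))   # None: the override alone carries no range
print(induced["range"])   # Custodian: taken from the base slot
```

Reading only `raw` would hide the relationship to Custodian, while the merged view keeps it.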

What Gets Generated

From the schema, the generator produces:

Custodian {
    uriorcurie hc_id  
    datetime created  
    datetime modified  
}

CustodianReconstruction ||--|| Custodian : "refers_to_custodian"
CustodianName ||--|| Custodian : "refers_to_custodian"
CustodianObservation ||--|| Custodian : "refers_to_custodian"

These 3 relationship lines come from:

  1. CustodianReconstruction has slot refers_to_custodian with range Custodian
  2. CustodianName has slot refers_to_custodian with range Custodian
  3. CustodianObservation has slot refers_to_custodian with range Custodian
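
The mapping from slots to relationship lines can be sketched without LinkML installed. The data structure below is a hypothetical stand-in for what the generator reads through SchemaView, and the cardinality is fixed to `||--||` to match the output above. The real script computes it.

```python
# Hypothetical stand-in data: class name -> {slot name: effective slot}.
effective_slots = {
    "CustodianReconstruction": {"refers_to_custodian": {"range": "Custodian"}},
    "CustodianName": {"refers_to_custodian": {"range": "Custodian"}},
    "CustodianObservation": {"refers_to_custodian": {"range": "Custodian"}},
    "Custodian": {"hc_id": {"range": "uriorcurie"}},
}


def relationship_lines(effective_slots):
    """Emit one Mermaid relationship line per slot whose range is a class."""
    class_names = set(effective_slots)
    lines = []
    for class_name, slots in effective_slots.items():
        for slot_name, slot in slots.items():
            if slot["range"] in class_names:  # scalar ranges stay attributes
                lines.append(
                    f'{class_name} ||--|| {slot["range"]} : "{slot_name}"'
                )
    return lines


hub_lines = relationship_lines(effective_slots)
for line in hub_lines:
    print(line)
```

The `hc_id` slot produces no line because its range, `uriorcurie`, is a type rather than a class.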

Test-Driven Development

The test script (tests/test_mermaid_generation.py) validates:

Test 1: Base Slot Definition

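from linkml_runtime.utils.schemaview import SchemaView
# Load the modular schema (path relative to the repository root)
sv = SchemaView("schemas/20251121/linkml/01_custodian_name_modular.yaml")
base_slot = sv.get_slot("refers_to_custodian")
assert base_slot.range == "Custodian"
assert base_slot.required == True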

Test 2: Induced Slots in Each Class

for class_name in ['CustodianObservation', 'CustodianName', 'CustodianReconstruction']:
    induced_slot = sv.induced_slot("refers_to_custodian", class_name)
    assert induced_slot.range == "Custodian"
    assert induced_slot.required == True

Test 3: Diagram Generation

mermaid = generate_mermaid_from_schemaview(sv)
assert len(mermaid) > 0

Test 4: Hub Connections Present

expected = [
    'CustodianReconstruction ||--|| Custodian : "refers_to_custodian"',
    'CustodianName ||--|| Custodian : "refers_to_custodian"',
    'CustodianObservation ||--|| Custodian : "refers_to_custodian"'
]
for connection in expected:
    assert connection in mermaid

Test 5: Custodian Class Definition

assert "Custodian {" in mermaid
assert "hc_id" in mermaid
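
The substring checks in Test 5 pass if `hc_id` appears anywhere in the diagram. A stricter variant could parse the `Custodian { ... }` block and check its attributes directly. The sketch below is one possible approach, not code from the test script. `entity_attributes` is a hypothetical helper, and it assumes each attribute line has the form `type name` as in the generated output shown earlier.

```python
import re


def entity_attributes(mermaid_text, entity):
    """Return {attribute name: type} for one `Entity { ... }` block."""
    pattern = r"^\s*" + re.escape(entity) + r"\s*\{(.*?)\}"
    match = re.search(pattern, mermaid_text, re.S | re.M)
    if match is None:
        return {}
    attrs = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank lines inside the block
            attrs[parts[1]] = parts[0]
    return attrs


sample = """Custodian {
    uriorcurie hc_id
    datetime created
    datetime modified
}

CustodianName ||--|| Custodian : "refers_to_custodian"
"""

custodian_attrs = entity_attributes(sample, "Custodian")
print(custodian_attrs)
```

With this helper the assertion becomes `custodian_attrs.get("hc_id") == "uriorcurie"`, which fails if the attribute moves to another class.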

Common Mistakes to Avoid

Mistake 1: Editing slot_usage Instead of Base Slot

WRONG:

# In CustodianObservation.yaml
slot_usage:
  refers_to_custodian:
    range: uriorcurie  # ← This overrides the base slot and breaks relationships!

RIGHT:

# In refers_to_custodian.yaml (base slot definition)
slots:
  refers_to_custodian:
    range: Custodian  # ← Define range here, in the base slot
    required: true
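
This mistake can be caught mechanically by scanning class definitions for slot_usage entries that set a range on the hub slot. The sketch assumes the class YAML files have already been loaded into dictionaries (for example with a YAML parser). `range_overrides` is a hypothetical helper, not part of the repository.

```python
def range_overrides(class_defs, slot_name):
    """Return {class name: range} for classes whose slot_usage sets a range."""
    found = {}
    for class_name, class_def in class_defs.items():
        usage = (class_def.get("slot_usage") or {}).get(slot_name) or {}
        if "range" in usage:
            found[class_name] = usage["range"]
    return found


# Illustrative input mirroring the WRONG example above.
class_defs = {
    "CustodianObservation": {
        "slot_usage": {"refers_to_custodian": {"range": "uriorcurie"}}
    },
    "CustodianName": {"slot_usage": {"refers_to_custodian": {"required": True}}},
    "CustodianReconstruction": {},
}

offenders = range_overrides(class_defs, "refers_to_custodian")
print(offenders)  # {'CustodianObservation': 'uriorcurie'}
```

An empty result means every class takes its range from the base slot definition.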

Mistake 2: Using Raw slot_usage in Generator

WRONG:

# In generator script
slot = cls.slot_usage.get(slot_name)  # ← Gets overrides only!

RIGHT:

# In generator script
slot = sv.induced_slot(slot_name, class_name)  # ← Gets effective merged slot!

Mistake 3: Forgetting to Regenerate After Schema Changes

WRONG:

vim linkml/modules/slots/refers_to_custodian.yaml
# Done! (forgot to regenerate)

RIGHT:

vim linkml/modules/slots/refers_to_custodian.yaml
python scripts/generate_mermaid_modular.py ...  # ← Regenerate
python tests/test_mermaid_generation.py         # ← Test
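
A forgotten regeneration can often be spotted by comparing file modification times. The sketch below is a heuristic under stated assumptions, not part of the repository: `stale_artifacts` is a hypothetical helper, and modification times are only meaningful in a working copy where the files were actually edited and generated (a fresh checkout resets them).

```python
import os
import tempfile
from pathlib import Path


def stale_artifacts(schema_files, artifacts):
    """Return artifacts that are missing or older than the newest schema file."""
    newest_schema = max(Path(p).stat().st_mtime for p in schema_files)
    stale = []
    for artifact in map(Path, artifacts):
        if not artifact.exists() or artifact.stat().st_mtime < newest_schema:
            stale.append(str(artifact))
    return stale


# Demo with throwaway files: the schema is edited after the diagram was made.
with tempfile.TemporaryDirectory() as tmp:
    schema = Path(tmp, "refers_to_custodian.yaml")
    diagram = Path(tmp, "custodian_name_v5_final.mmd")
    schema.write_text("placeholder schema\n")
    diagram.write_text("placeholder diagram\n")
    os.utime(diagram, (1_000, 1_000))  # diagram generated first
    os.utime(schema, (2_000, 2_000))   # schema edited afterwards
    demo_names = [Path(p).name for p in stale_artifacts([schema], [diagram])]

print(demo_names)  # ['custodian_name_v5_final.mmd']
```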

Automated Regeneration Script

Create a convenience script to regenerate all artifacts:

#!/bin/bash
# scripts/regenerate_all.sh

set -e  # Exit on error

SCHEMA="schemas/20251121/linkml/01_custodian_name_modular.yaml"

echo "🔄 Regenerating all artifacts from LinkML schema..."
echo ""

# Mermaid diagram
echo "📊 Generating Mermaid diagram..."
python scripts/generate_mermaid_modular.py \
    "$SCHEMA" \
    schemas/20251121/uml/mermaid/custodian_name_v5_final.mmd

# PlantUML diagram
echo "📊 Generating PlantUML diagram..."
python scripts/generate_plantuml_modular.py \
    "$SCHEMA" \
    schemas/20251121/uml/plantuml/custodian_name_final.puml

# RDF formats
echo "🔗 Generating RDF/OWL formats..."
cd schemas/20251121
gen-owl -f ttl linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.owl.ttl
gen-owl -f jsonld linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.jsonld
gen-owl -f nt linkml/01_custodian_name_modular.yaml > rdf/custodian_hub.nt
cd ../..

# Run tests
echo "✅ Running verification tests..."
python tests/test_mermaid_generation.py

echo ""
echo "🎉 All artifacts regenerated successfully!"

Usage:

chmod +x scripts/regenerate_all.sh
./scripts/regenerate_all.sh

Files Reference

Source (Single Source of Truth)

schemas/20251121/linkml/01_custodian_name_modular.yaml         # Main schema
schemas/20251121/linkml/modules/slots/refers_to_custodian.yaml # Hub connector slot
schemas/20251121/linkml/modules/classes/Custodian.yaml         # Hub class

Generators (Tools)

scripts/generate_mermaid_modular.py   # Mermaid diagram generator (FIXED with induced_slot)
scripts/generate_plantuml_modular.py  # PlantUML diagram generator
scripts/regenerate_all.sh             # Convenience script (run after schema changes)

Generated Artifacts (Do NOT Edit Manually!)

schemas/20251121/uml/mermaid/custodian_name_v5_final.mmd  # Mermaid diagram
schemas/20251121/uml/plantuml/custodian_name_final.puml   # PlantUML diagram
schemas/20251121/rdf/custodian_hub.owl.ttl                # RDF/OWL Turtle
schemas/20251121/rdf/custodian_hub.jsonld                 # JSON-LD
schemas/20251121/rdf/custodian_hub.nt                     # N-Triples

Tests (Verification)

tests/test_mermaid_generation.py  # Automated test for diagram generation

Quick Reference

After Editing Schema

# 1. Regenerate all artifacts
./scripts/regenerate_all.sh

# 2. Or regenerate individually:
python scripts/generate_mermaid_modular.py schema.yaml output.mmd
gen-owl -f ttl schema.yaml > output.ttl

# 3. Verify
python tests/test_mermaid_generation.py

Verify Hub Connections

# Quick check
grep "||--|| Custodian" schemas/20251121/uml/mermaid/custodian_name_v5_final.mmd

# Should show 3 lines:
# CustodianReconstruction ||--|| Custodian : "refers_to_custodian"
# CustodianName ||--|| Custodian : "refers_to_custodian"
# CustodianObservation ||--|| Custodian : "refers_to_custodian"
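
The same check can be scripted in Python, for example when `grep` is unavailable. This is a minimal sketch: `hub_connections` is a hypothetical helper, and the sample text stands in for the contents of the generated .mmd file.

```python
def hub_connections(mermaid_text, hub="Custodian"):
    """Return the relationship lines that point at the hub class."""
    marker = f"||--|| {hub} :"  # the trailing colon avoids matching CustodianName etc.
    return [
        line.strip() for line in mermaid_text.splitlines() if marker in line
    ]


sample = """Custodian {
    uriorcurie hc_id
}
CustodianReconstruction ||--|| Custodian : "refers_to_custodian"
CustodianName ||--|| Custodian : "refers_to_custodian"
CustodianObservation ||--|| Custodian : "refers_to_custodian"
"""

connections = hub_connections(sample)
print(len(connections))  # 3
```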

Summary

DO:

  • Edit LinkML schema files (YAML)
  • Regenerate diagrams from schema
  • Run tests to verify
  • Commit both schema and generated files

DON'T:

  • Edit diagram files directly
  • Use raw slot_usage in generators
  • Skip regeneration after schema changes
  • Commit divergent diagrams

Golden Rule: Schema files are the single source of truth. Everything else is generated from them.


Last Updated: 2025-11-21
Verified Working: All tests passing
Next Review: After any schema changes