glam/BEFORE_AFTER_MERMAID_COMPARISON.md
2025-11-25 12:48:07 +01:00


Before vs After: Complete Schema Mermaid Diagram

Date: 2025-11-24


The Problem

LinkML's default behavior generates 53 separate Mermaid diagrams, one per class (46 of them are listed below):

schemas/20251121/uml/mermaid/auto_generated/
├── ArchiveOrganizationType.mmd
├── BioCustodianType.mmd
├── CommercialOrganizationType.mmd
├── ConfidenceMeasure.mmd
├── Consortium.mmd
├── Country.mmd
├── Custodian.mmd                    ← Core class (abstract)
├── CustodianAppellation.mmd
├── CustodianCollection.mmd
├── CustodianIdentifier.mmd
├── CustodianLegalStatus.mmd         ← Core class
├── CustodianName.mmd                ← Core class
├── CustodianObservation.mmd         ← Core class
├── CustodianPlace.mmd               ← Core class
├── CustodianType.mmd                ← Core class (abstract)
├── DigitalPlatformType.mmd
├── EducationalProviderType.mmd
├── EncompassingBody.mmd             ← NEW: Abstract parent
├── FeaturePlaceType.mmd
├── GalleryType.mmd
├── HolySiteType.mmd
├── IntangibleHeritageType.mmd
├── LegalEntity.mmd
├── LegalResponsibility.mmd
├── LegalStatus.mmd
├── LibraryType.mmd
├── MixedCustodianType.mmd
├── MuseumType.mmd
├── NetworkOrganisation.mmd          ← NEW: Service providers
├── NonProfitType.mmd
├── OfficialInstitutionType.mmd
├── OrganizationalChangeEvent.mmd
├── OrganizationalStructure.mmd
├── PersonObservation.mmd
├── PersonalCollectionType.mmd
├── ReconstructionActivity.mmd
├── ReconstructionAgent.mmd
├── RegistrationAuthority.mmd
├── RegistrationInfo.mmd
├── ResearchOrganizationType.mmd
├── Settlement.mmd
├── Subregion.mmd
├── TasteScentHeritageType.mmd
├── TimeSpan.mmd
├── UmbrellaOrganisation.mmd         ← NEW: Legal parents
└── UnspecifiedType.mmd

Total: 53 files, 212 KB

Problem: To understand the schema architecture, you need to open 53 files and mentally reconstruct relationships.
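The file count and total size quoted above can be re-checked with a short script. This is a minimal sketch. It assumes the per-class files sit directly in the auto_generated directory with a .mmd extension. `inventory` is a hypothetical helper and is not part of the project's scripts.

```python
from pathlib import Path


def inventory(directory: str) -> tuple[int, int]:
    """Return (number of .mmd files, total size in bytes) for one directory."""
    # Non-recursive: only the per-class files directly inside `directory`
    files = sorted(Path(directory).glob("*.mmd"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes
```

Calling `inventory("schemas/20251121/uml/mermaid/auto_generated")` returns a size in bytes. Dividing by 1024 gives KiB, which may differ slightly from the rounded "212 KB" figure.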


The Solution

One comprehensive diagram showing everything:

schemas/20251121/uml/mermaid/
└── complete_schema_20251124_004329.mmd

Total: 1 file, 31 KB

Visual Comparison

Before: Fragmented View (53 files)

To understand how Custodian relates to other classes, you need to:

  1. Open Custodian.mmd → see immediate relationships
  2. Open CustodianObservation.mmd → see observation pattern
  3. Open CustodianLegalStatus.mmd → see legal aspect
  4. Open CustodianName.mmd → see name aspect
  5. Open CustodianPlace.mmd → see place aspect
  6. Open CustodianCollection.mmd → see collection aspect
  7. Open OrganizationalStructure.mmd → see internal structure
  8. Open EncompassingBody.mmd → see external governance
  9. Open UmbrellaOrganisation.mmd → see legal parents
  10. Open NetworkOrganisation.mmd → see service providers
  11. Open Consortium.mmd → see peer collaborations

Result: high mental overhead and lost context from switching between 11+ files
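You can approximate the manual search in the steps above by listing every per-class file that mentions a given class. This is a minimal sketch that assumes plain-text .mmd files. `files_mentioning` is a hypothetical helper. It only finds mentions and does not tell you the direction or type of each relationship.

```python
import re
from pathlib import Path


def files_mentioning(directory: str, class_name: str) -> list[str]:
    """List the per-class .mmd files whose text mentions `class_name` as a whole word."""
    # \b stops "Custodian" from matching inside "CustodianName" or "CustodianPlace"
    pattern = re.compile(rf"\b{re.escape(class_name)}\b")
    return [
        f.name
        for f in sorted(Path(directory).glob("*.mmd"))
        if pattern.search(f.read_text(encoding="utf-8"))
    ]
```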


After: Unified View (1 file)

Open complete_schema_20251124_004329.mmd → see everything at once:

classDiagram
  %% All 53 classes defined with attributes
  class Custodian
  Custodian : *hc_id uriorcurie
  Custodian : preferred_label CustodianName
  Custodian : custodian_type CustodianType
  Custodian : legal_status CustodianLegalStatus
  Custodian : place_designation CustodianPlace
  <<abstract>> Custodian
  
  class EncompassingBody
  EncompassingBody : id uriorcurie
  EncompassingBody : organization_name string
  <<abstract>> EncompassingBody
  
  class UmbrellaOrganisation
  UmbrellaOrganisation : governance_authority string
  
  class NetworkOrganisation
  NetworkOrganisation : service_offerings string
  
  class Consortium
  Consortium : membership_criteria string
  
  %% All 149 relationships visible
  EncompassingBody <|-- UmbrellaOrganisation : inherits
  EncompassingBody <|-- NetworkOrganisation : inherits
  EncompassingBody <|-- Consortium : inherits
  
  CustodianObservation --> "1" Custodian : identifies_custodian
  CustodianLegalStatus --> "1" Custodian : refers_to_custodian
  CustodianName --> "1" Custodian : refers_to_custodian
  CustodianPlace --> "1" Custodian : refers_to_custodian
  CustodianCollection --> "1" Custodian : refers_to_custodian
  
  %% ... 140+ more relationships

Result: Complete architecture visible in one view, no context switching
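A simple renderer can produce single-file output in the shape of the excerpt above. This is a minimal sketch. It assumes the schema has already been reduced to plain Python objects. `ClassInfo` and `render_class_diagram` are hypothetical names and are not the API of scripts/generate_complete_mermaid_diagram.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassInfo:
    name: str
    attributes: list = field(default_factory=list)  # (slot name, range) pairs
    is_a: Optional[str] = None                      # parent class, if any
    abstract: bool = False


def render_class_diagram(classes: list, associations: list) -> str:
    """Emit one Mermaid classDiagram with declarations, inheritance and associations.

    `associations` holds (source, target, cardinality, label) tuples.
    """
    lines = ["classDiagram"]
    for c in classes:
        lines.append(f"  class {c.name}")
        for attr, rng in c.attributes:
            lines.append(f"  {c.name} : {attr} {rng}")
        if c.abstract:
            lines.append(f"  <<abstract>> {c.name}")
    # Mermaid's `Parent <|-- Child` form, as used in the excerpt above
    for c in classes:
        if c.is_a:
            lines.append(f"  {c.is_a} <|-- {c.name} : inherits")
    for src, tgt, card, label in associations:
        lines.append(f'  {src} --> "{card}" {tgt} : {label}')
    return "\n".join(lines) + "\n"
```

Each class is declared once and each edge is emitted once. This is what lets a single file stay compact.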


Feature Comparison

| Feature | Per-Class (Before) | Complete (After) |
|---|---|---|
| Files generated | 53 | 1 |
| Total size | 212 KB | 31 KB |
| Classes shown | 1 per file | 53 in one file |
| Relationships | Immediate neighbors only | All 149 relationships |
| Abstract classes | Marked per file | All 3 marked in context |
| Inheritance hierarchy | Fragmented | Complete tree visible |
| Hub pattern | Hidden across files | Immediately clear |
| EncompassingBody architecture | 4 separate files | Unified hierarchy |
| CustodianType taxonomy | 19 separate files | Full taxonomy tree |
| Context switching | High (11+ files for Custodian) | None |
| Onboarding time | Hours (explore 53 files) | Minutes (one diagram) |
| Presentation-ready | Too fragmented | Yes |
| Print-friendly | 53 pages | 1 diagram |
| Whiteboard-friendly | Can't draw all | Shows structure |
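The table's counts (53 classes, 149 relationships) can be re-derived from the generated file. This is a rough sketch with a few assumptions: every relationship sits on its own line, each uses one of the two arrow forms in the excerpt above (`<|--` for inheritance, `-->` for associations), and every class is declared with a `class Name` line. `diagram_stats` is a hypothetical helper.

```python
import re

# The two arrow forms used in this comparison: inheritance and association
RELATION = re.compile(r"<\|--|-->")
# "class Custodian" matches; "classDiagram" does not (no whitespace after "class")
CLASS_DECL = re.compile(r"^\s*class\s+(\w+)", re.MULTILINE)


def diagram_stats(mermaid_text: str) -> dict:
    """Count declared classes and relationship lines in a Mermaid classDiagram."""
    relationship_lines = [
        line for line in mermaid_text.splitlines()
        if not line.strip().startswith("%%") and RELATION.search(line)  # skip comments
    ]
    return {
        "classes": len(set(CLASS_DECL.findall(mermaid_text))),
        "relationships": len(relationship_lines),
    }
```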

Use Cases: When to Use What

Per-Class Diagrams (LinkML Default)

Best for:

  • Detailed class documentation
  • API reference generation
  • Field-level schema understanding
  • Developer onboarding (one class at a time)

Not good for:

  • Understanding overall architecture
  • Seeing cross-class relationships
  • Presentations and talks
  • Executive summaries

Complete Diagram (This Extension)

Best for:

  • Architecture overview - Understand schema structure at a glance
  • Presentations - Conference talks, webinars, workshops
  • Ontology consultations - Show alignment with CIDOC-CRM, W3C ORG, etc.
  • Onboarding - New developers see the big picture first
  • Documentation - Schema overview chapter in guides
  • Academic papers - Illustrate data model in publications
  • Stakeholder communication - Non-technical audience understanding

Not good for:

  • Field-level details (showing every attribute makes the diagram unreadable)
  • API documentation (too high-level)

Real-World Impact

Before (Fragmented)

Scenario: New developer joins project, asks "How does the hub pattern work?"

Answer:

"Open these files in order:
1. Custodian.mmd - see the hub
2. CustodianObservation.mmd - see observations
3. CustodianLegalStatus.mmd - see legal aspect
4. CustodianName.mmd - see name aspect
5. CustodianPlace.mmd - see place aspect
6. ReconstructionActivity.mmd - see the derivation process
Now mentally integrate all 6 diagrams to understand the pattern."

Estimated time to understanding: 2-4 hours (with confusion)


After (Unified)

Scenario: Same question

Answer:

"Open complete_schema_20251124_004329.mmd and look at the center.
You'll see Custodian (hub) with 5 arrows pointing TO it from:
- CustodianObservation (sources)
- CustodianLegalStatus (legal aspect)
- CustodianName (emic name)
- CustodianPlace (location aspect)
- CustodianCollection (holdings)
All derived via ReconstructionActivity."

Estimated time to understanding: 5-10 minutes (clear)
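The "look at the center" answer relies on Custodian being the class with the most incoming arrows, and that can be checked on the generated file. This is a sketch that assumes association lines follow the `Source --> "cardinality" Target : label` form of the excerpt earlier. `incoming_counts` is a hypothetical helper. Inheritance lines (`<|--`) are deliberately ignored.

```python
import re
from collections import Counter

# Matches e.g.:  CustodianName --> "1" Custodian : refers_to_custodian
# The quoted cardinality is optional.
ASSOCIATION = re.compile(r'^\s*(\w+)\s+-->\s+(?:"[^"]*"\s+)?(\w+)')


def incoming_counts(mermaid_text: str) -> Counter:
    """Count how many association arrows point at each class."""
    counts: Counter = Counter()
    for line in mermaid_text.splitlines():
        match = ASSOCIATION.match(line)
        if match:
            counts[match.group(2)] += 1  # group 2 is the arrow's target
    return counts
```

`incoming_counts(text).most_common(1)` should then name the hub.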


Technical Comparison

Generation Method

Per-Class (LinkML Default):

# Uses gen-yuml (part of LinkML docs generator)
gen-yuml schemas/01_custodian_name_modular.yaml \
  --output-dir schemas/20251121/uml/mermaid/auto_generated/

# Result: 53 files, one per class

Complete (This Extension):

# Uses custom script with SchemaView API
python3 scripts/generate_complete_mermaid_diagram.py

# Result: 1 file with all classes and relationships
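The complete diagram is described as coming from a custom script that uses the SchemaView API. The sketch below shows how such a script might collect classes and relationships. It assumes linkml_runtime's `SchemaView.all_classes()` and `SchemaView.class_induced_slots()`. `collect_model` and its record layout are hypothetical. The function only calls methods on the object passed in, so you can exercise it with a stand-in object and no schema file.

```python
from typing import Any


def collect_model(sv: Any) -> tuple:
    """Walk a linkml_runtime SchemaView (or any object with the same two methods).

    Returns (class records, associations). An association is any induced slot
    whose range is another class in the schema.
    """
    all_classes = sv.all_classes()              # {class name: ClassDefinition}
    class_names = set(all_classes)
    records, associations = [], []
    for name, cls in all_classes.items():
        slots = sv.class_induced_slots(name)    # includes inherited slots
        records.append({
            "name": name,
            "is_a": cls.is_a,
            "abstract": bool(cls.abstract),     # abstract may be None
            "attributes": [(s.name, s.range) for s in slots],
        })
        for s in slots:
            if s.range in class_names:
                associations.append((name, s.range, s.name))
    return records, associations
```

With linkml_runtime installed, the call would be `collect_model(SchemaView("schemas/01_custodian_name_modular.yaml"))`, using `from linkml_runtime.utils.schemaview import SchemaView`. This assumes that file is the schema entry point; the path is taken from the gen-yuml example above. Induced slots include inherited ones, so subclasses repeat their parent's associations. Reading only each class's own slots would avoid that duplication.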

Customization

Per-Class:

  • Limited customization (LinkML generator)
  • All-or-nothing (can't filter classes)
  • Fixed format

Complete:

  • Fully customizable (Python script)
  • Can filter by class, module, type
  • Can adjust attribute count per class
  • Can focus on specific relationship types
  • Easy to extend for new use cases
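The filtering and attribute-limiting options listed above might look like this. This is a minimal sketch over a plain-dict model. `filter_view` and `max_attributes` are hypothetical and are not flags of the actual script.

```python
def filter_view(classes: dict, relationships: list, keep: set, max_attributes: int = 5):
    """Restrict a diagram model to the `keep` classes and trim their attribute lists.

    `classes` maps class name -> attribute lines; `relationships` holds
    (source, arrow, target) triples. An edge survives only if both ends do.
    """
    kept_classes = {
        name: attrs[:max_attributes]
        for name, attrs in classes.items() if name in keep
    }
    kept_edges = [
        (src, arrow, tgt) for src, arrow, tgt in relationships
        if src in kept_classes and tgt in kept_classes
    ]
    return kept_classes, kept_edges
```

Dropping edges whose endpoints were filtered out keeps the sub-diagram self-contained. Otherwise Mermaid would implicitly create the missing classes as empty boxes.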

Storage Efficiency

Per-Class: 212 KB across 53 files

  • Each file has boilerplate (header, footer)
  • Class definitions repeated for relationships
  • Redundant metadata

Complete: 31 KB in 1 file

  • Single header/footer
  • Class definitions once
  • Relationships deduplicated

Savings: 85% reduction in total size ((212 KB - 31 KB) / 212 KB ≈ 85.4%)
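A naive merge of per-class fragments shows the deduplication effect behind this saving. This is a sketch. It assumes each fragment is a bare classDiagram with one statement per line and no front matter. `merge_fragments` is hypothetical; the project's script builds its diagram from the schema rather than by merging files.

```python
def merge_fragments(fragments: list) -> str:
    """Naively merge per-class Mermaid classDiagram texts into one diagram.

    Keeps a single 'classDiagram' header and drops lines already seen
    (repeated class declarations and relationships), preserving first-seen order.
    """
    seen = set()
    body = []
    for text in fragments:
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line == "classDiagram" or line in seen:
                continue
            seen.add(line)
            body.append("  " + line)
    return "\n".join(["classDiagram", *body]) + "\n"
```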


Conclusion

Both approaches have value:

  • Use per-class diagrams for detailed documentation and API reference
  • Use complete diagram for architecture understanding and communication

The complete diagram complements rather than replaces per-class diagrams.

Best practice: Generate both, use for different audiences.


Generated Files

  • Before: schemas/20251121/uml/mermaid/auto_generated/*.mmd (53 files)
  • After: schemas/20251121/uml/mermaid/complete_schema_20251124_004329.mmd (1 file)
  • Script: scripts/generate_complete_mermaid_diagram.py

Try It Yourself

# Generate complete diagram
cd /Users/kempersc/apps/glam
python3 scripts/generate_complete_mermaid_diagram.py

# View online
open https://mermaid.live/
# Paste contents of complete_schema_*.mmd

# Compare with per-class diagram
open schemas/20251121/uml/mermaid/auto_generated/Custodian.mmd
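The output filename carries a timestamp (complete_schema_20251124_004329.mmd). If older files are not deleted, repeated runs may leave several behind. This sketch picks the newest one before you paste it into mermaid.live. It assumes the YYYYMMDD_HHMMSS naming shown above holds for every run. `latest_complete_diagram` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_complete_diagram(directory: str) -> Optional[Path]:
    """Return the most recent complete_schema_YYYYMMDD_HHMMSS.mmd, or None if absent."""
    # The timestamp is zero-padded, so lexicographic order equals chronological order
    candidates = sorted(Path(directory).glob("complete_schema_*.mmd"))
    return candidates[-1] if candidates else None
```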