glam/BEFORE_AFTER_MERMAID_COMPARISON.md
2025-11-25 12:48:07 +01:00


# Before vs After: Complete Schema Mermaid Diagram
**Date**: 2025-11-24

---
## The Problem
LinkML's default behavior generates **53 separate Mermaid diagrams** (one per class):
```
schemas/20251121/uml/mermaid/auto_generated/
├── ArchiveOrganizationType.mmd
├── BioCustodianType.mmd
├── CommercialOrganizationType.mmd
├── ConfidenceMeasure.mmd
├── Consortium.mmd
├── Country.mmd
├── Custodian.mmd ← Core class (abstract)
├── CustodianAppellation.mmd
├── CustodianCollection.mmd
├── CustodianIdentifier.mmd
├── CustodianLegalStatus.mmd ← Core class
├── CustodianName.mmd ← Core class
├── CustodianObservation.mmd ← Core class
├── CustodianPlace.mmd ← Core class
├── CustodianType.mmd ← Core class (abstract)
├── DigitalPlatformType.mmd
├── EducationalProviderType.mmd
├── EncompassingBody.mmd ← NEW: Abstract parent
├── FeaturePlaceType.mmd
├── GalleryType.mmd
├── HolySiteType.mmd
├── IntangibleHeritageType.mmd
├── LegalEntity.mmd
├── LegalResponsibility.mmd
├── LegalStatus.mmd
├── LibraryType.mmd
├── MixedCustodianType.mmd
├── MuseumType.mmd
├── NetworkOrganisation.mmd ← NEW: Service providers
├── NonProfitType.mmd
├── OfficialInstitutionType.mmd
├── OrganizationalChangeEvent.mmd
├── OrganizationalStructure.mmd
├── PersonObservation.mmd
├── PersonalCollectionType.mmd
├── ReconstructionActivity.mmd
├── ReconstructionAgent.mmd
├── RegistrationAuthority.mmd
├── RegistrationInfo.mmd
├── ResearchOrganizationType.mmd
├── Settlement.mmd
├── Subregion.mmd
├── TasteScentHeritageType.mmd
├── TimeSpan.mmd
├── UmbrellaOrganisation.mmd ← NEW: Legal parents
└── UnspecifiedType.mmd
Total: 53 files, 212 KB
```
**Problem**: To understand the schema architecture, you have to open all 53 files and mentally reconstruct the relationships between them.

---
## The Solution
One comprehensive diagram showing **everything**:
```
schemas/20251121/uml/mermaid/
└── complete_schema_20251124_004329.mmd
Total: 1 file, 31 KB
```
---
## Visual Comparison
### Before: Fragmented View (53 files)
To understand how `Custodian` relates to other classes, you need to:
1. Open `Custodian.mmd` → see immediate relationships
2. Open `CustodianObservation.mmd` → see observation pattern
3. Open `CustodianLegalStatus.mmd` → see legal aspect
4. Open `CustodianName.mmd` → see name aspect
5. Open `CustodianPlace.mmd` → see place aspect
6. Open `CustodianCollection.mmd` → see collection aspect
7. Open `OrganizationalStructure.mmd` → see internal structure
8. Open `EncompassingBody.mmd` → see external governance
9. Open `UmbrellaOrganisation.mmd` → see legal parents
10. Open `NetworkOrganisation.mmd` → see service providers
11. Open `Consortium.mmd` → see peer collaborations

**Result**: High mental overhead, with context lost while switching between 11+ files.

---
### After: Unified View (1 file)
Open `complete_schema_20251124_004329.mmd` → see **everything** at once:
```mermaid
classDiagram
%% All 53 classes defined with attributes
class Custodian
Custodian : *hc_id uriorcurie
Custodian : preferred_label CustodianName
Custodian : custodian_type CustodianType
Custodian : legal_status CustodianLegalStatus
Custodian : place_designation CustodianPlace
<<abstract>> Custodian
class EncompassingBody
EncompassingBody : id uriorcurie
EncompassingBody : organization_name string
<<abstract>> EncompassingBody
class UmbrellaOrganisation
UmbrellaOrganisation : governance_authority string
class NetworkOrganisation
NetworkOrganisation : service_offerings string
class Consortium
Consortium : membership_criteria string
%% All 149 relationships visible
EncompassingBody <|-- UmbrellaOrganisation : inherits
EncompassingBody <|-- NetworkOrganisation : inherits
EncompassingBody <|-- Consortium : inherits
CustodianObservation --> "1" Custodian : identifies_custodian
CustodianLegalStatus --> "1" Custodian : refers_to_custodian
CustodianName --> "1" Custodian : refers_to_custodian
CustodianPlace --> "1" Custodian : refers_to_custodian
CustodianCollection --> "1" Custodian : refers_to_custodian
%% ... 140+ more relationships
```
**Result**: The complete architecture is visible in one view, with no context switching.

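
A minimal Python sketch of how a single diagram like the excerpt above could be assembled. This is not the project's script: `ClassSpec` and `render_class_diagram` are hypothetical names, and the class data is hand-copied from the excerpt rather than read from the LinkML schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassSpec:
    """Hypothetical, simplified stand-in for one schema class."""
    name: str
    attributes: list = field(default_factory=list)  # (slot name, range) pairs
    parent: Optional[str] = None
    abstract: bool = False


def render_class_diagram(classes):
    """Render every class and relationship as one Mermaid classDiagram string."""
    known = {c.name for c in classes}
    lines = ["classDiagram"]
    # First pass: class definitions with their attributes.
    for c in classes:
        lines.append(f"class {c.name}")
        for slot, rng in c.attributes:
            lines.append(f"{c.name} : {slot} {rng}")
        if c.abstract:
            lines.append(f"<<abstract>> {c.name}")
    # Second pass: inheritance edges between known classes.
    for c in classes:
        if c.parent in known:
            lines.append(f"{c.parent} <|-- {c.name} : inherits")
    # Third pass: a slot whose range is another class becomes an association.
    for c in classes:
        for slot, rng in c.attributes:
            if rng in known:
                lines.append(f'{c.name} --> "1" {rng} : {slot}')
    return "\n".join(lines)


classes = [
    ClassSpec("Custodian", [("hc_id", "uriorcurie")], abstract=True),
    ClassSpec("EncompassingBody", [("id", "uriorcurie")], abstract=True),
    ClassSpec("Consortium", [("membership_criteria", "string")], parent="EncompassingBody"),
    ClassSpec("CustodianName", [("refers_to_custodian", "Custodian")]),
]
diagram = render_class_diagram(classes)
print(diagram)
```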
---
## Feature Comparison
| Feature | Per-Class (Before) | Complete (After) |
|---------|-------------------|------------------|
| **Files generated** | 53 | 1 |
| **Total size** | 212 KB | 31 KB |
| **Classes shown** | 1 per file | 53 in one file |
| **Relationships** | Immediate neighbors only | All 149 relationships |
| **Abstract classes** | Marked per-file | All 3 marked in context |
| **Inheritance hierarchy** | Fragmented | Complete tree visible |
| **Hub pattern** | Hidden across files | Immediately clear |
| **EncompassingBody architecture** | 4 separate files | Unified hierarchy |
| **CustodianType taxonomy** | 19 separate files | Full taxonomy tree |
| **Context switching** | High (11+ files for Custodian) | None |
| **Onboarding time** | Hours (explore 53 files) | Minutes (one diagram) |
| **Presentation-ready** | ❌ Too fragmented | ✅ Yes |
| **Print-friendly** | ❌ 53 pages | ✅ 1 diagram |
| **Whiteboard-friendly** | ❌ Can't draw all | ✅ Shows structure |
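
The class and relationship counts in the table can be checked against a generated file. The sketch below is one way to do that in Python, assuming the file only uses the `class X`, `<|--`, and `-->` forms shown in the excerpt earlier. `summarize` is a hypothetical helper, not part of the project's script, and the sample text is a shortened illustration rather than the real file.

```python
import re

# `class Name` lines; "classDiagram" does not match because no space follows "class".
CLASS_RE = re.compile(r"^\s*class\s+(\w+)")
# The two arrow forms used in the excerpt: inheritance (<|--) and association (-->),
# with an optional quoted cardinality such as "1" before the target class.
EDGE_RE = re.compile(r'^\s*(\w+)\s+(<\|--|-->)\s+(?:"[^"]*"\s+)?(\w+)')


def summarize(mermaid_text):
    """Count declared classes and relationship lines in Mermaid classDiagram text."""
    classes = set()
    inheritance = 0
    associations = 0
    for line in mermaid_text.splitlines():
        class_match = CLASS_RE.match(line)
        edge_match = EDGE_RE.match(line)
        if class_match:
            classes.add(class_match.group(1))
        elif edge_match:
            if edge_match.group(2) == "<|--":
                inheritance += 1
            else:
                associations += 1
    return {
        "classes": len(classes),
        "inheritance": inheritance,
        "associations": associations,
        "relationships": inheritance + associations,
    }


sample = """classDiagram
class Custodian
Custodian : hc_id uriorcurie
<<abstract>> Custodian
class EncompassingBody
class Consortium
class CustodianName
EncompassingBody <|-- Consortium : inherits
CustodianName --> "1" Custodian : refers_to_custodian
"""
summary = summarize(sample)
print(summary)
```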
---
## Use Cases: When to Use What
### Per-Class Diagrams (LinkML Default)
**Best for**:
- Detailed class documentation
- API reference generation
- Field-level schema understanding
- Developer onboarding (one class at a time)

**Not good for**:
- Understanding overall architecture
- Seeing cross-class relationships
- Presentations and talks
- Executive summaries
### Complete Diagram (This Extension)
**Best for**:
- **Architecture overview** - Understand schema structure at a glance
- **Presentations** - Conference talks, webinars, workshops
- **Ontology consultations** - Show alignment with CIDOC-CRM, W3C ORG, etc.
- **Onboarding** - New developers see the big picture first
- **Documentation** - Schema overview chapter in guides
- **Academic papers** - Illustrate data model in publications
- **Stakeholder communication** - Non-technical audience understanding

**Not good for**:
- Field-level details (showing every attribute makes the diagram unreadable)
- API documentation (too high-level)
---
## Real-World Impact
### Before (Fragmented)
**Scenario**: New developer joins project, asks "How does the hub pattern work?"
**Answer**:
```
"Open these files in order:
1. Custodian.mmd - see the hub
2. CustodianObservation.mmd - see observations
3. CustodianLegalStatus.mmd - see legal aspect
4. CustodianName.mmd - see name aspect
5. CustodianPlace.mmd - see place aspect
6. ReconstructionActivity.mmd - see the derivation process
Now mentally integrate all 6 diagrams to understand the pattern."
```
**Time to understanding**: 2-4 hours (with confusion)

---
### After (Unified)
**Scenario**: Same question
**Answer**:
```
"Open complete_schema_20251124_004329.mmd and look at the center.
You'll see Custodian (hub) with 5 arrows pointing TO it from:
- CustodianObservation (sources)
- CustodianLegalStatus (legal aspect)
- CustodianName (emic name)
- CustodianPlace (location aspect)
- CustodianCollection (holdings)
All derived via ReconstructionActivity."
```
**Time to understanding**: 5-10 minutes (clear)
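
The "hub" reading can also be confirmed mechanically by counting incoming arrows per class. A small sketch, using only the five relationships quoted in this document; `incoming_counts` is a hypothetical helper name.

```python
from collections import Counter

# (source, target, label) triples copied from the diagram excerpt in this document.
edges = [
    ("CustodianObservation", "Custodian", "identifies_custodian"),
    ("CustodianLegalStatus", "Custodian", "refers_to_custodian"),
    ("CustodianName", "Custodian", "refers_to_custodian"),
    ("CustodianPlace", "Custodian", "refers_to_custodian"),
    ("CustodianCollection", "Custodian", "refers_to_custodian"),
]


def incoming_counts(edge_list):
    """Count how many arrows point TO each class."""
    return Counter(target for _source, target, _label in edge_list)


counts = incoming_counts(edges)
hub, degree = counts.most_common(1)[0]
print(hub, degree)  # Custodian 5
```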
---
## Technical Comparison
### Generation Method
**Per-Class (LinkML Default)**:
```bash
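# Uses gen-yuml (part of LinkML docs generator)
# Note: gen-yuml itself emits yUML rather than Mermaid, so confirm which
# LinkML generator and flags produced the .mmd files before rerunning this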
gen-yuml schemas/01_custodian_name_modular.yaml \
--output-dir schemas/20251121/uml/mermaid/auto_generated/
# Result: 53 files, one per class
```
**Complete (This Extension)**:
```bash
# Uses custom script with SchemaView API
python3 scripts/generate_complete_mermaid_diagram.py
# Result: 1 file with all classes and relationships
```
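
The document does not show the internals of `generate_complete_mermaid_diagram.py`. As a rough sketch of what a SchemaView-based collection step might look like, the function below walks every class, records `is_a` parents, and keeps slots whose range is another class. `collect_relationships`, `load_view`, and `FakeView` are hypothetical names; only `SchemaView`, `all_classes()`, and `class_induced_slots()` come from `linkml_runtime`.

```python
from types import SimpleNamespace


def collect_relationships(view):
    """Return (inheritance, associations) edge lists for every class in the schema.

    `view` is expected to behave like linkml_runtime's SchemaView, i.e. to
    provide all_classes() and class_induced_slots().
    """
    class_names = set(view.all_classes())
    inheritance, associations = [], []
    for name, cls in view.all_classes().items():
        if cls.is_a:
            inheritance.append((cls.is_a, name))
        for slot in view.class_induced_slots(name):
            # Only slots whose range is another class become diagram edges.
            if slot.range in class_names:
                associations.append((name, slot.range, slot.name))
    return inheritance, associations


def load_view(schema_path):
    """Load a real schema; imported lazily so the demo below runs without LinkML."""
    from linkml_runtime.utils.schemaview import SchemaView
    return SchemaView(schema_path)


class FakeView:
    """Tiny stand-in for SchemaView, used only to demonstrate the function."""

    def __init__(self):
        self._classes = {
            "EncompassingBody": SimpleNamespace(is_a=None),
            "Consortium": SimpleNamespace(is_a="EncompassingBody"),
            "Custodian": SimpleNamespace(is_a=None),
            "CustodianName": SimpleNamespace(is_a=None),
        }
        self._slots = {
            "CustodianName": [
                SimpleNamespace(name="refers_to_custodian", range="Custodian"),
                SimpleNamespace(name="emic_name", range="string"),
            ],
        }

    def all_classes(self):
        return self._classes

    def class_induced_slots(self, name):
        return self._slots.get(name, [])


inheritance, associations = collect_relationships(FakeView())
print(inheritance)
print(associations)
```

With LinkML installed, `collect_relationships(load_view("schemas/01_custodian_name_modular.yaml"))` would be the equivalent call against the schema file named above.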
### Customization
**Per-Class**:
- Limited customization (LinkML generator)
- All-or-nothing (can't filter classes)
- Fixed format

**Complete**:
- Fully customizable (Python script)
- Can filter by class, module, type
- Can adjust attribute count per class
- Can focus on specific relationship types
- Easy to extend for new use cases
---
## Storage Efficiency
**Per-Class**: 212 KB across 53 files
- Each file has boilerplate (header, footer)
- Class definitions repeated for relationships
- Redundant metadata

**Complete**: 31 KB in 1 file
- Single header/footer
- Class definitions once
- Relationships deduplicated

**Savings**: 85% reduction in total size (212 KB → 31 KB)

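
The 85% figure follows directly from the two sizes quoted above; the arithmetic, written out in Python:

```python
per_class_kb = 212  # total for the 53 per-class files
complete_kb = 31    # size of the single complete diagram

reduction = (per_class_kb - complete_kb) / per_class_kb
print(f"{reduction:.1%}")  # 85.4%
```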
---
## Conclusion
Both approaches have value:
- **Use per-class diagrams** for detailed documentation and API reference
- **Use the complete diagram** for architecture understanding and communication

The complete diagram **complements** rather than **replaces** the per-class diagrams.

**Best practice**: Generate both and use each for its intended audience.

---
## Generated Files
- **Before**: `schemas/20251121/uml/mermaid/auto_generated/*.mmd` (53 files)
- **After**: `schemas/20251121/uml/mermaid/complete_schema_20251124_004329.mmd` (1 file)
- **Script**: `scripts/generate_complete_mermaid_diagram.py`
---
## Try It Yourself
```bash
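# Generate the complete diagram (run from the repository root;
# the path below is from the original author's machine, so adjust it)
cd /Users/kempersc/apps/glam
python3 scripts/generate_complete_mermaid_diagram.py
# View online: open the live editor and paste the contents of complete_schema_*.mmd
# (`open` is the macOS launcher; on Linux, xdg-open is the usual equivalent)
open https://mermaid.live/
# Compare with a per-class diagram
open schemas/20251121/uml/mermaid/auto_generated/Custodian.mmd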
```
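
Because the output filename carries a timestamp, the `complete_schema_*.mmd` glob may match several files after repeated runs. A small Python sketch for picking the newest one, assuming every file follows the `complete_schema_YYYYMMDD_HHMMSS.mmd` pattern seen in this document; `latest_complete_schema` is a hypothetical helper.

```python
from pathlib import Path


def latest_complete_schema(directory):
    """Return the newest complete_schema_*.mmd in `directory`, or None if absent."""
    # The YYYYMMDD_HHMMSS suffix sorts chronologically when compared as text.
    candidates = sorted(Path(directory).glob("complete_schema_*.mmd"))
    return candidates[-1] if candidates else None


path = latest_complete_schema("schemas/20251121/uml/mermaid")
if path is not None:
    # Print the diagram source so it can be pasted into the live editor.
    print(path.read_text(encoding="utf-8"))
```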