glam/RDF_UML_GENERATION_COMPLETE_20251122_155319.md
kempersc 2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00

11 KiB

RDF and UML Generation Complete - Session Summary

Date: 2025-11-22
Session: Namespace Conflict Resolution & Visualization Generation
Status: COMPLETE
Final Timestamp: 20251122_155319


Executive Summary

Successfully resolved all namespace conflicts in the modular LinkML schema and generated complete RDF and UML outputs. The session overcame LinkML's gen-yuml path resolution bug by creating custom OWL → UML converter scripts using PlantUML and Mermaid.

Key Achievements:

  • Fixed 5 class files with namespace conflicts
  • Generated 4 RDF formats (1.3MB total)
  • Created 3 UML visualization formats (PlantUML PNG/SVG, Mermaid)
  • Built 2 reusable OWL converter scripts
  • Documented complete regeneration workflow

Problem Solved: Namespace Conflicts

Issue

Multiple module files contained duplicate prefix definitions that conflicted with modules/metadata.yaml:

WARNING: schema namespace already mapped to http://schema.org/ - Overriding with https://schema.org/
WARNING: heritage namespace already mapped to https://nde.nl/ontology/hc/# - Overriding with https://nde.nl/ontology/hc/
WARNING: tooi namespace already mapped to https://standaarden.overheid.nl/tooi# - Overriding with https://identifier.overheid.nl/tooi/def/ont/

Solution

Removed duplicate prefixes from 5 files and added ../metadata imports:

File Duplicates Removed Unique Prefixes Kept
LegalEntityType.yaml 8 (heritage, schema, org, cpov, crm, tooi, foaf, owl) rov
LegalForm.yaml 4 (heritage, schema, org, tooi) rov, gleif, iso20275
RegistrationInfo.yaml 4 (heritage, schema, org, tooi) rov
LegalName.yaml 3 (heritage, schema, tooi) rov
ISO20275_mapping.yaml 3 (heritage, org, schema) iso20275, wd

Result: Clean RDF generation with zero namespace warnings.


Generated Artifacts

RDF Files (Timestamp: 20251122_155319)

Format Size Lines Status Use Case
OWL/Turtle 159KB 2,619 Primary format, human-readable
N-Triples 456KB 3,027 Bulk loading, line-oriented processing
JSON-LD 380KB 14,094 Web APIs, JavaScript integration
RDF/XML 328KB 4,585 Legacy systems, XML tools

Total: 1.3MB across 4 serialization formats
Triple count: 3,027 triples

Location: schemas/20251121/rdf/custodian_multi_aspect_20251122_155319.*

UML Visualizations (Timestamp: 20251122_155319)

Format Size Tool Status Use Case
PlantUML Source 1.5KB Custom script Editable diagram source
PlantUML PNG 47KB PlantUML CLI Raster image for documents
PlantUML SVG 51KB PlantUML CLI Vector graphic (web, scaling)
Mermaid 1.6KB Custom script GitHub README, Markdown

Location:

  • schemas/20251121/uml/plantuml/custodian_multi_aspect_20251122_155319.*
  • schemas/20251121/uml/mermaid/custodian_multi_aspect_20251122_155319.mmd

Classes visualized: 35 HC ontology classes with properties and inheritance


Custom Converter Scripts

1. scripts/owl_to_plantuml.py

Purpose: Convert OWL/Turtle RDF to PlantUML class diagram

Features:

  • Parses RDF graph using rdflib
  • Extracts classes, properties, inheritance (rdfs:subClassOf)
  • Generates PlantUML syntax with class notes
  • Supports property type annotations

Usage:

python3 scripts/owl_to_plantuml.py input.owl.ttl output.puml
plantuml output.puml           # Render to PNG
plantuml -tsvg output.puml     # Render to SVG

Stats: 153 lines, handles 35 classes, includes RDFS/OWL reasoning

2. scripts/owl_to_mermaid.py

Purpose: Convert OWL/Turtle RDF to Mermaid class diagram

Features:

  • Parses RDF graph using rdflib
  • Generates Mermaid classDiagram syntax
  • Limits properties to 8 per class (readability)
  • Compatible with GitHub, GitLab, VS Code preview

Usage:

python3 scripts/owl_to_mermaid.py input.owl.ttl output.mmd

Stats: 133 lines, handles 35 classes, web-friendly output


Regeneration Workflow

Step 1: Generate RDF from LinkML

cd schemas/20251121/linkml
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Generate OWL/Turtle
gen-owl -f ttl 01_custodian_name_modular.yaml 2>/dev/null \
  > ../rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl

# Convert to other formats
cd ../rdf
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o nt 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.nt
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o json-ld 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.jsonld
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o xml 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.rdf

Step 2: Generate UML Visualizations

# PlantUML
python3 scripts/owl_to_plantuml.py \
  schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl \
  schemas/20251121/uml/plantuml/custodian_multi_aspect_${TIMESTAMP}.puml

cd schemas/20251121/uml/plantuml
plantuml custodian_multi_aspect_${TIMESTAMP}.puml
plantuml -tsvg custodian_multi_aspect_${TIMESTAMP}.puml

# Mermaid
python3 scripts/owl_to_mermaid.py \
  schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl \
  schemas/20251121/uml/mermaid/custodian_multi_aspect_${TIMESTAMP}.mmd

Step 3: Validate Output

# Check file sizes
ls -lh schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.*
ls -lh schemas/20251121/uml/*/custodian_multi_aspect_${TIMESTAMP}.*

# Optional: Validate RDF syntax
rapper -i turtle -c schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl

Ontology Structure (35 Classes)

Core Hub Pattern

  • Custodian - Minimal hub (persistent ID only)
  • CustodianObservation - Source-based references
  • ReconstructionActivity - Entity resolution process

Three Independent Aspects

  1. CustodianLegalStatus - Formal legal entity (registered)
  2. CustodianName - Standardized emic name (ambiguous)
  3. CustodianPlace - Nominal place designation

Supporting Classes

  • Provenance: ConfidenceMeasure, SourceDocument, ReconstructionAgent
  • Temporal: TimeSpan (begin_of_begin, end_of_end)
  • Identity: Identifier, Appellation, LanguageCode
  • Legal: LegalEntityType, LegalForm, LegalName, RegistrationInfo

Enumerations (5)

  • AgentTypeEnum - PERSON, ORGANIZATION, SOFTWARE
  • AppellationTypeEnum - Name classifications
  • EntityTypeEnum - Legal entity types
  • LegalStatusEnum - ACTIVE, DISSOLVED, MERGED, etc.
  • PlaceSpecificityEnum - CITY, REGION, COUNTRY, etc.

Technical Details

Namespace Consistency

All modules now use standardized namespace URIs:

Prefix URI Source
heritage https://nde.nl/ontology/hc/ modules/metadata.yaml
schema https://schema.org/ modules/metadata.yaml (HTTPS!)
tooi https://identifier.overheid.nl/tooi/def/ont/ modules/metadata.yaml

Why this matters:

  • Prevents duplicate triples (same property, different namespace)
  • Enables consistent SPARQL queries
  • Maintains Linked Open Data best practices

Import Path Pattern

# Standard pattern for class modules
imports:
  - linkml:types
  - ../metadata        # ← Shared prefixes
  - ./SiblingClass     # ← Same-directory classes

# Only declare unique prefixes
prefixes:
  linkml: https://w3id.org/linkml/
  rov: http://www.w3.org/ns/regorg#  # ← Not in metadata.yaml

What Didn't Work (But We Solved)

Issue: LinkML gen-yuml Path Resolution Bug

Error:

FileNotFoundError: [Errno 2] No such file or directory: 
  '/Users/kempersc/apps/glam/schemas/20251121/linkml/ReconstructionAgent.yaml'

Root cause: gen-yuml looks for ReconstructionAgent.yaml at schema root instead of modules/classes/ReconstructionAgent.yaml

Solution: Created custom OWL → UML converters that:

  1. Parse already-generated OWL/Turtle (which works correctly)
  2. Extract class structure from RDF triples
  3. Generate PlantUML/Mermaid from RDF graph

Advantage: More flexible than gen-yuml, can customize diagram layout


Files Modified/Created

Modified (5 schema files)

  1. schemas/20251121/linkml/modules/classes/LegalEntityType.yaml
  2. schemas/20251121/linkml/modules/classes/LegalForm.yaml
  3. schemas/20251121/linkml/modules/classes/RegistrationInfo.yaml
  4. schemas/20251121/linkml/modules/classes/LegalName.yaml
  5. schemas/20251121/linkml/modules/mappings/ISO20275_mapping.yaml

Generated (8 artifact files)

RDF:

  1. custodian_multi_aspect_20251122_155319.owl.ttl (159KB)
  2. custodian_multi_aspect_20251122_155319.nt (456KB)
  3. custodian_multi_aspect_20251122_155319.jsonld (380KB)
  4. custodian_multi_aspect_20251122_155319.rdf (328KB)

UML: 5. custodian_multi_aspect_20251122_155319.puml (1.5KB) 6. custodian_multi_aspect_20251122_155319.png (47KB) 7. custodian_multi_aspect_20251122_155319.svg (51KB) 8. custodian_multi_aspect_20251122_155319.mmd (1.6KB)

Created (2 scripts)

  1. scripts/owl_to_plantuml.py (153 lines)
  2. scripts/owl_to_mermaid.py (133 lines)

Total: 15 files (5 modified, 8 generated, 2 created)


Success Criteria

  • All namespace conflicts resolved (zero warnings)
  • 4 RDF formats generated successfully (1.3MB)
  • UML visualizations created (PlantUML + Mermaid)
  • Reusable converter scripts documented
  • Full regeneration workflow documented
  • All files use proper timestamps (YYYYMMDD_HHMMSS)

Integration with Project Documentation

This session builds on:

  • RDF_GENERATION_SUMMARY.md - RDF usage guide (created earlier today)
  • .opencode/SCHEMA_GENERATION_RULES.md - Timestamp policy (Rule 1)
  • AGENTS.md - LinkML master schema policy (Rule 0)

Next Steps (Optional)

Immediate

  • Test RDF in SPARQL endpoint (Apache Jena Fuseki)
  • Validate OWL with Protégé or HermiT reasoner
  • Generate HTML docs from LinkML schema

Short-term

  • File bug report: LinkML gen-yuml path resolution
  • Create SPARQL query examples
  • Add RDF validation to CI/CD

Long-term

  • Implement OWL reasoning rules
  • Create SHACL shapes for validation
  • Generate JSON-LD @context file

Conclusion

Status: COMPLETE

The Heritage Custodian Ontology has been successfully converted to RDF and visualized in multiple formats. All namespace conflicts resolved, ensuring clean Linked Open Data output.

Ready for:

  • SPARQL querying and reasoning
  • Semantic web integration
  • Ontology-based data validation
  • Knowledge graph construction

Deliverables:

  • 4 RDF serialization formats
  • 3 UML visualization formats
  • 2 reusable converter scripts
  • Complete regeneration documentation

Session Completed: 2025-11-22 15:55:19
Artifact Timestamp: 20251122_155319
Documentation: RDF_UML_GENERATION_COMPLETE_20251122_155319.md