RDF and UML Generation Complete

Date: 2025-11-22
Schema Version: 20251121
Status: COMPLETE


Summary

All seven RDF serializations and both UML diagrams for the Heritage Custodian Ontology were generated successfully, including the new legal entity model (v0.2.2).


Generated Files

RDF Formats (7 serializations)

All generated from: schemas/20251121/linkml/01_custodian_name_modular.yaml

| Format | File | Size | Lines | Triples | Description |
|---|---|---|---|---|---|
| Turtle | 01_custodian_name_modular.owl.ttl | 140K | 2,328 | 2,701 | Primary OWL ontology (human-readable) |
| N-Triples | 01_custodian_name_modular.nt | 452K | 2,701 | 2,701 | Line-based triple format (machine-readable) |
| JSON-LD | 01_custodian_name_modular.jsonld | 336K | 7,451 | 2,701 | JSON Linked Data (web-friendly) |
| RDF/XML | 01_custodian_name_modular.rdf | 324K | 10,810 | 2,701 | XML serialization (legacy compatibility) |
| N3 | 01_custodian_name_modular.n3 | 196K | 5,144 | 2,701 | Notation3 (Turtle superset) |
| TriG | 01_custodian_name_modular.trig | 196K | 5,144 | 2,701 | Named graphs extension |
| TriX | 01_custodian_name_modular.trix | 644K | 21,377 | 2,701 | XML with named graphs |

Total RDF Size: ~2.3 MB
Total RDF Lines: 54,955 lines
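
As a quick cross-check, the totals can be recomputed from the per-file figures in the table above. This is a minimal sketch: the dictionary only transcribes the table, the variable names are illustrative, and the sizes are assumed to be KiB as reported by common shell tools.

```python
# Per-file figures transcribed from the table above: (size in KiB, line count).
rdf_files = {
    "01_custodian_name_modular.owl.ttl": (140, 2_328),
    "01_custodian_name_modular.nt": (452, 2_701),
    "01_custodian_name_modular.jsonld": (336, 7_451),
    "01_custodian_name_modular.rdf": (324, 10_810),
    "01_custodian_name_modular.n3": (196, 5_144),
    "01_custodian_name_modular.trig": (196, 5_144),
    "01_custodian_name_modular.trix": (644, 21_377),
}

total_kib = sum(size for size, _ in rdf_files.values())
total_lines = sum(lines for _, lines in rdf_files.values())

# Convert KiB to decimal megabytes for the "~2.3 MB" style summary figure.
print(f"Total size: {total_kib} KiB (~{total_kib * 1024 / 1e6:.1f} MB)")
print(f"Total lines: {total_lines:,}")
```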

UML Diagrams (2 formats)

| Format | File | Size | Description |
|---|---|---|---|
| Mermaid | uml/mermaid/01_custodian_name_modular.mmd | 6.0K | Markdown-based class diagram (GitHub-friendly) |
| PlantUML | uml/plantuml/01_custodian_name_modular.puml | 7.5K | UML class diagram with color-coded packages |

Validation Results

RDF Validation

Using rdflib Python library:

✅ Turtle validation: SUCCESS
   Triples: 2,701
   Subjects: 652
   Predicates: 36
   Objects: 1,325

Key Statistics:

  • 2,701 triples - All class/slot/enum definitions and mappings
  • 652 unique subjects - Classes, slots, enums, and their components
  • 36 unique predicates - RDF/RDFS/OWL properties
  • 1,325 unique objects - Property values and types

Ontology Coverage

The generated RDF includes:

Classes (17):

  • Custodian (hub)
  • CustodianObservation, CustodianName (observation pattern)
  • CustodianReconstruction (reconstruction pattern)
  • LegalEntityType (NEW)
  • LegalForm (NEW)
  • LegalName (NEW)
  • RegistrationNumber (NEW, within RegistrationInfo)
  • RegistrationAuthority (NEW, within RegistrationInfo)
  • GovernanceStructure (NEW, within RegistrationInfo)
  • LegalStatus (NEW, within RegistrationInfo)
  • SourceDocument, TimeSpan, ConfidenceMeasure
  • ReconstructionActivity, ReconstructionAgent
  • Identifier, LanguageCode, Appellation

Enums (6):

  • AppellationTypeEnum
  • AgentTypeEnum
  • EntityTypeEnum (DEPRECATED, use LegalEntityType)
  • LegalStatusEnum (DEPRECATED, use LegalStatus class)
  • ReconstructionActivityTypeEnum
  • SourceDocumentTypeEnum

Slots (59+):

  • All 59 modular slot definitions
  • Including new legal entity slots: legal_entity_type, registration_numbers

UML Diagram Features

Mermaid Diagram

Features:

  • Class diagram with all 17 classes
  • Hub-Observation-Reconstruction pattern visualization
  • Legal entity model highlighted (8 new classes)
  • Relationship arrows with cardinality
  • Inline notes for key classes
  • GitHub-renderable (displays directly in markdown files)

Sections:

  1. Hub Pattern (Custodian)
  2. Observation Pattern (CustodianObservation, CustodianName)
  3. Reconstruction Pattern (CustodianReconstruction)
  4. Legal Entity Model (8 classes, highlighted)
  5. Supporting Classes (9 classes)
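
To illustrate what "relationship arrows with cardinality" look like in Mermaid source, here is a minimal Python sketch that emits a class diagram. The helper `to_mermaid`, the relation label `refers_to`, and the cardinalities are hypothetical and chosen for illustration; they are not taken from the actual .mmd file.

```python
def to_mermaid(classes, relations):
    """Render class names and (source, target, label, src_card, dst_card) tuples as Mermaid."""
    lines = ["classDiagram"]
    for name in classes:
        lines.append(f"    class {name}")
    for src, dst, label, src_card, dst_card in relations:
        # Mermaid writes cardinalities as quoted strings on either side of the arrow.
        lines.append(f'    {src} "{src_card}" --> "{dst_card}" {dst} : {label}')
    return "\n".join(lines)


classes = ["Custodian", "CustodianObservation", "CustodianReconstruction"]
# Hypothetical relations and cardinalities, for illustration only.
relations = [
    ("CustodianObservation", "Custodian", "refers_to", "0..*", "1"),
    ("CustodianReconstruction", "Custodian", "refers_to", "0..*", "1"),
]

diagram = to_mermaid(classes, relations)
print(diagram)
```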

PlantUML Diagram

Features:

  • Color-coded packages:
    • 🔵 Light Blue: Hub (Custodian)
    • 🟢 Light Green: Observations
    • 🔴 Light Coral: Reconstructions
    • 🟡 Gold: Legal Entity classes
    • Light Gray: Supporting classes
  • Detailed class attributes with types
  • Relationship arrows with labels
  • Comprehensive notes explaining:
    • Hub pattern (minimal entity)
    • Observation pattern (source evidence)
    • Reconstruction pattern (formal entity)
    • Legal entity classes (NEW in v0.2.2)
  • ISO 20275 and TOOI references
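
The color-coded package layout can be sketched as generated PlantUML source. This is an assumption-laden illustration: the package names, the helper `to_plantuml`, and the subset of classes per package are hypothetical, while the color names follow the list above.

```python
# Package name -> (PlantUML color name, a few member classes).
# Package names and groupings are illustrative, not copied from the .puml file.
PACKAGES = {
    "Hub": ("LightBlue", ["Custodian"]),
    "Observations": ("LightGreen", ["CustodianObservation", "CustodianName"]),
    "Reconstructions": ("LightCoral", ["CustodianReconstruction"]),
    "Legal Entity": ("Gold", ["LegalEntityType", "LegalForm", "LegalName"]),
    "Supporting": ("LightGray", ["SourceDocument", "TimeSpan", "Identifier"]),
}


def to_plantuml(packages):
    """Emit one colored PlantUML package per entry, each listing its classes."""
    lines = ["@startuml"]
    for package, (color, members) in packages.items():
        lines.append(f'package "{package}" #{color} {{')
        lines.extend(f"  class {name}" for name in members)
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


puml = to_plantuml(PACKAGES)
print(puml)
```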

Rendering:


Generation Process

Step 1: Generate OWL/Turtle

gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml 2>/dev/null \
  > schemas/20251121/rdf/01_custodian_name_modular.owl.ttl

Output: 138K Turtle file with 2,328 lines

Step 2: Convert to Other RDF Formats

cd schemas/20251121/rdf
rdfpipe -i turtle -o nt 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.nt
rdfpipe -i turtle -o json-ld 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.jsonld
rdfpipe -i turtle -o xml 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.rdf
rdfpipe -i turtle -o n3 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.n3
rdfpipe -i turtle -o trig 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trig
rdfpipe -i turtle -o trix 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trix

Tool: rdfpipe from rdflib package

Step 3: Create UML Diagrams (Manual)

LinkML's auto-generators (gen-plantuml, gen-yuml) did not handle this modular schema properly, so comprehensive diagrams were authored manually from the schema structure.

Mermaid: Manually authored class diagram with all relationships
PlantUML: Manually authored with color-coded packages and detailed notes

Step 4: Validate

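from rdflib import Graph

# parse() raises an exception on malformed Turtle, so a clean parse is the validation
g = Graph()
g.parse('01_custodian_name_modular.owl.ttl', format='turtle')
print(f"SUCCESS: {len(g):,} triples")  # the run reported here gave 2,701 triples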

Ontology Mappings in RDF

The generated RDF includes mappings to:

W3C/DCMI Vocabularies

  • OWL: Class/property definitions
  • RDFS: Labels, comments, subclass relationships
  • RDF: Type assertions
  • DCTERMS: Title, license, version
  • SKOS: Definitions, notes, exact/close mappings
  • PAV: Provenance (version, license)
  • FOAF: Agent information
  • PROV-O: Activity tracking
  • TIME: Temporal expressions

Domain Ontologies

  • W3C Org Ontology (org:): Organization structure

    • org:classification (LegalEntityType)
    • org:hasUnit (GovernanceStructure)
  • ROV (rov:): Registered organizations

    • rov:legalName (LegalName)
    • rov:orgType (LegalForm)
    • rov:registration (RegistrationNumber)
    • rov:hasRegisteredOrganization (RegistrationAuthority)
  • TOOI (tooi:): Dutch government

    • tooi:rechtsvorm (legal form)
    • tooi:organisatieIdentificatie (registration)
    • tooi:officieleNaamInclSoort (legal name)
  • GLEIF (gleif:): Legal entity identifiers

    • gleif:hasLegalForm (LegalForm)
    • gleif-base:hasEntityStatus (LegalStatus)
  • Schema.org (schema:): Web semantics

    • schema:status (LegalStatus)
    • schema:identifier (identifiers)
    • schema:legalName (legal name)

RDF Format Comparison

| Format | Human-Readable | Machine-Readable | Web-Friendly | Compression | Use Case |
|---|---|---|---|---|---|
| Turtle | Excellent | Good | 🟡 Fair | Best | Editing, documentation |
| N-Triples | 🟡 Fair | Excellent | 🟡 Fair | None | Streaming, line-by-line processing |
| JSON-LD | 🟡 Fair | Excellent | Excellent | Good | Web APIs, JavaScript |
| RDF/XML | Poor | Good | 🟡 Fair | Fair | Legacy systems, XML tools |
| N3 | Excellent | Good | 🟡 Fair | Best | Advanced logic, rules |
| TriG | Good | Good | 🟡 Fair | Best | Named graphs, datasets |
| TriX | Poor | Good | 🟡 Fair | Poor | XML + named graphs |

Recommendations:

  • Development/Documentation: Use Turtle (most readable)
  • Web APIs: Use JSON-LD (web-native)
  • Bulk Processing: Use N-Triples (line-based, streaming)
  • SPARQL Queries: Load Turtle or TriG into triplestore
  • Legacy Integration: Use RDF/XML if required

SPARQL Query Examples

Query 1: Find All Legal Entity Types

PREFIX heritage: <https://nde.nl/ontology/hc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?type ?label ?description
WHERE {
  ?type a heritage:LegalEntityType .
  OPTIONAL { ?type rdfs:label ?label }
  OPTIONAL { ?type heritage:description ?description }
}

Query 2: Find Reconstruction Classes with a Legal Form

PREFIX heritage: <https://nde.nl/ontology/hc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?class ?label
WHERE {
  ?class rdfs:subClassOf* heritage:CustodianReconstruction .
  ?class rdfs:label ?label .
  FILTER EXISTS { ?class heritage:legal_form ?form }
}

Query 3: List All Slots with ISO 20275 Mapping

PREFIX heritage: <https://nde.nl/ontology/hc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rov: <http://www.w3.org/ns/regorg#>

SELECT ?slot ?label ?mapping
WHERE {
  ?slot a heritage:Slot .
  ?slot rdfs:label ?label .
  ?slot skos:exactMatch|skos:closeMatch ?mapping .
  FILTER (CONTAINS(STR(?mapping), "regorg"))
}

File Locations

schemas/20251121/
├── linkml/
│   └── 01_custodian_name_modular.yaml     # Source LinkML schema
│
├── rdf/
│   ├── 01_custodian_name_modular.owl.ttl  # Turtle (primary)
│   ├── 01_custodian_name_modular.nt       # N-Triples
│   ├── 01_custodian_name_modular.jsonld   # JSON-LD
│   ├── 01_custodian_name_modular.rdf      # RDF/XML
│   ├── 01_custodian_name_modular.n3       # N3
│   ├── 01_custodian_name_modular.trig     # TriG
│   └── 01_custodian_name_modular.trix     # TriX
│
└── uml/
    ├── mermaid/
    │   └── 01_custodian_name_modular.mmd  # Mermaid class diagram
    └── plantuml/
        └── 01_custodian_name_modular.puml # PlantUML class diagram
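
A small check can confirm that every file in this layout is actually present before committing. This is a minimal sketch using only the standard library; `missing_outputs` is a hypothetical helper and `EXPECTED` simply transcribes the tree above.

```python
from pathlib import Path

# Paths relative to schemas/20251121/, transcribed from the tree above.
EXPECTED = [
    "linkml/01_custodian_name_modular.yaml",
    "rdf/01_custodian_name_modular.owl.ttl",
    "rdf/01_custodian_name_modular.nt",
    "rdf/01_custodian_name_modular.jsonld",
    "rdf/01_custodian_name_modular.rdf",
    "rdf/01_custodian_name_modular.n3",
    "rdf/01_custodian_name_modular.trig",
    "rdf/01_custodian_name_modular.trix",
    "uml/mermaid/01_custodian_name_modular.mmd",
    "uml/plantuml/01_custodian_name_modular.puml",
]


def missing_outputs(root: Path) -> list:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

Calling `missing_outputs(Path("schemas/20251121"))` returns an empty list when the layout is complete.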

Next Steps

Immediate

  1. RDF generation - COMPLETE
  2. UML generation - COMPLETE
  3. Validation - COMPLETE
  4. Load into triplestore - TODO (optional)
  5. Render PlantUML diagram - TODO (optional)

Short-term

  1. Create SPARQL queries - TODO (example queries provided above)
  2. Generate documentation - TODO (using gen-doc)
  3. Create example instances - TODO (validate against RDF schema)

Medium-term

  1. Publish to ontology registry - TODO (LOV, BioPortal, etc.)
  2. Create persistent URIs - TODO (w3id.org or purl.org)
  3. Deploy SPARQL endpoint - TODO (public query interface)

Tools Used

| Tool | Version | Purpose |
|---|---|---|
| gen-owl | linkml 1.9.5 | Generate OWL from LinkML |
| rdfpipe | rdflib (Python) | Convert RDF formats |
| rdflib | Python package | Validate RDF syntax |
| Manual authoring | - | Create UML diagrams |

Troubleshooting

Issue: gen-owl warnings in output

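Problem: gen-owl emits warnings alongside the Turtle; if they are captured into the output file, the file is no longer valid Turtle

Solution: Redirect stderr to /dev/null so that only the Turtle written to stdout reaches the file: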

gen-owl -f ttl schema.yaml 2>/dev/null > output.ttl
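
Discarding stderr also hides warnings that may be worth reading. As an alternative, a minimal Python sketch can write only stdout to the file and keep stderr for inspection. `run_to_file` is a hypothetical helper, and the demo command is a stand-in so the example runs without gen-owl installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_file(cmd, out_path):
    """Run cmd, write only its stdout to out_path, and return its stderr text."""
    with open(out_path, "wb") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, check=True)
    return result.stderr.decode("utf-8", errors="replace")


# Stand-in command that writes to both streams, in place of gen-owl.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('data'); print('WARNING: demo', file=sys.stderr)",
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.ttl"
    warnings_text = run_to_file(demo_cmd, target)
    file_text = target.read_text()

print("file:", file_text.strip())
print("stderr:", warnings_text.strip())
```

With gen-owl on the PATH, the call would look like `run_to_file(["gen-owl", "-f", "ttl", "schema.yaml"], "output.ttl")`, and the returned text can be logged instead of thrown away.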

Issue: gen-plantuml/gen-yuml fail with modular schema

Problem: LinkML generators don't support modular imports properly

Solution: Manually author UML diagrams based on schema structure

Issue: rdfpipe parsing errors

Problem: Turtle file contains non-RDF content (warnings)

Solution: Regenerate Turtle cleanly with stderr suppressed


Version Control

Generated from:

  • Schema: schemas/20251121/linkml/01_custodian_name_modular.yaml
  • Version: 0.1.0 (schema version in LinkML)
  • Legal Entity Model: v0.2.2 (project version)
  • Generation Date: 2025-11-22

Git Status:

  • All generated files should be committed to version control
  • RDF files are derived but worth tracking (transparency)
  • UML diagrams should be committed (manual authoring)


Status: ALL GENERATION COMPLETE

Next Session: Data instance creation and validation