glam/RDF_UML_GENERATION_COMPLETE_20251122_old.md
kempersc 2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00


# RDF and UML Generation Complete
**Date**: 2025-11-22

**Schema Version**: 20251121

**Status**: ✅ **COMPLETE**

---
## Summary
Successfully generated all RDF serializations and UML diagrams for the Heritage Custodian Ontology with the new legal entity model (v0.2.2).

---
## Generated Files
### RDF Formats (7 serializations)
All generated from: `schemas/20251121/linkml/01_custodian_name_modular.yaml`
| Format | File | Size | Lines | Triples | Description |
|--------|------|------|-------|---------|-------------|
| **Turtle** | `01_custodian_name_modular.owl.ttl` | 140K | 2,328 | 2,701 | Primary OWL ontology (human-readable) |
| **N-Triples** | `01_custodian_name_modular.nt` | 452K | 2,701 | 2,701 | Line-based triple format (machine-readable) |
| **JSON-LD** | `01_custodian_name_modular.jsonld` | 336K | 7,451 | 2,701 | JSON Linked Data (web-friendly) |
| **RDF/XML** | `01_custodian_name_modular.rdf` | 324K | 10,810 | 2,701 | XML serialization (legacy compatibility) |
| **N3** | `01_custodian_name_modular.n3` | 196K | 5,144 | 2,701 | Notation3 (Turtle superset) |
| **TriG** | `01_custodian_name_modular.trig` | 196K | 5,144 | 2,701 | Named graphs extension |
| **TriX** | `01_custodian_name_modular.trix` | 644K | 21,377 | 2,701 | XML with named graphs |

**Total RDF Size**: ~2.3 MB

**Total RDF Lines**: 54,955 lines
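
The totals follow from the per-file figures in the table. A quick Python check (sizes are the rounded `K` values as listed, so the size total is approximate):

```python
# (size in K, line count) per serialization, copied from the table above.
SERIALIZATIONS = {
    "Turtle": (140, 2_328),
    "N-Triples": (452, 2_701),
    "JSON-LD": (336, 7_451),
    "RDF/XML": (324, 10_810),
    "N3": (196, 5_144),
    "TriG": (196, 5_144),
    "TriX": (644, 21_377),
}

total_size_k = sum(size for size, _ in SERIALIZATIONS.values())
total_lines = sum(lines for _, lines in SERIALIZATIONS.values())
print(f"Total size:  {total_size_k:,}K")  # 2,288K, i.e. roughly 2.3 MB
print(f"Total lines: {total_lines:,}")     # 54,955
```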
### UML Diagrams (2 formats)
| Format | File | Size | Description |
|--------|------|------|-------------|
| **Mermaid** | `uml/mermaid/01_custodian_name_modular.mmd` | 6.0K | Text-based class diagram syntax (renders in GitHub Markdown) |
| **PlantUML** | `uml/plantuml/01_custodian_name_modular.puml` | 7.5K | UML class diagram with color-coded packages |
---
## Validation Results
### RDF Validation ✅
Using the `rdflib` Python library:
```
✅ Turtle validation: SUCCESS
Triples: 2,701
Subjects: 652
Predicates: 36
Objects: 1,325
```
**Key Statistics**:
- **2,701 triples** - All class/slot/enum definitions and mappings
- **652 unique subjects** - Classes, slots, enums, and their components
- **36 unique predicates** - RDF/RDFS/OWL properties
- **1,325 unique objects** - Property values and types
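
Because N-Triples puts one triple on each line, these statistics can be cross-checked against `01_custodian_name_modular.nt` using only the Python standard library. This is a sketch under simplifying assumptions: the regular expression below is not a full N-Triples grammar and assumes one well-formed triple per line (as `rdfpipe` writes it), and `ntriples_stats` is a hypothetical helper, not part of rdflib:

```python
import re
from collections.abc import Iterable

# Simplified N-Triples term patterns: IRIs, blank nodes, and literals with an
# optional language tag or datatype. This is NOT a conformant N-Triples parser.
IRI = r"<[^>]*>"
BNODE = r"_:[A-Za-z0-9_.-]+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z][A-Za-z0-9-]*|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def ntriples_stats(lines: Iterable[str]) -> dict[str, int]:
    """Count distinct triples, subjects, predicates and objects in N-Triples lines."""
    triples: set[tuple[str, str, str]] = set()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines carry no triples
        match = TRIPLE.match(stripped)
        if match is None:
            raise ValueError(f"Not a simple N-Triples statement: {stripped!r}")
        triples.add(match.groups())
    return {
        "triples": len(triples),  # a graph is a set, so duplicate lines count once
        "subjects": len({s for s, _, _ in triples}),
        "predicates": len({p for _, p, _ in triples}),
        "objects": len({o for _, _, o in triples}),
    }
```

Passing an open handle on the `.nt` file to `ntriples_stats` should reproduce the rdflib figures above, provided all serializations were generated from the same Turtle source.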
### Ontology Coverage
The generated RDF includes:

**Classes (17)**:
- Custodian (hub)
- CustodianObservation, CustodianName (observation pattern)
- CustodianReconstruction (reconstruction pattern)
- **LegalEntityType** (NEW)
- **LegalForm** (NEW)
- **LegalName** (NEW)
- **RegistrationNumber** (NEW, within RegistrationInfo)
- **RegistrationAuthority** (NEW, within RegistrationInfo)
- **GovernanceStructure** (NEW, within RegistrationInfo)
- **LegalStatus** (NEW, within RegistrationInfo)
- SourceDocument, TimeSpan, ConfidenceMeasure
- ReconstructionActivity, ReconstructionAgent
- Identifier, LanguageCode, Appellation

**Enums (6)**:
- AppellationTypeEnum
- AgentTypeEnum
- EntityTypeEnum (DEPRECATED, use LegalEntityType)
- LegalStatusEnum (DEPRECATED, use LegalStatus class)
- ReconstructionActivityTypeEnum
- SourceDocumentTypeEnum

**Slots (59+)**:
- All 59 modular slot definitions
- Including new legal entity slots: `legal_entity_type`, `registration_numbers`
---
## UML Diagram Features
### Mermaid Diagram
**Features**:
- Class diagram with all 17 classes
- Hub-Observation-Reconstruction pattern visualization
- Legal entity model highlighted (8 new classes)
- Relationship arrows with cardinality
- Inline notes for key classes
- GitHub-renderable (displays directly in Markdown files when placed inside a `mermaid` fenced code block)

**Sections**:
1. Hub Pattern (Custodian)
2. Observation Pattern (CustodianObservation, CustodianName)
3. Reconstruction Pattern (CustodianReconstruction)
4. Legal Entity Model (8 classes, highlighted)
5. Supporting Classes (9 classes)
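
The Mermaid file was written by hand, but its syntax is simple enough to emit from a script if the diagram needs regenerating. The sketch below is illustrative only: the class names come from the schema, while the relationship labels and cardinalities are hypothetical placeholders rather than the schema's actual slots, and `mermaid_class_diagram` is a hypothetical helper:

```python
from typing import NamedTuple


class Relation(NamedTuple):
    source: str
    target: str
    cardinality: str  # rendered on the target end, e.g. "1" or "*"
    label: str


def mermaid_class_diagram(classes: list[str], relations: list[Relation]) -> str:
    """Render class names and directed associations as Mermaid classDiagram text."""
    lines = ["classDiagram"]
    lines += [f"    class {name}" for name in classes]
    lines += [
        f'    {r.source} --> "{r.cardinality}" {r.target} : {r.label}'
        for r in relations
    ]
    return "\n".join(lines)


# Hypothetical relationships for illustration; check the schema for the real slot names.
print(mermaid_class_diagram(
    ["Custodian", "CustodianObservation", "CustodianReconstruction"],
    [
        Relation("CustodianObservation", "Custodian", "1", "refers to"),
        Relation("CustodianReconstruction", "Custodian", "1", "refers to"),
    ],
))
```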
### PlantUML Diagram
**Features**:
- Color-coded packages:
  - 🔵 Light Blue: Hub (Custodian)
  - 🟢 Light Green: Observations
  - 🔴 Light Coral: Reconstructions
  - 🟡 Gold: Legal Entity classes
  - ⚪ Light Gray: Supporting classes
- Detailed class attributes with types
- Relationship arrows with labels
- Comprehensive notes explaining:
  - Hub pattern (minimal entity)
  - Observation pattern (source evidence)
  - Reconstruction pattern (formal entity)
  - Legal entity classes (NEW in v0.2.2)
  - ISO 20275 and TOOI references

**Rendering**:
- Use PlantUML server: https://www.plantuml.com/plantuml/
- Or local PlantUML CLI: `plantuml 01_custodian_name_modular.puml`
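
The color-coded package layout maps onto PlantUML's `package "Name" #Color { ... }` syntax. A hypothetical generator sketch: class membership and colors follow the lists in this document, while the package titles and the `plantuml_packages` helper are assumptions. Its output can be saved as a `.puml` file and rendered with the server or CLI listed above:

```python
# Package title -> (PlantUML color name, member classes), following the lists above.
PACKAGES: dict[str, tuple[str, list[str]]] = {
    "Hub": ("LightBlue", ["Custodian"]),
    "Observations": ("LightGreen", ["CustodianObservation", "CustodianName"]),
    "Reconstructions": ("LightCoral", ["CustodianReconstruction"]),
    "Legal Entity": ("Gold", ["LegalEntityType", "LegalForm", "LegalName",
                              "RegistrationNumber", "RegistrationAuthority",
                              "GovernanceStructure", "LegalStatus"]),
    "Supporting": ("LightGray", ["SourceDocument", "TimeSpan", "ConfidenceMeasure",
                                 "ReconstructionActivity", "ReconstructionAgent",
                                 "Identifier", "LanguageCode", "Appellation"]),
}


def plantuml_packages(packages: dict[str, tuple[str, list[str]]]) -> str:
    """Emit a PlantUML class diagram with one colored package per group."""
    lines = ["@startuml"]
    for name, (color, classes) in packages.items():
        lines.append(f'package "{name}" #{color} {{')
        lines += [f"  class {cls}" for cls in classes]
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


print(plantuml_packages(PACKAGES))
```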
---
## Generation Process
### Step 1: Generate OWL/Turtle
```bash
gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml 2>/dev/null \
> schemas/20251121/rdf/01_custodian_name_modular.owl.ttl
```
**Output**: 138K Turtle file with 2,328 lines
### Step 2: Convert to Other RDF Formats
```bash
cd schemas/20251121/rdf
rdfpipe -i turtle -o nt 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.nt
rdfpipe -i turtle -o json-ld 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.jsonld
rdfpipe -i turtle -o xml 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.rdf
rdfpipe -i turtle -o n3 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.n3
rdfpipe -i turtle -o trig 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trig
rdfpipe -i turtle -o trix 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trix
```
**Tool**: `rdfpipe` from the `rdflib` package
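
The six conversions can also be scripted, which keeps format names and file suffixes in one table and keeps each tool's stderr out of the output files. A minimal Python sketch, assuming `rdfpipe` is on `PATH`; `conversion_commands` and `convert_all` are hypothetical helpers:

```python
import subprocess
from pathlib import Path

# (rdfpipe output format, file suffix) pairs, mirroring the rdfpipe commands above.
FORMATS = [
    ("nt", ".nt"),
    ("json-ld", ".jsonld"),
    ("xml", ".rdf"),
    ("n3", ".n3"),
    ("trig", ".trig"),
    ("trix", ".trix"),
]


def conversion_commands(ttl: Path) -> list[tuple[list[str], Path]]:
    """Build (rdfpipe argv, output path) pairs for every target serialization."""
    stem = ttl.name.removesuffix(".owl.ttl")  # e.g. "01_custodian_name_modular"
    return [
        (["rdfpipe", "-i", "turtle", "-o", fmt, str(ttl)], ttl.with_name(stem + suffix))
        for fmt, suffix in FORMATS
    ]


def convert_all(ttl: Path) -> None:
    """Run each conversion, sending only stdout to the output file."""
    for argv, out in conversion_commands(ttl):
        with out.open("wb") as fh:
            # stderr is captured separately, so warnings cannot leak into the RDF file;
            # check=True raises CalledProcessError if rdfpipe exits non-zero.
            subprocess.run(argv, stdout=fh, stderr=subprocess.PIPE, check=True)
```

Calling `convert_all(Path("schemas/20251121/rdf/01_custodian_name_modular.owl.ttl"))` would write the same six files as the shell commands.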
### Step 3: Create UML Diagrams (Manual)
LinkML's diagram generators (`gen-plantuml`, `gen-yuml`) did not handle this modular schema correctly, so the diagrams were authored manually from the schema structure.

**Mermaid**: Manually authored class diagram with all relationships

**PlantUML**: Manually authored with color-coded packages and detailed notes
### Step 4: Validate
```python
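from rdflib import Graph
g = Graph()
# parse() raises on malformed Turtle, so getting past it means the syntax is valid
g.parse("01_custodian_name_modular.owl.ttl", format="turtle")
print(f"Triples: {len(g):,}")  # 2,701 for this build
print(f"Subjects: {len(set(g.subjects())):,}")
print(f"Predicates: {len(set(g.predicates())):,}")
print(f"Objects: {len(set(g.objects())):,}")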
```
---
## Ontology Mappings in RDF
The generated RDF includes mappings to:
### W3C/DCMI Vocabularies
- **OWL**: Class/property definitions
- **RDFS**: Labels, comments, subclass relationships
- **RDF**: Type assertions
- **DCTERMS**: Title, license, version
- **SKOS**: Definitions, notes, exact/close mappings
- **PAV**: Provenance (version, license)
- **FOAF**: Agent information
- **PROV-O**: Activity tracking
- **TIME**: Temporal expressions
### Domain Ontologies
- **W3C Org Ontology** (`org:`): Organization structure
- `org:classification` (LegalEntityType)
- `org:hasUnit` (GovernanceStructure)
- **ROV** (`rov:`): Registered organizations
- `rov:legalName` (LegalName)
- `rov:orgType` (LegalForm)
- `rov:registration` (RegistrationNumber)
- `rov:hasRegisteredOrganization` (RegistrationAuthority)
- **TOOI** (`tooi:`): Dutch government
- `tooi:rechtsvorm` (legal form)
- `tooi:organisatieIdentificatie` (registration)
- `tooi:officieleNaamInclSoort` (legal name)
- **GLEIF** (`gleif:`): Legal entity identifiers
- `gleif:hasLegalForm` (LegalForm)
- `gleif-base:hasEntityStatus` (LegalStatus)
- **Schema.org** (`schema:`): Web semantics
- `schema:status` (LegalStatus)
- `schema:identifier` (identifiers)
- `schema:legalName` (legal name)
---
## RDF Format Comparison
| Format | Human-Readable | Machine-Readable | Web-Friendly | Compactness | Use Case |
|--------|----------------|------------------|--------------|-------------|----------|
| **Turtle** | ✅ Excellent | ✅ Good | 🟡 Fair | Best | Editing, documentation |
| **N-Triples** | 🟡 Fair | ✅ Excellent | 🟡 Fair | None | Streaming, line-by-line processing |
| **JSON-LD** | 🟡 Fair | ✅ Excellent | ✅ Excellent | Good | Web APIs, JavaScript |
| **RDF/XML** | ❌ Poor | ✅ Good | 🟡 Fair | Fair | Legacy systems, XML tools |
| **N3** | ✅ Excellent | ✅ Good | 🟡 Fair | Best | Advanced logic, rules |
| **TriG** | ✅ Good | ✅ Good | 🟡 Fair | Best | Named graphs, datasets |
| **TriX** | ❌ Poor | ✅ Good | 🟡 Fair | Poor | XML + named graphs |

**Recommendations**:
- **Development/Documentation**: Use Turtle (most readable)
- **Web APIs**: Use JSON-LD (web-native)
- **Bulk Processing**: Use N-Triples (line-based, streaming)
- **SPARQL Queries**: Load Turtle or TriG into a triplestore
- **Legacy Integration**: Use RDF/XML if required
---
## SPARQL Query Examples
### Query 1: Find All Legal Entity Types
```sparql
PREFIX heritage: <https://nde.nl/ontology/hc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type ?label ?description
WHERE {
?type a heritage:LegalEntityType .
OPTIONAL { ?type rdfs:label ?label }
OPTIONAL { ?type heritage:description ?description }
}
```
### Query 2: Find All Classes with Legal Form
```sparql
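PREFIX heritage: <https://nde.nl/ontology/hc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?class ?label
WHERE {
  ?class rdfs:subClassOf* heritage:CustodianReconstruction .
  OPTIONAL { ?class rdfs:label ?label }
  # Assumes slot usage is exported as owl:Restriction superclasses in the OWL output
  FILTER EXISTS { ?class rdfs:subClassOf [ owl:onProperty heritage:legal_form ] }
}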
```
### Query 3: List All Slots with ROV (Registered Organization) Mappings
```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rov: <http://www.w3.org/ns/regorg#>
SELECT ?slot ?label ?mapping
WHERE {
  { ?slot a owl:ObjectProperty } UNION { ?slot a owl:DatatypeProperty }
  OPTIONAL { ?slot rdfs:label ?label }
  ?slot skos:exactMatch|skos:closeMatch ?mapping .
  FILTER (STRSTARTS(STR(?mapping), STR(rov:)))
}
```
---
## File Locations
```
schemas/20251121/
├── linkml/
│ └── 01_custodian_name_modular.yaml # Source LinkML schema
├── rdf/
│ ├── 01_custodian_name_modular.owl.ttl # Turtle (primary)
│ ├── 01_custodian_name_modular.nt # N-Triples
│ ├── 01_custodian_name_modular.jsonld # JSON-LD
│ ├── 01_custodian_name_modular.rdf # RDF/XML
│ ├── 01_custodian_name_modular.n3 # N3
│ ├── 01_custodian_name_modular.trig # TriG
│ └── 01_custodian_name_modular.trix # TriX
└── uml/
├── mermaid/
│ └── 01_custodian_name_modular.mmd # Mermaid class diagram
└── plantuml/
└── 01_custodian_name_modular.puml # PlantUML class diagram
```
---
## Next Steps
### Immediate
1. **RDF generation** - COMPLETE
2. **UML generation** - COMPLETE
3. **Validation** - COMPLETE
4. **Load into triplestore** - TODO (optional)
5. **Render PlantUML diagram** - TODO (optional)
### Short-term
6. **Create SPARQL queries** - TODO (example queries provided above)
7. **Generate documentation** - TODO (using `gen-doc`)
8. **Create example instances** - TODO (validate against RDF schema)
### Medium-term
9. **Publish to ontology registry** - TODO (LOV, BioPortal, etc.)
10. **Create persistent URIs** - TODO (w3id.org or purl.org)
11. **Deploy SPARQL endpoint** - TODO (public query interface)
---
## Tools Used
| Tool | Version | Purpose |
|------|---------|---------|
| `gen-owl` | linkml 1.9.5 | Generate OWL from LinkML |
| `rdfpipe` | rdflib (Python) | Convert RDF formats |
| `rdflib` | Python package | Validate RDF syntax |
| Manual authoring | - | Create UML diagrams |
---
## Troubleshooting
### Issue: gen-owl warnings in output
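**Problem**: `gen-owl` writes its warnings to stderr; when stderr is captured together with stdout (for example with `2>&1`), the warnings end up inside the Turtle file

**Solution**: Redirect stderr to /dev/null so that only the Turtle written to stdout reaches the file: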
```bash
gen-owl -f ttl schema.yaml 2>/dev/null > output.ttl
```
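
When `gen-owl` is driven from Python instead of the shell, the same separation can be kept while retaining the warnings for inspection instead of discarding them. A sketch, assuming `gen-owl` is on `PATH`; `run_to_file` and `generate_turtle` are hypothetical helpers:

```python
import subprocess
from pathlib import Path


def run_to_file(cmd: list[str], output: Path) -> str:
    """Run `cmd`, writing only its stdout to `output`.

    stderr (where warnings go) is captured separately and returned, so it can be
    logged instead of ending up inside the generated file. Raises
    CalledProcessError on a non-zero exit status.
    """
    with output.open("wb") as fh:
        result = subprocess.run(cmd, stdout=fh, stderr=subprocess.PIPE, check=True)
    return result.stderr.decode("utf-8", errors="replace")


def generate_turtle(schema: Path, output: Path) -> str:
    """Hypothetical wrapper around the `gen-owl -f ttl` command shown above."""
    return run_to_file(["gen-owl", "-f", "ttl", str(schema)], output)
```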
### Issue: gen-plantuml/gen-yuml fail with modular schema
**Problem**: LinkML generators don't support modular imports properly

**Solution**: Manually author UML diagrams based on schema structure
### Issue: rdfpipe parsing errors
**Problem**: Turtle file contains non-RDF content (warnings)

**Solution**: Regenerate Turtle cleanly with stderr suppressed
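
To check whether a Turtle file picked up stray log output before re-running `rdfpipe`, a quick scan can flag suspicious lines. This is a heuristic sketch: the patterns (Python `logging`-style level prefixes such as `WARNING:` and `warnings`-module lines such as `file.py:12: UserWarning: ...`) are assumptions about what leaked output looks like, not a catalogue of actual `gen-owl` messages, and `suspicious_lines` is a hypothetical helper. Pass it an open file handle; an empty result means no obvious log lines were found:

```python
import re
from collections.abc import Iterable

# Heuristic patterns for stray log output; adjust to what your tools actually print.
LOG_LINE = re.compile(
    r"^(?:DEBUG|INFO|WARNING|ERROR|CRITICAL)[:\s]"  # logging-style level prefix
    r"|^\S+:\d+: \w*Warning: "                      # warnings-module "path:lineno: XxxWarning: msg"
)


def suspicious_lines(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (1-based line number, text) for lines that look like log output."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if LOG_LINE.search(line)
    ]
```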

---
## Version Control
**Generated from**:
- Schema: `schemas/20251121/linkml/01_custodian_name_modular.yaml`
- Version: 0.1.0 (schema version in LinkML)
- Legal Entity Model: v0.2.2 (project version)
- Generation Date: 2025-11-22

**Git Status**:
- All generated files should be committed to version control
- RDF files are derived but worth tracking (transparency)
- UML diagrams should be committed (manual authoring)
---
## References
- **LinkML Documentation**: https://linkml.io/
- **RDF 1.1 Primer**: https://www.w3.org/TR/rdf11-primer/
- **OWL 2 Primer**: https://www.w3.org/TR/owl2-primer/
- **SPARQL 1.1 Query**: https://www.w3.org/TR/sparql11-query/
- **Mermaid Docs**: https://mermaid.js.org/
- **PlantUML Docs**: https://plantuml.com/class-diagram
---
**Status**: ✅ **ALL GENERATION COMPLETE**

**Next Session**: Data instance creation and validation