glam/RDF_UML_GENERATION_COMPLETE_20251122_155319.md
kempersc 2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00


# RDF and UML Generation Complete - Session Summary
**Date**: 2025-11-22
**Session**: Namespace Conflict Resolution & Visualization Generation
**Status**: ✅ **COMPLETE**
**Final Timestamp**: 20251122_155319
---
## Executive Summary
Successfully resolved all namespace conflicts in the modular LinkML schema and generated complete RDF and UML outputs. The session worked around LinkML's `gen-yuml` path resolution bug by creating custom OWL → UML converter scripts that emit PlantUML and Mermaid diagrams.
**Key Achievements**:
- ✅ Fixed 5 schema module files (4 class modules, 1 mapping module) with namespace conflicts
- ✅ Generated 4 RDF formats (1.3MB total)
- ✅ Created 3 UML visualization formats (PlantUML PNG/SVG, Mermaid)
- ✅ Built 2 reusable OWL converter scripts
- ✅ Documented complete regeneration workflow
---
## Problem Solved: Namespace Conflicts
### Issue
Multiple module files contained duplicate prefix definitions that conflicted with `modules/metadata.yaml`:
```
WARNING: schema namespace already mapped to http://schema.org/ - Overriding with https://schema.org/
WARNING: heritage namespace already mapped to https://nde.nl/ontology/hc/# - Overriding with https://nde.nl/ontology/hc/
WARNING: tooi namespace already mapped to https://standaarden.overheid.nl/tooi# - Overriding with https://identifier.overheid.nl/tooi/def/ont/
```
### Solution
Removed duplicate prefixes from 5 files and added `../metadata` imports:
| File | Duplicates Removed | Unique Prefixes Kept |
|------|-------------------|---------------------|
| `LegalEntityType.yaml` | 8 (heritage, schema, org, cpov, crm, tooi, foaf, owl) | `rov` |
| `LegalForm.yaml` | 4 (heritage, schema, org, tooi) | `rov`, `gleif`, `iso20275` |
| `RegistrationInfo.yaml` | 4 (heritage, schema, org, tooi) | `rov` |
| `LegalName.yaml` | 3 (heritage, schema, tooi) | `rov` |
| `ISO20275_mapping.yaml` | 3 (heritage, org, schema) | `iso20275`, `wd` |
**Result**: Clean RDF generation with zero namespace warnings.
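
The check behind this fix can be sketched in a few lines of Python. This is a minimal illustration, not part of the session's tooling: `find_prefix_conflicts` is a hypothetical helper, the shared prefixes are the three listed for `modules/metadata.yaml` in this document, and the sample module content is an assumption built from the URIs in the warnings above (which file declared which URI is not recorded here).

```python
from typing import Dict, List, Tuple


def find_prefix_conflicts(
    shared: Dict[str, str],
    modules: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (module, prefix, module_uri, shared_uri) for every prefix a
    module redeclares with a URI that differs from the shared one."""
    conflicts = []
    for module_name, prefixes in sorted(modules.items()):
        for prefix, uri in sorted(prefixes.items()):
            shared_uri = shared.get(prefix)
            if shared_uri is not None and shared_uri != uri:
                conflicts.append((module_name, prefix, uri, shared_uri))
    return conflicts


# Shared prefixes as listed for modules/metadata.yaml in this document.
SHARED = {
    "heritage": "https://nde.nl/ontology/hc/",
    "schema": "https://schema.org/",
    "tooi": "https://identifier.overheid.nl/tooi/def/ont/",
}

# Hypothetical pre-fix module content (assumption): the conflicting URIs are
# copied from the warnings; `rov` is unique to the module and must be kept.
MODULES = {
    "LegalName.yaml": {
        "heritage": "https://nde.nl/ontology/hc/#",
        "schema": "http://schema.org/",
        "tooi": "https://standaarden.overheid.nl/tooi#",
        "rov": "http://www.w3.org/ns/regorg#",
    },
}

conflicts = find_prefix_conflicts(SHARED, MODULES)
for module_name, prefix, uri, shared_uri in conflicts:
    print(f"{module_name}: {prefix} -> {uri} (shared: {shared_uri})")
```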
---
## Generated Artifacts
### RDF Files (Timestamp: 20251122_155319)
| Format | Size | Lines | Status | Use Case |
|--------|------|-------|--------|----------|
| **OWL/Turtle** | 159KB | 2,619 | ✅ | Primary format, human-readable |
| **N-Triples** | 456KB | 3,027 | ✅ | Bulk loading, line-oriented processing |
| **JSON-LD** | 380KB | 14,094 | ✅ | Web APIs, JavaScript integration |
| **RDF/XML** | 328KB | 4,585 | ✅ | Legacy systems, XML tools |
**Total**: 1.3MB across 4 serialization formats
**Triple count**: 3,027 triples
**Location**: `schemas/20251121/rdf/custodian_multi_aspect_20251122_155319.*`
### UML Visualizations (Timestamp: 20251122_155319)
| Format | Size | Tool | Status | Use Case |
|--------|------|------|--------|----------|
| **PlantUML Source** | 1.5KB | Custom script | ✅ | Editable diagram source |
| **PlantUML PNG** | 47KB | PlantUML CLI | ✅ | Raster image for documents |
| **PlantUML SVG** | 51KB | PlantUML CLI | ✅ | Vector graphic (web, scaling) |
| **Mermaid** | 1.6KB | Custom script | ✅ | GitHub README, Markdown |
**Location**:
- `schemas/20251121/uml/plantuml/custodian_multi_aspect_20251122_155319.*`
- `schemas/20251121/uml/mermaid/custodian_multi_aspect_20251122_155319.mmd`
**Classes visualized**: 35 HC ontology classes with properties and inheritance
---
## Custom Converter Scripts
### 1. `scripts/owl_to_plantuml.py`
**Purpose**: Convert OWL/Turtle RDF to PlantUML class diagram
**Features**:
- Parses RDF graph using rdflib
- Extracts classes, properties, inheritance (`rdfs:subClassOf`)
- Generates PlantUML syntax with class notes
- Supports property type annotations
**Usage**:
```bash
python3 scripts/owl_to_plantuml.py input.owl.ttl output.puml
plantuml output.puml # Render to PNG
plantuml -tsvg output.puml # Render to SVG
```
**Stats**: 153 lines, handles 35 classes, includes RDFS/OWL reasoning
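
The core idea can be sketched without rdflib, working on plain `(subject, predicate, object)` tuples. This is a simplified illustration under stated assumptions, not the contents of `owl_to_plantuml.py`: the function name `triples_to_plantuml`, the CURIE-style strings, and the `ex:` sample classes are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def local_name(term: str) -> str:
    """Strip the prefix from a CURIE such as 'ex:Agent'."""
    return term.split(":", 1)[-1]


def triples_to_plantuml(triples: Iterable[Triple]) -> str:
    """Render classes, typed attributes, and inheritance as PlantUML text."""
    triples = list(triples)
    classes = sorted(s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS)
    ranges = {s: o for s, p, o in triples if p == RANGE}

    # Attach each property to the class named in its rdfs:domain.
    attributes = defaultdict(list)
    for s, p, o in triples:
        if p == DOMAIN:
            type_name = local_name(ranges.get(s, "Thing"))
            attributes[o].append((local_name(s), type_name))

    lines = ["@startuml"]
    for cls in classes:
        lines.append(f"class {local_name(cls)} {{")
        for name, type_name in sorted(attributes[cls]):
            lines.append(f"  {name} : {type_name}")
        lines.append("}")
    # rdfs:subClassOf becomes a PlantUML generalization arrow.
    for s, p, o in triples:
        if p == SUBCLASS_OF and s in classes and o in classes:
            lines.append(f"{local_name(o)} <|-- {local_name(s)}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical sample data, not taken from the HC ontology.
SAMPLE = [
    ("ex:Agent", RDF_TYPE, OWL_CLASS),
    ("ex:Person", RDF_TYPE, OWL_CLASS),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:name", DOMAIN, "ex:Agent"),
    ("ex:name", RANGE, "xsd:string"),
]

puml = triples_to_plantuml(SAMPLE)
print(puml)
```

In a real converter the triples would come from an rdflib graph rather than a hand-written list; the mapping from `rdfs:subClassOf` to `<|--` stays the same.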
### 2. `scripts/owl_to_mermaid.py`
**Purpose**: Convert OWL/Turtle RDF to Mermaid class diagram
**Features**:
- Parses RDF graph using rdflib
- Generates Mermaid `classDiagram` syntax
- Limits properties to 8 per class (readability)
- Compatible with GitHub, GitLab, VS Code preview
**Usage**:
```bash
python3 scripts/owl_to_mermaid.py input.owl.ttl output.mmd
```
**Stats**: 133 lines, handles 35 classes, web-friendly output
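
The property cap is the main difference from the PlantUML output, so it is worth a small sketch. The code below is an assumption-laden illustration rather than the actual `owl_to_mermaid.py`: `to_mermaid`, `MAX_PROPERTIES`, and the `Example*` class names are hypothetical, and the input is a plain dict instead of an RDF graph.

```python
from typing import Dict, List, Tuple

MAX_PROPERTIES = 8  # per-class cap mentioned in the feature list above


def to_mermaid(
    classes: Dict[str, List[str]],
    inheritance: List[Tuple[str, str]],
    max_properties: int = MAX_PROPERTIES,
) -> str:
    """Render a Mermaid classDiagram, truncating long property lists."""
    lines = ["classDiagram"]
    for name in sorted(classes):
        shown = sorted(classes[name])[:max_properties]
        if not shown:
            # Emit a bare declaration instead of an empty class body.
            lines.append(f"    class {name}")
            continue
        lines.append(f"    class {name} {{")
        lines.extend(f"        +{prop}" for prop in shown)
        lines.append("    }")
    for parent, child in inheritance:
        lines.append(f"    {parent} <|-- {child}")
    return "\n".join(lines)


# Hypothetical input: one class with no properties, one with ten.
EXAMPLE_CLASSES = {
    "ExampleBase": [],
    "ExampleWide": [f"prop_{i}" for i in range(10)],
}

mermaid = to_mermaid(EXAMPLE_CLASSES, [("ExampleBase", "ExampleWide")])
print(mermaid)
```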
---
## Regeneration Workflow
### Step 1: Generate RDF from LinkML
```bash
# Run from the repository root
cd schemas/20251121/linkml
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Generate OWL/Turtle (2>/dev/null discards stderr, including any warnings)
gen-owl -f ttl 01_custodian_name_modular.yaml 2>/dev/null \
  > ../rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl
# Convert to other formats
cd ../rdf
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o nt 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.nt
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o json-ld 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.jsonld
rdfpipe custodian_multi_aspect_${TIMESTAMP}.owl.ttl -o xml 2>/dev/null \
  > custodian_multi_aspect_${TIMESTAMP}.rdf
# Return to the repository root for the following steps
cd ../../..
```
### Step 2: Generate UML Visualizations
```bash
# Run from the repository root, in the shell where TIMESTAMP was set
# PlantUML
python3 scripts/owl_to_plantuml.py \
  schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl \
  schemas/20251121/uml/plantuml/custodian_multi_aspect_${TIMESTAMP}.puml
# Render in a subshell so the working directory stays at the repository root
(
  cd schemas/20251121/uml/plantuml
  plantuml custodian_multi_aspect_${TIMESTAMP}.puml        # PNG
  plantuml -tsvg custodian_multi_aspect_${TIMESTAMP}.puml  # SVG
)
# Mermaid
python3 scripts/owl_to_mermaid.py \
  schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl \
  schemas/20251121/uml/mermaid/custodian_multi_aspect_${TIMESTAMP}.mmd
```
### Step 3: Validate Output
```bash
# Check file sizes
ls -lh schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.*
ls -lh schemas/20251121/uml/*/custodian_multi_aspect_${TIMESTAMP}.*
# Optional: Validate RDF syntax
rapper -i turtle -c schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl
```
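
The same checks can be scripted. The sketch below is a hedged example rather than part of the documented workflow: `artifact_sizes` and `count_ntriples` are hypothetical helpers, and the demo writes a tiny stand-in `.nt` file to a temporary directory instead of reading the real artifacts. Counting non-blank, non-comment lines works because N-Triples stores one statement per line.

```python
import tempfile
from pathlib import Path
from typing import Dict

# File suffixes of the four RDF serializations listed in this document.
RDF_SUFFIXES = (".owl.ttl", ".nt", ".jsonld", ".rdf")


def artifact_sizes(rdf_dir: Path, stem: str) -> Dict[str, int]:
    """Map each expected RDF file name to its size in bytes (-1 if missing)."""
    sizes = {}
    for suffix in RDF_SUFFIXES:
        path = rdf_dir / f"{stem}{suffix}"
        sizes[path.name] = path.stat().st_size if path.is_file() else -1
    return sizes


def count_ntriples(path: Path) -> int:
    """Count statements in an N-Triples file (one per non-blank, non-comment line)."""
    with path.open(encoding="utf-8") as handle:
        return sum(
            1
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        )


# Demo on stand-in data (assumption): only the .nt file is created.
with tempfile.TemporaryDirectory() as tmp:
    rdf_dir = Path(tmp)
    stem = "custodian_multi_aspect_20251122_155319"
    (rdf_dir / f"{stem}.nt").write_text(
        "# demo data, not the real ontology\n"
        "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
        "\n"
        "<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n",
        encoding="utf-8",
    )
    sizes = artifact_sizes(rdf_dir, stem)
    triple_count = count_ntriples(rdf_dir / f"{stem}.nt")

missing = sorted(name for name, size in sizes.items() if size < 0)
print("triples:", triple_count)
print("missing:", missing)
```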
---
## Ontology Structure (35 Classes)
### Core Hub Pattern
- `Custodian` - Minimal hub (persistent ID only)
- `CustodianObservation` - Source-based references
- `ReconstructionActivity` - Entity resolution process
### Three Independent Aspects
1. **CustodianLegalStatus** - Formal legal entity (registered)
2. **CustodianName** - Standardized emic name (ambiguous)
3. **CustodianPlace** - Nominal place designation
### Supporting Classes
- **Provenance**: `ConfidenceMeasure`, `SourceDocument`, `ReconstructionAgent`
- **Temporal**: `TimeSpan` (begin_of_begin, end_of_end)
- **Identity**: `Identifier`, `Appellation`, `LanguageCode`
- **Legal**: `LegalEntityType`, `LegalForm`, `LegalName`, `RegistrationInfo`
### Enumerations (5)
- `AgentTypeEnum` - PERSON, ORGANIZATION, SOFTWARE
- `AppellationTypeEnum` - Name classifications
- `EntityTypeEnum` - Legal entity types
- `LegalStatusEnum` - ACTIVE, DISSOLVED, MERGED, etc.
- `PlaceSpecificityEnum` - CITY, REGION, COUNTRY, etc.
---
## Technical Details
### Namespace Consistency
All modules now use standardized namespace URIs:
| Prefix | URI | Source |
|--------|-----|--------|
| `heritage` | `https://nde.nl/ontology/hc/` | `modules/metadata.yaml` |
| `schema` | `https://schema.org/` | `modules/metadata.yaml` (HTTPS!) |
| `tooi` | `https://identifier.overheid.nl/tooi/def/ont/` | `modules/metadata.yaml` |
**Why this matters**:
- Prevents duplicate triples (same property, different namespace)
- Enables consistent SPARQL queries
- Maintains Linked Open Data best practices
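
A short sketch makes the first point concrete. It assumes nothing beyond the two `schema` URIs that appear in the warnings; `expand` is a hypothetical helper that performs plain CURIE expansion.

```python
def expand(curie: str, prefixes: dict) -> str:
    """Expand a CURIE such as 'schema:name' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


# The two bindings of `schema` that appeared in the namespace warnings.
http_binding = {"schema": "http://schema.org/"}
https_binding = {"schema": "https://schema.org/"}

iri_http = expand("schema:name", http_binding)
iri_https = expand("schema:name", https_binding)

# Same CURIE, two different IRIs: an RDF store treats them as unrelated
# properties, so a query written against one will not match the other.
print(iri_http)
print(iri_https)
print(iri_http == iri_https)
```

With a single shared binding, every module expands `schema:name` to the same IRI.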
### Import Path Pattern
```yaml
# Standard pattern for class modules
imports:
  - linkml:types
  - ../metadata      # ← Shared prefixes
  - ./SiblingClass   # ← Same-directory classes
# Only declare unique prefixes
prefixes:
  linkml: https://w3id.org/linkml/
  rov: http://www.w3.org/ns/regorg#  # ← Not in metadata.yaml
```
---
## What Didn't Work (and How We Solved It)
### Issue: LinkML `gen-yuml` Path Resolution Bug
**Error**:
```
FileNotFoundError: [Errno 2] No such file or directory:
'/Users/kempersc/apps/glam/schemas/20251121/linkml/ReconstructionAgent.yaml'
```
**Root cause**: `gen-yuml` looks for `ReconstructionAgent.yaml` at the schema root instead of at `modules/classes/ReconstructionAgent.yaml`
**Solution**: Created custom OWL → UML converters that:
1. Parse already-generated OWL/Turtle (which works correctly)
2. Extract class structure from RDF triples
3. Generate PlantUML/Mermaid from RDF graph
**Advantage**: More flexible than `gen-yuml`, can customize diagram layout
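
The difference between the two lookups can be illustrated with plain path arithmetic. This sketch only reproduces the symptom described above and makes no claim about LinkML's internals: both helper functions are hypothetical, and `ReconstructionActivity.yaml` as the importing module is an assumption chosen for the example.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_relative_to_importer(importing_file: str, import_name: str) -> str:
    """Resolve an import against the directory of the file that declares it."""
    base = PurePosixPath(importing_file).parent
    return posixpath.normpath(str(base / f"{import_name}.yaml"))


def resolve_relative_to_root(schema_root: str, import_name: str) -> str:
    """Resolve the same import against the schema root (the failing lookup)."""
    return posixpath.normpath(str(PurePosixPath(schema_root) / f"{import_name}.yaml"))


SCHEMA_ROOT = "schemas/20251121/linkml"
# Hypothetical importing module (assumption for illustration only).
IMPORTER = f"{SCHEMA_ROOT}/modules/classes/ReconstructionActivity.yaml"

expected = resolve_relative_to_importer(IMPORTER, "./ReconstructionAgent")
observed = resolve_relative_to_root(SCHEMA_ROOT, "./ReconstructionAgent")
shared = resolve_relative_to_importer(IMPORTER, "../metadata")

print(expected)  # path under modules/classes/
print(observed)  # path at the schema root, matching the FileNotFoundError
print(shared)    # path of the shared metadata module
```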
---
## Files Modified/Created
### Modified (5 schema files)
1. `schemas/20251121/linkml/modules/classes/LegalEntityType.yaml`
2. `schemas/20251121/linkml/modules/classes/LegalForm.yaml`
3. `schemas/20251121/linkml/modules/classes/RegistrationInfo.yaml`
4. `schemas/20251121/linkml/modules/classes/LegalName.yaml`
5. `schemas/20251121/linkml/modules/mappings/ISO20275_mapping.yaml`
### Generated (8 artifact files)
**RDF**:
1. `custodian_multi_aspect_20251122_155319.owl.ttl` (159KB)
2. `custodian_multi_aspect_20251122_155319.nt` (456KB)
3. `custodian_multi_aspect_20251122_155319.jsonld` (380KB)
4. `custodian_multi_aspect_20251122_155319.rdf` (328KB)
**UML**:
5. `custodian_multi_aspect_20251122_155319.puml` (1.5KB)
6. `custodian_multi_aspect_20251122_155319.png` (47KB)
7. `custodian_multi_aspect_20251122_155319.svg` (51KB)
8. `custodian_multi_aspect_20251122_155319.mmd` (1.6KB)
### Created (2 scripts)
1. `scripts/owl_to_plantuml.py` (153 lines)
2. `scripts/owl_to_mermaid.py` (133 lines)
**Total**: 15 files (5 modified, 8 generated, 2 created)
---
## Success Criteria ✅
- [x] All namespace conflicts resolved (zero warnings)
- [x] 4 RDF formats generated successfully (1.3MB)
- [x] UML visualizations created (PlantUML + Mermaid)
- [x] Reusable converter scripts documented
- [x] Full regeneration workflow documented
- [x] All files use proper timestamps (YYYYMMDD_HHMMSS)
---
## Integration with Project Documentation
This session builds on:
- **`RDF_GENERATION_SUMMARY.md`** - RDF usage guide (created earlier today)
- **`.opencode/SCHEMA_GENERATION_RULES.md`** - Timestamp policy (Rule 1)
- **`AGENTS.md`** - LinkML master schema policy (Rule 0)
---
## Next Steps (Optional)
### Immediate
- [ ] Test RDF in SPARQL endpoint (Apache Jena Fuseki)
- [ ] Validate OWL with Protégé or HermiT reasoner
- [ ] Generate HTML docs from LinkML schema
### Short-term
- [ ] File bug report: LinkML `gen-yuml` path resolution
- [ ] Create SPARQL query examples
- [ ] Add RDF validation to CI/CD
### Long-term
- [ ] Implement OWL reasoning rules
- [ ] Create SHACL shapes for validation
- [ ] Generate JSON-LD @context file
---
## Conclusion
**Status**: ✅ **COMPLETE**
The Heritage Custodian Ontology has been successfully converted to RDF and visualized in multiple formats. All namespace conflicts have been resolved, which yields clean Linked Open Data output.
**Ready for**:
- SPARQL querying and reasoning
- Semantic web integration
- Ontology-based data validation
- Knowledge graph construction
**Deliverables**:
- 4 RDF serialization formats
- 3 UML visualization formats
- 2 reusable converter scripts
- Complete regeneration documentation
---
**Session Completed**: 2025-11-22 15:55:19
**Artifact Timestamp**: 20251122_155319
**Documentation**: `RDF_UML_GENERATION_COMPLETE_20251122_155319.md`