# RDF and UML Generation Complete

**Date**: 2025-11-22
**Schema Version**: 20251121
**Status**: ✅ **COMPLETE**

---

## Summary

Successfully generated all RDF serializations and UML diagrams for the Heritage Custodian Ontology with the new legal entity model (v0.2.2).

---

## Generated Files

### RDF Formats (7 serializations)

All generated from: `schemas/20251121/linkml/01_custodian_name_modular.yaml`

| Format | File | Size | Lines | Triples | Description |
|--------|------|------|-------|---------|-------------|
| **Turtle** | `01_custodian_name_modular.owl.ttl` | 140K | 2,328 | 2,701 | Primary OWL ontology (human-readable) |
| **N-Triples** | `01_custodian_name_modular.nt` | 452K | 2,701 | 2,701 | Line-based triple format (machine-readable) |
| **JSON-LD** | `01_custodian_name_modular.jsonld` | 336K | 7,451 | 2,701 | JSON Linked Data (web-friendly) |
| **RDF/XML** | `01_custodian_name_modular.rdf` | 324K | 10,810 | 2,701 | XML serialization (legacy compatibility) |
| **N3** | `01_custodian_name_modular.n3` | 196K | 5,144 | 2,701 | Notation3 (Turtle superset) |
| **TriG** | `01_custodian_name_modular.trig` | 196K | 5,144 | 2,701 | Named graphs extension |
| **TriX** | `01_custodian_name_modular.trix` | 644K | 21,377 | 2,701 | XML with named graphs |

**Total RDF Size**: ~2.3 MB
**Total RDF Lines**: 54,955 lines

### UML Diagrams (2 formats)

| Format | File | Size | Description |
|--------|------|------|-------------|
| **Mermaid** | `uml/mermaid/01_custodian_name_modular.mmd` | 6.0K | Markdown-based class diagram (GitHub-friendly) |
| **PlantUML** | `uml/plantuml/01_custodian_name_modular.puml` | 7.5K | UML class diagram with color-coded packages |

---

## Validation Results

### RDF Validation ✅

Using the `rdflib` Python library:

```
✅ Turtle validation: SUCCESS
   Triples: 2,701
   Subjects: 652
   Predicates: 36
   Objects: 1,325
```

**Key Statistics**:
- **2,701 triples** - All class/slot/enum definitions and mappings
- **652 unique subjects** - Classes, slots, enums, and their components
- **36 unique predicates** - RDF/RDFS/OWL properties
- **1,325 unique objects** - Property values and types

### Ontology Coverage

The generated RDF includes:

**Classes (17)**:
- Custodian (hub)
- CustodianObservation, CustodianName (observation pattern)
- CustodianReconstruction (reconstruction pattern)
- **LegalEntityType** (NEW)
- **LegalForm** (NEW)
- **LegalName** (NEW)
- **RegistrationNumber** (NEW, within RegistrationInfo)
- **RegistrationAuthority** (NEW, within RegistrationInfo)
- **GovernanceStructure** (NEW, within RegistrationInfo)
- **LegalStatus** (NEW, within RegistrationInfo)
- SourceDocument, TimeSpan, ConfidenceMeasure
- ReconstructionActivity, ReconstructionAgent
- Identifier, LanguageCode, Appellation

**Enums (6)**:
- AppellationTypeEnum
- AgentTypeEnum
- EntityTypeEnum (DEPRECATED, use LegalEntityType)
- LegalStatusEnum (DEPRECATED, use LegalStatus class)
- ReconstructionActivityTypeEnum
- SourceDocumentTypeEnum

**Slots (59+)**:
- All 59 modular slot definitions
- Including new legal entity slots: `legal_entity_type`, `registration_numbers`

---

## UML Diagram Features

### Mermaid Diagram

**Features**:
- Class diagram with all 17 classes
- Hub-Observation-Reconstruction pattern visualization
- Legal entity model highlighted (8 new classes)
- Relationship arrows with cardinality
- Inline notes for key classes
- GitHub-renderable (displays directly in markdown files)

**Sections**:
1. Hub Pattern (Custodian)
2. Observation Pattern (CustodianObservation, CustodianName)
3. Reconstruction Pattern (CustodianReconstruction)
4. Legal Entity Model (8 classes, highlighted)
5. Supporting Classes (9 classes)

### PlantUML Diagram

**Features**:
- Color-coded packages:
  - 🔵 Light Blue: Hub (Custodian)
  - 🟢 Light Green: Observations
  - 🔴 Light Coral: Reconstructions
  - 🟡 Gold: Legal Entity classes
  - ⚪ Light Gray: Supporting classes
- Detailed class attributes with types
- Relationship arrows with labels
- Comprehensive notes explaining:
  - Hub pattern (minimal entity)
  - Observation pattern (source evidence)
  - Reconstruction pattern (formal entity)
  - Legal entity classes (NEW in v0.2.2)
  - ISO 20275 and TOOI references

**Rendering**:
- Use the PlantUML server: https://www.plantuml.com/plantuml/
- Or the local PlantUML CLI: `plantuml 01_custodian_name_modular.puml`

---

## Generation Process

### Step 1: Generate OWL/Turtle

```bash
gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml 2>/dev/null \
  > schemas/20251121/rdf/01_custodian_name_modular.owl.ttl
```

**Output**: 138K Turtle file with 2,328 lines

### Step 2: Convert to Other RDF Formats

```bash
cd schemas/20251121/rdf

rdfpipe -i turtle -o nt 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.nt
rdfpipe -i turtle -o json-ld 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.jsonld
rdfpipe -i turtle -o xml 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.rdf
rdfpipe -i turtle -o n3 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.n3
rdfpipe -i turtle -o trig 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trig
rdfpipe -i turtle -o trix 01_custodian_name_modular.owl.ttl > 01_custodian_name_modular.trix
```

**Tool**: `rdfpipe` from the `rdflib` package

### Step 3: Create UML Diagrams (Manual)

LinkML's auto-generators (`gen-plantuml`, `gen-yuml`) do not support modular schemas properly, so comprehensive diagrams were created manually based on the schema structure.
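As a starting point for that manual authoring, the class declarations can be emitted from a plain list. The following is a minimal sketch, not the method used to produce the committed diagrams: `CLASS_GROUPS` and `mermaid_skeleton` are hypothetical names, the class names are copied from the Ontology Coverage list, and the group labels simply follow the diagram sections. Relationships, cardinalities, and notes are not derived here and still have to be added by hand.

```python
# Class names copied from the "Ontology Coverage" list; the group labels
# are illustrative and mirror the Mermaid diagram sections.
CLASS_GROUPS: dict[str, list[str]] = {
    "Hub": ["Custodian"],
    "Observation": ["CustodianObservation", "CustodianName"],
    "Reconstruction": ["CustodianReconstruction"],
    "Legal entity": [
        "LegalEntityType", "LegalForm", "LegalName", "RegistrationNumber",
        "RegistrationAuthority", "GovernanceStructure", "LegalStatus",
    ],
    "Supporting": [
        "SourceDocument", "TimeSpan", "ConfidenceMeasure",
        "ReconstructionActivity", "ReconstructionAgent",
        "Identifier", "LanguageCode", "Appellation",
    ],
}


def mermaid_skeleton(groups: dict[str, list[str]]) -> str:
    """Return Mermaid classDiagram text with one declaration per class."""
    lines = ["classDiagram"]
    for group, names in groups.items():
        lines.append(f"    %% {group}")  # Mermaid comment marking the group
        lines.extend(f"    class {name}" for name in names)
    return "\n".join(lines)


skeleton = mermaid_skeleton(CLASS_GROUPS)
print(skeleton)
```

The output can be pasted into a `.mmd` file and extended with relationship arrows such as `Custodian --> CustodianObservation` once the intended associations are confirmed against the schema.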
- **Mermaid**: Manually authored class diagram with all relationships
- **PlantUML**: Manually authored with color-coded packages and detailed notes

### Step 4: Validate

```python
from rdflib import Graph

# Parsing raises an exception if the Turtle is malformed.
g = Graph()
g.parse('01_custodian_name_modular.owl.ttl', format='turtle')
print(f"Triples: {len(g):,}")  # SUCCESS: 2,701 triples
```

---

## Ontology Mappings in RDF

The generated RDF includes mappings to:

### W3C/DCMI Vocabularies

- **OWL**: Class/property definitions
- **RDFS**: Labels, comments, subclass relationships
- **RDF**: Type assertions
- **DCTERMS**: Title, license, version
- **SKOS**: Definitions, notes, exact/close mappings
- **PAV**: Provenance (version, license)
- **FOAF**: Agent information
- **PROV-O**: Activity tracking
- **TIME**: Temporal expressions

### Domain Ontologies

- **W3C Org Ontology** (`org:`): Organization structure
  - `org:classification` (LegalEntityType)
  - `org:hasUnit` (GovernanceStructure)
- **ROV** (`rov:`): Registered organizations
  - `rov:legalName` (LegalName)
  - `rov:orgType` (LegalForm)
  - `rov:registration` (RegistrationNumber)
  - `rov:hasRegisteredOrganization` (RegistrationAuthority)
- **TOOI** (`tooi:`): Dutch government
  - `tooi:rechtsvorm` (legal form)
  - `tooi:organisatieIdentificatie` (registration)
  - `tooi:officieleNaamInclSoort` (legal name)
- **GLEIF** (`gleif:`): Legal entity identifiers
  - `gleif:hasLegalForm` (LegalForm)
  - `gleif-base:hasEntityStatus` (LegalStatus)
- **Schema.org** (`schema:`): Web semantics
  - `schema:status` (LegalStatus)
  - `schema:identifier` (identifiers)
  - `schema:legalName` (legal name)

---

## RDF Format Comparison

| Format | Human-Readable | Machine-Readable | Web-Friendly | Compactness | Use Case |
|--------|----------------|------------------|--------------|-------------|----------|
| **Turtle** | ✅ Excellent | ✅ Good | 🟡 Fair | Best | Editing, documentation |
| **N-Triples** | 🟡 Fair | ✅ Excellent | 🟡 Fair | None | Streaming, line-by-line processing |
| **JSON-LD** | 🟡 Fair | ✅ Excellent | ✅ Excellent | Good | Web APIs, JavaScript |
| **RDF/XML** | ❌ Poor | ✅ Good | 🟡 Fair | Fair | Legacy systems, XML tools |
| **N3** | ✅ Excellent | ✅ Good | 🟡 Fair | Best | Advanced logic, rules |
| **TriG** | ✅ Good | ✅ Good | 🟡 Fair | Best | Named graphs, datasets |
| **TriX** | ❌ Poor | ✅ Good | 🟡 Fair | Poor | XML + named graphs |

**Recommendations**:
- **Development/Documentation**: Use Turtle (most readable)
- **Web APIs**: Use JSON-LD (web-native)
- **Bulk Processing**: Use N-Triples (line-based, streaming)
- **SPARQL Queries**: Load Turtle or TriG into a triplestore
- **Legacy Integration**: Use RDF/XML if required

---

## SPARQL Query Examples

### Query 1: Find All Legal Entity Types

```sparql
PREFIX heritage:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?type ?label ?description
WHERE {
  ?type a heritage:LegalEntityType .
  OPTIONAL { ?type rdfs:label ?label }
  OPTIONAL { ?type heritage:description ?description }
}
```

### Query 2: Find All Classes with Legal Form

```sparql
PREFIX heritage:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?class ?label
WHERE {
  ?class rdfs:subClassOf* heritage:CustodianReconstruction .
  ?class rdfs:label ?label .
  FILTER EXISTS { ?class heritage:legal_form ?form }
}
```

### Query 3: List All Slots with a ROV Mapping

```sparql
PREFIX heritage:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rov: <http://www.w3.org/ns/regorg#>

SELECT ?slot ?label ?mapping
WHERE {
  ?slot a heritage:Slot .
  ?slot rdfs:label ?label .
  ?slot skos:exactMatch|skos:closeMatch ?mapping .
  FILTER (CONTAINS(STR(?mapping), "regorg"))
}
```

---

## File Locations

```
schemas/20251121/
├── linkml/
│   └── 01_custodian_name_modular.yaml      # Source LinkML schema
│
├── rdf/
│   ├── 01_custodian_name_modular.owl.ttl   # Turtle (primary)
│   ├── 01_custodian_name_modular.nt        # N-Triples
│   ├── 01_custodian_name_modular.jsonld    # JSON-LD
│   ├── 01_custodian_name_modular.rdf       # RDF/XML
│   ├── 01_custodian_name_modular.n3        # N3
│   ├── 01_custodian_name_modular.trig      # TriG
│   └── 01_custodian_name_modular.trix      # TriX
│
└── uml/
    ├── mermaid/
    │   └── 01_custodian_name_modular.mmd   # Mermaid class diagram
    └── plantuml/
        └── 01_custodian_name_modular.puml  # PlantUML class diagram
```

---

## Next Steps

### Immediate

1. ✅ **RDF generation** - COMPLETE
2. ✅ **UML generation** - COMPLETE
3. ✅ **Validation** - COMPLETE
4. ⏳ **Load into triplestore** - TODO (optional)
5. ⏳ **Render PlantUML diagram** - TODO (optional)

### Short-term

6. ⏳ **Create SPARQL queries** - TODO (example queries provided above)
7. ⏳ **Generate documentation** - TODO (using `gen-doc`)
8. ⏳ **Create example instances** - TODO (validate against RDF schema)

### Medium-term

9. ⏳ **Publish to ontology registry** - TODO (LOV, BioPortal, etc.)
10. ⏳ **Create persistent URIs** - TODO (w3id.org or purl.org)
11. ⏳ **Deploy SPARQL endpoint** - TODO (public query interface)

---

## Tools Used

| Tool | Version | Purpose |
|------|---------|---------|
| `gen-owl` | linkml 1.9.5 | Generate OWL from LinkML |
| `rdfpipe` | rdflib (Python) | Convert RDF formats |
| `rdflib` | Python package | Validate RDF syntax |
| Manual authoring | - | Create UML diagrams |

---

## Troubleshooting

### Issue: gen-owl warnings in output

**Problem**: `gen-owl` warnings get mixed into the captured output and corrupt the Turtle file

**Solution**: Redirect stderr to `/dev/null` so that only the serialized ontology reaches the file:

```bash
gen-owl -f ttl schema.yaml 2>/dev/null > output.ttl
```

### Issue: gen-plantuml/gen-yuml fail with modular schema

**Problem**: LinkML generators don't support modular imports properly

**Solution**: Manually author UML diagrams based on the schema structure

### Issue: rdfpipe parsing errors

**Problem**: Turtle file contains non-RDF content (warnings)

**Solution**: Regenerate the Turtle file cleanly with stderr suppressed

---

## Version Control

**Generated from**:
- Schema: `schemas/20251121/linkml/01_custodian_name_modular.yaml`
- Version: 0.1.0 (schema version in LinkML)
- Legal Entity Model: v0.2.2 (project version)
- Generation Date: 2025-11-22

**Git Status**:
- All generated files should be committed to version control
- RDF files are derived but worth tracking (transparency)
- UML diagrams should be committed (manual authoring)

---

## References

- **LinkML Documentation**: https://linkml.io/
- **RDF 1.1 Primer**: https://www.w3.org/TR/rdf11-primer/
- **OWL 2 Primer**: https://www.w3.org/TR/owl2-primer/
- **SPARQL 1.1 Query**: https://www.w3.org/TR/sparql11-query/
- **Mermaid Docs**: https://mermaid.js.org/
- **PlantUML Docs**: https://plantuml.com/class-diagram

---

**Status**: ✅ **ALL GENERATION COMPLETE**
**Next Session**: Data instance creation and validation
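
The file table reports 2,701 lines and 2,701 triples for the N-Triples file, which is expected because N-Triples puts one statement on each line. That makes a dependency-free cross-check possible before loading anything into `rdflib`. The sketch below is illustrative only: `count_ntriples_statements` is a hypothetical helper, the sample data uses placeholder `example.org` IRIs, and the count equals the number of distinct triples only if the file contains no duplicate statements.

```python
import tempfile
from pathlib import Path


def count_ntriples_statements(path: Path) -> int:
    """Count N-Triples statements: one per non-blank, non-comment line."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


# Demo on a throwaway file; the example.org IRIs are placeholders.
sample = (
    "# a comment line\n"
    "<https://example.org/A> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2002/07/owl#Class> .\n"
    "\n"
    "<https://example.org/A> "
    "<http://www.w3.org/2000/01/rdf-schema#label> \"A\" .\n"
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.nt"
    sample_path.write_text(sample, encoding="utf-8")
    statement_count = count_ntriples_statements(sample_path)

print(statement_count)  # 2
```

Pointing the helper at `01_custodian_name_modular.nt` should give the same figure as the `rdflib` triple count; a mismatch would suggest stray content in the file.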