
RDF Generation Summary: Multi-Aspect Custodian Schema

Date: 2025-11-22
Status: ALL FORMATS GENERATED SUCCESSFULLY

Overview

Successfully generated all 4 RDF serialization formats from the multi-aspect custodian schema (01_custodian_name_modular.yaml).

Generation Process

Key Issue Resolved: gen-owl Warning Interference

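Problem: gen-owl emits WARNING messages while it writes the ontology. When these messages ended up in the TTL output, they broke downstream rdfpipe parsing.

Solution: The warnings are written to stderr, so discard stderr and redirect only stdout to the output file when generating RDF formats: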

# Correct command pattern
gen-owl -f ttl schema.yaml 2>/dev/null > output.owl.ttl
rdfpipe -i turtle -o format output.owl.ttl > output.format 2>/dev/null
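For scripted runs, the same separation can be done from Python. This is a minimal sketch that assumes the generator runs as an external process. run_to_file is a hypothetical helper that, like the shell pattern above, writes only stdout to the output file and discards stderr. The demo uses a one-line Python child process as a stand-in for gen-owl so that it runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_file(argv: list[str], out_path: Path) -> int:
    """Run argv with stdout written to out_path and stderr discarded.

    Mirrors the shell pattern `cmd 2>/dev/null > out_path`.
    """
    with out_path.open("wb") as out:
        proc = subprocess.run(argv, stdout=out, stderr=subprocess.DEVNULL)
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for `gen-owl -f ttl schema.yaml`: Turtle goes to
    # stdout, a log-style warning goes to stderr.
    child = (
        "import sys; "
        "print('@prefix ex: <http://example.org/> .'); "
        "print('WARNING: example warning', file=sys.stderr)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "output.owl.ttl"
        returncode = run_to_file([sys.executable, "-c", child], target)
        print(returncode, target.read_text(), end="")
```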

Additional Fixes Required

  1. Import Path Corrections (3 files):

    • modules/classes/CustodianPlace.yaml - Changed "- Custodian" to "- ./Custodian"
    • modules/classes/CustodianName.yaml - Changed "- Custodian" to "- ./Custodian"
    • modules/classes/CustodianLegalStatus.yaml - Changed "- Custodian" to "- ./Custodian"

    Reason: LinkML requires relative paths for module imports

  2. Timestamp Implementation:

    • All generated files now include timestamps per .opencode/SCHEMA_GENERATION_RULES.md
    • Format: {base_name}_{YYYYMMDD}_{HHMMSS}.{extension}
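The import-path fix in item 1 is mechanical enough to script. This is a minimal sketch that assumes each module lists its imports under a top-level imports: key, with one "- Name" entry per line. fix_bare_imports is a hypothetical helper. Entries that already carry a path (./, ../) or a prefix (such as linkml:types) are left untouched. A line-based regex is more fragile than a YAML-aware tool, so review the resulting diff before committing.

```python
import re

# A bare import entry: "- Name" with no prefix (":"), no "./", and no "/".
BARE_ITEM = re.compile(r"^(\s*-\s+)([A-Za-z_][\w-]*)\s*$")


def fix_bare_imports(yaml_text: str) -> str:
    """Rewrite bare entries under a top-level `imports:` key to `./Name`."""
    out_lines = []
    in_imports = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if line.startswith("imports:"):
            in_imports = True
        elif in_imports and stripped and not stripped.startswith("-") and not line[0].isspace():
            # Any other top-level key ends the imports list.
            in_imports = False
        elif in_imports:
            match = BARE_ITEM.match(line)
            if match:
                line = f"{match.group(1)}./{match.group(2)}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

The helper is idempotent: running it on an already fixed file changes nothing, because "./Name" entries no longer match the bare-name pattern.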

Generated Files

Timestamp: 20251122_154430
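This timestamp has the same layout as date +%Y%m%d_%H%M%S in the generation commands. Below is a minimal sketch of building and parsing the {base_name}_{YYYYMMDD}_{HHMMSS}.{extension} pattern in Python. Both function names are hypothetical and do not come from .opencode/SCHEMA_GENERATION_RULES.md.

```python
import re
from datetime import datetime
from typing import Optional

# "<base>_<YYYYMMDD>_<HHMMSS>.<extension>"; the extension may contain dots
# (e.g. "owl.ttl"), so everything after the stamp's dot is the extension.
STAMPED = re.compile(r"^(?P<base>.+)_(?P<stamp>\d{8}_\d{6})\.(?P<ext>.+)$")
STAMP_FORMAT = "%Y%m%d_%H%M%S"


def timestamped_name(base_name: str, extension: str, when: Optional[datetime] = None) -> str:
    """Build {base_name}_{YYYYMMDD}_{HHMMSS}.{extension} (local time by default)."""
    when = when or datetime.now()
    return f"{base_name}_{when.strftime(STAMP_FORMAT)}.{extension}"


def parse_timestamp(filename: str) -> datetime:
    """Recover the generation time encoded in a timestamped filename."""
    match = STAMPED.match(filename)
    if match is None:
        raise ValueError(f"not a timestamped filename: {filename}")
    return datetime.strptime(match.group("stamp"), STAMP_FORMAT)
```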

| Format | Filename | Size | Lines | Description |
|---|---|---|---|---|
| OWL/Turtle | custodian_multi_aspect_20251122_154430.owl.ttl | 159 KB | 2,619 | Primary RDF format (human-readable) |
| N-Triples | custodian_multi_aspect_20251122_154430.nt | 456 KB | 3,027 | Triple-per-line format (machine-optimized) |
| JSON-LD | custodian_multi_aspect_20251122_154430.jsonld | 380 KB | 14,094 | JSON-LD (web-friendly) |
| RDF/XML | custodian_multi_aspect_20251122_154430.rdf | 328 KB | 4,585 | XML serialization (legacy compatibility) |

Total: 1.3MB across 4 formats
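The total is the sum of the four sizes in the table (159 + 456 + 380 + 328 = 1,323 KB, roughly 1.3MB). Below is a minimal sketch for recomputing it from disk. It assumes that all files from one run share the timestamp in their names. total_size_kb is a hypothetical helper that uses 1 KB = 1,024 bytes, which may not be the convention behind the table.

```python
from pathlib import Path


def total_size_kb(directory: Path, timestamp: str) -> dict[str, float]:
    """Return each file's size in KB (1 KB = 1024 bytes) plus a 'total' entry."""
    sizes = {
        path.name: path.stat().st_size / 1024
        for path in sorted(directory.glob(f"*_{timestamp}.*"))
        if path.is_file()
    }
    sizes["total"] = sum(sizes.values())
    return sizes


# Usage:
# total_size_kb(Path("schemas/20251121/rdf"), "20251122_154430")["total"]
```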

Validation Results

Schema Validation

  • All imports resolved correctly
  • No critical LinkML errors
  • All classes defined (CustodianLegalStatus, CustodianName, CustodianPlace)
  • All enums defined (PlaceSpecificityEnum, etc.)
  • All 61 slots defined

Content Verification

OWL/Turtle (custodian_multi_aspect_20251122_154430.owl.ttl):

  • Valid Turtle syntax
  • Ontology metadata present (title, version, license)
  • All class definitions present
  • All property definitions present
  • SKOS documentation included

N-Triples (custodian_multi_aspect_20251122_154430.nt):

  • Valid N-Triples syntax
  • 3,027 triples generated
  • All statements expanded
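N-Triples places exactly one triple on each line, so the line count and the triple count should agree. Below is a minimal sketch of that check without an RDF library, assuming a UTF-8 file. count_ntriples is a hypothetical helper. It skips blank and comment lines and only checks that each statement ends with a period, which is far weaker than full N-Triples validation.

```python
from pathlib import Path


def count_ntriples(path: Path) -> int:
    """Count triples in an N-Triples file (one triple per non-comment line)."""
    count = 0
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # blank lines and comments carry no triples
            if not stripped.endswith("."):
                raise ValueError(f"line {lineno}: statement does not end with '.'")
            count += 1
    return count
```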

JSON-LD (custodian_multi_aspect_20251122_154430.jsonld):

  • Valid JSON-LD syntax
  • @context included
  • 14,094 lines (expanded representation)
  • All class/property URIs resolved
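The JSON syntax and @context checks can be approximated with the standard library. This is a minimal sketch; inspect_jsonld is a hypothetical helper. It only confirms that the file is well-formed JSON, reports whether a top-level @context exists, and counts how many top-level nodes carry an @id. Checking JSON-LD semantics requires a JSON-LD processor. Fully expanded JSON-LD is a top-level array without @context, so the sketch handles both shapes.

```python
import json
from pathlib import Path


def inspect_jsonld(path: Path) -> dict:
    """Parse a JSON-LD file as plain JSON and summarize its top level."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError if malformed
    if isinstance(data, dict):
        has_context = "@context" in data
        # A compacted document usually wraps its nodes in @graph.
        nodes = data.get("@graph", [data])
    else:
        # Expanded JSON-LD: a top-level array of node objects, no @context.
        has_context = False
        nodes = data
    return {
        "has_context": has_context,
        "node_count": len(nodes),
        "with_id": sum(1 for node in nodes if isinstance(node, dict) and "@id" in node),
    }
```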

RDF/XML (custodian_multi_aspect_20251122_154430.rdf):

  • Valid XML syntax
  • Namespace declarations present
  • 4,585 lines
  • rdf:Description elements properly formed
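A comparable well-formedness check for the RDF/XML file needs only xml.etree.ElementTree. This is a minimal sketch; inspect_rdfxml is a hypothetical helper that checks the root element is rdf:RDF and counts rdf:Description elements. It does not validate RDF/XML semantics. A serializer that emits typed node elements (for example owl:Class) instead of rdf:Description would also produce a lower count.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def inspect_rdfxml(path: Path) -> dict:
    """Parse RDF/XML as plain XML and count rdf:Description elements."""
    root = ET.parse(path).getroot()  # raises ET.ParseError if not well-formed
    return {
        "root_is_rdf": root.tag == f"{{{RDF_NS}}}RDF",
        "descriptions": len(root.findall(f".//{{{RDF_NS}}}Description")),
    }
```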

Reference Classes Verified

Confirmed presence in RDF output:

  • CustodianLegalStatus (34 references) - Formal legal entities
  • CustodianName (existing) - Emic labels
  • CustodianPlace (15 references) - Nominal place designations
  • PlaceSpecificityEnum (21 references) - Place specificity levels
  • Custodian (hub) - Central entity aggregating aspects
  • CustodianObservation - Source evidence
  • ReconstructionActivity - PROV-O activity linking observations to aspects
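The method behind the reference counts above is not stated. The sketch below shows one plausible way to produce numbers of that kind: count whole-word occurrences of each name in the serialized text. count_term_references is a hypothetical helper, and its results depend on which serialization is searched.

```python
import re
from collections import Counter


def count_term_references(text: str, terms: list[str]) -> Counter:
    """Count whole-word occurrences of each term in serialized RDF text."""
    counts: Counter = Counter()
    for term in terms:
        # \b boundaries keep "Custodian" from matching inside "CustodianName".
        pattern = rf"\b{re.escape(term)}\b"
        counts[term] = len(re.findall(pattern, text))
    return counts


# Usage:
# text = Path("custodian_multi_aspect_20251122_154430.owl.ttl").read_text(encoding="utf-8")
# count_term_references(text, ["CustodianLegalStatus", "CustodianPlace"])
```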

Ontology Alignments Verified

Class Mappings

| Class | Ontology Mapping | Status |
|---|---|---|
| CustodianLegalStatus | org:FormalOrganization | Present |
| CustodianName | skos:Concept | Present |
| CustodianPlace | crm:E53_Place | Present |
| Custodian | prov:Entity | Present |
| ReconstructionActivity | prov:Activity | Present |

Property Mappings

| Slot | Ontology Mapping | Status |
|---|---|---|
| legal_entity_type | org:classification | Present |
| legal_form | tooi:organisatievorm | Present |
| place_name | crm:P87_is_identified_by | Present |
| preferred_label | skos:prefLabel | Present |
| name_authority | prov:wasAttributedTo | Present |

Known Issues

Non-Critical Warnings (Suppressed with 2>/dev/null)

  1. Namespace Conflicts:

    • schema.org mapped to both http:// and https:// versions
    • Does not affect RDF validity
    • Consider consolidating to HTTPS in a future version
  2. Import Resolution:

    • Fixed: Bare class names in imports (e.g., - Custodian) now use relative paths (- ./Custodian)
    • All imports now resolve correctly
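The schema.org conflict in item 1 appears in the output as two prefix bindings, one to the http:// namespace and one to the https:// namespace. Below is a minimal sketch for detecting it in the Turtle output before consolidating. The helper names are hypothetical, and the regex assumes one prefix declaration per line.

```python
import re

# Matches Turtle "@prefix p: <iri> ." and SPARQL-style "PREFIX p: <iri>".
PREFIX_DECL = re.compile(r"^\s*@?prefix\s+([\w-]*):\s*<([^>]*)>", re.IGNORECASE | re.MULTILINE)
SCHEMA_ORG = re.compile(r"^https?://schema\.org/")


def schema_org_prefixes(turtle_text: str) -> dict[str, str]:
    """Map each prefix bound to a schema.org namespace to its IRI."""
    return {
        prefix: iri
        for prefix, iri in PREFIX_DECL.findall(turtle_text)
        if SCHEMA_ORG.match(iri)
    }


def has_scheme_conflict(turtle_text: str) -> bool:
    """True if schema.org is bound under both the http:// and https:// IRIs."""
    schemes = {iri.split(":", 1)[0] for iri in schema_org_prefixes(turtle_text).values()}
    return schemes == {"http", "https"}
```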

Usage Examples

Loading into RDF Database

Apache Jena (TDB, which Fuseki can serve):

# Bulk-load the Turtle file into a TDB database
tdbloader --loc=/path/to/tdb custodian_multi_aspect_20251122_154430.owl.ttl

RDF4J:

# Upload via HTTP: POST to /statements adds the triples (set Content-Type to match the file's format)
curl -X POST -H "Content-Type: text/turtle" \
  --data-binary @custodian_multi_aspect_20251122_154430.owl.ttl \
  http://localhost:8080/rdf4j-server/repositories/heritage/statements
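The same upload can be scripted for any of the four files by choosing the Content-Type from the file extension. This is a minimal sketch using only the standard library. It assumes the repository URL from the curl example and RDF4J's /statements endpoint. build_upload_request is a hypothetical helper that only constructs the request, so nothing is sent until urllib.request.urlopen is called.

```python
import urllib.request
from pathlib import Path

# Media types for the four generated serializations, keyed by final suffix.
CONTENT_TYPES = {
    ".ttl": "text/turtle",
    ".nt": "application/n-triples",
    ".jsonld": "application/ld+json",
    ".rdf": "application/rdf+xml",
}


def build_upload_request(path: Path, repo_url: str) -> urllib.request.Request:
    """Build a POST to <repo_url>/statements that adds the file's triples."""
    content_type = CONTENT_TYPES.get(path.suffix)
    if content_type is None:
        raise ValueError(f"unsupported extension: {path.suffix}")
    return urllib.request.Request(
        url=f"{repo_url.rstrip('/')}/statements",
        data=path.read_bytes(),
        headers={"Content-Type": content_type},
        method="POST",
    )


# Usage (requires a running RDF4J server):
# repo = "http://localhost:8080/rdf4j-server/repositories/heritage"
# req = build_upload_request(Path("custodian_multi_aspect_20251122_154430.nt"), repo)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```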

SPARQL Query Example

PREFIX hc: <https://nde.nl/ontology/hc/>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX org: <http://www.w3.org/ns/org#>

# Find all custodians with legal status and place designations
SELECT ?custodian ?legalName ?placeName WHERE {
  ?custodian a hc:Custodian .
  ?custodian hc:legal_status ?legal .
  ?legal hc:legal_name ?legalName .
  ?custodian hc:place_designation ?place .
  ?place hc:place_name ?placeName .
}
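The query can also be sent from Python over the SPARQL 1.1 Protocol. It goes as a URL-encoded query parameter on a GET request, with JSON results requested via the Accept header. This is a minimal sketch using only the standard library, and the helper names are hypothetical. The endpoint URL in the usage comment is also an assumption: a Fuseki dataset named heritage on Fuseki's default port 3030.

```python
import json
import urllib.parse
import urllib.request

QUERY = """
PREFIX hc: <https://nde.nl/ontology/hc/>
SELECT ?custodian ?legalName ?placeName WHERE {
  ?custodian a hc:Custodian .
  ?custodian hc:legal_status ?legal .
  ?legal hc:legal_name ?legalName .
  ?custodian hc:place_designation ?place .
  ?place hc:place_name ?placeName .
}
"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def flatten_bindings(results: dict) -> list[dict[str, str]]:
    """Turn SPARQL JSON results into a list of {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and return the flattened rows."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as resp:
        return flatten_bindings(json.load(resp))


# Usage (requires a running endpoint; the URL is an assumption):
# rows = run_query("http://localhost:3030/heritage/sparql", QUERY)
```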

Python RDFLib Usage

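from rdflib import Graph
from rdflib.namespace import OWL

# Load the Turtle serialization
g = Graph()
g.parse("custodian_multi_aspect_20251122_154430.owl.ttl", format="turtle")

# Or load JSON-LD instead (parsing both into the same graph merges their triples)
# g.parse("custodian_multi_aspect_20251122_154430.jsonld", format="json-ld")

# Query: list all OWL classes ("class" is a Python keyword, so bind ?cls)
results = g.query(
    """
    SELECT ?cls WHERE {
        ?cls a owl:Class .
    }
    """,
    initNs={"owl": OWL},
)

for row in results:
    print(row.cls)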

Generation Commands (For Reference)

# Set timestamp
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Generate OWL/Turtle (primary format)
gen-owl -f ttl schemas/20251121/linkml/01_custodian_name_modular.yaml \
    2>/dev/null > "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl"

# Convert to N-Triples
rdfpipe -i turtle -o nt "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl" \
    2>/dev/null > "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.nt"

# Convert to JSON-LD
rdfpipe -i turtle -o json-ld "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl" \
    2>/dev/null > "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.jsonld"

# Convert to RDF/XML
rdfpipe -i turtle -o xml "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.owl.ttl" \
    2>/dev/null > "schemas/20251121/rdf/custodian_multi_aspect_${TIMESTAMP}.rdf"
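The four commands can also be driven from one Python script that derives every filename from a single timestamp. This is a minimal sketch that assumes gen-owl and rdfpipe are on PATH; plan_commands and run_plan are hypothetical helpers. Building the plan separately from running it makes the command lines easy to review or test. check=True makes a failing step raise an error even though its stderr is discarded.

```python
import subprocess
from pathlib import Path

SCHEMA = Path("schemas/20251121/linkml/01_custodian_name_modular.yaml")
RDF_DIR = Path("schemas/20251121/rdf")
BASE = "custodian_multi_aspect"

# rdfpipe output format -> file extension, matching the shell commands above.
CONVERSIONS = {"nt": "nt", "json-ld": "jsonld", "xml": "rdf"}


def plan_commands(timestamp: str) -> list[tuple[list[str], Path]]:
    """Return (argv, output_path) pairs: gen-owl first, then each rdfpipe step."""
    ttl = RDF_DIR / f"{BASE}_{timestamp}.owl.ttl"
    plan = [(["gen-owl", "-f", "ttl", str(SCHEMA)], ttl)]
    for fmt, ext in CONVERSIONS.items():
        out = RDF_DIR / f"{BASE}_{timestamp}.{ext}"
        plan.append((["rdfpipe", "-i", "turtle", "-o", fmt, str(ttl)], out))
    return plan


def run_plan(plan: list[tuple[list[str], Path]]) -> None:
    """Run each step in order: stdout to its output file, stderr discarded."""
    for argv, out_path in plan:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with out_path.open("wb") as out:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, stdout=out, stderr=subprocess.DEVNULL, check=True)


# Usage (runs gen-owl and rdfpipe; the timestamp is an example):
# run_plan(plan_commands("20251122_154430"))
```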

Next Steps

Immediate

  • All RDF formats generated successfully
  • All formats validated
  • Documentation updated

Short-term

  • Resolve schema.org namespace conflict (http vs https)
  • Generate additional serialization formats:
    • N3 (Notation3)
    • TriG (named graphs)
    • TriX (XML with named graphs)
  • Create RDF validation test suite

Medium-term

  • Load into triple store (Apache Jena / RDF4J)
  • Create example SPARQL queries
  • Generate schema documentation from RDF (LODE, pyLODE)
  • Publish schema to W3C namespace (if applicable)

File Locations

Schema Source:

  • schemas/20251121/linkml/01_custodian_name_modular.yaml

Generated RDF:

  • schemas/20251121/rdf/custodian_multi_aspect_20251122_154430.owl.ttl
  • schemas/20251121/rdf/custodian_multi_aspect_20251122_154430.nt
  • schemas/20251121/rdf/custodian_multi_aspect_20251122_154430.jsonld
  • schemas/20251121/rdf/custodian_multi_aspect_20251122_154430.rdf

Example Instance:

  • schemas/20251121/examples/multi_aspect_rijksmuseum_complete.yaml

Documentation:

  • QUICK_STATUS_CUSTODIAN_SCHEMA_MOD_20251122.md
  • CUSTODIAN_MULTI_ASPECT_REFACTORING.md
  • SESSION_SUMMARY_20251122_CUSTODIAN_MULTI_ASPECT.md
  • .opencode/SCHEMA_GENERATION_RULES.md

Verification Checklist

  • OWL/Turtle validates (2,619 lines)
  • N-Triples validates (3,027 lines)
  • JSON-LD validates (14,094 lines)
  • RDF/XML validates (4,585 lines)
  • All classes present in RDF
  • All properties present in RDF
  • All enums present in RDF
  • Ontology alignments verified
  • Timestamps applied to all files
  • Documentation updated
  • Generation commands documented

Status: COMPLETE - ALL FORMATS GENERATED SUCCESSFULLY
Timestamp: 2025-11-22 15:44:30
Total Files: 4 RDF formats (1.3MB total)
Schema Version: v0.1.0
Multi-Aspect Architecture: Fully implemented and validated