glam/.opencode/SCHEMA_GENERATION_RULES.md
kempersc 2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00


Schema Generation Rules for AI Agents

Date: 2025-11-22
Purpose: Standard rules for generating derived artifacts from LinkML schemas


Rule 1: Always Use Full Timestamps in Generated File Names

MANDATORY: When generating derived artifacts (RDF, UML, etc.) from LinkML schemas, ALWAYS include a full timestamp (date AND time) in the filename.

Format

{base_name}_{YYYYMMDD}_{HHMMSS}.{extension}

Examples

# ✅ CORRECT - Full timestamp (date + time)
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
gen-yuml schemas/linkml/schema.yaml > schemas/uml/mermaid/schema_${TIMESTAMP}.mmd
gen-owl -f ttl schemas/linkml/schema.yaml > schemas/rdf/schema_${TIMESTAMP}.owl.ttl

# Examples of correct filenames:
custodian_multi_aspect_20251122_154136.mmd
custodian_multi_aspect_20251122_154430.owl.ttl
custodian_multi_aspect_20251122_154430.nt
custodian_multi_aspect_20251122_154430.jsonld
custodian_multi_aspect_20251122_154430.rdf

# ❌ WRONG - No timestamp
schema.mmd
01_custodian_name.owl.ttl

# ❌ WRONG - Date only (MISSING TIME!)
schema_20251122.mmd
custodian_multi_aspect_20251122.owl.ttl

# ❌ WRONG - Time only (missing date)
schema_154430.mmd

Rationale

  1. Version tracking: Full timestamps enable precise version identification
  2. No overwrites: Multiple generation runs on the same day, whether by one person or several team members, never overwrite previous versions
  3. Debugging: Identifies the exact time when changes were made
  4. Rollback: Easy to revert to specific versions
  5. Audit trail: Documents schema evolution with chronological precision
  6. Git-friendly: Easy to diff between versions
  7. Reproducibility: Can correlate generated artifacts with git commits

Critical Note

The timestamp must include BOTH date and time (YYYYMMDD_HHMMSS), not just date. This allows multiple generation runs per day without filename conflicts.
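The naming rule is easy to check mechanically. The following is a minimal bash sketch; the helper names `make_timestamped_name` and `has_full_timestamp` are hypothetical and not part of LinkML. The checker accepts the correct example filenames above and rejects the date-only and time-only ones.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of LinkML) for Rule 1 filenames.

# make_timestamped_name BASE EXT [TIMESTAMP]
# Prints BASE_YYYYMMDD_HHMMSS.EXT; TIMESTAMP defaults to the current local time.
make_timestamped_name() {
    local base="$1" ext="$2"
    local ts="${3:-$(date +%Y%m%d_%H%M%S)}"
    printf '%s_%s.%s\n' "$base" "$ts" "$ext"
}

# has_full_timestamp FILENAME
# Succeeds only if _YYYYMMDD_HHMMSS sits directly before the extension
# (multi-part extensions such as .owl.ttl are allowed).
has_full_timestamp() {
    local re='_[0-9]{8}_[0-9]{6}\.[A-Za-z0-9.]+$'
    [[ "$1" =~ $re ]]
}

make_timestamped_name custodian_multi_aspect owl.ttl 20251122_154430
# → custodian_multi_aspect_20251122_154430.owl.ttl
has_full_timestamp schema_20251122.mmd || echo "rejected: date only"
```

A check like `has_full_timestamp` could, for instance, run over newly added files under `schemas/` to catch date-only names before they are committed.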


Rule 2: LinkML is the Single Source of Truth

NEVER manually create or edit derived files. Always generate from LinkML.

Correct Workflow

1. Edit LinkML schema (.yaml)
2. Generate RDF formats (gen-owl + rdfpipe)
3. Generate UML diagrams (gen-yuml)
4. Update TypeDB schema (manual translation, documented as described in Rule 8)
5. Validate examples (linkml-validate)

Incorrect Workflow

❌ Editing .ttl files directly
❌ Creating .jsonld manually
❌ Drawing UML diagrams by hand
❌ Modifying TypeDB schema without updating LinkML
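One way to enforce this rule mechanically is a pre-commit check. The sketch below is hypothetical: the function name, the assumption that schema sources live under a `linkml/` directory (as in this document's paths), and the list of derived extensions are all illustrative choices. It fails when derived artifacts or a TypeDB `.tql` file are staged without any LinkML schema change.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit guard for Rule 2 (not part of LinkML or git).

# derived_without_source FILE...
# Returns 1 and lists the offending files if derived artifacts appear in the
# list but no LinkML schema file (assumed to live under a linkml/ dir) does.
derived_without_source() {
    local f has_schema=0
    local -a derived=()
    for f in "$@"; do
        case "$f" in
            */linkml/*.yaml|linkml/*.yaml) has_schema=1 ;;
            *.ttl|*.nt|*.jsonld|*.rdf|*.mmd|*.tql) derived+=("$f") ;;
        esac
    done
    if (( ${#derived[@]} > 0 && has_schema == 0 )); then
        printf 'Derived file changed without a LinkML change: %s\n' "${derived[@]}" >&2
        return 1
    fi
    return 0
}

# In a git pre-commit hook (assumes paths without spaces):
#   derived_without_source $(git diff --cached --name-only) || exit 1
```

Example instance `.yaml` files deliberately do not count as a schema change here, because only the LinkML schema is the source of derived files.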

Rule 3: Generate All RDF Serialization Formats

When generating RDF from LinkML, produce all four of the following serialization formats:

Required Formats

  1. OWL/Turtle (.owl.ttl) - Primary, human-readable
  2. N-Triples (.nt) - Simple, line-based
  3. JSON-LD (.jsonld) - Web-friendly, JSON-based
  4. RDF/XML (.rdf) - XML-based, traditional

Generation Commands

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BASE_NAME="schema_${TIMESTAMP}"

# 1. Generate OWL/Turtle (primary)
gen-owl -f ttl schemas/linkml/schema.yaml > schemas/rdf/${BASE_NAME}.owl.ttl

# 2. Convert to other formats using rdfpipe
rdfpipe --input-format turtle --output-format nt schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.nt
rdfpipe --input-format turtle --output-format json-ld schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.jsonld
rdfpipe --input-format turtle --output-format xml schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.rdf
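After the commands above, it helps to confirm that every required format was actually produced. This is a minimal sketch with a hypothetical helper name, `missing_rdf_formats`. It tests for non-empty files (`-s`) rather than mere existence, because shell redirection creates the output file even when the generator or converter fails.

```shell
#!/usr/bin/env bash
# Hypothetical check for Rule 3: all four serializations exist and are non-empty.

# missing_rdf_formats DIR BASE_NAME
# Prints each missing or empty file; returns 1 if any are missing.
missing_rdf_formats() {
    local dir="$1" base="$2" ext missing=0
    for ext in owl.ttl nt jsonld rdf; do
        if [[ ! -s "$dir/$base.$ext" ]]; then
            echo "missing or empty: $dir/$base.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Usage after generation:
#   missing_rdf_formats schemas/rdf "$BASE_NAME" || exit 1
```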

Rule 4: Validate Before Committing

Before committing schema changes, ALWAYS:

  1. Validate LinkML schema:

    gen-owl -f ttl schemas/linkml/schema.yaml > /tmp/test_validation.ttl
    # Errors are reported on stderr with a non-zero exit status, not written to the .ttl output
    
  2. Validate example instances:

    linkml-validate -s schemas/linkml/schema.yaml schemas/examples/instance.yaml
    
  3. Check the RDF triple count:

    wc -l schemas/rdf/*.nt  # N-Triples has one triple per line, so lines ≈ triples
    
  4. Verify class presence:

    grep -c "ClassName" schemas/rdf/*.owl.ttl
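Because `wc -l` also counts blank lines and `#` comment lines (both are allowed in N-Triples), an exact triple count needs to skip them. This is a minimal sketch; the helper name `count_ntriples` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Rule 4: exact triple count for an N-Triples file.
# Each triple occupies one line; blank lines and '#' comments are excluded.
count_ntriples() {
    [[ -r "$1" ]] || { echo "cannot read: $1" >&2; return 2; }
    # grep -c prints 0 but exits 1 when nothing matches; '|| true' keeps that 0
    grep -cvE '^[[:space:]]*(#|$)' "$1" || true
}

# Usage:
#   for f in schemas/rdf/*.nt; do echo "$f: $(count_ntriples "$f") triples"; done
```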
    

Rule 5: Document Schema Changes

Every schema change requires:

  1. Quick status document: QUICK_STATUS_{TOPIC}_{YYYYMMDD}.md
  2. Session summary: SESSION_SUMMARY_{YYYYMMDD}_{TOPIC}.md
  3. Updated examples: Add/update instance files demonstrating changes
  4. Commit message: Reference the quick status document

Template: Quick Status Document

# Quick Status: {Topic}
Date: YYYY-MM-DD  
Status: ✅ COMPLETE / ⏳ IN PROGRESS  
Priority: HIGH / MEDIUM / LOW

## What We Did
...

## Key Changes
...

## Files Modified
...

## Validation Results
...

## Next Steps
...
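Creating the quick status file from the template can be scripted. The sketch below is hypothetical: the function name, the default status (IN PROGRESS) and priority (MEDIUM), and the refusal to overwrite are assumptions, not project requirements. It follows the `QUICK_STATUS_{TOPIC}_{YYYYMMDD}.md` naming from this rule.

```shell
#!/usr/bin/env bash
# Hypothetical scaffold for Rule 5: QUICK_STATUS_{TOPIC}_{YYYYMMDD}.md with the
# template headings. Refuses to overwrite an existing file.
new_quick_status() {
    local topic="$1" day="${2:-$(date +%Y%m%d)}" section
    local file="QUICK_STATUS_${topic}_${day}.md"
    if [[ -e "$file" ]]; then
        echo "exists: $file" >&2
        return 1
    fi
    {
        echo "# Quick Status: ${topic}"
        echo "Date: ${day:0:4}-${day:4:2}-${day:6:2}  "
        echo "Status: ⏳ IN PROGRESS  "
        echo "Priority: MEDIUM"
        for section in "What We Did" "Key Changes" "Files Modified" "Validation Results" "Next Steps"; do
            printf '\n## %s\n...\n' "$section"
        done
    } > "$file"
    echo "$file"
}

# Usage (hypothetical topic):
#   new_quick_status SLOT_RENAMING
```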

Rule 6: Example Instances Are Required

For every new class or major schema change:

  1. Create at least ONE complete example instance
  2. Place in schemas/{version}/examples/
  3. Use descriptive filenames: {class_name}_{use_case}_{timestamp}.yaml
  4. Include all required slots and at least 2-3 optional slots
  5. Add inline comments explaining non-obvious fields

Example Instance Template

---
# Complete Example: {ClassName}
# Date: YYYY-MM-DD
# Use Case: {Description}
# Status: Valid instance conforming to schema version {X.Y.Z}

instances:
  - id: https://example.org/id
    required_field_1: "value"
    required_field_2: "value"
    optional_field: "value"  # Explanation of when to use this field
    # ... more fields
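The filename convention from this rule can be generated consistently with a small helper. This sketch assumes that `{class_name}` means the LinkML class name converted to snake_case (e.g. `CustodianName` becomes `custodian_name`); the function names and the `museum_rename` use case are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Rule 6: {class_name}_{use_case}_{timestamp}.yaml

# to_snake CamelCaseName -> camel_case_name
to_snake() {
    printf '%s\n' "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

# example_filename CLASS USE_CASE [TIMESTAMP]
example_filename() {
    local class="$1" use_case="$2" ts="${3:-$(date +%Y%m%d_%H%M%S)}"
    printf '%s_%s_%s.yaml\n' "$(to_snake "$class")" "$use_case" "$ts"
}

example_filename CustodianName museum_rename 20251122_154430
# → custodian_name_museum_rename_20251122_154430.yaml
```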

Rule 7: UML Diagram Conventions

When generating UML diagrams:

File Naming

{schema_name}_{diagram_type}_{YYYYMMDD}_{HHMMSS}.mmd

Examples:

  • custodian_class_diagram_20251122_154136.mmd
  • prov_flow_sequence_20251122_154200.mmd

Diagram Types

  • class_diagram - Class hierarchies and relationships
  • sequence - PROV-O temporal flows
  • state - State transitions (e.g., organizational change events)
  • er - Entity-relationship (database perspective)

Storage Location

schemas/{version}/uml/mermaid/{timestamp_files}.mmd
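The naming, diagram-type, and storage-location conventions above can be combined into one path builder. This is a minimal sketch with a hypothetical function name, `uml_path`; it rejects any diagram type outside the four listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Rule 7:
# schemas/{version}/uml/mermaid/{schema_name}_{diagram_type}_{YYYYMMDD}_{HHMMSS}.mmd
uml_path() {
    local schema="$1" version="$2" type="$3" ts="${4:-$(date +%Y%m%d_%H%M%S)}"
    case "$type" in
        class_diagram|sequence|state|er) ;;
        *) echo "unknown diagram type: $type" >&2; return 1 ;;
    esac
    printf 'schemas/%s/uml/mermaid/%s_%s_%s.mmd\n' "$version" "$schema" "$type" "$ts"
}

uml_path custodian 20251121 class_diagram 20251122_154136
# → schemas/20251121/uml/mermaid/custodian_class_diagram_20251122_154136.mmd
```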

Rule 8: TypeDB Schema Updates

TypeDB schemas are manually translated from LinkML (not auto-generated).

Required Steps

  1. Update LinkML schema first
  2. Regenerate RDF to verify OWL alignment
  3. Manually update TypeDB schema (.tql)
  4. Document translation decisions
  5. Test TypeDB queries
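Since the TypeDB schema is translated by hand, it can silently fall behind the LinkML source. The following hypothetical sketch (function name and file paths are illustrative) compares modification times only, so it flags "possibly stale", not "wrong". Note that `git checkout` resets modification times, so the check is most meaningful in a working copy right after editing the LinkML file.

```shell
#!/usr/bin/env bash
# Hypothetical staleness check for Rule 8.
# typedb_is_stale LINKML_YAML TYPEDB_TQL
# Returns 0 (stale) if the .tql is missing or older than the LinkML file.
typedb_is_stale() {
    local linkml="$1" tql="$2"
    if [[ ! -e "$tql" ]]; then
        echo "no TypeDB schema yet: $tql"
        return 0
    fi
    if [[ "$linkml" -nt "$tql" ]]; then
        echo "possibly stale: $linkml is newer than $tql"
        return 0
    fi
    return 1
}

# Usage (hypothetical paths):
#   if typedb_is_stale schemas/linkml/schema.yaml schemas/typedb/schema.tql; then
#       echo "Update the TypeDB schema and TYPEDB_TRANSLATION_NOTES.md"
#   fi
```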

Translation Documentation

Create TYPEDB_TRANSLATION_NOTES.md documenting:

  • LinkML class → TypeDB entity/relation mapping
  • Slot → attribute mapping
  • Constraints and rules
  • Query examples

Rule 9: Version Control for Generated Files

What to Commit

DO commit:

  • LinkML schema files (.yaml)
  • Example instances (.yaml)
  • Documentation (.md)
  • Latest timestamped RDF (keep last 3 versions)
  • Latest timestamped UML (keep last 3 versions)

DO NOT commit:

  • Temporary validation files (/tmp/*)
  • Old versions (>3 generations old)
  • Duplicate non-timestamped files

Cleanup Script

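# Keep only the 3 newest timestamped versions of each RDF format.
# Names embed YYYYMMDD_HHMMSS, so a reverse name sort is newest-first (no reliance on mtimes).
cd schemas/rdf || exit 1
for ext in owl.ttl nt jsonld rdf; do
    ls schema_*."$ext" 2>/dev/null | sort -r | tail -n +4 | xargs rm -f
done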

Rule 10: Generation Workflow Template

Standard workflow for schema changes:

#!/bin/bash
# Schema Generation Workflow
# Usage: ./generate_schema_artifacts.sh

set -euo pipefail  # Exit on errors, unset variables, and failed pipeline stages

SCHEMA_FILE="schemas/20251121/linkml/01_custodian_name_modular.yaml"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BASE_NAME="custodian_${TIMESTAMP}"

echo "=== Schema Generation Workflow ==="
echo "Timestamp: $TIMESTAMP"
echo ""

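# Step 1: Validate LinkML (stderr is kept separately so errors are visible)
echo "Step 1: Validating LinkML schema..."
if ! gen-owl -f ttl "$SCHEMA_FILE" > /tmp/validation_test.ttl 2> /tmp/validation_errors.log; then
    echo "❌ Schema invalid:"; cat /tmp/validation_errors.log; exit 1
fi
echo "✅ Schema valid"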

# Step 2: Generate RDF formats
echo "Step 2: Generating RDF formats..."
mkdir -p schemas/20251121/rdf schemas/20251121/uml/mermaid
gen-owl -f ttl "$SCHEMA_FILE" > "schemas/20251121/rdf/${BASE_NAME}.owl.ttl"
rdfpipe --input-format turtle --output-format nt "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.nt"
rdfpipe --input-format turtle --output-format json-ld "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.jsonld"
rdfpipe --input-format turtle --output-format xml "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.rdf"
echo "✅ RDF formats generated"

# Step 3: Generate UML
echo "Step 3: Generating UML diagrams..."
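# NOTE: gen-yuml emits yUML notation, not Mermaid; check the output before relying on the .mmd extension
gen-yuml "$SCHEMA_FILE" > "schemas/20251121/uml/mermaid/${BASE_NAME}.mmd"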
echo "✅ UML diagram generated"

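# Step 4: Validate examples (failures are counted and reported, not fatal)
echo "Step 4: Validating example instances..."
shopt -s nullglob  # skip the loop if there are no example files
FAILED=0
for example in schemas/20251121/examples/*.yaml; do
    linkml-validate -s "$SCHEMA_FILE" "$example" || { echo "⚠️  Warning: $example failed validation"; FAILED=$((FAILED + 1)); }
done
echo "Example validation finished ($FAILED failed)"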

# Step 5: Report
echo ""
echo "=== Generation Complete ==="
ls -lh "schemas/20251121/rdf/${BASE_NAME}".* | awk '{print $9, "("$5")"}'
ls -lh "schemas/20251121/uml/mermaid/${BASE_NAME}.mmd" | awk '{print $9, "("$5")"}'
echo ""
echo "Next: Update documentation and commit"
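The script above repeats `schemas/20251121` in many places, which is easy to get out of sync with `SCHEMA_FILE`. A possible refinement, sketched here with a hypothetical helper name, derives the versioned directory from the schema path. It assumes the `schemas/{version}/linkml/{file}.yaml` layout used in this project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: schemas/{version}/linkml/x.yaml -> schemas/{version}
version_dir_of() {
    local linkml_dir
    linkml_dir=$(dirname "$1")                             # e.g. schemas/20251121/linkml
    [[ "$(basename "$linkml_dir")" == "linkml" ]] || return 1
    dirname "$linkml_dir"                                  # e.g. schemas/20251121
}

SCHEMA_FILE="schemas/20251121/linkml/01_custodian_name_modular.yaml"
OUT_DIR=$(version_dir_of "$SCHEMA_FILE")
echo "$OUT_DIR/rdf"   # → schemas/20251121/rdf
```

With `OUT_DIR` set this way, the output paths in Steps 2 through 5 could be written as `"$OUT_DIR/rdf/${BASE_NAME}.owl.ttl"` and so on.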

Quick Reference Commands

Generate All Artifacts

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
gen-owl -f ttl schema.yaml > schema_${TIMESTAMP}.owl.ttl
gen-yuml schema.yaml > schema_${TIMESTAMP}.mmd

Validate

gen-owl -f ttl schema.yaml > /tmp/test.ttl  # Errors appear on stderr; a non-zero exit status means invalid
linkml-validate -s schema.yaml instance.yaml

Convert RDF Formats

rdfpipe -i turtle -o nt file.ttl > file.nt
rdfpipe -i turtle -o json-ld file.ttl > file.jsonld
rdfpipe -i turtle -o xml file.ttl > file.rdf

Check RDF Content

grep -c "ClassName" file.owl.ttl  # Count lines that mention ClassName
wc -l file.nt  # Count triples (N-Triples: one triple per line)

Status: ACTIVE RULES
Version: 1.0
Last Updated: 2025-11-22
Applies To: All LinkML schema work in this project

See Also:

  • .opencode/HYPER_MODULAR_STRUCTURE.md - Module organization
  • .opencode/SLOT_NAMING_CONVENTIONS.md - Slot naming patterns
  • AGENTS.md - AI agent instructions