# Schema Generation Rules for AI Agents
**Date**: 2025-11-22

**Purpose**: Standard rules for generating derived artifacts from LinkML schemas

---
## Rule 1: Always Use Full Timestamps in Generated File Names
**MANDATORY**: When generating derived artifacts (RDF, UML, etc.) from LinkML schemas, **ALWAYS** include a full timestamp (date AND time) in the filename.
### Format
```
{base_name}_{YYYYMMDD}_{HHMMSS}.{extension}
```
### Examples
```bash
# ✅ CORRECT - Full timestamp (date + time)
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
gen-yuml schemas/linkml/schema.yaml > schemas/uml/mermaid/schema_${TIMESTAMP}.mmd
gen-owl -f ttl schemas/linkml/schema.yaml > schemas/rdf/schema_${TIMESTAMP}.owl.ttl
# Examples of correct filenames:
custodian_multi_aspect_20251122_154136.mmd
custodian_multi_aspect_20251122_154430.owl.ttl
custodian_multi_aspect_20251122_154430.nt
custodian_multi_aspect_20251122_154430.jsonld
custodian_multi_aspect_20251122_154430.rdf
# ❌ WRONG - No timestamp
schema.mmd
01_custodian_name.owl.ttl
# ❌ WRONG - Date only (MISSING TIME!)
schema_20251122.mmd
custodian_multi_aspect_20251122.owl.ttl
# ❌ WRONG - Time only (missing date)
schema_154430.mmd
```
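
The naming rule can be checked mechanically before committing. The following is a minimal sketch that assumes a POSIX shell with `grep -E`. `is_timestamped_name` is a hypothetical helper, not part of LinkML. It checks only the shape of the name; it does not verify that the date and time are real calendar values.

```bash
# Hypothetical helper (not part of LinkML): succeeds if a file name follows
# {base_name}_{YYYYMMDD}_{HHMMSS}.{extension}, i.e. has BOTH date and time.
is_timestamped_name() {
  local name=${1##*/}   # strip any leading directories
  # base name, "_", 8-digit date, "_", 6-digit time, then one or more extensions
  printf '%s\n' "$name" | grep -Eq '^[A-Za-z0-9_]+_[0-9]{8}_[0-9]{6}\.[A-Za-z0-9.]+$'
}

for f in custodian_multi_aspect_20251122_154430.owl.ttl schema_20251122.mmd schema_154430.mmd; do
  if is_timestamped_name "$f"; then echo "ok:      $f"; else echo "INVALID: $f"; fi
done
# ok:      custodian_multi_aspect_20251122_154430.owl.ttl
# INVALID: schema_20251122.mmd
# INVALID: schema_154430.mmd
```

Multi-part extensions such as `.owl.ttl` are accepted because the extension part may contain dots.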
### Rationale
1. **Version tracking**: Full timestamps identify each generated version precisely
2. **No overwrites**: Teams often regenerate several times a day, and each run gets its own file, so previous versions are never lost
3. **Debugging**: The exact time each artifact was generated is visible in its name
4. **Rollback**: Easy to revert to a specific version
5. **Audit trail**: Documents schema evolution with chronological precision
6. **Git-friendly**: Easy to diff between versions
7. **Reproducibility**: Generated artifacts can be correlated with git commits
### Critical Note
The timestamp must include BOTH date and time (YYYYMMDD_HHMMSS), not just the date. A date-only name collides as soon as the schema is regenerated a second time on the same day.

---
## Rule 2: LinkML is the Single Source of Truth
**NEVER** manually create or edit derived files. Always generate from LinkML.
### Correct Workflow ✅
```
1. Edit LinkML schema (.yaml)
2. Generate RDF formats (gen-owl + rdfpipe)
3. Generate UML diagrams (gen-yuml)
4. Update TypeDB schema (manual translation, documented per Rule 8)
5. Validate examples (linkml-validate)
```
### Incorrect Workflow ❌
```
❌ Editing .ttl files directly
❌ Creating .jsonld manually
❌ Drawing UML diagrams by hand
❌ Modifying TypeDB schema without updating LinkML
```
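
One way to catch a forgotten regeneration step is to compare modification times. If the LinkML file is newer than the latest generated artifact, the derived files are out of date. The sketch below assumes that the artifacts matching the pattern share one base name, so that their timestamped names sort chronologically. `artifacts_are_stale` is a hypothetical helper. It cannot detect hand edits to a derived file; it only detects a schema edit that was never followed by regeneration.

```bash
# Hypothetical helper: succeeds (exit 0) when the newest artifact matching
# PATTERN in DIR is older than SCHEMA_FILE, or when no artifact exists yet.
artifacts_are_stale() {
  local schema_file="$1" artifact_dir="$2" pattern="$3" newest
  # Timestamped names (Rule 1) sort chronologically, so the last one is newest.
  newest=$(find "$artifact_dir" -maxdepth 1 -type f -name "$pattern" | sort | tail -n 1)
  if [ -z "$newest" ]; then
    return 0
  fi
  # -nt: true if the schema file was modified more recently than the artifact
  [ "$schema_file" -nt "$newest" ]
}

# Usage (paths follow the examples in this document):
# artifacts_are_stale schemas/linkml/schema.yaml schemas/rdf 'schema_*.owl.ttl' \
#   && echo "RDF artifacts are older than the LinkML schema: regenerate them"
```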
---
## Rule 3: Generate All RDF Serialization Formats
When generating RDF from LinkML, produce all standard serialization formats:
### Required Formats
1. **OWL/Turtle** (.owl.ttl) - Primary, human-readable
2. **N-Triples** (.nt) - Simple, line-based
3. **JSON-LD** (.jsonld) - Web-friendly, JSON-based
4. **RDF/XML** (.rdf) - XML-based, traditional
### Generation Commands
```bash
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BASE_NAME="schema_${TIMESTAMP}"
# 1. Generate OWL/Turtle (primary)
gen-owl -f ttl schemas/linkml/schema.yaml > schemas/rdf/${BASE_NAME}.owl.ttl
# 2. Convert to other formats using rdfpipe
rdfpipe --input-format turtle --output-format nt schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.nt
rdfpipe --input-format turtle --output-format json-ld schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.jsonld
rdfpipe --input-format turtle --output-format xml schemas/rdf/${BASE_NAME}.owl.ttl > schemas/rdf/${BASE_NAME}.rdf
```
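
After the conversions, it is worth confirming that all four serializations exist and are non-empty. A failed `rdfpipe` run in an interactive session can leave an empty file behind, because the shell creates the redirection target before the command runs. The following is a minimal sketch; `check_rdf_set` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds only if DIR/BASE.{owl.ttl,nt,jsonld,rdf} all
# exist and are non-empty; reports each problem on stderr.
check_rdf_set() {
  local dir="$1" base="$2" ext status=0
  for ext in owl.ttl nt jsonld rdf; do
    # -s: the file exists and its size is greater than zero
    if [ ! -s "$dir/$base.$ext" ]; then
      echo "missing or empty: $dir/$base.$ext" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage after the commands above:
# check_rdf_set schemas/rdf "$BASE_NAME"
```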
---
## Rule 4: Validate Before Committing
Before committing schema changes, **ALWAYS**:
1. **Validate LinkML schema**:
   ```bash
   gen-owl -f ttl schemas/linkml/schema.yaml > /tmp/test_validation.ttl
   # Schema errors show up as a non-zero exit status and messages on stderr
   ```
2. **Validate example instances**:
   ```bash
   linkml-validate -s schemas/linkml/schema.yaml schemas/examples/instance.yaml
   ```
3. **Check RDF triple count**:
   ```bash
   wc -l schemas/rdf/*.nt  # N-Triples has one triple per line, so lines ≈ triples
   ```
4. **Verify class presence**:
   ```bash
   grep -c "ClassName" schemas/rdf/*.owl.ttl  # counts matching lines, not occurrences
   ```
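
For check 3, `wc -l` counts every line, including any blank or comment lines. The hypothetical `count_triples` helper below gives a slightly more careful count by skipping those lines. It still relies on N-Triples putting one statement per line.

```bash
# Hypothetical helper: count N-Triples statements, ignoring blank and comment lines.
count_triples() {
  # -v with two patterns keeps lines matching neither; -c counts them.
  # grep exits 1 when the count is 0, so "|| true" keeps "set -e" scripts alive.
  grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" || true
}

# Usage:
# count_triples schemas/rdf/schema_20251122_154430.nt
```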
---
## Rule 5: Document Schema Changes
Every schema change requires:
1. **Quick status document**: `QUICK_STATUS_{TOPIC}_{YYYYMMDD}.md`
2. **Session summary**: `SESSION_SUMMARY_{YYYYMMDD}_{TOPIC}.md`
3. **Updated examples**: Add/update instance files demonstrating changes
4. **Commit message**: Reference quick status document
### Template: Quick Status Document
```markdown
# Quick Status: {Topic}
Date: YYYY-MM-DD
Status: ✅ COMPLETE / ⏳ IN PROGRESS
Priority: HIGH / MEDIUM / LOW
## What We Did
...
## Key Changes
...
## Files Modified
...
## Validation Results
...
## Next Steps
...
```
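
The quick status file name and skeleton can be produced by a small helper, so the date format stays consistent. The following is a sketch; `new_quick_status` and the example topic `RDF_EXPORT` are hypothetical. The topic is used exactly as given, and upper-casing it to match `{TOPIC}` is left to the caller.

```bash
# Hypothetical helper: create QUICK_STATUS_{TOPIC}_{YYYYMMDD}.md from the
# template above in DIR (default: current directory); never overwrites.
new_quick_status() {
  local topic="$1" dir="${2:-.}" file
  file="$dir/QUICK_STATUS_${topic}_$(date +%Y%m%d).md"
  if [ -e "$file" ]; then
    echo "already exists: $file" >&2
    return 1
  fi
  cat > "$file" <<EOF
# Quick Status: ${topic}
Date: $(date +%Y-%m-%d)
Status: ✅ COMPLETE / ⏳ IN PROGRESS
Priority: HIGH / MEDIUM / LOW
## What We Did
...
## Key Changes
...
## Files Modified
...
## Validation Results
...
## Next Steps
...
EOF
  echo "$file"
}

# Usage: new_quick_status RDF_EXPORT
```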
---
## Rule 6: Example Instances Are Required
For every new class or major schema change:
1. Create at least ONE complete example instance
2. Place in `schemas/{version}/examples/`
3. Use descriptive filenames: `{class_name}_{use_case}_{timestamp}.yaml`
4. Include all required slots and at least 2-3 optional slots
5. Add inline comments explaining non-obvious fields
### Example Instance Template
```yaml
---
# Complete Example: {ClassName}
# Date: YYYY-MM-DD
# Use Case: {Description}
# Status: Valid instance conforming to schema version {X.Y.Z}
instances:
  - id: https://example.org/id
    required_field_1: "value"
    required_field_2: "value"
    optional_field: "value"  # Explanation of when to use this field
    # ... more fields
```
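
The file-name and header conventions above can be linted before committing. This sketch assumes that `{timestamp}` uses the same `YYYYMMDD_HHMMSS` format as Rule 1 and that names are lower snake_case; neither assumption is stated by this rule. `check_example` is a hypothetical helper, and it does not replace `linkml-validate`.

```bash
# Hypothetical helper: lint one example file against the Rule 6 conventions.
# Assumes {timestamp} is YYYYMMDD_HHMMSS (as in Rule 1), names are snake_case.
check_example() {
  local file="$1" name status=0 field
  name=${file##*/}
  # class name and use case (at least two segments) before the timestamp
  if ! printf '%s\n' "$name" | grep -Eq '^[a-z0-9]+_[a-z0-9_]+_[0-9]{8}_[0-9]{6}\.yaml$'; then
    echo "bad file name: $name" >&2
    status=1
  fi
  # header comments required by the template
  for field in '# Complete Example:' '# Date:' '# Use Case:' '# Status:'; do
    if ! grep -qF -- "$field" "$file"; then
      echo "missing header '$field' in $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: for f in schemas/*/examples/*.yaml; do check_example "$f"; done
```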
---
## Rule 7: UML Diagram Conventions
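When generating UML diagrams, note that `gen-yuml` emits yUML notation rather than Mermaid syntax. Confirm that any file saved with the `.mmd` extension actually contains Mermaid before committing it.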
### File Naming
```
{schema_name}_{diagram_type}_{YYYYMMDD}_{HHMMSS}.mmd
```
Examples:
- `custodian_class_diagram_20251122_154136.mmd`
- `prov_flow_sequence_20251122_154200.mmd`
### Diagram Types
- `class_diagram` - Class hierarchies and relationships
- `sequence` - PROV-O temporal flows
- `state` - State transitions (e.g., organizational change events)
- `er` - Entity-relationship (database perspective)
### Storage Location
```
schemas/{version}/uml/mermaid/{timestamp_files}.mmd
```
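
Building the name in one place keeps the diagram type within the list above. The following is a sketch; `diagram_filename` is a hypothetical helper. The timestamp defaults to the current time but can be passed in explicitly, which is how the two example names above are reproduced.

```bash
# Hypothetical helper: print {schema_name}_{diagram_type}_{YYYYMMDD}_{HHMMSS}.mmd,
# rejecting diagram types that are not listed under "Diagram Types".
diagram_filename() {
  local schema_name="$1" diagram_type="$2" ts="${3:-$(date +%Y%m%d_%H%M%S)}"
  case "$diagram_type" in
    class_diagram|sequence|state|er) ;;
    *) echo "unknown diagram type: $diagram_type" >&2; return 1 ;;
  esac
  printf '%s_%s_%s.mmd\n' "$schema_name" "$diagram_type" "$ts"
}

# diagram_filename custodian class_diagram 20251122_154136
#   -> custodian_class_diagram_20251122_154136.mmd
# diagram_filename prov_flow sequence 20251122_154200
#   -> prov_flow_sequence_20251122_154200.mmd
```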
---
## Rule 8: TypeDB Schema Updates
TypeDB schemas are **manually translated** from LinkML (not auto-generated).
### Required Steps
1. Update LinkML schema first
2. Regenerate RDF to verify OWL alignment
3. Manually update TypeDB schema (.tql)
4. Document translation decisions
5. Test TypeDB queries
### Translation Documentation
Create `TYPEDB_TRANSLATION_NOTES.md` documenting:
- LinkML class → TypeDB entity/relation mapping
- Slot → attribute mapping
- Constraints and rules
- Query examples
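
Because the TypeDB translation is manual, it helps to check that every LinkML class is at least mentioned in the translation notes. In the sketch below, the hypothetical `untranslated_classes` helper reads class names from stdin, one per line, and prints the ones the notes never mention. How that class list is extracted from the schema is deliberately left open.

```bash
# Hypothetical helper: print each class name (read from stdin, one per line)
# that does not appear as a whole word in the given notes file.
untranslated_classes() {
  local notes="$1" cls
  while IFS= read -r cls; do
    if [ -z "$cls" ]; then
      continue
    fi
    # -w: whole-word match, so "Custodian" is not satisfied by "CustodianName"
    # -F: treat the class name as a fixed string, not a regex
    grep -qwF -- "$cls" "$notes" || printf '%s\n' "$cls"
  done
}

# Usage: untranslated_classes TYPEDB_TRANSLATION_NOTES.md < classes.txt
```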
---
## Rule 9: Version Control for Generated Files
### What to Commit
**DO commit**:
- LinkML schema files (.yaml)
- Example instances (.yaml)
- Documentation (.md)
- Latest timestamped RDF (keep last 3 versions)
- Latest timestamped UML (keep last 3 versions)

**DO NOT commit**:
- Temporary validation files (/tmp/*)
- Old versions (>3 generations old)
- Duplicate non-timestamped files
### Cleanup Script
```bash
# Keep only the last 3 timestamped versions of each RDF serialization
cd schemas/rdf
for ext in owl.ttl nt jsonld rdf; do
  ls -t schema_*."$ext" 2>/dev/null | tail -n +4 | xargs rm -f
done
```
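
`ls -t` orders files by modification time, and git does not preserve modification times on checkout. Because Rule 1 puts a sortable timestamp in every name, ordering by name is an alternative. The following is a sketch; `prune_versions` is a hypothetical helper. It assumes the glob matches a single base name; otherwise, files are grouped by base name before timestamp.

```bash
# Hypothetical helper: in DIR, keep the KEEP (default 3) newest files matching
# PATTERN, judged by the timestamp in the name, and delete the rest.
prune_versions() {
  local dir="$1" pattern="$2" keep="${3:-3}"
  find "$dir" -maxdepth 1 -type f -name "$pattern" \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r f; do rm -f -- "$f"; done
}

# Usage: for ext in owl.ttl nt jsonld rdf; do prune_versions schemas/rdf "schema_*.$ext" 3; done
```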
---
## Rule 10: Generation Workflow Template
Standard workflow for schema changes:
```bash
#!/bin/bash
# Schema Generation Workflow
# Usage: ./generate_schema_artifacts.sh
set -e # Exit on error
SCHEMA_FILE="schemas/20251121/linkml/01_custodian_name_modular.yaml"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
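BASE_NAME="custodian_${TIMESTAMP}"
# Make sure the output directories exist; redirections fail otherwise
mkdir -p "schemas/20251121/rdf" "schemas/20251121/uml/mermaid"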
echo "=== Schema Generation Workflow ==="
echo "Timestamp: $TIMESTAMP"
echo ""
# Step 1: Validate LinkML
echo "Step 1: Validating LinkML schema..."
# stderr stays on the terminal so schema errors are visible; set -e aborts on failure
gen-owl -f ttl "$SCHEMA_FILE" > /tmp/validation_test.ttl
echo "✅ Schema valid"
# Step 2: Generate RDF formats
echo "Step 2: Generating RDF formats..."
gen-owl -f ttl "$SCHEMA_FILE" > "schemas/20251121/rdf/${BASE_NAME}.owl.ttl"
rdfpipe --input-format turtle --output-format nt "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.nt"
rdfpipe --input-format turtle --output-format json-ld "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.jsonld"
rdfpipe --input-format turtle --output-format xml "schemas/20251121/rdf/${BASE_NAME}.owl.ttl" > "schemas/20251121/rdf/${BASE_NAME}.rdf"
echo "✅ RDF formats generated"
# Step 3: Generate UML
echo "Step 3: Generating UML diagrams..."
gen-yuml "$SCHEMA_FILE" > "schemas/20251121/uml/mermaid/${BASE_NAME}.mmd"
echo "✅ UML diagram generated"
# Step 4: Validate examples
echo "Step 4: Validating example instances..."
for example in schemas/20251121/examples/*.yaml; do
  linkml-validate -s "$SCHEMA_FILE" "$example" || echo "⚠️ Warning: $example failed validation"
done
echo "✅ Example validation finished (check for warnings above)"
# Step 5: Report
echo ""
echo "=== Generation Complete ==="
ls -lh "schemas/20251121/rdf/${BASE_NAME}".* | awk '{print $9, "("$5")"}'
ls -lh "schemas/20251121/uml/mermaid/${BASE_NAME}.mmd" | awk '{print $9, "("$5")"}'
echo ""
echo "Next: Update documentation and commit"
```
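
The workflow depends on tools from two packages: LinkML (`gen-owl`, `gen-yuml`, `linkml-validate`) and rdflib (`rdfpipe`). A missing tool would otherwise surface only midway through the run, leaving a partial set of timestamped files. The following is a sketch of a fail-fast check; `require_tools` is a hypothetical helper that could be called before Step 1.

```bash
# Hypothetical helper: fail if any named command is not available on PATH.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "required tool not found: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In the workflow script, before Step 1:
# require_tools gen-owl gen-yuml rdfpipe linkml-validate || exit 1
```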
---
## Quick Reference Commands
### Generate All Artifacts
```bash
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
gen-owl -f ttl schema.yaml > schema_${TIMESTAMP}.owl.ttl
gen-yuml schema.yaml > schema_${TIMESTAMP}.mmd
```
### Validate
```bash
gen-owl -f ttl schema.yaml > /tmp/test.ttl  # Fails with a non-zero exit status on schema errors
linkml-validate -s schema.yaml instance.yaml
```
### Convert RDF Formats
```bash
rdfpipe -i turtle -o nt file.ttl > file.nt
rdfpipe -i turtle -o json-ld file.ttl > file.jsonld
rdfpipe -i turtle -o xml file.ttl > file.rdf
```
### Check RDF Content
```bash
grep -c "ClassName" file.owl.ttl  # Count lines that mention the class
wc -l file.nt  # Approximate triple count (one triple per line)
```
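
A text search for a class name in Turtle also matches comments, labels, and property ranges. For a count of class declarations, the N-Triples file is easier to search because every line is one complete triple with full IRIs. The sketch below assumes single spaces between terms, which is how rdflib's N-Triples serializer (used by `rdfpipe`) writes them. `count_owl_classes` is a hypothetical helper. It skips blank-node subjects (`_:`), which are anonymous class expressions rather than named classes.

```bash
# Hypothetical helper: count named owl:Class declarations in an N-Triples file.
# Assumes single spaces between terms; "_:" subjects are not counted.
count_owl_classes() {
  grep -F '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .' "$1" \
    | grep -c '^<' || true
}

# Usage: count_owl_classes file.nt
```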
---
**Status**: ✅ ACTIVE RULES

**Version**: 1.0

**Last Updated**: 2025-11-22

**Applies To**: All LinkML schema work in this project

**See Also**:
- `.opencode/HYPER_MODULAR_STRUCTURE.md` - Module organization
- `.opencode/SLOT_NAMING_CONVENTIONS.md` - Slot naming patterns
- `AGENTS.md` - AI agent instructions