glam/SESSION_SUMMARY_20251121_SCHEMA_AUTHORITY_COMPLETE.md
2025-11-21 22:12:33 +01:00

# Session Summary: Schema Authority Documentation Complete
**Session Date**: 2025-11-21
**Session Type**: Documentation update (schema authority)
**Status**: ✅ COMPLETE
---
## Session Overview
This session completed the final documentation task from the ISO 20275 migration by establishing clear schema authority guidelines for all AI agents working with the Heritage Custodian Ontology.
### Problem Addressed
**Issue**: Without explicit documentation, agents might:
- Edit RDF files directly (which are auto-generated)
- Treat TypeDB schemas as authoritative (they're derived)
- Modify UML diagrams without updating source schemas
- Use outdated schema references from the old `schemas/` directory
**Solution**: Added "Schema Source of Truth" sections to agent documentation establishing LinkML schemas as the single authoritative source.
---
## Changes Made
### 1. Updated `AGENTS.md` ✅
**Location**: Root directory (`/Users/kempersc/apps/glam/AGENTS.md`)
**Changes**:
- Added **Rule 0: LinkML Schemas Are the Single Source of Truth** (line 24)
- Positioned before existing Rule 1 (ontology consultation)
- Clearly establishes schema hierarchy: LinkML → RDF/TypeDB/UML
- Documents regeneration workflow for schema changes
- Lists forbidden practices (editing RDF directly, etc.)
**Key Content**:
```markdown
### Rule 0: LinkML Schemas Are the Single Source of Truth
**MASTER SCHEMA LOCATION**: `schemas/20251121/linkml/`
The LinkML schema files are the **authoritative, canonical definition**
of the Heritage Custodian Ontology...
```
### 2. Updated `.opencode/agent/README.md` ✅
**Location**: Agent directory (`/Users/kempersc/apps/glam/.opencode/agent/README.md`)
**Changes**:
- Added "🚨 Schema Source of Truth" section at top (lines 5-63)
- Updated schema references from old `schemas/` to `schemas/20251121/linkml/`
- Updated schema version from v0.2.0 to v0.2.1 (ISO 20275 migration)
- Updated agent schema mappings to new class names:
  - `HeritageCustodian` → `CustodianAspect`
  - `Location` → `PlaceAspect`
  - `ChangeEvent` → `TemporalEvent`
- Updated validation commands with correct paths
- Added schema features list (ISO 20275 codes, multi-aspect modeling, etc.)
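The class renames listed above can be captured in a small lookup table, which may help when older agent output still uses the v0.2.0 names. This is a minimal sketch: the names `CLASS_RENAMES` and `current_class_name` are hypothetical and do not exist in the repository; only the three old/new class name pairs come from this summary.

```python
# Old (v0.2.0) class name -> new (v0.2.1) class name, as listed in this summary.
CLASS_RENAMES = {
    "HeritageCustodian": "CustodianAspect",
    "Location": "PlaceAspect",
    "ChangeEvent": "TemporalEvent",
}


def current_class_name(name: str) -> str:
    """Return the v0.2.1 name for a class, leaving unknown names unchanged."""
    return CLASS_RENAMES.get(name, name)


for old in ("HeritageCustodian", "Location", "ChangeEvent", "OrganizationName"):
    print(old, "->", current_class_name(old))
```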
**Key Sections Added**:
1. **Schema Authority Declaration**:
```markdown
## 🚨 Schema Source of Truth
**MASTER SCHEMA LOCATION**: `schemas/20251121/linkml/`
```
2. **File Hierarchy**:
- Primary: `01_name_entity.yaml`, `02_organization_observation_reconstruction.yaml`
- Derived: RDF (8 formats), TypeDB (TQL), UML (Mermaid), Examples (YAML)
3. **Workflow for Changes**:
```
1. EDIT LinkML schema
2. REGENERATE RDF formats (gen-owl + rdfpipe)
3. UPDATE TypeDB schema (manual translation)
4. UPDATE UML/Mermaid diagrams
5. VALIDATE example instances (linkml-validate)
```
4. **Why LinkML is Master**:
- ✅ Formal specification (type-safe, validation, cardinality)
- ✅ Multi-format generation (RDF, JSON-LD, Python, SQL, GraphQL)
- ✅ Version control (clear diffs, semantic versioning)
- ✅ Ontology alignment (explicit class_uri mappings)
- ✅ Documentation (rich inline docs)
5. **Agent Rules**:
- **NEVER**: Edit RDF directly, treat TypeDB as authoritative, modify UML diagrams without updating the LinkML source
- **ALWAYS**: Refer to LinkML, update LinkML first, validate changes, document in YAML comments
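The five-step workflow above can be sketched as a dry-run planner. This is an illustration under stated assumptions, not the project's actual tooling: the function name `plan_regeneration`, the `.nt` output name, and the example-instance file name are hypothetical. It only builds the `gen-owl`, `rdfpipe`, and `linkml-validate` command lines for steps 2 and 5 and lists steps 3 and 4 as manual reminders; nothing is executed.

```python
from pathlib import PurePosixPath

SCHEMA_ROOT = PurePosixPath("schemas/20251121")  # directory named in this summary


def plan_regeneration(schema_stem: str, example: str) -> list[dict]:
    """Return the ordered steps to follow after editing a LinkML schema.

    Each step is a dict describing either a shell command (argv plus an
    optional stdout target) or a manual task. Nothing is run here.
    """
    linkml = SCHEMA_ROOT / "linkml" / f"{schema_stem}.yaml"
    owl_ttl = SCHEMA_ROOT / "rdf" / f"{schema_stem}.owl.ttl"
    # Assumed output name for one converted serialization (N-Triples).
    nt = SCHEMA_ROOT / "rdf" / f"{schema_stem}.nt"
    return [
        # Step 2a: gen-owl writes OWL to stdout, so the output is redirected.
        {"argv": ["gen-owl", str(linkml)], "stdout": str(owl_ttl)},
        # Step 2b: rdfpipe (shipped with rdflib) converts Turtle to another format.
        {"argv": ["rdfpipe", "-i", "turtle", "-o", "nt", str(owl_ttl)],
         "stdout": str(nt)},
        # Steps 3 and 4 are manual translations, so they are only reminders.
        {"manual": "Update the TypeDB schema to match the LinkML change"},
        {"manual": "Update the UML/Mermaid diagrams"},
        # Step 5: validate an example instance against the edited schema.
        {"argv": ["linkml-validate", "-s", str(linkml), example]},
    ]


steps = plan_regeneration(
    "02_organization_observation_reconstruction",
    "schemas/20251121/examples/example_instance.yaml",  # hypothetical file name
)
for step in steps:
    print(step.get("manual") or " ".join(step["argv"]))
```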
---
## Documentation Hierarchy
### Schema Authority Chain
```
LinkML YAML (authoritative)
├─ schemas/20251121/linkml/01_name_entity.yaml
└─ schemas/20251121/linkml/02_organization_observation_reconstruction.yaml
DERIVED formats (generated or manually translated; do not edit directly)
├─ RDF/OWL (8 formats: TTL, NT, JSON-LD, RDF/XML, N3, TriG, TriX, TRIX)
├─ TypeDB (TQL schema, manual translation)
├─ UML/Mermaid (diagrams, manual visualization)
└─ Examples (YAML instances conforming to schema)
```
### Agent Instruction Documents
**Primary**:
1. **`AGENTS.md`** - Root-level agent instructions (Rule 0 added)
   - Audience: All AI agents working on the project
   - Scope: General extraction guidelines, ontology consultation, schema authority
2. **`.opencode/agent/README.md`** - Agent-specific documentation (updated)
   - Audience: OpenCode NLP extraction subagents
   - Scope: Agent invocation, schema reference, output formats
**Supporting**:
- `.opencode/agent/institution-extractor.md` - Institution extraction agent
- `.opencode/agent/location-extractor.md` - Location extraction agent
- `.opencode/agent/identifier-extractor.md` - Identifier extraction agent
- `.opencode/agent/event-extractor.md` - Event extraction agent
- `.opencode/agent/ontology-mapping-rules.md` - Ontology consultation workflow
---
## Impact on Future Work
### What This Enables
1. **Clear Hierarchy**: Agents know exactly which files are authoritative vs. derived
2. **Correct Workflow**: Schema changes follow proper edit → regenerate → validate flow
3. **Prevents Errors**: Explicit warnings against editing auto-generated files
4. **Consistent References**: All schema paths updated to `schemas/20251121/linkml/`
5. **Version Awareness**: Documentation reflects ISO 20275 migration (v0.2.1)
### What Agents Should Do Now
When working with schemas, agents must:
**DO**:
- Read LinkML YAML files for class definitions
- Update LinkML first, then regenerate RDF
- Validate changes with `linkml-validate`
- Document schema changes in YAML comments
- Reference `schemas/20251121/linkml/` paths
**DON'T**:
- Edit RDF files in `schemas/20251121/rdf/` directly
- Treat TypeDB `.tql` files as authoritative
- Modify UML diagrams without updating LinkML source
- Use old schema paths from the `schemas/` directory
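One way to make these DO/DON'T rules mechanical is a path check that an agent could run before writing a file. The sketch below is an illustration, not part of the repository: the function name `classify_schema_path`, the category labels, and the sample `.tql` and legacy file names are hypothetical. Only the `linkml/`, `rdf/`, and `uml/` directories and the `.tql` extension come from this summary.

```python
from pathlib import PurePosixPath

# Directories named in this summary.
LINKML_DIR = PurePosixPath("schemas/20251121/linkml")
DERIVED_DIRS = (
    PurePosixPath("schemas/20251121/rdf"),
    PurePosixPath("schemas/20251121/uml"),
)


def classify_schema_path(path: str) -> str:
    """Label a repository-relative path by its role in the schema hierarchy."""
    p = PurePosixPath(path)
    if LINKML_DIR in p.parents and p.suffix == ".yaml":
        return "authoritative"  # edit here first
    if p.suffix == ".tql" or any(d in p.parents for d in DERIVED_DIRS):
        return "derived"        # regenerate or re-translate from LinkML
    if p.parts[:1] == ("schemas",) and "20251121" not in p.parts:
        return "legacy"         # old schemas/ path, should not be referenced
    return "other"


for candidate in (
    "schemas/20251121/linkml/01_name_entity.yaml",
    "schemas/20251121/rdf/02_organization_observation_reconstruction.owl.ttl",
    "schemas/20251121/typedb/schema.tql",  # hypothetical location
    "schemas/old_schema.yaml",             # hypothetical legacy file
):
    print(candidate, "->", classify_schema_path(candidate))
```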
---
## Files Modified This Session
### Documentation Files
1. **`AGENTS.md`** (lines 24-82)
   - Added Rule 0: LinkML Schemas Are the Single Source of Truth
   - 58 lines of schema authority documentation
2. **`.opencode/agent/README.md`** (lines 5-81, plus scattered updates)
   - Added "🚨 Schema Source of Truth" section (58 lines)
   - Updated schema version v0.2.0 → v0.2.1
   - Updated schema references to `schemas/20251121/linkml/`
   - Updated agent class mappings to new names
   - Updated validation commands
   - Updated footer metadata
3. **`SESSION_SUMMARY_20251121_SCHEMA_AUTHORITY_COMPLETE.md`** (this file)
   - Complete session documentation
### Total Lines Changed
- `AGENTS.md`: +58 lines
- `.opencode/agent/README.md`: +58 lines + ~20 updates
- Documentation: +300 lines
---
## Validation Checklist
### Pre-Session Status
- [x] ISO 20275 migration complete (6 tasks)
- [x] Country guides complete (5 countries)
- [x] RDF regeneration complete (8 formats, 1,427 triples)
- [x] TypeDB schema updated
- [x] Mermaid diagrams fixed
- [ ] Schema authority documented ← **THIS SESSION**
### Post-Session Status
- [x] ISO 20275 migration complete
- [x] Country guides complete
- [x] RDF regeneration complete
- [x] TypeDB schema updated
- [x] Mermaid diagrams fixed
- [x] **Schema authority documented** ← **COMPLETE**
---
## Next Steps for Future Agents
### Immediate Tasks (If Continuing)
1. **Test Migration Script with Real Data**
   - Script: `scripts/migrate_to_iso20275.py`
   - Example: Rijksmuseum (Dutch KvK foundation)
   - Validate: Legal form code → ISO 20275 mapping
2. **Create Instance Examples**
   - Location: `schemas/20251121/examples/`
   - Include: ISO 20275 legal form codes
   - Cover: Multiple aspects (place, custodian, legal, collections, people)
3. **Expand Country Guides**
   - Current: Netherlands, France, Germany, Belgium, Italy
   - Next: Spain, Portugal, UK, Austria, Switzerland
   - Format: Markdown with ISO 20275 code tables
### Optional Enhancements
4. **Validate RDF in Protégé**
   - Load: `schemas/20251121/rdf/02_organization_observation_reconstruction.owl.ttl`
   - Run: HermiT reasoner to check consistency
   - Document: Any reasoning issues or missing constraints
5. **Add LinkML Validation Tests**
   - Test: Schema compliance with LinkML metamodel
   - Test: Example instances validate against schema
   - Test: Generated RDF validates with ontology reasoners
6. **Create Visual Decision Trees**
   - Guide: When to use each aspect (place vs. custodian vs. legal)
   - Guide: ISO 20275 code selection flowchart
   - Format: Mermaid diagrams
---
## Technical Context
### Schema Version Information
**Current Version**: v0.2.1
**Migration**: ISO 20275 legal form codes
**Date**: 2025-11-21
**Key Changes from v0.2.0**:
- Removed `LegalFormEnum` (replaced with ISO 20275 pattern)
- Added `legal_form_code` pattern: `^[A-Z0-9]{4}$`
- Added `OrganizationName` class (emic name standardization)
- Reorganized schemas into `schemas/20251121/` directory
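The `legal_form_code` pattern can also be checked outside LinkML with a plain regular expression. This is a minimal sketch, assuming the pattern must match the whole value; the helper name `is_valid_legal_form_code` is hypothetical, and the sample strings are made up to exercise the pattern rather than taken from the ISO 20275 code list.

```python
import re

# Pattern copied from the v0.2.1 schema change described above.
LEGAL_FORM_CODE = re.compile(r"^[A-Z0-9]{4}$")


def is_valid_legal_form_code(value: str) -> bool:
    """Return True if value is exactly four uppercase letters or digits."""
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return LEGAL_FORM_CODE.fullmatch(value) is not None


for sample in ("AB12", "ab12", "ABC", "ABCDE", "AB-1"):
    print(repr(sample), is_valid_legal_form_code(sample))
```

The check only confirms the shape of a code; whether a well-formed code actually exists in the ISO 20275 list is a separate lookup that this sketch does not attempt.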
### RDF Generation Statistics
**Organization Schema** (`02_organization_observation_reconstruction`):
- Triples: 1,427 (up from 1,343)
- Formats: 8 (TTL, NT, JSON-LD, RDF/XML, N3, TriG, TriX, TRIX)
- Reasoning: OWL 2 DL (description logic)
**Name Entity Schema** (`01_name_entity`):
- Triples: 463 (unchanged)
- Formats: 8 (same as organization schema)
**Total Dataset**:
- Triples: 1,890
- Classes: 25+
- Properties: 100+
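The triple totals follow directly from the per-schema counts reported above. The short check below only restates that arithmetic; the variable names are illustrative.

```python
organization_triples = 1427           # organization schema after the migration
organization_triples_before = 1343    # organization schema before the migration
name_entity_triples = 463             # name entity schema, unchanged

total = organization_triples + name_entity_triples
added = organization_triples - organization_triples_before

print(total)  # 1890
print(added)  # 84
```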
---
## References
### Documentation Updated This Session
- `AGENTS.md` - Rule 0 added
- `.opencode/agent/README.md` - Schema authority section added
### Related Documentation
- `schemas/20251121/RDF_GENERATION_SUMMARY.md` - RDF generation process
- `schemas/20251121/uml/MERMAID_UPDATE_SUMMARY.md` - Diagram fixes
- `SESSION_SUMMARY_20251121_ISO20275_COMPLETE.md` - Migration completion
- `MIGRATION_CHECKLIST_ISO20275.md` - Task checklist
- `docs/MIGRATION_GUIDE.md` - Schema migration procedures
### Schema Files
- `schemas/20251121/linkml/01_name_entity.yaml` - Name Entity Hub Pattern
- `schemas/20251121/linkml/02_organization_observation_reconstruction.yaml` - Organization Pattern
### External Resources
- LinkML documentation: https://linkml.io/
- ISO 20275 standard: https://www.iso.org/standard/67445.html
- W3C OWL 2 specification: https://www.w3.org/TR/owl2-overview/
---
## Session Statistics
**Duration**: ~30 minutes
**Files Modified**: 3
**Lines Added**: ~136
**Lines Updated**: ~20
**Total Documentation**: ~400 lines
**Tasks Completed**: 1 (schema authority documentation)
**Tasks Remaining**: 0 (all ISO 20275 migration tasks complete)
---
## Conclusion
This session successfully completed the final documentation task from the ISO 20275 migration by:
1. ✅ Establishing LinkML schemas as single source of truth
2. ✅ Documenting schema hierarchy (authoritative vs. derived)
3. ✅ Providing clear workflow for schema changes
4. ✅ Updating all agent documentation with correct schema paths
5. ✅ Preventing future errors from editing auto-generated files
**The Heritage Custodian Ontology schema authority is now clearly documented and ready for future agent work.**
All agents working on this project should refer to:
- `AGENTS.md` (Rule 0) - For schema authority principles
- `.opencode/agent/README.md` - For agent-specific schema guidance
- `schemas/20251121/linkml/` - For authoritative schema definitions
---
**Status**: ✅ COMPLETE
**Next Session**: Optional enhancements (instance examples, Protégé validation, country guides)
**Handoff**: Ready for production use with clear schema authority documentation