# Hyper-Modular Schema Structure
**Version**: 0.1.0
**Date**: 2025-11-21
**Schema**: Heritage Custodian Observation and Reconstruction Pattern
---
## Overview
The Heritage Custodian schema uses a **hyper-modular architecture** where every class, enum, and slot is defined in its own individual file. This provides maximum granularity for version control, parallel development, and maintainability.
**Total Files**: 78 YAML files
- **Classes**: 12 files (`modules/classes/`)
- **Enums**: 5 files (`modules/enums/`)
- **Slots**: 59 files (`modules/slots/`)
- **Core Modules**: 1 file (`metadata.yaml`)
- **Main Schema**: 1 file (`01_custodian_name_modular.yaml`)
**Note**: Aggregator files (`enums_all.yaml`, `slots_all.yaml`, `classes_all.yaml`) still exist but are not used by the main schema. They remain available for backward compatibility.
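The file counts above can drift as components are added. The following is a minimal sketch, assuming the `modules/` layout described in this document, that counts the YAML files per component directory and reports any difference from the documented totals. The function names `count_modules` and `count_mismatches` are hypothetical helpers, not part of the schema tooling.

```python
from pathlib import Path

# Expected counts as stated in this document (classes, enums, slots).
EXPECTED = {"classes": 12, "enums": 5, "slots": 59}


def count_modules(modules_dir):
    """Count *.yaml files in each component subdirectory of modules/."""
    root = Path(modules_dir)
    return {kind: len(list((root / kind).glob("*.yaml"))) for kind in EXPECTED}


def count_mismatches(modules_dir):
    """Return {kind: (expected, found)} for every kind whose count differs."""
    found = count_modules(modules_dir)
    return {k: (EXPECTED[k], found[k]) for k in EXPECTED if found[k] != EXPECTED[k]}
```

Calling `count_mismatches("modules")` from the schema directory should return an empty dict when the tree matches the totals listed above.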
---
## Directory Structure
```
schemas/20251121/linkml/
├── 01_custodian_name_modular.yaml   # Main schema (directly imports all 76 modules)
├── HYPER_MODULAR_STRUCTURE.md       # This file
├── SLOT_NAMING_CONVENTIONS.md       # Slot naming rules for range variants
└── modules/
    ├── metadata.yaml                # Schema metadata & namespace prefixes
    ├── enums/                       # 5 enum definitions (all imported directly)
    │   ├── AgentTypeEnum.yaml
    │   ├── AppellationTypeEnum.yaml
    │   ├── LegalStatusEnum.yaml
    │   ├── ReconstructionActivityTypeEnum.yaml
    │   └── SourceDocumentTypeEnum.yaml
    ├── slots/                       # 59 slot definitions (all imported directly)
    │   ├── id.yaml
    │   ├── created.yaml
    │   ├── modified.yaml
    │   ├── observed_name.yaml
    │   ├── was_revision_of.yaml     # May contain multiple range variants
    │   └── ... (54 more slot files) # (see SLOT_NAMING_CONVENTIONS.md)
    ├── classes/                     # 12 class definitions (all imported directly)
    │   ├── Custodian.yaml
    │   ├── CustodianObservation.yaml
    │   ├── CustodianName.yaml
    │   ├── CustodianReconstruction.yaml
    │   ├── ReconstructionActivity.yaml
    │   ├── Agent.yaml
    │   ├── Identifier.yaml
    │   ├── Appellation.yaml
    │   ├── SourceDocument.yaml
    │   ├── ConfidenceMeasure.yaml
    │   ├── LanguageCode.yaml
    │   └── TimeSpan.yaml
    └── [Legacy aggregators - not used by main schema]
        ├── enums_all.yaml           # Aggregator for backward compatibility
        ├── slots_all.yaml           # Aggregator for backward compatibility
        └── classes_all.yaml         # Aggregator for backward compatibility
```
---
## Namespace Structure
All components use the `https://nde.nl/ontology/hc/` base namespace:
| Component | Namespace Pattern | Example |
|-----------|-------------------|---------|
| **Base** | `https://nde.nl/ontology/hc/` | Schema root |
| **Classes** | `https://nde.nl/ontology/hc/class/{ClassName}` | `https://nde.nl/ontology/hc/class/Custodian` |
| **Enums** | `https://nde.nl/ontology/hc/enum/{EnumName}` | `https://nde.nl/ontology/hc/enum/LegalStatusEnum` |
| **Slots** | `https://nde.nl/ontology/hc/slot/{slot_name}` | `https://nde.nl/ontology/hc/slot/was_revision_of` |
| **Metadata** | `https://nde.nl/ontology/hc/metadata` | Metadata module |
**Prefixes** (defined in `modules/metadata.yaml`):
```yaml
prefixes:
  hc: https://nde.nl/ontology/hc/
  hc_class: https://nde.nl/ontology/hc/class/
  hc_enum: https://nde.nl/ontology/hc/enum/
  hc_slot: https://nde.nl/ontology/hc/slot/
```
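To illustrate how these prefixes map to the namespace patterns in the table, here is a small sketch that expands a compact `prefix:local` identifier into a full URI. The prefix map is copied from the block above; the `expand_curie` helper is a hypothetical illustration and not how LinkML itself resolves prefixes.

```python
# Prefix map copied from modules/metadata.yaml as shown in this document.
PREFIXES = {
    "hc": "https://nde.nl/ontology/hc/",
    "hc_class": "https://nde.nl/ontology/hc/class/",
    "hc_enum": "https://nde.nl/ontology/hc/enum/",
    "hc_slot": "https://nde.nl/ontology/hc/slot/",
}


def expand_curie(curie):
    """Expand 'prefix:local' into a full URI using PREFIXES (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local
```

For example, `expand_curie("hc_class:Custodian")` yields the class URI shown in the table, `https://nde.nl/ontology/hc/class/Custodian`.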
---
## Import Strategy
### Direct Import Pattern
The main schema **directly imports all 76 individual component files** for maximum transparency and granularity:
```yaml
# 01_custodian_name_modular.yaml
imports:
- linkml:types
- modules/metadata
# Enums (5 files)
- modules/enums/AgentTypeEnum
- modules/enums/AppellationTypeEnum
- modules/enums/LegalStatusEnum
- modules/enums/ReconstructionActivityTypeEnum
- modules/enums/SourceDocumentTypeEnum
# Slots (59 files)
- modules/slots/activity_type
- modules/slots/affiliation
# ... (57 more slot imports)
# Classes (12 files)
- modules/classes/Agent
- modules/classes/Appellation
# ... (10 more class imports)
```
**Benefits of Direct Imports**:
- **Complete Transparency**: Immediately see all schema dependencies
- **Explicit Dependencies**: No hidden imports through aggregators
- **Selective Imports**: Easy to comment out individual components for custom schemas
- **Better IDE Support**: Direct file references for navigation
- **Clear Audit Trail**: Git diffs show exactly which components changed
**Note**: Aggregator modules (`enums_all.yaml`, `slots_all.yaml`, `classes_all.yaml`) still exist for backward compatibility and can be used by downstream projects that prefer a simpler import structure.
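Because the main schema lists every component explicitly, the import list can be derived from the files on disk. The sketch below assumes the directory layout described earlier and builds the entries in the same order as the example (enums, slots, classes). `build_imports` is a hypothetical helper for illustration, not an existing tool.

```python
from pathlib import Path


def build_imports(schema_dir):
    """List the import entries the main schema would need, derived from disk."""
    modules = Path(schema_dir) / "modules"
    imports = ["linkml:types", "modules/metadata"]
    for kind in ("enums", "slots", "classes"):  # same order as the main schema
        for path in sorted((modules / kind).glob("*.yaml")):
            # LinkML import paths omit the .yaml extension.
            imports.append(f"modules/{kind}/{path.stem}")
    return imports
```

Printing each entry prefixed with `- ` gives a list that can be compared against the `imports:` section of `01_custodian_name_modular.yaml`.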
---
## File Naming Conventions
### Class Files
**Pattern**: `{ClassName}.yaml` (PascalCase)
Examples:
- `Custodian.yaml`
- `CustodianObservation.yaml`
- `CustodianReconstruction.yaml`
**File structure**:
```yaml
id: https://nde.nl/ontology/hc/class/ClassName
name: ClassName
title: ClassName Class
imports:
  - linkml:types
  - OtherClass  # If needed
classes:
  ClassName:
    class_uri: ontology:Class
    description: "..."
    slots:
      - slot1
      - slot2
```
---
### Enum Files
**Pattern**: `{EnumName}.yaml` (PascalCase with "Enum" suffix)
Examples:
- `LegalStatusEnum.yaml`
- `AgentTypeEnum.yaml`
**File structure**:
```yaml
id: https://nde.nl/ontology/hc/enum/EnumName
name: EnumName
enums:
  EnumName:
    description: "..."
    permissible_values:
      VALUE1:
        description: "..."
      VALUE2:
        description: "..."
```
---
### Slot Files
**Pattern**: `{slot_name}.yaml` (snake_case)
Examples:
- `legal_name.yaml`
- `was_revision_of.yaml`
- `observed_name.yaml`
**Special Case - Range Variants**: See `SLOT_NAMING_CONVENTIONS.md` for handling multiple slots with the same ontological property but different ranges.
**File structure**:
```yaml
id: https://nde.nl/ontology/hc/slot/slot_name
name: slot-name-slot
imports:
  - ../classes/RangeClass  # If range is a class
slots:
  slot_name:
    slot_uri: ontology:property
    range: RangeType
    description: "..."
  # Optional: Range variants (same slot_uri, different range)
  slot_name-variant:
    slot_uri: ontology:property  # SAME as base
    range: DifferentRangeType
    description: "..."
```
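The three file naming patterns can be checked mechanically. The following sketch encodes them as regular expressions; the exact expressions are an assumption about what counts as PascalCase and snake_case here, and `follows_convention` is a hypothetical helper.

```python
import re

# One pattern per component directory, following the conventions in this section.
PATTERNS = {
    "classes": re.compile(r"[A-Z][A-Za-z0-9]*\.yaml"),          # PascalCase
    "enums": re.compile(r"[A-Z][A-Za-z0-9]*Enum\.yaml"),        # PascalCase + "Enum"
    "slots": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*\.yaml"),  # snake_case
}


def follows_convention(kind, filename):
    """Return True if filename matches the naming pattern for its kind."""
    return PATTERNS[kind].fullmatch(filename) is not None
```

For instance, `follows_convention("slots", "was_revision_of.yaml")` is true, while `follows_convention("enums", "LegalStatus.yaml")` is false because the "Enum" suffix is missing.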
---
## Maintenance Guidelines
### Adding a New Class
1. Create file: `modules/classes/{ClassName}.yaml`
2. Define class with namespace: `https://nde.nl/ontology/hc/class/{ClassName}`
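3. Add an import for `modules/classes/{ClassName}` to the main schema `01_custodian_name_modular.yaml` (and to the legacy aggregator `modules/classes_all.yaml` if it is kept in sync)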
4. Test schema generation: `gen-owl 01_custodian_name_modular.yaml`
### Adding a New Enum
1. Create file: `modules/enums/{EnumName}.yaml`
2. Define enum with namespace: `https://nde.nl/ontology/hc/enum/{EnumName}`
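3. Add an import for `modules/enums/{EnumName}` to the main schema `01_custodian_name_modular.yaml` (and to the legacy aggregator `modules/enums_all.yaml` if it is kept in sync)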
4. Test schema generation
### Adding a New Slot
1. **Check if ontologically related slot exists**: Look for existing slots with same `slot_uri`
2. **If EXISTS**: Add range variant to existing file (see `SLOT_NAMING_CONVENTIONS.md`)
3. **If NEW**: Create file `modules/slots/{slot_name}.yaml`
4. Define slot with namespace: `https://nde.nl/ontology/hc/slot/{slot_name}`
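5. Only if you created a new file: add an import for `modules/slots/{slot_name}` to the main schema `01_custodian_name_modular.yaml` (and to the legacy aggregator `modules/slots_all.yaml` if it is kept in sync)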
6. Test schema generation
### Adding a Range Variant to Existing Slot
**Example**: Adding `was_revision_of` for `Record` class
1. Open existing file: `modules/slots/was_revision_of.yaml`
2. Add import for new range class:
```yaml
imports:
- ../classes/CustodianReconstruction # Existing
- ../classes/Record # NEW
```
3. Add new slot variant:
```yaml
slots:
  was_revision_of:
    slot_uri: prov:wasRevisionOf
    range: CustodianReconstruction
    description: "..."
  was_revision_of-record:  # NEW
    slot_uri: prov:wasRevisionOf
    range: Record
    description: "..."
```
4. **No change to any import list needed** (the file `modules/slots/was_revision_of` is already imported)
5. Test schema generation
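A new component file that is never imported is silently ignored by the generators, so it can help to compare the files on disk with the main schema's import list. The sketch below does this with a simple line scan rather than a YAML parser; it assumes imports are written one per line as `- modules/...`, as in the example earlier in this document. Both function names are hypothetical.

```python
from pathlib import Path


def imported_modules(schema_text):
    """Collect 'modules/...' entries from list items in the schema text."""
    found = set()
    for line in schema_text.splitlines():
        item = line.split("#", 1)[0].strip()  # drop trailing comments
        if item.startswith("- modules/"):
            found.add(item[2:].strip())
    return found


def unimported_modules(schema_dir, main_schema="01_custodian_name_modular.yaml"):
    """Return module files on disk that the main schema does not import."""
    root = Path(schema_dir)
    imported = imported_modules((root / main_schema).read_text(encoding="utf-8"))
    on_disk = {
        f"modules/{kind}/{p.stem}"
        for kind in ("classes", "enums", "slots")
        for p in (root / "modules" / kind).glob("*.yaml")
    }
    return sorted(on_disk - imported)
```

An empty result from `unimported_modules(".")` in the schema directory suggests every component file is referenced by the main schema.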
---
## Validation and Testing
### Validate Schema Structure
```bash
# Run from the repository root
cd schemas/20251121/linkml
# Test OWL generation (validates imports and structure)
gen-owl 01_custodian_name_modular.yaml > /dev/null
# Test JSON Schema generation
gen-json-schema 01_custodian_name_modular.yaml > /dev/null
# Test Python dataclasses generation
gen-python 01_custodian_name_modular.yaml > /dev/null
```
### Check Namespace Consistency
```bash
# All class files should have hc/class/ namespace
grep -h "^id:" modules/classes/*.yaml | sort -u
# All enum files should have hc/enum/ namespace
grep -h "^id:" modules/enums/*.yaml | sort -u
# All slot files should have hc/slot/ namespace
grep -h "^id:" modules/slots/*.yaml | sort -u
```
Expected output:
```
# Classes
id: https://nde.nl/ontology/hc/class/Agent
id: https://nde.nl/ontology/hc/class/Appellation
...
# Enums
id: https://nde.nl/ontology/hc/enum/AgentTypeEnum
id: https://nde.nl/ontology/hc/enum/AppellationTypeEnum
...
# Slots
id: https://nde.nl/ontology/hc/slot/activity_type
id: https://nde.nl/ontology/hc/slot/affiliation
...
```
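The same check can be automated instead of read by eye. This sketch assumes each file's `id:` is the component namespace followed by the file name without extension, as in the file structure templates above, and that the `id:` value is unquoted. `read_id` and `namespace_errors` are hypothetical helpers.

```python
from pathlib import Path

BASE = "https://nde.nl/ontology/hc/"
# Directory name -> namespace segment, per the Namespace Structure table.
KINDS = {"classes": "class", "enums": "enum", "slots": "slot"}


def read_id(path):
    """Return the value of the first top-level 'id:' line, or None."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("id:"):
            return line[3:].strip()
    return None


def namespace_errors(modules_dir):
    """Return (file, found_id, expected_id) for every file whose id differs."""
    errors = []
    for folder, segment in KINDS.items():
        for path in sorted((Path(modules_dir) / folder).glob("*.yaml")):
            expected = f"{BASE}{segment}/{path.stem}"
            found = read_id(path)
            if found != expected:
                errors.append((path.name, found, expected))
    return errors
```

Running `namespace_errors("modules")` returns an empty list when every file uses the namespace for its directory.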
---
## Benefits of Hyper-Modular Structure
### 1. Granular Version Control
Each component has independent git history:
```bash
git log modules/classes/Custodian.yaml
git blame modules/slots/legal_form.yaml
```
### 2. Parallel Development
Multiple developers can work simultaneously without merge conflicts:
- Developer A edits `CustodianObservation.yaml`
- Developer B edits `ReconstructionActivity.yaml`
- No conflicts, both changes merge cleanly
### 3. Selective Imports
You can create specialized schemas that import only the components they need:
```yaml
# Mini schema for observations only
imports:
- modules/metadata
- modules/classes/CustodianObservation
- modules/slots/observed_name
- modules/slots/source
```
### 4. Clear Ownership
One file = one concept = one maintainer:
- `Custodian.yaml` → CIDOC-CRM expert
- `LegalStatusEnum.yaml` → GLEIF ontology expert
- `was_revision_of.yaml` → PROV-O expert
### 5. Easier Code Review
Small, focused pull requests:
- ❌ "Update schema with 5 new classes" (monolithic, 500 lines)
- ✅ "Add TimeSpan class" (one file, 96 lines)
### 6. Better Documentation
Each file can have extensive inline documentation without cluttering others:
```yaml
# CustodianReconstruction.yaml can have 200 lines of comments
# without making Identifier.yaml harder to read
```
### 7. IDE-Friendly
File tree navigation:
```
modules/classes/
├── Agent.yaml ← Easy to find
├── Custodian.yaml ← Alphabetically sorted
└── CustodianObservation.yaml
```
vs. monolithic:
```
heritage_custodian.yaml:3458 ← Where is CustodianObservation?
```
---
## Migration from Consolidated Structure
### Phase 1: Module Consolidation (Completed)
- ✅ Split monolithic schema into 9 modules
- ✅ Classes grouped by function (base, observation, reconstruction, etc.)
### Phase 2: Hyper-Modularization (Completed 2025-11-21)
- ✅ Split all 12 classes into individual files
- ✅ Split all 5 enums into individual files
- ✅ Split all 59 slots into individual files
- ✅ Created 3 aggregator modules
- ✅ Updated all namespace URIs to `nde.nl/ontology/hc/`
- ✅ Validated OWL generation
### Legacy Files (Can Be Deleted)
These consolidated module files are now obsolete:
- `modules/base_classes.yaml` → Replaced by `classes/Custodian.yaml`
- `modules/observation_classes.yaml` → Replaced by `classes/CustodianObservation.yaml`, `classes/CustodianName.yaml`
- `modules/reconstruction_classes.yaml` → Replaced by `classes/CustodianReconstruction.yaml`
- `modules/provenance_classes.yaml` → Replaced by `classes/ReconstructionActivity.yaml`, `classes/Agent.yaml`
- `modules/supporting_classes.yaml` → Replaced by 6 individual class files
- `modules/enums.yaml` → Replaced by `enums_all.yaml` + 5 individual files
- `modules/slots.yaml` → Replaced by `slots_all.yaml` + 59 individual files
---
## Troubleshooting
### Error: "Cannot find module X"
**Cause**: Import path incorrect or file missing
**Solution**:
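1. Check that the main schema (or the aggregator, if you use one) imports the correct file name
2. Verify file exists: `ls modules/classes/X.yaml`
3. Check that the `id:` in the file corresponds to the imported module (e.g. `https://nde.nl/ontology/hc/class/X` for `modules/classes/X`)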
### Error: "Duplicate class definition"
**Cause**: Class defined in multiple files and both imported
**Solution**:
1. Remove class from old consolidated module
2. Ensure the main schema (or the aggregator, if you use one) imports only the new individual file
### Warning: "Multiple owl types"
**Cause**: Range conflicts (e.g., slot used as both object property and datatype property)
**Solution**: Expected for polymorphic slots with `any_of`. Can be ignored if intentional.
---
## References
- **Slot Naming Conventions**: `SLOT_NAMING_CONVENTIONS.md`
- **LinkML Documentation**: https://linkml.io/
- **Schema Validation**: `gen-owl`, `gen-json-schema`, `gen-python`
- **Main Schema**: `01_custodian_name_modular.yaml`
---
**Last Updated**: 2025-11-21
**Maintainer**: GLAM Data Extraction Project