# Hyper-Modular Schema Structure

**Version**: 0.1.0  
**Date**: 2025-11-21  
**Schema**: Heritage Custodian Observation and Reconstruction Pattern

---

## Overview

The Heritage Custodian schema uses a **hyper-modular architecture** where every class, enum, and slot is defined in its own individual file. This provides maximum granularity for version control, parallel development, and maintainability.

**Total Files**: 78 YAML files

- **Classes**: 12 files (`modules/classes/`)
- **Enums**: 5 files (`modules/enums/`)
- **Slots**: 59 files (`modules/slots/`)
- **Core Modules**: 1 file (`metadata.yaml`)
- **Main Schema**: 1 file (`01_custodian_name_modular.yaml`)

**Note**: Aggregator files (`enums_all.yaml`, `slots_all.yaml`, `classes_all.yaml`) still exist but are not used by the main schema. They remain available for backward compatibility.

---

## Directory Structure

```
schemas/20251121/linkml/
├── 01_custodian_name_modular.yaml    # Main schema (directly imports all 76 modules)
├── HYPER_MODULAR_STRUCTURE.md        # This file
├── SLOT_NAMING_CONVENTIONS.md        # Slot naming rules for range variants
│
└── modules/
    ├── metadata.yaml                 # Schema metadata & namespace prefixes
    │
    ├── enums/                        # 5 enum definitions (all imported directly)
    │   ├── AgentTypeEnum.yaml
    │   ├── AppellationTypeEnum.yaml
    │   ├── LegalStatusEnum.yaml
    │   ├── ReconstructionActivityTypeEnum.yaml
    │   └── SourceDocumentTypeEnum.yaml
    │
    ├── slots/                        # 59 slot definitions (all imported directly)
    │   ├── id.yaml
    │   ├── created.yaml
    │   ├── modified.yaml
    │   ├── observed_name.yaml
    │   ├── was_revision_of.yaml      # May contain multiple range variants
    │   └── ... (54 more slot files)  # (see SLOT_NAMING_CONVENTIONS.md)
    │
    ├── classes/                      # 12 class definitions (all imported directly)
    │   ├── Custodian.yaml
    │   ├── CustodianObservation.yaml
    │   ├── CustodianName.yaml
    │   ├── CustodianReconstruction.yaml
    │   ├── ReconstructionActivity.yaml
    │   ├── Agent.yaml
    │   ├── Identifier.yaml
    │   ├── Appellation.yaml
    │   ├── SourceDocument.yaml
    │   ├── ConfidenceMeasure.yaml
    │   ├── LanguageCode.yaml
    │   └── TimeSpan.yaml
    │
    └── [Legacy aggregators - not used by main schema]
        ├── enums_all.yaml            # Aggregator for backward compatibility
        ├── slots_all.yaml            # Aggregator for backward compatibility
        └── classes_all.yaml          # Aggregator for backward compatibility
```

---

## Namespace Structure

All components use the `https://nde.nl/ontology/hc/` base namespace:

| Component | Namespace Pattern | Example |
|-----------|-------------------|---------|
| **Base** | `https://nde.nl/ontology/hc/` | Schema root |
| **Classes** | `https://nde.nl/ontology/hc/class/{ClassName}` | `https://nde.nl/ontology/hc/class/Custodian` |
| **Enums** | `https://nde.nl/ontology/hc/enum/{EnumName}` | `https://nde.nl/ontology/hc/enum/LegalStatusEnum` |
| **Slots** | `https://nde.nl/ontology/hc/slot/{slot_name}` | `https://nde.nl/ontology/hc/slot/was_revision_of` |
| **Metadata** | `https://nde.nl/ontology/hc/metadata` | Metadata module |

**Prefixes** (defined in `modules/metadata.yaml`):

```yaml
prefixes:
  hc: https://nde.nl/ontology/hc/
  hc_class: https://nde.nl/ontology/hc/class/
  hc_enum: https://nde.nl/ontology/hc/enum/
  hc_slot: https://nde.nl/ontology/hc/slot/
```

---

## Import Strategy

### Direct Import Pattern

The main schema **directly imports all 76 individual component files** for maximum transparency and granularity:

```yaml
# 01_custodian_name_modular.yaml
imports:
  - linkml:types
  - modules/metadata

  # Enums (5 files)
  - modules/enums/AgentTypeEnum
  - modules/enums/AppellationTypeEnum
  - modules/enums/LegalStatusEnum
  - modules/enums/ReconstructionActivityTypeEnum
  - modules/enums/SourceDocumentTypeEnum

  # Slots (59 files)
  - modules/slots/activity_type
  - modules/slots/affiliation
  # ... (57 more slot imports)

  # Classes (12 files)
  - modules/classes/Agent
  - modules/classes/Appellation
  # ... (10 more class imports)
```

**Benefits of Direct Imports**:

- ✅ **Complete Transparency**: Immediately see all schema dependencies
- ✅ **Explicit Dependencies**: No hidden imports through aggregators
- ✅ **Selective Imports**: Easy to comment out individual components for custom schemas
- ✅ **Better IDE Support**: Direct file references for navigation
- ✅ **Clear Audit Trail**: Git diffs show exactly which components changed

**Note**: Aggregator modules (`enums_all.yaml`, `slots_all.yaml`, `classes_all.yaml`) still exist for backward compatibility and can be used by downstream projects that prefer a simpler import structure.

---

## File Naming Conventions

### Class Files

**Pattern**: `{ClassName}.yaml` (PascalCase)

Examples:

- `Custodian.yaml`
- `CustodianObservation.yaml`
- `CustodianReconstruction.yaml`

**File structure**:

```yaml
id: https://nde.nl/ontology/hc/class/ClassName
name: ClassName
title: ClassName Class

imports:
  - linkml:types
  - OtherClass  # If needed

classes:
  ClassName:
    class_uri: ontology:Class
    description: "..."
    slots:
      - slot1
      - slot2
```

---

### Enum Files

**Pattern**: `{EnumName}.yaml` (PascalCase with "Enum" suffix)

Examples:

- `LegalStatusEnum.yaml`
- `AgentTypeEnum.yaml`

**File structure**:

```yaml
id: https://nde.nl/ontology/hc/enum/EnumName
name: EnumName

enums:
  EnumName:
    description: "..."
    permissible_values:
      VALUE1:
        description: "..."
      VALUE2:
        description: "..."
```

---

### Slot Files

**Pattern**: `{slot_name}.yaml` (snake_case)

Examples:

- `legal_name.yaml`
- `was_revision_of.yaml`
- `observed_name.yaml`

**Special Case - Range Variants**: See `SLOT_NAMING_CONVENTIONS.md` for handling multiple slots with the same ontological property but different ranges.

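
The class, enum, and slot file-name patterns above can be checked mechanically. The following is a minimal sketch, not part of the schema tooling: the regular expressions are an assumed reading of "PascalCase" and "snake_case", and `check_file_name` and `find_violations` are hypothetical helper names.

```python
import re

# Assumed interpretation of the naming rules; tighten or loosen as needed.
PATTERNS = {
    "classes": re.compile(r"^[A-Z][A-Za-z0-9]*\.yaml$"),        # PascalCase
    "enums": re.compile(r"^[A-Z][A-Za-z0-9]*Enum\.yaml$"),      # PascalCase + "Enum"
    "slots": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*\.yaml$"),  # snake_case
}


def check_file_name(kind: str, file_name: str) -> bool:
    """Return True if file_name follows the convention for the given kind."""
    return PATTERNS[kind].match(file_name) is not None


def find_violations(files_by_kind: dict[str, list[str]]) -> list[str]:
    """Return 'kind/file' entries whose names break the convention."""
    return [
        f"{kind}/{name}"
        for kind, names in files_by_kind.items()
        for name in names
        if not check_file_name(kind, name)
    ]


# In-memory demo; the second enum name is deliberately wrong.
sample = {
    "classes": ["Custodian.yaml", "CustodianObservation.yaml"],
    "enums": ["AgentTypeEnum.yaml", "LegalStatus.yaml"],
    "slots": ["was_revision_of.yaml", "id.yaml"],
}
print(find_violations(sample))  # ['enums/LegalStatus.yaml']
```

To run it against a checkout, the input dictionary could be built from the directory listing, for example with `pathlib.Path("modules", kind).glob("*.yaml")` for each kind.
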
**File structure**:

```yaml
id: https://nde.nl/ontology/hc/slot/slot_name
name: slot-name-slot

imports:
  - ../classes/RangeClass  # If range is a class

slots:
  slot_name:
    slot_uri: ontology:property
    range: RangeType
    description: "..."

  # Optional: Range variants (same slot_uri, different range)
  slot_name-variant:
    slot_uri: ontology:property  # SAME as base
    range: DifferentRangeType
    description: "..."
```

---

## Maintenance Guidelines

### Adding a New Class

1. Create file: `modules/classes/{ClassName}.yaml`
2. Define class with namespace: `https://nde.nl/ontology/hc/class/{ClassName}`
3. Add the import to `01_custodian_name_modular.yaml` (the main schema imports components directly), and to `modules/classes_all.yaml` if the legacy aggregator is kept in sync
4. Test schema generation: `gen-owl 01_custodian_name_modular.yaml`

### Adding a New Enum

1. Create file: `modules/enums/{EnumName}.yaml`
2. Define enum with namespace: `https://nde.nl/ontology/hc/enum/{EnumName}`
3. Add the import to `01_custodian_name_modular.yaml`, and to `modules/enums_all.yaml` if the legacy aggregator is kept in sync
4. Test schema generation

### Adding a New Slot

1. **Check if ontologically related slot exists**: Look for existing slots with same `slot_uri`
2. **If EXISTS**: Add range variant to existing file (see `SLOT_NAMING_CONVENTIONS.md`)
3. **If NEW**: Create file `modules/slots/{slot_name}.yaml`
4. Define slot with namespace: `https://nde.nl/ontology/hc/slot/{slot_name}`
5. Add the import to `01_custodian_name_modular.yaml`, and to `modules/slots_all.yaml` if the legacy aggregator is kept in sync (only if new file)
6. Test schema generation

### Adding a Range Variant to Existing Slot

**Example**: Adding `was_revision_of` for `Record` class

1. Open existing file: `modules/slots/was_revision_of.yaml`
2. Add import for new range class:

   ```yaml
   imports:
     - ../classes/CustodianReconstruction  # Existing
     - ../classes/Record                   # NEW
   ```

3. Add new slot variant:

   ```yaml
   slots:
     was_revision_of:
       slot_uri: prov:wasRevisionOf
       range: CustodianReconstruction
       description: "..."

     was_revision_of-record:  # NEW
       slot_uri: prov:wasRevisionOf
       range: Record
       description: "..."
   ```

4. **No import changes needed** in the main schema or aggregator (the slot file is already imported)
5. Test schema generation

---

## Validation and Testing

### Validate Schema Structure

```bash
# Adjust this path to your local checkout
cd /Users/kempersc/apps/glam/schemas/20251121/linkml

# Test OWL generation (validates imports and structure)
gen-owl 01_custodian_name_modular.yaml > /dev/null

# Test JSON Schema generation
gen-json-schema 01_custodian_name_modular.yaml > /dev/null

# Test Python dataclasses generation
gen-python 01_custodian_name_modular.yaml > /dev/null
```

### Check Namespace Consistency

```bash
# All class files should have hc/class/ namespace
grep -h "^id:" modules/classes/*.yaml | sort -u

# All enum files should have hc/enum/ namespace
grep -h "^id:" modules/enums/*.yaml | sort -u

# All slot files should have hc/slot/ namespace
grep -h "^id:" modules/slots/*.yaml | sort -u
```

Expected output:

```
# Classes
https://nde.nl/ontology/hc/class/Agent
https://nde.nl/ontology/hc/class/Appellation
...

# Enums
https://nde.nl/ontology/hc/enum/AgentTypeEnum
https://nde.nl/ontology/hc/enum/AppellationTypeEnum
...

# Slots
https://nde.nl/ontology/hc/slot/activity_type
https://nde.nl/ontology/hc/slot/affiliation
...
```

---

## Benefits of Hyper-Modular Structure

### 1. Granular Version Control

Each component has independent git history:

```bash
git log modules/classes/Custodian.yaml
git blame modules/slots/legal_form.yaml
```

### 2. Parallel Development

Multiple developers can work simultaneously without merge conflicts:

- Developer A edits `CustodianObservation.yaml`
- Developer B edits `ReconstructionActivity.yaml`
- No conflicts, both changes merge cleanly

### 3. Selective Imports

Can create specialized schemas importing only needed components:

```yaml
# Mini schema for observations only
imports:
  - modules/metadata
  - modules/classes/CustodianObservation
  - modules/slots/observed_name
  - modules/slots/source
```

### 4. Clear Ownership

One file = one concept = one maintainer:

- `Custodian.yaml` → CIDOC-CRM expert
- `LegalStatusEnum.yaml` → GLEIF ontology expert
- `was_revision_of.yaml` → PROV-O expert

### 5. Easier Code Review

Small, focused pull requests:

- ❌ "Update schema with 5 new classes" (monolithic, 500 lines)
- ✅ "Add TimeSpan class" (one file, 96 lines)

### 6. Better Documentation

Each file can have extensive inline documentation without cluttering others:

```yaml
# CustodianReconstruction.yaml can have 200 lines of comments
# without making Identifier.yaml harder to read
```

### 7. IDE-Friendly

File tree navigation:

```
modules/classes/
├── Agent.yaml                  ← Easy to find
├── Custodian.yaml              ← Alphabetically sorted
└── CustodianObservation.yaml
```

vs. monolithic:

```
heritage_custodian.yaml:3458    ← Where is CustodianObservation?
```

---

## Migration from Consolidated Structure

### Phase 1: Module Consolidation (Completed)

- ✅ Split monolithic schema into 9 modules
- ✅ Classes grouped by function (base, observation, reconstruction, etc.)

### Phase 2: Hyper-Modularization (Completed 2025-11-21)

- ✅ Split all 12 classes into individual files
- ✅ Split all 5 enums into individual files
- ✅ Split all 59 slots into individual files
- ✅ Created 3 aggregator modules
- ✅ Updated all namespace URIs to `nde.nl/ontology/hc/`
- ✅ Validated OWL generation

### Legacy Files (Can Be Deleted)

These consolidated module files are now obsolete:

- `modules/base_classes.yaml` → Replaced by `classes/Custodian.yaml`
- `modules/observation_classes.yaml` → Replaced by `classes/CustodianObservation.yaml`, `classes/CustodianName.yaml`
- `modules/reconstruction_classes.yaml` → Replaced by `classes/CustodianReconstruction.yaml`
- `modules/provenance_classes.yaml` → Replaced by `classes/ReconstructionActivity.yaml`, `classes/Agent.yaml`
- `modules/supporting_classes.yaml` → Replaced by 6 individual class files
- `modules/enums.yaml` → Replaced by `enums_all.yaml` + 5 individual files
- `modules/slots.yaml` → Replaced by `slots_all.yaml` + 59 individual files

---

## Troubleshooting

### Error: "Cannot find module X"

**Cause**: The import path is incorrect or the file is missing

**Solution**:

1. Check that the import in the main schema (or in the aggregator, if one is used) has the correct file name
2. Verify the file exists: `ls modules/classes/X.yaml`
3. Check that the `id:` in the file matches the import path

### Error: "Duplicate class definition"

**Cause**: The class is defined in multiple files and both are imported

**Solution**:

1. Remove the class from the old consolidated module
2. Ensure the main schema (or the aggregator, if one is used) imports only the new individual file

### Warning: "Multiple owl types"

**Cause**: Range conflicts (e.g., a slot used as both object property and datatype property)

**Solution**: This is expected for polymorphic slots that use `any_of` and can be ignored if the polymorphism is intentional.

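
The "Cannot find module X" error, and its opposite (a component file that was never added to the import list), both come down to a mismatch between the main schema's imports and the files on disk. A minimal sketch of that comparison follows. It scans the schema text line by line instead of parsing YAML, so it assumes one import per line in the `- modules/<kind>/<Name>` form shown in this document; `parse_component_imports` and `diff_imports` are hypothetical helper names.

```python
import re

# Matches list items such as "  - modules/classes/Agent  # comment".
# Assumption: any such list item in the file is an import entry.
IMPORT_LINE = re.compile(r"^\s*-\s+(modules/(?:classes|enums|slots)/[^\s#]+)")


def parse_component_imports(schema_text: str) -> set[str]:
    """Collect 'modules/<kind>/<Name>' entries from the schema text."""
    found = set()
    for line in schema_text.splitlines():
        match = IMPORT_LINE.match(line)
        if match:
            found.add(match.group(1))
    return found


def diff_imports(imported: set[str], on_disk: set[str]) -> tuple[list[str], list[str]]:
    """Return (imports with no matching file, files never imported), sorted."""
    missing_files = sorted(imported - on_disk)
    unimported_files = sorted(on_disk - imported)
    return missing_files, unimported_files


# In-memory demo with a made-up import list and file set.
schema_text = """\
imports:
  - linkml:types
  - modules/metadata
  - modules/enums/AgentTypeEnum
  - modules/classes/Agent
"""
on_disk = {"modules/classes/Agent", "modules/classes/TimeSpan"}

missing, unimported = diff_imports(parse_component_imports(schema_text), on_disk)
print(missing)     # ['modules/enums/AgentTypeEnum']
print(unimported)  # ['modules/classes/TimeSpan']
```

In a real checkout, `schema_text` would be read from `01_custodian_name_modular.yaml`, and `on_disk` built from the `.yaml` files under `modules/classes/`, `modules/enums/`, and `modules/slots/` with the extension removed.
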
---

## References

- **Slot Naming Conventions**: `SLOT_NAMING_CONVENTIONS.md`
- **LinkML Documentation**: https://linkml.io/
- **Schema Validation**: `gen-owl`, `gen-json-schema`, `gen-python`
- **Main Schema**: `01_custodian_name_modular.yaml`

---

**Last Updated**: 2025-11-21  
**Maintainer**: GLAM Data Extraction Project