Hyper-Modular Schema Structure

Version: 0.1.0
Date: 2025-11-21
Schema: Heritage Custodian Observation and Reconstruction Pattern


Overview

The Heritage Custodian schema uses a hyper-modular architecture in which every class, enum, and slot is defined in its own file. Because each definition lives in exactly one file, changes stay isolated in version control, contributors can work in parallel, and each component can be maintained on its own.

Total Files: 78 YAML files

  • Classes: 12 files (modules/classes/)
  • Enums: 5 files (modules/enums/)
  • Slots: 59 files (modules/slots/)
  • Core Modules: 1 file (metadata.yaml)
  • Main Schema: 1 file (01_custodian_name_modular.yaml)

Note: Aggregator files (enums_all.yaml, slots_all.yaml, classes_all.yaml) still exist but are not used by the main schema. They remain available for backward compatibility.
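The counts above can drift as files are added. A minimal sketch for checking them, assuming it is run from the linkml directory; count_yaml and check_count are hypothetical helper names, not part of the repository:

```shell
# count_yaml DIR: print the number of *.yaml files directly inside DIR.
count_yaml() {
  find "$1" -maxdepth 1 -type f -name '*.yaml' | wc -l | tr -d ' '
}

# check_count LABEL DIR EXPECTED: compare the actual count with EXPECTED
# and return non-zero on a mismatch.
check_count() {
  local actual
  actual="$(count_yaml "$2")"
  if [ "$actual" -eq "$3" ]; then
    echo "OK   $1: $actual"
  else
    echo "FAIL $1: expected $3, found $actual"
    return 1
  fi
}
```

Example usage with the documented totals: check_count classes modules/classes 12, check_count enums modules/enums 5, check_count slots modules/slots 59.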


Directory Structure

schemas/20251121/linkml/
├── 01_custodian_name_modular.yaml    # Main schema (directly imports metadata plus all 76 component modules)
├── HYPER_MODULAR_STRUCTURE.md        # This file
├── SLOT_NAMING_CONVENTIONS.md        # Slot naming rules for range variants
│
└── modules/
    ├── metadata.yaml                  # Schema metadata & namespace prefixes
    │
    ├── enums/                         # 5 enum definitions (all imported directly)
    │   ├── AgentTypeEnum.yaml
    │   ├── AppellationTypeEnum.yaml
    │   ├── LegalStatusEnum.yaml
    │   ├── ReconstructionActivityTypeEnum.yaml
    │   └── SourceDocumentTypeEnum.yaml
    │
    ├── slots/                         # 59 slot definitions (all imported directly)
    │   ├── id.yaml
    │   ├── created.yaml
    │   ├── modified.yaml
    │   ├── observed_name.yaml
    │   ├── was_revision_of.yaml      # May contain multiple range variants
    │   └── ... (54 more slot files)  # (see SLOT_NAMING_CONVENTIONS.md)
    │
    ├── classes/                       # 12 class definitions (all imported directly)
    │   ├── Custodian.yaml
    │   ├── CustodianObservation.yaml
    │   ├── CustodianName.yaml
    │   ├── CustodianReconstruction.yaml
    │   ├── ReconstructionActivity.yaml
    │   ├── Agent.yaml
    │   ├── Identifier.yaml
    │   ├── Appellation.yaml
    │   ├── SourceDocument.yaml
    │   ├── ConfidenceMeasure.yaml
    │   ├── LanguageCode.yaml
    │   └── TimeSpan.yaml
    │
    └── [Legacy aggregators - not used by main schema]
        ├── enums_all.yaml             # Aggregator for backward compatibility
        ├── slots_all.yaml             # Aggregator for backward compatibility
        └── classes_all.yaml           # Aggregator for backward compatibility

Namespace Structure

All components use the https://nde.nl/ontology/hc/ base namespace:

| Component | Namespace Pattern | Example |
|-----------|-------------------|---------|
| Base | https://nde.nl/ontology/hc/ | Schema root |
| Classes | https://nde.nl/ontology/hc/class/{ClassName} | https://nde.nl/ontology/hc/class/Custodian |
| Enums | https://nde.nl/ontology/hc/enum/{EnumName} | https://nde.nl/ontology/hc/enum/LegalStatusEnum |
| Slots | https://nde.nl/ontology/hc/slot/{slot_name} | https://nde.nl/ontology/hc/slot/was_revision_of |
| Metadata | https://nde.nl/ontology/hc/metadata | Metadata module |

Prefixes (defined in modules/metadata.yaml):

prefixes:
  hc: https://nde.nl/ontology/hc/
  hc_class: https://nde.nl/ontology/hc/class/
  hc_enum: https://nde.nl/ontology/hc/enum/
  hc_slot: https://nde.nl/ontology/hc/slot/

Import Strategy

Direct Import Pattern

The main schema directly imports all 76 individual component files for maximum transparency and granularity:

# 01_custodian_name_modular.yaml

imports:
  - linkml:types
  - modules/metadata
  
  # Enums (5 files)
  - modules/enums/AgentTypeEnum
  - modules/enums/AppellationTypeEnum
  - modules/enums/LegalStatusEnum
  - modules/enums/ReconstructionActivityTypeEnum
  - modules/enums/SourceDocumentTypeEnum
  
  # Slots (59 files)
  - modules/slots/activity_type
  - modules/slots/affiliation
  # ... (57 more slot imports)
  
  # Classes (12 files)
  - modules/classes/Agent
  - modules/classes/Appellation
  # ... (10 more class imports)

Benefits of Direct Imports:

  • Complete Transparency: Immediately see all schema dependencies
  • Explicit Dependencies: No hidden imports through aggregators
  • Selective Imports: Easy to comment out individual components for custom schemas
  • Better IDE Support: Direct file references for navigation
  • Clear Audit Trail: Git diffs show exactly which components changed

Note: Aggregator modules (enums_all.yaml, slots_all.yaml, classes_all.yaml) still exist for backward compatibility and can be used by downstream projects that prefer a simpler import structure.
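With direct imports, a new module file that nobody adds to the main schema is silently left out. The sketch below lists such files. It assumes imports are written one per line as "- modules/<kind>/<Name>", as in the excerpt above; list_unimported is a hypothetical helper name:

```shell
# list_unimported SCHEMA ROOT: print module paths (without .yaml) that exist
# under ROOT/modules/{enums,slots,classes} but have no import line in SCHEMA.
list_unimported() {
  local schema="$1" root="$2" kind file entry
  for kind in enums slots classes; do
    for file in "$root"/modules/"$kind"/*.yaml; do
      [ -e "$file" ] || continue   # directory empty or missing
      entry="modules/$kind/$(basename "$file" .yaml)"
      # Match "- entry", optionally followed by a trailing comment.
      grep -qE "^[[:space:]]*-[[:space:]]+${entry}[[:space:]]*(#.*)?\$" "$schema" \
        || echo "$entry"
    done
  done
}
```

Example usage from the linkml directory: list_unimported 01_custodian_name_modular.yaml . (empty output means every component file is imported).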


File Naming Conventions

Class Files

Pattern: {ClassName}.yaml (PascalCase)

Examples:

  • Custodian.yaml
  • CustodianObservation.yaml
  • CustodianReconstruction.yaml

File structure:

id: https://nde.nl/ontology/hc/class/ClassName
name: ClassName
title: ClassName Class

imports:
  - linkml:types
  - OtherClass  # If needed

classes:
  ClassName:
    class_uri: ontology:Class
    description: "..."
    slots:
      - slot1
      - slot2

Enum Files

Pattern: {EnumName}.yaml (PascalCase with "Enum" suffix)

Examples:

  • LegalStatusEnum.yaml
  • AgentTypeEnum.yaml

File structure:

id: https://nde.nl/ontology/hc/enum/EnumName
name: EnumName

enums:
  EnumName:
    description: "..."
    permissible_values:
      VALUE1:
        description: "..."
      VALUE2:
        description: "..."

Slot Files

Pattern: {slot_name}.yaml (snake_case)

Examples:

  • legal_name.yaml
  • was_revision_of.yaml
  • observed_name.yaml

Special Case - Range Variants: See SLOT_NAMING_CONVENTIONS.md for handling multiple slots with the same ontological property but different ranges.

File structure:

id: https://nde.nl/ontology/hc/slot/slot_name
name: slot-name-slot

imports:
  - ../classes/RangeClass  # If range is a class

slots:
  slot_name:
    slot_uri: ontology:property
    range: RangeType
    description: "..."
    
  # Optional: Range variants (same slot_uri, different range)
  slot_name-variant:
    slot_uri: ontology:property  # SAME as base
    range: DifferentRangeType
    description: "..."
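The three naming patterns above can be checked mechanically. This is a rough sketch: the regular expressions are one interpretation of "PascalCase" and "snake_case", and name_ok is a hypothetical helper name:

```shell
# name_ok KIND NAME: succeed when NAME (file name without .yaml) matches the
# pattern documented for KIND (class, enum, or slot).
name_ok() {
  case "$1" in
    class) printf '%s\n' "$2" | grep -qE '^[A-Z][A-Za-z0-9]*$' ;;
    enum)  printf '%s\n' "$2" | grep -qE '^[A-Z][A-Za-z0-9]*Enum$' ;;
    slot)  printf '%s\n' "$2" | grep -qE '^[a-z][a-z0-9]*(_[a-z0-9]+)*$' ;;
    *)     return 2 ;;   # unknown kind
  esac
}
```

For example, name_ok slot was_revision_of succeeds, while name_ok enum LegalStatus fails because the "Enum" suffix is missing.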

Maintenance Guidelines

Adding a New Class

  1. Create file: modules/classes/{ClassName}.yaml
  2. Define class with namespace: https://nde.nl/ontology/hc/class/{ClassName}
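  3. Add import to 01_custodian_name_modular.yaml, which imports every component directly (also add it to modules/classes_all.yaml if the legacy aggregator should stay current for downstream users)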
  4. Test schema generation: gen-owl 01_custodian_name_modular.yaml
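Steps 1 and 2 can be scripted. The sketch below writes a skeleton that follows the class file structure shown earlier. scaffold_class is a hypothetical helper, and the generated description is a placeholder to be replaced by hand:

```shell
# scaffold_class ROOT NAME: write ROOT/modules/classes/NAME.yaml from the
# documented template and print its path; refuse to overwrite existing files.
scaffold_class() {
  local root="$1" name="$2" path
  path="$root/modules/classes/$name.yaml"
  if [ -e "$path" ]; then
    echo "refusing to overwrite $path" >&2
    return 1
  fi
  mkdir -p "$root/modules/classes"
  cat > "$path" <<EOF
id: https://nde.nl/ontology/hc/class/$name
name: $name
title: $name Class

imports:
  - linkml:types

classes:
  $name:
    description: "TODO: describe $name"
EOF
  echo "$path"
}
```

The skeleton deliberately omits class_uri and slots, since both depend on the ontology mapping chosen for the new class.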

Adding a New Enum

  1. Create file: modules/enums/{EnumName}.yaml
  2. Define enum with namespace: https://nde.nl/ontology/hc/enum/{EnumName}
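  3. Add import to 01_custodian_name_modular.yaml (also add it to modules/enums_all.yaml if the legacy aggregator should stay current for downstream users)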
  4. Test schema generation

Adding a New Slot

  1. Check whether an ontologically related slot already exists: look for existing slots with the same slot_uri
  2. If one exists: add a range variant to the existing file (see SLOT_NAMING_CONVENTIONS.md)
  3. If none exists: create the file modules/slots/{slot_name}.yaml
  4. Define the slot with namespace: https://nde.nl/ontology/hc/slot/{slot_name}
  5. Only for a new file: add the import to 01_custodian_name_modular.yaml (also add it to modules/slots_all.yaml if the legacy aggregator should stay current for downstream users)
  6. Test schema generation
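Step 1 is a text search over the slot files. A minimal sketch, assuming each slot_uri sits on its own "slot_uri: value" line as in the slot file structure above; slots_with_uri is a hypothetical helper name:

```shell
# slots_with_uri ROOT URI: list slot files under ROOT/modules/slots that
# already declare exactly the given slot_uri (e.g. prov:wasRevisionOf).
slots_with_uri() {
  awk -v uri="$2" '$1 == "slot_uri:" && $2 == uri { print FILENAME }' \
    "$1"/modules/slots/*.yaml 2> /dev/null | sort -u
}
```

Example usage: slots_with_uri . prov:wasRevisionOf. If a file is listed, add a range variant there instead of creating a new slot file.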

Adding a Range Variant to Existing Slot

Example: Adding was_revision_of for Record class

  1. Open existing file: modules/slots/was_revision_of.yaml
  2. Add import for new range class:
    imports:
      - ../classes/CustodianReconstruction  # Existing
      - ../classes/Record                    # NEW
    
  3. Add new slot variant:
    slots:
      was_revision_of:
        slot_uri: prov:wasRevisionOf
        range: CustodianReconstruction
        description: "..."
    
      was_revision_of-record:  # NEW
        slot_uri: prov:wasRevisionOf
        range: Record
        description: "..."
    
  4. No import change needed (the slot file is already imported)
  5. Test schema generation
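Since every variant in a slot file is meant to share one slot_uri, a quick consistency check can catch copy-paste mistakes. This sketch assumes one "slot_uri: value" line per variant; variants_consistent is a hypothetical helper name:

```shell
# variants_consistent FILE: succeed when all slot_uri lines in FILE carry the
# same value, i.e. exactly one distinct slot_uri is declared.
variants_consistent() {
  local count
  count="$(awk '$1 == "slot_uri:" { print $2 }' "$1" | sort -u | wc -l | tr -d ' ')"
  [ "$count" -eq 1 ]
}
```

Example usage: variants_consistent modules/slots/was_revision_of.yaml || echo "mixed slot_uri values".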

Validation and Testing

Validate Schema Structure

cd /Users/kempersc/apps/glam/schemas/20251121/linkml

# Test OWL generation (validates imports and structure)
gen-owl 01_custodian_name_modular.yaml > /dev/null

# Test JSON Schema generation
gen-json-schema 01_custodian_name_modular.yaml > /dev/null

# Test Python dataclasses generation
gen-python 01_custodian_name_modular.yaml > /dev/null
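The three generator runs can be wrapped so that one command reports every failure instead of stopping at the first. A small sketch; run_generators is a hypothetical helper name, and the generator commands are passed in as arguments rather than hard-coded:

```shell
# run_generators SCHEMA GEN...: run each generator command on SCHEMA, discard
# its standard output, print PASS/FAIL per generator, and return 1 if any failed.
run_generators() {
  local schema="$1" failed=0 gen
  shift
  for gen in "$@"; do
    if "$gen" "$schema" > /dev/null; then
      echo "PASS $gen"
    else
      echo "FAIL $gen"
      failed=1
    fi
  done
  return "$failed"
}
```

Example usage: run_generators 01_custodian_name_modular.yaml gen-owl gen-json-schema gen-python.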

Check Namespace Consistency

# All class files should have hc/class/ namespace
grep -h "^id:" modules/classes/*.yaml | sort -u

# All enum files should have hc/enum/ namespace
grep -h "^id:" modules/enums/*.yaml | sort -u

# All slot files should have hc/slot/ namespace
grep -h "^id:" modules/slots/*.yaml | sort -u

Expected output:

# Classes
https://nde.nl/ontology/hc/class/Agent
https://nde.nl/ontology/hc/class/Appellation
...

# Enums
https://nde.nl/ontology/hc/enum/AgentTypeEnum
https://nde.nl/ontology/hc/enum/AppellationTypeEnum
...

# Slots
https://nde.nl/ontology/hc/slot/activity_type
https://nde.nl/ontology/hc/slot/affiliation
...
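Reading the grep output by eye does not scale to 76 files. The sketch below automates the comparison under one assumption taken from the file structures above: each file's id is the namespace for its kind followed by the file name without .yaml. id_mismatches is a hypothetical helper name:

```shell
# id_mismatches ROOT: print one line per module file whose id: value differs
# from https://nde.nl/ontology/hc/<kind>/<file name without .yaml>.
id_mismatches() {
  local root="$1" pair dir kind file expected actual
  for pair in classes:class enums:enum slots:slot; do
    dir="${pair%%:*}"
    kind="${pair##*:}"
    for file in "$root"/modules/"$dir"/*.yaml; do
      [ -e "$file" ] || continue   # directory empty or missing
      expected="https://nde.nl/ontology/hc/$kind/$(basename "$file" .yaml)"
      actual="$(sed -n 's/^id:[[:space:]]*//p' "$file" | head -n 1)"
      [ "$actual" = "$expected" ] \
        || echo "$file: expected $expected, found ${actual:-none}"
    done
  done
}
```

Example usage from the linkml directory: id_mismatches . (empty output means every id matches its file path).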

Benefits of Hyper-Modular Structure

1. Granular Version Control

Each component has independent git history:

git log modules/classes/Custodian.yaml
git blame modules/slots/legal_form.yaml

2. Parallel Development

Developers working on different components edit different files, so their changes rarely conflict:

  • Developer A edits CustodianObservation.yaml
  • Developer B edits ReconstructionActivity.yaml
  • No conflicts, both changes merge cleanly

3. Selective Imports

Specialized schemas can import only the components they need:

# Mini schema for observations only
imports:
  - modules/metadata
  - modules/classes/CustodianObservation
  - modules/slots/observed_name
  - modules/slots/source

4. Clear Ownership

One file = one concept = one maintainer:

  • Custodian.yaml → CIDOC-CRM expert
  • LegalStatusEnum.yaml → GLEIF ontology expert
  • was_revision_of.yaml → PROV-O expert

5. Easier Code Review

Small, focused pull requests:

  • Avoid: "Update schema with 5 new classes" (monolithic change, 500 lines)
  • Prefer: "Add TimeSpan class" (one file, 96 lines)

6. Better Documentation

Each file can have extensive inline documentation without cluttering others:

# CustodianReconstruction.yaml can have 200 lines of comments
# without making Identifier.yaml harder to read

7. IDE-Friendly

File tree navigation:

modules/classes/
  ├── Agent.yaml              ← Easy to find
  ├── Custodian.yaml          ← Alphabetically sorted
  └── CustodianObservation.yaml

vs. monolithic:

heritage_custodian.yaml:3458  ← Where is CustodianObservation?

Migration from Consolidated Structure

Phase 1: Module Consolidation (Completed)

  • Split monolithic schema into 9 modules
  • Classes grouped by function (base, observation, reconstruction, etc.)

Phase 2: Hyper-Modularization (Completed 2025-11-21)

  • Split all 12 classes into individual files
  • Split all 5 enums into individual files
  • Split all 59 slots into individual files
  • Created 3 aggregator modules
  • Updated all namespace URIs to nde.nl/ontology/hc/
  • Validated OWL generation

Legacy Files (Can Be Deleted)

These consolidated module files are now obsolete:

  • modules/base_classes.yaml → Replaced by classes/Custodian.yaml
  • modules/observation_classes.yaml → Replaced by classes/CustodianObservation.yaml, classes/CustodianName.yaml
  • modules/reconstruction_classes.yaml → Replaced by classes/CustodianReconstruction.yaml
  • modules/provenance_classes.yaml → Replaced by classes/ReconstructionActivity.yaml, classes/Agent.yaml
  • modules/supporting_classes.yaml → Replaced by 6 individual class files
  • modules/enums.yaml → Replaced by enums_all.yaml + 5 individual files
  • modules/slots.yaml → Replaced by slots_all.yaml + 59 individual files

Troubleshooting

Error: "Cannot find module X"

Cause: Import path incorrect or file missing

Solution:

  1. Check that the import entry (in 01_custodian_name_modular.yaml, or in an aggregator if one is used) names the correct file
  2. Verify file exists: ls modules/classes/X.yaml
  3. Check id: in file matches import path
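Steps 1 and 2 can be combined into one pass over the import list. The sketch below assumes import entries look like "- modules/..." and resolve relative to the schema file with a .yaml extension; broken_imports is a hypothetical helper name:

```shell
# broken_imports SCHEMA: print every "modules/..." import entry in SCHEMA for
# which no matching .yaml file exists next to the schema.
broken_imports() {
  local schema="$1" base entry
  base="$(dirname "$schema")"
  # Extract the path part of each "- modules/..." list item, dropping comments.
  sed -n 's|^[[:space:]]*-[[:space:]]*\(modules/[^[:space:]#]*\).*$|\1|p' "$schema" |
  while IFS= read -r entry; do
    [ -f "$base/$entry.yaml" ] || echo "$entry"
  done
}
```

Example usage: broken_imports 01_custodian_name_modular.yaml. Each printed entry points at a misspelled or missing module file.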

Error: "Duplicate class definition"

Cause: Class defined in multiple files and both imported

Solution:

  1. Remove class from old consolidated module
  2. Ensure the importing schema (the main schema, or an aggregator if one is used) references only the new individual file
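To find which class is defined twice, the class names can be extracted from a set of files and checked for repeats. This is a rough text-based sketch rather than a YAML parser: it assumes a top-level "classes:" key with class names indented by exactly two spaces, as in the class file structure above. duplicate_classes is a hypothetical helper name:

```shell
# duplicate_classes FILE...: print each class name that is defined under a
# top-level "classes:" key in more than one place across the given files.
duplicate_classes() {
  awk '
    # Any top-level key switches the "inside classes:" state on or off.
    /^[^ \t#]/ { in_classes = ($1 == "classes:") }
    # Two-space-indented keys inside classes: are class names.
    in_classes && /^  [A-Za-z_][A-Za-z0-9_]*:/ { sub(/:.*/, "", $1); print $1 }
  ' "$@" | sort | uniq -d
}
```

Example usage: duplicate_classes modules/*.yaml modules/classes/*.yaml. Any printed name still exists in an old consolidated module as well as in its individual file.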

Warning: "Multiple owl types"

Cause: Range conflicts (e.g., slot used as both object property and datatype property)

Solution: This warning is expected for polymorphic slots that use any_of and can be ignored when the polymorphism is intentional.


References

  • Slot Naming Conventions: SLOT_NAMING_CONVENTIONS.md
  • LinkML Documentation: https://linkml.io/
  • Schema Validation: gen-owl, gen-json-schema, gen-python
  • Main Schema: 01_custodian_name_modular.yaml

Last Updated: 2025-11-21
Maintainer: GLAM Data Extraction Project