# Schema Modularization Architecture

**Version**: 0.2.1
**Date**: 2025-11-09
**Status**: ✅ Complete

## Overview

The Heritage Custodian schema has been modularized from a single 1,102-line YAML file into 7 focused, reusable modules (six topic modules plus a thin main schema that imports them). This architecture enables:

- **Independent usage**: Use only the modules you need
- **Clear separation of concerns**: Core entities vs. provenance vs. collections vs. bibliographic
- **Easier maintenance**: Changes isolated to specific modules
- **Better comprehension**: Each module is self-contained and focused
- **Future extensibility**: Easy to add country-specific or domain-specific modules
- **Standards integration**: SPAR Ontologies (FaBiO, CiTO, PRO, PSO, DoCO) for scholarly publications

## Module Structure

```
schemas/
├── heritage_custodian.yaml    # Main schema (73 lines, imports all modules)
├── enums.yaml                 # Enumerations (institution types, data tiers, etc.)
├── core.yaml                  # Core organizational classes
├── provenance.yaml            # Provenance tracking, GHCID history, change events
├── collections.yaml           # Collections, digital platforms, partnerships
├── dutch.yaml                 # Dutch-specific extensions
└── bibliographic.yaml         # NEW: SPAR Ontologies for scholarly publications
```

## Module Details

### 1. `enums.yaml` (8,671 bytes)

**Purpose**: Define all enumeration types used across the schema.

**Contents**:
- `InstitutionTypeEnum` - 13 heritage institution types (GALLERY, LIBRARY, ARCHIVE, MUSEUM, etc.)
- `OrganizationStatusEnum` - Operational status (ACTIVE, INACTIVE, MERGED, etc.)
- `DataSourceEnum` - Data provenance sources (ISIL_REGISTRY, WIKIDATA, CONVERSATION_NLP, etc.)
- `DataTierEnum` - Data quality tiers (TIER_1_AUTHORITATIVE → TIER_4_INFERRED)
- `MetadataStandardEnum` - Standards used (DUBLIN_CORE, MARC21, EAD, RIC_O, etc.)
- `DigitalPlatformTypeEnum` - Platform types (COLLECTION_MANAGEMENT, DIGITAL_REPOSITORY, etc.)
- `ChangeTypeEnum` - Organizational changes (FOUNDING, CLOSURE, MERGER, RELOCATION, etc.)

**Usage**:
```yaml
imports:
  - enums

classes:
  MyClass:
    slots:
      - institution_type  # Uses InstitutionTypeEnum
```

### 2. `core.yaml` (13,810 bytes)

**Purpose**: Core organizational classes and identification.

**Contents**:
- `HeritageCustodian` - Main heritage institution class
  - GHCID tracking (numeric, current, original, history)
  - Organizational metadata (name, type, status, description)
  - Temporal fields (founded_date, closed_date, prov_generated_at, prov_invalidated_at)
  - Hierarchies (parent_organization, sub_organizations)
  - TOOI naming (official_name, sorting_name, abbreviation)
- `Location` - Physical/virtual locations
  - Address fields (street, city, postal_code, region, country)
  - Geocoding (latitude, longitude, geonames_id)
  - Primary location flag
- `ContactInfo` - Contact information (email, phone, fax)
- `Identifier` - External identifiers (ISIL, VIAF, Wikidata, KvK)
- `OrganizationalUnit` - Departments and divisions within institutions

**Ontology Mappings**:
- `HeritageCustodian` → `org:Organization` + `prov:Entity`
- `Location` → `schema:Place`
- `ContactInfo` → `cpov:ContactPoint` + `schema:ContactPoint`
- `Identifier` → `dcterms:identifier`
- `OrganizationalUnit` → `org:OrganizationalUnit`

**Usage**:
```yaml
imports:
  - core

# Now you can use HeritageCustodian, Location, etc.
```

### 3. `provenance.yaml` (6,715 bytes)

**Purpose**: Data quality tracking, GHCID history, and organizational change events.

**Contents**:
- `Provenance` - Data provenance metadata
  - Source and tier (data_source, data_tier)
  - Extraction metadata (extraction_date, extraction_method, confidence_score)
  - Verification (verified_date, verified_by)
  - Source references (conversation_id, source_url)
- `GHCIDHistoryEntry` - Historical GHCID tracking
  - GHCID values (ghcid, ghcid_numeric)
  - Validity period (valid_from, valid_to)
  - Context (institution_name, location_city, location_country, reason)
- `ChangeEvent` - Organizational lifecycle events
  - Event metadata (event_id, change_type, event_date, event_description)
  - Affected entities (affected_organization, resulting_organization, related_organizations)
  - Documentation (source_documentation)

**Ontology Mappings**:
- `ChangeEvent` → `prov:Activity` + `tooi:Wijzigingsgebeurtenis`

**Usage**:
```yaml
imports:
  - provenance

classes:
  MyClass:
    slots:
      - provenance      # Uses Provenance class
      - change_history  # List of ChangeEvent
```
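
The validity period on `GHCIDHistoryEntry` makes it possible to ask which GHCID applied on a given date. The following is a minimal sketch of that lookup, not the project's generated model: the dataclass is hand-written for illustration, the half-open interval (`valid_from` inclusive, `valid_to` exclusive, `None` meaning "still valid") is an assumption, and the GHCID strings are placeholders rather than real identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GHCIDHistoryEntry:
    """Hand-written stand-in using slot names from provenance.yaml."""
    ghcid: str
    valid_from: date
    valid_to: Optional[date]  # assumption: None means the entry is still valid
    reason: str = ""


def ghcid_at(history: List[GHCIDHistoryEntry], when: date) -> Optional[str]:
    """Return the GHCID whose validity period contains `when`, if any."""
    for entry in history:
        starts_before = entry.valid_from <= when
        ends_after = entry.valid_to is None or when < entry.valid_to
        if starts_before and ends_after:
            return entry.ghcid
    return None


# Placeholder identifiers, not real GHCIDs.
history = [
    GHCIDHistoryEntry("EXAMPLE-OLD-ID", date(2000, 1, 1), date(2015, 6, 1), "relocation"),
    GHCIDHistoryEntry("EXAMPLE-NEW-ID", date(2015, 6, 1), None, "relocation"),
]

print(ghcid_at(history, date(2010, 1, 1)))
```

With a half-open interval, the date on which an identifier changes resolves to the new entry, so consecutive entries can share a boundary date without overlapping.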

### 4. `collections.yaml` (4,834 bytes)

**Purpose**: Collections, digital platforms, and partnerships.

**Contents**:
- `Collection` - Heritage collections
  - Metadata (collection_id, collection_name, collection_description, collection_type)
  - Content (item_count, subjects, time_period_start, time_period_end)
  - Access (access_rights, catalog_url)
  - **NEW**: Custodian link (custodian slot → implements `rico:hasOrHadHolder`)
- `DigitalPlatform` - Digital systems and platforms
  - Platform info (platform_name, platform_type, platform_url)
  - Technical (vendor, implemented_standards)
- `Partnership` - Partnerships and network memberships
  - Partner info (partner_name, partner_id, partnership_type)
  - Temporal (start_date, end_date)

**Ontology Mappings**:
- `Collection` → `schema:Collection` + `rico:RecordSet`
- `DigitalPlatform` → `schema:SoftwareApplication`
- `Partnership` → `org:Membership`

**Key Feature**:
The `Collection.custodian` slot implements RiC-O's `rico:hasOrHadHolder` pattern, creating bidirectional relationships between collections and their custodial organizations.

**Usage**:
```yaml
imports:
  - collections

classes:
  MyClass:
    slots:
      - collections  # List of Collection objects
      - digital_platforms
      - partnerships
```

### 5. `dutch.yaml` (4,931 bytes)

**Purpose**: Dutch-specific extensions for heritage institutions.

**Contents**:
- `DutchHeritageCustodian` - Extends HeritageCustodian
  - Dutch identifiers (kvk_number, gemeente_code)
  - Regional info (provincie)
  - Networks (samenwerkingsverband)
  - Aggregation platforms (in_museum_register, in_rijkscollectie, in_collectie_nederland, in_archieven_nl)

**Ontology Mappings**:
- `DutchHeritageCustodian` → `HeritageCustodian` + `tooi:Overheidsorganisatie`

**Usage**:
```yaml
imports:
  - dutch

# Now you can use DutchHeritageCustodian
```

### 6. `bibliographic.yaml` (948 lines)

**Purpose**: SPAR Ontologies integration for scholarly publications about heritage institutions.

**Contents**:
- **9 Classes**:
  - `ScholarlyWork` - FRBR Work level (abstract intellectual creation)
  - `Publication` - FRBR Expression level (concrete realization with authors, dates)
  - `Journal` - Academic journals with ISSN, impact factors, CiteScore
  - `Conference` - Conference proceedings and papers
  - `Person` - Authors, editors, reviewers with ORCID support
  - `Organization` - Publishers, affiliations with ROR IDs
  - `Citation` - Citation relationships between publications (CiTO)
  - `DocumentSection` - Document structure (abstract, methods, results, discussion)
  - `PublishingRole` - Author/editor/reviewer roles (PRO)

- **6 Enumerations**:
  - `PublicationTypeEnum` - 13 types (journal article, conference paper, book, thesis, preprint, etc.)
  - `CitationTypeEnum` - 14 CiTO types (cites, citesAsAuthority, supports, refutes, critiques, etc.)
  - `OpenAccessStatusEnum` - 6 types (fully open, green OA, hybrid, closed, embargoed, bronze)
  - `PublishingStatusEnum` - 8 PSO states (submitted, under-review, accepted, published, retracted, etc.)
  - `DocumentSectionTypeEnum` - 9 DoCO types (abstract, introduction, methods, results, discussion, etc.)
  - `PublishingRoleTypeEnum` - 9 PRO roles (author, editor, reviewer, translator, corresponding-author, etc.)

**SPAR Ontologies Integrated**:
- **FaBiO** (FRBR-aligned Bibliographic Ontology) - Core bibliographic classes
- **CiTO** (Citation Typing Ontology) - Citation relationships
- **BiRO** (Bibliographic Reference Ontology) - Citation references
- **DoCO** (Document Components Ontology) - Document sections
- **PRO** (Publishing Roles Ontology) - Author/editor/reviewer roles
- **PSO** (Publishing Status Ontology) - Publishing workflow
- **DataCite** - DOI, ORCID identifiers

**Validation Patterns**:
```yaml
doi: "^10\\.\\d{4,9}/[-._;()/:A-Za-z0-9]+$"         # DOI format
issn: "^\\d{4}-\\d{3}[\\dX]$"                        # ISSN format
isbn: "^(97[89])?[0-9]{9}[0-9X]$"                    # ISBN-10/13
orcid: "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[0-9X]$"         # ORCID format
ror_id: "^https://ror\\.org/0[a-z0-9]{6}[0-9]{2}$"   # ROR ID
```
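
To see what these patterns accept and reject, they can be exercised directly with Python's standard `re` module. This is an illustrative sketch: the patterns are copied from the block above with the YAML double-escaping removed, the `is_valid` helper is hypothetical and not part of the schema tooling, and the sample values come from the examples in this document (the ROR value is only a format example).

```python
import re

# Patterns copied from the validation block, with "\\" unescaped to "\".
PATTERNS = {
    "doi": r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$",
    "issn": r"^\d{4}-\d{3}[\dX]$",
    "isbn": r"^(97[89])?[0-9]{9}[0-9X]$",
    "orcid": r"^\d{4}-\d{4}-\d{4}-\d{3}[0-9X]$",
    "ror_id": r"^https://ror\.org/0[a-z0-9]{6}[0-9]{2}$",
}


def is_valid(kind: str, value: str) -> bool:
    """Hypothetical helper: True if `value` matches the pattern for `kind`."""
    return re.fullmatch(PATTERNS[kind], value) is not None


samples = [
    ("doi", "10.1007/s00799-023-00123-4"),
    ("issn", "1432-5012"),
    ("isbn", "9789012345678"),
    ("isbn", "978-90-123-4567-8"),          # hyphenated form
    ("orcid", "0000-0002-1234-5678"),
    ("ror_id", "https://ror.org/04dkp9463"),  # format example only
]
for kind, value in samples:
    print(f"{kind:7} {value:30} {is_valid(kind, value)}")
```

Note that the `isbn` pattern only accepts unhyphenated ISBNs, and none of the patterns verify check digits; they validate shape only.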

**Ontology Mappings**:
- `ScholarlyWork` → `fabio:Work` + `frbr:Work`
- `Publication` → `fabio:Expression` + `frbr:Expression`
- `Journal` → `fabio:Journal` + `bibo:Journal`
- `Conference` → `fabio:AcademicProceedings` + `bibo:Conference`
- `Person` → `foaf:Person`
- `Organization` → `schema:Organization`
- `Citation` → `cito:Citation` + `biro:BibliographicReference`
- `DocumentSection` → `doco:Section`
- `PublishingRole` → `pro:PublishingRole`

**Integration with HeritageCustodian**:
```yaml
# core.yaml now includes:
HeritageCustodian:
  slots:
    - publications  # Links to Publication class
                    # Tracks scholarly works about/by this institution
```

**Usage**:
```yaml
imports:
  - bibliographic

# Example: Publication with citations
- publication_id: https://doi.org/10.1007/s00799-023-00123-4
  title: "Linked Open Data for Museum Collections: The Rijksmuseum Case"
  publication_type: JOURNAL_ARTICLE
  authors:
    - person_name: "John Smith"
      orcid: "0000-0002-1234-5678"
      affiliation:
        organization_name: "University of Amsterdam"
  published_in:
    journal_title: "International Journal on Digital Libraries"
    issn: "1432-5012"
    impact_factor: 2.5
  doi: "10.1007/s00799-023-00123-4"
  open_access_status: FULLY_OPEN_ACCESS
  publishing_status: PUBLISHED
  citations:
    - citing_work: https://doi.org/10.1007/s00799-023-00123-4
      cited_work: https://doi.org/10.1145/2970000.2970111
      citation_type: CITES_AS_AUTHORITY
      citation_context: "Following the approach of Smith et al. [42]..."
```

**FRBR Work/Expression Pattern**:
```yaml
# Abstract work
- work_id: https://w3id.org/heritage/work/glam-linked-data
  work_title: "Linked Open Data for Heritage Institutions"
  discipline:
    - Computer Science
    - Library and Information Science
  has_expression:
    # English journal article
    - publication_id: https://doi.org/10.1007/s00799-023-00123-4
      title: "Linked Open Data for Museum Collections"
      language: en
      publication_year: 2023

    # Dutch book chapter
    - publication_id: https://isbn.org/9789012345678
      title: "Gekoppelde Open Data voor Musea"
      language: nl
      publication_year: 2024
```

**Citation Graph Example (CiTO)**:
```yaml
# Paper A cites Paper B as authoritative source
- citation_id: https://w3id.org/heritage/citation/001
  citing_work: https://doi.org/10.1007/paper-a
  cited_work: https://doi.org/10.1145/paper-b
  citation_type: CITES_AS_AUTHORITY
  citation_intent: "Foundational methodology for LOD extraction"
  citation_context: "Following the approach of Jones et al. [15]..."
  page_number: "p. 7"

# Paper C refutes Paper D
- citation_id: https://w3id.org/heritage/citation/002
  citing_work: https://doi.org/10.1016/paper-c
  cited_work: https://doi.org/10.1108/paper-d
  citation_type: REFUTES
  citation_intent: "Challenge assumptions about metadata quality"
  citation_context: "Contrary to Smith's findings [23], we observe..."
```
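
Because every `Citation` record names both endpoints and a CiTO type, a list of such records can be folded into a typed citation graph. The sketch below assumes the records have already been loaded into plain Python dictionaries with the slot names shown above; the `incoming_by_type` helper is hypothetical and only illustrates the idea.

```python
from collections import defaultdict

# The two citation records from the example above, as plain dictionaries.
citations = [
    {
        "citing_work": "https://doi.org/10.1007/paper-a",
        "cited_work": "https://doi.org/10.1145/paper-b",
        "citation_type": "CITES_AS_AUTHORITY",
    },
    {
        "citing_work": "https://doi.org/10.1016/paper-c",
        "cited_work": "https://doi.org/10.1108/paper-d",
        "citation_type": "REFUTES",
    },
]


def incoming_by_type(records):
    """Map each cited work to {citation_type: [citing works]}."""
    index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        index[rec["cited_work"]][rec["citation_type"]].append(rec["citing_work"])
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {work: dict(by_type) for work, by_type in index.items()}


graph = incoming_by_type(citations)
for cited, by_type in graph.items():
    print(cited, by_type)
```

Indexing by `cited_work` answers questions such as "which papers refute this one?" without scanning the full citation list each time.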

**Document Structure (DoCO)**:
```yaml
- publication_id: https://doi.org/10.1007/example
  title: "Semantic Technologies for Heritage Institutions"
  document_sections:
    - section_id: https://doi.org/10.1007/example#abstract
      section_type: ABSTRACT
      section_content: "This paper presents..."
      section_order: 1

    - section_id: https://doi.org/10.1007/example#introduction
      section_type: INTRODUCTION
      section_content: "Heritage institutions face challenges..."
      section_order: 2

    - section_id: https://doi.org/10.1007/example#methods
      section_type: METHODS
      section_content: "We developed a LinkML schema..."
      section_order: 3

    - section_id: https://doi.org/10.1007/example#results
      section_type: RESULTS
      section_content: "Our approach successfully extracted..."
      section_order: 4
```

**Publishing Workflow (PSO)**:
```yaml
# Track publication status over time
- publication_id: https://doi.org/10.1007/example
  title: "Heritage Institution Metadata Analysis"
  publishing_status: PUBLISHED  # Current status

  # Status history (not in schema, but tracked externally):
  # 2024-01-15: SUBMITTED
  # 2024-02-01: UNDER_REVIEW
  # 2024-04-10: ACCEPTED
  # 2024-05-20: IN_PRESS
  # 2024-06-01: PUBLISHED
```
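
Since the status history is tracked outside the schema, an external tracker might want a basic sanity check on it. The sketch below is one possible approach, not part of the project: `WORKFLOW` lists only the five states that appear in the history above (the full `PublishingStatusEnum` has 8 states), and the forward-only rule is an assumption that would need extending for states such as retraction.

```python
from datetime import date

# Only the five states shown in the example history; the remaining
# PublishingStatusEnum values are deliberately left out of this sketch.
WORKFLOW = ["SUBMITTED", "UNDER_REVIEW", "ACCEPTED", "IN_PRESS", "PUBLISHED"]


def is_forward_only(history):
    """True if entries are in date order and each status is later in WORKFLOW.

    `history` is a list of (date, status) tuples. A status missing from
    WORKFLOW raises ValueError.
    """
    dates = [d for d, _ in history]
    ranks = [WORKFLOW.index(status) for _, status in history]
    in_date_order = dates == sorted(dates)
    moves_forward = all(a < b for a, b in zip(ranks, ranks[1:]))
    return in_date_order and moves_forward


history = [
    (date(2024, 1, 15), "SUBMITTED"),
    (date(2024, 2, 1), "UNDER_REVIEW"),
    (date(2024, 4, 10), "ACCEPTED"),
    (date(2024, 5, 20), "IN_PRESS"),
    (date(2024, 6, 1), "PUBLISHED"),
]
print(is_forward_only(history))
```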

## Main Schema (`heritage_custodian.yaml`)

The main schema file now serves as an aggregator, importing the six other modules:

```yaml
id: https://w3id.org/heritage/custodian
name: heritage-custodian
title: Heritage Custodian Schema
description: >-
  Comprehensive LinkML schema for modeling heritage institutions...

version: 0.2.1

# Prefixes (all ontologies used across modules)
prefixes:
  heritage: https://w3id.org/heritage/custodian/
  org: http://www.w3.org/ns/org#
  prov: http://www.w3.org/ns/prov#
  rico: https://www.ica.org/standards/RiC/ontology#
  fabio: http://purl.org/spar/fabio/
  cito: http://purl.org/spar/cito/
  datacite: http://purl.org/spar/datacite/
  # ... (30+ prefixes)

# Import all modules
imports:
  - linkml:types
  - enums
  - core
  - provenance
  - collections
  - dutch
  - bibliographic  # NEW: SPAR Ontologies for scholarly publications
```

## Using Modules

### Full Schema (All Modules)

```python
from linkml_runtime import SchemaView

sv = SchemaView('schemas/heritage_custodian.yaml')
# Access all 21 classes, 190 slots, 13 enums
# Includes bibliographic module for scholarly publications
```

### Individual Modules

```python
from linkml_runtime import SchemaView

# Just enumerations
sv = SchemaView('schemas/enums.yaml')

# Just core classes
sv = SchemaView('schemas/core.yaml')

# Just provenance tracking
sv = SchemaView('schemas/provenance.yaml')

# Just bibliographic (SPAR Ontologies)
sv = SchemaView('schemas/bibliographic.yaml')
```

### Selective Imports

Create a custom schema importing only what you need:

```yaml
id: https://example.org/my-schema
imports:
  - linkml:types
  - enums
  - core
  # Omit provenance, collections, dutch, bibliographic if not needed

classes:
  MyCustomClass:
    is_a: HeritageCustodian
    # Use core classes without Dutch-specific fields
```

### Domain-Specific Module Example (Bibliographic)

For scholarly publication tracking:

```yaml
id: https://example.org/publication-tracker
imports:
  - linkml:types
  - bibliographic  # Just SPAR Ontologies classes

# Track publications about heritage institutions
# without importing full organizational metadata
classes:
  SimplifiedPublication:
    is_a: Publication
    slots:
      - title
      - doi
      - authors
      - citations
```

## Benefits

### 1. **Independent Usage**
Use only the modules relevant to your use case. For example:
- Global dataset without Dutch institutions → omit `dutch.yaml`
- Simple institution list without provenance → omit `provenance.yaml`
- Just enumerations for validation → use `enums.yaml` alone

### 2. **Clearer Organization**
Each module has a single, focused purpose:
- **Enums**: Type definitions
- **Core**: Organizational structure
- **Provenance**: Data quality and history
- **Collections**: Holdings and platforms
- **Dutch**: Country-specific extensions
- **Bibliographic**: Scholarly publications (SPAR Ontologies)

### 3. **Easier Maintenance**
Changes are isolated to relevant modules:
- Add new institution type → edit `enums.yaml`
- Add new core field → edit `core.yaml`
- Add new country module → create `brazil.yaml` (doesn't affect existing modules)

### 4. **Better IDE Support**
Smaller files load faster and are easier to navigate in IDEs.

### 5. **Future Extensibility**
Easy to add country-specific and domain-specific modules following established patterns:

**Country-Specific Modules** (following `dutch.yaml` pattern):
```
schemas/
├── heritage_custodian.yaml
├── enums.yaml
├── core.yaml
├── provenance.yaml
├── collections.yaml
├── dutch.yaml
├── brazil.yaml          # New: Brazilian-specific extensions (CNPJ, Ibram registry)
├── vietnam.yaml         # New: Vietnamese-specific extensions
└── japan.yaml           # New: Japanese-specific extensions (ISIL-JP, NACSIS)
```

**Domain-Specific Modules** (following `bibliographic.yaml` pattern):
```
schemas/
├── heritage_custodian.yaml
├── bibliographic.yaml   # SPAR Ontologies for scholarly publications
├── museum_objects.yaml  # Future: CIDOC-CRM for museum object cataloging
├── archival_desc.yaml   # Future: RiC-O for archival description
└── conservation.yaml    # Future: Conservation event tracking
```

## Validation

### LinkML Validation

All modules validate successfully with LinkML runtime:

```bash
$ python3 -c "from linkml_runtime import SchemaView; sv = SchemaView('schemas/heritage_custodian.yaml'); print(f'✓ {len(list(sv.all_classes()))} classes, {len(list(sv.all_slots()))} slots, {len(list(sv.all_enums()))} enums')"
✓ 21 classes, 190 slots, 13 enums
```

### Backward Compatibility

All existing parsers and tests pass without modification:

```bash
$ pytest tests/parsers/ -q
53 passed in 0.40s ✓
```

### Python Models

Generated Pydantic models work correctly:

```python
from glam_extractor.models import DutchHeritageCustodian, Provenance, Publication
# All models functional, no breaking changes
```

## Migration Guide

### For Schema Developers

**Before (monolithic)**:
- Edit 1,102-line `heritage_custodian.yaml`
- Hard to find relevant sections
- Risk of breaking unrelated code

**After (modular)**:
- Edit focused module (e.g., `collections.yaml` for collection fields)
- Clear module scope
- Changes isolated

### For Schema Users

**No changes required!** The main schema still works exactly as before:

```python
from linkml_runtime import SchemaView
sv = SchemaView('schemas/heritage_custodian.yaml')
# All classes, slots, enums available as before
```

### For Extension Developers

**New pattern**: Create country/domain-specific modules:

```yaml
# schemas/my_extension.yaml
id: https://example.org/my-extension
imports:
  - core  # Import base classes

classes:
  MyCustomCustodian:
    is_a: HeritageCustodian
    slots:
      - my_custom_field
```

## Metrics

| Metric | Before (v0.1) | After (v0.2.1) | Change |
|--------|---------------|----------------|--------|
| Main schema size | 1,102 lines | 73 lines | **-93%** |
| Total schema size | 1,102 lines | 2,436 lines* | +121% (comprehensive) |
| Modules | 1 | 7 | **Focused separation** |
| Classes | 12 | 21 | **+9 bibliographic classes** |
| Slots | 104 | 190 | **+86 bibliographic slots** |
| Enums | 7 | 13 | **+6 bibliographic enums** |
| Test coverage | 53 tests | 53 tests | ✓ 100% pass rate |
| Ontologies integrated | 5 | **12** | **+7 SPAR ontologies** |

*Includes extensive documentation, examples, and SPAR Ontologies integration

### Module Breakdown

| Module | Lines | Classes | Key Features |
|--------|-------|---------|--------------|
| `enums.yaml` | 257 | 0 | 13 enumerations (institution types, data tiers, citation types, etc.) |
| `core.yaml` | 610 | 5 | HeritageCustodian, Location, ContactInfo, Identifier, OrganizationalUnit |
| `provenance.yaml` | 237 | 3 | Provenance, GHCIDHistoryEntry, ChangeEvent |
| `collections.yaml` | 194 | 3 | Collection, DigitalPlatform, Partnership |
| `dutch.yaml` | 117 | 1 | DutchHeritageCustodian (extends HeritageCustodian) |
| `bibliographic.yaml` | 948 | 9 | ScholarlyWork, Publication, Journal, Conference, Person, Organization, Citation, DocumentSection, PublishingRole |
| `heritage_custodian.yaml` | 73 | 0 | Main aggregator (imports all modules) |
| **TOTAL** | **2,436** | **21** | Comprehensive heritage institution modeling |
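
The totals and percentage changes in the two tables can be reproduced from the per-module figures. The short check below uses only the numbers printed above; the `pct_change` helper is introduced here for illustration.

```python
# Line and class counts per module, copied from the Module Breakdown table.
modules = {
    "enums.yaml": (257, 0),
    "core.yaml": (610, 5),
    "provenance.yaml": (237, 3),
    "collections.yaml": (194, 3),
    "dutch.yaml": (117, 1),
    "bibliographic.yaml": (948, 9),
    "heritage_custodian.yaml": (73, 0),
}
BEFORE_LINES = 1102  # size of the original monolithic schema


def pct_change(before: int, after: int) -> int:
    """Percentage change from `before` to `after`, rounded to a whole number."""
    return round((after - before) / before * 100)


total_lines = sum(lines for lines, _ in modules.values())
total_classes = sum(classes for _, classes in modules.values())
main_lines = modules["heritage_custodian.yaml"][0]

print(total_lines, total_classes)
print(pct_change(BEFORE_LINES, main_lines), pct_change(BEFORE_LINES, total_lines))
```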

## References

- **Schema Files**: `/schemas/`
- **Documentation**: `/docs/`
- **Tests**: `/tests/parsers/`
- **Models**: `/src/glam_extractor/models.py`

---

**Last Updated**: 2025-11-09
**Schema Version**: 0.2.1