glam/SESSION_2025-11-09_SCHEMA_ONTOLOGY_UPDATE.md
kempersc fa5680f0dd Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats
- Introduced custodian_hub_v3.mmd, custodian_hub_v4_final.mmd, and custodian_hub_v5_FINAL.mmd for Mermaid representation.
- Created custodian_hub_FINAL.puml and custodian_hub_v3.puml for PlantUML representation.
- Defined entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument.
- Established relationships and associations between entities, including temporal extents, observations, and reconstruction activities.
- Incorporated enumerations for various types, statuses, and classifications relevant to custodians and their activities.
2025-11-22 14:33:51 +01:00


Session Summary: Schema v0.2.1 and Ontology Integration Documentation

Date: 2025-11-09
Session Type: Schema Version Update + Documentation Enhancement
Status: Complete


Objectives

  1. Update schema version from 0.2.0 to 0.2.1 (reflecting LEARNING_MANAGEMENT addition)
  2. Document TOOI and CPOV ontology integration in ONTOLOGY_EXTENSIONS.md
  3. Add ontology reference instructions to AGENTS.md
  4. Update validation script with new platform type enum

Changes Made

1. Schema Version Update (0.2.0 → 0.2.1)

Updated version numbers in all 6 modular schema files:

Files Modified:

  • schemas/heritage_custodian.yaml - Line 12: version: 0.2.1
  • schemas/core.yaml - Line 11: version: 0.2.1
  • schemas/enums.yaml - Line 10: version: 0.2.1
  • schemas/provenance.yaml - Line 10: version: 0.2.1
  • schemas/collections.yaml - Line 10: version: 0.2.1
  • schemas/dutch.yaml - Line 11: version: 0.2.1

Reason: Version 0.2.1 reflects the addition of LEARNING_MANAGEMENT to DigitalPlatformTypeEnum. That value was added during the Libyan extraction work in the previous session, also on 2025-11-09.

Evidence:

$ rg "^version:" schemas/*.yaml
schemas/collections.yaml:version: 0.2.1
schemas/core.yaml:version: 0.2.1
schemas/heritage_custodian.yaml:version: 0.2.1
schemas/provenance.yaml:version: 0.2.1
schemas/dutch.yaml:version: 0.2.1
schemas/enums.yaml:version: 0.2.1
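
The same consistency check can be scripted so it can run before each release. This is a minimal sketch that assumes every module keeps its version as a top-level `version:` key, as the rg output shows. The function names are hypothetical and are not part of the repository.

```python
import re
from pathlib import Path

# Top-level "version:" key, mirroring the rg pattern "^version:".
# Accepts bare or quoted values such as 0.2.1 or "0.2.1".
VERSION_RE = re.compile(r"^version:[ \t]*['\"]?([^'\"\s]+)['\"]?[ \t]*$", re.MULTILINE)


def collect_schema_versions(schema_dir: str) -> dict[str, str]:
    """Return {file name: version} for every *.yaml file in schema_dir."""
    versions = {}
    for path in sorted(Path(schema_dir).glob("*.yaml")):
        match = VERSION_RE.search(path.read_text(encoding="utf-8"))
        versions[path.name] = match.group(1) if match else "<missing>"
    return versions


def mismatched(versions: dict[str, str], expected: str) -> list[str]:
    """Return the files whose version differs from `expected` (or is missing)."""
    return [name for name, version in versions.items() if version != expected]


if __name__ == "__main__":
    bad = mismatched(collect_schema_versions("schemas"), "0.2.1")
    print("all schema modules at 0.2.1" if not bad else f"out of sync: {bad}")
```

Reporting `<missing>` separately ensures that a module with no version key fails the check and is not silently skipped.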

2. ONTOLOGY_EXTENSIONS.md Enhancement

File: docs/ONTOLOGY_EXTENSIONS.md

Added Section: "## Integrating TOOI and CPOV Ontologies" (inserted after line 101, before "Extension Guidelines")

Content Added (~350 lines):

2.1 TOOI Integration Documentation

TOOI - Dutch Government Organizational Ontology:

  • File location: /data/ontology/tooiont.ttl
  • Namespace: https://identifier.overheid.nl/tooi/def/ont/
  • Key classes: tooi:Overheidsorganisatie, tooi:Wijzigingsgebeurtenis
  • Key properties: tooi:officieleNaamInclSoort, tooi:begindatum, tooi:einddatum
  • PROV-O integration for temporal tracking
  • Heritage custodian mapping to DutchHeritageCustodian class
  • RDF serialization examples (Noord-Hollands Archief merger case)

When to Use TOOI:

  • Dutch heritage institutions (government archives, state museums)
  • Dutch organizational change events (mergers, splits, reorganizations)
  • Dutch ISIL registry or KvK data integration
  • Not for non-Dutch institutions (use CPOV instead)

2.2 CPOV Integration Documentation

CPOV - EU Core Public Organisation Vocabulary:

  • Files: /data/ontology/core-public-organisation-ap.ttl, /data/ontology/core-public-organisation-ap.jsonld
  • Namespace: http://data.europa.eu/m8g/
  • Key classes: cpov:PublicOrganisation, cv:ChangeEvent, locn:Address
  • Key properties: skos:prefLabel, skos:altLabel, dct:identifier, locn:address
  • W3C Organization Ontology integration
  • Heritage custodian mapping to base HeritageCustodian class
  • RDF serialization examples (Biblioteca Nacional do Brasil founding case)

When to Use CPOV:

  • European heritage institutions (non-Dutch)
  • Global public-sector cultural organizations
  • Linked Open Data aggregator alignment (Europeana, DPLA)
  • ⚠️ Global institutions (use CPOV patterns + regional ontologies)

2.3 Ontology Decision Tree

Added flowchart-style decision tree for agents:

Is the institution Dutch?
├─ YES → Use TOOI
└─ NO → Is it an EU public organization?
         ├─ YES → Use CPOV
         └─ NO → Use CPOV patterns + regional ontologies
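
The tree can be expressed as a small function that an extraction pipeline calls before it picks class URIs. This is a minimal sketch. It assumes "Dutch" means ISO 3166-1 alpha-2 country code `NL`, and it assumes the caller decides whether the institution is an EU public organization. The function name is hypothetical.

```python
def choose_base_ontology(country_code: str, is_eu_public_org: bool) -> str:
    """Follow the decision tree: Dutch -> TOOI, EU public -> CPOV, else CPOV + regional.

    country_code is assumed to be an ISO 3166-1 alpha-2 code ("NL" for the
    Netherlands); is_eu_public_org must be decided by the caller.
    """
    if country_code.strip().upper() == "NL":
        return "TOOI"
    if is_eu_public_org:
        return "CPOV"
    return "CPOV patterns + regional ontologies"


for code, eu_public in [("NL", True), ("BE", True), ("LY", False)]:
    print(code, "->", choose_base_ontology(code, eu_public))
# NL -> TOOI
# BE -> CPOV
# LY -> CPOV patterns + regional ontologies
```

The Dutch check runs first. A Dutch government organization is also an EU public organization, but it still gets TOOI.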

2.4 Practical Extraction Workflow

Step 1: Read ontology files before extraction
Step 2: Map conversation data to ontology properties (comparison table)
Step 3: Generate RDF-compatible LinkML with class_uri and slot_uri
Step 4: Export to RDF/Turtle using linkml-convert
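
Step 3 can be illustrated with a generator for a minimal LinkML class fragment. This is a sketch only. The `DutchHeritageCustodian` / `tooi:Overheidsorganisatie` pair is taken from this document. The slot names `name` and `founding_date` and the helper function are hypothetical. The `tooi` prefix would also need to be declared in the schema's `prefixes:` block before RDF generation could resolve it.

```python
def linkml_class_yaml(class_name: str, class_uri: str, slot_uris: dict[str, str]) -> str:
    """Render a minimal LinkML class plus its slots as YAML text.

    class_uri ties the class to an ontology class, and each slot_uri ties a
    slot to an ontology property, so RDF output uses the ontology's IRIs.
    The prefixes used here must be declared in the schema's `prefixes:` block.
    """
    lines = ["classes:", f"  {class_name}:", f"    class_uri: {class_uri}", "    slots:"]
    lines += [f"      - {slot}" for slot in slot_uris]
    lines.append("slots:")
    for slot, uri in slot_uris.items():
        lines += [f"  {slot}:", f"    slot_uri: {uri}"]
    return "\n".join(lines) + "\n"


# Hypothetical slot names; the real schema's slot names may differ.
print(linkml_class_yaml(
    "DutchHeritageCustodian",
    "tooi:Overheidsorganisatie",
    {"name": "tooi:officieleNaamInclSoort", "founding_date": "tooi:begindatum"},
))
```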

Property Mapping Table:

| Extracted Data | TOOI Property | CPOV Property | Schema.org |
|---|---|---|---|
| Institution name | tooi:officieleNaamInclSoort | skos:prefLabel | schema:name |
| Founding date | tooi:begindatum | schema:startDate | schema:foundingDate |
| ISIL code | tooi:organisatieIdentificatie | dct:identifier | schema:identifier |
| Merger event | tooi:Wijzigingsgebeurtenis | cv:ChangeEvent | schema:Event |
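
The property rows of the table can be applied mechanically to an extracted record. This is a minimal sketch. The left-hand field names (`institution_name`, `founding_date`, `isil_code`) are hypothetical. The merger row is left out because it maps to event classes, not to literal-valued properties.

```python
# Rows of the mapping table: extracted field -> (TOOI, CPOV, Schema.org).
# Field names on the left are hypothetical; the merger-event row is omitted
# because it maps to event classes rather than literal-valued properties.
PROPERTY_MAP = {
    "institution_name": ("tooi:officieleNaamInclSoort", "skos:prefLabel", "schema:name"),
    "founding_date": ("tooi:begindatum", "schema:startDate", "schema:foundingDate"),
    "isil_code": ("tooi:organisatieIdentificatie", "dct:identifier", "schema:identifier"),
}
COLUMNS = ("TOOI", "CPOV", "Schema.org")


def map_record(record: dict[str, str], ontology: str) -> tuple[dict[str, str], list[str]]:
    """Rename extracted fields to the chosen ontology's properties.

    Returns (mapped, unmapped): mapped is keyed by prefixed property names;
    unmapped lists fields that have no row in the table.
    """
    column = COLUMNS.index(ontology)  # raises ValueError for an unknown ontology
    mapped, unmapped = {}, []
    for field, value in record.items():
        if field in PROPERTY_MAP:
            mapped[PROPERTY_MAP[field][column]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


print(map_record({"institution_name": "Biblioteca Nacional do Brasil", "city": "Rio de Janeiro"}, "CPOV"))
# ({'skos:prefLabel': 'Biblioteca Nacional do Brasil'}, ['city'])
```

Returning the unmapped fields makes gaps in the mapping visible. Without that list, they would be dropped silently.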

2.5 Updated Extension Guidelines

Reordered "Research Base Ontologies" checklist to prioritize TOOI and CPOV:

  1. TOOI (if Dutch institutions)
  2. CPOV (if EU/global public organizations)
  3. Schema.org (web semantics)
  4. CIDOC-CRM (cultural heritage)
  5. RiC-O (archival)
  6. BIBFRAME (bibliographic)
  7. Dublin Core (metadata elements)

Added search strategy using rg (ripgrep) to find relevant ontology classes.


3. AGENTS.md Enhancement

File: AGENTS.md

Changes Made:

3.1 Schema Version Update (Lines 11 and 13)

Changed references from "v0.2.0" to "v0.2.1":

  • Line 11: **Schema**: See the modular LinkML schema v0.2.1 described below.
  • Line 13: ## Schema Reference (v0.2.1)

3.2 New Section: "Base Ontologies for Global GLAM Data"

Location: Inserted after "Schema Reference (v0.2.1)" section (line 49), before "Institution Type Taxonomy" (line 51)

Content Added (~300 lines):

Foundation Ontologies (3 primary ontologies)

1. TOOI - Dutch Government Organizational Ontology

  • File: /data/ontology/tooiont.ttl
  • Scope: Dutch heritage institutions
  • When to use: Dutch extraction, Dutch ISIL/KvK data, mergers/splits
  • Key classes: tooi:Overheidsorganisatie, tooi:Wijzigingsgebeurtenis
  • LinkML mapping: DutchHeritageCustodian.class_uri: tooi:Overheidsorganisatie

2. CPOV - EU Core Public Organisation Vocabulary

  • Files: /data/ontology/core-public-organisation-ap.ttl, core-public-organisation-ap.jsonld
  • Scope: EU-wide and global public heritage organizations
  • When to use: European/global extraction, Europeana/DPLA alignment
  • Key classes: cpov:PublicOrganisation, cv:ChangeEvent, locn:Address
  • LinkML mapping: HeritageCustodian.class_uri: cpov:PublicOrganisation

3. Schema.org - Web Vocabulary

  • File: /data/ontology/schemaorg.owl
  • Scope: Universal web semantics
  • When to use: Private collections, digital platforms, SEO optimization
  • Key classes: schema:Museum, schema:Library, schema:LearningResource

Ontology Decision Tree

ASCII flowchart guiding agents on which ontology to use based on institution type and location.

Required Ontology Consultation Workflow

4-step process for agents before extraction:

Step 1: Identify institution geographic scope (Dutch → TOOI, EU → CPOV, Other → Schema.org)

Step 2: Review ontology classes and properties using rg search commands:

rg "tooi:Overheidsorganisatie" /data/ontology/tooiont.ttl
rg "cpov:PublicOrganisation" /data/ontology/core-public-organisation-ap.ttl
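
A literal search for `tooi:Overheidsorganisatie` may miss places where an ontology file writes the term as a full `<IRI>`. This sketch searches for both forms, using the namespaces listed in this document. It will still miss terms written with a different prefix label. The function names are hypothetical.

```python
import re

# Namespaces as given in this document; other prefixes would need adding.
PREFIXES = {
    "tooi": "https://identifier.overheid.nl/tooi/def/ont/",
    "cpov": "http://data.europa.eu/m8g/",
}


def expand_curie(curie: str) -> str:
    """Expand a prefixed name such as tooi:Overheidsorganisatie to its full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def find_term(ttl_text: str, curie: str) -> list[int]:
    """Return 1-based line numbers where the term appears as a CURIE or as <full IRI>."""
    pattern = re.compile(
        # The look-arounds stop tooi:Overheidsorganisatie matching tooi:OverheidsorganisatieX.
        rf"(?<![\w-]){re.escape(curie)}(?![\w-])|<{re.escape(expand_curie(curie))}>"
    )
    return [n for n, line in enumerate(ttl_text.splitlines(), start=1) if pattern.search(line)]
```

To use it, pass the file contents, for example `find_term(Path("/data/ontology/tooiont.ttl").read_text(encoding="utf-8"), "tooi:Overheidsorganisatie")`.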

Step 3: Map conversation data to ontology properties (table with 7 common fields)

Step 4: Document ontology alignment in provenance metadata:

provenance:
  base_ontology: "http://data.europa.eu/m8g/"
  ontology_alignment:
    - "cpov:PublicOrganisation"
    - "cv:ChangeEvent"
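
A quick sanity check can catch malformed Step 4 provenance blocks before they are written out. This is a minimal sketch. It assumes the provenance block has been loaded into a Python dict. The set of known prefix labels is an assumption drawn from the prefixes used in this document.

```python
import re

# Prefix labels used in this document's ontology guidance (assumed list).
KNOWN_PREFIXES = {"tooi", "cpov", "cv", "locn", "skos", "dct", "schema"}
CURIE_RE = re.compile(r"^([A-Za-z][\w-]*):([A-Za-z_][\w-]*)$")


def alignment_problems(provenance: dict) -> list[str]:
    """List problems with base_ontology and ontology_alignment entries."""
    problems = []
    base = provenance.get("base_ontology", "")
    if not base.startswith(("http://", "https://")):
        problems.append(f"base_ontology is not an http(s) IRI: {base!r}")
    for term in provenance.get("ontology_alignment", []):
        match = CURIE_RE.match(term)
        if match is None:
            problems.append(f"not a prefixed name: {term!r}")
        elif match.group(1) not in KNOWN_PREFIXES:
            problems.append(f"unknown prefix in {term!r}")
    return problems


provenance = {
    "base_ontology": "http://data.europa.eu/m8g/",
    "ontology_alignment": ["cpov:PublicOrganisation", "cv:ChangeEvent"],
}
print(alignment_problems(provenance))  # []
```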

Common Ontology Patterns

Pattern 1: Organizational change events (TOOI vs. CPOV vs. Schema.org)
Pattern 2: Multilingual names (language-tagged literals in CPOV)
Pattern 3: Hierarchical relationships (W3C Org Ontology patterns)

Anti-Patterns to Avoid

  • Inventing custom properties when ontology equivalents exist
  • Ignoring ontology namespace conventions
  • Extracting without reviewing ontology files

Instead: always map to base ontologies and document the alignment.

Additional Ontology Resources

Brief mentions of:

  • CIDOC-CRM (museum objects, provenance)
  • RiC-O (archival description)
  • BIBFRAME (bibliographic metadata)

3.3 Updated "When Asked to Design NLP Components" Section (Lines 1072-1083)

Changed:

  • Line 1074: Schema version 0.2.0 → 0.2.1
  • Line 1078: Schema reference section 0.2.0 → 0.2.1

Added (new item #2):

2. **Consult Base Ontologies**: BEFORE designing extraction logic, review relevant ontologies:
   - **Dutch institutions**: Study TOOI ontology (`/data/ontology/tooiont.ttl`)
   - **EU/global institutions**: Study CPOV ontology (`/data/ontology/core-public-organisation-ap.ttl`)
   - **All institutions**: Reference Schema.org patterns (`/data/ontology/schemaorg.owl`)
   - See "Base Ontologies for Global GLAM Data" section above for decision tree

Renumbered: Original items 2-5 became 3-6.


4. Validation Script Update

File: validate_instances.py

Change: Added LEARNING_MANAGEMENT to the platform_type enum (lines 54-57)

Before:

'platform_type': [
    'COLLECTION_MANAGEMENT', 'DISCOVERY_PORTAL', 'DIGITAL_REPOSITORY', 
    'SPARQL_ENDPOINT', 'API', 'OTHER'
],

After:

'platform_type': [
    'COLLECTION_MANAGEMENT', 'DISCOVERY_PORTAL', 'DIGITAL_REPOSITORY', 
    'SPARQL_ENDPOINT', 'API', 'LEARNING_MANAGEMENT', 'OTHER'
],
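
This is roughly how an allowed-values list like the one above can be enforced. It is a minimal sketch. The surrounding dict name `ALLOWED_VALUES`, the `enum_errors` function, and the flat per-platform record shape are hypothetical, and the real script's structure may differ.

```python
# Hypothetical stand-in for the enum table in validate_instances.py;
# the real script's surrounding structure may differ.
ALLOWED_VALUES = {
    "platform_type": [
        "COLLECTION_MANAGEMENT", "DISCOVERY_PORTAL", "DIGITAL_REPOSITORY",
        "SPARQL_ENDPOINT", "API", "LEARNING_MANAGEMENT", "OTHER",
    ],
}


def enum_errors(record: dict, allowed: dict[str, list[str]] = ALLOWED_VALUES) -> list[str]:
    """Report every enum-constrained field whose value is not an allowed value."""
    errors = []
    for field, values in allowed.items():
        if field in record and record[field] not in values:
            errors.append(f"{field}: {record[field]!r} is not one of {values}")
    return errors


moodle = {"platform_name": "Moodle", "platform_type": "LEARNING_MANAGEMENT"}
print(enum_errors(moodle))  # []
```

With the "Before" list, the same record would produce one error. That is the failure this change removes for the Libyan LMS platforms.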

Impact: Validator now accepts Libyan university extraction files with LMS platforms (Google Classroom, Moodle).


Verification

Schema Version Consistency

All 6 schema files updated to v0.2.1:

$ rg "^version:" schemas/*.yaml
schemas/collections.yaml:version: 0.2.1
schemas/core.yaml:version: 0.2.1
schemas/heritage_custodian.yaml:version: 0.2.1
schemas/provenance.yaml:version: 0.2.1
schemas/dutch.yaml:version: 0.2.1
schemas/enums.yaml:version: 0.2.1

Documentation Updates Confirmed

ONTOLOGY_EXTENSIONS.md:

$ rg -A 3 "## Integrating TOOI and CPOV Ontologies" docs/ONTOLOGY_EXTENSIONS.md | head -10
## Integrating TOOI and CPOV Ontologies

The GLAM project builds on two foundational ontologies for organizational data modeling. 
**AI agents should always consult these ontologies** when designing extraction pipelines...

AGENTS.md:

$ rg -A 3 "## Base Ontologies for Global GLAM" AGENTS.md | head -10
## Base Ontologies for Global GLAM Data

**CRITICAL**: Before designing extraction pipelines or extending the schema, AI agents 
MUST consult the base ontologies that the LinkML schema builds upon...

Context from Previous Session

Previous Session (2025-11-09, earlier):

  • Extracted 37 Libyan heritage institutions from conversation files
  • Discovered gap: Learning management systems (LMS) not covered by existing platform types
  • Added LEARNING_MANAGEMENT to DigitalPlatformTypeEnum in schemas/enums.yaml
  • Created 5 Libyan extraction JSON files (universities, museums, sites, buildings)
  • Validated data for 3 universities that use Google Classroom and Moodle
  • Deferred: LinkML validation (linkml-validate command crashed)

This Session (2025-11-09, resumed):

  • Updated schema version to 0.2.1 to reflect LEARNING_MANAGEMENT addition
  • Documented TOOI/CPOV ontology integration for future extraction work
  • Enhanced agent instructions to require ontology consultation before extraction
  • Updated validation script to support new platform type

Files Modified

| File | Lines Changed | Type |
|---|---|---|
| schemas/heritage_custodian.yaml | 1 line (version) | Schema |
| schemas/core.yaml | 1 line (version) | Schema |
| schemas/enums.yaml | 1 line (version) | Schema |
| schemas/provenance.yaml | 1 line (version) | Schema |
| schemas/collections.yaml | 1 line (version) | Schema |
| schemas/dutch.yaml | 1 line (version) | Schema |
| docs/ONTOLOGY_EXTENSIONS.md | +350 lines | Documentation |
| AGENTS.md | +300 lines, version updates | Documentation |
| validate_instances.py | 1 line (enum) | Validation |

Total Changes: 9 files modified, ~650 lines added, 8 lines updated


Impact and Benefits

1. Schema Versioning

  • Clear evolution tracking: v0.2.1 documents the LEARNING_MANAGEMENT extension
  • Version consistency: All 6 schema modules updated in sync
  • Historical record: ONTOLOGY_EXTENSIONS.md logs the rationale and evidence

2. Ontology Integration

  • Standards compliance: Agents are now instructed to align with TOOI, CPOV, and Schema.org
  • Semantic interoperability: Extraction data maps to established ontologies
  • RDF-ready: LinkML class_uri and slot_uri mappings documented
  • Multi-ontology support: A single institution record can be typed with classes from more than one ontology

3. Agent Guidance

  • Decision tree: Clear workflow for choosing appropriate ontology
  • Practical examples: RDF serialization patterns for TOOI and CPOV
  • Anti-patterns: Documented common mistakes to avoid
  • Required workflow: 4-step process before extraction begins

4. Validation Support

  • Enum coverage: Validator now supports all 7 platform types (including LEARNING_MANAGEMENT)
  • Libyan data: 3 university LMS platforms can now be validated
  • Quality assurance: Schema validation aligns with schema version 0.2.1

Next Steps

Immediate Priorities

  1. Continue Extraction Work (Middle East/North Africa cluster):

    • Algeria (conversation available)
    • Morocco (conversation available)
    • Egypt (conversation available)
    • Tunisia (conversation available)
    • Jordan (conversation available)
  2. Alternative Validation Approach:

    • Investigate why linkml-validate CLI crashed on Libyan JSON files
    • Try Python API approach: from linkml.validator import Validator
    • Or convert JSON to YAML and use existing validation script
  3. Test Ontology Alignment:

    • Extract one Dutch institution following TOOI patterns
    • Extract one Brazilian institution following CPOV patterns
    • Generate RDF serialization with linkml-convert
    • Verify ontology class mappings in output

Secondary Priorities

  1. Update PROGRESS.md:

    • Add entry for schema v0.2.1 release
    • Document ontology integration milestone
    • Update schema evolution timeline
  2. Create Migration Guide (if needed):

    • Document changes from v0.2.0 to v0.2.1
    • Explain LEARNING_MANAGEMENT use cases
    • Provide conversion examples for older datasets
  3. Expand Ontology Coverage:

    • Document CIDOC-CRM integration patterns (museum objects)
    • Add RiC-O patterns for archival description
    • Create BIBFRAME examples for library catalogs

References

  • Schema v0.2.1: All schema files in schemas/ directory
  • Ontology Files:
    • TOOI: /data/ontology/tooiont.ttl
    • CPOV: /data/ontology/core-public-organisation-ap.ttl, core-public-organisation-ap.jsonld
    • Schema.org: /data/ontology/schemaorg.owl
  • Documentation:
    • /docs/ONTOLOGY_EXTENSIONS.md - Extension guidelines and ontology integration
    • /AGENTS.md - AI agent instructions with ontology decision tree
    • /docs/SCHEMA_MODULES.md - Modular schema architecture
  • Validation: validate_instances.py - Custom LinkML validator

Session Metadata

Start Time: 2025-11-09 (resumed from previous session summary)
End Time: 2025-11-09
Duration: ~30 minutes (documentation-focused)
Agent: OpenCODE (Claude 3.7 Sonnet)
Session Type: Schema maintenance + documentation enhancement
Complexity: Medium (version updates + comprehensive documentation)


Status: Session complete. Schema versioned to 0.2.1, ontology integration documented, agent instructions enhanced.