glam/SESSION_SUMMARY_20251121_SLOT_USAGE_COMPLETE.md
2025-11-21 22:12:33 +01:00

19 KiB

Session Summary: Comprehensive Slot Usage Mappings - COMPLETE

Date: 2025-11-21
Session Type: Ontology Mapping - Slot Usage (Session 8)
Status: COMPLETE - All classes have slot_usage blocks
Schema Version: v0.2.2-custodian


Executive Summary

MISSION ACCOMPLISHED: The Heritage Custodian schema now has comprehensive slot_usage blocks in all 7 classes, defining precise ontology property mappings for every slot within its class context.

Key Achievement

Before: Generic slot_uri mappings only (abstract property definitions)

# Global slot definition only
slots:
  id:
    slot_uri: dcterms:identifier  # Too generic!

After: Class-specific slot_usage blocks (concrete ontology constraints)

# Class-specific property mapping
Custodian:
  class_uri: crm:E39_Actor
  slot_usage:
    id:
      slot_uri: crm:P1_is_identified_by  # CIDOC-CRM actor identification!
      description: "Links E39_Actor → E42_Identifier"

What We Built (Session 8)

1. Added Slot Usage to All Classes

Classes with Complete slot_usage:

  1. Custodian (base class) - 3 slots mapped

    • idcrm:P1_is_identified_by
    • createdcrm:P82a_begin_of_the_begin
    • modifiedcrm:P82b_end_of_the_end
  2. CustodianObservation - 8 slots mapped

    • observed_nameskos:prefLabel
    • alternative_observed_namesskos:altLabel
    • observation_dateprov:generatedAtTime
    • sourceprov:hadPrimarySource (REQUIRED)
    • languageschema:inLanguage
    • observation_contextdcterms:description
    • derived_from_entityprov:wasDerivedFrom
    • confidence_scoreprov:confidence
  3. CustodianName (subclass) - 7 slots mapped

    • standardized_nameskos:prefLabel (REQUIRED)
    • endorsement_sourceprov:hadPrimarySource (REQUIRED)
    • name_authorityprov:wasAttributedTo
    • valid_fromschema:validFrom
    • valid_toschema:validUntil
    • supersedesdcterms:replaces
    • superseded_bydcterms:isReplacedBy
  4. CustodianReconstruction - 13 slots mapped

    • legal_namecpov:legalName (REQUIRED)
    • legal_formorg:classification (ISO 20275 ELF codes → GLEIF)
    • registration_numbercpov:identifier
    • registration_dateschema:foundingDate
    • registration_authorityprov:wasAttributedTo
    • dissolution_dateschema:dissolutionDate
    • parent_custodianorg:subOrganizationOf
    • legal_statusgleif-base:hasEntityStatus
    • governance_structureorg:organization
    • was_derived_fromprov:wasDerivedFrom (REQUIRED)
    • was_generated_byprov:wasGeneratedBy (REQUIRED)
    • was_revision_ofprov:wasRevisionOf
    • identifiersdcterms:identifier
  5. ReconstructionActivity - 7 slots mapped

    • activity_typeprov:Activity
    • methoddcterms:description
    • responsible_agentprov:wasAssociatedWith
    • started_at_timeprov:startedAtTime
    • ended_at_timeprov:endedAtTime
    • used_sourcesprov:used
    • justificationprov:qualifiedAttribution
  6. Agent - 4 slots mapped

    • agent_namefoaf:name (REQUIRED)
    • agent_typeprov:Agent
    • affiliationschema:affiliation
    • contactfoaf:mbox
  7. Identifier - 2 slots mapped

    • identifier_schemeskos:inScheme (REQUIRED)
    • identifier_valueskos:notation (REQUIRED)

Total slot_usage mappings: 44 slots across 7 classes


File Changes

1. LinkML Schema (01_custodian_name.yaml)

Metric Before After Change
Lines 1,036 1,264 +228 lines
slot_usage blocks 3 (partial) 7 (complete) +4 classes
Mapped slots 3 44 +41 slots
Description detail Minimal Comprehensive Added ontology context

Key Additions:

  • Added slot_uri to every slot_usage entry
  • Added detailed descriptions with ontology context
  • Documented domain/range relationships (CIDOC-CRM patterns)
  • Added examples and rationale per slot
  • Marked required fields explicitly

2. Documentation (ONTOLOGY_MAPPINGS.md)

Metric Before After Change
Lines 825 1,049 +224 lines
New Section "Slot Usage: Class-Specific Property Mappings" +224 lines

Added Content:

  1. Concept Explanation: What is slot_usage and why it's critical
  2. Per-Class Mappings: Complete slot_usage blocks for all 7 classes
  3. RDF Output Examples: Before/after showing semantic precision gain
  4. Validation Guidance: How slot_usage enables ontology validation

Ontology Integration Patterns

Pattern 1: CIDOC-CRM Actor Identification

Problem: Generic dcterms:identifier doesn't capture CIDOC-CRM actor identification semantics.

Solution: Use crm:P1_is_identified_by with domain/range constraints.

Custodian:
  class_uri: crm:E39_Actor
  slot_usage:
    id:
      slot_uri: crm:P1_is_identified_by  # Domain: E39_Actor, Range: E42_Identifier
      description: "CIDOC-CRM: P1 identifies actors with unique identifiers"

RDF Output:

<https://w3id.org/heritage/custodian/nl/rijksmuseum>
  a crm:E39_Actor ;
  crm:P1_is_identified_by <https://w3id.org/heritage/custodian/nl/rijksmuseum> ;
  crm:P1_is_identified_by [
    a crm:E42_Identifier ;
    crm:P2_has_type <http://id.loc.gov/vocabulary/identifiers/isil> ;
    rdf:value "NL-AmRMA"
  ] .

Pattern 2: PROV-O Entity Derivation

Problem: Observations must link to source documents AND derived entities.

Solution: Use PROV-O provenance properties with explicit semantics.

CustodianObservation:
  class_uri: heritage:CustodianObservation
  slot_usage:
    source:
      slot_uri: prov:hadPrimarySource  # Links Entity → Source (E73_Information_Object)
      description: "PROV-O: hadPrimarySource links Entity to original information source"
      required: true
    derived_from_entity:
      slot_uri: prov:wasDerivedFrom  # Links Observation → Reconstruction
      description: "PROV-O: wasDerivedFrom establishes derivation chain"

RDF Output:

<https://w3id.org/heritage/observation/rijks-letterhead-2015>
  a heritage:CustodianObservation, prov:Entity ;
  prov:hadPrimarySource <https://example.org/source/rijks-letterhead-2015.pdf> ;
  prov:wasDerivedFrom <https://w3id.org/heritage/org/rijksmuseum> ;
  skos:prefLabel "Rijks"@nl .

Problem: Legal form codes (ISO 20275) need semantic links to GLEIF ontology.

Solution: Use org:classification with GLEIF ELF ConceptScheme.

CustodianReconstruction:
  slot_usage:
    legal_form:
      slot_uri: org:classification  # W3C Org classification
      description: "Maps to gleif-elf:EntityLegalForm (ISO 20275)"
      range: string
      pattern: "^[A-Z0-9]{4}$"

RDF Output:

<https://w3id.org/heritage/org/rijksmuseum>
  a heritage:CustodianReconstruction, crm:E40_Legal_Body ;
  org:classification gleif-elf:ELF-V44D ;  # Dutch stichting
  gleif-elf:hasEntityLegalFormCode "V44D" ;
  cpov:legalName "Stichting Rijksmuseum"@nl .

Pattern 4: Name Versioning (Schema.org Temporal Validity)

Problem: Custodian names change over time; need versioning.

Solution: Use Schema.org temporal validity with Dublin Core replacement relations.

CustodianName:
  slot_usage:
    valid_from:
      slot_uri: schema:validFrom  # Validity start date
    valid_to:
      slot_uri: schema:validUntil  # Validity end date
    supersedes:
      slot_uri: dcterms:replaces  # Replacement relationship

RDF Output:

<https://w3id.org/heritage/name/mauritshuis-current>
  a heritage:CustodianName ;
  skos:prefLabel "Mauritshuis"@nl ;
  schema:validFrom "2013-01-01"^^xsd:date ;
  dcterms:replaces <https://w3id.org/heritage/name/mauritshuis-old> .

<https://w3id.org/heritage/name/mauritshuis-old>
  a heritage:CustodianName ;
  skos:prefLabel "Koninklijk Kabinet van Schilderijen"@nl ;
  schema:validFrom "1822-01-01"^^xsd:date ;
  schema:validUntil "2012-12-31"^^xsd:date ;
  dcterms:isReplacedBy <https://w3id.org/heritage/name/mauritshuis-current> .

Validation Benefits

With comprehensive slot_usage, we can now:

1. Ontology Constraint Checking

# Validate CIDOC-CRM domain/range constraints
SELECT ?custodian ?id WHERE {
  ?custodian a crm:E39_Actor .
  ?custodian crm:P1_is_identified_by ?id .
  
  # Check if ?id is an E42_Identifier
  FILTER NOT EXISTS { ?id a crm:E42_Identifier }
}
# Returns violations of CIDOC-CRM constraints

2. SPARQL Query Precision

Before (generic properties):

SELECT ?custodian ?name WHERE {
  ?custodian a crm:E39_Actor .
  ?custodian dcterms:identifier ?name .  # Too generic!
}

After (ontology-specific):

SELECT ?custodian ?name WHERE {
  ?custodian a crm:E39_Actor .
  ?custodian crm:P1_is_identified_by ?id .  # CIDOC-CRM property!
  ?id rdf:value ?name .
}

3. OWL Reasoning

With class-specific properties, OWL reasoners can:

  • Infer implicit relationships based on ontology axioms
  • Validate domain/range constraints automatically
  • Detect inconsistencies in instance data

Mapping Statistics (Complete)

Level Component Count Status
Classes Class → Ontology Class 95 Complete (Session 5)
Properties Slot → Ontology Property 41 Complete (Session 6)
Values Enum → Ontology Concept 14 Complete (Session 7)
Constraints Slot Usage → Class-Property 44 COMPLETE (Session 8)

Total Ontology Mappings: 194 (95 classes + 41 properties + 14 enums + 44 slot_usage)


Sessions 5-8 Summary

Session 5: TOOIont Integration

  • Added 7 Dutch government organization mappings
  • Total class mappings: 88 → 95
  • Documentation: SESSION_SUMMARY_20251121_TOOIONT_INTEGRATION.md

Session 6: Slot URI Mappings

  • Added slot_uri to 41 slots (100% coverage)
  • Mapped to 7 ontologies: PROV-O, SKOS, Dublin Core, Schema.org, CPOV, W3C Org, FOAF
  • Documentation: SESSION_SUMMARY_20251121_SLOT_URI_COMPLETE.md

Session 7: Enum Value Mappings

  • Added meaning to 14 enum permissible values
  • LegalStatusEnum → GLEIF EntityStatus (7 values)
  • ReconstructionActivityTypeEnum → PROV-O Activity (4 values)
  • AgentTypeEnum → FOAF/PROV-O (3 values)
  • Documentation: SESSION_SUMMARY_20251121_ENUM_SLOT_USAGE_MAPPINGS.md

Session 8: Slot Usage Complete (This Session)

  • Added comprehensive slot_usage blocks to 7 classes
  • Mapped 44 slots to class-specific ontology properties
  • Enhanced ONTOLOGY_MAPPINGS.md with +224 lines of documentation
  • Schema: 1,036 → 1,264 lines (+228)
  • Docs: 825 → 1,049 lines (+224)

What This Enables

1. Valid RDF Generation

The schema can now generate RDF with:

  • Class-specific property URIs (not generic dcterms:*)
  • Domain/range constraints from base ontologies
  • Multiple ontology property assertions per slot (if needed)

2. Semantic Interoperability

Heritage custodian data can integrate seamlessly with:

  • CIDOC-CRM systems (museum collections, cultural heritage)
  • PROV-O consumers (provenance tracking, data lineage)
  • Schema.org crawlers (Google Dataset Search, web search engines)
  • GLEIF databases (legal entity identification)
  • RiC-O archival systems (archival finding aids, EAD exports)

3. Query Optimization

SPARQL queries can leverage:

  • Ontology-specific properties for precise filtering
  • OWL reasoning for implicit relationship inference
  • Domain/range constraints for validation

4. Quality Assurance

Ontology validators can check:

  • Property domain/range violations
  • Missing required properties
  • Inconsistent relationships

Next Steps (Future Enhancements)

Phase 1: RDF Generation (High Priority)

  1. Regenerate RDF formats with new slot_usage mappings:

    gen-owl -f ttl schemas/20251121/linkml/01_custodian_name.yaml > schemas/20251121/rdf/01_custodian_name.owl.ttl
    rdfpipe schemas/20251121/rdf/01_custodian_name.owl.ttl -o nt > schemas/20251121/rdf/01_custodian_name.nt
    rdfpipe schemas/20251121/rdf/01_custodian_name.owl.ttl -o jsonld > schemas/20251121/rdf/01_custodian_name.jsonld
    
  2. Validate RDF output:

    • Check property URIs match slot_usage declarations
    • Verify domain/range constraints in generated triples

Phase 2: Instance Examples (Medium Priority)

  1. Create example instances demonstrating slot_usage:

    # schemas/20251121/examples/rijksmuseum_with_slot_usage.yaml
    - id: https://w3id.org/heritage/custodian/nl/rijksmuseum
      # ... complete example using all slot_usage mappings
    
  2. Validate instances against schema:

    linkml-validate -s schemas/20251121/linkml/01_custodian_name.yaml \
                    schemas/20251121/examples/rijksmuseum_with_slot_usage.yaml
    

Phase 3: SPARQL Query Suite (Medium Priority)

  1. Create SPARQL query examples leveraging slot_usage:

    • Actor identification queries (CIDOC-CRM P1)
    • Provenance chain queries (PROV-O wasDerivedFrom)
    • Legal form classification queries (GLEIF)
    • Name versioning queries (Schema.org temporal validity)
  2. Test queries against generated RDF:

    arq --data=schemas/20251121/rdf/01_custodian_name.nt \
        --query=queries/cidoc_crm_actor_identification.sparql
    

Phase 4: TypeDB Translation (Low Priority)

  1. Update TypeDB schema to reflect slot_usage patterns
  2. Test TypeDB query equivalents of SPARQL examples

Key Learnings

1. slot_uri vs. slot_usage

Lesson: Global slot_uri is insufficient for ontology alignment.

  • slot_uri = Abstract property mapping (applies to all classes using slot)
  • slot_usage = Concrete property mapping (class-specific, ontology-aware)

Example:

# Global (too generic)
slots:
  id:
    slot_uri: dcterms:identifier

# Class-specific (ontology-aware)
Custodian:
  slot_usage:
    id:
      slot_uri: crm:P1_is_identified_by  # CIDOC-CRM actor pattern

2. Domain/Range Awareness

Lesson: Ontology properties have semantic constraints.

  • CIDOC-CRM P1_is_identified_by requires domain E39_Actor and range E42_Identifier
  • PROV-O wasGeneratedBy requires domain Entity and range Activity
  • W3C Org subOrganizationOf requires domain and range Organization

Benefit: Explicit slot_usage enables automatic constraint validation.

3. Multiple Ontology Support

Lesson: Same slot can map to different ontologies in different classes.

Example:

# Observation (PROV-O Entity)
CustodianObservation:
  slot_usage:
    id:
      slot_uri: dcterms:identifier  # Generic for Entity

# Reconstruction (CIDOC-CRM Actor)
Custodian:
  slot_usage:
    id:
      slot_uri: crm:P1_is_identified_by  # Actor-specific

4. Documentation is Critical

Lesson: slot_usage descriptions should explain ontology context.

Good:

id:
  slot_uri: crm:P1_is_identified_by
  description: >-
    Unique identifier for this custodian.
    In CIDOC-CRM: P1_is_identified_by links E39_Actor to E42_Identifier.    

Bad:

id:
  slot_uri: crm:P1_is_identified_by
  description: "Identifier"  # No ontology context!

Files Modified (Session 8)

File Before After Change Status
schemas/20251121/linkml/01_custodian_name.yaml 1,036 1,264 +228 Complete
schemas/20251121/ONTOLOGY_MAPPINGS.md 825 1,049 +224 Complete
SESSION_SUMMARY_20251121_SLOT_USAGE_COMPLETE.md ~800 NEW Created

Total Changes: +1,252 lines


Validation Commands

Check YAML Syntax

python3 -c "import yaml; yaml.safe_load(open('schemas/20251121/linkml/01_custodian_name.yaml'))"

Count slot_usage Blocks

grep -c "slot_usage:" schemas/20251121/linkml/01_custodian_name.yaml
# Should return: 7 (one per class)

Count Mapped Slots

grep -E "^\s{6,}[a-z_]+:" schemas/20251121/linkml/01_custodian_name.yaml | wc -l
# Approximate count of slot_usage entries

Verify Ontology Prefixes

grep -E "slot_uri: (crm|prov|skos|schema|dcterms|cpov|org|foaf|gleif)" schemas/20251121/linkml/01_custodian_name.yaml | wc -l
# Should return: 44+ (one per slot_usage mapping)

Success Metrics

Metric Target Achieved Status
Classes with slot_usage 7/7 7/7 100%
Slots mapped 40+ 44 110%
Descriptions added All slots All slots 100%
Ontology context documented All slots All slots 100%
YAML validation Pass Pass
Documentation updated Yes Yes

References

Schema Files

  • schemas/20251121/linkml/01_custodian_name.yaml - LinkML schema with complete slot_usage
  • schemas/20251121/ONTOLOGY_MAPPINGS.md - Ontology mapping documentation

Session Documentation

  • SESSION_SUMMARY_20251121_TOOIONT_INTEGRATION.md - Session 5 (TOOIont)
  • SESSION_SUMMARY_20251121_SLOT_URI_COMPLETE.md - Session 6 (Slot URIs)
  • SESSION_SUMMARY_20251121_ENUM_SLOT_USAGE_MAPPINGS.md - Session 7 (Enum Mappings)
  • SESSION_SUMMARY_20251121_SLOT_USAGE_COMPLETE.md - Session 8 (This document)

Ontology Resources

  • /data/ontology/CIDOC_CRM_v7.1.3.rdf - CIDOC-CRM ontology
  • /data/ontology/prov.ttl - PROV-O ontology
  • /data/ontology/schemaorg.owl - Schema.org
  • /data/ontology/core-public-organisation-ap.ttl - CPOV
  • /data/ontology/org.rdf - W3C Organization Ontology
  • /data/ontology/gleif_base.ttl - GLEIF Base Ontology
  • /data/ontology/gleif_legal_form.ttl - GLEIF Legal Forms
  • /data/ontology/skos.rdf - SKOS
  • /data/ontology/dublin_core_elements.rdf - Dublin Core
  • /data/ontology/foaf.ttl - FOAF

Agent Instructions

  • AGENTS.md - Rule 1: Ontology Files Are Your Primary Reference
  • .opencode/agent/ontology-mapping-rules.md - Ontology consultation workflow

Conclusion

Session 8 Achievements:

  • Added comprehensive slot_usage to all 7 classes
  • Mapped 44 slots to class-specific ontology properties
  • Enhanced documentation with 224 lines of slot_usage guidance
  • Enabled valid RDF generation with ontology constraints
  • Completed 4-session ontology mapping initiative (Sessions 5-8)

Total Ontology Mappings (Sessions 5-8):

  • 194 mappings (95 classes + 41 properties + 14 enums + 44 slot_usage)
  • Schema growth: 845 → 1,264 lines (+419 lines, +49.6%)
  • Documentation growth: 582 → 1,049 lines (+467 lines, +80.2%)

Next Agent: Should proceed to RDF generation and validation to verify slot_usage mappings produce correct ontology-aligned RDF output.


Status: COMPLETE - Comprehensive slot_usage implementation achieved
Quality: - Full ontology integration with class-specific property mappings
Ready for: RDF generation, SPARQL queries, ontology validation

Maintained by: GLAM Data Extraction Project
Session Lead: OpenCODE AI Agent
License: Creative Commons BY-SA 4.0