Session Summary: Heritage Custodian Ontology - Critical Conceptual Corrections

Date: 2025-11-21
Session Duration: ~2 hours
Focus: RDF generation plus two major conceptual corrections to the organization observation/reconstruction pattern


🎯 Session Objectives

  1. Generate RDF/OWL ontology files from LinkML schemas
  2. Correct emic/etic distinction in OrganizationObservation
  3. Separate legal name from legal form classification
  4. Integrate ISO 20275 Entity Legal Forms standard

Completed Tasks

1. RDF/OWL Generation (7 Formats)

Generated files (14 total):

  • schemas/20251121/rdf/01_name_entity.* (7 formats)
  • schemas/20251121/rdf/02_organization_observation_reconstruction.* (7 formats)

Formats:

  1. Turtle (.ttl) - Human-readable RDF
  2. OWL Turtle (.owl.ttl) - OWL ontology
  3. RDF/XML (.rdf) - XML serialization
  4. N-Triples (.nt) - Line-based format
  5. N3 (.n3) - Extended notation
  6. JSON-LD (.jsonld) - JSON format
  7. TriG (.trig) - Named graphs

Statistics:

  • 01_name_entity: 463 triples
  • 02_organization_observation_reconstruction: 1,463 triples (175 new from OrganizationName class)
  • Total: 1,926 triples across both ontologies
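
Because N-Triples is line-based (one statement per line), the counts above can be spot-checked without an RDF library. The following is a minimal sketch, assuming the `.nt` files contain no duplicate statements; the function name `count_ntriples` and the sample data are illustrative and not part of the project.

```python
def count_ntriples(text: str) -> int:
    """Count statements in N-Triples text: one per non-blank, non-comment line."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Tiny illustrative sample (hypothetical IRIs, not taken from the generated files).
sample = """
# a comment line
<http://example.org/org/1> <http://www.w3.org/2000/01/rdf-schema#label> "Rijksmuseum" .
<http://example.org/org/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Observation> .
"""
print(count_ntriples(sample))  # 2

# The totals reported in this summary are internally consistent.
assert 463 + 1_463 == 1_926
assert 1_288 + 175 == 1_463
```

Running the same function over the real `.nt` outputs should reproduce 463 and 1,463 if the files are clean; a mismatch would point to stray non-triple lines in the output.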

Documentation:

  • schemas/20251121/rdf/README.md - Usage guide
  • schemas/20251121/RDF_GENERATION_SUMMARY.md - Complete generation report

2. Conceptual Correction #1: Emic/Etic Distinction

Issue Identified: Schema incorrectly stated OrganizationObservation represents ONLY emic (insider) perspective.

Correction Applied:

BEFORE:

OrganizationObservation:
  description: "Represents EMIC (insider) perspective..."

AFTER:

OrganizationObservation:
  description: >-
    Observations can capture BOTH emic (insider) and etic (outsider) 
    perspectives as they appear in different sources.
    
    Examples:
    - EMIC: organization's website, legal documents, ISIL registry
    - ETIC: guidebooks, academic papers, external descriptions    

New Addition: OrganizationName subclass

  • Specialized subclass of OrganizationObservation
  • Represents STANDARDIZED EMIC name accepted by organization
  • Distinct from vernacular emic (e.g., "Rijks") and etic references
  • New slots: standardized_name, endorsement_source, name_authority, valid_from, valid_to, supersedes, superseded_by

Example:

OrganizationObservation (ANY source)
  ├─ "Rijks" (emic vernacular)
  ├─ "The Rijksmuseum in Amsterdam" (etic guidebook)
  └─ "Rijksmuseum" (emic ISIL registry)

OrganizationName (standardized emic)
  └─ "Rijksmuseum" (official organizational identity)

OrganizationReconstruction (formal entity)
  └─ Stichting Rijksmuseum (legal entity, KvK #41208408)
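
The class relationship in the example above can be sketched in Python. This is an illustration only, not the LinkML schema itself: the fields `observed_name`, `source`, and `perspective` are hypothetical names chosen for the sketch, while `standardized_name`, `endorsement_source`, and `valid_from` follow the slot names listed earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Perspective(Enum):
    EMIC = "emic"  # insider: the organization's own usage
    ETIC = "etic"  # outsider: guidebooks, academic papers, etc.


@dataclass
class OrganizationObservation:
    """Any recorded reference to an organization, emic or etic."""
    observed_name: str
    source: str
    perspective: Perspective


@dataclass
class OrganizationName(OrganizationObservation):
    """Standardized emic name accepted by the organization itself."""
    standardized_name: str = ""
    endorsement_source: Optional[str] = None
    valid_from: Optional[str] = None  # ISO 8601 date string

    def __post_init__(self) -> None:
        # A standardized name is by definition an insider (emic) name.
        if self.perspective is not Perspective.EMIC:
            raise ValueError("OrganizationName must be emic")


observations = [
    OrganizationObservation("Rijks", "website", Perspective.EMIC),
    OrganizationObservation("The Rijksmuseum in Amsterdam", "guidebook", Perspective.ETIC),
    OrganizationObservation("Rijksmuseum", "ISIL registry", Perspective.EMIC),
]

official = OrganizationName(
    observed_name="Rijksmuseum",
    source="organization website",
    perspective=Perspective.EMIC,
    standardized_name="Rijksmuseum",
    endorsement_source="https://www.rijksmuseum.nl/en/about-us",
    valid_from="2013-04-13",
)

etic_names = [o.observed_name for o in observations if o.perspective is Perspective.ETIC]
print(etic_names)  # ['The Rijksmuseum in Amsterdam']
```

Making `OrganizationName` a subclass keeps it usable anywhere an observation is expected, while the `__post_init__` check enforces that it is never etic.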

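3. Conceptual Correction #2: Legal Name vs. Legal Form

Issue Identified: The schema conflated three separate concepts (operational name, legal registered name, legal form) into two fields (legal_name and legal_form).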

Three-Way Distinction Established:

# 1. Operational Name (OrganizationName.standardized_name)
standardized_name: "Rijksmuseum"
# Used in: website, signage, marketing, PR, daily operations

# 2. Legal Registered Name (org:legalName)
legal_name: "Stichting Rijksmuseum"
# Used in: legal documents, statutes, KvK registry, contracts

# 3. Legal Form Code (org:classification)
legal_form: "V44D"
# ISO 20275 ELF code for Dutch stichting (foundation)
# Reference: /data/ontology/2023-09-28-elf-code-list-v1.5.csv

Real-World Examples:

| Institution | Operational Name | Legal Registered Name | Legal Form |
|---|---|---|---|
| Rijksmuseum | Rijksmuseum | Stichting Rijksmuseum | V44D (NL stichting) |
| Getty Museum | Getty Museum | J. Paul Getty Trust | (US trust) |
| British Museum | British Museum | The Trustees of the British Museum | 9HLU (UK charity) |
| BnF | Bibliothèque nationale de France | Établissement public BnF | 5RDO (FR établissement public) |

Why This Matters:

  • Before: "Stichting Rijksmuseum" was ambiguous (is "Stichting" part of the name or just the legal form?)
  • After: Clear separation - operational name, legal name, legal form code
  • 🌍 International: ISO 20275 enables cross-country comparisons (Dutch stichting ≠ US foundation ≠ UK charity)
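
The three-way distinction can be pictured as a record with three independent fields. The sketch below is illustrative: `LegalIdentity` is a hypothetical type name, and the rows come from the real-world examples above. The Getty entry keeps `None` for the code because this summary does not give one.

```python
from typing import NamedTuple, Optional


class LegalIdentity(NamedTuple):
    operational_name: str        # OrganizationName.standardized_name
    legal_name: str              # org:legalName
    legal_form: Optional[str]    # org:classification (ISO 20275 ELF code)


rows = [
    LegalIdentity("Rijksmuseum", "Stichting Rijksmuseum", "V44D"),
    LegalIdentity("Getty Museum", "J. Paul Getty Trust", None),  # code not given in this summary
    LegalIdentity("British Museum", "The Trustees of the British Museum", "9HLU"),
    LegalIdentity("Bibliothèque nationale de France", "Établissement public BnF", "5RDO"),
]

for row in rows:
    # Each value is stored separately; none is inferred from another.
    differs = row.operational_name != row.legal_name
    print(f"{row.operational_name!r}: legal name differs={differs}, form={row.legal_form}")
```

In every row the operational name differs from the legal registered name, which is why a single name field cannot carry both.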

4. ISO 20275 ELF Integration

Created: schemas/20251121/ISO_20275_ELF_MAPPING.md

Content:

  • Overview of ISO 20275 Entity Legal Forms standard
  • Netherlands: 21 legal forms for heritage institutions
  • Most common for heritage custodians:
    • V44D - stichting (foundation) - MOST COMMON
    • 33MN - vereniging met volledige rechtsbevoegdheid (association)
    • A0W7 - publiekrechtelijke rechtspersoon (public entity)
    • L7HX - kerkgenootschap (religious organization)
  • International examples (France, Germany, Italy, Spain, UK, USA, Japan, etc.)
  • Migration guide from generic enums to ISO 20275
  • W3C Org Ontology alignment

Benefits:

  • International standard (ISO)
  • Machine-readable codes
  • Disambiguates similar forms across countries
  • Supports cross-border operations
  • Maintained by GLEIF (updated annually)

5. Example File Updates

Updated: schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml

Enhancements:

  1. Shows 5 observations (emic AND etic sources):

    • Emic: website ("Rijks"), ISIL registry ("Rijksmuseum"), KvK ("Stichting Rijksmuseum")
    • Etic: guidebook ("The Rijksmuseum in Amsterdam"), academic paper ("Rijksmuseum Amsterdam")
  2. Demonstrates OrganizationName (standardized emic):

    • standardized_name: "Rijksmuseum"
    • endorsement_source: "https://www.rijksmuseum.nl/en/about-us"
    • valid_from: "2013-04-13" (reopening after renovation)
  3. Shows three-way distinction in reconstruction:

    # Operational name (emic): "Rijksmuseum"
    # Legal registered name: "Stichting Rijksmuseum"
    # Legal form code: "V44D" (ISO 20275)
    
  4. Complete provenance chain:

    • Multiple observations → Standardized name → Reconstruction entity
    • Tracks entity resolution activity with PROV-O
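
The provenance chain in item 4 can be sketched as a handful of PROV-O derivation links. This is a simplified illustration under stated assumptions: the `http://example.org/` base IRI and all identifiers are hypothetical, only `prov:wasDerivedFrom` is used, and the actual example file may model the entity resolution activity differently.

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical base IRI


def derivation_triples(observation_ids, name_id, reconstruction_id):
    """Return (subject, predicate, object) IRIs linking the chain with prov:wasDerivedFrom."""
    triples = [(EX + name_id, PROV + "wasDerivedFrom", EX + obs) for obs in observation_ids]
    triples.append((EX + reconstruction_id, PROV + "wasDerivedFrom", EX + name_id))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


chain = derivation_triples(
    ["obs/website", "obs/isil", "obs/kvk", "obs/guidebook", "obs/paper"],
    "name/rijksmuseum",
    "reconstruction/stichting-rijksmuseum",
)
print(to_ntriples(chain))
```

Five observation links plus one name-to-reconstruction link give six statements, mirroring the observations → standardized name → reconstruction flow.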

📊 Schema Changes Summary

Files Modified

  1. schemas/20251121/linkml/02_organization_observation_reconstruction.yaml

    • Updated OrganizationObservation description (emic/etic)
    • Added OrganizationName subclass with 7 new slots
    • Enhanced legal_name documentation (distinct from operational name)
    • Enhanced legal_form documentation (ISO 20275 ELF codes)
    • Triple count: 1,288 → 1,463 (+175 triples)
  2. schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml

    • Complete rewrite showing emic + etic observations
    • Added OrganizationName example
    • Shows three-way distinction (operational/legal name/legal form)

Files Created

  1. schemas/20251121/ISO_20275_ELF_MAPPING.md

    • Complete guide to ISO 20275 Entity Legal Forms
    • Netherlands: 21 legal forms documented
    • International examples
    • Migration strategy
  2. schemas/20251121/rdf/*.{ttl,owl.ttl,rdf,nt,n3,jsonld,trig} (14 files)

    • RDF ontologies in 7 formats
    • Ready for SPARQL querying, triple stores, Linked Data
  3. schemas/20251121/rdf/README.md

    • Usage guide for RDF formats
    • SPARQL query examples
    • Integration patterns
  4. schemas/20251121/RDF_GENERATION_SUMMARY.md

    • Complete generation report
    • Statistics and validation
  5. schemas/20251121/CONCEPTUAL_CORRECTION_2025-11-21.md

    • Detailed documentation of both corrections
    • Before/after comparisons
    • Rationale and examples
  6. schemas/20251121/SESSION_SUMMARY_2025-11-21_ONTOLOGY_CORRECTIONS.md (this file)


🔑 Key Conceptual Clarifications

Clarification 1: Observations Are Not Exclusively Emic

Wrong assumption: "OrganizationObservation = emic only"

Correct understanding:

  • OrganizationObservation = ANY recorded reference (emic OR etic)
  • OrganizationName (subclass) = Standardized EMIC name accepted by organization
  • OrganizationReconstruction = Formal entity derived from ALL observations

Why it matters: Captures full spectrum of naming variation (insider + outsider perspectives)

Clarification 2: Three-Way Name/Form Distinction

Wrong assumption: "legal_name + legal_form covers everything"

Correct understanding:

  1. Operational name (OrganizationName) - Daily use
  2. Legal registered name (org:legalName) - Legal documents
  3. Legal form code (org:classification) - ISO 20275 classification

Why it matters:

  • Operational ≠ Legal (e.g., "Rijksmuseum" ≠ "Stichting Rijksmuseum")
  • Legal name ≠ Legal form (e.g., "Stichting Rijksmuseum" ≠ "V44D")
  • International comparisons (Dutch stichting ≠ US foundation ≠ UK charity)

Clarification 3: ISO 20275 Over Generic Enums

Wrong approach: Generic legal form enums (STICHTING, NGO, GOVERNMENT_AGENCY)

Correct approach: ISO 20275 ELF codes (V44D, A0W7, 5RDO, 9HLU)

Why it matters:

  • International standard (recognized globally)
  • Machine-readable (structured codes)
  • Disambiguates similar forms across countries
  • Maintained by GLEIF (annual updates)
  • Supports cross-border heritage data exchange

📈 Project Impact

Data Quality Improvements

  1. Precision: Three-way distinction eliminates ambiguity
  2. Interoperability: ISO 20275 enables international data exchange
  3. Completeness: Captures both emic and etic perspectives
  4. Provenance: Tracks naming evolution over time

Ontology Alignment

  1. W3C Org Ontology: org:legalName, org:classification
  2. PROV-O: Provenance tracking for observations and reconstructions
  3. PiCo Pattern: Observation/reconstruction separation
  4. ISO 20275: Global legal entity forms standard

RDF/Linked Data Readiness

  • 1,926 triples across two ontologies
  • 7 serialization formats for broad compatibility
  • SPARQL-ready for querying
  • Triple store integration prepared
  • JSON-LD for web APIs

🚀 Next Steps

Immediate (High Priority)

  1. Migrate LegalFormEnum to an ISO 20275 pattern-constrained string

    • Remove generic enum values
    • Add string pattern: ^[A-Z0-9]{4}$
    • Add SHACL validation shapes
  2. Create country-specific ELF guides

    • France (FR): Établissement public, Association loi 1901, Fondation
    • Germany (DE): Stiftung, Verein, Körperschaft des öffentlichen Rechts
    • UK: Charity, Trust, Public corporation
    • USA: 501(c)(3), Trust, Public institution
  3. Update TypeDB schema with OrganizationName entity

  4. Create data migration script

    • Map existing generic legal forms → ISO 20275
    • Document mapping decisions
    • Preserve original values in provenance
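
Items 1 and 4 above (pattern validation and migration with preserved originals) could look roughly like the sketch below. Everything here is an assumption for illustration: `LEGACY_TO_ELF`, `MigratedLegalForm`, and `migrate` are hypothetical names, and the only mapping shown is the STICHTING to V44D pair that this summary itself supports.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The string pattern proposed above: exactly four uppercase letters or digits.
ELF_PATTERN = re.compile(r"^[A-Z0-9]{4}$")

# Hypothetical mapping from legacy enum values to ELF codes. Only V44D is
# supported by this summary; real mappings need per-country review.
LEGACY_TO_ELF = {"STICHTING": "V44D"}


@dataclass
class MigratedLegalForm:
    legal_form: Optional[str]   # ISO 20275 ELF code, or None if unmapped
    original_value: str         # legacy enum value, kept for provenance
    note: str                   # records the mapping decision


def migrate(legacy_value: str) -> MigratedLegalForm:
    """Map a legacy enum value to an ELF code without discarding the original."""
    code = LEGACY_TO_ELF.get(legacy_value)
    if code is None:
        return MigratedLegalForm(None, legacy_value, "no mapping; needs manual review")
    if not ELF_PATTERN.fullmatch(code):
        raise ValueError(f"malformed ELF code in mapping: {code!r}")
    return MigratedLegalForm(code, legacy_value, "mapped via LEGACY_TO_ELF")


print(migrate("STICHTING"))
print(migrate("NGO"))
```

Values such as NGO are deliberately left unmapped rather than guessed, since a generic label can correspond to different legal forms in different countries.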

Medium Priority

  1. Update diagrams (Mermaid, PlantUML)

    • Show OrganizationName subclass
    • Illustrate three-way distinction
    • Add emic/etic observation examples
  2. Create SHACL validation shapes

    • Validate ISO 20275 code format
    • Cross-reference against GLEIF CSV
    • Warn on unrecognized codes
  3. Extend examples

    • International institutions (France, Germany, Japan, USA)
    • Different legal forms (public entities, NGOs, trusts)
    • Historical name changes
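
Until SHACL shapes exist, the cross-reference check in item 2 can be prototyped in plain Python. This is a sketch under assumptions: the column names in `SAMPLE_CSV` are placeholders and should be checked against the real header of the GLEIF code list, and `load_known_codes` and `check_code` are hypothetical helper names.

```python
import csv
import io
import warnings

# In-memory stand-in for the GLEIF code list. Column names are assumptions
# and should be verified against the actual CSV header.
SAMPLE_CSV = """ELF Code,Country Code,Local Name
V44D,NL,Stichting
33MN,NL,Vereniging met volledige rechtsbevoegdheid
"""


def load_known_codes(csv_file, code_column="ELF Code"):
    """Read the set of ELF codes from an open CSV file object."""
    return {row[code_column].strip() for row in csv.DictReader(csv_file)}


def check_code(code, known_codes):
    """Return True if the code is in the list; warn (do not fail) otherwise."""
    if code in known_codes:
        return True
    warnings.warn(f"Unrecognized ELF code: {code}", stacklevel=2)
    return False


known = load_known_codes(io.StringIO(SAMPLE_CSV))
print(check_code("V44D", known))  # True
```

Emitting a warning instead of raising matches the "warn on unrecognized codes" intent: a new or regional code should be flagged for review, not block ingestion.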

Documentation

  1. Update main README

    • Explain three-way distinction
    • Reference ISO 20275 mapping
    • Update conceptual model diagram
  2. Create ontology alignment guide

    • Map to CIDOC-CRM (museums)
    • Map to RiC-O (archives)
    • Map to BIBFRAME (libraries)
  3. Write data curator handbook

    • How to determine operational vs legal name
    • How to find ISO 20275 codes
    • When to create OrganizationName entries

📚 Reference Files

Core Schema

  • schemas/20251121/linkml/01_name_entity.yaml (unchanged)
  • schemas/20251121/linkml/02_organization_observation_reconstruction.yaml (updated)

RDF Ontologies

  • schemas/20251121/rdf/01_name_entity.* (7 formats)
  • schemas/20251121/rdf/02_organization_observation_reconstruction.* (7 formats)

Documentation

  • schemas/20251121/ISO_20275_ELF_MAPPING.md (NEW)
  • schemas/20251121/CONCEPTUAL_CORRECTION_2025-11-21.md (NEW)
  • schemas/20251121/RDF_GENERATION_SUMMARY.md (NEW)
  • schemas/20251121/rdf/README.md (NEW)

Examples

  • schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml (updated)

Data Reference

  • /data/ontology/2023-09-28-elf-code-list-v1.5.csv (2,200+ global legal forms)

🎓 Key Learnings

  1. Ontology design requires precise terminology

    • "Emic/etic" distinction critical for anthropological accuracy
    • "Legal name" vs "legal form" vs "operational name" are THREE concepts
  2. International standards essential for heritage data

    • ISO 20275 enables cross-country comparisons
    • Generic enums (STICHTING, NGO) don't translate internationally
  3. Observation-reconstruction pattern is powerful

    • Captures multiple perspectives (emic + etic)
    • Tracks temporal evolution
    • Enables entity resolution from diverse sources
  4. RDF generation from LinkML works well

    • 7 formats generated automatically
    • Minor cleanup needed (removing WARNING lines from the output)
    • Ready for Linked Data integration
  5. Documentation is critical

    • Before/after comparisons clarify changes
    • Real-world examples make abstract concepts concrete
    • Migration guides essential for adoption

🤝 Collaboration Notes

For Next Agent/Session

Priority actions:

  1. Migrate LegalFormEnum to ISO 20275 (update schema, add validation)
  2. Create country-specific ELF code guides (France, Germany, UK, USA)
  3. Update TypeDB schema with OrganizationName entity
  4. Create data migration script (generic enums → ISO 20275)

Context to remember:

  • ISO 20275 CSV has 2,200+ codes across 200+ countries
  • Netherlands has 21 legal forms, V44D (stichting) most common for heritage
  • Three-way distinction is NON-NEGOTIABLE (operational/legal name/legal form)
  • OrganizationObservation = emic OR etic (not exclusively emic!)

Files to review:

  • schemas/20251121/ISO_20275_ELF_MAPPING.md - Complete ELF guide
  • schemas/20251121/CONCEPTUAL_CORRECTION_2025-11-21.md - Detailed rationale
  • schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml - Working example

Version: 1.0.0
Session Date: 2025-11-21
Total Session Time: ~2 hours
Files Created/Modified: 15 files
RDF Triples Generated: 1,926 triples

Status: COMPLETE - Ready for next phase (ELF migration + TypeDB update)


End of session summary