
Session Summary - 2025-11-21: Observation vs Reconstruction Pattern

Date: November 21, 2025
Focus: Integrating PiCo pattern for emic/etic distinction in heritage organization modeling
Status: COMPLETE - Major design revision incorporating PiCo insights


🔄 Major Design Revision: Observation vs Reconstruction

Critical Insight from User

User pointed out that emic names (self-references by organizations) and etic spellings/abbreviations/translations should be distinguished from formal legal entities.

Referenced PiCo (Persons in Context) ontology (data/ontology/pico.ttl) which uses:

  • pico:PersonObservation - Person as recorded in source (emic, vernacular)
  • pico:PersonReconstruction - Person entity inferred from observations (etic, formal)

This pattern perfectly matches our heritage organization needs!


📋 What Changed

Before (Session Start)

Single Name entity with direct links to Place/Organization/Collection entities:

heritage:Name
  refers_to_organization  heritage:Organization  # ❌ Too simplistic

Problem: Didn't distinguish between:

  • Emic names (insider perspective: "Rijks", "BnF", vernacular abbreviations)
  • Etic entities (outsider perspective: "Stichting Rijksmuseum", legal forms)

After (PiCo Pattern Integration)

Two-level structure with observation → reconstruction chain:

# LEVEL 1: Observation (emic, source-based)
heritage:OrganizationObservation
  - observed_name: "Rijks"  # Vernacular abbreviation
  - source: letterhead document
  - (inverse of prov:wasDerivedFrom)  OrganizationReconstruction  # The reconstruction derives from this observation

# LEVEL 2: Reconstruction (etic, legal entity)
heritage:OrganizationReconstruction
  - legal_name: "Stichting Rijksmuseum"  # Official legal name
  - legal_form: STICHTING
  - registration_number: "NL-KvK-41208408"
  - prov:wasDerivedFrom  OrganizationObservation(s)

# NAME ENTITIES: Link to observations, NOT reconstructions
heritage:Name
  refers_to_organization_observation  heritage:OrganizationObservation  # ✅ Correct!

Key Change: Names link to observations (emic references), not entities (etic legal forms).


📂 Files Created

1. LinkML Schema: Observation-Reconstruction Pattern

File: schemas/20251121/linkml/02_organization_observation_reconstruction.yaml

Content:

  • 3 main classes:
    • Organization (abstract base)
    • OrganizationObservation (emic, source-based references)
    • OrganizationReconstruction (etic, legal entities)
  • 2 provenance classes:
    • ReconstructionActivity (entity resolution process)
    • Agent (responsible curator/software)
  • 4 enums:
    • LegalFormEnum (STICHTING, NGO, GOVERNMENT_AGENCY, etc.)
    • LegalStatusEnum (ACTIVE, DISSOLVED, MERGED, etc.)
    • ReconstructionActivityTypeEnum (MANUAL_CURATION, ALGORITHMIC_MATCHING, HYBRID)
    • AgentTypeEnum (PERSON, ORGANIZATION, SOFTWARE)

Lines: ~650 lines of comprehensive LinkML schema

Key Design Patterns:

  • Required provenance: prov:hadPrimarySource for observations, prov:wasDerivedFrom for reconstructions
  • Confidence scoring: Observations include 0.0-1.0 confidence scores
  • Temporal tracking: valid_from/valid_to for historical name changes
  • Multi-observation → single entity: One reconstruction can be derived from many observations
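
The first two constraints above can be sketched as a small validation routine. This is a minimal sketch, not the schema's actual validation logic: the `Observation` dataclass and its field names (`had_primary_source`, `confidence_score`) are hypothetical stand-ins for the LinkML slots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical stand-in for OrganizationObservation."""
    observed_name: str
    had_primary_source: Optional[str]  # mirrors prov:hadPrimarySource
    confidence_score: float            # expected range: 0.0-1.0


def validate_observation(obs: Observation) -> List[str]:
    """Return a list of human-readable problems (empty list means valid)."""
    problems = []
    if not obs.had_primary_source:
        problems.append("missing primary source")
    if not 0.0 <= obs.confidence_score <= 1.0:
        problems.append("confidence_score outside 0.0-1.0")
    return problems


good = Observation("Rijks", "letterhead.pdf", 0.98)
bad = Observation("Rijks", None, 1.2)
```

Returning a list of problems rather than raising on the first one lets a curator see every issue with an observation in a single pass.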

2. Example: Rijksmuseum Case Study

File: schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml

Content:

  • 5 OrganizationObservations:

    1. "Rijks" (vernacular abbreviation, letterhead, 2015)
    2. "Rijksmuseum Amsterdam" (ISIL registry, 2020)
    3. "Rijksmuseum" (English website, 2024)
    4. "Nationale Kunst-Gallerij" (founding name, 1800)
    5. "Stichting Rijksmuseum" (KvK legal name, 2024)
  • 1 OrganizationReconstruction:

    • Legal name: "Stichting Rijksmuseum"
    • Legal form: STICHTING (Dutch foundation)
    • Registration: NL-KvK-41208408
    • Identifiers: ISIL NL-AmRMA, Wikidata Q190804, VIAF 148691498
  • 1 ReconstructionActivity:

    • Method: Hybrid (algorithmic + manual curation)
    • Sources: ISIL registry, Wikidata, KvK, archival documents
    • Agent: GLAM Ontology Project
  • 4 Name Entities:

    • "Rijks" → links to letterhead observation
    • "Rijksmuseum" → links to ISIL/website observations
    • "Stichting Rijksmuseum" → links to KvK observation
    • "Nationale Kunst-Gallerij" → links to historical observation (1800)

Lines: ~300 lines of detailed example with extensive annotations


🔑 Key Insights

1. Emic vs Etic Distinction

| Aspect | Emic (Observation) | Etic (Reconstruction) |
|---|---|---|
| Perspective | Insider ("how we call ourselves") | Outsider ("what is the legal entity") |
| Examples | "Rijks", "BnF", "Hermitage" | "Stichting Rijksmuseum", "Établissement public Bibliothèque nationale de France" |
| Source | Letterheads, websites, vernacular usage | Legal registries (KvK, Companies House, etc.) |
| Stability | Variable (nicknames change over time) | Stable (legal name persists until formal change) |
| Multiplicity | Many observations → one entity | One entity ← many observations |

2. Name Entity Integration

CRITICAL: Names link to observations, NOT reconstructions!

# ✅ CORRECT
heritage:Name "Rijks"
  refers_to_organization_observation  OrganizationObservation (letterhead)
     ← prov:wasDerivedFrom  OrganizationReconstruction (Stichting Rijksmuseum)

# ❌ WRONG
heritage:Name "Rijks"
  refers_to_organization  OrganizationReconstruction (Stichting Rijksmuseum)

Rationale: Names are emic references (how organizations are referred to in sources), not formal entity identifiers. The chain is:

Name (nominal reference)
  ↓ refers_to_organization_observation
OrganizationObservation (emic, source-based)
  ↑ prov:wasDerivedFrom (asserted on the reconstruction, pointing back to the observation)
OrganizationReconstruction (etic, legal entity)

Important distinction:

  • Legal form (e.g., "Stichting") = Part of OrganizationReconstruction.legal_form
  • Emic name (e.g., "Rijks") = Part of OrganizationObservation.observed_name

These are DIFFERENT concepts:

  • "Stichting Rijksmuseum" is the legal name (etic, formal)
  • "Rijks" is the vernacular name (emic, informal)
  • Both refer to the same entity, but from different perspectives

4. Temporal Name Changes

Organizations change names over time:

  • 1800: "Nationale Kunst-Gallerij" (founding)
  • 1808: "'s Rijks Museum" (rename)
  • 2024: "Rijks" (vernacular), "Stichting Rijksmuseum" (legal)

Solution:

  • Create separate OrganizationObservation for each historical name
  • Use valid_from/valid_to on Name entities to track temporal validity
  • Use replaces/replaced_by properties for name succession chains
  • OrganizationReconstruction remains stable entity across name changes
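
The valid_from/valid_to approach can be illustrated with a lookup that returns the names in use for a given year. This is a sketch under stated assumptions: the `Name` dataclass is hypothetical, validity is tracked at year granularity, intervals are treated as half-open (a name stops being valid in the year its successor starts), and the 1808 end date of the founding name is inferred from the rename year in the timeline above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Name:
    """Hypothetical Name entity with a validity interval in years."""
    label: str
    valid_from: int
    valid_to: Optional[int] = None  # None means no end date is recorded


def names_valid_in(names: List[Name], year: int) -> List[str]:
    """Return the labels whose validity interval covers the given year."""
    return [
        n.label
        for n in names
        if n.valid_from <= year and (n.valid_to is None or year < n.valid_to)
    ]


names = [
    Name("Nationale Kunst-Gallerij", 1800, 1808),
    Name("'s Rijks Museum", 1808),  # end date left open in this sketch
]
```

With half-open intervals, exactly one of the two names is returned for 1808, so a succession chain never yields two formal names for the same year.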

5. Provenance Chain

Every OrganizationReconstruction MUST document:

  1. Source observations: prov:wasDerivedFrom OrganizationObservation(s)
  2. Creation activity: prov:wasGeneratedBy ReconstructionActivity
  3. Responsible agent: Activity links to Agent (person/organization/software)
  4. Method justification: Activity includes rationale for entity resolution

This provides full transparency in how entities are inferred from observations.
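
The four requirements can be checked mechanically. The following is a minimal sketch, assuming hypothetical `Reconstruction` and `Activity` dataclasses whose fields mirror the PROV-O properties named above; the justification text in the example is an illustrative placeholder, not taken from the actual dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """Hypothetical stand-in for ReconstructionActivity."""
    agent: Optional[str] = None          # mirrors prov:wasAssociatedWith
    justification: Optional[str] = None  # rationale for the entity resolution


@dataclass
class Reconstruction:
    """Hypothetical stand-in for OrganizationReconstruction."""
    legal_name: str
    was_derived_from: List[str] = field(default_factory=list)  # observation ids
    was_generated_by: Optional[Activity] = None


def missing_provenance(rec: Reconstruction) -> List[str]:
    """List which of the four required provenance items are absent."""
    missing = []
    if not rec.was_derived_from:
        missing.append("source observations")
    if rec.was_generated_by is None:
        # Agent and justification hang off the activity, so they are absent too.
        missing.extend(["creation activity", "responsible agent", "method justification"])
        return missing
    if not rec.was_generated_by.agent:
        missing.append("responsible agent")
    if not rec.was_generated_by.justification:
        missing.append("method justification")
    return missing


complete = Reconstruction(
    "Stichting Rijksmuseum",
    ["obs-letterhead-2015"],
    Activity(agent="GLAM Ontology Project", justification="placeholder rationale"),
)
bare = Reconstruction("Stichting Rijksmuseum")
```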


🎯 Design Patterns Established

Pattern 1: Multiple Observations → Single Entity

# Many observations (emic names)
observations:
  - "Rijks" (vernacular)
  - "Rijksmuseum Amsterdam" (ISIL registry)
  - "Rijksmuseum" (website)
  - "Stichting Rijksmuseum" (KvK legal)

# Derive single entity (etic legal form)
reconstruction:
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [all observations above]

Pattern 2: Name → Observation → Entity Chain

# Step 1: Name (nominal reference)
Name:
  prefLabel: "Rijks"
  refers_to_organization_observation: obs-letterhead-2015

# Step 2: Observation (emic, source-based)
OrganizationObservation:
  id: obs-letterhead-2015
  observed_name: "Rijks"
  source: letterhead.pdf
  derived_from_entity: org-rijksmuseum

# Step 3: Entity (etic, legal form)
OrganizationReconstruction:
  id: org-rijksmuseum
  legal_name: "Stichting Rijksmuseum"
  legal_form: STICHTING
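
The three steps above can be followed programmatically. This sketch assumes the records are held in plain dictionaries keyed by id; the storage layout and the `resolve_legal_name` helper are hypothetical, while the slot names and ids are the ones used in the pattern.

```python
from typing import Dict, Optional

# Records keyed by id, mirroring the three steps of the pattern.
names: Dict[str, dict] = {
    "Rijks": {"refers_to_organization_observation": "obs-letterhead-2015"}
}
observations: Dict[str, dict] = {
    "obs-letterhead-2015": {
        "observed_name": "Rijks",
        "source": "letterhead.pdf",
        "derived_from_entity": "org-rijksmuseum",
    }
}
reconstructions: Dict[str, dict] = {
    "org-rijksmuseum": {"legal_name": "Stichting Rijksmuseum", "legal_form": "STICHTING"}
}


def resolve_legal_name(label: str) -> Optional[str]:
    """Follow Name -> Observation -> Reconstruction; None if any link is missing."""
    name = names.get(label)
    if name is None:
        return None
    obs = observations.get(name["refers_to_organization_observation"])
    if obs is None:
        return None
    rec = reconstructions.get(obs.get("derived_from_entity"))
    return rec["legal_name"] if rec else None
```

The name never touches the reconstruction directly: removing the observation record breaks the chain, which is the behavior the pattern intends.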

Pattern 3: Confidence Scoring

OrganizationObservation:
  observed_name: "Rijks"
  source: letterhead.pdf
  confidence_score: 0.98  # High confidence (authoritative source)
  
OrganizationObservation:
  observed_name: "Nationale Kunst-Gallerij"
  source: archival-decree-1800.pdf
  confidence_score: 0.95  # Slightly lower (historical interpretation required)

Pattern 4: Legal Form Enumeration

legal_form: STICHTING          # Dutch foundation
legal_form: NGO                # Non-governmental organization
legal_form: GOVERNMENT_AGENCY  # Government department
legal_form: ASSOCIATION        # Vereniging
legal_form: LIMITED_COMPANY    # BV, Ltd, etc.
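
In application code, the legal form values listed above could be mirrored as a Python enum so that unknown values are rejected early. This is a sketch only: `LegalForm` and `parse_legal_form` are hypothetical names, and the enum contains just the five values shown here, not the full LegalFormEnum from the schema.

```python
from enum import Enum


class LegalForm(Enum):
    """Hypothetical Python mirror of LegalFormEnum (listed values only)."""
    STICHTING = "STICHTING"
    NGO = "NGO"
    GOVERNMENT_AGENCY = "GOVERNMENT_AGENCY"
    ASSOCIATION = "ASSOCIATION"
    LIMITED_COMPANY = "LIMITED_COMPANY"


def parse_legal_form(raw: str) -> LegalForm:
    """Normalize free-text input and reject values outside the enumeration."""
    key = raw.strip().upper().replace(" ", "_")
    try:
        return LegalForm[key]
    except KeyError:
        raise ValueError(f"unknown legal form: {raw!r}") from None
```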

🔬 Ontology Alignments

PiCo (Persons in Context)

| PiCo / PROV-O Class | Heritage Equivalent | Purpose |
|---|---|---|
| pico:Person | heritage:Organization | Abstract base class |
| pico:PersonObservation | heritage:OrganizationObservation | Emic references |
| pico:PersonReconstruction | heritage:OrganizationReconstruction | Etic entities |
| prov:Activity | heritage:ReconstructionActivity | Entity resolution process |
| prov:Agent | heritage:Agent | Responsible curator/software |

PROV-O (Provenance Ontology)

  • prov:Entity - Base class for Organization
  • prov:hadPrimarySource - Links observation to source document
  • prov:wasDerivedFrom - Links reconstruction to observations
  • prov:wasGeneratedBy - Links reconstruction to activity
  • prov:wasAssociatedWith - Links activity to agent
  • prov:wasRevisionOf - Links updated reconstruction to previous version

CPOV (Core Public Organisation Vocabulary)

  • cpov:legalName - Official legal name in reconstruction
  • cpov:identifier - Formal identifiers (KvK, ISIL, etc.)
  • cpov:PublicOrganisation - Class URI for government agencies

W3C ORG (Organization Ontology)

  • org:classification - Legal form of organization
  • org:subOrganizationOf - Parent organization hierarchy

📊 Comparison: Before vs After

| Aspect | Before (Session Start) | After (PiCo Integration) |
|---|---|---|
| Name modeling | Single Name class links to entities | Name links to observations, not entities |
| Organization types | Single Organization class | Two classes: Observation + Reconstruction |
| Emic/Etic | Not distinguished | Explicitly modeled (observation vs reconstruction) |
| Legal forms | Undefined | Enumerated (STICHTING, NGO, etc.) |
| Provenance | Basic source tracking | Full PROV-O chain with activities |
| Temporal names | Unclear | Explicit temporal validity + succession |
| Confidence | None | Observation-level confidence scores |
| Source linking | Optional | Required (prov:hadPrimarySource) |

🚀 Next Steps (Updated)

Immediate (Session 3 - HIGH PRIORITY)

  1. Update Name Entity Schema (01_name_entity.yaml)

    • Change refers_to_organization to refers_to_organization_observation
    • Range: OrganizationObservation (not OrganizationReconstruction)
    • Update documentation to explain observation → reconstruction chain
  2. Create Diagrams for Observation-Reconstruction Pattern

    • Mermaid diagram: Class relationships
    • PlantUML diagram: Full UML 2.5 with annotations
    • TypeQL schema: TypeDB implementation with reasoning rules
    • RDF/OWL ontology: Turtle serialization with SHACL constraints
  3. Extract Hypernym Taxonomy (unchanged from previous plan)

    • Parse hyponyms_curated.yaml for unique hypernyms
    • Map hypernyms to OrganizationObservation types (building, museum, archive, etc.)

Medium-Term (This Week)

  1. Create Place Entity Module (03_place_entity.yaml)

    • Physical locations (sites, buildings)
    • Temporal validity (construction → demolition)
    • Link to OrganizationObservation (organizations occupy places)
  2. Create Collection Entity Module (04_collection_entity.yaml)

    • Heritage materials (archival, museum, library collections)
    • Accession/deaccession tracking
    • Custody relationships (which organization holds which collection)
  3. Batch Conversion Script for Wikidata Entities

    • Input: hyponyms_curated_full.yaml (2,453 entities)
    • Output: OrganizationObservation instances
    • Logic: Infer observation type from Wikidata entity type (Q33506 museum → museum observation)
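
The inference step of the batch conversion could look roughly like the sketch below. The record fields (`qid`, `label`, `instance_of`), the output keys, and the `"unknown"` fallback are assumptions made for illustration, since the structure of hyponyms_curated_full.yaml is not described here; only the Q33506 → museum mapping comes from the plan above.

```python
from typing import Dict, List

# Only Q33506 (museum) comes from the plan; other types would be added later.
WIKIDATA_TYPE_TO_OBSERVATION_TYPE: Dict[str, str] = {"Q33506": "museum"}


def to_observation(entity: dict) -> dict:
    """Convert one curated entity record into an observation-shaped dict."""
    obs_type = WIKIDATA_TYPE_TO_OBSERVATION_TYPE.get(entity.get("instance_of"), "unknown")
    return {
        "observed_name": entity["label"],
        "observation_type": obs_type,
        "source": f"wikidata:{entity['qid']}",
    }


def convert_batch(entities: List[dict]) -> List[dict]:
    """Apply the conversion to every record, preserving input order."""
    return [to_observation(e) for e in entities]


sample = [
    {"qid": "Q190804", "label": "Rijksmuseum", "instance_of": "Q33506"},
    {"qid": "Q-PLACEHOLDER", "label": "Example", "instance_of": "Q-UNMAPPED"},
]
```

Falling back to `"unknown"` instead of raising keeps a 2,453-entity batch from aborting on the first unmapped type; the unmapped records can then be reviewed separately.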

📝 Documentation Updates Needed

  1. Update schemas/20251121/README.md

    • Add section on "Observation vs Reconstruction Pattern"
    • Explain emic/etic distinction
    • Add Rijksmuseum example walkthrough
  2. Create docs/OBSERVATION_RECONSTRUCTION_PATTERN.md

    • Comprehensive guide to the pattern
    • Use cases and anti-patterns
    • Comparison with PiCo
    • Implementation examples in all 5 formats (LinkML, Mermaid, PlantUML, TypeQL, RDF)
  3. Update AGENTS.md

    • Add instructions for extracting observations from sources
    • Distinguish observation extraction (emic) from entity resolution (etic)
    • Provide prompts for confidence score assignment

🎓 Key Learnings

1. Domain Experts Know Best

PiCo developers (CBG|Center for Family History, NIOD, IISH) spent years refining the observation/reconstruction distinction for historical person data. Reusing their pattern saves us from reinventing the wheel and ensures alignment with established heritage informatics practices.

2. Emic/Etic is Fundamental

The emic (insider) vs etic (outsider) distinction from anthropology is fundamental to heritage data modeling:

  • Emic: How organizations refer to themselves (vernacular, culturally specific)
  • Etic: How authorities classify organizations (legal, internationally standardized)

Both perspectives are equally valid and must coexist in the ontology.

3. Names Are NOT Entities

Critical insight: Names are appellations (CIDOC-CRM E41_Appellation), not entities. They:

  • Reference observations (how things are called)
  • Do NOT directly reference entities (what things are)
  • Have temporal validity (names change over time)
  • Are culturally/linguistically specific

4. Provenance is Mandatory

Every entity reconstruction MUST document:

  • Which observations it derives from (prov:wasDerivedFrom)
  • How it was created (prov:wasGeneratedBy)
  • Who created it (prov:wasAssociatedWith)
  • Why decisions were made (justification)

Without provenance, reconstructions are unverifiable and untrustworthy.


Session Status

Status: COMPLETE
Major Achievement: Integrated PiCo observation/reconstruction pattern into heritage organization ontology
Files Created: 2 (schema + example)
Lines Written: ~950 lines
Design Patterns Established: 4 (multi-observation → entity, name chain, confidence scoring, legal form enumeration)

Next Session Focus: Create diagrams + update Name entity schema + extract hypernym taxonomy


Session End Time: 2025-11-21 (active)
Total Session Duration: ~2 hours
Collaboration: User + AI (iterative refinement based on domain expert input)