# Session Summary - 2025-11-21: Observation vs Reconstruction Pattern
**Date**: November 21, 2025

**Focus**: Integrating PiCo pattern for emic/etic distinction in heritage organization modeling

**Status**: ✅ COMPLETE - Major design revision incorporating PiCo insights

---
## 🔄 Major Design Revision: Observation vs Reconstruction
### Critical Insight from User
User pointed out that **emic names** (self-references by organizations) and **etic spellings/abbreviations/translations** should be distinguished from **formal legal entities**.
Referenced **PiCo (Persons in Context) ontology** (`data/ontology/pico.ttl`) which uses:
- `pico:PersonObservation` - Person as recorded in source (emic, vernacular)
- `pico:PersonReconstruction` - Person entity inferred from observations (etic, formal)
This pattern **maps directly** onto our heritage organization needs: names as recorded in sources on one side, the inferred legal entity on the other.

---
## 📋 What Changed
### Before (Session Start)
**Single Name entity** with direct links to Place/Organization/Collection entities:
```turtle
heritage:Name
refers_to_organization heritage:Organization # ❌ Too simplistic
```
**Problem**: Didn't distinguish between:
- Emic names (insider perspective: "Rijks", "BnF", vernacular abbreviations)
- Etic entities (outsider perspective: "Stichting Rijksmuseum", legal forms)
### After (PiCo Pattern Integration)
**Two-level structure** with observation → reconstruction chain:
```turtle
# LEVEL 1: Observation (emic, source-based)
heritage:OrganizationObservation
- observed_name: "Rijks" # Vernacular abbreviation
- source: letterhead document
# cited by OrganizationReconstruction through the reconstruction's prov:wasDerivedFrom
# LEVEL 2: Reconstruction (etic, legal entity)
heritage:OrganizationReconstruction
- legal_name: "Stichting Rijksmuseum" # Official legal name
- legal_form: STICHTING
- registration_number: "NL-KvK-41208408"
- prov:wasDerivedFrom OrganizationObservation(s)
# NAME ENTITIES: Link to observations, NOT reconstructions
heritage:Name
refers_to_organization_observation heritage:OrganizationObservation # ✅ Correct!
```
**Key Change**: Names link to **observations** (emic references), not **entities** (etic legal forms).
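The two-level structure can be sketched in Python as a rough illustration. This is not the LinkML schema itself: the dataclass fields mirror the slot names used in this summary, and the identifiers `obs-letterhead-2015` and `org-rijksmuseum` are illustrative. The point of the sketch is that `Name` holds a reference to an observation only, while the reconstruction lists the observations it was derived from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrganizationObservation:
    """Emic, source-based reference to an organization."""
    id: str
    observed_name: str
    source: str  # document in which the name was observed


@dataclass
class OrganizationReconstruction:
    """Etic legal entity inferred from one or more observations."""
    id: str
    legal_name: str
    legal_form: str
    was_derived_from: List[OrganizationObservation] = field(default_factory=list)


@dataclass
class Name:
    """Nominal reference; points at an observation, never at a reconstruction."""
    pref_label: str
    refers_to_organization_observation: OrganizationObservation


obs = OrganizationObservation("obs-letterhead-2015", "Rijks", "letterhead.pdf")
org = OrganizationReconstruction(
    "org-rijksmuseum", "Stichting Rijksmuseum", "STICHTING", was_derived_from=[obs]
)
name = Name("Rijks", refers_to_organization_observation=obs)
```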

---
## 📂 Files Created
### 1. LinkML Schema: Observation-Reconstruction Pattern
**File**: `schemas/20251121/linkml/02_organization_observation_reconstruction.yaml`
**Content**:
- 3 main classes:
  - `Organization` (abstract base)
  - `OrganizationObservation` (emic, source-based references)
  - `OrganizationReconstruction` (etic, legal entities)
- 2 provenance classes:
  - `ReconstructionActivity` (entity resolution process)
  - `Agent` (responsible curator/software)
- 4 enums:
  - `LegalFormEnum` (STICHTING, NGO, GOVERNMENT_AGENCY, etc.)
  - `LegalStatusEnum` (ACTIVE, DISSOLVED, MERGED, etc.)
  - `ReconstructionActivityTypeEnum` (MANUAL_CURATION, ALGORITHMIC_MATCHING, HYBRID)
  - `AgentTypeEnum` (PERSON, ORGANIZATION, SOFTWARE)
**Lines**: ~650 lines of comprehensive LinkML schema
**Key Design Patterns**:
- Required provenance: `prov:hadPrimarySource` for observations, `prov:wasDerivedFrom` for reconstructions
- Confidence scoring: Observations include 0.0-1.0 confidence scores
- Temporal tracking: `valid_from`/`valid_to` for historical name changes
- Multi-observation → single entity: One reconstruction can be derived from many observations
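As a minimal sketch of the first two patterns, the check below flags an observation that lacks a primary source or carries a confidence score outside 0.0-1.0. The class `ObservationRecord` and the function `validate_observation` are hypothetical names for illustration; in the actual schema these constraints would presumably be expressed in LinkML rather than in hand-written Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ObservationRecord:
    """Hypothetical in-memory form of an OrganizationObservation."""
    observed_name: str
    had_primary_source: Optional[str]  # mirrors prov:hadPrimarySource
    confidence_score: float


def validate_observation(obs: ObservationRecord) -> List[str]:
    """Return a list of human-readable problems; empty means the record passes."""
    errors = []
    if not obs.had_primary_source:
        errors.append("missing prov:hadPrimarySource")
    if not 0.0 <= obs.confidence_score <= 1.0:
        errors.append("confidence_score outside 0.0-1.0")
    return errors


good = ObservationRecord("Rijks", "letterhead.pdf", 0.98)
bad = ObservationRecord("Rijks", None, 1.2)
```

Returning a list of problems rather than raising on the first one lets a batch import report every defect in a record at once.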
### 2. Example: Rijksmuseum Case Study
**File**: `schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml`
**Content**:
- **5 OrganizationObservations**:
  1. "Rijks" (vernacular abbreviation, letterhead, 2015)
  2. "Rijksmuseum Amsterdam" (ISIL registry, 2020)
  3. "Rijksmuseum" (English website, 2024)
  4. "Nationale Kunst-Gallerij" (founding name, 1800)
  5. "Stichting Rijksmuseum" (KvK legal name, 2024)
- **1 OrganizationReconstruction**:
  - Legal name: "Stichting Rijksmuseum"
  - Legal form: STICHTING (Dutch foundation)
  - Registration: NL-KvK-41208408
  - Identifiers: ISIL NL-AmRMA, Wikidata Q190804, VIAF 148691498
- **1 ReconstructionActivity**:
  - Method: Hybrid (algorithmic + manual curation)
  - Sources: ISIL registry, Wikidata, KvK, archival documents
  - Agent: GLAM Ontology Project
- **4 Name Entities**:
  - "Rijks" → links to letterhead observation
  - "Rijksmuseum" → links to ISIL/website observations
  - "Stichting Rijksmuseum" → links to KvK observation
  - "Nationale Kunst-Gallerij" → links to historical observation (1800)
**Lines**: ~300 lines of detailed example with extensive annotations

---
## 🔑 Key Insights
### 1. Emic vs Etic Distinction
| Aspect | Emic (Observation) | Etic (Reconstruction) |
|--------|-------------------|----------------------|
| **Perspective** | Insider ("what we call ourselves") | Outsider ("what is the legal entity") |
| **Examples** | "Rijks", "BnF", "Hermitage" | "Stichting Rijksmuseum", "Établissement public Bibliothèque nationale de France" |
| **Source** | Letterheads, websites, vernacular usage | Legal registries (KvK, Companies House, etc.) |
| **Stability** | Variable (nicknames change over time) | Stable (legal name persists until formal change) |
| **Multiplicity** | Many observations → one entity | One entity ← many observations |
### 2. Name Entity Integration
**CRITICAL**: Names link to **observations**, NOT **reconstructions**!
```turtle
# ✅ CORRECT
heritage:Name "Rijks"
refers_to_organization_observation OrganizationObservation (letterhead)
# the observation is in turn cited by OrganizationReconstruction (Stichting Rijksmuseum) via prov:wasDerivedFrom
# ❌ WRONG
heritage:Name "Rijks"
refers_to_organization OrganizationReconstruction (Stichting Rijksmuseum)
```
**Rationale**: Names are **emic references** (how organizations are referred to in sources), not formal entity identifiers. The chain is:
```
Name (nominal reference)
↓ refers_to_organization_observation
OrganizationObservation (emic, source-based)
↑ prov:wasDerivedFrom (the reconstruction is derived from the observation)
OrganizationReconstruction (etic, legal entity)
```
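The chain can be followed programmatically. The sketch below, which uses plain dictionaries and hypothetical identifiers, resolves a name label to a reconstruction in two hops: first to the observation the name refers to, then to whichever reconstruction lists that observation in `was_derived_from`. The function `resolve_name` is an illustrative helper, not part of the schema.

```python
from typing import Dict, List, Optional

# Name label -> observation id (names never point at reconstructions directly).
NAMES: Dict[str, str] = {"Rijks": "obs-letterhead-2015"}

# Observation id -> observation record.
OBSERVATIONS: Dict[str, Dict[str, str]] = {
    "obs-letterhead-2015": {"observed_name": "Rijks", "source": "letterhead.pdf"},
}

# Reconstruction id -> record listing the observations it was derived from.
RECONSTRUCTIONS: Dict[str, Dict[str, object]] = {
    "org-rijksmuseum": {
        "legal_name": "Stichting Rijksmuseum",
        "was_derived_from": ["obs-letterhead-2015"],
    },
}


def resolve_name(label: str) -> Optional[str]:
    """Follow Name -> Observation -> Reconstruction; None if any hop fails."""
    obs_id = NAMES.get(label)
    if obs_id is None or obs_id not in OBSERVATIONS:
        return None
    for rec_id, rec in RECONSTRUCTIONS.items():
        derived: List[str] = rec["was_derived_from"]  # type: ignore[assignment]
        if obs_id in derived:
            return rec_id
    return None
```

For example, `resolve_name("Rijks")` yields `"org-rijksmuseum"`, while a label with no recorded observation yields `None`.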
### 3. Legal Form vs Emic Name
**Important distinction**:
- **Legal form** (e.g., "Stichting") = Part of `OrganizationReconstruction.legal_form`
- **Emic name** (e.g., "Rijks") = Part of `OrganizationObservation.observed_name`
These are **DIFFERENT concepts**:
- "Stichting Rijksmuseum" is the **legal name** (etic, formal)
- "Rijks" is the **vernacular name** (emic, informal)
- Both refer to the same **entity**, but from different perspectives
### 4. Temporal Name Changes
Organizations change names over time:
- 1800: "Nationale Kunst-Gallerij" (founding)
- 1808: "'s Rijks Museum" (rename)
- 2024: "Rijks" (vernacular), "Stichting Rijksmuseum" (legal)
**Solution**:
- Create separate `OrganizationObservation` for each historical name
- Use `valid_from`/`valid_to` on `Name` entities to track temporal validity
- Use `replaces`/`replaced_by` properties for name succession chains
- `OrganizationReconstruction` remains **stable entity** across name changes
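A minimal sketch of the temporal lookup, assuming year-level precision and half-open validity intervals. `NameRecord` and `names_valid_in` are hypothetical names. The years come from the timeline above; the end year of the 1808 name is not stated there, so the sketch leaves it open rather than guessing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameRecord:
    """Hypothetical stand-in for a Name entity with temporal validity."""
    pref_label: str
    valid_from: int           # year the name came into use
    valid_to: Optional[int]   # year it was replaced; None means open-ended


NAMES = [
    NameRecord("Nationale Kunst-Gallerij", 1800, 1808),
    NameRecord("'s Rijks Museum", 1808, None),  # end year not given in this summary
]


def names_valid_in(year: int, names: List[NameRecord]) -> List[str]:
    """Return labels whose interval [valid_from, valid_to) contains the year."""
    return [
        n.pref_label
        for n in names
        if n.valid_from <= year and (n.valid_to is None or year < n.valid_to)
    ]
```

The half-open interval means a rename year belongs to the new name only, so successive names do not overlap.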
### 5. Provenance Chain
Every `OrganizationReconstruction` MUST document:
1. **Source observations**: `prov:wasDerivedFrom` → `OrganizationObservation`(s)
2. **Creation activity**: `prov:wasGeneratedBy` → `ReconstructionActivity`
3. **Responsible agent**: Activity links to `Agent` (person/organization/software)
4. **Method justification**: Activity includes rationale for entity resolution
This provides **full transparency** in how entities are inferred from observations.
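The four requirements can be checked mechanically. The sketch below assumes a reconstruction is held as a nested dictionary whose keys (`was_derived_from`, `was_generated_by`, `was_associated_with`, `justification`) are snake_case stand-ins for the PROV-O properties; the key names and the function `missing_provenance` are assumptions for illustration. When the activity is absent, only `prov:wasGeneratedBy` is reported, since agent and justification hang off the activity.

```python
from typing import Any, Dict, List


def missing_provenance(reconstruction: Dict[str, Any]) -> List[str]:
    """List the provenance links a reconstruction record still lacks."""
    missing = []
    if not reconstruction.get("was_derived_from"):
        missing.append("prov:wasDerivedFrom")
    activity = reconstruction.get("was_generated_by")
    if not activity:
        missing.append("prov:wasGeneratedBy")
    else:
        if not activity.get("was_associated_with"):
            missing.append("prov:wasAssociatedWith")
        if not activity.get("justification"):
            missing.append("justification")
    return missing


complete = {
    "legal_name": "Stichting Rijksmuseum",
    "was_derived_from": ["obs-letterhead-2015"],
    "was_generated_by": {
        "method": "HYBRID",
        "was_associated_with": "GLAM Ontology Project",
        "justification": "Names matched across ISIL registry, Wikidata and KvK.",
    },
}
incomplete = {"legal_name": "Stichting Rijksmuseum", "was_generated_by": {"method": "HYBRID"}}
```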

---
## 🎯 Design Patterns Established
### Pattern 1: Multiple Observations → Single Entity
```yaml
# Many observations (emic names)
observations:
  - "Rijks"                  # vernacular
  - "Rijksmuseum Amsterdam"  # ISIL registry
  - "Rijksmuseum"            # website
  - "Stichting Rijksmuseum"  # KvK legal
# Derive single entity (etic legal form)
reconstruction:
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [all observations above]
```
### Pattern 2: Name → Observation → Entity Chain
```yaml
# Step 1: Name (nominal reference)
Name:
  prefLabel: "Rijks"
  refers_to_organization_observation: obs-letterhead-2015
# Step 2: Observation (emic, source-based)
OrganizationObservation:
  id: obs-letterhead-2015
  observed_name: "Rijks"
  source: letterhead.pdf
  derived_from_entity: org-rijksmuseum
# Step 3: Entity (etic, legal form)
OrganizationReconstruction:
  id: org-rijksmuseum
  legal_name: "Stichting Rijksmuseum"
  legal_form: STICHTING
```
### Pattern 3: Confidence Scoring
```yaml
OrganizationObservation:
  observed_name: "Rijks"
  source: letterhead.pdf
  confidence_score: 0.98 # High confidence (authoritative source)
--- # second observation, kept as a separate YAML document to avoid a duplicate key
OrganizationObservation:
  observed_name: "Nationale Kunst-Gallerij"
  source: archival-decree-1800.pdf
  confidence_score: 0.95 # Slightly lower (historical interpretation required)
```
### Pattern 4: Legal Form Enumeration
```yaml
legal_form: STICHTING # Dutch foundation
legal_form: NGO # Non-governmental organization
legal_form: GOVERNMENT_AGENCY # Government department
legal_form: ASSOCIATION # Vereniging
legal_form: LIMITED_COMPANY # BV, Ltd, etc.
```
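In application code, the enumeration could be mirrored as a Python `Enum` so that unknown legal forms are rejected early. This is a sketch only: it lists just the five values shown above, whereas the real `LegalFormEnum` in the LinkML schema contains more, and `parse_legal_form` is a hypothetical helper.

```python
from enum import Enum


class LegalForm(Enum):
    """Subset of LegalFormEnum; only the values shown in this summary."""
    STICHTING = "STICHTING"                  # Dutch foundation
    NGO = "NGO"                              # Non-governmental organization
    GOVERNMENT_AGENCY = "GOVERNMENT_AGENCY"  # Government department
    ASSOCIATION = "ASSOCIATION"              # Vereniging
    LIMITED_COMPANY = "LIMITED_COMPANY"      # BV, Ltd, etc.


def parse_legal_form(raw: str) -> LegalForm:
    """Normalize case and whitespace; raises ValueError for unknown values."""
    return LegalForm(raw.strip().upper())
```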
---
## 🔬 Ontology Alignments
### PiCo (Persons in Context)
| PiCo / PROV-O Class | Heritage Equivalent | Purpose |
|-----------|-------------------|---------|
| `pico:Person` | `heritage:Organization` | Abstract base class |
| `pico:PersonObservation` | `heritage:OrganizationObservation` | Emic references |
| `pico:PersonReconstruction` | `heritage:OrganizationReconstruction` | Etic entities |
| `prov:Activity` | `heritage:ReconstructionActivity` | Entity resolution process |
| `prov:Agent` | `heritage:Agent` | Responsible curator/software |
### PROV-O (Provenance Ontology)
- `prov:Entity` - Base class for Organization
- `prov:hadPrimarySource` - Links observation to source document
- `prov:wasDerivedFrom` - Links reconstruction to observations
- `prov:wasGeneratedBy` - Links reconstruction to activity
- `prov:wasAssociatedWith` - Links activity to agent
- `prov:wasRevisionOf` - Links updated reconstruction to previous version
### CPOV (Core Public Organisation Vocabulary)
- `cpov:legalName` - Official legal name in reconstruction
- `cpov:identifier` - Formal identifiers (KvK, ISIL, etc.)
- `cpov:PublicOrganisation` - Class URI for government agencies
### W3C ORG (Organization Ontology)
- `org:classification` - Legal form of organization
- `org:subOrganizationOf` - Parent organization hierarchy
---
## 📊 Comparison: Before vs After
| Aspect | Before (Session Start) | After (PiCo Integration) |
|--------|----------------------|-------------------------|
| **Name modeling** | Single Name class links to entities | Name links to observations, not entities |
| **Organization types** | Single Organization class | Two classes: Observation + Reconstruction |
| **Emic/Etic** | Not distinguished | Explicitly modeled (observation vs reconstruction) |
| **Legal forms** | Undefined | Enumerated (STICHTING, NGO, etc.) |
| **Provenance** | Basic source tracking | Full PROV-O chain with activities |
| **Temporal names** | Unclear | Explicit temporal validity + succession |
| **Confidence** | None | Observation-level confidence scores |
| **Source linking** | Optional | Required (`prov:hadPrimarySource`) |
---
## 🚀 Next Steps (Updated)
### Immediate (Session 3 - HIGH PRIORITY)
1. **Update Name Entity Schema** (`01_name_entity.yaml`)
   - Change `refers_to_organization` to `refers_to_organization_observation`
   - Range: `OrganizationObservation` (not `OrganizationReconstruction`)
   - Update documentation to explain observation → reconstruction chain
2. **Create Diagrams** for Observation-Reconstruction Pattern
   - **Mermaid diagram**: Class relationships
   - **PlantUML diagram**: Full UML 2.5 with annotations
   - **TypeQL schema**: TypeDB implementation with reasoning rules
   - **RDF/OWL ontology**: Turtle serialization with SHACL constraints
3. **Extract Hypernym Taxonomy** (unchanged from previous plan)
   - Parse `hyponyms_curated.yaml` for unique hypernyms
   - Map hypernyms to OrganizationObservation types (building, museum, archive, etc.)
### Medium-Term (This Week)
4. **Create Place Entity Module** (`03_place_entity.yaml`)
   - Physical locations (sites, buildings)
   - Temporal validity (construction → demolition)
   - Link to OrganizationObservation (organizations occupy places)
5. **Create Collection Entity Module** (`04_collection_entity.yaml`)
   - Heritage materials (archival, museum, library collections)
   - Accession/deaccession tracking
   - Custody relationships (which organization holds which collection)
6. **Batch Conversion Script** for Wikidata Entities
   - Input: `hyponyms_curated_full.yaml` (2,453 entities)
   - Output: OrganizationObservation instances
   - Logic: Infer observation type from Wikidata entity type (Q33506 museum → museum observation)
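The conversion logic in step 6 might look roughly like the sketch below. The layout of `hyponyms_curated_full.yaml` is not described in this summary, so the input keys `qid`, `label` and `instance_of` are assumptions, as are the output field names and the functions `to_observation` and `convert_batch`. Only the Q33506 → museum mapping comes from the plan above; unmapped types fall back to `"unknown"` so they can be reviewed manually.

```python
from typing import Dict, List

# Only Q33506 (museum) is taken from the plan above; extend after review.
WIKIDATA_TYPE_TO_OBSERVATION_TYPE: Dict[str, str] = {"Q33506": "museum"}


def to_observation(entity: Dict[str, str]) -> Dict[str, str]:
    """Convert one curated Wikidata entity into an OrganizationObservation dict.

    The input keys 'qid', 'label' and 'instance_of' are assumed for illustration.
    """
    obs_type = WIKIDATA_TYPE_TO_OBSERVATION_TYPE.get(entity["instance_of"], "unknown")
    return {
        "id": f"obs-wikidata-{entity['qid']}",
        "observed_name": entity["label"],
        "observation_type": obs_type,
        "source": f"https://www.wikidata.org/wiki/{entity['qid']}",
    }


def convert_batch(entities: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply to_observation to every entity, preserving input order."""
    return [to_observation(e) for e in entities]


sample = [
    {"qid": "Q190804", "label": "Rijksmuseum", "instance_of": "Q33506"},
    {"qid": "Q0", "label": "Placeholder", "instance_of": "Q0"},  # unmapped type
]
result = convert_batch(sample)
```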
---
## 📝 Documentation Updates Needed
1. **Update `schemas/20251121/README.md`**
   - Add section on "Observation vs Reconstruction Pattern"
   - Explain emic/etic distinction
   - Add Rijksmuseum example walkthrough
2. **Create `docs/OBSERVATION_RECONSTRUCTION_PATTERN.md`**
   - Comprehensive guide to the pattern
   - Use cases and anti-patterns
   - Comparison with PiCo
   - Implementation examples in all 5 formats (LinkML, Mermaid, PlantUML, TypeQL, RDF)
3. **Update `AGENTS.md`**
   - Add instructions for extracting observations from sources
   - Distinguish observation extraction (emic) from entity resolution (etic)
   - Provide prompts for confidence score assignment
---
## 🎓 Key Learnings
### 1. Domain Experts Know Best
PiCo developers (CBG|Center for Family History, NIOD, IISH) spent years refining the observation/reconstruction distinction for historical person data. **Reusing their pattern** saves us from reinventing the wheel and ensures alignment with established heritage informatics practices.
### 2. Emic/Etic is Fundamental
The emic (insider) vs etic (outsider) distinction from anthropology is **fundamental** to heritage data modeling:
- Emic: How organizations refer to themselves (vernacular, culturally specific)
- Etic: How authorities classify organizations (legal, internationally standardized)
Both perspectives are **equally valid** and must coexist in the ontology.
### 3. Names Are NOT Entities
**Critical insight**: Names are **appellations** (CIDOC-CRM E41_Appellation), not entities. They:
- Reference observations (how things are called)
- Do NOT directly reference entities (what things are)
- Have temporal validity (names change over time)
- Are culturally/linguistically specific
### 4. Provenance is Mandatory
Every entity reconstruction MUST document:
- Which observations it derives from (`prov:wasDerivedFrom`)
- How it was created (`prov:wasGeneratedBy`)
- Who created it (`prov:wasAssociatedWith`)
- Why decisions were made (`justification`)
Without provenance, reconstructions are **unverifiable** and **untrustworthy**.

---
## ✅ Session Status
**Status**: ✅ COMPLETE

**Major Achievement**: Integrated PiCo observation/reconstruction pattern into heritage organization ontology

**Files Created**: 2 (schema + example)

**Lines Written**: ~950 lines

**Design Patterns Established**: 4 (multi-observation → entity, name chain, confidence scoring, legal form enumeration)

**Next Session Focus**: Create diagrams + update Name entity schema + extract hypernym taxonomy

---
## 📚 References
- **PiCo Ontology**: `data/ontology/pico.ttl` (1,392 lines)
- **PiCo Documentation**: https://personsincontext.org/
- **PROV-O**: https://www.w3.org/TR/prov-o/
- **CIDOC-CRM E41 Appellation**: http://www.cidoc-crm.org/cidoc-crm/E41_Appellation
- **Emic/Etic**: Pike, K. L. (1967). *Language in Relation to a Unified Theory of the Structure of Human Behavior*
---
**Session End Time**: 2025-11-21 (active)

**Total Session Duration**: ~2 hours

**Collaboration**: User + AI (iterative refinement based on domain expert input)