# Session Summary - 2025-11-21: Observation vs Reconstruction Pattern

**Date**: November 21, 2025
**Focus**: Integrating PiCo pattern for emic/etic distinction in heritage organization modeling
**Status**: ✅ COMPLETE - Major design revision incorporating PiCo insights

---

## 🔄 Major Design Revision: Observation vs Reconstruction

### Critical Insight from User

The user pointed out that **emic names** (self-references by organizations) and **etic spellings/abbreviations/translations** should be distinguished from **formal legal entities**.

The user referenced the **PiCo (Persons in Context) ontology** (`data/ontology/pico.ttl`), which uses:
- `pico:PersonObservation` - Person as recorded in source (emic, vernacular)
- `pico:PersonReconstruction` - Person entity inferred from observations (etic, formal)

This pattern maps directly onto our heritage organization needs: organization names as recorded in sources correspond to observations, and legal entities correspond to reconstructions.

---

## 📋 What Changed

### Before (Session Start)

**Single Name entity** with direct links to Place/Organization/Collection entities:
```turtle
heritage:Name
  refers_to_organization → heritage:Organization  # ❌ Too simplistic
```

**Problem**: Didn't distinguish between:
- Emic names (insider perspective: "Rijks", "BnF", vernacular abbreviations)
- Etic entities (outsider perspective: "Stichting Rijksmuseum", legal forms)

### After (PiCo Pattern Integration)

**Two-level structure** with observation → reconstruction chain:
```turtle
# LEVEL 1: Observation (emic, source-based)
heritage:OrganizationObservation
  - observed_name: "Rijks"  # Vernacular abbreviation
  - source: letterhead document
  - prov:wasDerivedFrom → OrganizationReconstruction

# LEVEL 2: Reconstruction (etic, legal entity)
heritage:OrganizationReconstruction
  - legal_name: "Stichting Rijksmuseum"  # Official legal name
  - legal_form: STICHTING
  - registration_number: "NL-KvK-41208408"
  - prov:wasDerivedFrom → OrganizationObservation(s)

# NAME ENTITIES: Link to observations, NOT reconstructions
heritage:Name
  refers_to_organization_observation → heritage:OrganizationObservation  # ✅ Correct!
```

**Key Change**: Names link to **observations** (emic references), not **entities** (etic legal forms).

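As a rough illustration of the two-level structure, the following Python sketch models the three classes with dataclasses. The class and field names mirror the schema terms used in this summary, but the dataclass layout itself is an assumption made for illustration and is not the project's implementation. The point it shows is that a `Name` holds a reference to an observation, while the reconstruction only lists the observations it was derived from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrganizationObservation:
    """Emic level: a name exactly as recorded in one source."""
    id: str
    observed_name: str
    source: str


@dataclass
class OrganizationReconstruction:
    """Etic level: the legal entity inferred from observations."""
    id: str
    legal_name: str
    legal_form: str
    was_derived_from: List[OrganizationObservation] = field(default_factory=list)


@dataclass
class Name:
    """A name points at an observation, never at a reconstruction."""
    label: str
    refers_to_organization_observation: OrganizationObservation


# Values taken from the Rijksmuseum example in this summary.
letterhead = OrganizationObservation("obs-letterhead-2015", "Rijks", "letterhead.pdf")
rijksmuseum = OrganizationReconstruction(
    "org-rijksmuseum", "Stichting Rijksmuseum", "STICHTING", [letterhead]
)
rijks = Name("Rijks", letterhead)
```
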
---

## 📂 Files Created

### 1. LinkML Schema: Observation-Reconstruction Pattern

**File**: `schemas/20251121/linkml/02_organization_observation_reconstruction.yaml`

**Content**:
- 3 main classes:
  - `Organization` (abstract base)
  - `OrganizationObservation` (emic, source-based references)
  - `OrganizationReconstruction` (etic, legal entities)
- 2 provenance classes:
  - `ReconstructionActivity` (entity resolution process)
  - `Agent` (responsible curator/software)
- 4 enums:
  - `LegalFormEnum` (STICHTING, NGO, GOVERNMENT_AGENCY, etc.)
  - `LegalStatusEnum` (ACTIVE, DISSOLVED, MERGED, etc.)
  - `ReconstructionActivityTypeEnum` (MANUAL_CURATION, ALGORITHMIC_MATCHING, HYBRID)
  - `AgentTypeEnum` (PERSON, ORGANIZATION, SOFTWARE)

**Lines**: ~650 lines of LinkML schema

**Key Design Patterns**:
- Required provenance: `prov:hadPrimarySource` for observations, `prov:wasDerivedFrom` for reconstructions
- Confidence scoring: Observations include 0.0-1.0 confidence scores
- Temporal tracking: `valid_from`/`valid_to` for historical name changes
- Multi-observation → single entity: One reconstruction can be derived from many observations

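The first two design patterns lend themselves to a mechanical check. The sketch below is a minimal, hypothetical validator, not part of the schema: it assumes observations are loaded as plain dictionaries, that `prov:hadPrimarySource` is stored under a `had_primary_source` key, and that a confidence score is always present. None of those three assumptions is confirmed by this summary.

```python
from typing import List, Mapping


def validate_observation(obs: Mapping[str, object]) -> List[str]:
    """Return a list of problems found in one observation record."""
    errors: List[str] = []

    # Required provenance: every observation must name its primary source.
    if not obs.get("had_primary_source"):
        errors.append("missing had_primary_source")

    # Confidence scoring: a real number between 0.0 and 1.0 inclusive.
    score = obs.get("confidence_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        errors.append("confidence_score must be a number in [0.0, 1.0]")

    return errors


ok = validate_observation(
    {"observed_name": "Rijks", "had_primary_source": "letterhead.pdf", "confidence_score": 0.98}
)
bad = validate_observation({"observed_name": "Rijks", "confidence_score": 1.5})
```

Here `ok` is an empty list, while `bad` reports both the missing source and the out-of-range score.
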
### 2. Example: Rijksmuseum Case Study

**File**: `schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml`

**Content**:
- **5 OrganizationObservations**:
  1. "Rijks" (vernacular abbreviation, letterhead, 2015)
  2. "Rijksmuseum Amsterdam" (ISIL registry, 2020)
  3. "Rijksmuseum" (English website, 2024)
  4. "Nationale Kunst-Gallerij" (founding name, 1800)
  5. "Stichting Rijksmuseum" (KvK legal name, 2024)

- **1 OrganizationReconstruction**:
  - Legal name: "Stichting Rijksmuseum"
  - Legal form: STICHTING (Dutch foundation)
  - Registration: NL-KvK-41208408
  - Identifiers: ISIL NL-AmRMA, Wikidata Q190804, VIAF 148691498

- **1 ReconstructionActivity**:
  - Method: Hybrid (algorithmic + manual curation)
  - Sources: ISIL registry, Wikidata, KvK, archival documents
  - Agent: GLAM Ontology Project

- **4 Name Entities**:
  - "Rijks" → links to letterhead observation
  - "Rijksmuseum" → links to ISIL/website observations
  - "Stichting Rijksmuseum" → links to KvK observation
  - "Nationale Kunst-Gallerij" → links to historical observation (1800)

**Lines**: ~300 lines of annotated example

---

## 🔑 Key Insights

### 1. Emic vs Etic Distinction

| Aspect | Emic (Observation) | Etic (Reconstruction) |
|--------|-------------------|----------------------|
| **Perspective** | Insider ("how we call ourselves") | Outsider ("what is the legal entity") |
| **Examples** | "Rijks", "BnF", "Hermitage" | "Stichting Rijksmuseum", "Établissement public Bibliothèque nationale de France" |
| **Source** | Letterheads, websites, vernacular usage | Legal registries (KvK, Companies House, etc.) |
| **Stability** | Variable (nicknames change over time) | Stable (legal name persists until formal change) |
| **Multiplicity** | Many observations → one entity | One entity ← many observations |

### 2. Name Entity Integration

**CRITICAL**: Names link to **observations**, NOT **reconstructions**!

```turtle
# ✅ CORRECT
heritage:Name "Rijks"
  refers_to_organization_observation → OrganizationObservation (letterhead)
    → prov:wasDerivedFrom → OrganizationReconstruction (Stichting Rijksmuseum)

# ❌ WRONG
heritage:Name "Rijks"
  refers_to_organization → OrganizationReconstruction (Stichting Rijksmuseum)
```

**Rationale**: Names are **emic references** (how organizations are referred to in sources), not formal entity identifiers. The chain is:

```
Name (nominal reference)
  ↓ refers_to_organization_observation
OrganizationObservation (emic, source-based)
  ↓ prov:wasDerivedFrom
OrganizationReconstruction (etic, legal entity)
```

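Because a name never points at a reconstruction directly, finding the legal entity behind a name takes two hops. The following sketch shows that traversal with two lookup tables. The tables, the `resolve_name` function, and the idea of keying names by their label are all hypothetical; only the identifiers `obs-letterhead-2015` and `org-rijksmuseum` come from the examples in this summary.

```python
from typing import Dict, Optional

# Hop 1: name label -> observation id (refers_to_organization_observation).
NAME_TO_OBSERVATION: Dict[str, str] = {"Rijks": "obs-letterhead-2015"}

# Hop 2: observation id -> reconstruction id.
OBSERVATION_TO_RECONSTRUCTION: Dict[str, str] = {
    "obs-letterhead-2015": "org-rijksmuseum",
}


def resolve_name(label: str) -> Optional[str]:
    """Follow Name -> Observation -> Reconstruction; None if either hop fails."""
    observation_id = NAME_TO_OBSERVATION.get(label)
    if observation_id is None:
        return None
    return OBSERVATION_TO_RECONSTRUCTION.get(observation_id)


entity_id = resolve_name("Rijks")
```

Keeping the two hops separate means an observation can exist before any reconstruction has been made for it, in which case `resolve_name` returns `None` rather than guessing an entity.
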
### 3. Legal Form vs Emic Name

**Important distinction**:
- **Legal form** (e.g., "Stichting") = Part of `OrganizationReconstruction.legal_form`
- **Emic name** (e.g., "Rijks") = Part of `OrganizationObservation.observed_name`

These are **DIFFERENT concepts**:
- "Stichting Rijksmuseum" is the **legal name** (etic, formal)
- "Rijks" is the **vernacular name** (emic, informal)
- Both refer to the same **entity**, but from different perspectives

### 4. Temporal Name Changes

Organizations change names over time:
- 1800: "Nationale Kunst-Gallerij" (founding)
- 1808: "'s Rijks Museum" (rename)
- 2024: "Rijks" (vernacular), "Stichting Rijksmuseum" (legal)

**Solution**:
- Create separate `OrganizationObservation` for each historical name
- Use `valid_from`/`valid_to` on `Name` entities to track temporal validity
- Use `replaces`/`replaced_by` properties for name succession chains
- `OrganizationReconstruction` remains **stable entity** across name changes

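To make the role of `valid_from`/`valid_to` concrete, here is a small sketch that answers "which names were valid in a given year?". It is illustrative only: the `TemporalName` class, the use of whole years, and the half-open interval convention (`valid_from` inclusive, `valid_to` exclusive) are assumptions. The end of "Nationale Kunst-Gallerij" is taken to be the 1808 rename listed above, and the end date of "'s Rijks Museum" is not given in this summary, so it is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemporalName:
    label: str
    valid_from: int                 # first year the name applies (inclusive)
    valid_to: Optional[int] = None  # first year it no longer applies; None = open


def names_valid_in(names: List[TemporalName], year: int) -> List[str]:
    """Return the labels whose validity interval contains the given year."""
    return [
        n.label
        for n in names
        if n.valid_from <= year and (n.valid_to is None or year < n.valid_to)
    ]


names = [
    TemporalName("Nationale Kunst-Gallerij", 1800, 1808),
    TemporalName("'s Rijks Museum", 1808),  # end date not stated; left open
]

in_1805 = names_valid_in(names, 1805)
in_1808 = names_valid_in(names, 1808)
```

With half-open intervals the rename year belongs to the new name only, so no year ever has two competing historical names from the same succession chain.
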
### 5. Provenance Chain

Every `OrganizationReconstruction` MUST document:
1. **Source observations**: `prov:wasDerivedFrom` → `OrganizationObservation(s)`
2. **Creation activity**: `prov:wasGeneratedBy` → `ReconstructionActivity`
3. **Responsible agent**: Activity links to `Agent` (person/organization/software)
4. **Method justification**: Activity includes rationale for entity resolution

This provides **full transparency** in how entities are inferred from observations.

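The four requirements above can be checked mechanically before a reconstruction is accepted. The sketch below assumes a reconstruction is available as a nested dictionary with snake_case keys (`was_derived_from`, `was_generated_by`, `was_associated_with`, `justification`). Those key names and the `missing_provenance` helper are hypothetical, and the justification text in the example is a placeholder rather than project data. The function returns the dotted paths of whatever is missing, so an empty list means the provenance chain is complete.

```python
from typing import Dict, List


def missing_provenance(reconstruction: Dict[str, object]) -> List[str]:
    """List the provenance fields that are absent or empty."""
    missing: List[str] = []

    # 1. Source observations
    if not reconstruction.get("was_derived_from"):
        missing.append("was_derived_from")

    # 2. Creation activity
    activity = reconstruction.get("was_generated_by") or {}
    if not activity:
        missing.append("was_generated_by")

    # 3. Responsible agent and 4. method justification live on the activity.
    if not activity.get("was_associated_with"):
        missing.append("was_generated_by.was_associated_with")
    if not activity.get("justification"):
        missing.append("was_generated_by.justification")

    return missing


complete = {
    "id": "org-rijksmuseum",
    "was_derived_from": ["obs-letterhead-2015"],
    "was_generated_by": {
        "was_associated_with": "GLAM Ontology Project",
        "justification": "placeholder rationale",  # hypothetical text
    },
}

gaps_in_complete = missing_provenance(complete)
gaps_in_bare = missing_provenance({"id": "org-unknown"})
```
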
---

## 🎯 Design Patterns Established

### Pattern 1: Multiple Observations → Single Entity

```yaml
# Many observations (emic names)
observations:
  - "Rijks"                  # vernacular
  - "Rijksmuseum Amsterdam"  # ISIL registry
  - "Rijksmuseum"            # website
  - "Stichting Rijksmuseum"  # KvK legal

# Derive single entity (etic legal form)
reconstruction:
  legal_name: "Stichting Rijksmuseum"
  was_derived_from: [all observations above]
```

### Pattern 2: Name → Observation → Entity Chain

```yaml
# Step 1: Name (nominal reference)
Name:
  prefLabel: "Rijks"
  refers_to_organization_observation: obs-letterhead-2015

# Step 2: Observation (emic, source-based)
OrganizationObservation:
  id: obs-letterhead-2015
  observed_name: "Rijks"
  source: letterhead.pdf
  derived_from_entity: org-rijksmuseum

# Step 3: Entity (etic, legal form)
OrganizationReconstruction:
  id: org-rijksmuseum
  legal_name: "Stichting Rijksmuseum"
  legal_form: STICHTING
```

### Pattern 3: Confidence Scoring

```yaml
OrganizationObservation:
  observed_name: "Rijks"
  source: letterhead.pdf
  confidence_score: 0.98  # High confidence (authoritative source)
---
# Second observation, kept as a separate YAML document to avoid a duplicate key
OrganizationObservation:
  observed_name: "Nationale Kunst-Gallerij"
  source: archival-decree-1800.pdf
  confidence_score: 0.95  # Slightly lower (historical interpretation required)
```

### Pattern 4: Legal Form Enumeration

```yaml
# Example values; each reconstruction carries exactly one legal_form
legal_form: STICHTING          # Dutch foundation
legal_form: NGO                # Non-governmental organization
legal_form: GOVERNMENT_AGENCY  # Government department
legal_form: ASSOCIATION        # Vereniging
legal_form: LIMITED_COMPANY    # BV, Ltd, etc.
```

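For code that consumes the schema, an enumeration like this is often mirrored as a language-level enum so that unknown legal forms fail early. The sketch below is one possible mirror in Python. It contains only the five values shown above (the schema's `LegalFormEnum` has more, per its "etc."), and the `LegalForm` class and `parse_legal_form` helper are hypothetical names. Parsing normalizes case and spaces, and an unlisted value raises `KeyError` instead of being silently accepted.

```python
from enum import Enum


class LegalForm(Enum):
    """Subset of legal forms listed in this summary; not the full schema enum."""
    STICHTING = "STICHTING"
    NGO = "NGO"
    GOVERNMENT_AGENCY = "GOVERNMENT_AGENCY"
    ASSOCIATION = "ASSOCIATION"
    LIMITED_COMPANY = "LIMITED_COMPANY"


def parse_legal_form(raw: str) -> LegalForm:
    """Map free text such as 'government agency' to an enum member.

    Raises KeyError when the text does not match a known member.
    """
    key = raw.strip().upper().replace(" ", "_")
    return LegalForm[key]


form = parse_legal_form("stichting")
```
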
---

## 🔬 Ontology Alignments

### PiCo (Persons in Context)

| PiCo Class | Heritage Equivalent | Purpose |
|-----------|-------------------|---------|
| `pico:Person` | `heritage:Organization` | Abstract base class |
| `pico:PersonObservation` | `heritage:OrganizationObservation` | Emic references |
| `pico:PersonReconstruction` | `heritage:OrganizationReconstruction` | Etic entities |
| `prov:Activity` | `heritage:ReconstructionActivity` | Entity resolution process |
| `prov:Agent` | `heritage:Agent` | Responsible curator/software |

### PROV-O (Provenance Ontology)

- `prov:Entity` - Base class for Organization
- `prov:hadPrimarySource` - Links observation to source document
- `prov:wasDerivedFrom` - Links reconstruction to observations
- `prov:wasGeneratedBy` - Links reconstruction to activity
- `prov:wasAssociatedWith` - Links activity to agent
- `prov:wasRevisionOf` - Links updated reconstruction to previous version

### CPOV (Core Public Organisation Vocabulary)

- `cpov:legalName` - Official legal name in reconstruction
- `cpov:identifier` - Formal identifiers (KvK, ISIL, etc.)
- `cpov:PublicOrganisation` - Class URI for government agencies

### W3C ORG (Organization Ontology)

- `org:classification` - Legal form of organization
- `org:subOrganizationOf` - Parent organization hierarchy

---

## 📊 Comparison: Before vs After

| Aspect | Before (Session Start) | After (PiCo Integration) |
|--------|----------------------|-------------------------|
| **Name modeling** | Single Name class links to entities | Name links to observations, not entities |
| **Organization types** | Single Organization class | Two classes: Observation + Reconstruction |
| **Emic/Etic** | Not distinguished | Explicitly modeled (observation vs reconstruction) |
| **Legal forms** | Undefined | Enumerated (STICHTING, NGO, etc.) |
| **Provenance** | Basic source tracking | Full PROV-O chain with activities |
| **Temporal names** | Unclear | Explicit temporal validity + succession |
| **Confidence** | None | Observation-level confidence scores |
| **Source linking** | Optional | Required (`prov:hadPrimarySource`) |

---

## 🚀 Next Steps (Updated)

### Immediate (Session 3 - HIGH PRIORITY)

1. **Update Name Entity Schema** (`01_name_entity.yaml`)
   - Change `refers_to_organization` to `refers_to_organization_observation`
   - Range: `OrganizationObservation` (not `OrganizationReconstruction`)
   - Update documentation to explain observation → reconstruction chain

2. **Create Diagrams** for Observation-Reconstruction Pattern
   - **Mermaid diagram**: Class relationships
   - **PlantUML diagram**: Full UML 2.5 with annotations
   - **TypeQL schema**: TypeDB implementation with reasoning rules
   - **RDF/OWL ontology**: Turtle serialization with SHACL constraints

3. **Extract Hypernym Taxonomy** (unchanged from previous plan)
   - Parse `hyponyms_curated.yaml` for unique hypernyms
   - Map hypernyms to OrganizationObservation types (building, museum, archive, etc.)

### Medium-Term (This Week)

4. **Create Place Entity Module** (`03_place_entity.yaml`)
   - Physical locations (sites, buildings)
   - Temporal validity (construction → demolition)
   - Link to OrganizationObservation (organizations occupy places)

5. **Create Collection Entity Module** (`04_collection_entity.yaml`)
   - Heritage materials (archival, museum, library collections)
   - Accession/deaccession tracking
   - Custody relationships (which organization holds which collection)

6. **Batch Conversion Script** for Wikidata Entities
   - Input: `hyponyms_curated_full.yaml` (2,453 entities)
   - Output: OrganizationObservation instances
   - Logic: Infer observation type from Wikidata entity type (Q33506 museum → museum observation)

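The conversion logic in step 6 could look roughly like the sketch below. It is a guess at the shape of the script, not a description of existing code: the layout of `hyponyms_curated_full.yaml` is not shown in this summary, so the input keys `qid`, `label`, and `instance_of` are hypothetical, as are the output field names and the `"unknown"` fallback. The only type mapping included is the one stated in the plan (Q33506 → museum), and the sample entity uses the Rijksmuseum's Wikidata identifier from the case study.

```python
from typing import Dict

# Only this QID-to-type mapping is stated in the plan; extend as needed.
OBSERVATION_TYPE_BY_QID: Dict[str, str] = {"Q33506": "museum"}


def to_observation(entity: Dict[str, str]) -> Dict[str, str]:
    """Turn one curated Wikidata entry into an observation-like record."""
    qid = entity["qid"]
    return {
        "id": f"obs-wikidata-{qid}",
        "observed_name": entity["label"],
        # Infer the observation type from the Wikidata entity type.
        "observation_type": OBSERVATION_TYPE_BY_QID.get(entity["instance_of"], "unknown"),
        # Every observation keeps a pointer to where it was seen.
        "source": f"wikidata:{qid}",
    }


example = to_observation(
    {"qid": "Q190804", "label": "Rijksmuseum", "instance_of": "Q33506"}
)
```
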
---

## 📝 Documentation Updates Needed

1. **Update `schemas/20251121/README.md`**
   - Add section on "Observation vs Reconstruction Pattern"
   - Explain emic/etic distinction
   - Add Rijksmuseum example walkthrough

2. **Create `docs/OBSERVATION_RECONSTRUCTION_PATTERN.md`**
   - Comprehensive guide to the pattern
   - Use cases and anti-patterns
   - Comparison with PiCo
   - Implementation examples in all 5 formats (LinkML, Mermaid, PlantUML, TypeQL, RDF)

3. **Update `AGENTS.md`**
   - Add instructions for extracting observations from sources
   - Distinguish observation extraction (emic) from entity resolution (etic)
   - Provide prompts for confidence score assignment

---

## 🎓 Key Learnings

### 1. Domain Experts Know Best

PiCo developers (CBG|Center for Family History, NIOD, IISH) spent years refining the observation/reconstruction distinction for historical person data. **Reusing their pattern** saves us from reinventing the wheel and ensures alignment with established heritage informatics practices.

### 2. Emic/Etic is Fundamental

The emic (insider) vs etic (outsider) distinction from anthropology is **fundamental** to heritage data modeling:
- Emic: How organizations refer to themselves (vernacular, culturally specific)
- Etic: How authorities classify organizations (legal, internationally standardized)

Both perspectives are **equally valid** and must coexist in the ontology.

### 3. Names Are NOT Entities

**Critical insight**: Names are **appellations** (CIDOC-CRM E41_Appellation), not entities. They:
- Reference observations (how things are called)
- Do NOT directly reference entities (what things are)
- Have temporal validity (names change over time)
- Are culturally/linguistically specific

### 4. Provenance is Mandatory

Every entity reconstruction MUST document:
- Which observations it derives from (`prov:wasDerivedFrom`)
- How it was created (`prov:wasGeneratedBy`)
- Who created it (`prov:wasAssociatedWith`)
- Why decisions were made (`justification`)

Without provenance, reconstructions are **unverifiable** and **untrustworthy**.

---

## ✅ Session Status

**Status**: ✅ COMPLETE
**Major Achievement**: Integrated PiCo observation/reconstruction pattern into heritage organization ontology
**Files Created**: 2 (schema + example)
**Lines Written**: ~950 lines
**Design Patterns Established**: 4 (multi-observation → entity, name chain, confidence scoring, legal form enumeration)

**Next Session Focus**: Create diagrams + update Name entity schema + extract hypernym taxonomy

---

## 📚 References

- **PiCo Ontology**: `data/ontology/pico.ttl` (1,392 lines)
- **PiCo Documentation**: https://personsincontext.org/
- **PROV-O**: https://www.w3.org/TR/prov-o/
- **CIDOC-CRM E41 Appellation**: http://www.cidoc-crm.org/cidoc-crm/E41_Appellation
- **Emic/Etic**: Pike, K. L. (1967). *Language in Relation to a Unified Theory of the Structure of Human Behavior*

---

**Session End Time**: 2025-11-21 (active)
**Total Session Duration**: ~2 hours
**Collaboration**: User + AI (iterative refinement based on domain expert input)