Session Summary - 2025-11-21: Observation vs Reconstruction Pattern
Date: November 21, 2025
Focus: Integrating PiCo pattern for emic/etic distinction in heritage organization modeling
Status: ✅ COMPLETE - Major design revision incorporating PiCo insights
🔄 Major Design Revision: Observation vs Reconstruction
Critical Insight from User
User pointed out that emic names (self-references by organizations) and etic spellings/abbreviations/translations should be distinguished from formal legal entities.
Referenced PiCo (Persons in Context) ontology (data/ontology/pico.ttl) which uses:
- pico:PersonObservation - Person as recorded in a source (emic, vernacular)
- pico:PersonReconstruction - Person entity inferred from observations (etic, formal)
This pattern perfectly matches our heritage organization needs!
📋 What Changed
Before (Session Start)
Single Name entity with direct links to Place/Organization/Collection entities:
heritage:Name
refers_to_organization → heritage:Organization # ❌ Too simplistic
Problem: Didn't distinguish between:
- Emic names (insider perspective: "Rijks", "BnF", vernacular abbreviations)
- Etic entities (outsider perspective: "Stichting Rijksmuseum", legal forms)
After (PiCo Pattern Integration)
Two-level structure with observation → reconstruction chain:
# LEVEL 1: Observation (emic, source-based)
heritage:OrganizationObservation
- observed_name: "Rijks" # Vernacular abbreviation
- source: letterhead document
- ← prov:wasDerivedFrom ← OrganizationReconstruction # The reconstruction points to the observation, not the reverse
# LEVEL 2: Reconstruction (etic, legal entity)
heritage:OrganizationReconstruction
- legal_name: "Stichting Rijksmuseum" # Official legal name
- legal_form: STICHTING
- registration_number: "NL-KvK-41208408"
- prov:wasDerivedFrom → OrganizationObservation(s)
# NAME ENTITIES: Link to observations, NOT reconstructions
heritage:Name
refers_to_organization_observation → heritage:OrganizationObservation # ✅ Correct!
Key Change: Names link to observations (emic references), not entities (etic legal forms).
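The two-level structure can be illustrated with plain Python dataclasses. This is a minimal sketch and not the set of classes generated from the LinkML schema. The slot names (observed_name, legal_name, legal_form, registration_number, was_derived_from, refers_to_organization_observation) follow this summary. The source field and the identifiers obs-kvk-2024 and name-rijks are hypothetical. Note that the reconstruction holds the provenance link by listing the observations it was derived from, and the name points only at an observation.

```python
from dataclasses import dataclass, field


@dataclass
class OrganizationObservation:
    """Level 1 (emic): an organization as recorded in one source."""
    id: str
    observed_name: str
    source: str  # hypothetical stand-in for prov:hadPrimarySource


@dataclass
class OrganizationReconstruction:
    """Level 2 (etic): the legal entity inferred from observations."""
    id: str
    legal_name: str
    legal_form: str
    registration_number: str
    # prov:wasDerivedFrom runs from the reconstruction to its observations
    was_derived_from: list[str] = field(default_factory=list)


@dataclass
class Name:
    """Nominal reference: links to an observation, never to a reconstruction."""
    id: str
    pref_label: str
    refers_to_organization_observation: str


letterhead = OrganizationObservation("obs-letterhead-2015", "Rijks", "letterhead.pdf")
kvk = OrganizationObservation("obs-kvk-2024", "Stichting Rijksmuseum", "KvK register")

rijksmuseum = OrganizationReconstruction(
    id="org-rijksmuseum",
    legal_name="Stichting Rijksmuseum",
    legal_form="STICHTING",
    registration_number="NL-KvK-41208408",
    was_derived_from=[letterhead.id, kvk.id],
)

rijks = Name("name-rijks", "Rijks", refers_to_organization_observation=letterhead.id)
```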
📂 Files Created
1. LinkML Schema: Observation-Reconstruction Pattern
File: schemas/20251121/linkml/02_organization_observation_reconstruction.yaml
Content:
- 3 main classes:
  - Organization (abstract base)
  - OrganizationObservation (emic, source-based references)
  - OrganizationReconstruction (etic, legal entities)
- 2 provenance classes:
  - ReconstructionActivity (entity resolution process)
  - Agent (responsible curator/software)
- 4 enums:
  - LegalFormEnum (STICHTING, NGO, GOVERNMENT_AGENCY, etc.)
  - LegalStatusEnum (ACTIVE, DISSOLVED, MERGED, etc.)
  - ReconstructionActivityTypeEnum (MANUAL_CURATION, ALGORITHMIC_MATCHING, HYBRID)
  - AgentTypeEnum (PERSON, ORGANIZATION, SOFTWARE)
Lines: ~650 lines of comprehensive LinkML schema
Key Design Patterns:
- Required provenance: prov:hadPrimarySource for observations, prov:wasDerivedFrom for reconstructions
- Confidence scoring: Observations include 0.0-1.0 confidence scores
- Temporal tracking: valid_from/valid_to for historical name changes
- Multi-observation → single entity: One reconstruction can be derived from many observations
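The required-provenance rule can be checked mechanically before data is loaded. The following sketch assumes that observations and reconstructions are available as dictionaries. The key had_primary_source is a hypothetical snake_case stand-in for prov:hadPrimarySource. The key was_derived_from matches the slot name used in the patterns below.

```python
def check_required_provenance(observations: list[dict], reconstructions: list[dict]) -> list[str]:
    """Report observations without a primary source and reconstructions
    without resolvable source observations. An empty list means all pass."""
    errors = []
    known_ids = set()
    for obs in observations:
        known_ids.add(obs["id"])
        # Every observation must cite the document it was recorded from
        if not obs.get("had_primary_source"):
            errors.append(f"{obs['id']}: missing prov:hadPrimarySource")
    for rec in reconstructions:
        derived = rec.get("was_derived_from", [])
        # Every reconstruction must name at least one source observation
        if not derived:
            errors.append(f"{rec['id']}: missing prov:wasDerivedFrom")
        # ...and each cited observation must actually exist
        for obs_id in derived:
            if obs_id not in known_ids:
                errors.append(f"{rec['id']}: unknown observation {obs_id}")
    return errors


observations = [
    {"id": "obs-letterhead-2015", "had_primary_source": "letterhead.pdf"},
    {"id": "obs-undocumented"},
]
reconstructions = [{"id": "org-rijksmuseum", "was_derived_from": ["obs-letterhead-2015"]}]
print(check_required_provenance(observations, reconstructions))
# ['obs-undocumented: missing prov:hadPrimarySource']
```

The function collects every problem instead of stopping at the first one, so a curator can fix a whole batch in one pass.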
2. Example: Rijksmuseum Case Study
File: schemas/20251121/examples/rijksmuseum_observation_reconstruction.yaml
Content:
- 5 OrganizationObservations:
  - "Rijks" (vernacular abbreviation, letterhead, 2015)
  - "Rijksmuseum Amsterdam" (ISIL registry, 2020)
  - "Rijksmuseum" (English website, 2024)
  - "Nationale Kunst-Gallerij" (founding name, 1800)
  - "Stichting Rijksmuseum" (KvK legal name, 2024)
- 1 OrganizationReconstruction:
  - Legal name: "Stichting Rijksmuseum"
  - Legal form: STICHTING (Dutch foundation)
  - Registration: NL-KvK-41208408
  - Identifiers: ISIL NL-AmRMA, Wikidata Q190804, VIAF 148691498
- 1 ReconstructionActivity:
  - Method: Hybrid (algorithmic + manual curation)
  - Sources: ISIL registry, Wikidata, KvK, archival documents
  - Agent: GLAM Ontology Project
- 4 Name entities:
  - "Rijks" → links to letterhead observation
  - "Rijksmuseum" → links to ISIL/website observations
  - "Stichting Rijksmuseum" → links to KvK observation
  - "Nationale Kunst-Gallerij" → links to historical observation (1800)
Lines: ~300 lines of detailed example with extensive annotations
🔑 Key Insights
1. Emic vs Etic Distinction
| Aspect | Emic (Observation) | Etic (Reconstruction) |
|---|---|---|
| Perspective | Insider ("how we refer to ourselves") | Outsider ("which legal entity this is") |
| Examples | "Rijks", "BnF", "Hermitage" | "Stichting Rijksmuseum", "Établissement public Bibliothèque nationale de France" |
| Source | Letterheads, websites, vernacular usage | Legal registries (KvK, Companies House, etc.) |
| Stability | Variable (nicknames change over time) | Stable (legal name persists until formal change) |
| Multiplicity | Many observations → one entity | One entity ← many observations |
2. Name Entity Integration
CRITICAL: Names link to observations, NOT reconstructions!
# ✅ CORRECT
heritage:Name "Rijks"
refers_to_organization_observation → OrganizationObservation (letterhead)
← prov:wasDerivedFrom ← OrganizationReconstruction (Stichting Rijksmuseum) # the reconstruction cites the observation
# ❌ WRONG
heritage:Name "Rijks"
refers_to_organization → OrganizationReconstruction (Stichting Rijksmuseum)
Rationale: Names are emic references (how organizations are referred to in sources), not formal entity identifiers. The chain is:
Name (nominal reference)
↓ refers_to_organization_observation
OrganizationObservation (emic, source-based)
↓ inverse of prov:wasDerivedFrom (the reconstruction is derived from the observation)
OrganizationReconstruction (etic, legal entity)
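prov:wasDerivedFrom points from the reconstruction to its observations. Resolving a name to its legal entity therefore means following that link backwards. The sketch below makes two assumptions:

- Reconstructions are given as a mapping from reconstruction id to the observation ids in its wasDerivedFrom slot.
- Each observation feeds at most one reconstruction. A conflict is reported as an error rather than silently overwritten.

Both function names are hypothetical.

```python
from typing import Optional


def build_inverse_index(reconstructions: dict[str, list[str]]) -> dict[str, str]:
    """Map each observation id to the reconstruction derived from it."""
    index: dict[str, str] = {}
    for rec_id, obs_ids in reconstructions.items():
        for obs_id in obs_ids:
            # Assumption: an observation contributes to one reconstruction only
            if obs_id in index and index[obs_id] != rec_id:
                raise ValueError(f"{obs_id} is claimed by {index[obs_id]} and {rec_id}")
            index[obs_id] = rec_id
    return index


def resolve_name(name: dict, index: dict[str, str]) -> Optional[str]:
    """Follow Name -> OrganizationObservation -> OrganizationReconstruction."""
    obs_id = name["refers_to_organization_observation"]
    return index.get(obs_id)  # None if no reconstruction cites the observation


index = build_inverse_index({"org-rijksmuseum": ["obs-letterhead-2015", "obs-kvk-2024"]})
name = {"prefLabel": "Rijks", "refers_to_organization_observation": "obs-letterhead-2015"}
print(resolve_name(name, index))  # org-rijksmuseum
```

The name never stores the reconstruction id, so re-running entity resolution only requires rebuilding the index.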
3. Legal Form vs Emic Name
Important distinction:
- Legal form (e.g., "Stichting") = part of OrganizationReconstruction.legal_form
- Emic name (e.g., "Rijks") = part of OrganizationObservation.observed_name
These are DIFFERENT concepts:
- "Stichting Rijksmuseum" is the legal name (etic, formal)
- "Rijks" is the vernacular name (emic, informal)
- Both refer to the same entity, but from different perspectives
4. Temporal Name Changes
Organizations change names over time:
- 1800: "Nationale Kunst-Gallerij" (founding)
- 1808: "'s Rijks Museum" (rename)
- 2024: "Rijks" (vernacular), "Stichting Rijksmuseum" (legal)
Solution:
- Create a separate OrganizationObservation for each historical name
- Use valid_from/valid_to on Name entities to track temporal validity
- Use replaces/replaced_by properties for name succession chains
- OrganizationReconstruction remains a stable entity across name changes
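Here is a sketch of how valid_from/valid_to and replaces could be queried, using the two historical names listed above. It makes three assumptions:

- Years are integers.
- Validity intervals are half-open: a name is valid from valid_from up to, but not including, valid_to.
- None marks an interval with no recorded end.

The TemporalName class and both functions are hypothetical helpers, not part of the schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalName:
    label: str
    valid_from: int                 # first year the name was in use
    valid_to: Optional[int]         # first year it was no longer used; None = no recorded end
    replaces: Optional[str] = None  # label of the preceding name, if any


def names_valid_in(names: list[TemporalName], year: int) -> list[str]:
    """Labels whose half-open interval [valid_from, valid_to) covers year."""
    return [
        n.label
        for n in names
        if n.valid_from <= year and (n.valid_to is None or year < n.valid_to)
    ]


def succession_chain(names: list[TemporalName], latest: str) -> list[str]:
    """Walk replaces links backwards from `latest` to the earliest name."""
    by_label = {n.label: n for n in names}
    chain = [latest]
    while by_label[chain[-1]].replaces is not None:
        previous = by_label[chain[-1]].replaces
        if previous in chain:  # guard against cyclic replaces links
            raise ValueError(f"cycle in replaces chain at {previous!r}")
        chain.append(previous)
    return chain


rijksmuseum_names = [
    TemporalName("Nationale Kunst-Gallerij", 1800, 1808),
    TemporalName("'s Rijks Museum", 1808, None, replaces="Nationale Kunst-Gallerij"),
]
print(names_valid_in(rijksmuseum_names, 1805))  # ['Nationale Kunst-Gallerij']
print(succession_chain(rijksmuseum_names, "'s Rijks Museum"))
```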
5. Provenance Chain
Every OrganizationReconstruction MUST document:
- Source observations: prov:wasDerivedFrom → OrganizationObservation(s)
- Creation activity: prov:wasGeneratedBy → ReconstructionActivity
- Responsible agent: Activity links to Agent (person/organization/software)
- Method justification: Activity includes rationale for entity resolution
This provides full transparency in how entities are inferred from observations.
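The four provenance requirements lend themselves to a completeness check. This sketch assumes dictionaries with the hypothetical keys was_derived_from, was_generated_by, was_associated_with and justification. These are snake_case stand-ins for the PROV-O properties and the activity's rationale. The function returns the missing elements instead of raising, so a curator sees everything that is absent at once.

```python
def missing_provenance(reconstruction: dict, activities: dict[str, dict]) -> list[str]:
    """List the provenance elements a reconstruction still lacks."""
    missing = []
    if not reconstruction.get("was_derived_from"):
        missing.append("prov:wasDerivedFrom (source observations)")
    activity = activities.get(reconstruction.get("was_generated_by", ""))
    if activity is None:
        missing.append("prov:wasGeneratedBy (ReconstructionActivity)")
        return missing  # agent and justification live on the activity
    if not activity.get("was_associated_with"):
        missing.append("prov:wasAssociatedWith (Agent)")
    if not activity.get("justification"):
        missing.append("justification")
    return missing


# Placeholder activity record; ids and justification text are illustrative
activities = {"act-1": {"was_associated_with": "agent-1", "justification": "curator notes"}}
complete = {"id": "org-rijksmuseum", "was_derived_from": ["obs-kvk-2024"], "was_generated_by": "act-1"}
print(missing_provenance(complete, activities))  # []
print(missing_provenance({"id": "org-incomplete"}, activities))
```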
🎯 Design Patterns Established
Pattern 1: Multiple Observations → Single Entity
# Many observations (emic names)
observations:
- "Rijks" (vernacular)
- "Rijksmuseum Amsterdam" (ISIL registry)
- "Rijksmuseum" (website)
- "Stichting Rijksmuseum" (KvK legal)
# Derive single entity (etic legal form)
reconstruction:
legal_name: "Stichting Rijksmuseum"
was_derived_from: [all observations above]
Pattern 2: Name → Observation → Entity Chain
# Step 1: Name (nominal reference)
Name:
prefLabel: "Rijks"
refers_to_organization_observation: obs-letterhead-2015
# Step 2: Observation (emic, source-based)
OrganizationObservation:
id: obs-letterhead-2015
observed_name: "Rijks"
source: letterhead.pdf
# Step 3: Entity (etic, legal form)
OrganizationReconstruction:
id: org-rijksmuseum
legal_name: "Stichting Rijksmuseum"
legal_form: STICHTING
was_derived_from: [obs-letterhead-2015] # the reconstruction cites the observation
Pattern 3: Confidence Scoring
OrganizationObservation:
observed_name: "Rijks"
source: letterhead.pdf
confidence_score: 0.98 # High confidence (authoritative source)
OrganizationObservation:
observed_name: "Nationale Kunst-Gallerij"
source: archival-decree-1800.pdf
confidence_score: 0.95 # Slightly lower (historical interpretation required)
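Scores are restricted to 0.0-1.0, so a loader can reject out-of-range values and downstream tools can filter on them. This sketch assumes observations are dictionaries with the observed_name and confidence_score keys shown above. The threshold values are illustrative only, since this summary does not define a cut-off.

```python
def validate_confidence(score: float) -> float:
    """Reject scores outside the 0.0-1.0 range (NaN fails the comparison too)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence_score must be in [0.0, 1.0], got {score}")
    return score


def observations_above(observations: list[dict], threshold: float) -> list[str]:
    """Observed names whose confidence meets a threshold, highest first."""
    validate_confidence(threshold)
    kept = [o for o in observations if validate_confidence(o["confidence_score"]) >= threshold]
    kept.sort(key=lambda o: o["confidence_score"], reverse=True)
    return [o["observed_name"] for o in kept]


observations = [
    {"observed_name": "Nationale Kunst-Gallerij", "confidence_score": 0.95},
    {"observed_name": "Rijks", "confidence_score": 0.98},
]
print(observations_above(observations, 0.96))  # ['Rijks']
```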
Pattern 4: Legal Form Enumeration
legal_form: STICHTING # Dutch foundation
legal_form: NGO # Non-governmental organization
legal_form: GOVERNMENT_AGENCY # Government department
legal_form: ASSOCIATION # Vereniging
legal_form: LIMITED_COMPANY # BV, Ltd, etc.
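In Python, the same enumeration could be mirrored with the standard enum module. The sketch covers only the five values listed above; the schema's LegalFormEnum has more, as the "etc." in its description indicates. The LOCAL_LABELS lookup from local legal-form labels is a hypothetical convenience built from the comments in the listing, not part of the schema.

```python
from enum import Enum


class LegalFormEnum(Enum):
    STICHTING = "STICHTING"                  # Dutch foundation
    NGO = "NGO"                              # Non-governmental organization
    GOVERNMENT_AGENCY = "GOVERNMENT_AGENCY"  # Government department
    ASSOCIATION = "ASSOCIATION"              # e.g. Dutch vereniging
    LIMITED_COMPANY = "LIMITED_COMPANY"      # e.g. BV, Ltd


# Hypothetical lookup from local legal-form labels to enum members
LOCAL_LABELS = {
    "stichting": LegalFormEnum.STICHTING,
    "vereniging": LegalFormEnum.ASSOCIATION,
    "bv": LegalFormEnum.LIMITED_COMPANY,
    "ltd": LegalFormEnum.LIMITED_COMPANY,
}


def legal_form_from_label(label: str) -> LegalFormEnum:
    """Normalize a local label (case-insensitive, trailing dots ignored)."""
    key = label.strip().rstrip(".").lower()
    try:
        return LOCAL_LABELS[key]
    except KeyError:
        raise ValueError(f"no LegalFormEnum mapping for {label!r}") from None


print(legal_form_from_label("Stichting"))  # LegalFormEnum.STICHTING
```

Unknown labels raise instead of defaulting to a member, so unmapped legal forms surface during curation.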
🔬 Ontology Alignments
PiCo (Persons in Context)
| PiCo Class | Heritage Equivalent | Purpose |
|---|---|---|
| pico:Person | heritage:Organization | Abstract base class |
| pico:PersonObservation | heritage:OrganizationObservation | Emic references |
| pico:PersonReconstruction | heritage:OrganizationReconstruction | Etic entities |
| prov:Activity | heritage:ReconstructionActivity | Entity resolution process |
| prov:Agent | heritage:Agent | Responsible curator/software |
PROV-O (Provenance Ontology)
- prov:Entity - Base class for Organization
- prov:hadPrimarySource - Links observation to source document
- prov:wasDerivedFrom - Links reconstruction to observations
- prov:wasGeneratedBy - Links reconstruction to activity
- prov:wasAssociatedWith - Links activity to agent
- prov:wasRevisionOf - Links updated reconstruction to previous version
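The sketch below shows how these properties fit together by serializing the provenance of one reconstruction as Turtle with plain string formatting. Only the prov: namespace IRI (http://www.w3.org/ns/prov#) is real. The heritage: and ex: namespaces and all instance identifiers are hypothetical placeholders. A production pipeline would more likely use an RDF library; string building keeps the example dependency-free.

```python
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix heritage: <https://example.org/heritage#> .\n"  # hypothetical namespace
    "@prefix ex: <https://example.org/id/> .\n"              # hypothetical instance IRIs
)


def provenance_turtle(rec_id: str, observation_sources: dict[str, str],
                      activity_id: str, agent_id: str) -> str:
    """Serialize the PROV-O links of one reconstruction as Turtle.

    observation_sources maps each observation id to its primary source id.
    """
    if not observation_sources:
        raise ValueError("a reconstruction needs at least one source observation")
    derived = ", ".join(f"ex:{obs_id}" for obs_id in observation_sources)
    lines = [
        PREFIXES,
        f"ex:{rec_id} a heritage:OrganizationReconstruction ;",
        f"    prov:wasDerivedFrom {derived} ;",
        f"    prov:wasGeneratedBy ex:{activity_id} .",
        f"ex:{activity_id} a heritage:ReconstructionActivity ;",
        f"    prov:wasAssociatedWith ex:{agent_id} .",
    ]
    for obs_id, source_id in observation_sources.items():
        lines.append(f"ex:{obs_id} a heritage:OrganizationObservation ;")
        lines.append(f"    prov:hadPrimarySource ex:{source_id} .")
    return "\n".join(lines) + "\n"


ttl = provenance_turtle(
    "org-rijksmuseum",
    {"obs-letterhead-2015": "src-letterhead", "obs-kvk-2024": "src-kvk"},
    activity_id="act-reconstruction-1",
    agent_id="agent-glam-project",
)
print(ttl)
```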
CPOV (Core Public Organisation Vocabulary)
- cpov:legalName - Official legal name in reconstruction
- cpov:identifier - Formal identifiers (KvK, ISIL, etc.)
- cpov:PublicOrganisation - Class URI for government agencies
W3C ORG (Organization Ontology)
- org:classification - Legal form of organization
- org:subOrganizationOf - Parent organization hierarchy
📊 Comparison: Before vs After
| Aspect | Before (Session Start) | After (PiCo Integration) |
|---|---|---|
| Name modeling | Single Name class links to entities | Name links to observations, not entities |
| Organization types | Single Organization class | Two classes: Observation + Reconstruction |
| Emic/Etic | Not distinguished | Explicitly modeled (observation vs reconstruction) |
| Legal forms | Undefined | Enumerated (STICHTING, NGO, etc.) |
| Provenance | Basic source tracking | Full PROV-O chain with activities |
| Temporal names | Unclear | Explicit temporal validity + succession |
| Confidence | None | Observation-level confidence scores |
| Source linking | Optional | Required (prov:hadPrimarySource) |
🚀 Next Steps (Updated)
Immediate (Session 3 - HIGH PRIORITY)
- Update Name Entity Schema (01_name_entity.yaml)
  - Change refers_to_organization to refers_to_organization_observation
  - Range: OrganizationObservation (not OrganizationReconstruction)
  - Update documentation to explain the observation → reconstruction chain
- Create Diagrams for Observation-Reconstruction Pattern
  - Mermaid diagram: Class relationships
  - PlantUML diagram: Full UML 2.5 with annotations
  - TypeQL schema: TypeDB implementation with reasoning rules
  - RDF/OWL ontology: Turtle serialization with SHACL constraints
- Extract Hypernym Taxonomy (unchanged from previous plan)
  - Parse hyponyms_curated.yaml for unique hypernyms
  - Map hypernyms to OrganizationObservation types (building, museum, archive, etc.)
Medium-Term (This Week)
- Create Place Entity Module (03_place_entity.yaml)
  - Physical locations (sites, buildings)
  - Temporal validity (construction → demolition)
  - Link to OrganizationObservation (organizations occupy places)
- Create Collection Entity Module (04_collection_entity.yaml)
  - Heritage materials (archival, museum, library collections)
  - Accession/deaccession tracking
  - Custody relationships (which organization holds which collection)
- Batch Conversion Script for Wikidata Entities
  - Input: hyponyms_curated_full.yaml (2,453 entities)
  - Output: OrganizationObservation instances
  - Logic: Infer observation type from Wikidata entity type (Q33506 museum → museum observation)
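The conversion logic can be sketched as a lookup from Wikidata class QIDs to observation types. The sketch assumes that hyponyms_curated_full.yaml has already been parsed into a list of dictionaries with hypothetical qid, label and instance_of keys. Only the Q33506 (museum) mapping comes from this plan. Entities with unmapped classes are skipped and reported rather than guessed. The observation_type and had_primary_source keys are likewise illustrative.

```python
# Wikidata class QID -> observation type. Q33506 (museum) is the example
# given in the plan; further entries would come from the curated taxonomy.
TYPE_BY_WIKIDATA_CLASS = {
    "Q33506": "museum",
}


def to_observations(entities: list[dict]) -> tuple[list[dict], list[str]]:
    """Convert entity records to OrganizationObservation dicts.

    Returns (observations, skipped_qids).
    """
    observations, skipped = [], []
    for entity in entities:
        types = [TYPE_BY_WIKIDATA_CLASS[c] for c in entity.get("instance_of", [])
                 if c in TYPE_BY_WIKIDATA_CLASS]
        if not types:
            skipped.append(entity["qid"])  # leave unmapped classes for review
            continue
        observations.append({
            "id": f"obs-wikidata-{entity['qid']}",
            "observed_name": entity["label"],
            "observation_type": types[0],
            # The Wikidata entity page serves as the primary source
            "had_primary_source": f"https://www.wikidata.org/wiki/{entity['qid']}",
        })
    return observations, skipped
```

Reporting skipped QIDs keeps the batch run honest. The taxonomy can grow from that list instead of from silent defaults.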
📝 Documentation Updates Needed
- Update schemas/20251121/README.md
  - Add section on "Observation vs Reconstruction Pattern"
  - Explain emic/etic distinction
  - Add Rijksmuseum example walkthrough
- Create docs/OBSERVATION_RECONSTRUCTION_PATTERN.md
  - Comprehensive guide to the pattern
  - Use cases and anti-patterns
  - Comparison with PiCo
  - Implementation examples in all 5 formats (LinkML, Mermaid, PlantUML, TypeQL, RDF)
- Update AGENTS.md
  - Add instructions for extracting observations from sources
  - Distinguish observation extraction (emic) from entity resolution (etic)
  - Provide prompts for confidence score assignment
🎓 Key Learnings
1. Domain Experts Know Best
PiCo developers (CBG|Center for Family History, NIOD, IISH) spent years refining the observation/reconstruction distinction for historical person data. Reusing their pattern saves us from reinventing the wheel and ensures alignment with established heritage informatics practices.
2. Emic/Etic is Fundamental
The emic (insider) vs etic (outsider) distinction from anthropology is fundamental to heritage data modeling:
- Emic: How organizations refer to themselves (vernacular, culturally specific)
- Etic: How authorities classify organizations (legal, internationally standardized)
Both perspectives are equally valid and must coexist in the ontology.
3. Names Are NOT Entities
Critical insight: Names are appellations (CIDOC-CRM E41 Appellation), not the organizations they denote. They:
- Reference observations (how things are called)
- Do NOT directly reference entities (what things are)
- Have temporal validity (names change over time)
- Are culturally/linguistically specific
4. Provenance is Mandatory
Every entity reconstruction MUST document:
- Which observations it derives from (prov:wasDerivedFrom)
- How it was created (prov:wasGeneratedBy)
- Who created it (prov:wasAssociatedWith)
- Why decisions were made (justification)
Without provenance, reconstructions are unverifiable and untrustworthy.
✅ Session Status
Status: ✅ COMPLETE
Major Achievement: Integrated PiCo observation/reconstruction pattern into heritage organization ontology
Files Created: 2 (schema + example)
Lines Written: ~950 lines
Design Patterns Established: 4 (multi-observation → entity, name chain, confidence scoring, legal form enumeration)
Next Session Focus: Create diagrams + update Name entity schema + extract hypernym taxonomy
📚 References
- PiCo Ontology: data/ontology/pico.ttl (1,392 lines)
- PiCo Documentation: https://personsincontext.org/
- PROV-O: https://www.w3.org/TR/prov-o/
- CIDOC-CRM E41 Appellation: http://www.cidoc-crm.org/cidoc-crm/E41_Appellation
- Emic/Etic: Pike, K. L. (1967). Language in Relation to a Unified Theory of the Structure of Human Behavior
Session End Time: 2025-11-21 (active)
Total Session Duration: ~2 hours
Collaboration: User + AI (iterative refinement based on domain expert input)