docs(person_pid): add PPID-GHCID alignment and PiCo comparison docs

kempersc 2026-01-09 15:57:26 +01:00
parent a51c8c400c
commit 7f53ec6074
2 changed files with 1625 additions and 0 deletions


@@ -0,0 +1,475 @@
# PiCo vs PPID: Comparative Analysis
**Version**: 0.1.0
**Last Updated**: 2025-01-09
**Related**: [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md) | [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
---
## 1. Executive Summary
This document compares the **PiCo (Persons in Context)** ontology, developed by CBG|Centrum voor Familiegeschiedenis, with our proposed **PPID (Person Persistent Identifier)** system. The analysis draws on PiCo's published specification and on its implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.
### 1.1 Key Finding
PiCo and PPID serve **complementary purposes**:
| System | Primary Purpose | Identifier Style | Scope |
|--------|-----------------|------------------|-------|
| **PiCo** | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
| **PPID** | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |
**Recommendation**: PPID should **adopt PiCo's ontological distinctions** (PersonObservation vs PersonReconstruction) while using its own **semantic identifier format** aligned with GHCID conventions.
---
## 2. PiCo Architecture (From Research)
### 2.1 Core Classes
From the PiCo specification at `personsincontext.org/model`:
```
┌─────────────────────────────────────────────────────────────────┐
│                           PiCo MODEL                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                         Person                          │   │
│   │          (Container class - not used directly)          │   │
│   │                                                         │   │
│   │   ┌──────────────────────┐   ┌──────────────────────┐   │   │
│   │   │ PersonObservation    │   │ PersonReconstruction │   │   │
│   │   │                      │   │                      │   │   │
│   │   │ - Data as found      │   │ - Curated identity   │   │   │
│   │   │   on Source          │   │ - Links multiple     │   │   │
│   │   │ - hadPrimarySource   │   │   observations       │   │   │
│   │   │ - hasRole            │   │ - wasDerivedFrom     │   │   │
│   │   │ - hasAge             │   │ - wasGeneratedBy     │   │   │
│   │   │ - hasOccupation      │   │ - wasRevisionOf      │   │   │
│   │   └──────────────────────┘   └──────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                         Source                          │   │
│   │                (schema:ArchiveComponent)                │   │
│   │  - name, dateCreated, holdingArchive, associatedMedia   │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                    PersonName (PNV)                     │   │
│   │  - literalName, givenName, baseSurname, surnamePrefix   │   │
│   │  - patronym, initials                                   │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### 2.2 PiCo Identifier Structure in Open Archives
From the Open Archives API documentation:
```
URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]
Examples:
- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c
Components:
- rat = Regionaal Archief Tilburg (3-letter archive code)
- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
```
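The URI layout above can be taken apart mechanically. The following is a minimal sketch, assuming the structure seen in the two examples (a three-letter lowercase archive code, a colon, a UUID, and an optional trailing token). The names `parse_open_archives_uri` and `OpenArchivesRef` are hypothetical and not part of the Open Archives API.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the examples above; an assumption, not an official grammar.
_URI_RE = re.compile(
    r"^https://www\.openarchieven\.nl/"
    r"(?P<archive>[a-z]{3}):"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"(?:/(?P<token>.+))?$"
)


class OpenArchivesRef(NamedTuple):
    archive: str           # 3-letter archive code, e.g. "rat"
    uuid: str              # UUID of the record
    token: Optional[str]   # optional token, e.g. "ttl:pico"; None if absent


def parse_open_archives_uri(uri: str) -> OpenArchivesRef:
    """Split an Open Archives record URI into archive code, UUID and token."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not a recognised Open Archives record URI: {uri!r}")
    return OpenArchivesRef(match["archive"], match["uuid"], match["token"])


ref = parse_open_archives_uri(
    "https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649/ttl:pico"
)
print(ref.archive, ref.uuid, ref.token)
```

Rejecting anything that does not match, rather than guessing, keeps malformed URIs from silently entering a linking pipeline.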
### 2.3 PiCo PersonObservation Example (Actual Data)
From Open Archives API response:
```turtle
@prefix oa: <https://www.openarchieven.nl/id/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .
oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
a pico:PersonObservation ;
prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
pico:hasRole "Moeder" ;
sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
sdo:gender sdo:Female ;
sdo:name "Cornelia Verhulst" ;
sdo:familyName "Verhulst" ;
sdo:givenName "Cornelia" .
```
### 2.4 PiCo PersonReconstruction Example
From PiCo specification:
```turtle
cbg:person_reconstruction_2
a pico:PersonReconstruction ;
sdo:name "Anna Maria Koppen" ;
sdo:familyName "Koppen" ;
sdo:givenName "Anna" ;
sdo:gender sdo:Female ;
sdo:birthPlace "Haarlem" ;
sdo:birthDate "1860-03-31"^^xsd:date ;
sdo:deathPlace "Detroit, VSA" ;
sdo:deathDate "1926"^^xsd:gYear ;
prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1,
cbg:NL-HaCBG_1755_0341_142_po_1 ;
prov:wasGeneratedBy cbg:reconstruction_activity_01 .
```
---
## 3. Detailed Comparison
### 3.1 Identifier Format
| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
|--------|--------------------------|-----------------|
| **Format** | `{archive}:{uuid}` | `{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}` |
| **Example** | `rat:48c2b836-385f-11e0-bcd1-8edf61960649` | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
| **Human Readable** | No (opaque UUID) | Yes (semantic components) |
| **Archive Prefix** | Yes (3-letter code) | No (implicit via source) |
| **Geographic** | No | Yes (birth + death locations) |
| **Temporal** | No | Yes (century range) |
| **Name** | No | Yes (first + last token) |
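To illustrate what "semantic components" means in practice, here is a hedged sketch that splits the PPID example from the table into its parts. The component shapes in the regular expression (two-letter country codes, alphanumeric region and place codes, signed century numbers, upper-case name tokens, and `ID`/`PID` as the only type values) are assumptions inferred from that single example, and `parse_ppid` and `Ppid` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple

# Shapes inferred from the one example in the table; not an authoritative grammar.
_PPID_RE = re.compile(
    r"^(?P<kind>PID|ID)"
    r"-(?P<fc>[A-Z]{2})-(?P<fr>[A-Z0-9]+)-(?P<fp>[A-Z0-9]+)"  # first location
    r"-(?P<lc>[A-Z]{2})-(?P<lr>[A-Z0-9]+)-(?P<lp>[A-Z0-9]+)"  # last location
    r"-(?P<c1>-?\d+)-(?P<c2>-?\d+)"                           # century range
    r"-(?P<ft>[A-Z]+)-(?P<lt>[A-Z]+)$"                        # name tokens
)


class Ppid(NamedTuple):
    kind: str                             # "ID" or "PID"
    first_location: Tuple[str, str, str]  # country, region, place
    last_location: Tuple[str, str, str]   # country, region, place
    centuries: Tuple[int, int]            # signed century numbers
    name_tokens: Tuple[str, str]          # first and last name token


def parse_ppid(identifier: str) -> Ppid:
    """Split a PPID string into its semantic components."""
    m = _PPID_RE.match(identifier)
    if m is None:
        raise ValueError(f"not a recognised PPID: {identifier!r}")
    return Ppid(
        m["kind"],
        (m["fc"], m["fr"], m["fp"]),
        (m["lc"], m["lr"], m["lp"]),
        (int(m["c1"]), int(m["c2"])),
        (m["ft"], m["lt"]),
    )


print(parse_ppid("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"))
```

Because no component may be empty, a doubled hyphen can only be a minus sign, which is what lets the same pattern read negative (BCE) century numbers.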
### 3.2 Conceptual Model
| Concept | PiCo | PPID |
|---------|------|------|
| **Raw Observation** | `PersonObservation` | Observation (separate system) |
| **Curated Identity** | `PersonReconstruction` | `PID` (promoted from `ID`) |
| **Temporary State** | Not explicit | `ID` class |
| **Permanent State** | All URIs persistent | `PID` class only |
| **Provenance** | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
| **Name Vocabulary** | PNV (Person Name Vocabulary) | Emic labels from sources |
### 3.3 Persistence Philosophy
| Aspect | PiCo | PPID |
|--------|------|------|
| **All identifiers persistent?** | Yes | No - only PID class |
| **Temporary identifiers?** | No explicit concept | Yes - ID class |
| **Promotion mechanism?** | N/A | ID → PID when criteria met |
| **Epistemic uncertainty?** | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
| **Living persons?** | Can have PersonReconstruction | Must remain ID until death |
### 3.4 Geographic Handling
| Aspect | PiCo | PPID |
|--------|------|------|
| **In identifier?** | No | Yes |
| **In properties?** | Yes (birthPlace, deathPlace) | Also yes |
| **Format** | Free text or URI | ISO 3166-1/2 + GeoNames |
| **Historical mapping?** | Encouraged (link to thesaurus) | Required (historical → modern) |
| **Example** | `sdo:birthPlace "Haarlem"` | `...-NL-NH-HAA-...` |
### 3.5 Temporal Handling
| Aspect | PiCo | PPID |
|--------|------|------|
| **In identifier?** | No | Yes (century range) |
| **Date format** | ISO 8601 (xsd:date) | Century numbers |
| **BCE support** | Via negative years | Via negative centuries (-5--4) |
| **Precision** | Day-level possible | Century-level only in ID |
| **Example** | `sdo:birthDate "1860-03-31"^^xsd:date` | `...-19-20-...` |
---
## 4. Key Differences Explained
### 4.1 Why PiCo Uses Opaque UUIDs
PiCo's design goals (from GitHub README):
1. **Successor to A2A**: Designed to replace the XML-based A2A (Archive-to-Archive) standard
2. **Genealogical focus**: Primary use case is WieWasWie ancestor search
3. **Linked Data**: Interoperability via RDF, not human-readable identifiers
4. **Archive-centric**: Identifiers include archive code prefix
PiCo's UUID approach is appropriate for:
- Massive genealogical databases (millions of records)
- Automated conversion from A2A
- Machine-to-machine data exchange
### 4.2 Why PPID Uses Semantic Identifiers
PPID's design goals:
1. **GHCID alignment**: Consistent identifier philosophy across GLAM project
2. **Heritage sector focus**: Staff of heritage institutions, historical figures
3. **Human discovery**: Identifiers aid browsing and deduplication
4. **Epistemic honesty**: Explicit distinction between ID (uncertain) and PID (verified)
5. **Scholarly citation**: Identifiers can be meaningfully cited in publications
PPID's semantic approach is appropriate for:
- Smaller, curated datasets
- Human curation workflows
- Cross-system deduplication
- Scholarly reference
### 4.3 The ID/PID Distinction (Unique to PPID)
PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:
```
PiCo:
PersonObservation (always permanent)
↓ prov:wasDerivedFrom
PersonReconstruction (always permanent)
PPID:
Observation (separate system, permanent)
ID (temporary, may change)
↓ promotion when criteria met
PID (permanent, never changes)
```
**Why this matters for heritage sector**:
- **Living persons**: Cannot have verified death observation → must remain ID
- **Incomplete records**: May never have enough data for PID promotion
- **Ongoing research**: Archives not yet explored → cannot claim PID status
- **Scholarly integrity**: Prevents overclaiming certainty
---
## 5. Integration Recommendations
### 5.1 Adopt PiCo Ontological Distinctions
PPID should use PiCo's class hierarchy:
```turtle
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# PPID extends PiCo: both epistemic states are kinds of reconstruction
ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .
# PPID reconstructions link back to the source observations they derive from
ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
    rdfs:range pico:PersonObservation .
```
### 5.2 Maintain PPID Semantic Identifier Format
Do not adopt PiCo's opaque UUID format. Keep the semantic, GHCID-aligned format:
```
PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
```
**Rationale**: GHCID project-wide consistency, human discoverability, scholarly citation.
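As a rough illustration of how such an identifier could be assembled, the sketch below joins the components with hyphens and derives the two name tokens from a literal name. It assumes the tokens are simply the upper-cased first and last whitespace-separated words; the real rules (transliteration, diacritics, mononyms) are not covered here, and `name_tokens` and `build_ppid` are hypothetical helpers.

```python
from typing import Tuple


def name_tokens(literal_name: str) -> Tuple[str, str]:
    """Return the upper-cased first and last whitespace-separated tokens."""
    parts = literal_name.split()
    if not parts:
        raise ValueError("name must contain at least one token")
    return parts[0].upper(), parts[-1].upper()


def build_ppid(
    kind: str,
    first_location: Tuple[str, str, str],  # country, region, place
    last_location: Tuple[str, str, str],   # country, region, place
    centuries: Tuple[int, int],            # signed century numbers
    literal_name: str,
) -> str:
    """Join the components into a hyphen-separated PPID string."""
    first_token, last_token = name_tokens(literal_name)
    fields = [
        kind,
        *first_location,
        *last_location,
        str(centuries[0]),
        str(centuries[1]),
        first_token,
        last_token,
    ]
    return "-".join(fields)


ppid = build_ppid(
    "PID", ("NL", "NH", "AMS"), ("NL", "NH", "HAA"), (19, 20), "Jan van den Berg"
)
print(ppid)  # PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
```

Surname prefixes such as "van den" drop out naturally under this assumption, because only the first and last tokens are kept.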
### 5.3 Use PNV for Name Properties
Adopt PiCo's use of Person Name Vocabulary for structured name data:
```turtle
ppid:PID-... pnv:hasName [
a pnv:PersonName ;
pnv:literalName "Jan van den Berg" ;
pnv:givenName "Jan" ;
pnv:surnamePrefix "van den" ;
pnv:baseSurname "Berg"
] .
```
### 5.4 Use PROV-O for Provenance
Adopt PiCo's PROV-O patterns for reconstruction provenance:
```turtle
ppid:PID-NL-NH-AMS-...
prov:wasDerivedFrom <observation-1>, <observation-2> ;
prov:wasGeneratedBy [
a prov:Activity ;
prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
prov:wasAssociatedWith ppid:curator-001
] .
```
### 5.5 Separate Observation Identifiers
As noted in the revised PPID design, observations use a **different identifier system**:
```
{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}
Example:
NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
```
This is distinct from PiCo's `{archive}:{uuid}` but serves similar purposes.
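A small sketch of how the three-part observation identifier could be split, assuming that neither GHCID contains a slash, so the first two slashes delimit the repository and creator and everything after them is the record path. `ObservationId` and `parse_observation_id` are hypothetical names.

```python
from typing import NamedTuple


class ObservationId(NamedTuple):
    repository_ghcid: str  # custodian holding the record
    creator_ghcid: str     # custodian that created the record
    record_path: str       # RiC-O style path within the holdings


def parse_observation_id(identifier: str) -> ObservationId:
    """Split '{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}' into its parts."""
    # maxsplit=2 keeps any further slashes inside the record path.
    parts = identifier.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty parts: {identifier!r}")
    return ObservationId(*parts)


obs = parse_observation_id(
    "NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045"
)
print(obs.repository_ghcid)  # NL-NH-HAA-A-NHA
print(obs.record_path)       # burgerlijke-stand/geboorten/1895/003/045
```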
---
## 6. Resolved Open Questions
Based on user clarifications:
### 6.1 BCE Date Handling
**Resolution**: Use negative century numbers.
```
Format: {first_century}-{last_century}
Examples:
- 5th century BCE to 4th century BCE: "-5--4"
- 1st century BCE to 1st century CE: "-1-1"
- 5th century BCE to 3rd century CE: "-5-3"
```
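This mirrors the ISO 8601 convention of writing BCE years as negative numbers. The two schemes are not identical, though: ISO 8601 uses astronomical year numbering (year 0000 is 1 BCE), whereas this format has no century 0, so `-1` denotes the 1st century BCE and `1` the 1st century CE.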
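The century encoding can be sketched as a small conversion from years to signed century numbers. This is a minimal example under stated assumptions: years are passed as signed integers in historical (not astronomical) numbering, so `-450` means 450 BCE and year 0 is rejected. `century_of` and `century_range` are hypothetical helper names.

```python
def century_of(year: int) -> int:
    """Map a historical year to a signed century number (there is no century 0)."""
    if year == 0:
        raise ValueError("there is no year 0 in historical BCE/CE numbering")
    if year > 0:
        # Years 1-100 are the 1st century, 1801-1900 the 19th, and so on.
        return (year - 1) // 100 + 1
    # 100-1 BCE is century -1, 500-401 BCE is century -5, and so on.
    return -((-year - 1) // 100 + 1)


def century_range(first_year: int, last_year: int) -> str:
    """Format the '{first_century}-{last_century}' identifier component."""
    return f"{century_of(first_year)}-{century_of(last_year)}"


print(century_range(1860, 1926))  # 19-20
print(century_range(-450, -350))  # -5--4
print(century_range(-50, 30))     # -1-1
```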
### 6.2 Non-Latin Script Transliteration
**Resolution**: Apply same transliteration rules as GHCID (documented in AGENTS.md).
| Script | Standard |
|--------|----------|
| Cyrillic | ISO 9:1995 |
| Chinese | Hanyu Pinyin (ISO 7098) |
| Japanese | Modified Hepburn |
| Korean | Revised Romanization |
| Arabic | ISO 233-2/3 |
| Hebrew | ISO 259-3 |
| Greek | ISO 843 |
### 6.3 Disputed Locations
**Resolution**: Not a PPID concern - handled by ISO standardization.
When historical locations are disputed:
- Use the ISO-standardized modern location
- Document the dispute in observation metadata
- Do not encode uncertainty in the identifier itself
### 6.4 Living Persons
**Resolution**: Living persons are **always ID class** and can only be promoted to PID after death.
```python
def check_other_criteria(observations: list) -> bool:
    """Placeholder: the remaining promotion criteria are defined elsewhere."""
    raise NotImplementedError


def can_promote_to_pid(person_id: str, observations: list) -> bool:
    """
    Check if ID can be promoted to PID.
    Living persons can NEVER be promoted.
    """
    # Check for a death observation (a death record or any post-death record)
    death_obs = [o for o in observations if o.is_death_record or o.is_post_death]
    if not death_obs:
        # No death observation = person may be alive = cannot be PID
        return False
    # Continue with other promotion criteria...
    return check_other_criteria(observations)
```
**Rationale**:
1. PID requires verified last observation (death)
2. Living persons have incomplete lifecycle
3. Future observations may change identity assessment
4. Privacy considerations for living individuals
---
## 7. Implementation Alignment
### 7.1 Class Mapping
| PiCo Class | PPID Equivalent | Notes |
|------------|-----------------|-------|
| `pico:Person` | (Container) | Not used directly |
| `pico:PersonObservation` | Observation (separate system) | Different identifier format |
| `pico:PersonReconstruction` | `ppid:PersonID` or `ppid:PersonPID` | Split by epistemic certainty |
| `pico:Source` | `schema:ArchiveComponent` | Same as PiCo |
| `pnv:PersonName` | `pnv:PersonName` | Adopt PNV |
### 7.2 Property Mapping
| PiCo Property | PPID Usage | Notes |
|---------------|------------|-------|
| `prov:hadPrimarySource` | Same | For observations |
| `prov:wasDerivedFrom` | Same | ID/PID derived from observations |
| `prov:wasGeneratedBy` | Same | Activity provenance |
| `prov:wasRevisionOf` | Same | Version history |
| `sdo:birthDate` | Same | In properties |
| `sdo:birthPlace` | Same + in identifier | Dual representation |
| `sdo:deathDate` | Same | In properties |
| `sdo:deathPlace` | Same + in identifier | Dual representation |
| `pico:hasRole` | Same | For observations |
| `pico:hasAge` | Same | When birthDate unknown |
### 7.3 Namespace Declarations
```turtle
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix pnv: <https://w3id.org/pnv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
```
---
## 8. Conclusion
### 8.1 What PPID Adopts from PiCo
1. **PersonObservation/PersonReconstruction distinction** - Core ontological pattern
2. **PROV-O provenance model** - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
3. **Person Name Vocabulary (PNV)** - Structured name representation
4. **Schema.org properties** - birthDate, deathDate, birthPlace, deathPlace, etc.
5. **Source linking** - hadPrimarySource, holdingArchive
### 8.2 What PPID Does Differently
1. **Semantic identifier format** - Geographic-temporal-emic instead of opaque UUID
2. **ID/PID epistemic distinction** - Explicit uncertainty modeling
3. **Living person handling** - Must remain ID until death
4. **GHCID alignment** - Consistent with heritage custodian identifier philosophy
5. **Century range encoding** - Temporal disambiguation in identifier
6. **Emic label tokens** - Name components in identifier for discoverability
### 8.3 Interoperability Path
PPID can interoperate with PiCo systems via:
1. **RDFS/OWL mappings**: `ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction`
2. **SPARQL federation**: Query across PPID and PiCo endpoints
3. **Bidirectional links**: `owl:sameAs` between PPID and PiCo identifiers
4. **Profile negotiation**: Serve data in PiCo format via content negotiation
---
## 9. References
### PiCo Resources
- PiCo Specification: https://personsincontext.org/model
- PiCo GitHub: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- Open Archives API: https://www.openarchieven.nl/api/docs/uri.php
- CBG: https://cbg.nl/
### Standards
- Person Name Vocabulary (PNV): https://w3id.org/pnv
- PROV-O: https://www.w3.org/TR/prov-o/
- Schema.org: https://schema.org/
### Related PPID Documents
- [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md)
- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
- [Identifier Structure Design](./05_identifier_structure_design.md)