docs(person_pid): add PPID-GHCID alignment and PiCo comparison docs
parent
a51c8c400c
commit
7f53ec6074
2 changed files with 1625 additions and 0 deletions
1150
docs/plan/person_pid/10_ppid_ghcid_alignment.md
Normal file
File diff suppressed because it is too large
475
docs/plan/person_pid/11_pico_ppid_comparison.md
Normal file
@@ -0,0 +1,475 @@
# PiCo vs PPID: Comparative Analysis

**Version**: 0.1.0
**Last Updated**: 2025-01-09
**Related**: [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md) | [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)

---

## 1. Executive Summary

This document compares the **PiCo (Persons in Context)** ontology developed by CBG|Centrum voor Familiegeschiedenis with our proposed **PPID (Person Persistent Identifier)** system. The analysis draws on PiCo's implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.

### 1.1 Key Finding

PiCo and PPID serve **complementary purposes**:

| System | Primary Purpose | Identifier Style | Scope |
|--------|-----------------|------------------|-------|
| **PiCo** | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
| **PPID** | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |

**Recommendation**: PPID should **adopt PiCo's ontological distinctions** (PersonObservation vs PersonReconstruction) while using its own **semantic identifier format** aligned with GHCID conventions.

---

## 2. PiCo Architecture (From Research)

### 2.1 Core Classes

From the PiCo specification at `personsincontext.org/model`:

```
┌─────────────────────────────────────────────────────────────────┐
│                           PiCo MODEL                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                          Person                           │  │
│  │           (Container class - not used directly)           │  │
│  │                                                           │  │
│  │  ┌──────────────────────┐   ┌────────────────────────┐    │  │
│  │  │ PersonObservation    │   │ PersonReconstruction   │    │  │
│  │  │                      │   │                        │    │  │
│  │  │ - Data as found      │   │ - Curated identity     │    │  │
│  │  │   on Source          │   │ - Links multiple       │    │  │
│  │  │ - hadPrimarySource   │   │   observations         │    │  │
│  │  │ - hasRole            │   │ - wasDerivedFrom       │    │  │
│  │  │ - hasAge             │   │ - wasGeneratedBy       │    │  │
│  │  │ - hasOccupation      │   │ - wasRevisionOf        │    │  │
│  │  └──────────────────────┘   └────────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                          Source                           │  │
│  │                 (schema:ArchiveComponent)                 │  │
│  │  - name, dateCreated, holdingArchive, associatedMedia     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     PersonName (PNV)                      │  │
│  │  - literalName, givenName, baseSurname, surnamePrefix     │  │
│  │  - patronym, initials                                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### 2.2 PiCo Identifier Structure in Open Archives

From the Open Archives API documentation:

```
URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]

Examples:
- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c

Components:
- rat = Regionaal Archief Tilburg (3-letter archive code)
- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
```

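
The URI format above can be split into its components with a small parser. This is a minimal sketch: `parse_open_archives_uri` and `OpenArchivesURI` are hypothetical names and not part of the Open Archives API, and the pattern assumes lowercase archive codes and lowercase hexadecimal UUIDs, as in the two examples.

```python
import re
from typing import NamedTuple, Optional


class OpenArchivesURI(NamedTuple):
    archive_code: str      # 3-letter archive code, e.g. "rat"
    record_uuid: str       # UUID of the record
    token: Optional[str]   # optional trailing token, e.g. "ttl:pico"


# Pattern derived from the documented URI format; lowercase input is an assumption.
_URI_RE = re.compile(
    r"^https://www\.openarchieven\.nl/"
    r"(?P<archive>[a-z]{3}):"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:/(?P<token>[^/]+))?$"
)


def parse_open_archives_uri(uri: str) -> OpenArchivesURI:
    """Split a record URI into archive code, record UUID and optional token."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not an Open Archives record URI: {uri!r}")
    # An absent optional group is returned as None.
    return OpenArchivesURI(match["archive"], match["uuid"], match["token"])
```

Calling it on the first example URI yields the archive code `rat` and no token; appending `/ttl:pico` to the URI fills the `token` field.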
### 2.3 PiCo PersonObservation Example (Actual Data)

From Open Archives API response:

```turtle
@prefix oa: <https://www.openarchieven.nl/id/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .

oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
    a pico:PersonObservation ;
    prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
    pico:hasRole "Moeder" ;
    sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
    sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
    sdo:gender sdo:Female ;
    sdo:name "Cornelia Verhulst" ;
    sdo:familyName "Verhulst" ;
    sdo:givenName "Cornelia" .
```

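
In the example above, the local name of the observation appears to embed the archive code, the record UUID and a person UUID, so the source record can be recovered from the observation identifier. The sketch below assumes that pattern holds in general, which the single example does not prove; `parse_oa_local_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

_UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# Assumed shape: record_{archive}_{record-uuid}[_Person_{person-uuid}]
_LOCAL_NAME_RE = re.compile(
    rf"^record_(?P<archive>[a-z]{{3}})_(?P<record>{_UUID})"
    rf"(?:_Person_(?P<person>{_UUID}))?$"
)


class OALocalName(NamedTuple):
    archive_code: str
    record_uuid: str
    person_uuid: Optional[str]  # None when the name refers to the record itself


def parse_oa_local_name(local_name: str) -> OALocalName:
    """Split an oa: local name into archive code, record UUID and person UUID."""
    match = _LOCAL_NAME_RE.match(local_name)
    if match is None:
        raise ValueError(f"unexpected local name: {local_name!r}")
    return OALocalName(match["archive"], match["record"], match["person"])
```

Under this reading, an observation and its `prov:hadPrimarySource` share the same `archive_code` and `record_uuid`, and only the observation carries a `person_uuid`.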
### 2.4 PiCo PersonReconstruction Example

From PiCo specification:

```turtle
cbg:person_reconstruction_2
    a pico:PersonReconstruction ;
    sdo:name "Anna Maria Koppen" ;
    sdo:familyName "Koppen" ;
    sdo:givenName "Anna" ;
    sdo:gender sdo:Female ;
    sdo:birthPlace "Haarlem" ;
    sdo:birthDate "1860-03-31"^^xsd:date ;
    sdo:deathPlace "Detroit, VSA" ;
    sdo:deathDate "1926"^^xsd:gYear ;
    prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1,
                        cbg:NL-HaCBG_1755_0341_142_po_1 ;
    prov:wasGeneratedBy cbg:reconstruction_activity_01 .
```

---

## 3. Detailed Comparison

### 3.1 Identifier Format

| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
|--------|--------------------------|-----------------|
| **Format** | `{archive}:{uuid}` | `{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}` |
| **Example** | `rat:48c2b836-385f-11e0-bcd1-8edf61960649` | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
| **Human Readable** | No (opaque UUID) | Yes (semantic components) |
| **Archive Prefix** | Yes (3-letter code) | No (implicit via source) |
| **Geographic** | No | Yes (birth + death locations) |
| **Temporal** | No | Yes (century range) |
| **Name** | No | Yes (first + last token) |

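
To make the PPID format in the table concrete, here is a minimal parsing sketch. Several points are assumptions rather than part of the specification: the placeholders are read as first/last country, region and place (birth and death in the example), `CR` is read as two signed century numbers, the temporary class is assumed to use the prefix `ID`, and the character classes are guessed from the single example. `parse_ppid` and `ParsedPPID` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Field names follow the comparison table; the character classes are assumptions.
_PPID_RE = re.compile(
    r"""
    (?P<type>PID|ID)-                                          # identifier class
    (?P<fc>[A-Z]{2})-(?P<fr>[A-Z0-9]{1,3})-(?P<fp>[A-Z]{3})-   # first location
    (?P<lc>[A-Z]{2})-(?P<lr>[A-Z0-9]{1,3})-(?P<lp>[A-Z]{3})-   # last location
    (?P<c1>-?\d+)-(?P<c2>-?\d+)-                               # century range, negative means BCE
    (?P<ft>[A-Z]+)-(?P<lt>[A-Z]+)                              # first and last name token
    """,
    re.VERBOSE,
)


@dataclass(frozen=True)
class ParsedPPID:
    id_type: str
    first_location: tuple[str, str, str]  # (country, region, place)
    last_location: tuple[str, str, str]
    first_century: int
    last_century: int
    first_token: str
    last_token: str


def parse_ppid(value: str) -> ParsedPPID:
    """Split a PPID string into typed components, or raise ValueError."""
    m = _PPID_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a well-formed PPID: {value!r}")
    return ParsedPPID(
        id_type=m["type"],
        first_location=(m["fc"], m["fr"], m["fp"]),
        last_location=(m["lc"], m["lr"], m["lp"]),
        first_century=int(m["c1"]),
        last_century=int(m["c2"]),
        first_token=m["ft"],
        last_token=m["lt"],
    )
```

A regular expression is used instead of splitting on `-` because negative centuries introduce extra hyphens, so a plain split would misalign the fields for BCE identifiers.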
### 3.2 Conceptual Model

| Concept | PiCo | PPID |
|---------|------|------|
| **Raw Observation** | `PersonObservation` | Observation (separate system) |
| **Curated Identity** | `PersonReconstruction` | `PID` (promoted from `ID`) |
| **Temporary State** | Not explicit | `ID` class |
| **Permanent State** | All URIs persistent | `PID` class only |
| **Provenance** | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
| **Name Vocabulary** | PNV (Person Name Vocabulary) | Emic labels from sources |

### 3.3 Persistence Philosophy

| Aspect | PiCo | PPID |
|--------|------|------|
| **All identifiers persistent?** | Yes | No - only PID class |
| **Temporary identifiers?** | No explicit concept | Yes - ID class |
| **Promotion mechanism?** | N/A | ID → PID when criteria met |
| **Epistemic uncertainty?** | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
| **Living persons?** | Can have PersonReconstruction | Must remain ID until death |

### 3.4 Geographic Handling

| Aspect | PiCo | PPID |
|--------|------|------|
| **In identifier?** | No | Yes |
| **In properties?** | Yes (birthPlace, deathPlace) | Also yes |
| **Format** | Free text or URI | ISO 3166-1/2 + GeoNames |
| **Historical mapping?** | Encouraged (link to thesaurus) | Required (historical → modern) |
| **Example** | `sdo:birthPlace "Haarlem"` | `...-NL-NH-HAA-...` |

### 3.5 Temporal Handling

| Aspect | PiCo | PPID |
|--------|------|------|
| **In identifier?** | No | Yes (century range) |
| **Date format** | ISO 8601 (xsd:date) | Century numbers |
| **BCE support** | Via negative years | Via negative centuries (-5--4) |
| **Precision** | Day-level possible | Century-level only in ID |
| **Example** | `sdo:birthDate "1860-03-31"^^xsd:date` | `...-19-20-...` |

---

## 4. Key Differences Explained

### 4.1 Why PiCo Uses Opaque UUIDs

PiCo's design goals (from GitHub README):

1. **Successor to A2A**: Designed to replace the XML-based Archive-to-Archive standard
2. **Genealogical focus**: Primary use case is WieWasWie ancestor search
3. **Linked Data**: Interoperability via RDF, not human-readable identifiers
4. **Archive-centric**: Identifiers include archive code prefix

PiCo's UUID approach is appropriate for:

- Massive genealogical databases (millions of records)
- Automated conversion from A2A
- Machine-to-machine data exchange

### 4.2 Why PPID Uses Semantic Identifiers

PPID's design goals:

1. **GHCID alignment**: Consistent identifier philosophy across GLAM project
2. **Heritage sector focus**: Staff of heritage institutions, historical figures
3. **Human discovery**: Identifiers aid browsing and deduplication
4. **Epistemic honesty**: Explicit distinction between ID (uncertain) and PID (verified)
5. **Scholarly citation**: Identifiers can be meaningfully cited in publications

PPID's semantic approach is appropriate for:

- Smaller, curated datasets
- Human curation workflows
- Cross-system deduplication
- Scholarly reference

### 4.3 The ID/PID Distinction (Unique to PPID)

PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:

```
PiCo:
  PersonObservation (always permanent)
       ↓ prov:wasDerivedFrom
  PersonReconstruction (always permanent)

PPID:
  Observation (separate system, permanent)
       ↓
  ID (temporary, may change)
       ↓ promotion when criteria met
  PID (permanent, never changes)
```

**Why this matters for heritage sector**:

- **Living persons**: Cannot have verified death observation → must remain ID
- **Incomplete records**: May never have enough data for PID promotion
- **Ongoing research**: Archives not yet explored → cannot claim PID status
- **Scholarly integrity**: Prevents overclaiming certainty

---

## 5. Integration Recommendations

### 5.1 Adopt PiCo Ontological Distinctions

PPID should build on PiCo's class hierarchy by declaring its own classes as subclasses:

```turtle
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# PPID extends PiCo
ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .

# PPID observations link to source observations
ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
    rdfs:range pico:PersonObservation .
```

### 5.2 Maintain PPID Semantic Identifier Format

Do not adopt PiCo's opaque UUID format. Keep the semantic, GHCID-aligned format:

```
PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
```

**Rationale**: GHCID project-wide consistency, human discoverability, scholarly citation.

### 5.3 Use PNV for Name Properties

Adopt PiCo's use of Person Name Vocabulary for structured name data:

```turtle
ppid:PRID-... pnv:hasName [
    a pnv:PersonName ;
    pnv:literalName "Jan van den Berg" ;
    pnv:givenName "Jan" ;
    pnv:surnamePrefix "van den" ;
    pnv:baseSurname "Berg"
] .
```

### 5.4 Use PROV-O for Provenance

Adopt PiCo's PROV-O patterns for reconstruction provenance:

```turtle
ppid:PID-NL-NH-AMS-...
    prov:wasDerivedFrom <observation-1>, <observation-2> ;
    prov:wasGeneratedBy [
        a prov:Activity ;
        prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
        prov:wasAssociatedWith ppid:curator-001
    ] .
```

### 5.5 Separate Observation Identifiers

As noted in the revised PPID design, observations use a **different identifier system**. It is distinct from PiCo's `{archive}:{uuid}` but serves a similar purpose:

```
{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}

Example:
NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
```

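
The observation identifier format in section 5.5 can be sketched as a split into repository GHCID, creator GHCID and path segments. This sketch assumes that GHCIDs never contain a `/`, so the first two slashes delimit them and everything after belongs to the RiC-O path; `parse_observation_id` and `format_observation_id` are hypothetical helpers, and formatting a parsed identifier is expected to return the original string.

```python
from typing import NamedTuple


class ObservationID(NamedTuple):
    repository_ghcid: str
    creator_ghcid: str
    path_segments: tuple[str, ...]  # RiC-O path, one entry per level


def parse_observation_id(value: str) -> ObservationID:
    """Split {REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH} into parts."""
    # Assumption: GHCIDs contain no "/", so at most two splits are needed.
    parts = value.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected repository/creator/path, got {value!r}")
    repository, creator, path = parts
    segments = tuple(path.split("/"))
    if not all(segments):
        raise ValueError(f"empty path segment in {value!r}")
    return ObservationID(repository, creator, segments)


def format_observation_id(obs: ObservationID) -> str:
    """Inverse of parse_observation_id."""
    return "/".join((obs.repository_ghcid, obs.creator_ghcid, *obs.path_segments))
```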
---

## 6. Resolved Open Questions

Based on user clarifications:

### 6.1 BCE Date Handling

**Resolution**: Use negative century numbers.

```
Format: {first_century}-{last_century}

Examples:
- 5th century BCE to 4th century BCE: "-5--4"
- 1st century BCE to 1st century CE: "-1-1"
- 5th century BCE to 3rd century CE: "-5-3"
```

This parallels the signed years that ISO 8601 permits in its expanded representation for BCE dates. One difference is worth noting: ISO 8601 uses astronomical year numbering, in which year 0000 is 1 BCE, whereas the century numbers here skip zero.

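
The century encoding can be sketched as follows. This assumes historical year numbering (negative integers for BCE, no year 0) and assumes the range is taken from the first and last known year of a person's life; neither rule is fixed by the text above, and the function names are hypothetical.

```python
def year_to_century(year: int) -> int:
    """Map a historical year to a signed century number (no century 0).

    Years 1..100 fall in century 1, years -100..-1 in century -1.
    """
    if year == 0:
        raise ValueError("historical year numbering has no year 0")
    magnitude = (abs(year) - 1) // 100 + 1
    return magnitude if year > 0 else -magnitude


def format_century_range(first_year: int, last_year: int) -> str:
    """Render the {first_century}-{last_century} component of an identifier."""
    first = year_to_century(first_year)
    last = year_to_century(last_year)
    if first > last:
        raise ValueError("first year must not be later than last year")
    return f"{first}-{last}"
```

For the PersonReconstruction example in section 2.4 (born 1860, died 1926), `format_century_range(1860, 1926)` gives `19-20`; a life spanning 450 BCE to 380 BCE, passed as `-450` and `-380`, gives `-5--4`.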
### 6.2 Non-Latin Script Transliteration

**Resolution**: Apply same transliteration rules as GHCID (documented in AGENTS.md).

| Script | Standard |
|--------|----------|
| Cyrillic | ISO 9:1995 |
| Chinese | Hanyu Pinyin (ISO 7098) |
| Japanese | Modified Hepburn |
| Korean | Revised Romanization |
| Arabic | ISO 233-2/3 |
| Hebrew | ISO 259-3 |
| Greek | ISO 843 |

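
One way to apply the table is to detect which scripts occur in a name and look up the matching standard before transliterating. The sketch below only does the lookup, not the transliteration itself, and it relies on an assumption: the first word of a character's Unicode name identifies its script. Han characters are shared by Chinese and Japanese, so this heuristic maps them to the Chinese row; a real pipeline would need language metadata from the source to choose Hepburn instead. `standards_for` is a hypothetical helper.

```python
import unicodedata

# Rows of the table above, keyed by the first word of the Unicode character name.
TRANSLITERATION_STANDARDS = {
    "CYRILLIC": "ISO 9:1995",
    "CJK": "Hanyu Pinyin (ISO 7098)",   # Han characters; may also be Japanese kanji
    "HIRAGANA": "Modified Hepburn",
    "KATAKANA": "Modified Hepburn",
    "HANGUL": "Revised Romanization",
    "ARABIC": "ISO 233-2/3",
    "HEBREW": "ISO 259-3",
    "GREEK": "ISO 843",
}


def standards_for(name: str) -> set[str]:
    """Return the transliteration standards relevant to the letters in name."""
    found: set[str] = set()
    for ch in name:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        script = unicodedata.name(ch, "").split(" ", 1)[0]
        standard = TRANSLITERATION_STANDARDS.get(script)
        if standard is not None:
            found.add(standard)
    return found
```

A name written only in Latin letters returns an empty set, which signals that no transliteration step is needed.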
### 6.3 Disputed Locations

**Resolution**: Not a PPID concern - handled by ISO standardization.

When historical locations are disputed:

- Use the ISO-standardized modern location
- Document the dispute in observation metadata
- Do not encode uncertainty in the identifier itself

### 6.4 Living Persons

**Resolution**: Living persons are **always ID class**; an ID can be promoted to PID only after the person's death has been verified.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Minimal stand-in for an observation record (fields assumed for illustration)."""
    is_death_record: bool = False  # the source itself records the death
    is_post_death: bool = False    # the source postdates the death


def check_other_criteria(observations: list[Observation]) -> bool:
    """Placeholder: the remaining promotion criteria are not defined here."""
    raise NotImplementedError


def can_promote_to_pid(person_id: str, observations: list[Observation]) -> bool:
    """
    Check if ID can be promoted to PID.

    Living persons can NEVER be promoted.
    """
    # Any death record or post-death observation counts as evidence of death
    has_death_evidence = any(o.is_death_record or o.is_post_death for o in observations)

    if not has_death_evidence:
        # No death observation = person may be alive = cannot be PID
        return False

    # Continue with other promotion criteria...
    return check_other_criteria(observations)
```

**Rationale**:

1. PID requires verified last observation (death)
2. Living persons have incomplete lifecycle
3. Future observations may change identity assessment
4. Privacy considerations for living individuals

---

## 7. Implementation Alignment

### 7.1 Class Mapping

| PiCo Class | PPID Equivalent | Notes |
|------------|-----------------|-------|
| `pico:Person` | (Container) | Not used directly |
| `pico:PersonObservation` | Observation (separate system) | Different identifier format |
| `pico:PersonReconstruction` | `ppid:PersonID` or `ppid:PersonPID` | Split by epistemic certainty |
| `pico:Source` | `schema:ArchiveComponent` | Same as PiCo |
| `pnv:PersonName` | `pnv:PersonName` | Adopt PNV |

### 7.2 Property Mapping

| PiCo Property | PPID Usage | Notes |
|---------------|------------|-------|
| `prov:hadPrimarySource` | Same | For observations |
| `prov:wasDerivedFrom` | Same | PRID from POIDs |
| `prov:wasGeneratedBy` | Same | Activity provenance |
| `prov:wasRevisionOf` | Same | Version history |
| `sdo:birthDate` | Same | In properties |
| `sdo:birthPlace` | Same + in identifier | Dual representation |
| `sdo:deathDate` | Same | In properties |
| `sdo:deathPlace` | Same + in identifier | Dual representation |
| `pico:hasRole` | Same | For observations |
| `pico:hasAge` | Same | When birthDate unknown |

### 7.3 Namespace Declarations

```turtle
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix pnv: <https://w3id.org/pnv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
```

---

## 8. Conclusion

### 8.1 What PPID Adopts from PiCo

1. **PersonObservation/PersonReconstruction distinction** - Core ontological pattern
2. **PROV-O provenance model** - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
3. **Person Name Vocabulary (PNV)** - Structured name representation
4. **Schema.org properties** - birthDate, deathDate, birthPlace, deathPlace, etc.
5. **Source linking** - hadPrimarySource, holdingArchive

### 8.2 What PPID Does Differently

1. **Semantic identifier format** - Geographic-temporal-emic instead of opaque UUID
2. **ID/PID epistemic distinction** - Explicit uncertainty modeling
3. **Living person handling** - Must remain ID until death
4. **GHCID alignment** - Consistent with heritage custodian identifier philosophy
5. **Century range encoding** - Temporal disambiguation in identifier
6. **Emic label tokens** - Name components in identifier for discoverability

### 8.3 Interoperability Path

PPID can interoperate with PiCo systems via:

1. **RDFS/OWL mappings**: `ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction`
2. **SPARQL federation**: Query across PPID and PiCo endpoints
3. **Bidirectional links**: `owl:sameAs` between PPID and PiCo identifiers
4. **Profile negotiation**: Serve data in PiCo format via content negotiation

---

## 9. References

### PiCo Resources

- PiCo Specification: https://personsincontext.org/model
- PiCo GitHub: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- Open Archives API: https://www.openarchieven.nl/api/docs/uri.php
- CBG: https://cbg.nl/

### Standards

- Person Name Vocabulary (PNV): https://w3id.org/pnv
- PROV-O: https://www.w3.org/TR/prov-o/
- Schema.org: https://schema.org/

### Related PPID Documents

- [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md)
- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
- [Identifier Structure Design](./05_identifier_structure_design.md)