PiCo vs PPID: Comparative Analysis
Version: 0.1.0
Last Updated: 2025-01-09
Related: PPID-GHCID Alignment | PiCo Ontology Analysis
1. Executive Summary
This document compares the PiCo (Persons in Context) ontology developed by CBG|Centrum voor Familiegeschiedenis with our proposed PPID (Person Persistent Identifier) system. The analysis is based on deep research into PiCo's implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.
1.1 Key Finding
PiCo and PPID serve complementary purposes:
| System | Primary Purpose | Identifier Style | Scope |
|---|---|---|---|
| PiCo | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
| PPID | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |
Recommendation: PPID should adopt PiCo's ontological distinctions (PersonObservation vs PersonReconstruction) while using its own semantic identifier format aligned with GHCID conventions.
2. PiCo Architecture (From Research)
2.1 Core Classes
From the PiCo specification at personsincontext.org/model:
┌─────────────────────────────────────────────────────────────────┐
│                           PiCo MODEL                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                          Person                           │  │
│  │           (Container class - not used directly)           │  │
│  │                                                           │  │
│  │  ┌────────────────────────┐   ┌────────────────────────┐  │  │
│  │  │ PersonObservation      │   │ PersonReconstruction   │  │  │
│  │  │                        │   │                        │  │  │
│  │  │ - Data as found        │   │ - Curated identity     │  │  │
│  │  │   on Source            │   │ - Links multiple       │  │  │
│  │  │ - hadPrimarySource     │   │   observations         │  │  │
│  │  │ - hasRole              │   │ - wasDerivedFrom       │  │  │
│  │  │ - hasAge               │   │ - wasGeneratedBy       │  │  │
│  │  │ - hasOccupation        │   │ - wasRevisionOf        │  │  │
│  │  └────────────────────────┘   └────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                          Source                           │  │
│  │                 (schema:ArchiveComponent)                 │  │
│  │  - name, dateCreated, holdingArchive, associatedMedia     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     PersonName (PNV)                      │  │
│  │  - literalName, givenName, baseSurname, surnamePrefix     │  │
│  │  - patronym, initials                                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
2.2 PiCo Identifier Structure in Open Archives
From the Open Archives API documentation:
URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]
Examples:
- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c
Components:
- rat = Regionaal Archief Tilburg (3-letter archive code)
- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
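The URI format above can be taken apart mechanically. The following is a minimal sketch, not part of the Open Archives API: the function name `parse_open_archives_uri`, the `OpenArchivesRef` type, and the assumption that archive codes are exactly three lowercase letters are illustrative choices based only on the examples shown here.

```python
import re
import uuid
from typing import NamedTuple, Optional


class OpenArchivesRef(NamedTuple):
    """Hypothetical container for the parts of an Open Archives record URI."""
    archive_code: str        # e.g. "rat"
    record_uuid: uuid.UUID   # UUID of the record
    token: Optional[str]     # e.g. "ttl:pico", or None when absent


# Pattern derived from the documented format:
#   https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]
# Assumption: archive codes are three lowercase ASCII letters.
_URI_RE = re.compile(
    r"^https://www\.openarchieven\.nl/"
    r"(?P<archive>[a-z]{3}):"
    r"(?P<uuid>[0-9a-fA-F-]{36})"
    r"(?:/(?P<token>.+))?$"
)


def parse_open_archives_uri(uri: str) -> OpenArchivesRef:
    """Split a record URI into archive code, UUID, and optional token."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not an Open Archives record URI: {uri!r}")
    return OpenArchivesRef(
        archive_code=match["archive"],
        record_uuid=uuid.UUID(match["uuid"]),  # raises ValueError if malformed
        token=match["token"],                  # None when no token was given
    )
```

Parsing the UUID with `uuid.UUID` rather than keeping it as a string rejects malformed values early, while leaving the archive code as the only component that carries meaning.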
2.3 PiCo PersonObservation Example (Actual Data)
From Open Archives API response:
@prefix oa: <https://www.openarchieven.nl/id/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .

oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
    a pico:PersonObservation ;
    prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
    pico:hasRole "Moeder" ;
    sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
    sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
    sdo:gender sdo:Female ;
    sdo:name "Cornelia Verhulst" ;
    sdo:familyName "Verhulst" ;
    sdo:givenName "Cornelia" .
2.4 PiCo PersonReconstruction Example
From PiCo specification:
cbg:person_reconstruction_2
    a pico:PersonReconstruction ;
    sdo:name "Anna Maria Koppen" ;
    sdo:familyName "Koppen" ;
    sdo:givenName "Anna" ;
    sdo:gender sdo:Female ;
    sdo:birthPlace "Haarlem" ;
    sdo:birthDate "1860-03-31"^^xsd:date ;
    sdo:deathPlace "Detroit, VSA" ;
    sdo:deathDate "1926"^^xsd:gYear ;
    prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1,
        cbg:NL-HaCBG_1755_0341_142_po_1 ;
    prov:wasGeneratedBy cbg:reconstruction_activity_01 .
3. Detailed Comparison
3.1 Identifier Format
| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
|---|---|---|
| Format | {archive}:{uuid} | {TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT} |
| Example | rat:48c2b836-385f-11e0-bcd1-8edf61960649 | PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG |
| Human Readable | No (opaque UUID) | Yes (semantic components) |
| Archive Prefix | Yes (3-letter code) | No (implicit via source) |
| Geographic | No | Yes (birth + death locations) |
| Temporal | No | Yes (century range) |
| Name | No | Yes (first + last token) |
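To make the PPID format concrete, the sketch below splits an identifier into its components. It is an illustration under assumptions, not a specification: the reading of FC/FR/FP and LC/LR/LP as first and last country, region, and place, the allowed characters per component, and the names `parse_ppid` and `ParsedPPID` are all hypothetical. A regular expression is used instead of a plain split on "-" because a negative century also begins with a hyphen.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed component shapes: 2-letter country, 1-3 character region,
# 3-letter place, signed integer centuries, upper-case ASCII name tokens.
_PPID_RE = re.compile(
    r"^(?P<type>PID|ID)-"
    r"(?P<fc>[A-Z]{2})-(?P<fr>[A-Z0-9]{1,3})-(?P<fp>[A-Z]{3})-"
    r"(?P<lc>[A-Z]{2})-(?P<lr>[A-Z0-9]{1,3})-(?P<lp>[A-Z]{3})-"
    r"(?P<c1>-?\d+)-(?P<c2>-?\d+)-"
    r"(?P<ft>[A-Z]+)-(?P<lt>[A-Z]+)$"
)


@dataclass(frozen=True)
class ParsedPPID:
    """Hypothetical structured view of a PPID string."""
    id_class: str                          # "ID" or "PID"
    first_location: Tuple[str, str, str]   # (country, region, place)
    last_location: Tuple[str, str, str]    # (country, region, place)
    first_century: int                     # negative for BCE
    last_century: int                      # negative for BCE
    first_token: str
    last_token: str


def parse_ppid(identifier: str) -> ParsedPPID:
    """Parse an identifier of the form {TYPE}-{FC}-{FR}-{FP}-...-{FT}-{LT}."""
    m = _PPID_RE.match(identifier)
    if m is None:
        raise ValueError(f"not a well-formed PPID: {identifier!r}")
    return ParsedPPID(
        id_class=m["type"],
        first_location=(m["fc"], m["fr"], m["fp"]),
        last_location=(m["lc"], m["lr"], m["lp"]),
        first_century=int(m["c1"]),
        last_century=int(m["c2"]),
        first_token=m["ft"],
        last_token=m["lt"],
    )
```

Applied to the example in the table, `parse_ppid("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")` yields class PID, first location (NL, NH, AMS), last location (NL, NH, HAA), centuries 19 and 20, and tokens JAN and BERG.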
3.2 Conceptual Model
| Concept | PiCo | PPID |
|---|---|---|
| Raw Observation | PersonObservation | Observation (separate system) |
| Curated Identity | PersonReconstruction | PID (promoted from ID) |
| Temporary State | Not explicit | ID class |
| Permanent State | All URIs persistent | PID class only |
| Provenance | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
| Name Vocabulary | PNV (Person Name Vocabulary) | Emic labels from sources |
3.3 Persistence Philosophy
| Aspect | PiCo | PPID |
|---|---|---|
| All identifiers persistent? | Yes | No - only PID class |
| Temporary identifiers? | No explicit concept | Yes - ID class |
| Promotion mechanism? | N/A | ID → PID when criteria met |
| Epistemic uncertainty? | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
| Living persons? | Can have PersonReconstruction | Must remain ID until death |
3.4 Geographic Handling
| Aspect | PiCo | PPID |
|---|---|---|
| In identifier? | No | Yes |
| In properties? | Yes (birthPlace, deathPlace) | Also yes |
| Format | Free text or URI | ISO 3166-1/2 + GeoNames |
| Historical mapping? | Encouraged (link to thesaurus) | Required (historical → modern) |
| Example | sdo:birthPlace "Haarlem" | ...-NL-NH-HAA-... |
3.5 Temporal Handling
| Aspect | PiCo | PPID |
|---|---|---|
| In identifier? | No | Yes (century range) |
| Date format | ISO 8601 (xsd:date) | Century numbers |
| BCE support | Via negative years | Via negative centuries (-5--4) |
| Precision | Day-level possible | Century-level only in ID |
| Example | sdo:birthDate "1860-03-31"^^xsd:date | ...-19-20-... |
4. Key Differences Explained
4.1 Why PiCo Uses Opaque UUIDs
PiCo's design goals (from GitHub README):
- Successor to A2A: Designed to replace XML-based Archive-to-Archive standard
- Genealogical focus: Primary use case is WieWasWie ancestor search
- Linked Data: Interoperability via RDF, not human-readable identifiers
- Archive-centric: Identifiers include archive code prefix
PiCo's UUID approach is appropriate for:
- Massive genealogical databases (millions of records)
- Automated conversion from A2A
- Machine-to-machine data exchange
4.2 Why PPID Uses Semantic Identifiers
PPID's design goals:
- GHCID alignment: Consistent identifier philosophy across GLAM project
- Heritage sector focus: Staff of heritage institutions, historical figures
- Human discovery: Identifiers aid browsing and deduplication
- Epistemic honesty: Explicit distinction between ID (uncertain) and PID (verified)
- Scholarly citation: Identifiers can be meaningfully cited in publications
PPID's semantic approach is appropriate for:
- Smaller, curated datasets
- Human curation workflows
- Cross-system deduplication
- Scholarly reference
4.3 The ID/PID Distinction (Unique to PPID)
PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:
PiCo:
PersonObservation (always permanent)
↓ prov:wasDerivedFrom
PersonReconstruction (always permanent)
PPID:
Observation (separate system, permanent)
↓
ID (temporary, may change)
↓ promotion when criteria met
PID (permanent, never changes)
Why this matters for heritage sector:
- Living persons: Cannot have verified death observation → must remain ID
- Incomplete records: May never have enough data for PID promotion
- Ongoing research: Archives not yet explored → cannot claim PID status
- Scholarly integrity: Prevents overclaiming certainty
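The two PPID states can be sketched as a small state machine. This is an illustrative model only: the names `IdClass` and `PersonRecord` are hypothetical, and it assumes that promotion changes only the TYPE prefix of the identifier while the remaining components stay as they were at the moment of promotion.

```python
from dataclasses import dataclass
from enum import Enum


class IdClass(Enum):
    ID = "ID"     # temporary, may still change
    PID = "PID"   # permanent, never changes


@dataclass
class PersonRecord:
    """Hypothetical record: the identifier is {TYPE}-{body}."""
    body: str                       # everything after the TYPE prefix
    id_class: IdClass = IdClass.ID

    @property
    def identifier(self) -> str:
        return f"{self.id_class.value}-{self.body}"

    def reassign(self, new_body: str) -> None:
        """Change the identifier body; only allowed while still an ID."""
        if self.id_class is IdClass.PID:
            raise ValueError("a PID is permanent and cannot be changed")
        self.body = new_body

    def promote(self, criteria_met: bool) -> None:
        """Move from ID to PID once the promotion criteria are satisfied."""
        if not criteria_met:
            raise ValueError("promotion criteria not met; record stays an ID")
        self.id_class = IdClass.PID


record = PersonRecord("NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")
record.promote(criteria_met=True)
print(record.identifier)
```

The point of the sketch is the asymmetry: an ID can be reassigned any number of times, while every later attempt to change a PID is rejected.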
5. Integration Recommendations
5.1 Adopt PiCo Ontological Distinctions
PPID should use PiCo's class hierarchy:
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# PPID extends PiCo
ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .

# PPID observations link to source observations
ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
    rdfs:range pico:PersonObservation .
5.2 Maintain PPID Semantic Identifier Format
Do not adopt PiCo's opaque UUID format. Keep semantic GHCID-aligned format:
PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
Rationale: GHCID project-wide consistency, human discoverability, scholarly citation.
5.3 Use PNV for Name Properties
Adopt PiCo's use of Person Name Vocabulary for structured name data:
ppid:PRID-... pnv:hasName [
    a pnv:PersonName ;
    pnv:literalName "Jan van den Berg" ;
    pnv:givenName "Jan" ;
    pnv:surnamePrefix "van den" ;
    pnv:baseSurname "Berg"
] .
5.4 Use PROV-O for Provenance
Adopt PiCo's PROV-O patterns for reconstruction provenance:
ppid:PID-NL-NH-AMS-...
    prov:wasDerivedFrom <observation-1>, <observation-2> ;
    prov:wasGeneratedBy [
        a prov:Activity ;
        prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
        prov:wasAssociatedWith ppid:curator-001
    ] .
5.5 Separate Observation Identifiers
As noted in the revised PPID design, observations use a different identifier system:
{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}
Example:
NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
This is distinct from PiCo's {archive}:{uuid} but serves similar purposes.
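A minimal sketch of composing and splitting such an observation identifier follows. The helper names are hypothetical, and the sketch assumes that neither GHCID contains a "/" so that the first two segments can be separated unambiguously from the RiC-O path.

```python
from typing import NamedTuple, Sequence, Tuple


class ObservationId(NamedTuple):
    """Hypothetical structured view of an observation identifier."""
    repository_ghcid: str
    creator_ghcid: str
    path_segments: Tuple[str, ...]


def build_observation_id(
    repository_ghcid: str, creator_ghcid: str, path_segments: Sequence[str]
) -> str:
    """Join {REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH} into one string."""
    parts = [repository_ghcid, creator_ghcid, *path_segments]
    if len(parts) < 3:
        raise ValueError("at least one path segment is required")
    for part in parts:
        # Empty parts or embedded slashes would make the result ambiguous.
        if not part or "/" in part:
            raise ValueError(f"invalid identifier component: {part!r}")
    return "/".join(parts)


def split_observation_id(identifier: str) -> ObservationId:
    """Inverse of build_observation_id."""
    repository, creator, *segments = identifier.split("/")
    if not segments:
        raise ValueError("identifier has no RiC-O path")
    return ObservationId(repository, creator, tuple(segments))


obs_id = build_observation_id(
    "NL-NH-HAA-A-NHA",
    "NL-NH-HAA-A-NHA",
    ["burgerlijke-stand", "geboorten", "1895", "003", "045"],
)
print(obs_id)
```

In the example from the text the repository and the creator are the same institution, which is why the same GHCID appears twice.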
6. Resolved Open Questions
Based on user clarifications:
6.1 BCE Date Handling
Resolution: Use negative century numbers.
Format: {first_century}-{last_century}
Examples:
- 5th century BCE to 4th century BCE: "-5--4"
- 1st century BCE to 1st century CE: "-1-1"
- 5th century BCE to 3rd century CE: "-5-3"
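This mirrors the signed-year convention of ISO 8601's expanded representation, which writes BCE dates as negative years. The two are not identical: ISO 8601 uses astronomical numbering, in which year 0000 is 1 BCE, whereas this scheme has no century 0, so -1 is followed directly by 1.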
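The century range can be derived from a first and last year as sketched below. This is a sketch under assumptions: the function names are hypothetical, years are given in historical numbering (negative for BCE, no year 0), and a century N CE is taken to span the years (N-1)*100+1 through N*100.

```python
def century_of(year: int) -> int:
    """Map a historical year (negative = BCE, no year 0) to a signed century."""
    if year == 0:
        raise ValueError("there is no year 0 in BCE/CE numbering")
    # Years 1-100 -> century 1, 101-200 -> century 2, and so on;
    # the same rule is applied to the magnitude of BCE years.
    magnitude = (abs(year) - 1) // 100 + 1
    return magnitude if year > 0 else -magnitude


def century_range(first_year: int, last_year: int) -> str:
    """Format the {first_century}-{last_century} component of an identifier."""
    return f"{century_of(first_year)}-{century_of(last_year)}"


print(century_range(1860, 1926))   # 19-20
print(century_range(-450, -350))   # -5--4
print(century_range(-50, 50))      # -1-1
```

Because the separator and the minus sign are the same character, a consumer of this component has to read it as two signed integers rather than split it on every hyphen.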
6.2 Non-Latin Script Transliteration
Resolution: Apply same transliteration rules as GHCID (documented in AGENTS.md).
| Script | Standard |
|---|---|
| Cyrillic | ISO 9:1995 |
| Chinese | Hanyu Pinyin (ISO 7098) |
| Japanese | Modified Hepburn |
| Korean | Revised Romanization |
| Arabic | ISO 233-2/3 |
| Hebrew | ISO 259-3 |
| Greek | ISO 843 |
6.3 Disputed Locations
Resolution: Not a PPID concern - handled by ISO standardization.
When historical locations are disputed:
- Use the ISO-standardized modern location
- Document the dispute in observation metadata
- Do not encode uncertainty in the identifier itself
6.4 Living Persons
Resolution: Living persons are always ID class and can only be promoted to PID after death.
def can_promote_to_pid(person_id: str, observations: list) -> bool:
    """
    Check if ID can be promoted to PID.
    Living persons can NEVER be promoted.
    """
    # Check for death observation
    death_obs = [o for o in observations if o.is_death_record or o.is_post_death]
    if not death_obs:
        # No death observation = person may be alive = cannot be PID
        return False
    # Continue with other promotion criteria
    # (check_other_criteria is not defined in this document)
    return check_other_criteria(observations)
Rationale:
- PID requires verified last observation (death)
- Living persons have incomplete lifecycle
- Future observations may change identity assessment
- Privacy considerations for living individuals
7. Implementation Alignment
7.1 Class Mapping
| PiCo Class | PPID Equivalent | Notes |
|---|---|---|
| pico:Person | (Container) | Not used directly |
| pico:PersonObservation | Observation (separate system) | Different identifier format |
| pico:PersonReconstruction | ppid:PersonID or ppid:PersonPID | Split by epistemic certainty |
| pico:Source | schema:ArchiveComponent | Same as PiCo |
| pnv:PersonName | pnv:PersonName | Adopt PNV |
7.2 Property Mapping
| PiCo Property | PPID Usage | Notes |
|---|---|---|
| prov:hadPrimarySource | Same | For observations |
| prov:wasDerivedFrom | Same | PRID from POIDs |
| prov:wasGeneratedBy | Same | Activity provenance |
| prov:wasRevisionOf | Same | Version history |
| sdo:birthDate | Same | In properties |
| sdo:birthPlace | Same + in identifier | Dual representation |
| sdo:deathDate | Same | In properties |
| sdo:deathPlace | Same + in identifier | Dual representation |
| pico:hasRole | Same | For observations |
| pico:hasAge | Same | When birthDate unknown |
7.3 Namespace Declarations
@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix pnv: <https://w3id.org/pnv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
8. Conclusion
8.1 What PPID Adopts from PiCo
- PersonObservation/PersonReconstruction distinction - Core ontological pattern
- PROV-O provenance model - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
- Person Name Vocabulary (PNV) - Structured name representation
- Schema.org properties - birthDate, deathDate, birthPlace, deathPlace, etc.
- Source linking - hadPrimarySource, holdingArchive
8.2 What PPID Does Differently
- Semantic identifier format - Geographic-temporal-emic instead of opaque UUID
- ID/PID epistemic distinction - Explicit uncertainty modeling
- Living person handling - Must remain ID until death
- GHCID alignment - Consistent with heritage custodian identifier philosophy
- Century range encoding - Temporal disambiguation in identifier
- Emic label tokens - Name components in identifier for discoverability
8.3 Interoperability Path
PPID can be fully interoperable with PiCo systems via:
- OWL mappings: ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction
- SPARQL federation: Query across PPID and PiCo endpoints
- Bidirectional links: owl:sameAs between PPID and PiCo identifiers
- Profile negotiation: Serve data in PiCo format via content negotiation
9. References
PiCo Resources
- PiCo Specification: https://personsincontext.org/model
- PiCo GitHub: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- Open Archives API: https://www.openarchieven.nl/api/docs/uri.php
- CBG: https://cbg.nl/
Standards
- Person Name Vocabulary (PNV): https://w3id.org/pnv
- PROV-O: https://www.w3.org/TR/prov-o/
- Schema.org: https://schema.org/