Person Persistent Identifier System (PPID) - Executive Summary
Version: 0.1.0
Status: Research & Planning
Last Updated: 2025-01-09
Related: GHCID PID Scheme | PiCo Ontology
1. Vision Statement
Create a globally interoperable, culturally aware persistent identifier system for persons associated with heritage custodian institutions. The system will enable:
- Unambiguous identification of persons across heritage collections worldwide
- Links between individuals and their roles at heritage institutions (archivists, curators, directors, researchers)
- Genealogical and biographical research with proper provenance tracking
- Handling of complex historical and cross-cultural naming conventions
2. Problem Statement
Current Challenges
| Challenge | Impact |
| --- | --- |
| Name variability | Same person recorded as "Jan van der Berg", "J. v.d. Berg", "Johannes Berg" |
| Cultural naming diversity | Patronymics (Iceland), mononyms (Indonesia), family-name-first (East Asia) |
| Historical uncertainty | Birthdates unknown, conflicting records, incomplete data |
| Source fragmentation | Person data scattered across LinkedIn, institutional websites, archives |
| Observation vs. identity | Raw source data conflated with curated person records |
Why Existing Systems Fall Short
| System | Limitation for Heritage Domain |
| --- | --- |
| ORCID | Researcher-focused, requires self-registration, living persons only |
| ISNI | Curated by registration agencies, expensive, slow assignment process |
| VIAF | Library authority files only, not designed for heritage staff |
| Wikidata | Notability requirement excludes most heritage professionals |
3. Proposed Solution: PPID
Core Design Principles
- Opaque Identifiers: No personal information encoded in the ID itself (following ORCID best practice)
- Observation-Reconstruction Distinction: Separate identifiers for raw source data vs. curated person records (following PiCo ontology)
- Cultural Neutrality: No assumptions about name structure, birthdates, or family relationships
- Provenance-First: Every claim traceable to its source with confidence assertions
- Interoperable: Links to ORCID, ISNI, VIAF, Wikidata where available
Two-Level Identifier Architecture
┌─────────────────────────────────────────────────────────────────┐
│                  PERSON RECONSTRUCTION (PRID)                   │
│                                                                 │
│  Curated identity: "Johannes van der Berg (1892-1967)"          │
│  PRID: ppid:PRID-xxxx-xxxx-xxxx-xxxx                            │
│                                                                 │
│  Links to external IDs:                                         │
│  - ORCID: 0000-0002-1234-5678 (if researcher)                   │
│  - Wikidata: Q12345678 (if notable)                             │
│  - VIAF: 123456789 (if in library authorities)                  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│  ▲                     ▲                   ▲                    │
│  │ prov:wasDerivedFrom │                   │                    │
│  │                     │                   │                    │
├──┴─────────────────────┴───────────────────┴────────────────────┤
│                   PERSON OBSERVATIONS (POIDs)                   │
│                                                                 │
│  LinkedIn observation:           Archive observation:           │
│  "Jan van der Berg"              "J. v.d. Berg"                 │
│  POID: ppid:POID-aaaa-...        POID: ppid:POID-bbbb-...       │
│  Source: linkedin.com/in/...     Source: archive.org/doc/123    │
│  Retrieved: 2025-01-09           Retrieved: 2024-05-15          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
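The two-level architecture can be sketched as a pair of record types. This is a minimal illustration only: the class and field names (`name_as_recorded`, `derived_from`, `external_ids`) are assumptions made for the example and are not part of any PPID specification. The identifier and source values are copied from the diagram, including its elisions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PersonObservation:
    """What a single source says, kept verbatim (field names are hypothetical)."""
    poid: str
    name_as_recorded: str
    source: str
    retrieved: date


@dataclass
class PersonReconstruction:
    """A curated identity derived from one or more observations."""
    prid: str
    display_name: str
    # Plays the role of prov:wasDerivedFrom in the diagram.
    derived_from: list[PersonObservation] = field(default_factory=list)
    # External identifiers keyed by scheme, e.g. "wikidata" (keys are assumptions).
    external_ids: dict[str, str] = field(default_factory=dict)

    def recorded_names(self) -> set[str]:
        """All name forms found in the underlying observations."""
        return {obs.name_as_recorded for obs in self.derived_from}


linkedin_obs = PersonObservation(
    poid="ppid:POID-aaaa-...",
    name_as_recorded="Jan van der Berg",
    source="linkedin.com/in/...",
    retrieved=date(2025, 1, 9),
)
archive_obs = PersonObservation(
    poid="ppid:POID-bbbb-...",
    name_as_recorded="J. v.d. Berg",
    source="archive.org/doc/123",
    retrieved=date(2024, 5, 15),
)
reconstruction = PersonReconstruction(
    prid="ppid:PRID-xxxx-xxxx-xxxx-xxxx",
    display_name="Johannes van der Berg (1892-1967)",
    derived_from=[linkedin_obs, archive_obs],
)
```

Observations are frozen so that source data cannot be altered after capture, while the reconstruction stays mutable because curation is expected to evolve as new observations arrive.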
Identifier Format
| Component | Format | Example |
| --- | --- | --- |
| POID (Observation) | ppid:POID-xxxx-xxxx-xxxx-xxxx | ppid:POID-7a3b-c4d5-e6f7-8901 |
| PRID (Reconstruction) | ppid:PRID-xxxx-xxxx-xxxx-xxxx | ppid:PRID-1234-5678-90ab-cdef |
| Checksum | ISO/IEC 7064 MOD 11-2 | Last character (0-9 or X) |
4. Alignment with PiCo Ontology
The PiCo (Persons in Context) ontology, published by CBG-Centrum-voor-familiegeschiedenis, provides the conceptual foundation for PPID:
| PiCo Concept | PPID Implementation |
| --- | --- |
| picom:PersonObservation | POID - identifier for raw source observation |
| picom:PersonReconstruction | PRID - identifier for curated person identity |
| prov:wasDerivedFrom | Links PRID to source POIDs |
| picom:hasName (via PNV) | Structured name representation |
| picom:hasRole | Person's role at heritage institution |
PiCo's Key Innovation
PiCo explicitly separates:
- What the source says (PersonObservation) - "the document states this person was born in 1892"
- What we conclude (PersonReconstruction) - "we believe this person was Johannes van der Berg, born c. 1892"
This distinction is critical for:
- Handling conflicting information across sources
- Preserving original source data integrity
- Supporting scholarly genealogical research
- Enabling transparent reasoning about person identities
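One way to picture how this separation handles conflicting sources is sketched below. Everything here is illustrative: the `ObservedClaim` type, the `reconstruct` function, the third POID, and the simple majority-vote rule are assumptions chosen to keep the example short, not a proposed resolution algorithm. The point is that the conclusion records which observations support it and which contradict it, and the observations themselves are never modified.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObservedClaim:
    """A single statement made by one source (hypothetical structure)."""
    poid: str   # observation the claim comes from
    prop: str   # property name, e.g. "birth_year"
    value: str  # value exactly as the source gives it


def reconstruct(claims: list[ObservedClaim], prop: str) -> Optional[dict]:
    """Pick the most frequently observed value and keep the full evidence trail."""
    relevant = [c for c in claims if c.prop == prop]
    if not relevant:
        return None
    value, support = Counter(c.value for c in relevant).most_common(1)[0]
    return {
        "value": value,
        # Naive confidence: share of observations agreeing with the conclusion.
        "confidence": support / len(relevant),
        "supported_by": [c.poid for c in relevant if c.value == value],
        "contradicted_by": [c.poid for c in relevant if c.value != value],
    }


claims = [
    ObservedClaim("ppid:POID-aaaa-...", "birth_year", "1892"),
    ObservedClaim("ppid:POID-bbbb-...", "birth_year", "1892"),
    ObservedClaim("ppid:POID-cccc-...", "birth_year", "1893"),  # hypothetical third source
]
birth_year = reconstruct(claims, "birth_year")
```

A real system would likely weight sources by reliability rather than count them equally, but the shape of the output (value, confidence, supporting and contradicting observations) is what makes the reasoning transparent.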
5. Scope and Boundaries
In Scope
- Persons associated with heritage custodian institutions (GLAM sector)
- Historical persons appearing in heritage collections
- Genealogical subjects in archival records
- Staff, researchers, donors, and stakeholders of heritage organizations
Out of Scope (Initially)
- General public registration (unlike ORCID's self-service model)
- Living persons without heritage sector connection
- Fictional characters
- Legal entities (covered by GHCID for institutions)
6. Success Criteria
| Metric | Target |
| --- | --- |
| Interoperability | Bidirectional links to ORCID, ISNI, VIAF, Wikidata |
| Cultural coverage | Documented support for 50+ naming conventions |
| Provenance completeness | 100% of claims have source attribution |
| Resolution accuracy | <5% false positive rate in entity matching |
| Adoption | Integration with 3+ major genealogical platforms |
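Two of these targets are directly measurable, and the sketch below shows one possible way to compute them. The definitions are assumptions: "false positive rate" is read here as wrongly matched pairs divided by all candidate pairs that are truly non-matches, and a claim counts as attributed when it carries a non-empty `source` field. The function names and sample data are hypothetical.

```python
def false_positive_rate(predicted: set, truth: set, candidates: set) -> float:
    """Share of truly non-matching candidate pairs that were wrongly matched."""
    negatives = candidates - truth
    if not negatives:
        return 0.0
    false_positives = (predicted - truth) & candidates
    return len(false_positives) / len(negatives)


def provenance_completeness(claims: list[dict]) -> float:
    """Share of claims that carry a non-empty source attribution."""
    if not claims:
        return 1.0
    attributed = sum(1 for claim in claims if claim.get("source"))
    return attributed / len(claims)


# Candidate observation pairs considered by a matcher (labels are made up).
candidates = {("obs1", "obs2"), ("obs1", "obs3"), ("obs2", "obs3"), ("obs3", "obs4")}
truth = {("obs1", "obs2")}                          # pairs that really are the same person
predicted = {("obs1", "obs2"), ("obs2", "obs3")}    # pairs the matcher merged

fpr = false_positive_rate(predicted, truth, candidates)

sample_claims = [
    {"prop": "name", "value": "J. v.d. Berg", "source": "archive.org/doc/123"},
    {"prop": "birth_year", "value": "1892", "source": ""},
]
completeness = provenance_completeness(sample_claims)
```

If the intended metric is instead the share of merged pairs that are wrong (often called the false discovery rate), the denominator changes to the number of predicted pairs, so the target should state which definition applies.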
7. Document Roadmap
8. Key Stakeholders
| Stakeholder | Interest |
| --- | --- |
| Heritage institutions | Staff identification, donor tracking, researcher attribution |
| Genealogical organizations | Standardized person identification across sources |
| Archival services | Authority control for persons in collections |
| Research libraries | Integration with existing authority files |
| Digital humanities | Linked data for biographical research |
9. Timeline
| Phase | Period | Deliverables |
| --- | --- | --- |
| Research | Q1 2025 | This planning document series |
| Design | Q2 2025 | Identifier specification, ontology alignment |
| Prototype | Q3 2025 | Reference implementation, test dataset |
| Pilot | Q4 2025 | Integration with 2-3 heritage partners |
| Launch | Q1 2026 | Public API, documentation, governance structure |
10. References