# Person Persistent Identifier System (PPID) - Executive Summary

**Version**: 0.1.0  
**Status**: Research & Planning  
**Last Updated**: 2025-01-09  
**Related**: [GHCID PID Scheme](../../../docs/GHCID_PID_SCHEME.md) | [PiCo Ontology](https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo)

---

## 1. Vision Statement

Create a **globally interoperable, culturally-aware persistent identifier system for persons** associated with heritage custodian institutions.

The system will:

- Identify persons unambiguously across heritage collections worldwide
- Link individuals to their roles at heritage institutions (archivists, curators, directors, researchers)
- Support genealogical and biographical research with proper provenance tracking
- Handle the complexity of historical and cross-cultural naming conventions

---

## 2. Problem Statement

### Current Challenges

| Challenge | Impact |
|-----------|--------|
| **Name variability** | Same person recorded as "Jan van der Berg", "J. v.d. Berg", "Johannes Berg" |
| **Cultural naming diversity** | Patronymics (Iceland), mononyms (Indonesia), family-name-first (East Asia) |
| **Historical uncertainty** | Birthdates unknown, conflicting records, incomplete data |
| **Source fragmentation** | Person data scattered across LinkedIn, institutional websites, archives |
| **Observation vs. identity** | Raw source data conflated with curated person records |

### Why Existing Systems Fall Short

| System | Limitation for Heritage Domain |
|--------|-------------------------------|
| **ORCID** | Researcher-focused, requires self-registration, living persons only |
| **ISNI** | Curated by registration agencies, expensive, slow assignment process |
| **VIAF** | Library authority files only, not designed for heritage staff |
| **Wikidata** | Notability requirement excludes most heritage professionals |

---

## 3. Proposed Solution: PPID

### Core Design Principles

1. **Opaque Identifiers**: No personal information encoded in the ID itself (following ORCID best practice)
2. **Observation-Reconstruction Distinction**: Separate identifiers for raw source data vs. curated person records (following PiCo ontology)
3. **Cultural Neutrality**: No assumptions about name structure, birthdates, or family relationships
4. **Provenance-First**: Every claim traceable to its source with confidence assertions
5. **Interoperable**: Links to ORCID, ISNI, VIAF, Wikidata where available

### Two-Level Identifier Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                  PERSON RECONSTRUCTION (PRID)                   │
│                                                                 │
│  Curated identity: "Johannes van der Berg (1892-1967)"          │
│  PRID: ppid:PRID-xxxx-xxxx-xxxx-xxxx                            │
│                                                                 │
│  Links to external IDs:                                         │
│  - ORCID: 0000-0002-1234-5678 (if researcher)                   │
│  - Wikidata: Q12345678 (if notable)                             │
│  - VIAF: 123456789 (if in library authorities)                  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│  ▲                    ▲                    ▲                    │
│  │ prov:wasDerivedFrom│                    │                    │
│  │                    │                    │                    │
├──┴────────────────────┴────────────────────┴────────────────────┤
│                   PERSON OBSERVATIONS (POIDs)                   │
│                                                                 │
│  LinkedIn observation:         Archive observation:             │
│  "Jan van der Berg"            "J. v.d. Berg"                   │
│  POID: ppid:POID-aaaa-...      POID: ppid:POID-bbbb-...         │
│  Source: linkedin.com/in/...   Source: archive.org/doc/123      │
│  Retrieved: 2025-01-09         Retrieved: 2024-05-15            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Identifier Format

| Component | Format | Example |
|-----------|--------|---------|
| **POID** (Observation) | `ppid:POID-xxxx-xxxx-xxxx-xxxx` | `ppid:POID-7a3b-c4d5-e6f7-8901` |
| **PRID** (Reconstruction) | `ppid:PRID-xxxx-xxxx-xxxx-xxxx` | `ppid:PRID-1234-5678-90ab-cdef` |
| **Checksum** | ISO/IEC 7064 MOD 11-2 | Final character (0-9 or X) |

The example values are illustrative placeholders and have not been checksum-validated.

---

## 4. Alignment with PiCo Ontology

The **PiCo (Persons in Context)** ontology from CBG Centrum voor familiegeschiedenis provides the conceptual foundation:

| PiCo Concept | PPID Implementation |
|--------------|---------------------|
| `picom:PersonObservation` | POID - identifier for raw source observation |
| `picom:PersonReconstruction` | PRID - identifier for curated person identity |
| `prov:wasDerivedFrom` | Links PRID to source POIDs |
| `picom:hasName` (via PNV) | Structured name representation |
| `picom:hasRole` | Person's role at heritage institution |

### PiCo's Key Innovation

PiCo explicitly separates:

- **What the source says** (PersonObservation) - "the document states this person was born in 1892"
- **What we conclude** (PersonReconstruction) - "we believe this person was Johannes van der Berg, born c. 1892"

This distinction is critical for:

- Handling conflicting information across sources
- Preserving original source data integrity
- Supporting scholarly genealogical research
- Enabling transparent reasoning about person identities

---

## 5. Scope and Boundaries

### In Scope

- Persons associated with heritage custodian institutions (GLAM sector)
- Historical persons appearing in heritage collections
- Genealogical subjects in archival records
- Staff, researchers, donors, and stakeholders of heritage organizations

### Out of Scope (Initially)

- General public registration (unlike ORCID's self-service model)
- Living persons without heritage sector connection
- Fictional characters
- Legal entities (covered by GHCID for institutions)

---

## 6. Success Criteria

| Metric | Target |
|--------|--------|
| **Interoperability** | Bidirectional links to ORCID, ISNI, VIAF, Wikidata |
| **Cultural coverage** | Documented support for 50+ naming conventions |
| **Provenance completeness** | 100% of claims have source attribution |
| **Resolution accuracy** | <5% false positive rate in entity matching |
| **Adoption** | Integration with 3+ major genealogical platforms |

---

## 7. Document Roadmap

| Document | Purpose |
|----------|---------|
| [02_sota_identifier_systems.md](./02_sota_identifier_systems.md) | Analysis of ORCID, ISNI, VIAF |
| [03_pico_ontology_analysis.md](./03_pico_ontology_analysis.md) | Deep dive into PiCo model |
| [04_cultural_naming_conventions.md](./04_cultural_naming_conventions.md) | Global naming pattern challenges |
| [05_identifier_structure_design.md](./05_identifier_structure_design.md) | Format, checksum, namespaces |
| [06_entity_resolution_patterns.md](./06_entity_resolution_patterns.md) | Handling partial/uncertain data |
| [07_claims_and_provenance.md](./07_claims_and_provenance.md) | Web claims, provenance statements |
| [08_implementation_guidelines.md](./08_implementation_guidelines.md) | Technical specifications |
| [09_governance_and_sustainability.md](./09_governance_and_sustainability.md) | Long-term management |

---

## 8. Key Stakeholders

| Stakeholder | Interest |
|-------------|----------|
| **Heritage institutions** | Staff identification, donor tracking, researcher attribution |
| **Genealogical organizations** | Standardized person identification across sources |
| **Archival services** | Authority control for persons in collections |
| **Research libraries** | Integration with existing authority files |
| **Digital humanities** | Linked data for biographical research |

---

## 9. Timeline

| Phase | Timeframe | Deliverables |
|-------|-----------|--------------|
| **Research** | Q1 2025 | This planning document series |
| **Design** | Q2 2025 | Identifier specification, ontology alignment |
| **Prototype** | Q3 2025 | Reference implementation, test dataset |
| **Pilot** | Q4 2025 | Integration with 2-3 heritage partners |
| **Launch** | Q1 2026 | Public API, documentation, governance structure |

---

## 10. References

- **ORCID**: https://orcid.org/ - Model for researcher identification
- **ISNI**: https://isni.org/ - ISO 27729 standard for public identities
- **VIAF**: https://viaf.org/ - Virtual International Authority File
- **PiCo Ontology**: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- **PNV**: Person Name Vocabulary for Dutch historical names
- **W3C Personal Names**: https://www.w3.org/International/questions/qa-personal-names
- **GHCID**: Our heritage custodian identifier system (parallel design)
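
---

## Appendix: Illustrative Sketch of the Two-Level Model

The two-level architecture in section 3 (observations feeding a reconstruction via `prov:wasDerivedFrom`) can be illustrated with a small Python sketch. This is not the reference implementation: the names `mint_ppid`, `PPID_PATTERN`, `PersonObservation`, and `PersonReconstruction` are hypothetical, the identifier body is assumed to be 16 random lowercase hex characters, and the ISO/IEC 7064 MOD 11-2 check character is deliberately left out because this summary does not specify how it is computed over the identifier.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: four groups of four lowercase hex characters, no check character.
PPID_PATTERN = re.compile(r"^ppid:(POID|PRID)-[0-9a-f]{4}(-[0-9a-f]{4}){3}$")


def mint_ppid(kind: str) -> str:
    """Mint an opaque identifier; nothing about the person is encoded in it."""
    if kind not in ("POID", "PRID"):
        raise ValueError("kind must be 'POID' or 'PRID'")
    raw = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    groups = [raw[i:i + 4] for i in range(0, 16, 4)]
    return "ppid:" + kind + "-" + "-".join(groups)


@dataclass(frozen=True)
class PersonObservation:
    """What a single source says, kept verbatim (hypothetical field names)."""
    poid: str
    name_as_recorded: str
    source: str
    retrieved: str  # ISO 8601 date


@dataclass
class PersonReconstruction:
    """What we conclude, with links back to the supporting observations."""
    prid: str
    display_name: str
    derived_from: List[str] = field(default_factory=list)  # POIDs (prov:wasDerivedFrom)
    external_ids: Dict[str, str] = field(default_factory=dict)

    def add_observation(self, obs: PersonObservation) -> None:
        # Record each supporting observation once; the observation itself is unchanged.
        if obs.poid not in self.derived_from:
            self.derived_from.append(obs.poid)


# Usage with the example values from the architecture diagram.
linkedin_obs = PersonObservation(
    mint_ppid("POID"), "Jan van der Berg", "linkedin.com/in/...", "2025-01-09"
)
archive_obs = PersonObservation(
    mint_ppid("POID"), "J. v.d. Berg", "archive.org/doc/123", "2024-05-15"
)
person = PersonReconstruction(
    mint_ppid("PRID"), "Johannes van der Berg (1892-1967)"
)
person.add_observation(linkedin_obs)
person.add_observation(archive_obs)
person.external_ids["wikidata"] = "Q12345678"
```

Keeping observations immutable (`frozen=True`) while the reconstruction stays editable mirrors the PiCo separation between what a source says and what curators conclude: a reconstruction can be revised or re-linked without altering the recorded source data.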