Person Persistent Identifier System (PPID) - Executive Summary

Version: 0.1.0
Status: Research & Planning
Last Updated: 2025-01-09
Related: GHCID PID Scheme | PiCo Ontology


1. Vision Statement

Create a globally interoperable, culturally aware persistent identifier system for persons associated with heritage custodian institutions. The system will enable:

  • Unambiguous identification of persons across heritage collections worldwide
  • Linking of individuals to their roles at heritage institutions (archivists, curators, directors, researchers)
  • Genealogical and biographical research with proper provenance tracking
  • Handling of historical and cross-cultural naming conventions

2. Problem Statement

Current Challenges

| Challenge | Impact |
|---|---|
| Name variability | Same person recorded as "Jan van der Berg", "J. v.d. Berg", "Johannes Berg" |
| Cultural naming diversity | Patronymics (Iceland), mononyms (Indonesia), family-name-first (East Asia) |
| Historical uncertainty | Birthdates unknown, conflicting records, incomplete data |
| Source fragmentation | Person data scattered across LinkedIn, institutional websites, archives |
| Observation vs. identity | Raw source data conflated with curated person records |

Why Existing Systems Fall Short

| System | Limitation for Heritage Domain |
|---|---|
| ORCID | Researcher-focused, requires self-registration, living persons only |
| ISNI | Curated by registration agencies, expensive, slow assignment process |
| VIAF | Library authority files only, not designed for heritage staff |
| Wikidata | Notability requirement excludes most heritage professionals |

3. Proposed Solution: PPID

Core Design Principles

  1. Opaque Identifiers: No personal information encoded in the ID itself (following ORCID best practice)
  2. Observation-Reconstruction Distinction: Separate identifiers for raw source data vs. curated person records (following PiCo ontology)
  3. Cultural Neutrality: No assumptions about name structure, birthdates, or family relationships
  4. Provenance-First: Every claim traceable to its source with confidence assertions
  5. Interoperable: Links to ORCID, ISNI, VIAF, Wikidata where available

Two-Level Identifier Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    PERSON RECONSTRUCTION (PRID)                  │
│                                                                  │
│  Curated identity: "Johannes van der Berg (1892-1967)"          │
│  PRID: ppid:PRID-xxxx-xxxx-xxxx-xxxx                            │
│                                                                  │
│  Links to external IDs:                                         │
│  - ORCID: 0000-0002-1234-5678 (if researcher)                   │
│  - Wikidata: Q12345678 (if notable)                             │
│  - VIAF: 123456789 (if in library authorities)                  │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│  ▲                    ▲                    ▲                    │
│  │ prov:wasDerivedFrom │                    │                    │
│  │                    │                    │                    │
├──┴────────────────────┴────────────────────┴────────────────────┤
│            PERSON OBSERVATIONS (POIDs)                          │
│                                                                  │
│  LinkedIn observation:        Archive observation:               │
│  "Jan van der Berg"           "J. v.d. Berg"                    │
│  POID: ppid:POID-aaaa-...     POID: ppid:POID-bbbb-...         │
│  Source: linkedin.com/in/...  Source: archive.org/doc/123       │
│  Retrieved: 2025-01-09        Retrieved: 2024-05-15             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
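The two levels in the diagram can be sketched as a small data model. This is a minimal illustration, not the project's schema: the class and field names (`PersonObservation`, `PersonReconstruction`, `derived_from`, `external_ids`) are hypothetical, and the values are copied from the diagram, including its elided identifiers and URLs.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PersonObservation:
    """What a single source says about a person; immutable once captured."""
    poid: str
    name_as_recorded: str
    source: str
    retrieved: date


@dataclass
class PersonReconstruction:
    """Curated identity, justified by the observations it was derived from."""
    prid: str
    display_name: str
    # Plays the role of prov:wasDerivedFrom in the diagram.
    derived_from: list[PersonObservation] = field(default_factory=list)
    # External identifiers keyed by scheme name (hypothetical keys).
    external_ids: dict[str, str] = field(default_factory=dict)

    def sources(self) -> list[str]:
        """List the source of every supporting observation."""
        return [obs.source for obs in self.derived_from]


linkedin = PersonObservation(
    "ppid:POID-aaaa-...", "Jan van der Berg", "linkedin.com/in/...", date(2025, 1, 9)
)
archive = PersonObservation(
    "ppid:POID-bbbb-...", "J. v.d. Berg", "archive.org/doc/123", date(2024, 5, 15)
)

person = PersonReconstruction(
    prid="ppid:PRID-xxxx-xxxx-xxxx-xxxx",
    display_name="Johannes van der Berg (1892-1967)",
    derived_from=[linkedin, archive],
    external_ids={"viaf": "123456789"},
)
```

Freezing the observation class mirrors the intent that source data is preserved as recorded, while the reconstruction stays editable as curation improves.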

Identifier Format

| Component | Format | Example |
|---|---|---|
| POID (Observation) | ppid:POID-xxxx-xxxx-xxxx-xxxx | ppid:POID-7a3b-c4d5-e6f7-8901 |
| PRID (Reconstruction) | ppid:PRID-xxxx-xxxx-xxxx-xxxx | ppid:PRID-1234-5678-90ab-cdef |
| Checksum | ISO/IEC 7064 MOD 11-2 | Last character (0-9 or X) |
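ORCID applies ISO/IEC 7064 MOD 11-2 to 15 decimal digits and appends the result as the sixteenth character. The sketch below assumes PPID would work the same way. That is an assumption: the example identifiers in the table contain hexadecimal characters, which this decimal form of the algorithm does not cover, so the final alphabet remains a design decision. The names `PPID_RE`, `mod_11_2_check_char`, `mint_ppid`, and `is_valid_ppid` are hypothetical.

```python
import re
import secrets

# Assumption: 15 decimal payload digits followed by one check character.
PPID_RE = re.compile(
    r"^ppid:(POID|PRID)-([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])$"
)


def mod_11_2_check_char(digits: str) -> str:
    """Compute the ISO/IEC 7064 MOD 11-2 check character for a digit string."""
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def mint_ppid(kind: str) -> str:
    """Mint an opaque identifier: random payload, no personal data encoded."""
    if kind not in ("POID", "PRID"):
        raise ValueError(f"unknown identifier kind: {kind!r}")
    payload = "".join(secrets.choice("0123456789") for _ in range(15))
    body = payload + mod_11_2_check_char(payload)
    groups = "-".join(body[i:i + 4] for i in range(0, 16, 4))
    return f"ppid:{kind}-{groups}"


def is_valid_ppid(ppid: str) -> bool:
    """Check both the surface format and the trailing check character."""
    match = PPID_RE.match(ppid)
    if match is None:
        return False
    body = "".join(match.groups()[1:])  # the four 4-character groups
    return mod_11_2_check_char(body[:15]) == body[15]
```

The function can be checked against sample iDs from ORCID's documentation: `mod_11_2_check_char("000000021825009")` yields `"7"` (0000-0002-1825-0097) and `mod_11_2_check_char("000000029079593")` yields `"X"` (0000-0002-9079-593X).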

4. Alignment with PiCo Ontology

The PiCo (Persons in Context) ontology from CBG Centrum voor familiegeschiedenis provides the conceptual foundation:

| PiCo Concept | PPID Implementation |
|---|---|
| picom:PersonObservation | POID - identifier for raw source observation |
| picom:PersonReconstruction | PRID - identifier for curated person identity |
| prov:wasDerivedFrom | Links PRID to source POIDs |
| picom:hasName (via PNV) | Structured name representation |
| picom:hasRole | Person's role at heritage institution |

PiCo's Key Innovation

PiCo explicitly separates:

  • What the source says (PersonObservation) - "the document states this person was born in 1892"
  • What we conclude (PersonReconstruction) - "we believe this person was Johannes van der Berg, born c. 1892"

This distinction is critical for:

  • Handling conflicting information across sources
  • Preserving original source data integrity
  • Supporting scholarly genealogical research
  • Enabling transparent reasoning about person identities
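To make the separation concrete, the sketch below derives a birth-year conclusion from several observations without altering any of them. The majority-vote rule, the "c." prefix for contested years, and the names `BirthYearClaim` and `reconstruct_birth_year` are illustrative assumptions, not something PiCo or this plan prescribes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BirthYearClaim:
    poid: str                  # observation the claim comes from
    birth_year: Optional[int]  # None when the source is silent


def reconstruct_birth_year(claims):
    """Return (label, supporting_poids, conflicting_poids).

    Observations are only read, never modified; the conclusion keeps
    pointers to the observations that support or contradict it.
    """
    stated = [c for c in claims if c.birth_year is not None]
    if not stated:
        return ("unknown", [], [])
    counts = Counter(c.birth_year for c in stated)
    year, _ = counts.most_common(1)[0]
    supporting = [c.poid for c in stated if c.birth_year == year]
    conflicting = [c.poid for c in stated if c.birth_year != year]
    # Mark the conclusion as approximate when sources disagree.
    label = str(year) if not conflicting else f"c. {year}"
    return (label, supporting, conflicting)


claims = [
    BirthYearClaim("ppid:POID-aaaa-...", 1892),
    BirthYearClaim("ppid:POID-bbbb-...", 1893),
    BirthYearClaim("ppid:POID-cccc-...", 1892),
    BirthYearClaim("ppid:POID-dddd-...", None),
]
label, supporting, conflicting = reconstruct_birth_year(claims)
# label == "c. 1892"; the 1893 claim is retained as a conflicting source
```

Because the conflicting observation is returned rather than discarded, a later reviewer can see why the reconstruction says "c. 1892" and revisit the decision if new sources appear.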

5. Scope and Boundaries

In Scope

  • Persons associated with heritage custodian institutions (GLAM sector)
  • Historical persons appearing in heritage collections
  • Genealogical subjects in archival records
  • Staff, researchers, donors, and stakeholders of heritage organizations

Out of Scope (Initially)

  • General public registration (unlike ORCID's self-service model)
  • Living persons without heritage sector connection
  • Fictional characters
  • Legal entities (covered by GHCID for institutions)

6. Success Criteria

| Metric | Target |
|---|---|
| Interoperability | Bidirectional links to ORCID, ISNI, VIAF, Wikidata |
| Cultural coverage | Documented support for 50+ naming conventions |
| Provenance completeness | 100% of claims have source attribution |
| Resolution accuracy | <5% false positive rate in entity matching |
| Adoption | Integration with 3+ major genealogical platforms |
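Two of these targets are directly measurable. The sketch below shows one way they might be computed; the function names and the dict-based claim shape are hypothetical. Because "false positive rate" can be read either as FP / (FP + TN) or as the share of predicted matches that are wrong, the second function returns both so the target can be pinned down in the design phase.

```python
def provenance_completeness(claims):
    """Share of claims that carry a non-empty source attribution."""
    if not claims:
        return 1.0
    return sum(1 for c in claims if c.get("source")) / len(claims)


def match_error_rates(candidates, predicted, truth):
    """Return (false_positive_rate, false_discovery_rate).

    candidates: all record pairs that were compared
    predicted:  pairs the matcher declared to be the same person
    truth:      pairs known to be the same person
    """
    fp = len(predicted - truth)
    tn = len(candidates - predicted - truth)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fdr = fp / len(predicted) if predicted else 0.0
    return fpr, fdr


# Ten candidate pairs over five records, two true matches, one wrong match.
candidates = {(i, j) for i in range(5) for j in range(i + 1, 5)}
truth = {(0, 1), (2, 3)}
predicted = {(0, 1), (2, 3), (1, 4)}
fpr, fdr = match_error_rates(candidates, predicted, truth)
# fpr == 0.125 (1 of 8 true non-matches), fdr == 1/3 (1 of 3 predictions)
```

In entity resolution the non-matching pairs usually far outnumber the matches, so the first ratio tends to look small even when many predicted matches are wrong; the two readings can lead to different conclusions about the <5% target.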

7. Document Roadmap

| Document | Purpose |
|---|---|
| 02_sota_identifier_systems.md | Analysis of ORCID, ISNI, VIAF |
| 03_pico_ontology_analysis.md | Deep dive into PiCo model |
| 04_cultural_naming_conventions.md | Global naming pattern challenges |
| 05_identifier_structure_design.md | Format, checksum, namespaces |
| 06_entity_resolution_patterns.md | Handling partial/uncertain data |
| 07_claims_and_provenance.md | Web claims, provenance statements |
| 08_implementation_guidelines.md | Technical specifications |
| 09_governance_and_sustainability.md | Long-term management |

8. Key Stakeholders

| Stakeholder | Interest |
|---|---|
| Heritage institutions | Staff identification, donor tracking, researcher attribution |
| Genealogical organizations | Standardized person identification across sources |
| Archival services | Authority control for persons in collections |
| Research libraries | Integration with existing authority files |
| Digital humanities | Linked data for biographical research |

9. Timeline

| Phase | Duration | Deliverables |
|---|---|---|
| Research | Q1 2025 | This planning document series |
| Design | Q2 2025 | Identifier specification, ontology alignment |
| Prototype | Q3 2025 | Reference implementation, test dataset |
| Pilot | Q4 2025 | Integration with 2-3 heritage partners |
| Launch | Q1 2026 | Public API, documentation, governance structure |

10. References