# Person Persistent Identifier System (PPID) - Executive Summary
**Version**: 0.1.0

**Status**: Research & Planning

**Last Updated**: 2025-01-09

**Related**: [GHCID PID Scheme](../../../docs/GHCID_PID_SCHEME.md) | [PiCo Ontology](https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo)

---
## 1. Vision Statement
Create a **globally interoperable, culturally-aware persistent identifier system for persons** associated with heritage custodian institutions. The system will enable:
- Unambiguous identification of persons across heritage collections worldwide
- Linking individuals to their roles at heritage institutions (archivists, curators, directors, researchers)
- Supporting genealogical and biographical research with proper provenance tracking
- Handling the complexity of historical and cross-cultural naming conventions
---
## 2. Problem Statement
### Current Challenges
| Challenge | Impact |
|-----------|--------|
| **Name variability** | Same person recorded as "Jan van der Berg", "J. v.d. Berg", "Johannes Berg" |
| **Cultural naming diversity** | Patronymics (Iceland), mononyms (Indonesia), family-name-first (East Asia) |
| **Historical uncertainty** | Birthdates unknown, conflicting records, incomplete data |
| **Source fragmentation** | Person data scattered across LinkedIn, institutional websites, archives |
| **Observation vs. identity** | Raw source data conflated with curated person records |
### Why Existing Systems Fall Short
| System | Limitation for Heritage Domain |
|--------|-------------------------------|
| **ORCID** | Researcher-focused, requires self-registration, living persons only |
| **ISNI** | Curated by registration agencies; expensive, with a slow assignment process |
| **VIAF** | Library authority files only, not designed for heritage staff |
| **Wikidata** | Notability requirement excludes most heritage professionals |
---
## 3. Proposed Solution: PPID
### Core Design Principles
1. **Opaque Identifiers**: No personal information encoded in the ID itself (following ORCID best practice)
2. **Observation-Reconstruction Distinction**: Separate identifiers for raw source data vs. curated person records (following PiCo ontology)
3. **Cultural Neutrality**: No assumptions about name structure, birthdates, or family relationships
4. **Provenance-First**: Every claim traceable to its source with confidence assertions
5. **Interoperable**: Links to ORCID, ISNI, VIAF, Wikidata where available
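
Principle 1 can be illustrated with a minimal sketch, assuming identifiers are minted from a cryptographically secure random source. Nothing about the person (name, birthdate, source) goes into the body, so two records for people with the same name receive unrelated identifiers. The helper name `mint_opaque_id` and the 16 lowercase hexadecimal characters, which mirror the `xxxx-xxxx-xxxx-xxxx` examples in the identifier format table, are assumptions; no checksum is computed here.

```python
import re
import secrets

# Shape of the placeholder examples: four groups of four lowercase hex characters.
ID_PATTERN = re.compile(r"ppid:(POID|PRID)-[0-9a-f]{4}(?:-[0-9a-f]{4}){3}")


def mint_opaque_id(kind: str) -> str:
    """Mint a hypothetical opaque POID/PRID that contains no personal data.

    The only input is the identifier kind; the body comes from `secrets`,
    so it cannot leak a name, birthdate, or source.
    """
    if kind not in ("POID", "PRID"):
        raise ValueError(f"unknown identifier kind: {kind!r}")
    body = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    groups = [body[i:i + 4] for i in range(0, len(body), 4)]
    return f"ppid:{kind}-" + "-".join(groups)
```

Because the body is random, any relationship between two identifiers has to be recorded explicitly (for example through `prov:wasDerivedFrom`) rather than inferred from the identifier strings.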
### Two-Level Identifier Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│  PERSON RECONSTRUCTION (PRID)                                   │
│                                                                 │
│  Curated identity: "Johannes van der Berg (1892-1967)"          │
│  PRID: ppid:PRID-xxxx-xxxx-xxxx-xxxx                            │
│                                                                 │
│  Links to external IDs:                                         │
│    - ORCID: 0000-0002-1234-5678 (if researcher)                 │
│    - Wikidata: Q12345678 (if notable)                           │
│    - VIAF: 123456789 (if in library authorities)                │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│  ▲                      ▲                      ▲                │
│  │ prov:wasDerivedFrom  │                      │                │
│  │                      │                      │                │
├──┴──────────────────────┴──────────────────────┴────────────────┤
│  PERSON OBSERVATIONS (POIDs)                                    │
│                                                                 │
│  LinkedIn observation:          Archive observation:            │
│  "Jan van der Berg"             "J. v.d. Berg"                  │
│  POID: ppid:POID-aaaa-...       POID: ppid:POID-bbbb-...        │
│  Source: linkedin.com/in/...    Source: archive.org/doc/123     │
│  Retrieved: 2025-01-09          Retrieved: 2024-05-15           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
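
The two levels in the diagram can be modelled as two record types. The sketch below assumes plain Python dataclasses, and the class and field names are hypothetical. Observations are immutable snapshots of what a source said, while a reconstruction only references them by POID, which corresponds to the `prov:wasDerivedFrom` link in the diagram.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonObservation:
    """POID level: one source's record, stored exactly as found."""
    poid: str
    name_as_recorded: str
    source: str
    retrieved: str  # ISO 8601 date of retrieval


@dataclass
class PersonReconstruction:
    """PRID level: curated identity that points back to its observations."""
    prid: str
    label: str
    derived_from: list[str] = field(default_factory=list)  # POIDs (prov:wasDerivedFrom)
    external_ids: dict[str, str] = field(default_factory=dict)  # e.g. {"wikidata": "Q..."}

    def add_observation(self, obs: PersonObservation) -> None:
        # Record the link only; the observation itself is never modified.
        if obs.poid not in self.derived_from:
            self.derived_from.append(obs.poid)


# Illustrative values copied from the diagram; the identifiers are placeholders.
linkedin = PersonObservation("ppid:POID-aaaa-...", "Jan van der Berg",
                             "linkedin.com/in/...", "2025-01-09")
archive = PersonObservation("ppid:POID-bbbb-...", "J. v.d. Berg",
                            "archive.org/doc/123", "2024-05-15")
person = PersonReconstruction("ppid:PRID-xxxx-xxxx-xxxx-xxxx",
                              "Johannes van der Berg (1892-1967)")
person.add_observation(linkedin)
person.add_observation(archive)
person.add_observation(linkedin)  # duplicate link is ignored
```

Freezing `PersonObservation` is one way to enforce "preserve original source data" in code: a reconstruction can only add links, never rewrite what a source recorded.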
### Identifier Format
| Component | Format | Example |
|-----------|--------|---------|
| **POID** (Observation) | `ppid:POID-xxxx-xxxx-xxxx-xxxx` | `ppid:POID-7a3b-c4d5-e6f7-8901` |
| **PRID** (Reconstruction) | `ppid:PRID-xxxx-xxxx-xxxx-xxxx` | `ppid:PRID-1234-5678-90ab-cdef` |
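| **Checksum** | ISO/IEC 7064 MOD 11-2 | Last character (0-9 or X) |

The examples are illustrative placeholders. ISO/IEC 7064 MOD 11-2, the check-character scheme ORCID uses, is defined over decimal digits, so a body containing hexadecimal letters cannot carry a valid MOD 11-2 check character as written (the PRID example ends in `f`, which is not a possible check character). The final body encoding and checksum are to be settled in [05_identifier_structure_design.md](./05_identifier_structure_design.md).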
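
ISO/IEC 7064 MOD 11-2 works as follows: each digit is added to a running total that is then doubled, and the check value is (12 - total mod 11) mod 11, written as `X` when it equals 10. The sketch below assumes a body of 15 decimal digits plus one check character, as in ORCID; it accepts `0000-0002-1825-0097`, a valid ORCID iD. The function names are hypothetical.

```python
import re


def mod11_2_check_char(digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for a string of decimal digits."""
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("MOD 11-2 is defined over decimal digits only")
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def has_valid_check_char(body: str) -> bool:
    """Check a hyphenated 16-character body such as '0000-0002-1825-0097'.

    Assumes 15 decimal digits followed by one check character (0-9 or X).
    """
    compact = body.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return mod11_2_check_char(compact[:15]) == compact[15]


# has_valid_check_char("0000-0002-1825-0097") -> True
# has_valid_check_char("1234-5678-90ab-cdef") -> False (hex letters are rejected)
```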
---
## 4. Alignment with PiCo Ontology
The **PiCo (Persons in Context)** ontology, developed by CBG (Centrum voor familiegeschiedenis, the Dutch centre for family history), provides the conceptual foundation:

| PiCo Concept | PPID Implementation |
|--------------|---------------------|
| `picom:PersonObservation` | POID - identifier for raw source observation |
| `picom:PersonReconstruction` | PRID - identifier for curated person identity |
| `prov:wasDerivedFrom` (PROV-O) | Links PRID to source POIDs |
| `picom:hasName` (via PNV) | Structured name representation |
| `picom:hasRole` | Person's role at heritage institution |
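
The mapping can be sketched as plain subject-predicate-object triples. This is only an illustration: it keeps the prefixed names from the table without expanding them to namespace IRIs, assumes `rdf:type` for class membership, and writes the name and role as plain literals, whereas the table notes that names are structured via PNV. The function name is hypothetical, and a production system would more likely use an RDF library.

```python
Triple = tuple[str, str, str]


def reconstruction_triples(prid: str, poids: list[str], name: str, role: str) -> list[Triple]:
    """Emit PiCo-style triples linking one reconstruction to its observations."""
    triples: list[Triple] = [(prid, "rdf:type", "picom:PersonReconstruction")]
    for poid in poids:
        triples.append((poid, "rdf:type", "picom:PersonObservation"))
        triples.append((prid, "prov:wasDerivedFrom", poid))
    # Simplification: literals instead of PNV name structures and role resources.
    triples.append((prid, "picom:hasName", f'"{name}"'))
    triples.append((prid, "picom:hasRole", f'"{role}"'))
    return triples


# Placeholder identifiers taken from the identifier format table.
for s, p, o in reconstruction_triples(
    "ppid:PRID-1234-5678-90ab-cdef",
    ["ppid:POID-7a3b-c4d5-e6f7-8901"],
    "Johannes van der Berg",
    "archivist",
):
    print(s, p, o, ".")
```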
### PiCo's Key Innovation
PiCo explicitly separates:
- **What the source says** (PersonObservation) - "the document states this person was born in 1892"
- **What we conclude** (PersonReconstruction) - "we believe this person was Johannes van der Berg, born c. 1892"

This distinction is critical for:
- Handling conflicting information across sources
- Preserving original source data integrity
- Supporting scholarly genealogical research
- Enabling transparent reasoning about person identities
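
Handling conflicting information can be illustrated with a small sketch, assuming each observation's statements are kept as `(POID, property, value)` tuples. Instead of choosing one birth year and discarding the rest, the reconstruction step groups the claimed values and keeps the POIDs behind each one, so a conclusion such as "born c. 1892" stays traceable to its sources. The property name `birthYear`, the helper name, and the sample values are hypothetical.

```python
from collections import defaultdict

# A claim as stated by one source: (POID, property, value).
Claim = tuple[str, str, str]


def values_by_source(claims: list[Claim], prop: str) -> dict[str, list[str]]:
    """Map each distinct value of `prop` to the POIDs that assert it."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for poid, p, value in claims:
        if p == prop:
            grouped[value].append(poid)
    return dict(grouped)


claims = [
    ("ppid:POID-aaaa-...", "birthYear", "1892"),
    ("ppid:POID-bbbb-...", "birthYear", "1893"),
    ("ppid:POID-cccc-...", "birthYear", "1892"),
]
birth_years = values_by_source(claims, "birthYear")
# -> {'1892': ['ppid:POID-aaaa-...', 'ppid:POID-cccc-...'], '1893': ['ppid:POID-bbbb-...']}
has_conflict = len(birth_years) > 1  # True: the sources disagree, and both values are kept
```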
---
## 5. Scope and Boundaries
### In Scope
- Persons associated with heritage custodian institutions (GLAM sector)
- Historical persons appearing in heritage collections
- Genealogical subjects in archival records
- Staff, researchers, donors, and stakeholders of heritage organizations
### Out of Scope (Initially)
- General public registration (unlike ORCID's self-service model)
- Living persons without heritage sector connection
- Fictional characters
- Legal entities (covered by GHCID for institutions)
---
## 6. Success Criteria
| Metric | Target |
|--------|--------|
| **Interoperability** | Bidirectional links to ORCID, ISNI, VIAF, Wikidata |
| **Cultural coverage** | 50+ naming conventions documented and supported |
| **Provenance completeness** | 100% of claims have source attribution |
| **Resolution accuracy** | <5% false positive rate in entity matching |
| **Adoption** | Integration with 3+ major genealogical platforms |
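
Two of these targets are ratios that can be computed directly. The sketch below assumes that "false positive rate" means the share of proposed matches that are wrong, FP / (TP + FP), i.e. one minus precision; the textbook rate FP / (FP + TN) would be close to zero for almost any matcher, because the number of non-matching record pairs is very large. It also assumes a claim counts as attributed when it has a non-empty `source`. All names and sample pairs are hypothetical.

```python
Pair = tuple[str, str]


def _unordered(pairs: set[Pair]) -> set[frozenset[str]]:
    # A match between a and b is the same as a match between b and a.
    return {frozenset(p) for p in pairs}


def false_match_share(predicted: set[Pair], gold: set[Pair]) -> float:
    """FP / (TP + FP): fraction of proposed matches that are not true matches."""
    proposed = _unordered(predicted)
    if not proposed:
        return 0.0
    wrong = proposed - _unordered(gold)
    return len(wrong) / len(proposed)


def provenance_completeness(claims: list[dict[str, str]]) -> float:
    """Fraction of claims that name a source (target: 1.0)."""
    if not claims:
        return 1.0
    return sum(1 for c in claims if c.get("source")) / len(claims)


predicted = {("POID-a", "POID-b"), ("POID-c", "POID-d"), ("POID-e", "POID-f")}
gold = {("POID-b", "POID-a"), ("POID-c", "POID-d")}
rate = false_match_share(predicted, gold)  # 1 wrong of 3 proposed -> 0.333...
meets_target = rate < 0.05  # False for this toy example
```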
---
## 7. Document Roadmap
| Document | Purpose |
|----------|---------|
| [02_sota_identifier_systems.md](./02_sota_identifier_systems.md) | Analysis of ORCID, ISNI, VIAF |
| [03_pico_ontology_analysis.md](./03_pico_ontology_analysis.md) | Deep dive into PiCo model |
| [04_cultural_naming_conventions.md](./04_cultural_naming_conventions.md) | Global naming pattern challenges |
| [05_identifier_structure_design.md](./05_identifier_structure_design.md) | Format, checksum, namespaces |
| [06_entity_resolution_patterns.md](./06_entity_resolution_patterns.md) | Handling partial/uncertain data |
| [07_claims_and_provenance.md](./07_claims_and_provenance.md) | Web claims, provenance statements |
| [08_implementation_guidelines.md](./08_implementation_guidelines.md) | Technical specifications |
| [09_governance_and_sustainability.md](./09_governance_and_sustainability.md) | Long-term management |
---
## 8. Key Stakeholders
| Stakeholder | Interest |
|-------------|----------|
| **Heritage institutions** | Staff identification, donor tracking, researcher attribution |
| **Genealogical organizations** | Standardized person identification across sources |
| **Archival services** | Authority control for persons in collections |
| **Research libraries** | Integration with existing authority files |
| **Digital humanities** | Linked data for biographical research |
---
## 9. Timeline
| Phase | Duration | Deliverables |
|-------|----------|--------------|
| **Research** | Q1 2025 | This planning document series |
| **Design** | Q2 2025 | Identifier specification, ontology alignment |
| **Prototype** | Q3 2025 | Reference implementation, test dataset |
| **Pilot** | Q4 2025 | Integration with 2-3 heritage partners |
| **Launch** | Q1 2026 | Public API, documentation, governance structure |
---
## 10. References
- **ORCID**: https://orcid.org/ - Model for researcher identification
- **ISNI**: https://isni.org/ - ISO 27729 standard for public identities
- **VIAF**: https://viaf.org/ - Virtual International Authority File
- **PiCo Ontology**: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- **PNV**: Person Name Vocabulary for Dutch historical names
- **W3C Personal Names**: https://www.w3.org/International/questions/qa-personal-names
- **GHCID**: Our heritage custodian identifier system (parallel design)