# Person Persistent Identifier System (PPID) - Executive Summary

**Version**: 0.1.0  
**Status**: Research & Planning  
**Last Updated**: 2025-01-09  
**Related**: [GHCID PID Scheme](../../../docs/GHCID_PID_SCHEME.md) | [PiCo Ontology](https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo)

---

## 1. Vision Statement

Create a **globally interoperable, culturally aware persistent identifier system for persons** associated with heritage custodian institutions. The system is intended to:

- Identify persons unambiguously across heritage collections worldwide
- Link individuals to their roles at heritage institutions (archivists, curators, directors, researchers)
- Support genealogical and biographical research with proper provenance tracking
- Handle the complexity of historical and cross-cultural naming conventions

---

## 2. Problem Statement

### Current Challenges

| Challenge | Impact |
|-----------|--------|
| **Name variability** | Same person recorded as "Jan van der Berg", "J. v.d. Berg", "Johannes Berg" |
| **Cultural naming diversity** | Patronymics (Iceland), mononyms (Indonesia), family-name-first (East Asia) |
| **Historical uncertainty** | Birthdates unknown, conflicting records, incomplete data |
| **Source fragmentation** | Person data scattered across LinkedIn, institutional websites, archives |
| **Observation vs. identity** | Raw source data conflated with curated person records |

### Why Existing Systems Fall Short

| System | Limitation for Heritage Domain |
|--------|-------------------------------|
| **ORCID** | Researcher-focused, requires self-registration, living persons only |
| **ISNI** | Curated by registration agencies, expensive, slow assignment process |
| **VIAF** | Library authority files only, not designed for heritage staff |
| **Wikidata** | Notability requirement excludes most heritage professionals |

---

## 3. Proposed Solution: PPID

### Core Design Principles

1. **Opaque Identifiers**: No personal information encoded in the ID itself (following ORCID best practice)
2. **Observation-Reconstruction Distinction**: Separate identifiers for raw source data vs. curated person records (following PiCo ontology)
3. **Cultural Neutrality**: No assumptions about name structure, birthdates, or family relationships
4. **Provenance-First**: Every claim traceable to its source with confidence assertions
5. **Interoperable**: Links to ORCID, ISNI, VIAF, Wikidata where available

### Two-Level Identifier Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                    PERSON RECONSTRUCTION (PRID)                    │
│                                                                    │
│  Curated identity: "Johannes van der Berg (1892-1967)"             │
│  PRID: ppid:PRID-xxxx-xxxx-xxxx-xxxx                               │
│                                                                    │
│  Links to external IDs:                                            │
│  - ORCID: 0000-0002-1234-5678 (if researcher)                      │
│  - Wikidata: Q12345678 (if notable)                                │
│  - VIAF: 123456789 (if in library authorities)                     │
│                                                                    │
├────────────────────────────────────────────────────────────────────┤
│  ▲                     ▲                     ▲                     │
│  │ prov:wasDerivedFrom │                     │                     │
│  │                     │                     │                     │
├──┴─────────────────────┴─────────────────────┴─────────────────────┤
│                    PERSON OBSERVATIONS (POIDs)                     │
│                                                                    │
│  LinkedIn observation:             Archive observation:            │
│  "Jan van der Berg"                "J. v.d. Berg"                  │
│  POID: ppid:POID-aaaa-...          POID: ppid:POID-bbbb-...        │
│  Source: linkedin.com/in/...       Source: archive.org/doc/123     │
│  Retrieved: 2025-01-09             Retrieved: 2024-05-15           │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
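
The two levels in the diagram can be sketched as a small data model. This is a minimal Python illustration, not the PPID specification: the class and field names (`PersonObservation`, `PersonReconstruction`, `name_as_recorded`, `derived_from`, `external_ids`) are hypothetical, and the elided identifiers and source URLs are kept exactly as they appear in the diagram.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PersonObservation:
    """What one source says about a person, kept as recorded (hypothetical shape)."""
    poid: str
    name_as_recorded: str
    source: str
    retrieved: date


@dataclass
class PersonReconstruction:
    """A curated identity that points back at the observations it was derived from."""
    prid: str
    display_name: str
    derived_from: list[PersonObservation] = field(default_factory=list)
    external_ids: dict[str, str] = field(default_factory=dict)

    def source_poids(self) -> list[str]:
        """Return the POIDs this reconstruction is derived from (prov:wasDerivedFrom)."""
        return [obs.poid for obs in self.derived_from]


# Placeholder values copied from the diagram; the "..." parts are elided there too.
linkedin_obs = PersonObservation(
    poid="ppid:POID-aaaa-...",
    name_as_recorded="Jan van der Berg",
    source="linkedin.com/in/...",
    retrieved=date(2025, 1, 9),
)
archive_obs = PersonObservation(
    poid="ppid:POID-bbbb-...",
    name_as_recorded="J. v.d. Berg",
    source="archive.org/doc/123",
    retrieved=date(2024, 5, 15),
)
person = PersonReconstruction(
    prid="ppid:PRID-xxxx-xxxx-xxxx-xxxx",
    display_name="Johannes van der Berg (1892-1967)",
    derived_from=[linkedin_obs, archive_obs],
    external_ids={"wikidata": "Q12345678"},
)

print(person.source_poids())
```

Making observations immutable (`frozen=True`) mirrors the idea that source data is preserved as found, while a reconstruction can be revised as further observations are linked to it.
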

### Identifier Format

| Component | Format | Example |
|-----------|--------|---------|
| **POID** (Observation) | `ppid:POID-xxxx-xxxx-xxxx-xxxx` | `ppid:POID-7a3b-c4d5-e6f7-8901` |
| **PRID** (Reconstruction) | `ppid:PRID-xxxx-xxxx-xxxx-xxxx` | `ppid:PRID-1234-5678-90ab-cdef` |
| **Checksum** | ISO/IEC 7064 MOD 11-2 | Last character (0-9 or X) |

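
A rough sketch of how such identifiers could be minted and validated is shown below. It rests on several assumptions that are not stated in this summary: the first 15 characters are random lowercase hex digits, the 16th is the check character, and each hex digit contributes its numeric value (0-15) to the MOD 11-2 sum. The standard is defined for decimal digits, so the hex extension is a guess and it weakens error detection (for example, `b` = 11 and `0` are congruent modulo 11). The function names are hypothetical. The arithmetic is the iterative form used for ORCID iDs, and ORCID's documented sample iD `0000-0002-1825-0097` serves as a known check vector. The sample identifiers in the table above are treated as layout illustrations and are not expected to validate.

```python
import secrets

HEX_DIGITS = "0123456789abcdef"
KINDS = ("POID", "PRID")


def mod_11_2_check_char(payload: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character, iterative form as used for ORCID iDs.

    Assumption: each hex digit contributes its numeric value (0-15).
    """
    total = 0
    for ch in payload:
        total = (total + int(ch, 16)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def format_ppid(kind: str, payload: str) -> str:
    """Append the check character to a 15-digit payload, then add prefix and hyphens."""
    if kind not in KINDS:
        raise ValueError(f"unknown identifier kind: {kind!r}")
    if len(payload) != 15 or any(ch not in HEX_DIGITS for ch in payload):
        raise ValueError("payload must be 15 lowercase hex digits")
    body = payload + mod_11_2_check_char(payload)
    groups = "-".join(body[i:i + 4] for i in range(0, 16, 4))
    return f"ppid:{kind}-{groups}"


def new_ppid(kind: str) -> str:
    """Mint an opaque identifier from random bits; nothing personal is encoded."""
    return format_ppid(kind, secrets.token_hex(8)[:15])


def is_valid_ppid(value: str) -> bool:
    """Check the layout and the trailing check character."""
    scheme, _, rest = value.partition(":")
    kind, _, groups = rest.partition("-")
    parts = groups.split("-")
    if scheme != "ppid" or kind not in KINDS:
        return False
    if len(parts) != 4 or any(len(part) != 4 for part in parts):
        return False
    body = "".join(parts)
    payload, check = body[:15], body[15]
    if any(ch not in HEX_DIGITS for ch in payload):
        return False
    return mod_11_2_check_char(payload) == check


# Usage: the all-decimal payload reuses the digits of ORCID's sample iD.
print(format_ppid("POID", "000000021825009"))  # -> ppid:POID-0000-0002-1825-0097
print(is_valid_ppid(new_ppid("PRID")))         # -> True
```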
---

## 4. Alignment with PiCo Ontology

The **PiCo (Persons in Context)** ontology, maintained by CBG Centrum voor familiegeschiedenis, provides the conceptual foundation for PPID:

| PiCo Concept | PPID Implementation |
|--------------|---------------------|
| `picom:PersonObservation` | POID - identifier for raw source observation |
| `picom:PersonReconstruction` | PRID - identifier for curated person identity |
| `prov:wasDerivedFrom` | Links PRID to source POIDs |
| `picom:hasName` (via PNV) | Structured name representation |
| `picom:hasRole` | Person's role at heritage institution |

### PiCo's Key Innovation

PiCo explicitly separates:
- **What the source says** (PersonObservation) - "the document states this person was born in 1892"
- **What we conclude** (PersonReconstruction) - "we believe this person was Johannes van der Berg, born c. 1892"

This distinction is critical for:
- Handling conflicting information across sources
- Preserving original source data integrity
- Supporting scholarly genealogical research
- Enabling transparent reasoning about person identities

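
To make the first point concrete: because observations are stored separately, disagreements between sources can be listed rather than overwritten. The sketch below is a hypothetical illustration only. The dictionary layout, the `conflicting_claims` helper, the source labels, and the sample values (including the 1893 birth year and the role) are invented for the example and are not part of PiCo or PPID.

```python
from collections import defaultdict


def conflicting_claims(observations):
    """Return {attribute: {value: [sources]}} for attributes whose sources disagree."""
    seen = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        for attribute, value in obs["claims"].items():
            seen[attribute][value].append(obs["source"])
    return {
        attribute: dict(values)
        for attribute, values in seen.items()
        if len(values) > 1
    }


# Hypothetical observations; each keeps exactly what its source states.
observations = [
    {"source": "source-a", "claims": {"birth_year": "1892", "role": "archivist"}},
    {"source": "source-b", "claims": {"birth_year": "1893", "role": "archivist"}},
]

print(conflicting_claims(observations))
# -> {'birth_year': {'1892': ['source-a'], '1893': ['source-b']}}
```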
---

## 5. Scope and Boundaries

### In Scope

- Persons associated with heritage custodian institutions (GLAM sector)
- Historical persons appearing in heritage collections
- Genealogical subjects in archival records
- Staff, researchers, donors, and stakeholders of heritage organizations

### Out of Scope (Initially)

- General public registration (unlike ORCID's self-service model)
- Living persons without a heritage-sector connection
- Fictional characters
- Legal entities (covered by GHCID for institutions)

---

## 6. Success Criteria

| Metric | Target |
|--------|--------|
| **Interoperability** | Bidirectional links to ORCID, ISNI, VIAF, Wikidata |
| **Cultural coverage** | Documented support for 50+ naming conventions |
| **Provenance completeness** | 100% of claims have source attribution |
| **Resolution accuracy** | <5% false positive rate in entity matching |
| **Adoption** | Integration with 3+ major genealogical platforms |

---

## 7. Document Roadmap

| Document | Purpose |
|----------|---------|
| [02_sota_identifier_systems.md](./02_sota_identifier_systems.md) | Analysis of ORCID, ISNI, VIAF |
| [03_pico_ontology_analysis.md](./03_pico_ontology_analysis.md) | Deep dive into PiCo model |
| [04_cultural_naming_conventions.md](./04_cultural_naming_conventions.md) | Global naming pattern challenges |
| [05_identifier_structure_design.md](./05_identifier_structure_design.md) | Format, checksum, namespaces |
| [06_entity_resolution_patterns.md](./06_entity_resolution_patterns.md) | Handling partial/uncertain data |
| [07_claims_and_provenance.md](./07_claims_and_provenance.md) | Web claims, provenance statements |
| [08_implementation_guidelines.md](./08_implementation_guidelines.md) | Technical specifications |
| [09_governance_and_sustainability.md](./09_governance_and_sustainability.md) | Long-term management |

---

## 8. Key Stakeholders

| Stakeholder | Interest |
|-------------|----------|
| **Heritage institutions** | Staff identification, donor tracking, researcher attribution |
| **Genealogical organizations** | Standardized person identification across sources |
| **Archival services** | Authority control for persons in collections |
| **Research libraries** | Integration with existing authority files |
| **Digital humanities** | Linked data for biographical research |

---

## 9. Timeline

| Phase | Timeframe | Deliverables |
|-------|-----------|--------------|
| **Research** | Q1 2025 | This planning document series |
| **Design** | Q2 2025 | Identifier specification, ontology alignment |
| **Prototype** | Q3 2025 | Reference implementation, test dataset |
| **Pilot** | Q4 2025 | Integration with 2-3 heritage partners |
| **Launch** | Q1 2026 | Public API, documentation, governance structure |

---

## 10. References

- **ORCID**: https://orcid.org/ - Model for researcher identification
- **ISNI**: https://isni.org/ - ISO 27729 standard for public identities
- **VIAF**: https://viaf.org/ - Virtual International Authority File
- **PiCo Ontology**: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
- **PNV**: Person Name Vocabulary for Dutch historical names
- **W3C Personal Names**: https://www.w3.org/International/questions/qa-personal-names
- **GHCID**: Our heritage custodian identifier system (parallel design)