State-of-the-Art Identifier Systems Analysis
Version: 0.1.0
Last Updated: 2025-01-09
Related: Executive Summary | Identifier Structure Design
1. Overview
This document analyzes three major person identifier systems to inform the design of PPID:
- ORCID - Open Researcher and Contributor ID
- ISNI - International Standard Name Identifier
- VIAF - Virtual International Authority File
Each system has a distinct design philosophy, governance model, and technical implementation, and each offers lessons for PPID.
2. ORCID (Open Researcher and Contributor ID)
2.1 Background
- Founded: 2010 (launched 2012)
- Governance: Non-profit organization
- Purpose: Uniquely identify researchers and their scholarly contributions
- Website: https://orcid.org/
2.2 Identifier Structure
Format: xxxx-xxxx-xxxx-xxxx
Example: 0000-0002-1825-0097
Components:
- 16 characters total (15 digits + 1 check character)
- Grouped in 4 blocks of 4 characters
- Hyphen-separated for readability
- Last character: check digit (0-9 or X)
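The grouping rules above can be sketched as a small normalizer. The function name `format_orcid` is hypothetical, and the sketch checks only the shape of the identifier, not its check character.

```python
import re

# Shape only: 15 digits followed by a digit or X. The checksum is not verified here.
_ORCID_SHAPE = re.compile(r"[0-9]{15}[0-9X]")


def format_orcid(raw: str) -> str:
    """Return the identifier in canonical xxxx-xxxx-xxxx-xxxx form."""
    compact = raw.strip().upper().replace("-", "").replace(" ", "")
    if _ORCID_SHAPE.fullmatch(compact) is None:
        raise ValueError(f"not a 16-character ORCID: {raw!r}")
    # Split the 16 characters into 4 blocks of 4, joined by hyphens.
    return "-".join(compact[i:i + 4] for i in range(0, 16, 4))


print(format_orcid("0000000218250097"))  # 0000-0002-1825-0097
```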
2.3 Technical Specifications
| Aspect | Specification |
| --- | --- |
| Length | 16 characters (excluding hyphens) |
| Character set | Digits 0-9, plus X for check digit |
| Checksum | ISO/IEC 7064:2003, MOD 11-2 |
| Namespace | https://orcid.org/ |
| URI format | https://orcid.org/0000-0002-1825-0097 |
2.4 Checksum Algorithm (MOD 11-2)
def calculate_orcid_checksum(digits: str) -> str:
    """
    Calculate ORCID check digit using ISO/IEC 7064 MOD 11-2.

    Args:
        digits: 15-digit string (without check digit)

    Returns:
        Check digit (0-9 or X)
    """
    total = 0
    for digit in digits:
        # Add the digit to the running total, then double the result
        total = (total + int(digit)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return 'X' if result == 10 else str(result)


def validate_orcid(orcid: str) -> bool:
    """
    Validate complete ORCID identifier.

    Args:
        orcid: 16-character ORCID (with or without hyphens or URL prefix)

    Returns:
        True if valid, False otherwise
    """
    # Remove URL prefix and hyphens
    clean = orcid.strip().replace('https://orcid.org/', '').replace('-', '')
    if len(clean) != 16:
        return False
    digits = clean[:15]
    check_digit = clean[15]
    # Reject non-digit input instead of letting int() raise ValueError
    if not (digits.isascii() and digits.isdigit()):
        return False
    return calculate_orcid_checksum(digits) == check_digit.upper()
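As a worked illustration of the arithmetic, the sketch below records the running total after each digit for the example identifier from section 2.2. The helper name `mod_11_2_trace` is hypothetical and exists only for this walkthrough.

```python
def mod_11_2_trace(digits: str) -> tuple[list[int], str]:
    """Return the running total after each digit, plus the check character."""
    totals = []
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
        totals.append(total)
    result = (12 - total % 11) % 11
    return totals, "X" if result == 10 else str(result)


# First 15 digits of 0000-0002-1825-0097
totals, check = mod_11_2_trace("000000021825009")
print(totals[-1], check)  # 1314 7
```

The final total 1314 leaves remainder 5 modulo 11, and (12 - 5) mod 11 = 7, which matches the last character of the example.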
2.5 Key Design Decisions
| Decision | Rationale | Lesson for PPID |
| --- | --- | --- |
| Opaque identifiers | No personal info encoded - prevents discrimination, ensures persistence | Adopt: Privacy-first design |
| Random assignment | Prevents inference of registration date or status | Adopt: Avoid sequential IDs |
| Self-registration | Researchers control their own record | Adapt: Heritage sector may need institutional registration |
| Single ID per person | One identifier for life | Adopt: Career-long persistence |
| ISNI compatible | 16-digit format matches ISO 27729 | Adopt: Interoperability with ISNI |
2.6 Strengths
- Wide adoption: 18+ million registrations
- Self-service: Researchers manage own profiles
- API-first: Robust REST API with OAuth
- Open data: CC0 public data file available
- Integration: Works with publishers, funders, institutions
2.7 Limitations for Heritage Domain
| Limitation | Impact on Heritage Use |
| --- | --- |
| Living persons only | Cannot identify historical figures |
| Self-registration model | Deceased persons cannot register |
| Research focus | Not designed for archivists, curators, donors |
| Notability bias | Assumes published output |
| English-centric metadata | Limited support for historical name forms |
3. ISNI (International Standard Name Identifier)
3.1 Background
- Standard: ISO 27729:2012
- Governance: ISNI International Agency (ISNI-IA)
- Purpose: Identify public identities of contributors to creative works
- Website: https://isni.org/
3.2 Identifier Structure
Format: xxxx xxxx xxxx xxxx
Example: 0000 0001 2103 2683
Components:
- 16 characters total (15 digits + 1 check character)
- Typically displayed with spaces
- Last character: check digit (0-9 or X)
- Same format as ORCID (by design)
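Because the two formats share one structure, a single parser can serve both. The sketch below is a minimal illustration, assuming MOD 11-2 applies to ISNI as it does to ORCID; the function names `parse_isni`, `isni_display`, and `isni_uri` are hypothetical, and the URI base is taken from the linking example in section 6.2.

```python
def _mod_11_2(digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for a 15-digit string."""
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def parse_isni(raw: str) -> str:
    """Return the compact 16-character form, or raise ValueError."""
    compact = "".join(raw.split()).replace("-", "").upper()
    body, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (body.isascii() and body.isdigit()):
        raise ValueError(f"not a 16-character identifier: {raw!r}")
    if _mod_11_2(body) != check:
        raise ValueError(f"check character mismatch: {raw!r}")
    return compact


def isni_display(raw: str) -> str:
    """Space-separated display form, e.g. 0000 0001 2103 2683."""
    compact = parse_isni(raw)
    return " ".join(compact[i:i + 4] for i in range(0, 16, 4))


def isni_uri(raw: str) -> str:
    """URI form using the compact identifier."""
    return "https://isni.org/isni/" + parse_isni(raw)


print(isni_display("0000000121032683"))  # 0000 0001 2103 2683
```

The same parser accepts the hyphenated ORCID example, which reflects the shared format.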
3.3 Registration Agencies
ISNI uses a federated model with multiple registration agencies:
| Agency | Domain |
| --- | --- |
| OCLC | Libraries, publishers |
| BnF (France) | French cultural heritage |
| ORCID | Researchers (assigns identifiers from a block of the ISNI range reserved for it) |
| Ringgold | Organizations |
| Bowker | Publishers, authors |
3.4 Key Differences from ORCID
| Aspect | ORCID | ISNI |
| --- | --- | --- |
| Scope | Researchers only | All public identities |
| Registration | Self-service | Agency-mediated |
| Cost | Free | Fee-based (agencies charge) |
| Historical persons | No | Yes |
| Data control | Individual | Agency |
3.5 Strengths
- Broader scope: Covers authors, performers, artists, historical figures
- Quality control: Curated by registration agencies
- Linked data: Published as RDF with owl:sameAs links
- Disambiguation: Explicit clustering of variant names
3.6 Limitations for Heritage Domain
| Limitation | Impact |
| --- | --- |
| Cost | Registration fees may limit adoption |
| Slow assignment | Weeks/months to receive ISNI |
| Agency dependency | Must work through intermediary |
| Limited coverage | Heritage staff rarely have ISNIs |
| Metadata constraints | Fixed schema may not fit genealogical data |
4. VIAF (Virtual International Authority File)
4.1 Background
- Founded: 2003 (OCLC-hosted since 2012)
- Governance: OCLC with contributing libraries
- Purpose: Link national library authority files
- Website: https://viaf.org/
4.2 Architecture
              ┌─────────────────────────────────┐
              │          VIAF Cluster           │
              │     viaf.org/viaf/102333412     │
              └─────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Library of      │   │ Deutsche        │   │ Bibliothèque    │
│ Congress        │   │ Nationalbiblio  │   │ nationale de    │
│ n79021164       │   │ thek 118529579  │   │ France 11908666 │
└─────────────────┘   └─────────────────┘   └─────────────────┘
         │                     │                     │
         ▼                     ▼                     ▼
   "Twain, Mark"         "Twain, Mark"         "Twain, Mark"
 "Clemens, Samuel"     "Clemens, Samuel       "Clemens, Samuel
                        Langhorne"             Langhorne"
4.3 Key Concepts
| Concept | Description |
| --- | --- |
| Cluster | A VIAF record grouping authority records from multiple sources |
| Contributor | A library or agency providing authority data |
| Link | owl:sameAs relationship between contributor records |
| Heading | The authorized form of name from a contributor |
4.4 Identifier Format
Format: Numeric ID (variable length)
Example: 102333412
URI: https://viaf.org/viaf/102333412
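Since the identifier is a plain variable-length number, converting between the ID and its URI is a matter of string handling. A minimal sketch, with hypothetical helper names `viaf_uri` and `viaf_id_from_uri`:

```python
import re

# Accepts http or https, with an optional trailing slash.
_VIAF_URI = re.compile(r"https?://viaf\.org/viaf/([0-9]+)/?")


def viaf_uri(viaf_id: str) -> str:
    """Build the URI for a numeric VIAF ID."""
    if not (viaf_id.isascii() and viaf_id.isdigit()):
        raise ValueError(f"VIAF IDs are numeric: {viaf_id!r}")
    return f"https://viaf.org/viaf/{viaf_id}"


def viaf_id_from_uri(uri: str) -> str:
    """Extract the numeric ID from a VIAF URI."""
    match = _VIAF_URI.fullmatch(uri.strip())
    if match is None:
        raise ValueError(f"not a VIAF URI: {uri!r}")
    return match.group(1)


print(viaf_id_from_uri("https://viaf.org/viaf/102333412"))  # 102333412
```

Note that, unlike ORCID and ISNI, there is no check character to verify, so a mistyped digit cannot be detected locally.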
4.5 Matching Algorithm
VIAF clusters records by combining several matching signals:
- Name normalization: Standardize name forms
- Date matching: Birth/death dates when available
- Work matching: Shared bibliographic works
- Manual review: Disputed clusters resolved by humans
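To make the first two signals concrete, here is a toy sketch of name normalization and date matching. It is not VIAF's actual algorithm: the `AuthorityRecord` type, the normalization rules, and the greedy grouping are all assumptions chosen for illustration, and work matching and manual review are left out.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorityRecord:
    source: str                       # contributing library (hypothetical label)
    heading: str                      # authorized form of name
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def normalize_name(heading: str) -> str:
    """Strip accents and punctuation, lowercase, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks.lower())
    return " ".join(sorted(cleaned.split()))


def years_compatible(a: Optional[int], b: Optional[int]) -> bool:
    """Missing dates never block a match; known dates must agree."""
    return a is None or b is None or a == b


def is_candidate_match(a: AuthorityRecord, b: AuthorityRecord) -> bool:
    return (
        normalize_name(a.heading) == normalize_name(b.heading)
        and years_compatible(a.birth_year, b.birth_year)
        and years_compatible(a.death_year, b.death_year)
    )


def cluster_records(records: list[AuthorityRecord]) -> list[list[AuthorityRecord]]:
    """Greedy grouping: a record joins the first cluster it matches entirely."""
    clusters: list[list[AuthorityRecord]] = []
    for record in records:
        for group in clusters:
            if all(is_candidate_match(record, other) for other in group):
                group.append(record)
                break
        else:
            clusters.append([record])
    return clusters
```

Sorting the tokens makes "Twain, Mark" and "Mark Twain" compare equal, at the cost of also merging names that differ only in word order.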
4.6 Strengths
- Comprehensive: 40+ national libraries contributing
- Algorithmic matching: Automatic clustering of variant names
- Linked data: RDF with rich relationships
- Free access: Open data, no registration fees
- Historical coverage: Excellent for historical figures
4.7 Limitations for Heritage Domain
| Limitation | Impact |
| --- | --- |
| Library focus | Primarily bibliographic authority control |
| Passive creation | Cannot request VIAF for new person |
| Work-centric | Expects persons to have authored works |
| No provenance model | Limited tracking of source assertions |
| Cluster instability | Records can be split/merged over time |
5. Comparative Analysis
5.1 Feature Matrix
| Feature | ORCID | ISNI | VIAF | PPID (Proposed) |
| --- | --- | --- | --- | --- |
| Format | 16-digit | 16-digit | Numeric | 16-digit |
| Checksum | MOD 11-2 | MOD 11-2 | None | MOD 11-2 |
| Living persons | Yes | Yes | Yes | Yes |
| Historical persons | No | Yes | Yes | Yes |
| Self-registration | Yes | No | No | Hybrid |
| Free registration | Yes | No | N/A | Yes |
| Observation/reconstruction | No | No | Partial | Yes |
| Provenance tracking | Limited | Limited | Limited | Full |
| Cultural name support | Limited | Limited | Good | Comprehensive |
| Heritage sector focus | No | No | Partial | Yes |
5.2 Identifier Assignment Models
┌─────────────────────────────────────────────────────────────────┐
│                      SELF-SERVICE (ORCID)                       │
│                                                                 │
│  Person → Registers → Gets ID immediately → Manages own record  │
│                                                                 │
│  Pros: Fast, empowering, scalable                               │
│  Cons: Quality control, no historical persons, spam risk        │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                         MEDIATED (ISNI)                         │
│                                                                 │
│  Institution → Submits → Agency reviews → ID assigned           │
│                                                                 │
│  Pros: Quality control, historical persons, authority           │
│  Cons: Slow, costly, dependency on agencies                     │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                       ALGORITHMIC (VIAF)                        │
│                                                                 │
│  Library catalogs → Matching algorithm → Cluster created        │
│                                                                 │
│  Pros: Automatic, comprehensive, existing data                  │
│  Cons: No new persons, cluster instability, opaque              │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                     HYBRID (PPID PROPOSED)                      │
│                                                                 │
│  Source observation (POID) → Created automatically              │
│  Person reconstruction (PRID) → Curated with provenance         │
│                                                                 │
│  Pros: Best of all models, full provenance, heritage focus      │
│  Cons: Complexity, requires clear governance                    │
└─────────────────────────────────────────────────────────────────┘
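The two levels of the hybrid model could be represented roughly as follows. This is a sketch under stated assumptions: the class names, fields, and the `POID-0001` style placeholder identifiers are invented for illustration and are not the PPID identifier format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One mention of a person in one source, recorded as found (POID level)."""
    poid: str
    source_ref: str
    name_as_written: str


@dataclass
class Reconstruction:
    """A curated person record assembled from observations (PRID level)."""
    prid: str
    curated_by: str
    observation_ids: list[str] = field(default_factory=list)

    def attach(self, observation: Observation) -> None:
        # Keep the provenance trail: record which observations support this person.
        if observation.poid not in self.observation_ids:
            self.observation_ids.append(observation.poid)


def observe(sequence: int, source_ref: str, name: str) -> Observation:
    """Create an observation automatically; the ID scheme is a placeholder."""
    return Observation(f"POID-{sequence:04d}", source_ref, name)


first = observe(1, "register A, entry 12", "Clemens, Samuel Langhorne")
second = observe(2, "catalog B, record 7", "Twain, Mark")
person = Reconstruction(prid="PRID-0001", curated_by="curator@example.org")
person.attach(first)
person.attach(second)
print(person.observation_ids)  # ['POID-0001', 'POID-0002']
```

Observations are immutable because they record what a source says; only the reconstruction changes as curators revise which observations belong to the same person.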
6. Interoperability Considerations
6.1 Linking Between Systems
All three systems support linking:
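@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix schema: <https://schema.org/> .
@prefix wdt:    <http://www.wikidata.org/prop/direct/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# VIAF links to external identifiers
<http://viaf.org/viaf/102333412>
    owl:sameAs <http://id.loc.gov/authorities/names/n79021164> ;
    owl:sameAs <http://d-nb.info/gnd/118529579> ;
    schema:sameAs <http://www.wikidata.org/entity/Q7245> .

# ORCID links via Wikidata (property P496 holds the ORCID iD)
<http://www.wikidata.org/entity/Q7245>
    wdt:P496 "0000-0002-1825-0097"^^xsd:string .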
6.2 PPID Interoperability Design
PPID should support bidirectional linking:
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# PPID links to external systems
<ppid:PRID-1234-5678-90ab-cdef>
    owl:sameAs <https://orcid.org/0000-0002-1825-0097> ;
    owl:sameAs <https://isni.org/isni/0000000121032683> ;
    owl:sameAs <http://viaf.org/viaf/102333412> ;
    owl:sameAs <http://www.wikidata.org/entity/Q7245> ;
    skos:exactMatch <http://id.loc.gov/authorities/names/n79021164> .
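One way to get links in both directions is to generate the outbound statements and a reverse lookup from the same data. The sketch below assumes a hypothetical `URI_TEMPLATES` table and helper names; it emits only `owl:sameAs` statements and omits prefix declarations.

```python
# Hypothetical mapping from scheme name to URI template.
URI_TEMPLATES = {
    "orcid": "https://orcid.org/{}",
    "isni": "https://isni.org/isni/{}",
    "viaf": "http://viaf.org/viaf/{}",
    "wikidata": "http://www.wikidata.org/entity/{}",
}


def external_uris(ids: dict[str, str]) -> list[str]:
    """Expand {scheme: local id} into full URIs, in a stable order."""
    return [URI_TEMPLATES[scheme].format(value) for scheme, value in sorted(ids.items())]


def to_turtle(subject: str, ids: dict[str, str]) -> str:
    """Outbound direction: serialize owl:sameAs links for one PPID."""
    uris = external_uris(ids)
    if not uris:
        raise ValueError("at least one external identifier is required")
    lines = [f"<{subject}>"]
    for position, uri in enumerate(uris):
        terminator = " ." if position == len(uris) - 1 else " ;"
        lines.append(f"    owl:sameAs <{uri}>{terminator}")
    return "\n".join(lines)


def reverse_index(records: dict[str, dict[str, str]]) -> dict[str, str]:
    """Inbound direction: look up the PPID from any external URI."""
    index: dict[str, str] = {}
    for subject, ids in records.items():
        for uri in external_uris(ids):
            index[uri] = subject
    return index


records = {
    "ppid:PRID-1234-5678-90ab-cdef": {"orcid": "0000-0002-1825-0097", "viaf": "102333412"},
}
print(to_turtle("ppid:PRID-1234-5678-90ab-cdef", records["ppid:PRID-1234-5678-90ab-cdef"]))
```

An unknown scheme raises `KeyError`, which keeps typos in scheme names from silently producing no link.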
7. Lessons for PPID Design
7.1 What to Adopt
| From | Lesson | Implementation |
| --- | --- | --- |
| ORCID | Opaque 16-digit format | Use same structure for recognizability |
| ORCID | MOD 11-2 checksum | Implement for validation |
| ORCID | URI-based identifiers | https://ppid.org/xxxx-xxxx-xxxx-xxxx |
| ISNI | Historical person support | No restriction to living persons |
| VIAF | Algorithmic matching | Support automatic clustering |
| VIAF | Multiple name forms | Store all variant names |
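Taken together, the first three rows suggest how a new identifier might be minted. This is a sketch, not the PPID specification: the function names are hypothetical, the draw is uniformly random over 15 digits, and the collision check against already issued identifiers that a real registry would need is omitted.

```python
import secrets


def mod_11_2(digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for a 15-digit string."""
    total = 0
    for ch in digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def mint_ppid() -> str:
    """Draw 15 random digits, append the check character, and group in fours."""
    body = "".join(str(secrets.randbelow(10)) for _ in range(15))
    compact = body + mod_11_2(body)
    return "-".join(compact[i:i + 4] for i in range(0, 16, 4))


def ppid_uri(ppid: str) -> str:
    """URI form, using the base shown in the table above."""
    return "https://ppid.org/" + ppid


print(ppid_uri(mint_ppid()))
```

Using `secrets` rather than a counter keeps identifiers opaque, so nothing about registration order can be inferred from the value.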
7.2 What to Avoid
| System | Pitfall | PPID Approach |
| --- | --- | --- |
| ORCID | Self-registration only | Hybrid: institutional + algorithmic |
| ISNI | Costly registration | Free for heritage sector |
| VIAF | Passive creation only | Active creation supported |
| All | No observation/reconstruction distinction | PiCo-based two-level model |
| All | Limited provenance | Full claim tracking |
7.3 Novel PPID Features
Features not found in existing systems:
- Observation-level identifiers (POID): Track raw source data separately
- Reconstruction-level identifiers (PRID): Curated person records with provenance
- Claim-based assertions: Every fact traceable to source
- Confidence scoring: Quantified certainty for assertions
- Heritage sector focus: Designed for archivists, curators, donors
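Claim-based assertions with confidence scores might look like the following. The `Claim` fields, the 0.0 to 1.0 confidence scale, and the rule of preferring the highest-scoring claim are assumptions made for this sketch; the identifiers and values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    prid: str          # reconstruction the claim is about
    prop: str          # asserted property, e.g. "birth_year"
    value: str
    source_poid: str   # observation the claim is traced to
    confidence: float  # assumed scale: 0.0 (no support) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def best_claim(claims: list[Claim], prid: str, prop: str) -> Optional[Claim]:
    """Return the highest-confidence claim for one property of one person."""
    candidates = [c for c in claims if c.prid == prid and c.prop == prop]
    return max(candidates, key=lambda c: c.confidence, default=None)


claims = [
    Claim("PRID-0001", "birth_year", "1835", "POID-0001", 0.9),
    Claim("PRID-0001", "birth_year", "1836", "POID-0002", 0.4),
]
chosen = best_claim(claims, "PRID-0001", "birth_year")
print(chosen.value, chosen.source_poid)  # 1835 POID-0001
```

Conflicting claims are kept rather than overwritten, so a later source can change which value is preferred without losing the history.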
8. References
Standards
- ISO 27729:2012 - International Standard Name Identifier (ISNI)
- ISO/IEC 7064:2003 - Check character systems
Technical Documentation
Research
- Haak, L.L., et al. (2012). "ORCID: A system to uniquely identify researchers." Learned Publishing, 25(4), 259-264.
- Hickey, T.B., & Toves, J.A. (2014). "VIAF: Linking the World's Library Data." Cataloging & Classification Quarterly, 52(2), 155-166.