glam/docs/PERSISTENT_IDENTIFIERS.md
2025-12-07 00:26:01 +01:00

Persistent Identifiers for Heritage Institutions

Overview

The GLAM Data Extraction project uses multiple identifier formats optimized for different purposes:

Persistent Identifiers (Deterministic)

These can be regenerated from the GHCID string and are stable across systems:

| Format | Bits | Algorithm | Use Case | Status |
|---|---|---|---|---|
| UUID v5 | 128 | SHA-1 | PRIMARY - Europeana, DPLA, IIIF, Wikidata | RFC 4122 Standard |
| UUID SHA-256 | 128 | SHA-256 | SOTA - Security compliance, future-proofing | RFC 9562 (UUID v8) |
| Numeric | 64 | SHA-256 | CSV exports, numeric analysis | Internal |
| Human-readable | Variable | ISO format | Citations, documentation | ISO-based |

Database Record Identifiers (Non-Deterministic)

These are generated once per record and optimize database performance:

| Format | Bits | Algorithm | Use Case | Status |
|---|---|---|---|---|
| UUID v7 | 128 | Timestamp + Random | Database PKs, time-ordered queries | RFC 9562 Standard |
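
The "Timestamp + Random" layout can be sketched by hand as follows. This is a minimal illustration of the RFC 9562 v7 field layout, not the project's generator: the function name `uuid7_sketch` is hypothetical, and production code would normally rely on a maintained library.

```python
import os
import time
import uuid


def uuid7_sketch(ts_ms=None) -> uuid.UUID:
    """Build a UUID v7-style value: 48-bit millisecond timestamp, then random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000  # Unix time in milliseconds
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits = 0b10
    return uuid.UUID(bytes=bytes(raw))


record_id = uuid7_sketch()
print(record_id, record_id.version)
```

Because the timestamp occupies the most significant bits, values generated in later milliseconds sort after earlier ones, which is what makes this format friendly to B-tree primary keys. Unlike the deterministic formats above, it cannot be regenerated from the GHCID string.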

Why Four Formats?

1. UUID v5 (SHA-1) - Interoperability Standard PRIMARY

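Format: 550e8400-e29b-41d4-a716-446655440000 (placeholder value; a real UUID v5 carries 5 as the first hex digit of the third group)
Version: 5 (name-based, SHA-1)
Standard: RFC 4122 (2005), superseded by RFC 9562 (2024) with the v5 definition unchanged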

Strengths:

  • RFC 4122 compliant - Universal library support
  • Deterministic - Same GHCID → Same UUID always (content-addressed)
  • Transparent - Publicly documented algorithm, anyone can verify
  • Interoperable - Works with Europeana, DPLA, IIIF, Wikidata
  • 128-bit identifier space - P(collision) ≈ 1.5×10^-27 for 1M institutions

⚠️ SHA-1 Nuance:

  • Uses SHA-1 internally (RFC 4122 specification)
  • SHA-1 deprecated for cryptographic security (digital signatures, TLS, passwords)
  • SHA-1 remains appropriate for identifier generation (non-adversarial inputs, so only accidental collisions matter, and those stay negligible)
  • See Why GHCID Uses UUID v5 and SHA-1 for detailed rationale

Why SHA-1 is Safe for GHCID:

Cryptographic Use (Vulnerable):
  - Adversarial context (attacker forges signatures)
  - Two-message collision attack
  - Security-critical (financial, authentication)

Identifier Use (Safe):
  - Non-adversarial context (no one forges museum IDs)
  - Single-source generation (we control inputs)
  - Uniqueness requirement (birthday paradox protection sufficient)

Use When:

  • Primary identifier for all GHCID records
  • Integrating with existing UUID v5 systems
  • Exporting to Europeana, DPLA, IIIF
  • Storing in Wikidata as external identifier
  • RFC 4122 strict compliance required
  • Maximum transparency required (anyone can verify)
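
A minimal sketch of deterministic UUID v5 generation with Python's standard `uuid` module. The namespace below is an assumption made for illustration: the source does not state which namespace UUID the project uses, and the name `ghcid_to_uuid_v5` is hypothetical.

```python
import uuid

# Hypothetical namespace, derived here from a placeholder domain.
# The real project namespace is not given in this document.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid_v5(ghcid: str) -> uuid.UUID:
    """Name-based (SHA-1) UUID: the same GHCID string always yields the same UUID."""
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


first = ghcid_to_uuid_v5("NL-NH-AMS-M-RM")
second = ghcid_to_uuid_v5("NL-NH-AMS-M-RM")
print(first == second, first.version)
```

Anyone who knows the namespace and the GHCID string can recompute and verify the UUID, which is the transparency property listed above. Changing the namespace changes every UUID, so it has to be fixed before the first identifiers are published.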

2. UUID SHA-256 (Custom) - SOTA Cryptographic Strength

Format: a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d
Version: 8 (custom/experimental)
Algorithm: SHA-256 (truncated to 128 bits)

Strengths:

  • SHA-256 - NIST-approved, SOTA cryptographic hash (2024)
  • Superior collision resistance vs SHA-1
  • Future-proof - No known practical attacks
  • UUID-compatible - Valid UUID format, works with UUID parsers

⚠️ Nuances:

  • Not RFC 4122 standard - Custom implementation
  • UUID v8 is "experimental/vendor-specific" designation
  • May not be recognized by strict UUID v5-only systems

Use When:

  • Security policy mandates SHA-256
  • Maximum collision resistance required
  • Future-proofing against SHA-1 deprecation
  • Custom identifier resolution service

Algorithm:

  1. Hash GHCID string with SHA-256 → 256 bits
  2. Truncate to first 128 bits (16 bytes)
  3. Set version bits to 8 (custom)
  4. Set variant bits to RFC 4122 (0b10xxxxxx)
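
The four steps above can be sketched with the standard library. The function name `ghcid_to_uuid_sha256` is hypothetical, and hashing the UTF-8 encoding of the bare GHCID string (with no namespace prefix) is an assumption read from step 1.

```python
import hashlib
import uuid


def ghcid_to_uuid_sha256(ghcid: str) -> uuid.UUID:
    """Truncated SHA-256 packed into a UUID v8 bit layout."""
    digest = hashlib.sha256(ghcid.encode("utf-8")).digest()  # step 1: 256 bits
    raw = bytearray(digest[:16])                             # step 2: first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x80                          # step 3: version nibble = 8
    raw[8] = (raw[8] & 0x3F) | 0x80                          # step 4: variant bits = 0b10
    return uuid.UUID(bytes=bytes(raw))


sample = ghcid_to_uuid_sha256("US-CA-SAN-A-IA")
print(sample, sample.version)
```

Steps 3 and 4 overwrite 6 of the 128 bits, so 122 bits of the digest survive in the final identifier.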

3. Numeric (64-bit) - Database Optimization

Format: 213324328442227739
Algorithm: SHA-256 → first 8 bytes → uint64
Range: 0 to 18,446,744,073,709,551,615
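
A minimal sketch of the numeric derivation. The byte order (big-endian) and the function names are assumptions, since the source only says "first 8 bytes → uint64". The two helper functions show one way to store the value in a signed 64-bit column without losing information.

```python
import hashlib


def ghcid_to_numeric(ghcid: str) -> int:
    """First 8 bytes of SHA-256, read as an unsigned 64-bit integer."""
    digest = hashlib.sha256(ghcid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big")  # byte order is an assumption


def to_signed_bigint(value: int) -> int:
    """Reinterpret a uint64 as a two's-complement int64 for signed columns."""
    return value - 2**64 if value >= 2**63 else value


def from_signed_bigint(value: int) -> int:
    """Inverse of to_signed_bigint."""
    return value + 2**64 if value < 0 else value


numeric = ghcid_to_numeric("US-CA-SAN-A-IA")
print(numeric, to_signed_bigint(numeric))
```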

Strengths:

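  • Compact - Fits in 8 bytes (SQL BIGINT); where BIGINT is signed (maximum 2^63 - 1), values in the upper half of the uint64 range need an unsigned column type or a signed reinterpretation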
  • Fast indexing - Integer comparisons faster than UUID
  • CSV-friendly - No special characters
  • Deterministic - Same GHCID → Same number

⚠️ Nuances:

  • 64-bit truncation reduces collision resistance vs full 256-bit
  • P(collision) ≈ 2.7×10^-8 for 1M institutions (0.0000027%)
  • Still negligible for heritage domain (<10M institutions expected)

Use When:

  • Database primary key optimization
  • CSV exports for spreadsheet analysis
  • Numeric sorting required
  • Systems without UUID support

4. Human-Readable (ISO-based) - Citations & References

Format: US-CA-SAN-A-IA
Components: {Country}-{Region}-{City}-{Type}-{Abbreviation}
Example: NL-NH-AMS-M-RM (Rijksmuseum Amsterdam)

Strengths:

  • Human-readable - Understandable without lookup
  • Geographic context - Location embedded in ID
  • Type indicator - Institution type visible
  • Citable - Use in academic papers, documentation

⚠️ Nuances:

  • Not persistent if institution relocates or changes name
  • Use ghcid_original field (frozen) for true persistence
  • ghcid field (current) may change over time
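
The split between a frozen `ghcid_original` and a changeable `ghcid` could be modelled along these lines. The class and method names are hypothetical and do not reflect the project's actual schema; the relocation in the usage lines is invented purely to exercise the code.

```python
class InstitutionIdentifier:
    """Pairs the frozen original GHCID with the current, changeable one."""

    def __init__(self, ghcid: str) -> None:
        self._ghcid_original = ghcid  # frozen at first publication
        self.ghcid = ghcid            # current value, may change over time
        self.previous_ghcids = []     # earlier current values, oldest first

    @property
    def ghcid_original(self) -> str:
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._ghcid_original

    def update_ghcid(self, new_ghcid: str) -> None:
        """Record a relocation or rename without touching the original."""
        if new_ghcid != self.ghcid:
            self.previous_ghcids.append(self.ghcid)
            self.ghcid = new_ghcid


record = InstitutionIdentifier("NL-NH-AMS-M-RM")
record.update_ghcid("NL-ZH-RTD-M-RM")  # invented relocation, for illustration only
print(record.ghcid_original, record.ghcid, record.previous_ghcids)
```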

Use When:

  • Academic citations
  • Documentation and reports
  • Human-readable data exchange
  • Debugging and logging

Collision Resistance Comparison

Mathematical Analysis

# Collision probability (birthday approximation):
# P(collision) ≈ n² / (2 × 2^bits)

# For 1,000,000 institutions:

# UUID v5 / UUID SHA-256 (128-bit):
P = (10^6)² / (2 × 2^128) ≈ 1.5 × 10^-27
# Effectively zero. With the 6 fixed version/variant bits excluded (122 bits), P ≈ 9.4 × 10^-26, still negligible

# Numeric (64-bit):
P = (10^6)² / (2 × 2^64) ≈ 2.7 × 10^-8  (0.0000027%)
# Negligible for heritage domain

# Even at 10 million institutions:
P_64bit = (10^7)² / (2 × 2^64) ≈ 2.7 × 10^-6  (0.00027%)
# Still acceptable
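
The approximation can be evaluated directly to check any row of this analysis. The sketch below uses hypothetical function names and also computes the closed form 1 − exp(−n(n−1) / (2 × 2^bits)), from which the n² approximation is derived.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: n² / (2 · 2^bits)."""
    return n * n / (2 * 2**bits)


def collision_probability_closed_form(n: int, bits: int) -> float:
    """1 - exp(-n(n-1) / (2 · 2^bits)), computed stably for tiny values."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))


for n in (10**5, 10**6, 10**7, 10**8):
    for bits in (64, 122, 128):
        p = collision_probability(n, bits)
        print(f"n={n:>11,}  bits={bits:>3}  P ≈ {p:.2e}")
```

The 122-bit rows reflect that a UUID reserves 6 of its 128 bits for the version and variant fields.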

Real-World Context

| Institution Count | UUID v5/SHA-256 | Numeric (64-bit) | Assessment |
|---|---|---|---|
| 100,000 | ~0% | 2.7×10^-10 (0.000000027%) | All safe |
| 1,000,000 | ~0% | 2.7×10^-8 (0.0000027%) | All safe |
| 10,000,000 | ~0% | 2.7×10^-6 (0.00027%) | UUID safe, numeric acceptable |
| 100,000,000 | ~0% | 2.7×10^-4 (0.027%) | ⚠️ Use UUID, numeric risky |

Conclusion: For the heritage domain (expected <10M institutions worldwide), all formats provide sufficient collision resistance.


Historical Collision Resolution

The Rule: Temporal Priority Determines Disambiguation

When creating GHCIDs, collisions can occur in two temporal contexts:

  1. First Batch Creation (initial PID assignment): Multiple institutions discovered simultaneously
  2. Historical Addition (post-publication): New historical institution added after existing GHCID published

Critical Design Decision: The collision resolution strategy differs based on temporal context to preserve PID stability.

Collision Resolution: Native Language Name Suffix

Key Change: Collisions are resolved by appending the full legal name in native language in snake_case format, NOT Wikidata Q-numbers.

Name Suffix Rules:

  • Use the institution's full official name in its native language
  • Convert to snake_case (lowercase, underscores for spaces)
  • Remove apostrophes, accents, commas, and other punctuation/diacritics
  • Transliterate non-Latin scripts to ASCII (e.g., Pinyin for Chinese)

Name Normalization Examples:

"Stedelijk Museum Amsterdam" → "stedelijk_museum_amsterdam"
"Musée d'Orsay" → "musee_dorsay"
"Biblioteca Nacional do Brasil" → "biblioteca_nacional_do_brasil"
"北京故宫博物院" → "beijing_gugong_bowuyuan" (pinyin transliteration)
"Österreichische Nationalbibliothek" → "osterreichische_nationalbibliothek"

First Batch Behavior (Initial PID Creation)

Scenario: During initial GHCID generation, multiple institutions with identical base GHCIDs are discovered together.

Resolution: ALL colliding institutions get name suffixes appended.

Example:

# Discovery: Two museums in Amsterdam both generate NL-NH-AMS-M-SM

# Stedelijk Museum (founded 1874)
ghcid_original: NL-NH-AMS-M-SM-stedelijk_museum_amsterdam

# Science Museum Amsterdam (founded 2010)
ghcid_original: NL-NH-AMS-M-SM-science_museum_amsterdam

Rationale: No existing PIDs to preserve; both institutions are "new" to the system.

Historical Addition Behavior (Post-Publication)

Scenario: After initial GHCID batch is published, a historical institution is added that collides with an existing GHCID.

Resolution: ONLY the newly added historical institution gets a name suffix. The existing PID remains unchanged.

Example:

# Existing GHCID (published 2025-11-01)
ghcid_original: NL-NH-AMS-M-HM  # Hermitage Museum Amsterdam (2009-2023)

# Historical institution added later (2025-11-15)
# Amsterdam Historical Museum (1926-1975)
# Would also generate: NL-NH-AMS-M-HM
# 
# COLLISION DETECTED → Add name suffix to NEW addition ONLY
ghcid_original: NL-NH-AMS-M-HM-amsterdam_historical_museum

Outcome:

  • NL-NH-AMS-M-HM (Hermitage Museum Amsterdam) → UNCHANGED
  • NL-NH-AMS-M-HM-amsterdam_historical_museum (Amsterdam Historical Museum) → Name suffix added

Rationale: Preserve stability of already-published PIDs.

Why This Matters: PID Stability Principle

Problem: Changing existing GHCIDs breaks external references.

PIDs may already be:

  • Cited in academic publications
  • Referenced in datasets and APIs
  • Stored in institutional databases
  • Embedded in IIIF manifests
  • Linked from Wikidata

Principle: "Cool URIs don't change" (Tim Berners-Lee, W3C)

Once a GHCID is published (in first batch or as standalone record), it should NEVER change, even if new historical institutions create collisions.

Decision Table: Who Gets Name Suffix?

| Scenario | When | Existing GHCID | New GHCID | Who Gets Name Suffix | Rationale |
|---|---|---|---|---|---|
| First Batch | Initial PID creation (2025-11-01) | None (first time) | NL-NH-AMS-M-SM (2 institutions) | ALL colliding institutions | No existing PIDs to preserve |
| Historical Addition | Post-publication (2025-11-15) | NL-NH-AMS-M-HM (published) | NL-NH-AMS-M-HM (historical) | ONLY newly added institution | Preserve published PID stability |
| Standalone Addition | New institution (2026-01-01) | NL-NH-AMS-M-XY (published) | NL-NH-AMS-M-XY (new contemporary) | ONLY newly added institution | Preserve existing PID |

Implementation Guidance

Name Suffix Generation:

import re
import unicodedata

def generate_name_suffix(native_name: str) -> str:
    """Convert native language institution name to snake_case suffix.
    
    Examples:
        "Stedelijk Museum Amsterdam" → "stedelijk_museum_amsterdam"
        "Musée d'Orsay" → "musee_dorsay"
        "Österreichische Nationalbibliothek" → "osterreichische_nationalbibliothek"
    """
    # Normalize unicode (NFD decomposition) and remove diacritics.
    # Non-Latin scripts must be transliterated to ASCII before this call;
    # untransliterated characters are dropped by the final cleanup step.
    normalized = unicodedata.normalize('NFD', native_name)
    ascii_name = ''.join(c for c in normalized if unicodedata.category(c) != 'Mn')

    # Convert to lowercase
    lowercase = ascii_name.lower()

    # Remove apostrophes (straight and curly), commas, and other punctuation
    no_punct = re.sub(r"['‘’`\",.:;!?()\[\]{}]", '', lowercase)
    
    # Replace spaces and hyphens with underscores
    underscored = re.sub(r'[\s\-]+', '_', no_punct)
    
    # Remove any remaining non-alphanumeric characters (except underscores)
    clean = re.sub(r'[^a-z0-9_]', '', underscored)
    
    # Collapse multiple underscores
    final = re.sub(r'_+', '_', clean).strip('_')
    
    return final

Collision Detection Logic:

from typing import Set

def resolve_collision(new_ghcid: str, new_name: str, existing_ghcids: Set[str]) -> str:
    """
    Resolve GHCID collision based on temporal context.
    
    Args:
        new_ghcid: Base GHCID for new institution
        new_name: Native language name of the institution
        existing_ghcids: Set of already-published GHCIDs
    
    Returns:
        Final GHCID (with name suffix if needed)
    """
    if new_ghcid in existing_ghcids:
        # COLLISION DETECTED: New institution collides with existing
        # Resolution: Add name suffix to NEW institution ONLY
        name_suffix = generate_name_suffix(new_name)
        return f"{new_ghcid}-{name_suffix}"
    else:
        # No collision: Use base GHCID
        return new_ghcid

First Batch Processing (different logic):

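from collections import defaultdict
from typing import List

# Institution, GHCIDRecord, generate_base_ghcid and create_record are
# project-defined and not shown in this document.
def process_first_batch(institutions: List[Institution]) -> List[GHCIDRecord]: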
    """
    Process initial batch of institutions.
    
    For first batch, ALL collisions get name suffixes appended.
    """
    # Group by base GHCID
    ghcid_groups = defaultdict(list)
    for inst in institutions:
        base_ghcid = generate_base_ghcid(inst)
        ghcid_groups[base_ghcid].append(inst)
    
    records = []
    for base_ghcid, group in ghcid_groups.items():
        if len(group) == 1:
            # No collision: Use base GHCID
            records.append(create_record(group[0], base_ghcid))
        else:
            # COLLISION: ALL institutions get name suffixes
            for inst in group:
                name_suffix = generate_name_suffix(inst.name)
                ghcid = f"{base_ghcid}-{name_suffix}"
                records.append(create_record(inst, ghcid))
    
    return records

Edge Cases

Case 1: Multiple historical institutions added simultaneously

If multiple historical institutions are added together (same date) and collide with existing GHCID:

# Existing (published 2025-11-01)
ghcid: NL-NH-AMS-M-XY

# Both added 2025-11-15
# Historical Institution A: "Amsterdam Art Archive"
ghcid: NL-NH-AMS-M-XY-amsterdam_art_archive

# Historical Institution B: "Amsterdam Archaeology Museum"
ghcid: NL-NH-AMS-M-XY-amsterdam_archaeology_museum

Resolution: ALL newly added institutions get name suffixes (treat as mini-batch).
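
A minimal sketch of the mini-batch rule. The names `name_suffix` and `resolve_mini_batch` are hypothetical, and `name_suffix` is a compact stand-in for `generate_name_suffix`, not a replacement for it. Suffixing additions that collide only with each other (and not with a published GHCID) is an assumption carried over from the first-batch rule.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, List, Set, Tuple


def name_suffix(name: str) -> str:
    """Compact stand-in for generate_name_suffix (Latin-script names only)."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    no_apostrophes = re.sub(r"['‘’`]", "", stripped.lower())
    return re.sub(r"[^a-z0-9]+", "_", no_apostrophes).strip("_")


def resolve_mini_batch(
    additions: Iterable[Tuple[str, str]], existing_ghcids: Set[str]
) -> List[str]:
    """additions holds (base_ghcid, native_name) pairs added on the same date."""
    additions = list(additions)
    per_base = Counter(base for base, _ in additions)
    resolved = []
    for base, name in additions:
        collides_with_published = base in existing_ghcids
        collides_within_batch = per_base[base] > 1  # assumption: first-batch rule
        if collides_with_published or collides_within_batch:
            resolved.append(f"{base}-{name_suffix(name)}")
        else:
            resolved.append(base)
    return resolved


print(resolve_mini_batch(
    [("NL-NH-AMS-M-XY", "Amsterdam Art Archive"),
     ("NL-NH-AMS-M-XY", "Amsterdam Archaeology Museum")],
    {"NL-NH-AMS-M-XY"},
))
```

The published GHCID never appears in the output, so the existing record is left untouched.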

Case 2: Existing GHCID already has name suffix

If existing GHCID already has name suffix (from first batch collision), new historical addition gets different name suffix:

# Existing (from first batch with collision)
ghcid: NL-NH-AMS-M-SM-stedelijk_museum_amsterdam

# Historical addition (2025-11-15)
ghcid: NL-NH-AMS-M-SM-stadsmuseum_amsterdam  # Different name suffix

No ambiguity: Each institution has unique name suffix derived from its native language name.

Case 3: Non-Latin script names

For institutions with non-Latin script names, transliterate to ASCII:

# Chinese institution: 北京故宫博物院 (Palace Museum Beijing)
ghcid: CN-BJ-BEI-M-PM-beijing_gugong_bowuyuan

# Japanese institution: 東京国立博物館 (Tokyo National Museum)  
ghcid: JP-TK-TOK-M-TN-tokyo_kokuritsu_hakubutsukan

# Arabic institution: المتحف المصري (Egyptian Museum)
ghcid: EG-CA-CAI-M-EM-al_mathaf_al_masri
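
Diacritic stripping alone cannot romanize these names, so a pipeline may want to detect names that still need a transliteration step before suffix generation. The check below is a sketch with a hypothetical function name; the transliteration itself (Pinyin, Hepburn, and so on) needs a dedicated library and is out of scope here.

```python
import unicodedata


def needs_transliteration(name: str) -> bool:
    """True if any letter in the name is outside the Latin script."""
    for ch in name:
        # Unicode character names of Latin letters contain the word "LATIN".
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return True
    return False


for sample in ("Musée d'Orsay", "北京故宫博物院", "المتحف المصري"):
    print(sample, needs_transliteration(sample))
```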

Testing Strategy

Test 1: First Batch Collision

def test_first_batch_collision():
    """Verify ALL institutions in first batch get name suffixes"""
    institutions = [
        Institution("Stedelijk Museum Amsterdam", type="M", city="AMS"),
        Institution("Science Museum Amsterdam", type="M", city="AMS")
    ]
    
    records = process_first_batch(institutions)
    
    # Both should have name suffixes
    assert records[0].ghcid == "NL-NH-AMS-M-SM-stedelijk_museum_amsterdam"
    assert records[1].ghcid == "NL-NH-AMS-M-SM-science_museum_amsterdam"

Test 2: Historical Addition Collision

def test_historical_addition_preserves_existing():
    """Verify existing GHCID unchanged when historical added"""
    # Existing GHCID (published)
    existing_ghcids = {"NL-NH-AMS-M-HM"}
    
    # Add historical institution
    historical = Institution(
        name="Amsterdam Historical Museum",
        type="M",
        city="AMS",
        temporal_extent={"start": "1926", "end": "1975"}
    )
    
    new_ghcid = resolve_collision(
        generate_base_ghcid(historical),
        historical.name,
        existing_ghcids
    )
    
    # New historical gets name suffix
    assert new_ghcid == "NL-NH-AMS-M-HM-amsterdam_historical_museum"
    
    # Existing GHCID NOT in database update
    # (verify existing record unchanged)

Test 3: Name Suffix Generation

def test_name_suffix_generation():
    """Verify name suffix normalization"""
    assert generate_name_suffix("Musée d'Orsay") == "musee_dorsay"
    assert generate_name_suffix("Österreichische Nationalbibliothek") == "osterreichische_nationalbibliothek"
    assert generate_name_suffix("Biblioteca Nacional do Brasil") == "biblioteca_nacional_do_brasil"
    assert generate_name_suffix("Royal Museum, London") == "royal_museum_london"

Documentation References

  • Collision Resolution: docs/plan/global_glam/07-ghcid-collision-resolution.md
  • GHCID Specification: docs/GHCID_PID_SCHEME.md
  • Implementation: src/glam_extractor/identifiers/ghcid.py
  • Schema: schemas/provenance.yaml (GHCIDHistoryEntry)
  • Abbreviation Special Characters: .opencode/ABBREVIATION_SPECIAL_CHAR_RULE.md (characters to exclude from abbreviations)

SHA-1 vs SHA-256: The Nuance

Why UUID v5 Uses SHA-1

RFC 4122 (2005) standardized UUID v5 with SHA-1 because:

  • No practical SHA-1 collision had been demonstrated in 2005, and SHA-1 was the widely deployed default
  • 128-bit UUID space provides collision resistance even with SHA-1
  • Purpose is identifier generation, not security/authentication

SHA-1 Cryptographic Weakness

SHA-1 collision attacks (2017):

  • Google/CWI demonstrated practical SHA-1 collision
  • Two different inputs producing same hash
  • Critical for digital signatures (authentication, certificates)
  • Less critical for identifiers (birthday paradox protection sufficient)

When SHA-1 Is Problematic

  • Digital signatures - Attacker can forge documents
  • Certificate authorities - SSL/TLS security compromised
  • Password hashing - Weakens brute-force resistance
  • Blockchain - Consensus security at risk

When SHA-1 Is Acceptable

  • UUID generation - Collision resistance adequate for identifier space
  • Git commits - Linus Torvalds has argued that the 2017 collision is not a practical threat to Git's use of SHA-1
  • Non-adversarial contexts - No attacker trying to cause collisions


Default: Dual UUID Approach

Store both UUID formats for maximum flexibility:

# Example YAML record
- id: 550e8400-e29b-41d4-a716-446655440000  # Use UUID v5 as primary ID
  name: Internet Archive
  institution_type: ARCHIVE
  ghcid: US-CA-SAN-A-IA
  ghcid_uuid: 550e8400-e29b-41d4-a716-446655440000  # UUID v5 (SHA-1)
  ghcid_uuid_sha256: a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d  # UUID SHA-256
  ghcid_numeric: 213324328442227739  # Numeric (64-bit)
  identifiers:
    - identifier_scheme: GHCID
      identifier_value: US-CA-SAN-A-IA
    - identifier_scheme: GHCID_UUID_V5
      identifier_value: 550e8400-e29b-41d4-a716-446655440000
    - identifier_scheme: GHCID_UUID_SHA256
      identifier_value: a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d
    - identifier_scheme: GHCID_NUMERIC
      identifier_value: 213324328442227739

Use Case Decision Tree

Need to integrate with existing systems?
├─ YES → Use UUID v5 (to_uuid())
│         - Europeana, DPLA, IIIF, Wikidata
│         - RFC 4122 compliance required
│
└─ NO → Building custom system?
    ├─ Security policy mandates SHA-256?
    │  ├─ YES → Use UUID SHA-256 (to_uuid_sha256())
    │  └─ NO → Use UUID v5 for standard compliance
    │
    └─ Database optimization critical?
        ├─ YES → Use Numeric (to_numeric()) as PK
        │         - Store UUID v5 as alternate key
        └─ NO → Use UUID v5 as primary identifier

Code Examples

Generate All Four Formats

from glam_extractor.identifiers.ghcid import GHCIDComponents

# Create GHCID components
components = GHCIDComponents(
    country_code="US",
    region_code="CA",
    city_locode="SAN",
    institution_type="A",
    abbreviation="IA"
)

# Generate all formats
uuid_v5 = components.to_uuid()           # UUID v5 (SHA-1)
uuid_sha256 = components.to_uuid_sha256()  # UUID SHA-256
numeric = components.to_numeric()        # Numeric (64-bit)
human = components.to_string()           # Human-readable

print(f"UUID v5:      {uuid_v5}")
print(f"UUID SHA-256: {uuid_sha256}")
print(f"Numeric:      {numeric}")
print(f"Human:        {human}")

# Output (placeholder values for illustration):
# UUID v5:      550e8400-e29b-41d4-a716-446655440000
# UUID SHA-256: a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d
# Numeric:      213324328442227739
# Human:        US-CA-SAN-A-IA

Verify Determinism

# Same input always produces same output
comp1 = GHCIDComponents("NL", "NH", "AMS", "M", "RM")
comp2 = GHCIDComponents("NL", "NH", "AMS", "M", "RM")

assert comp1.to_uuid() == comp2.to_uuid()
assert comp1.to_uuid_sha256() == comp2.to_uuid_sha256()
assert comp1.to_numeric() == comp2.to_numeric()
assert comp1.to_string() == comp2.to_string()

Export to Different Formats

# RDF/JSON-LD (use UUID v5)
rdf_id = f"urn:uuid:{components.to_uuid()}"
# → "urn:uuid:550e8400-e29b-41d4-a716-446655440000"

# IIIF Manifest (use UUID v5)
iiif_id = f"https://iiif.example.org/manifests/{components.to_uuid()}/manifest.json"

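# Database (use numeric PK); pass values as query parameters instead of
# interpolating them into the SQL string (placeholder syntax depends on the driver)
sql = "INSERT INTO institutions (id, name) VALUES (%s, %s)"
params = (components.to_numeric(), "Internet Archive")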

# Citation (use human-readable)
citation = f"See Internet Archive ({components.to_string()}) for digital collections."

Future-Proofing Strategy

Timeline Projections

| Year | SHA-1 Status | UUID v5 Status | Recommendation |
|---|---|---|---|
| 2024 | Weak for security, OK for IDs | Standard, widely supported | Use UUID v5 as primary |
| 2030 | Likely deprecated for security | Still standard for IDs | Dual UUID (v5 + SHA-256) |
| 2040 | Possibly deprecated entirely | May be superseded | ⚠️ Migrate to UUID SHA-256 |

Migration Path

If SHA-1 is fully deprecated:

  1. Phase 1 (Now): Store both UUID v5 and UUID SHA-256
  2. Phase 2 (2030): Make UUID SHA-256 primary, keep v5 as alias
  3. Phase 3 (2040): Deprecate UUID v5, use SHA-256 exclusively

Critical: Because both are deterministic, you can always regenerate from GHCID string without breaking references.


Governance & Resolution

Identifier Persistence Requirements

Technical generation is only half the solution. True persistence requires:

1. Resolution Service

https://id.heritage.example.org/uuid/{uuid}
https://id.heritage.example.org/numeric/{numeric}
https://id.heritage.example.org/ghcid/{ghcid}

All three should resolve to the same institutional record.
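
A resolver has to decide which kind of identifier it was handed before it can look it up. The sketch below shows one way to do that; the function name and the GHCID regular expression are assumptions inferred from the examples in this document, not a published grammar.

```python
import re
import uuid

# Assumed shape: {Country}-{Region}-{City}-{Type}-{Abbreviation}[-name_suffix]
GHCID_PATTERN = re.compile(
    r"^[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}-[A-Z]-[A-Z0-9]+(?:-[a-z0-9_]+)?$"
)


def classify_identifier(value: str) -> str:
    """Return 'numeric', 'uuid' or 'ghcid'; raise ValueError otherwise."""
    if value.isascii() and value.isdigit() and int(value) < 2**64:
        return "numeric"
    try:
        uuid.UUID(value)  # accepts the canonical hyphenated form
        return "uuid"
    except ValueError:
        pass
    if GHCID_PATTERN.match(value):
        return "ghcid"
    raise ValueError(f"unrecognized identifier: {value!r}")


for sample in ("213324328442227739", "NL-NH-AMS-M-RM",
               "a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d"):
    print(sample, "->", classify_identifier(sample))
```

Separate URL prefixes, as in the paths above, make this classification unnecessary at the routing layer, but the same check is useful for validating the path segment.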

2. Mapping Database

CREATE TABLE ghcid_registry (
    uuid_v5 UUID PRIMARY KEY,
    uuid_sha256 UUID NOT NULL,
    ghcid_numeric BIGINT NOT NULL,  -- uint64 value; needs an unsigned type or signed reinterpretation where BIGINT is signed
    ghcid VARCHAR(100) NOT NULL,
    ghcid_original VARCHAR(100) NOT NULL,  -- Frozen
    institution_name TEXT NOT NULL,
    last_updated TIMESTAMP,
    UNIQUE(uuid_sha256),
    UNIQUE(ghcid_numeric),
    UNIQUE(ghcid_original)
);
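
To show how one registry row can serve all resolution paths, here is a sketch using Python's built-in `sqlite3` with an in-memory database. It is an illustration only: the column types are simplified to TEXT (which also sidesteps the signed 64-bit limit), the `lookup` helper is hypothetical, and the inserted values are this document's placeholder identifiers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ghcid_registry (
    uuid_v5 TEXT PRIMARY KEY,
    uuid_sha256 TEXT NOT NULL UNIQUE,
    ghcid_numeric TEXT NOT NULL UNIQUE,
    ghcid TEXT NOT NULL,
    ghcid_original TEXT NOT NULL UNIQUE,
    institution_name TEXT NOT NULL
)
"""


def lookup(conn: sqlite3.Connection, identifier: str):
    """Return the institution name for any of its identifiers, or None."""
    row = conn.execute(
        "SELECT institution_name FROM ghcid_registry "
        "WHERE ? IN (uuid_v5, uuid_sha256, ghcid_numeric, ghcid, ghcid_original)",
        (identifier,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO ghcid_registry VALUES (?, ?, ?, ?, ?, ?)",
    ("550e8400-e29b-41d4-a716-446655440000",
     "a1b2c3d4-e5f6-8a1b-9c2d-3e4f5a6b7c8d",
     "213324328442227739",
     "US-CA-SAN-A-IA",
     "US-CA-SAN-A-IA",
     "Internet Archive"),
)
print(lookup(conn, "213324328442227739"))
```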

3. Organizational Commitment

  • Maintain resolution service for decades
  • Fund infrastructure for long-term operation
  • Establish governance policies for ID assignment
  • Handle institution mergers/closures/relocations

4. Community Standards

  • Coordinate with ISIL, Wikidata, GeoNames
  • Publish GHCID specification as RFC or W3C note
  • Engage with Europeana, DPLA, IIIF communities
  • Establish dispute resolution process

Comparison with Existing PID Systems

| System | Format | Governance | Resolution | Adoption |
|---|---|---|---|---|
| DOI | 10.xxxx/yyyy | IDF (non-profit) | doi.org | High (scholarly) |
| ARK | ark:/nnnnn/xxx | CDL (California) | n2t.net | Medium (archives) |
| Handle | hdl:xxxx/yyyy | CNRI (non-profit) | handle.net | Medium (repositories) |
| GHCID | UUID v5 | TBD | TBD | None (new) |

Lesson: Technical mechanism is necessary but not sufficient. Governance and organizational commitment are critical.


Recommendations

For This Project (2024-2025)

  1. Implement dual UUID generation (v5 + SHA-256)
  2. Store all four identifier formats in data model
  3. Use UUID v5 as primary ID for current interoperability
  4. Document SHA-1 nuance clearly
  5. Build resolution service prototype
  6. Engage with Europeana/DPLA for feedback
  7. Draft GHCID specification for community review

For Production Deployment

  1. Establish governance body (non-profit foundation?)
  2. Secure long-term funding for resolution service
  3. Coordinate with existing PID systems (ISIL, VIAF, Wikidata)
  4. Publish specification (W3C note or IETF RFC)
  5. Deploy resolution infrastructure (multi-region, high availability)
  6. Engage heritage community for adoption


Version: 1.0
Date: 2024-11-06
Status: Draft for Community Review