glam/docs/DECISION_FOUR_IDENTIFIERS.md
2025-11-19 23:25:22 +01:00

FINAL DECISION: Four-Identifier Strategy for Heritage Institutions

Date: 2024-11-06
Decision: Implement hybrid UUID approach with four identifier formats
Status: Approved


TL;DR

Each heritage institution gets FOUR identifiers serving different purposes:

| Identifier | Format | Purpose | Deterministic? | Use For |
|---|---|---|---|---|
| 1. UUID v7 | 018e1234-5678-7... | Database primary key | No (time-based) | Internal DB operations, fast queries |
| 2. UUID v5 | 550e8400-e29b-4... | PUBLIC PID | Yes (SHA-1) | Europeana, DPLA, IIIF, Wikidata, citations |
| 3. UUID v8 | a1b2c3d4-e5f6-8... | SOTA PID | Yes (SHA-256) | Security compliance, future-proofing |
| 4. GHCID String | US-CA-SAN-A-IA | Human-readable | Yes | Documentation, citations, debugging |
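
The document does not spell out how a SHA-256-based UUID v8 is built. One possible construction, shown here as a minimal sketch rather than the project's actual method, hashes a namespace plus the GHCID string, keeps the first 16 bytes, and sets the version and variant bits that RFC 9562 reserves. The namespace value and the function name `ghcid_to_uuid8` are assumptions made for illustration.

```python
import hashlib
import uuid

# Hypothetical namespace, for illustration only; the real one is not given here.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid8(ghcid: str, namespace: uuid.UUID = EXAMPLE_NAMESPACE) -> uuid.UUID:
    """Derive a deterministic UUID v8 from a GHCID string via SHA-256."""
    digest = hashlib.sha256(namespace.bytes + ghcid.encode("utf-8")).digest()
    raw = bytearray(digest[:16])         # a UUID is 128 bits, so keep 16 bytes
    raw[6] = (raw[6] & 0x0F) | 0x80      # version nibble = 8
    raw[8] = (raw[8] & 0x3F) | 0x80      # variant bits = 10 (RFC 9562 layout)
    return uuid.UUID(bytes=bytes(raw))


pid = ghcid_to_uuid8("US-CA-SAN-A-IA")
assert pid == ghcid_to_uuid8("US-CA-SAN-A-IA")  # same input, same UUID
assert pid.version == 8
```

Because the output depends only on the namespace and the GHCID string, the same institution maps to the same UUID v8 on any machine.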

Why NOT Use Only UUID v7?

UUID v7 is Time-Based, NOT Deterministic

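import time
from uuid import uuid7  # stdlib in Python 3.14+; older versions need a third-party package such as uuid6

# UUID v7 generates a DIFFERENT UUID on every call!
ghcid = "US-CA-SAN-A-IA"  # note: never passed to uuid7()

uuid1 = uuid7()  # → e.g. 018e1234-5678-7abc-def0-123456789abc
time.sleep(1)
uuid2 = uuid7()  # → e.g. 018e1234-9999-7fff-aaaa-bbbbbbbbbbbb

assert uuid1 != uuid2  # ❌ DIFFERENT UUIDs for the same institution!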

Problem: If you lose your database, you cannot regenerate the UUID v7 from the GHCID string, because a UUID v7 is built from inputs that have nothing to do with the GHCID:

  • Current timestamp
  • Random bits

This violates the core requirement for persistent identifiers: deterministic generation.
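
By contrast, a name-based UUID v5 can be recomputed from the GHCID string alone. The sketch below uses Python's standard `uuid.uuid5`; the namespace value and the helper name `ghcid_to_uuid5` are hypothetical, since the project's real namespace is not stated in this document.

```python
import uuid

# Hypothetical namespace, for illustration only; the real one is not given here.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid5(ghcid: str) -> uuid.UUID:
    """Derive a UUID v5 (name-based, SHA-1) from a GHCID string."""
    return uuid.uuid5(EXAMPLE_NAMESPACE, ghcid)


first = ghcid_to_uuid5("US-CA-SAN-A-IA")
second = ghcid_to_uuid5("US-CA-SAN-A-IA")  # e.g. after a database loss

assert first == second      # ✅ same UUID for the same institution
assert first.version == 5
```

No timestamp or random input is involved, so the identifier survives a complete loss of the database as long as the namespace and the GHCID string are known.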


Summary

Use UUID v7 for database performance, but keep UUID v5/v8 for persistent identifiers!

All four identifiers serve complementary purposes and should be stored together.
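
One way to picture "stored together" is a single record holding all four values. This is a sketch under assumptions, not the project's schema: the class name `InstitutionIdentifiers`, its field names, the namespace, and the SHA-256 construction of the UUID v8 are all hypothetical. The UUID v7 is passed in by the caller because it cannot be derived from the GHCID.

```python
import hashlib
import uuid
from dataclasses import dataclass

# Hypothetical namespace, for illustration only.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def _uuid8_sha256(ghcid: str) -> uuid.UUID:
    """Assumed UUID v8 construction: truncated SHA-256 with version/variant bits set."""
    raw = bytearray(hashlib.sha256(EXAMPLE_NAMESPACE.bytes + ghcid.encode("utf-8")).digest()[:16])
    raw[6] = (raw[6] & 0x0F) | 0x80  # version nibble = 8
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


@dataclass(frozen=True)
class InstitutionIdentifiers:
    db_id: uuid.UUID   # UUID v7: generated once, must be stored
    pid_v5: uuid.UUID  # UUID v5: recomputable from the GHCID
    pid_v8: uuid.UUID  # UUID v8: recomputable from the GHCID
    ghcid: str         # human-readable identifier

    @classmethod
    def from_ghcid(cls, ghcid: str, db_id: uuid.UUID) -> "InstitutionIdentifiers":
        return cls(db_id, uuid.uuid5(EXAMPLE_NAMESPACE, ghcid), _uuid8_sha256(ghcid), ghcid)


# Two records for the same institution with different (illustrative) UUID v7 keys.
original = InstitutionIdentifiers.from_ghcid(
    "US-CA-SAN-A-IA", uuid.UUID("018e1234-5678-7abc-8def-123456789abc"))
rebuilt = InstitutionIdentifiers.from_ghcid(
    "US-CA-SAN-A-IA", uuid.UUID("018e9999-0000-7abc-8def-123456789abc"))

assert original.db_id != rebuilt.db_id     # the primary key is not reproducible
assert original.pid_v5 == rebuilt.pid_v5   # the persistent identifiers are
assert original.pid_v8 == rebuilt.pid_v8
```

The usage lines mirror the recovery scenario: a rebuilt record gets a new database key, while both persistent identifiers come out unchanged.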