FINAL DECISION: Four-Identifier Strategy for Heritage Institutions
Date: 2024-11-06
Decision: Implement hybrid UUID approach with four identifier formats
Status: ✅ Approved
TL;DR
Each heritage institution gets FOUR identifiers serving different purposes:
| Identifier | Format | Purpose | Deterministic? | Use For |
|---|---|---|---|---|
| 1. UUID v7 | 018e1234-5678-7... | Database primary key | ❌ No (time-based) | Internal DB operations, fast queries |
| 2. UUID v5 | 550e8400-e29b-4... | PUBLIC PID | ✅ Yes (SHA-1) | Europeana, DPLA, IIIF, Wikidata, citations |
| 3. UUID v8 | a1b2c3d4-e5f6-8... | SOTA PID | ✅ Yes (SHA-256) | Security compliance, future-proofing |
| 4. GHCID String | US-CA-SAN-A-IA | Human-readable | ✅ Yes | Documentation, citations, debugging |
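As a rough illustration of the SHA-256 row, the sketch below shows one way a deterministic UUID v8 could be derived from a GHCID string. The namespace value, the function name `ghcid_to_uuid_v8`, and the choice to hash `namespace + ghcid` and keep the first 16 bytes are all assumptions made for this example; the decision record does not specify the exact construction.

```python
import hashlib
import uuid

# Hypothetical namespace, chosen only for this sketch.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid_v8(ghcid: str, namespace: uuid.UUID = EXAMPLE_NAMESPACE) -> uuid.UUID:
    """Derive a deterministic UUID v8 from a GHCID string (assumed scheme)."""
    # Hash the namespace bytes followed by the UTF-8 encoded GHCID.
    digest = hashlib.sha256(namespace.bytes + ghcid.encode("utf-8")).digest()
    # Keep the first 128 bits of the 256-bit digest.
    raw = bytearray(digest[:16])
    # Set the version nibble to 8 (custom format).
    raw[6] = (raw[6] & 0x0F) | 0x80
    # Set the variant bits to 10 (RFC 4122 / RFC 9562 layout).
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))


pid = ghcid_to_uuid_v8("US-CA-SAN-A-IA")
print(pid, pid.version)
```

Because the input is only the namespace and the GHCID string, the same institution always maps to the same value, which is the property the "Deterministic?" column records.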
Why NOT Use Only UUID v7?
UUID v7 is Time-Based, NOT Deterministic
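import time
from uuid import uuid7  # standard library from Python 3.14; older versions need a third-party package

# UUID v7 generates a DIFFERENT UUID on every call!
ghcid = "US-CA-SAN-A-IA"  # never used below: uuid7() takes no input
uuid1 = uuid7() # → 018e1234-5678-7abc-def0-123456789abc
time.sleep(1)
uuid2 = uuid7() # → 018e1234-9999-7fff-aaaa-bbbbbbbbbbbb
uuid1 != uuid2 # ❌ DIFFERENT UUIDs for same institution!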
Problem: If you lose your database, you cannot regenerate the UUID v7 from the GHCID string because it depends on:
- Current timestamp
- Random bits
This violates the core requirement for persistent identifiers: deterministic generation.
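For contrast, a minimal sketch of deterministic generation with UUID v5, using Python's standard `uuid.uuid5`. The namespace `GHCID_NAMESPACE` and the helper `ghcid_to_uuid_v5` are hypothetical names for illustration; the project's real namespace is not given here.

```python
import uuid

# Hypothetical namespace UUID; any fixed, published namespace would work.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid_v5(ghcid: str) -> uuid.UUID:
    """Return the name-based (SHA-1) UUID v5 for a GHCID string."""
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


# Simulate "losing the database" and regenerating the identifier later.
before = ghcid_to_uuid_v5("US-CA-SAN-A-IA")
after = ghcid_to_uuid_v5("US-CA-SAN-A-IA")

assert before == after      # same input, same UUID, no stored state needed
assert before.version == 5
```

The UUID depends only on the namespace and the GHCID string, so it can be rebuilt at any time as long as both are known.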
Summary
Use UUID v7 for database performance, but keep UUID v5/v8 for persistent identifiers!
All four identifiers serve complementary purposes and should be stored together.