# FINAL DECISION: Four-Identifier Strategy for Heritage Institutions

**Date:** 2024-11-06

**Decision:** Implement a hybrid UUID approach with four identifier formats

**Status:** ✅ Approved

---

## TL;DR

Each heritage institution gets **FOUR identifiers**, each serving a different purpose:

| Identifier | Format | Purpose | Deterministic? | Use For |
|------------|--------|---------|----------------|---------|
| **1. UUID v7** | `018e1234-5678-7...` | Database primary key | ❌ No (time-based) | Internal DB operations, fast queries |
| **2. UUID v5** | `550e8400-e29b-5...` | **PUBLIC PID** | ✅ Yes (SHA-1) | Europeana, DPLA, IIIF, Wikidata, citations |
| **3. UUID v8** | `a1b2c3d4-e5f6-8...` | **SOTA PID** | ✅ Yes (SHA-256) | Security compliance, future-proofing |
| **4. GHCID String** | `US-CA-SAN-A-IA` | Human-readable | ✅ Yes | Documentation, citations, debugging |

---

## Why NOT Use Only UUID v7?

### UUID v7 Is Time-Based, NOT Deterministic

```python
import time
from uuid import uuid7  # standard library in Python 3.14+

# UUID v7 generates a DIFFERENT UUID on every call!
ghcid = "US-CA-SAN-A-IA"  # uuid7() takes no input, so the GHCID cannot influence it

uuid_a = uuid7()  # e.g. 018e1234-5678-7abc-8ef0-123456789abc
time.sleep(1)
uuid_b = uuid7()  # e.g. 018e1234-5a60-7fff-aaaa-bbbbbbbbbbbb

assert uuid_a != uuid_b  # ❌ DIFFERENT UUIDs for the same institution!
```

**Problem:** If you lose your database, you **cannot regenerate** an institution's UUID v7 from its GHCID string, because the value is built only from:

- the current timestamp
- random bits

and not from the GHCID at all.

**This violates the core requirement for persistent identifiers:** deterministic generation.

---

## Summary

**Use UUID v7 for database performance, but keep UUID v5/v8 as the persistent identifiers!** All four identifiers serve complementary purposes and should be stored together.
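A minimal sketch of minting and storing the four identifiers together might look like the following. It is illustrative only: the `GHCID_NAMESPACE` value (derived from the placeholder URL `https://example.org/ghcid`), the SHA-256 truncation used for UUID v8, and the helper names `mint_ids` and `InstitutionIds` are all hypothetical, since this decision record does not specify a namespace or an exact v8 layout. UUID v7 is hand-rolled here (48-bit millisecond timestamp plus random bits) so the sketch runs on Python versions older than 3.14.

```python
import hashlib
import os
import time
import uuid
from dataclasses import dataclass

# HYPOTHETICAL namespace for GHCIDs; the decision record does not define one.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/ghcid")


def _set_version_and_variant(raw: bytes, version: int) -> uuid.UUID:
    """Stamp the 4-bit version and the RFC variant bits (10xx) onto 16 bytes."""
    b = bytearray(raw[:16])
    b[6] = (b[6] & 0x0F) | (version << 4)  # high nibble of byte 6 = version
    b[8] = (b[8] & 0x3F) | 0x80            # top two bits of byte 8 = 10
    return uuid.UUID(bytes=bytes(b))


def ghcid_uuid_v5(ghcid: str) -> uuid.UUID:
    # Deterministic: SHA-1 over namespace + name (standard library).
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


def ghcid_uuid_v8(ghcid: str) -> uuid.UUID:
    # Deterministic: SHA-256 over namespace + name, truncated to 128 bits.
    # ASSUMPTION: this byte layout is one possible choice, not a fixed standard.
    digest = hashlib.sha256(GHCID_NAMESPACE.bytes + ghcid.encode("utf-8")).digest()
    return _set_version_and_variant(digest, 8)


def new_uuid_v7() -> uuid.UUID:
    # Time-ordered but NOT reproducible: 48-bit Unix time in ms + random bits.
    ts_ms = time.time_ns() // 1_000_000
    raw = ts_ms.to_bytes(6, "big") + os.urandom(10)
    return _set_version_and_variant(raw, 7)


@dataclass(frozen=True)
class InstitutionIds:
    db_key: uuid.UUID  # UUID v7: database primary key
    pid_v5: uuid.UUID  # UUID v5: public PID
    pid_v8: uuid.UUID  # UUID v8: SHA-256 PID
    ghcid: str         # human-readable GHCID string


def mint_ids(ghcid: str) -> InstitutionIds:
    """Mint all four identifiers for one institution so they can be stored together."""
    return InstitutionIds(
        db_key=new_uuid_v7(),
        pid_v5=ghcid_uuid_v5(ghcid),
        pid_v8=ghcid_uuid_v8(ghcid),
        ghcid=ghcid,
    )
```

Calling `mint_ids("US-CA-SAN-A-IA")` twice yields identical `pid_v5` and `pid_v8` values but a different `db_key`. The persistent identifiers can therefore be rebuilt from the GHCID after data loss, while the v7 key cannot.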