glam/docs/DECISION_FOUR_IDENTIFIERS.md
2025-11-19 23:25:22 +01:00

# FINAL DECISION: Four-Identifier Strategy for Heritage Institutions
**Date:** 2024-11-06
**Decision:** Implement hybrid UUID approach with four identifier formats
**Status:** ✅ Approved

---

## TL;DR

Each heritage institution gets **FOUR identifiers** serving different purposes:

| Identifier | Format | Purpose | Deterministic? | Use For |
|------------|--------|---------|----------------|---------|
| **1. UUID v7** | `018e1234-5678-7...` | Database primary key | ❌ No (time-based) | Internal DB operations, fast queries |
| **2. UUID v5** | `550e8400-e29b-5...` | **PUBLIC PID** | ✅ Yes (SHA-1) | Europeana, DPLA, IIIF, Wikidata, citations |
| **3. UUID v8** | `a1b2c3d4-e5f6-8...` | **SOTA PID** | ✅ Yes (SHA-256) | Security compliance, future-proofing |
| **4. GHCID String** | `US-CA-SAN-A-IA` | Human-readable | ✅ Yes | Documentation, citations, debugging |
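
The two deterministic UUID rows in the table can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the namespace `GHCID_NAMESPACE`, both function names, and the exact SHA-256 layout for UUID v8 (hash of namespace bytes plus GHCID, truncated to 128 bits, with version and variant bits overwritten) are assumptions. Only `uuid.uuid5` and `hashlib.sha256` come from Python's standard library.

```python
import hashlib
import uuid

# Assumption: a made-up namespace for illustration; the project's real
# namespace UUID is not specified in this document.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")


def ghcid_to_uuid_v5(ghcid: str) -> uuid.UUID:
    """Name-based UUID v5 (SHA-1) derived from the GHCID string."""
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


def ghcid_to_uuid_v8(ghcid: str) -> uuid.UUID:
    """Custom UUID v8 built from a SHA-256 digest (assumed layout)."""
    digest = hashlib.sha256(GHCID_NAMESPACE.bytes + ghcid.encode("utf-8")).digest()
    raw = bytearray(digest[:16])      # keep the first 128 bits of the hash
    raw[6] = (raw[6] & 0x0F) | 0x80   # set the version nibble to 8
    raw[8] = (raw[8] & 0x3F) | 0x80   # set the RFC variant bits (10xx)
    return uuid.UUID(bytes=bytes(raw))


ghcid = "US-CA-SAN-A-IA"
# Both calls are pure functions of the GHCID, so repeating them gives equal results.
assert ghcid_to_uuid_v5(ghcid) == ghcid_to_uuid_v5(ghcid)
assert ghcid_to_uuid_v8(ghcid) == ghcid_to_uuid_v8(ghcid)
```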

---

## Why NOT Use Only UUID v7?

### UUID v7 is Time-Based, NOT Deterministic

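```python
import time
from uuid import uuid7  # standard library from Python 3.14; earlier versions need a third-party uuid7

# UUID v7 generates DIFFERENT UUIDs each time!
# It never looks at the GHCID: the value comes from the clock plus random bits.
ghcid = "US-CA-SAN-A-IA"
uuid1 = uuid7()  # e.g. 018e1234-5678-7abc-8ef0-123456789abc
time.sleep(1)
uuid2 = uuid7()  # e.g. 018e1234-9999-7fff-aaaa-bbbbbbbbbbbb
assert uuid1 != uuid2  # ❌ DIFFERENT UUIDs for same institution!
```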

**Problem:** If you lose your database, you **cannot regenerate** the UUID v7 from the GHCID string because it depends on:

- Current timestamp
- Random bits

**This violates the core requirement for persistent identifiers:** deterministic generation.

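
By contrast, a name-based UUID can be rebuilt from the GHCID alone. The sketch below simulates losing an identifier table and regenerating it. `uuid.NAMESPACE_URL` is a stand-in namespace and `build_pid_table` is a hypothetical helper; neither is taken from the project.

```python
import uuid

# Assumption: a stand-in namespace; the project's real namespace is not given here.
NAMESPACE = uuid.NAMESPACE_URL


def build_pid_table(ghcids: list[str]) -> dict[str, uuid.UUID]:
    """Map each GHCID string to its name-based UUID v5."""
    return {ghcid: uuid.uuid5(NAMESPACE, ghcid) for ghcid in ghcids}


ghcids = ["US-CA-SAN-A-IA"]
original = build_pid_table(ghcids)

# Simulate losing the database, then rebuilding it from the GHCID strings alone.
rebuilt = build_pid_table(ghcids)
assert rebuilt == original  # same GHCID in, same UUID v5 out
```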

---

## Summary

**Use UUID v7 for database performance, but keep UUID v5/v8 for persistent identifiers!**

All four identifiers serve complementary purposes and should be stored together.
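
One way to keep the four identifiers together is a single record type that checks each UUID's version on construction. This is only a sketch: the class name `InstitutionIdentifiers` and its field names are hypothetical, and the UUID values in the usage lines are placeholders, not real institution identifiers.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class InstitutionIdentifiers:
    """All four identifiers for one institution (hypothetical field names)."""

    db_id: UUID   # UUID v7: database primary key
    pid_v5: UUID  # UUID v5: public PID
    pid_v8: UUID  # UUID v8: SHA-256 based PID
    ghcid: str    # human-readable GHCID string

    def __post_init__(self) -> None:
        # Reject a UUID stored in the wrong slot by checking its version field.
        expected = {"db_id": 7, "pid_v5": 5, "pid_v8": 8}
        for field_name, version in expected.items():
            actual = getattr(self, field_name).version
            if actual != version:
                raise ValueError(f"{field_name} must be UUID v{version}, got v{actual}")


# Placeholder values for illustration only.
record = InstitutionIdentifiers(
    db_id=UUID("018e1234-5678-7abc-8ef0-123456789abc"),
    pid_v5=UUID("550e8400-e29b-51d4-a716-446655440000"),
    pid_v8=UUID("a1b2c3d4-e5f6-8a7b-9c8d-0123456789ab"),
    ghcid="US-CA-SAN-A-IA",
)
```

Rejecting a mismatched version at construction time catches the easy mistake of writing a v5 PID into the primary-key field.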