# FINAL DECISION: Four-Identifier Strategy for Heritage Institutions

**Date:** 2024-11-06
**Decision:** Implement hybrid UUID approach with four identifier formats
**Status:** ✅ Approved

---

## TL;DR

Each heritage institution gets **FOUR identifiers** serving different purposes:

| Identifier | Format | Purpose | Deterministic? | Use For |
|------------|--------|---------|----------------|---------|
| **1. UUID v7** | `018e1234-5678-7...` | Database primary key | ❌ No (time-based) | Internal DB operations, fast queries |
| **2. UUID v5** | `550e8400-e29b-5...` | **PUBLIC PID** | ✅ Yes (SHA-1) | Europeana, DPLA, IIIF, Wikidata, citations |
| **3. UUID v8** | `a1b2c3d4-e5f6-8...` | **SOTA PID** | ✅ Yes (SHA-256) | Security compliance, future-proofing |
| **4. GHCID String** | `US-CA-SAN-A-IA` | Human-readable | ✅ Yes | Documentation, citations, debugging |

---

## Why NOT Use Only UUID v7?

### UUID v7 is Time-Based, NOT Deterministic
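
```python
import time
from uuid import uuid7  # Python 3.14+; the third-party `uuid6` package offers uuid7 on older versions

# UUID v7 generates DIFFERENT UUIDs each time!
# uuid7() takes no arguments, so the GHCID string cannot influence the result.
ghcid = "US-CA-SAN-A-IA"

uuid1 = uuid7()  # e.g. 018e1234-5678-7abc-8ef0-123456789abc
time.sleep(1)
uuid2 = uuid7()  # e.g. 018e1234-9999-7fff-aaaa-bbbbbbbbbbbb

assert uuid1 != uuid2  # ❌ DIFFERENT UUIDs for same institution!
```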

**Problem:** If you lose your database, you **cannot regenerate** the UUID v7 from the GHCID string, because a UUID v7 is built from values unrelated to the GHCID:

- Current timestamp
- Random bits

**This violates the core requirement for persistent identifiers in this strategy:** deterministic generation.
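
For contrast, here is a minimal sketch of deterministic derivation from the GHCID string. The namespace UUID (`GHCID_NAMESPACE`), the helper names, and the UUID v8 byte layout (the first 16 bytes of a SHA-256 digest with the version and variant bits overwritten) are assumptions for illustration, not a confirmed scheme. The point is only that the same input string yields the same UUID every time, so the identifiers can be rebuilt without the database.

```python
import hashlib
import uuid

# Hypothetical namespace, for illustration only. A real deployment would
# fix and publish its own namespace UUID.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/ghcid")


def ghcid_to_uuid_v5(ghcid: str) -> uuid.UUID:
    """Name-based UUID v5 (SHA-1) derived from the GHCID string."""
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


def ghcid_to_uuid_v8(ghcid: str) -> uuid.UUID:
    """Custom UUID v8 built from the first 16 bytes of a SHA-256 digest.

    Assumed layout: a v8 UUID only fixes the version and variant bits;
    the remaining bits are application-defined.
    """
    digest = bytearray(hashlib.sha256(ghcid.encode("utf-8")).digest()[:16])
    digest[6] = (digest[6] & 0x0F) | 0x80  # set version nibble to 8
    digest[8] = (digest[8] & 0x3F) | 0x80  # set variant bits to binary 10
    return uuid.UUID(bytes=bytes(digest))


ghcid = "US-CA-SAN-A-IA"

# Regenerating from the same string always yields the same identifiers.
assert ghcid_to_uuid_v5(ghcid) == ghcid_to_uuid_v5(ghcid)
assert ghcid_to_uuid_v8(ghcid) == ghcid_to_uuid_v8(ghcid)
```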

---

## Summary

**Use UUID v7 for database performance, but keep UUID v5/v8 for persistent identifiers!**

All four identifiers serve complementary purposes and should be stored together.
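
Storing the four identifiers together could look like the following sketch. The `InstitutionIdentifiers` class and its field names are hypothetical; the version check is just one possible guard against putting a UUID in the wrong field.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class InstitutionIdentifiers:
    """Hypothetical record keeping all four identifiers side by side."""

    db_id: uuid.UUID       # UUID v7: database primary key
    public_pid: uuid.UUID  # UUID v5: public persistent identifier
    sota_pid: uuid.UUID    # UUID v8: SHA-256-based persistent identifier
    ghcid: str             # human-readable GHCID string

    def __post_init__(self) -> None:
        # Reject a UUID whose version does not match the field it is stored in.
        expected = {"db_id": 7, "public_pid": 5, "sota_pid": 8}
        for field_name, version in expected.items():
            actual = getattr(self, field_name).version
            if actual != version:
                raise ValueError(
                    f"{field_name} must be UUID v{version}, got v{actual}"
                )
```

In such a design, the UUID v7 key would be generated once at insert time and persisted, while the v5 and v8 values can be recomputed from the GHCID string at any point.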