From 7f53ec6074b8be7b2f540aebf48a9f840e8d8fb3 Mon Sep 17 00:00:00 2001
From: kempersc <sckemper@mailfence.com>
Date: Fri, 9 Jan 2026 15:57:26 +0100
Subject: [PATCH] docs(person_pid): add PPID-GHCID alignment and PiCo
 comparison docs

---
 .../person_pid/10_ppid_ghcid_alignment.md     | 1150 +++++++++++++++++
 .../person_pid/11_pico_ppid_comparison.md     |  475 +++++++
 2 files changed, 1625 insertions(+)
 create mode 100644 docs/plan/person_pid/10_ppid_ghcid_alignment.md
 create mode 100644 docs/plan/person_pid/11_pico_ppid_comparison.md

diff --git a/docs/plan/person_pid/10_ppid_ghcid_alignment.md b/docs/plan/person_pid/10_ppid_ghcid_alignment.md
new file mode 100644
index 0000000000..a9fec96913
--- /dev/null
+++ b/docs/plan/person_pid/10_ppid_ghcid_alignment.md
@@ -0,0 +1,1150 @@
+# PPID-GHCID Alignment: Revised Identifier Structure
+
+**Version**: 0.1.0  
+**Last Updated**: 2026-01-09  
+**Status**: DRAFT - Supersedes opaque identifier design in [05_identifier_structure_design.md](./05_identifier_structure_design.md)  
+**Related**: [GHCID Specification](../../GHCID_PID_SCHEME.md) | [PiCo Ontology](./03_pico_ontology_analysis.md)
+
+---
+
+## 1. Executive Summary
+
+This document proposes a **revised PPID structure** that aligns with GHCID's geographic-semantic identifier pattern while accommodating the unique challenges of person identification across historical records.
+
+### 1.1 Key Changes from Original Design
+
+| Aspect | Original (Doc 05) | Revised (This Document) |
+|--------|-------------------|-------------------------|
+| **Format** | Opaque hex (`POID-7a3b-c4d5-...`) | Semantic (`PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG`) |
+| **Type Distinction** | POID vs PRID | ID (temporary) vs PID (persistent) |
+| **Geographic** | None in identifier | Dual anchors: first + last observation |
+| **Temporal** | None in identifier | Century range |
+| **Name** | None in identifier | First + last token of emic label |
+| **Persistence** | Always persistent | May remain ID indefinitely |
+
+### 1.2 Design Philosophy
+
+The revised PPID follows the same principles as GHCID:
+
+1. **Human-readable semantic components** that aid discovery and deduplication
+2. **Geographic anchoring** to physical locations using GeoNames
+3. **Temporal anchoring** to enable disambiguation across time
+4. **Emic authenticity** using names from primary sources
+5. **Collision resolution** via full emic label suffix
+6. **Dual representation** as both semantic string and UUID/numeric
+
+---
+
+## 2. Identifier Type: ID vs PID
+
+### 2.1 The Epistemic Uncertainty Problem
+
+Unlike institutions (which typically have founding documents, legal registrations, and clear organizational boundaries), **persons in historical records often exist in epistemic uncertainty**:
+
+- Incomplete records (many records lost to time)
+- Ambiguous references (common names, no surnames)
+- Conflicting sources (different dates, spellings)
+- Undiscovered archives (unexplored record sets)
+
+### 2.2 Two-Class Identifier System
+
+| Type | Prefix | Description | Persistence | Promotion Path |
+|------|--------|-------------|-------------|----------------|
+| **ID** | `ID-` | Temporary identifier | May change | Can become PID |
+| **PID** | `PID-` | Persistent identifier | Permanent | Cannot revert to ID |
+
+### 2.3 Promotion Criteria: ID → PID
+
+An identifier can be promoted from ID to PID when ALL of the following are satisfied:
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class PIDPromotionCriteria:
+    """
+    Criteria for promoting an ID to a PID.
+    ALL conditions must be True for promotion.
+    """
+    
+    # Geographic anchors
+    first_observation_verified: bool  # Birth or equivalent
+    last_observation_verified: bool   # Death or equivalent
+    
+    # Temporal anchors
+    century_range_established: bool   # From verified observations
+    
+    # Identity anchors
+    emic_label_verified: bool         # From primary sources
+    no_unexplored_archives: bool      # Reasonably assumed: no relevant archives remain unexplored
+    
+    # Quality checks
+    no_unresolved_conflicts: bool     # No conflicting claims
+    multiple_corroborating_sources: bool  # At least 2 independent sources
+    
+    def is_promotable(self) -> bool:
+        return all([
+            self.first_observation_verified,
+            self.last_observation_verified,
+            self.century_range_established,
+            self.emic_label_verified,
+            self.no_unexplored_archives,
+            self.no_unresolved_conflicts,
+            self.multiple_corroborating_sources,
+        ])
+```
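+
+When a promotion is refused, curators need to know *which* criteria failed. The sketch below is a minimal illustration, assuming the seven boolean fields above; `unmet_criteria` is a hypothetical helper, and the class is repeated without comments so the snippet runs on its own:
+
+```python
+from dataclasses import dataclass, fields
+from typing import List
+
+
+@dataclass
+class PIDPromotionCriteria:
+    # Same seven boolean fields as above (comments omitted)
+    first_observation_verified: bool
+    last_observation_verified: bool
+    century_range_established: bool
+    emic_label_verified: bool
+    no_unexplored_archives: bool
+    no_unresolved_conflicts: bool
+    multiple_corroborating_sources: bool
+
+    def unmet_criteria(self) -> List[str]:
+        """Hypothetical helper: names of all criteria that are still False."""
+        return [f.name for f in fields(self) if not getattr(self, f.name)]
+
+    def is_promotable(self) -> bool:
+        return not self.unmet_criteria()
+
+
+# Everything verified except the last (death) observation
+criteria = PIDPromotionCriteria(
+    first_observation_verified=True,
+    last_observation_verified=False,
+    century_range_established=True,
+    emic_label_verified=True,
+    no_unexplored_archives=True,
+    no_unresolved_conflicts=True,
+    multiple_corroborating_sources=True,
+)
+print(criteria.is_promotable())   # False
+print(criteria.unmet_criteria())  # ['last_observation_verified']
+```
+
+Iterating over `dataclasses.fields` means a criterion added to the class later is checked automatically, instead of relying on a hand-maintained list inside `all([...])`.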
+
+### 2.4 Permanent ID Status
+
+Some identifiers may **forever remain IDs** due to:
+
+- **Fragmentary records**: Only one surviving document mentions the person
+- **Uncertain dates**: Cannot establish century range
+- **Unknown location**: Cannot anchor geographically
+- **Anonymous figures**: No emic label recoverable
+- **Ongoing research**: Archives not yet explored
+
+This is acceptable and expected. An ID is still a valid identifier for internal use; it simply cannot be cited as a persistent identifier in scholarly work.
+
+---
+
+## 3. Identifier Structure
+
+### 3.1 Full Format Specification
+
+```
+{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}[-{FULL_EMIC}]
+  │      │    │    │    │    │    │    │    │    │       │
+  │      │    │    │    │    │    │    │    │    │       └── Collision suffix (optional)
+  │      │    │    │    │    │    │    │    │    └── Last Token of emic label
+  │      │    │    │    │    │    │    │    └── First Token of emic label
+  │      │    │    │    │    │    │    └── Century Range (e.g., 19-20)
+  │      │    │    │    │    │    └── Last observation Place (GeoNames 3-letter)
+  │      │    │    │    │    └── Last observation Region (ISO 3166-2)
+  │      │    │    │    └── Last observation Country (ISO 3166-1 alpha-2)
+  │      │    │    └── First observation Place (GeoNames 3-letter)
+  │      │    └── First observation Region (ISO 3166-2)
+  │      └── First observation Country (ISO 3166-1 alpha-2)
+  └── Type: ID or PID
+```
+
+### 3.2 Component Definitions
+
+| Component | Format | Description | Example |
+|-----------|--------|-------------|---------|
+| **TYPE** | `ID` or `PID` | Identifier class | `PID` |
+| **FC** | ISO 3166-1 α2 | First observation country (modern) | `NL` |
+| **FR** | ISO 3166-2 suffix | First observation region | `NH` |
+| **FP** | 3 letters | First observation place (GeoNames) | `AMS` |
+| **LC** | ISO 3166-1 α2 | Last observation country (modern) | `NL` |
+| **LR** | ISO 3166-2 suffix | Last observation region | `NH` |
+| **LP** | 3 letters | Last observation place (GeoNames) | `HAA` |
+| **CR** | `CC-CC` | Century range (CE) | `19-20` |
+| **FT** | UPPERCASE | First token of emic label | `JAN` |
+| **LT** | UPPERCASE | Last token of emic label | `BERG` |
+| **FULL_EMIC** | snake_case | Full emic label (collision only) | `jan_van_den_berg` |
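+
+The components in the table combine in a fixed order. A minimal sketch, assuming every component has already been normalized (uppercase codes and tokens); `build_ppid` and its parameter names are illustrative, not an existing API. An empty last token still yields the trailing hyphen, as in the `UNKNOWN-` examples in this document:
+
+```python
+from typing import Optional, Tuple
+
+
+def build_ppid(
+    id_type: str,                      # "ID" or "PID"
+    first: Tuple[str, str, str],       # (FC, FR, FP), e.g. ("NL", "NH", "AMS")
+    last: Tuple[str, str, str],        # (LC, LR, LP)
+    century_range: str,                # CR, e.g. "19-20"
+    first_token: str,                  # FT, e.g. "JAN"
+    last_token: str,                   # LT, "" for a mononym
+    collision_suffix: Optional[str] = None,  # FULL_EMIC, only on collision
+) -> str:
+    """Join pre-normalized components in the order given in §3.1."""
+    if id_type not in ("ID", "PID"):
+        raise ValueError(f"Unknown identifier type: {id_type}")
+    ppid = "-".join([id_type, *first, *last, century_range, first_token, last_token])
+    if collision_suffix:
+        ppid = f"{ppid}-{collision_suffix}"
+    return ppid
+
+
+print(build_ppid("PID", ("NL", "NH", "AMS"), ("NL", "NH", "HAA"), "19-20", "JAN", "BERG"))
+# PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+```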
+
+### 3.3 Examples
+
+| Person | Full Emic Label | PPID |
+|--------|-----------------|------|
+| Jan van den Berg, born Amsterdam 1895, died Haarlem 1970 | Jan van den Berg | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| Rembrandt, born Leiden 1606, died Amsterdam 1669 | Rembrandt van Rijn | `PID-NL-ZH-LEI-NL-NH-AMS-17-17-REMBRANDT-RIJN` |
+| Maria Sibylla Merian, born Frankfurt 1647, died Amsterdam 1717 | Maria Sibylla Merian | `PID-DE-HE-FRA-NL-NH-AMS-17-18-MARIA-MERIAN` |
+| Unknown soldier, found Normandy, died 1944 | (unknown) | `ID-XX-XX-XXX-FR-NM-OMH-20-20-UNKNOWN-` |
+| Henry VIII, born London 1491, died London 1547 | Henry VIII | `PID-GB-ENG-LON-GB-ENG-LON-15-16-HENRY-VIII` |
+
+**Notes on Emic Labels**:
+- Always use **formal/complete emic names** from primary sources, not modern colloquial short forms
+- "Rembrandt" alone is a modern convention; the emic label from his lifetime was "Rembrandt van Rijn"
+- **Tussenvoegsels (particles)** like "van", "de", "den", "der", "van de", "van den", "van der" are **skipped** when extracting the last token (see §4.5)
+- This follows the same pattern as GHCID abbreviation rules (AGENTS.md Rule 8)
+
+---
+
+## 4. Component Rules
+
+### 4.1 First Observation (Birth or Earliest)
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional
+
+class ObservationType(Enum):
+    BIRTH_CERTIFICATE = "birth_certificate"       # Highest authority
+    BAPTISM_RECORD = "baptism_record"             # Common for pre-civil registration
+    BIRTH_STATEMENT = "birth_statement"           # Stated birth in other document
+    EARLIEST_REFERENCE = "earliest_reference"     # Earliest surviving mention
+    INFERRED = "inferred"                         # Inferred from context
+
+@dataclass
+class FirstObservation:
+    """
+    First observation of a person during their lifetime.
+    Ideally a birth record, but may be another early record.
+    """
+    
+    observation_type: ObservationType
+    
+    # Modern geographic codes (mapped from historical)
+    country_code: str       # ISO 3166-1 alpha-2
+    region_code: str        # ISO 3166-2 subdivision
+    place_code: str         # GeoNames 3-letter code
+    
+    # Original historical reference
+    historical_place_name: str      # As named in source
+    historical_date: str            # As stated in source
+    
+    # Mapping provenance
+    modern_mapping_method: str      # How historical → modern mapping done
+    geonames_id: Optional[int]      # GeoNames ID for place
+    
+    # Quality indicators
+    is_birth_record: bool
+    can_assume_earliest: bool       # No unexplored archives likely
+    source_confidence: float        # 0.0 - 1.0
+    
+    def is_valid_for_pid(self) -> bool:
+        """
+        Determine if this observation is valid for PID generation.
+        """
+        if self.is_birth_record:
+            return True
+        
+        if self.observation_type == ObservationType.EARLIEST_REFERENCE:
+            # Must be able to assume this is actually the earliest
+            return self.can_assume_earliest and self.source_confidence >= 0.8
+        
+        return False
+```
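+
+When several candidate first observations exist, one plausible selection rule is to rank them by the authority order implied by the `ObservationType` comments (birth certificate highest, inference lowest) and break ties by the earliest year. This is a sketch under that assumption; the ranking, the `Candidate` tuple, and `pick_first_observation` are hypothetical and not part of the specification:
+
+```python
+from enum import Enum
+from typing import List, NamedTuple
+
+
+class ObservationType(Enum):
+    BIRTH_CERTIFICATE = "birth_certificate"
+    BAPTISM_RECORD = "baptism_record"
+    BIRTH_STATEMENT = "birth_statement"
+    EARLIEST_REFERENCE = "earliest_reference"
+    INFERRED = "inferred"
+
+
+# Assumed ranking: position in the enum, so 0 = most authoritative
+AUTHORITY_RANK = {t: rank for rank, t in enumerate(ObservationType)}
+
+
+class Candidate(NamedTuple):
+    observation_type: ObservationType
+    year: int
+    place_code: str
+
+
+def pick_first_observation(candidates: List[Candidate]) -> Candidate:
+    """Hypothetical selector: most authoritative type first, then earliest year."""
+    if not candidates:
+        raise ValueError("No candidate observations")
+    return min(candidates, key=lambda c: (AUTHORITY_RANK[c.observation_type], c.year))
+
+
+best = pick_first_observation([
+    Candidate(ObservationType.INFERRED, 1894, "XXX"),
+    Candidate(ObservationType.BAPTISM_RECORD, 1895, "AMS"),
+])
+print(best.place_code)  # AMS
+```
+
+Ranking authority before date means a later baptism record outranks an earlier inferred date, which is in line with `is_valid_for_pid` rejecting inferred observations.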
+
+### 4.2 Last Observation (Death or Latest During Lifetime)
+
+```python
+@dataclass
+class LastObservation:
+    """
+    Last observation of a person during their lifetime or immediately after death.
+    Ideally a death record, but may be the last known reference to the person while alive.
+    """
+    
+    observation_type: ObservationType  # Enum from §4.1; death-side members such as DEATH_CERTIFICATE would still need to be added
+    
+    # Modern geographic codes
+    country_code: str
+    region_code: str
+    place_code: str
+    
+    # Original historical reference
+    historical_place_name: str
+    historical_date: str
+    
+    # Critical distinction
+    is_death_record: bool
+    is_lifetime_observation: bool     # True if person still alive at observation
+    is_immediate_post_death: bool     # First record after death
+    
+    # Quality
+    can_assume_latest: bool
+    source_confidence: float
+    
+    def is_valid_for_pid(self) -> bool:
+        if self.is_death_record:
+            return True
+        
+        if self.is_immediate_post_death:
+            # First mention of death
+            return self.source_confidence >= 0.8
+        
+        if self.is_lifetime_observation:
+            # Last known alive, but not death record
+            return self.can_assume_latest and self.source_confidence >= 0.8
+        
+        return False
+```
+
+### 4.3 Geographic Mapping: Historical → Modern
+
+```python
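+from dataclasses import dataclass
+from datetime import datetime
+
+
+class ManualResearchRequired(Exception):
+    """Raised when a historical place cannot be mapped automatically."""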
+
+@dataclass
+class HistoricalPlaceMapping:
+    """
+    Map historical place names to modern ISO/GeoNames codes.
+    
+    Historical places must be mapped to their MODERN equivalents
+    as of the PPID generation date. This ensures stability even
+    when historical boundaries shifted.
+    """
+    
+    # Historical input
+    historical_name: str
+    historical_date: str  # When the place was referenced
+    
+    # Modern output (at PPID generation time)
+    modern_country_code: str   # ISO 3166-1 alpha-2
+    modern_region_code: str    # ISO 3166-2 suffix (e.g., "NH" not "NL-NH")
+    modern_place_code: str     # 3-letter from GeoNames
+    
+    # GeoNames reference
+    geonames_id: int
+    geonames_name: str         # Modern canonical name
+    geonames_feature_class: str  # P = populated place
+    geonames_feature_code: str   # PPL, PPLA, PPLC, etc.
+    
+    # Mapping provenance
+    mapping_method: str        # "direct", "successor", "enclosing", "manual"
+    mapping_confidence: float
+    mapping_notes: str
+    ppid_generation_date: str  # When mapping was performed
+
+def map_historical_to_modern(
+    historical_name: str,
+    historical_date: str,
+    db
+) -> HistoricalPlaceMapping:
+    """
+    Map a historical place name to modern ISO/GeoNames codes.
+    
+    Strategies (in order):
+    1. Direct match: Place still exists with same name
+    2. Successor: Place renamed but geographically same
+    3. Enclosing: Place absorbed into larger entity
+    4. Manual: Requires human research
+    """
+    
+    # Strategy 1: Direct GeoNames lookup
+    direct_match = db.geonames_search(historical_name)
+    if direct_match and direct_match.is_populated_place:
+        return HistoricalPlaceMapping(
+            historical_name=historical_name,
+            historical_date=historical_date,
+            modern_country_code=direct_match.country_code,
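+            # Caution: GeoNames admin1 codes are not ISO 3166-2 suffixes
+            # (Noord-Holland is admin1 "07" in GeoNames but "NH" in ISO 3166-2),
+            # so a crosswalk is needed here and in the successor branch below
+            modern_region_code=direct_match.admin1_code,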
+            modern_place_code=generate_place_code(direct_match.name),
+            geonames_id=direct_match.geonames_id,
+            geonames_name=direct_match.name,
+            geonames_feature_class=direct_match.feature_class,
+            geonames_feature_code=direct_match.feature_code,
+            mapping_method="direct",
+            mapping_confidence=0.95,
+            mapping_notes="Direct GeoNames match",
+            ppid_generation_date=datetime.utcnow().isoformat()
+        )
+    
+    # Strategy 2: Historical name lookup (renamed places)
+    # e.g., "Batavia" → "Jakarta"
+    historical_match = db.historical_place_names.get(historical_name)
+    if historical_match:
+        modern = db.geonames_by_id(historical_match.modern_geonames_id)
+        return HistoricalPlaceMapping(
+            historical_name=historical_name,
+            historical_date=historical_date,
+            modern_country_code=modern.country_code,
+            modern_region_code=modern.admin1_code,
+            modern_place_code=generate_place_code(modern.name),
+            geonames_id=modern.geonames_id,
+            geonames_name=modern.name,
+            geonames_feature_class=modern.feature_class,
+            geonames_feature_code=modern.feature_code,
+            mapping_method="successor",
+            mapping_confidence=0.90,
+            mapping_notes=f"Historical name '{historical_name}' → modern '{modern.name}'",
+            ppid_generation_date=datetime.utcnow().isoformat()
+        )
+    
+    # Strategy 3 (not implemented in this sketch): if the source provides
+    # coordinates, reverse-geocode to the enclosing modern settlement
+    
+    # Strategy 4: Manual research required
+    raise ManualResearchRequired(
+        f"Cannot automatically map '{historical_name}' ({historical_date}) to modern location"
+    )
+
+
+def generate_place_code(place_name: str) -> str:
+    """
+    Generate 3-letter place code from GeoNames name.
+    
+    Rules (same as GHCID):
+    - Single word: First 3 letters → "Amsterdam" → "AMS"
+    - Multi-word: Initials → "New York" → "NYO" (or "NYC" if registered)
+    - Dutch articles: Article initial + 2 from main → "Den Haag" → "DHA"
+    """
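+    # Implementation follows GHCID rules
+    # See AGENTS.md: "SETTLEMENT STANDARDIZATION: GEONAMES IS AUTHORITATIVE"
+    # Fail loudly instead of silently returning None
+    raise NotImplementedError("Use the GHCID place-code implementation")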
+```
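+
+Since `generate_place_code` is left to the GHCID implementation, the following sketch shows one rule that reproduces all three docstring examples: the first three letters of a single word, otherwise the first letter of the first word plus the first two letters of the second ("New York" → "NYO", "Den Haag" → "DHA"). The rule is inferred from those examples only and is not the normative GHCID algorithm; the `overrides` mapping for registered codes such as "NYC" is a hypothetical stand-in for that registry:
+
+```python
+import re
+import unicodedata
+from typing import Dict, Optional
+
+
+def sketch_place_code(place_name: str, overrides: Optional[Dict[str, str]] = None) -> str:
+    """Illustrative 3-letter place code inferred from the docstring examples."""
+    if overrides and place_name in overrides:
+        return overrides[place_name]  # e.g. a registered code such as "NYC"
+
+    # Strip diacritics, then keep ASCII letters only, word by word
+    decomposed = unicodedata.normalize("NFD", place_name)
+    ascii_name = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
+    words = [re.sub(r"[^A-Za-z]", "", w) for w in ascii_name.split()]
+    words = [w for w in words if w]
+    if not words:
+        raise ValueError(f"Cannot derive a place code from {place_name!r}")
+
+    # NB: names shorter than three letters yield a shorter code; padding is unspecified
+    if len(words) == 1:
+        code = words[0][:3]
+    else:
+        code = words[0][0] + words[1][:2]
+    return code.upper()
+
+
+print(sketch_place_code("Amsterdam"))  # AMS
+print(sketch_place_code("New York"))   # NYO
+print(sketch_place_code("Den Haag"))   # DHA
+print(sketch_place_code("New York", overrides={"New York": "NYC"}))  # NYC
+```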
+
+### 4.4 Century Range Calculation
+
+```python
+def calculate_century_range(
+    first_observation: FirstObservation,
+    last_observation: LastObservation
+) -> str:
+    """
+    Calculate the CE century range for a person's lifetime.
+    
+    Returns format: "CC-CC" (e.g., "19-20" for 1850-1925)
+    
+    Rules:
+    - Centuries are 1-indexed: 1-100 CE = 1st century, 1901-2000 = 20th century
+    - BCE dates: use negative century numbers (e.g., "-5--4" for 5th-4th century BCE).
+      Negative years are historical BCE years (-500 = 500 BCE, no year 0); this is
+      NOT ISO 8601 astronomical numbering, where 0000 = 1 BCE and -0500 = 501 BCE
+    - Range must be from verified observations
+    """
+    
+    def year_to_century(year: int) -> int:
+        """
+        Convert year to century number.
+        
+        Positive years (CE): 1-100 = century 1, 1901-2000 = century 20
+        Negative years (BCE): -500 to -401 = century -5
+        
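+        Note: historical BCE/CE numbering has no year 0; year 1 BCE (-1) is
+        followed directly by year 1 CE. Astronomical (ISO 8601) years must be
+        converted to this numbering before calling this function.
+        """
+        if year == 0:
+            raise ValueError("Year 0 does not exist in BCE/CE numbering")
+        if year > 0:
+            return ((year - 1) // 100) + 1
+        else:
+            # BCE: floor division gives -500 → -5, -101 → -2, -1 → -1
+            return year // 100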
+    
+    def parse_year(date_str: str) -> int:
+        """
+        Extract a year from common date formats.
+        
+        Handles "1895", "1895-03-15", "March 1895", "c. 1895", BCE forms
+        ("-500", "500 BCE", "500 BC", "500 B.C.", "c. 500 BCE", "500 v. Chr.")
+        and short CE years ("14", "14 CE").
+        """
+        import re
+        
+        # BCE indicators; "B\.C\.(?:E\.)?" accepts both "B.C." and "B.C.E."
+        bce_match = re.search(
+            r'(\d+)\s*(BCE|BC|B\.C\.(?:E\.)?|v\.\s?Chr\.)', date_str, re.IGNORECASE
+        )
+        if bce_match:
+            return -int(bce_match.group(1))
+        
+        # Leading minus sign: historical BCE year (not ISO 8601 astronomical)
+        neg_match = re.match(r'\s*-(\d+)', date_str)
+        if neg_match:
+            return -int(neg_match.group(1))
+        
+        # Standard four-digit year
+        match = re.search(r'\b(\d{4})\b', date_str)
+        if match:
+            return int(match.group(1))
+        
+        # Three-digit year (early medieval and ancient dates)
+        match = re.search(r'\b(\d{3})\b', date_str)
+        if match:
+            return int(match.group(1))
+        
+        # One- or two-digit year (e.g. "14 CE"), accepted only when it is the
+        # sole number in the string, so a day of the month is not mistaken for it
+        numbers = re.findall(r'\b\d{1,2}\b', date_str)
+        if len(numbers) == 1:
+            return int(numbers[0])
+        
+        raise ValueError(f"Cannot parse year from: {date_str}")
+    
+    first_year = parse_year(first_observation.historical_date)
+    last_year = parse_year(last_observation.historical_date)
+    
+    first_century = year_to_century(first_year)
+    last_century = year_to_century(last_year)
+    
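+    # Validation: compare years rather than centuries, so that an inverted
+    # range inside one century (e.g. 1950 → 1910) is also rejected
+    if last_year < first_year:
+        raise ValueError(
+            f"Last observation ({last_year}) cannot be before "
+            f"first observation ({first_year})"
+        )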
+    
+    return f"{first_century}-{last_century}"
+
+
+# Examples (CE):
+# 1850 → century 19
+# 1925 → century 20
+# Range: "19-20"
+
+# 1606 → century 17
+# 1669 → century 17
+# Range: "17-17" (same century)
+
+# 1895 → century 19
+# 2005 → century 21
+# Range: "19-21" (centenarian)
+
+# Examples (BCE):
+# -500 (500 BCE) → century -5
+# -401 (401 BCE) → century -5
+# Range: "-5--5" (same century)
+
+# -469 (469 BCE, Socrates birth) → century -5
+# -399 (399 BCE, Socrates death) → century -4
+# Range: "-5--4"
+
+# -100 (100 BCE) → century -1
+# 14 (14 CE) → century 1
+# Range: "-1-1" (crossing BCE/CE boundary)
+```
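+
+The century arithmetic is easy to get wrong at boundaries (1900 vs 1901, 100 BCE vs 101 BCE) and when a source supplies an ISO 8601 astronomical year instead of a historical BCE year. A standalone sketch with boundary checks; `year_to_century` mirrors the nested helper above, and `astronomical_to_historical` is a hypothetical helper that assumes this scheme's convention that negative years are historical BCE years:
+
+```python
+def year_to_century(year: int) -> int:
+    """Historical year (no year 0; -500 = 500 BCE) -> signed century number."""
+    if year == 0:
+        raise ValueError("Year 0 does not exist in BCE/CE numbering")
+    if year > 0:
+        return (year - 1) // 100 + 1
+    # Floor division rounds toward negative infinity: -100 -> -1, -101 -> -2
+    return year // 100
+
+
+def astronomical_to_historical(astro_year: int) -> int:
+    """ISO 8601 / astronomical year (0 = 1 BCE) -> historical year (-1 = 1 BCE)."""
+    return astro_year if astro_year > 0 else astro_year - 1
+
+
+print({y: year_to_century(y) for y in (1900, 1901, 2000, 2001, -1, -100, -101)})
+# {1900: 19, 1901: 20, 2000: 20, 2001: 21, -1: -1, -100: -1, -101: -2}
+print(astronomical_to_historical(0), astronomical_to_historical(-499))
+# -1 -500
+```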
+
+### 4.5 Emic Label Tokens
+
+```python
+from dataclasses import dataclass
+from typing import Optional, List
+import re
+
+@dataclass
+class EmicLabel:
+    """
+    The common contemporary emic label of a person.
+    
+    "Emic" = from the insider perspective, as the person was known
+    during their lifetime in primary sources.
+    
+    "Etic" = from the outsider perspective, how we refer to them now.
+    
+    Prefer emic; fall back to etic only if emic unrecoverable.
+    """
+    
+    full_label: str           # Complete emic label
+    first_token: str          # First word/token
+    last_token: str           # Last word/token (empty if mononym)
+    
+    # Source provenance
+    source_type: str          # "primary" or "etic_fallback"
+    source_document: str      # Reference to source
+    source_date: str          # When source was created
+    
+    # Quality
+    is_from_primary_source: bool
+    is_vernacular: bool       # From vernacular (non-official) source
+    confidence: float
+    
+    @classmethod
+    def from_full_label(cls, label: str, **kwargs) -> 'EmicLabel':
+        """Parse full label into normalized first and last tokens."""
+        # Reuse extract_name_tokens so particles are skipped and diacritics
+        # removed exactly as for the PPID itself
+        first_token, last_token = extract_name_tokens(label)
+        
+        return cls(
+            full_label=label,
+            first_token=first_token,
+            last_token=last_token,
+            **kwargs
+        )
+
+
+def tokenize_emic_label(label: str) -> List[str]:
+    """
+    Tokenize an emic label into words.
+    
+    Rules:
+    - Split on whitespace
+    - Keep every token, including Roman numerals (e.g., "VIII" in "Henry VIII")
+    - Do NOT split compound or hyphenated words
+    - Case is left unchanged; callers uppercase tokens for the identifier
+    """
+    # Basic whitespace split
+    tokens = label.strip().split()
+    
+    # Filter empty tokens
+    tokens = [t for t in tokens if t]
+    
+    return tokens
+
+
+def extract_name_tokens(
+    full_emic_label: str
+) -> tuple[str, str]:
+    """
+    Extract first and last tokens from emic label.
+    
+    Rules:
+    1. First token: First word of the emic label
+    2. Last token: Last word AFTER skipping tussenvoegsels (name particles)
+    
+    Tussenvoegsels are common prefixes in Dutch and other languages that are
+    NOT part of the surname proper. They are skipped when extracting the
+    last token (same as GHCID abbreviation rules - AGENTS.md Rule 8).
+    
+    Examples:
+    - "Jan van den Berg" → ("JAN", "BERG")  # "van den" skipped
+    - "Rembrandt van Rijn" → ("REMBRANDT", "RIJN")  # "van" skipped
+    - "Henry VIII" → ("HENRY", "VIII")
+    - "Maria Sibylla Merian" → ("MARIA", "MERIAN")
+    - "Ludwig van Beethoven" → ("LUDWIG", "BEETHOVEN")  # "van" skipped
+    - "Vincent van Gogh" → ("VINCENT", "GOGH")  # "van" skipped
+    - "Leonardo da Vinci" → ("LEONARDO", "VINCI")  # "da" skipped
+    - "中村 太郎" → ("NAKAMURA", "TARO") only after upstream transliteration to "Nakamura Taro"
+    """
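+    # Tussenvoegsels (name particles) to skip when finding last token.
+    # Matching is per whitespace token, so multi-word entries such as
+    # 'van den' never match as a unit; only single-word entries take effect.
+    # Following GHCID pattern (AGENTS.md Rule 8: Legal Form Filtering)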
+    TUSSENVOEGSELS = {
+        # Dutch
+        'van', 'de', 'den', 'der', 'het', "'t", 'te', 'ten', 'ter',
+        'van de', 'van den', 'van der', 'van het', "van 't",
+        'in de', 'in den', 'in het', "in 't",
+        'op de', 'op den', 'op het', "op 't",
+        # German
+        'von', 'vom', 'zu', 'zum', 'zur', 'von und zu',
+        # French
+        'de', 'du', 'des', 'de la', 'le', 'la', 'les',
+        # Italian
+        'da', 'di', 'del', 'della', 'dei', 'degli', 'delle',
+        # Spanish
+        'de', 'del', 'de la', 'de los', 'de las',
+        # Portuguese
+        'da', 'do', 'dos', 'das', 'de',
+    }
+    
+    tokens = tokenize_emic_label(full_emic_label)
+    
+    if len(tokens) == 0:
+        raise ValueError("Empty emic label")
+    
+    first_token = tokens[0].upper()
+    
+    if len(tokens) == 1:
+        # Mononym
+        last_token = ""
+    else:
+        # Find last token that is NOT a tussenvoegsel
+        # Work backwards from the end
+        last_token = ""
+        for token in reversed(tokens[1:]):  # Skip first token
+            token_lower = token.lower()
+            if token_lower not in TUSSENVOEGSELS:
+                last_token = token.upper()
+                break
+        
+        # If all remaining tokens are tussenvoegsels, use the actual last token
+        if not last_token:
+            last_token = tokens[-1].upper()
+    
+    # Normalize: remove diacritics, special characters
+    first_token = normalize_token(first_token)
+    last_token = normalize_token(last_token)
+    
+    return (first_token, last_token)
+
+
+def normalize_token(token: str) -> str:
+    """
+    Normalize token for PPID.
+    
+    - Remove diacritics (é → E)
+    - Uppercase
+    - Allow alphanumeric only (for Roman numerals like VIII)
+    - Does NOT transliterate: non-Latin scripts must be romanized beforehand
+    """
+    import unicodedata
+    
+    # NFD decomposition + remove combining marks
+    normalized = unicodedata.normalize('NFD', token)
+    ascii_token = ''.join(
+        c for c in normalized 
+        if unicodedata.category(c) != 'Mn'
+    )
+    
+    # Uppercase
+    ascii_token = ascii_token.upper()
+    
+    # Keep only alphanumeric
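+    ascii_token = re.sub(r'[^A-Z0-9]', '', ascii_token)
+    
+    # Guard: a non-empty token that normalizes to nothing (e.g. "中村") signals
+    # missing transliteration, not a legitimately empty token
+    if token and not ascii_token:
+        raise ValueError(f"Token {token!r} must be transliterated to Latin script first")
+    
+    return ascii_token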
+```
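+
+NFD decomposition only strips marks that Unicode encodes as combining characters. Letters such as ø, ł, æ, and þ have no decomposition, so `normalize_token` drops them outright ("Søren" becomes "SREN"), while ß only survives because `str.upper()` maps it to "SS". A minimal sketch of a pre-folding step; the `EXTRA_FOLDS` table and `normalize_token_folded` are assumptions, deliberately small and not a complete transliteration scheme:
+
+```python
+import re
+import unicodedata
+
+# Assumed fold table for Latin letters that NFD does not decompose
+EXTRA_FOLDS = str.maketrans({
+    "ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D",
+    "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE", "ß": "ss", "þ": "th", "Þ": "TH",
+})
+
+
+def normalize_token_folded(token: str) -> str:
+    folded = token.translate(EXTRA_FOLDS)                 # ø -> o, ł -> l, ...
+    decomposed = unicodedata.normalize("NFD", folded)     # é -> e + combining acute
+    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
+    return re.sub(r"[^A-Z0-9]", "", stripped.upper())     # same filter as normalize_token
+
+
+print(normalize_token_folded("Søren"))   # SOREN
+print(normalize_token_folded("Łódź"))    # LODZ
+print(normalize_token_folded("Müller"))  # MULLER
+```
+
+Folding ü to U (rather than the German convention UE) keeps the behaviour of the existing NFD approach; switching conventions would change existing identifiers and is a separate policy decision.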
+
+### 4.6 Emic vs Etic Fallback
+
+```python
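+from dataclasses import dataclass
+from datetime import datetime
+from typing import List, Optional
+
+
+class EmicLabelNotYetResolvable(Exception):
+    """Raised when unexplored vernacular archives block an etic fallback."""
+
+
+@dataclass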
+class EmicLabelResolution:
+    """
+    Resolution of emic label for a person.
+    
+    Priority:
+    1. Emic from primary sources (documents from their lifetime)
+    2. Etic fallback (only if emic truly unrecoverable)
+    """
+    
+    resolved_label: EmicLabel
+    resolution_method: str  # "emic_primary", "emic_vernacular", "etic_fallback"
+    emic_search_exhausted: bool
+    vernacular_sources_checked: List[str]
+    fallback_justification: Optional[str]
+
+def resolve_emic_label(
+    person_observations: List['PersonObservation'],
+    db
+) -> EmicLabelResolution:
+    """
+    Resolve the emic label for a person from their observations.
+    
+    Rules:
+    1. Search all primary sources for emic names
+    2. Prefer most frequently used name in primary sources
+    3. Only use etic fallback if emic truly unrecoverable
+    4. Vernacular sources must have clear pedigrees
+    5. Oral traditions without documentation not valid
+    """
+    
+    # Collect all name mentions from primary sources
+    emic_candidates = []
+    
+    for obs in person_observations:
+        if obs.is_primary_source and obs.is_from_lifetime:
+            for claim in obs.claims:
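+                # NB: counting given names and titles alongside full names lets a
+                # bare given name (e.g. "Jan") win the frequency vote below;
+                # weighting 'full_name' claims higher may be needed
+                if claim.claim_type in ('full_name', 'given_name', 'title'):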
+                    emic_candidates.append({
+                        'label': claim.claim_value,
+                        'source': obs.source_url,
+                        'date': obs.source_date,
+                        'is_vernacular': obs.is_vernacular_source
+                    })
+    
+    if emic_candidates:
+        # Find most common emic label
+        from collections import Counter
+        label_counts = Counter(c['label'] for c in emic_candidates)
+        most_common = label_counts.most_common(1)[0][0]
+        
+        best_candidate = next(
+            c for c in emic_candidates if c['label'] == most_common
+        )
+        
+        return EmicLabelResolution(
+            resolved_label=EmicLabel.from_full_label(
+                most_common,
+                source_type="primary",
+                source_document=best_candidate['source'],
+                source_date=best_candidate['date'],
+                is_from_primary_source=True,
+                is_vernacular=best_candidate['is_vernacular'],
+                confidence=0.95
+            ),
+            resolution_method="emic_primary",
+            emic_search_exhausted=True,
+            vernacular_sources_checked=[c['source'] for c in emic_candidates if c['is_vernacular']],
+            fallback_justification=None
+        )
+    
+    # Check if etic fallback is justified
+    unexplored_vernacular = db.get_unexplored_vernacular_archives(person_observations)
+    
+    if unexplored_vernacular:
+        raise EmicLabelNotYetResolvable(
+            f"Emic label not found in explored sources. "
+            f"Unexplored vernacular archives exist: {unexplored_vernacular}. "
+            f"Cannot use etic fallback until these are explored."
+        )
+    
+    # Etic fallback (rare)
+    etic_label = db.get_most_common_etic_label(person_observations)
+    
+    return EmicLabelResolution(
+        resolved_label=EmicLabel.from_full_label(
+            etic_label,
+            source_type="etic_fallback",
+            source_document="Modern scholarly consensus",
+            source_date=datetime.utcnow().isoformat(),
+            is_from_primary_source=False,
+            is_vernacular=False,
+            confidence=0.70
+        ),
+        resolution_method="etic_fallback",
+        emic_search_exhausted=True,
+        vernacular_sources_checked=[],
+        fallback_justification=(
+            "No emic label found in explored primary sources. "
+            "All known vernacular sources checked. "
+            "Using most common modern scholarly reference."
+        )
+    )
+```
+
+---
+
+## 5. Collision Handling
+
+### 5.1 Collision Detection
+
+Two PPIDs collide when all components except the collision suffix match:
+
+```python
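+import re
+from typing import Set
+
+# A base PPID is TYPE, FC-FR-FP, LC-LR-LP, the century range, FT and LT.
+# BCE centuries are negative (e.g. "-5--4"), so the string cannot simply be
+# split on "-" and counted; match the fixed structure instead.
+_BASE_PPID_RE = re.compile(
+    r'(?:ID|PID)'
+    r'-[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}'   # first observation (FC-FR-FP)
+    r'-[A-Z]{2}-[A-Z0-9]{1,3}-[A-Z]{3}'   # last observation (LC-LR-LP)
+    r'-(?:-?\d+|XX)-(?:-?\d+|XX)'         # century range (CR)
+    r'-[A-Z0-9]+-[A-Z0-9]*'               # FT and LT (LT empty for mononyms)
+)
+
+def detect_collision(new_ppid: str, existing_ppids: Set[str]) -> bool:
+    """
+    Check if new PPID collides with existing identifiers.
+    
+    Collision = same base components (before any collision suffix).
+    """
+    base_new = get_base_ppid(new_ppid)
+    
+    for existing in existing_ppids:
+        base_existing = get_base_ppid(existing)
+        if base_new == base_existing:
+            return True
+    
+    return False
+
+def get_base_ppid(ppid: str) -> str:
+    """Extract base PPID without collision suffix."""
+    # Full PPID may have collision suffix after last token
+    # e.g., "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG-jan_van_den_berg"
+    #       Base: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+    match = _BASE_PPID_RE.match(ppid)
+    if not match:
+        raise ValueError(f"Malformed PPID: {ppid}")
+    return match.group(0)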
+```
+
+### 5.2 Collision Resolution via Full Emic Label
+
+When collision occurs, append full emic label in snake_case:
+
+```python
+from typing import Set
+
+def resolve_collision(
+    base_ppid: str,
+    full_emic_label: str,
+    existing_ppids: Set[str]
+) -> str:
+    """
+    Resolve collision by appending full emic label.
+    
+    Example:
+    Base: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+    Emic: "Jan van den Berg"
+    Result: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG-jan_van_den_berg"
+    """
+    suffix = generate_collision_suffix(full_emic_label)
+    resolved = f"{base_ppid}-{suffix}"
+    
+    # Check if still collides (extremely rare)
+    if resolved in existing_ppids:
+        # Add numeric discriminator
+        counter = 2
+        while f"{resolved}_{counter}" in existing_ppids:
+            counter += 1
+        resolved = f"{resolved}_{counter}"
+    
+    return resolved
+
+def generate_collision_suffix(full_emic_label: str) -> str:
+    """
+    Generate collision suffix from full emic label.
+    
+    Same rules as GHCID collision suffix:
+    - Convert to lowercase snake_case
+    - Remove diacritics
+    - Remove punctuation
+    """
+    import unicodedata
+    import re
+    
+    # Normalize unicode
+    normalized = unicodedata.normalize('NFD', full_emic_label)
+    ascii_name = ''.join(
+        c for c in normalized 
+        if unicodedata.category(c) != 'Mn'
+    )
+    
+    # Lowercase
+    lowercase = ascii_name.lower()
+    
+    # Remove punctuation
+    no_punct = re.sub(r"[\u2018\u2019'`\",.:;!?()\[\]{}]", '', lowercase)
+    
+    # Replace spaces with underscores
+    underscored = re.sub(r'\s+', '_', no_punct)
+    
+    # Remove non-alphanumeric except underscore
+    clean = re.sub(r'[^a-z0-9_]', '', underscored)
+    
+    # Collapse multiple underscores
+    final = re.sub(r'_+', '_', clean).strip('_')
+    
+    return final
+```
+
+---
+
+## 6. Unknown Components: XX and XXX Placeholders
+
+### 6.1 When Components Are Unknown
+
+Unlike GHCID (where `XX`/`XXX` are temporary and require research), PPID may have permanently unknown components:
+
+| Scenario | Placeholder | Can be PID? |
+|----------|-------------|-------------|
+| Unknown birth country | `XX` | No (remains ID) |
+| Unknown birth region | `XX` | No (remains ID) |
+| Unknown birth place | `XXX` | No (remains ID) |
+| Unknown death country | `XX` | No (remains ID) |
+| Unknown death region | `XX` | No (remains ID) |
+| Unknown death place | `XXX` | No (remains ID) |
+| Unknown century | `XX-XX` | No (remains ID) |
+| Unknown first token | `UNKNOWN` | No (remains ID) |
+| Unknown last token | (empty) | Yes (if mononym) |
+
+### 6.2 ID Examples with Unknown Components
+
+```
+ID-XX-XX-XXX-FR-NOR-OMH-20-20-UNKNOWN-    # Unknown soldier, Normandy
+ID-NL-NH-AMS-XX-XX-XXX-17-17-REMBRANDT-   # Rembrandt (mononym), death place not yet recorded
+ID-XX-XX-XXX-XX-XX-XXX-XX-XX-UNKNOWN-     # Completely unknown person
+```
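+
+The table above reduces to a mechanical check on the identifier string. The following minimal sketch assumes CE-only (unsigned) centuries, so the string can be split on `-`. The `blocking_placeholders` helper is hypothetical. It deliberately does not report an empty last token, because mononyms may still become PIDs:
+
+```python
+# Field order in a CE-only PPID split on "-" (hypothetical helper)
+FIELD_NAMES = [
+    'type',
+    'first_country', 'first_region', 'first_place',
+    'last_country', 'last_region', 'last_place',
+    'first_century', 'last_century',
+    'first_token', 'last_token',
+]
+
+# Placeholder value that marks each field as unknown
+PLACEHOLDERS = {
+    'first_country': 'XX', 'first_region': 'XX', 'first_place': 'XXX',
+    'last_country': 'XX', 'last_region': 'XX', 'last_place': 'XXX',
+    'first_century': 'XX', 'last_century': 'XX',
+    'first_token': 'UNKNOWN',
+}
+
+
+def blocking_placeholders(ppid: str) -> list[str]:
+    """Return the fields whose placeholders keep this identifier in the ID class."""
+    # maxsplit keeps any collision suffix attached to the last token
+    parts = ppid.split('-', len(FIELD_NAMES) - 1)
+    fields = dict(zip(FIELD_NAMES, parts))
+    return [name for name, ph in PLACEHOLDERS.items() if fields.get(name) == ph]
+
+
+print(blocking_placeholders("ID-NL-NH-AMS-XX-XX-XXX-17-17-REMBRANDT-"))
+# ['last_country', 'last_region', 'last_place']
+```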
+
+---
+
+## 7. UUID and Numeric Generation
+
+### 7.1 Dual Representation (Same as GHCID)
+
+Every PPID has three representations: the semantic string and two forms derived from it.
+
+| Format | Purpose | Example (illustrative) |
+|--------|---------|------------------------|
+| **Semantic String** | Human-readable | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| **UUID v5** | Linked data, URIs | `a1b2c3d4-e5f6-5a1b-9c2d-3e4f5a6b7c8d` |
+| **Numeric (64-bit)** | Database keys, CSV | `1234567890123456789` |
+
+### 7.2 Generation Algorithm
+
+```python
+import uuid
+import hashlib
+
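+# PPID namespace UUID (different from GHCID namespace).
+# Placeholder: this value is a widely reused sample UUID; a dedicated
+# namespace UUID should be minted before any identifiers are published.
+PPID_NAMESPACE = uuid.UUID('f47ac10b-58cc-4372-a567-0e02b2c3d479')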
+
+def generate_ppid_identifiers(semantic_ppid: str) -> dict:
+    """
+    Generate all identifier formats from semantic PPID string.
+    
+    Returns:
+        {
+            'semantic': 'PID-NL-NH-AMS-...',
+            'uuid_v5': '550e8400-...',
+            'numeric': 213324328442227739
+        }
+    """
+    # UUID v5 from semantic string
+    ppid_uuid = uuid.uuid5(PPID_NAMESPACE, semantic_ppid)
+    
+    # Numeric from SHA-256 (64-bit)
+    sha256 = hashlib.sha256(semantic_ppid.encode()).digest()
+    numeric = int.from_bytes(sha256[:8], byteorder='big')
+    
+    return {
+        'semantic': semantic_ppid,
+        'uuid_v5': str(ppid_uuid),
+        'numeric': numeric
+    }
+
+
+# Example (the output values shown are illustrative placeholders, not computed hashes):
+ppid = "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+identifiers = generate_ppid_identifiers(ppid)
+# {
+#     'semantic': 'PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG',
+#     'uuid_v5': 'a1b2c3d4-e5f6-5a1b-9c2d-3e4f5a6b7c8d',
+#     'numeric': 1234567890123456789
+# }
+```
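+
+`int.from_bytes(sha256[:8], byteorder='big')` produces an unsigned value in the range `0 .. 2**64 - 1`. That range can overflow signed 64-bit columns such as PostgreSQL `BIGINT`, whose maximum is `2**63 - 1`. If numeric keys must be stored that way, one option is to keep only the low 63 bits. This is an assumption; the GHCID specification is not known to prescribe it:
+
+```python
+import hashlib
+
+SIGNED_64_MAX = 2**63 - 1
+
+
+def ppid_numeric_signed(semantic_ppid: str) -> int:
+    """First 8 bytes of SHA-256, masked to 63 bits so the value fits a signed BIGINT."""
+    digest = hashlib.sha256(semantic_ppid.encode('utf-8')).digest()
+    unsigned = int.from_bytes(digest[:8], byteorder='big')
+    return unsigned & SIGNED_64_MAX  # clear the top bit
+
+
+value = ppid_numeric_signed("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")
+print(0 <= value <= SIGNED_64_MAX)  # True
+```
+
+Dropping one bit halves the key space, so numeric collisions become slightly more likely. The UUID v5 remains the collision-resistant reference form.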
+
+---
+
+## 8. Relationship to Person Observations
+
+### 8.1 Distinction: PPID vs Observation Identifiers
+
+| Identifier | Purpose | Structure | Persistence |
+|------------|---------|-----------|-------------|
+| **PPID** | Identify a person (reconstruction) | Geographic + temporal + emic | Permanent (if PID) |
+| **Observation ID** | Identify a specific source observation | GHCID-based + RiC-O | Permanent |
+
+### 8.2 Observation Identifier Structure (Forthcoming)
+
+Observation identifiers will use a different pattern:
+
+```
+{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RICO_RECORD_PATH}
+```
+
+Where:
+- **REPOSITORY_GHCID**: GHCID of the institution holding the record
+- **CREATOR_GHCID**: GHCID of the institution that created the record (may be same)
+- **RICO_RECORD_PATH**: RiC-O derived path to RecordSet/Record/RecordPart
+
+Example:
+```
+NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
+│              │              │
+│              │              └── RiC-O path: fonds/series/file/item
+│              └── Creator (same institution)
+└── Repository
+```
+
+This is **separate from PPID** and will be specified in a future document.
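+
+The three parts are separated by `/`, so the proposed pattern can be split mechanically. The following sketch assumes that GHCIDs never contain `/` and that the RiC-O path is simply the remainder of the string. `ObservationId` and `parse_observation_id` are hypothetical names, pending the future specification:
+
+```python
+from typing import NamedTuple
+
+
+class ObservationId(NamedTuple):
+    repository_ghcid: str
+    creator_ghcid: str
+    record_path: list[str]  # RiC-O derived path segments
+
+
+def parse_observation_id(obs_id: str) -> ObservationId:
+    """Split '{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RICO_RECORD_PATH}'."""
+    parts = obs_id.split('/', 2)
+    if len(parts) != 3 or not all(parts):
+        raise ValueError(f"Incomplete observation identifier: {obs_id!r}")
+    repository, creator, path = parts
+    return ObservationId(repository, creator, path.split('/'))
+
+
+obs = parse_observation_id(
+    "NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045"
+)
+print(obs.repository_ghcid == obs.creator_ghcid)  # True
+print(obs.record_path)  # ['burgerlijke-stand', 'geboorten', '1895', '003', '045']
+```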
+
+---
+
+## 9. Comparison with Original POID/PRID Design
+
+### 9.1 What Changes
+
+| Aspect | POID/PRID (Doc 05) | Revised PPID (This Doc) |
+|--------|-------------------|-------------------------|
+| **Identifier opacity** | Opaque (no semantic content) | Semantic (human-readable) |
+| **Geographic anchoring** | None | Dual (birth + death locations) |
+| **Temporal anchoring** | None | Century range |
+| **Name in identifier** | None | First + last token |
+| **Type prefix** | POID/PRID | ID/PID |
+| **Observation vs Person** | Different identifier types | Completely separate systems |
+| **UUID backing** | Primary | Secondary (derived) |
+| **Collision handling** | UUID collision (rare) | Semantic collision (more common) |
+
+### 9.2 What Stays the Same
+
+- Dual identifier generation (UUID + numeric)
+- Deterministic generation from input
+- Permanent persistence (once PID)
+- Integration with GHCID for institution links
+- Claim-based provenance model
+- PiCo ontology alignment
+
+### 9.3 Transition Plan
+
+If this revised structure is adopted:
+
+1. **Document 05** becomes historical reference
+2. **This document** becomes the authoritative identifier spec
+3. No existing identifiers need migration (this is a new system)
+4. Code examples in other documents need updates
+
+---
+
+## 10. Implementation Considerations
+
+### 10.1 Character Set and Length
+
+```python
+# Maximum lengths
+MAX_COUNTRY_CODE = 2      # ISO 3166-1 alpha-2
+MAX_REGION_CODE = 3       # ISO 3166-2 suffix (some are 3 chars)
+MAX_PLACE_CODE = 3        # GeoNames convention
+MAX_CENTURY_RANGE = 5     # "XX-XX"
+MAX_TOKEN_LENGTH = 20     # Reasonable limit for names
+MAX_COLLISION_SUFFIX = 50 # Full emic label
+
+# Maximum total PPID length (without collision suffix)
+# "PID-" + "XX-XXX-XXX-" * 2 + "XX-XX-" + "TOKEN-TOKEN"
+# = 4 + (2+3+3+3)*2 + 6 + (20+1+20) = 73 characters
+
+# With collision suffix ("-" + 50 characters): 124 characters max
+```
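+
+The length budget can be checked by building a worst-case string from the same constants. This minimal sketch assumes 3-letter region codes in both locations. It ignores the `_N` counter that is added on a second collision:
+
+```python
+MAX_TOKEN_LENGTH = 20
+MAX_COLLISION_SUFFIX = 50
+
+# Worst case: 3-letter region codes and two maximum-length name tokens
+location = "XX-XXX-XXX-"  # country, region, place, each followed by a separator
+worst_case = (
+    "PID-"
+    + location * 2
+    + "XX-XX-"  # century range plus separator
+    + "A" * MAX_TOKEN_LENGTH + "-" + "B" * MAX_TOKEN_LENGTH
+)
+with_suffix = worst_case + "-" + "c" * MAX_COLLISION_SUFFIX
+
+print(len(location), len(worst_case), len(with_suffix))  # 11 73 124
+```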
+
+### 10.2 Validation Regex
+
+```python
+import re
+
+PPID_PATTERN = re.compile(
+    r'^(ID|PID)-'                    # Type
+    r'([A-Z]{2}|XX)-'                # First country
+    r'([A-Z]{2,3}|XX)-'              # First region
+    r'([A-Z]{3}|XXX)-'               # First place
+    r'([A-Z]{2}|XX)-'                # Last country
+    r'([A-Z]{2,3}|XX)-'              # Last region
+    r'([A-Z]{3}|XXX)-'               # Last place
+    r'(\d{1,2}-\d{1,2}|XX-XX)-'      # Century range
+    r'([A-Z0-9]+)-'                  # First token
+    r'([A-Z0-9]*)'                   # Last token (may be empty)
+    r'(-[a-z0-9_]+)?$'               # Collision suffix (optional)
+)
+
+def validate_ppid(ppid: str) -> tuple[bool, str]:
+    """Validate PPID format and century-range semantics."""
+    # fullmatch() rather than match(): with match(), "$" also accepts a trailing newline
+    m = PPID_PATTERN.fullmatch(ppid)
+    if not m:
+        return False, "Invalid PPID format"
+
+    # Additional semantic validation on the century range (capture group 8)
+    century_range = m.group(8)
+    if century_range != "XX-XX":
+        # The pattern guarantees two unsigned integers here
+        first_c, last_c = map(int, century_range.split('-'))
+        if last_c < first_c:
+            return False, "Last century cannot be before first century"
+        if first_c < 1 or last_c > 22:  # Reasonable bounds (CE only)
+            return False, "Century out of reasonable range"
+
+    return True, "Valid"
+```
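+
+Callers will often need the individual components rather than a yes/no answer. The following sketch uses named groups over the same structure. It handles CE centuries only, and `[A-Z]{2}` already admits the `XX` placeholder. `PPID_NAMED` and `parse_ppid` are hypothetical names:
+
+```python
+import re
+
+# Same structure as PPID_PATTERN, with named groups (CE centuries only)
+PPID_NAMED = re.compile(
+    r'(?P<type>ID|PID)-'
+    r'(?P<first_country>[A-Z]{2})-'
+    r'(?P<first_region>[A-Z]{2,3})-'
+    r'(?P<first_place>[A-Z]{3})-'
+    r'(?P<last_country>[A-Z]{2})-'
+    r'(?P<last_region>[A-Z]{2,3})-'
+    r'(?P<last_place>[A-Z]{3})-'
+    r'(?P<centuries>\d{1,2}-\d{1,2}|XX-XX)-'
+    r'(?P<first_token>[A-Z0-9]+)-'
+    r'(?P<last_token>[A-Z0-9]*)'
+    r'(?:-(?P<collision_suffix>[a-z0-9_]+))?'
+)
+
+
+def parse_ppid(ppid: str) -> dict:
+    """Return the PPID components as a dict, or raise ValueError."""
+    m = PPID_NAMED.fullmatch(ppid)
+    if m is None:
+        raise ValueError(f"Invalid PPID format: {ppid!r}")
+    return m.groupdict()
+
+
+parts = parse_ppid("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")
+print(parts['first_place'], parts['last_place'], parts['centuries'])  # AMS HAA 19-20
+print(parts['collision_suffix'])  # None
+```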
+
+---
+
+## 11. Open Questions
+
+### 11.1 BCE Dates
+
+How to handle persons from before Common Era?
+
+**Options**:
+1. Negative century numbers: `-5--4` for 5th-4th century BCE
+2. BCE prefix: `BCE5-BCE4`
+3. Separate identifier scheme for ancient persons
+
+### 11.2 Non-Latin Name Tokens
+
+How to handle names in non-Latin scripts?
+
+**Options**:
+1. Require transliteration (current approach)
+2. Allow Unicode tokens with normalization
+3. Dual representation (original + transliterated)
+
+### 11.3 Disputed Locations
+
+What if birth/death locations are historically disputed?
+
+**Options**:
+1. Use most likely location with note
+2. Use `XX`/`XXX` until resolved
+3. Create multiple IDs for each interpretation
+
+### 11.4 Living Persons
+
+How to handle persons still alive (no death observation)?
+
+**Options**:
+1. Cannot be PID until death
+2. Use `XX-XX-XXX` for death location, current century for range
+3. Separate identifier class for living persons
+
+---
+
+## 12. References
+
+### GHCID Documentation
+- [GHCID PID Scheme](../../GHCID_PID_SCHEME.md)
+- [AGENTS.md: Persistent Identifiers](../../AGENTS.md#persistent-identifiers-ghcid)
+
+### Related PPID Documents
+- [Original Identifier Structure (superseded)](./05_identifier_structure_design.md)
+- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+- [Cultural Naming Conventions](./04_cultural_naming_conventions.md)
+
+### Standards
+- ISO 3166-1: Country codes
+- ISO 3166-2: Subdivision codes
+- GeoNames: Geographic names database
diff --git a/docs/plan/person_pid/11_pico_ppid_comparison.md b/docs/plan/person_pid/11_pico_ppid_comparison.md
new file mode 100644
index 0000000000..ccc6a74413
--- /dev/null
+++ b/docs/plan/person_pid/11_pico_ppid_comparison.md
@@ -0,0 +1,475 @@
+# PiCo vs PPID: Comparative Analysis
+
+**Version**: 0.1.0  
+**Last Updated**: 2025-01-09  
+**Related**: [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md) | [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+
+---
+
+## 1. Executive Summary
+
+This document compares the **PiCo (Persons in Context)** ontology developed by CBG|Centrum voor Familiegeschiedenis with our proposed **PPID (Person Persistent Identifier)** system. The analysis draws on the PiCo specification and on PiCo's implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.
+
+### 1.1 Key Finding
+
+PiCo and PPID serve **complementary purposes**:
+
+| System | Primary Purpose | Identifier Style | Scope |
+|--------|-----------------|------------------|-------|
+| **PiCo** | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
+| **PPID** | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |
+
+**Recommendation**: PPID should **adopt PiCo's ontological distinctions** (PersonObservation vs PersonReconstruction) while using its own **semantic identifier format** aligned with GHCID conventions.
+
+---
+
+## 2. PiCo Architecture (From Research)
+
+### 2.1 Core Classes
+
+From the PiCo specification at `personsincontext.org/model`:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                           PiCo MODEL                            │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                          Person                           │  │
+│  │           (Container class - not used directly)           │  │
+│  │                                                           │  │
+│  │  ┌─────────────────────┐    ┌─────────────────────────┐   │  │
+│  │  │ PersonObservation   │    │ PersonReconstruction    │   │  │
+│  │  │                     │    │                         │   │  │
+│  │  │ - Data as found on  │    │ - Curated identity      │   │  │
+│  │  │   Source            │    │ - Links multiple        │   │  │
+│  │  │ - hadPrimarySource  │    │   observations          │   │  │
+│  │  │ - hasRole           │    │ - wasDerivedFrom        │   │  │
+│  │  │ - hasAge            │    │ - wasGeneratedBy        │   │  │
+│  │  │ - hasOccupation     │    │ - wasRevisionOf         │   │  │
+│  │  └─────────────────────┘    └─────────────────────────┘   │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                          Source                           │  │
+│  │  (schema:ArchiveComponent)                                │  │
+│  │  - name, dateCreated, holdingArchive, associatedMedia     │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                     PersonName (PNV)                      │  │
+│  │  - literalName, givenName, baseSurname, surnamePrefix     │  │
+│  │  - patronym, initials                                     │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### 2.2 PiCo Identifier Structure in Open Archives
+
+From the Open Archives API documentation:
+
+```
+URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]
+
+Examples:
+- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
+- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c
+
+Components:
+- rat = Regionaal Archief Tilburg (3-letter archive code)
+- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
+- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
+```
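+
+The following sketch splits such a URI into its components. It assumes lowercase 3-letter archive codes and hyphenated hexadecimal UUIDs, as in the examples above. The `OA_URI` pattern and the `parse_oa_uri` helper are illustrative and are not part of the Open Archives API:
+
+```python
+import re
+
+OA_URI = re.compile(
+    r'https://www\.openarchieven\.nl/'
+    r'(?P<archive>[a-z]{3}):'                                 # 3-letter archive code
+    r'(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})'  # record UUID
+    r'(?:/(?P<token>[^/]+))?'                                 # optional token
+)
+
+
+def parse_oa_uri(uri: str) -> dict:
+    """Return archive code, record UUID and optional token, or raise ValueError."""
+    m = OA_URI.fullmatch(uri)
+    if m is None:
+        raise ValueError(f"Not an Open Archives record URI: {uri!r}")
+    return m.groupdict()
+
+
+print(parse_oa_uri(
+    "https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649/ttl:pico"
+))
+# {'archive': 'rat', 'uuid': '48c2b836-385f-11e0-bcd1-8edf61960649', 'token': 'ttl:pico'}
+```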
+
+### 2.3 PiCo PersonObservation Example (Actual Data)
+
+From Open Archives API response:
+
+```turtle
+@prefix oa: <https://www.openarchieven.nl/id/> .
+@prefix pico: <https://personsincontext.org/model#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .
+@prefix sdo: <https://schema.org/> .
+
+oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
+    a pico:PersonObservation ;
+    prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
+    pico:hasRole "Moeder" ;
+    sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
+    sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
+    sdo:gender sdo:Female ;
+    sdo:name "Cornelia Verhulst" ;
+    sdo:familyName "Verhulst" ;
+    sdo:givenName "Cornelia" .
+```
+
+### 2.4 PiCo PersonReconstruction Example
+
+From PiCo specification:
+
+```turtle
+cbg:person_reconstruction_2
+    a pico:PersonReconstruction ;
+    sdo:name "Anna Maria Koppen" ;
+    sdo:familyName "Koppen" ;
+    sdo:givenName "Anna" ;
+    sdo:gender sdo:Female ;
+    sdo:birthPlace "Haarlem" ;
+    sdo:birthDate "1860-03-31"^^xsd:date ;
+    sdo:deathPlace "Detroit, VSA" ;
+    sdo:deathDate "1926"^^xsd:gYear ;
+    prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1, 
+                        cbg:NL-HaCBG_1755_0341_142_po_1 ;
+    prov:wasGeneratedBy cbg:reconstruction_activity_01 .
+```
+
+---
+
+## 3. Detailed Comparison
+
+### 3.1 Identifier Format
+
+| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
+|--------|--------------------------|-----------------|
+| **Format** | `{archive}:{uuid}` | `{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}` |
+| **Example** | `rat:48c2b836-385f-11e0-bcd1-8edf61960649` | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| **Human Readable** | No (opaque UUID) | Yes (semantic components) |
+| **Archive Prefix** | Yes (3-letter code) | No (implicit via source) |
+| **Geographic** | No | Yes (birth + death locations) |
+| **Temporal** | No | Yes (century range) |
+| **Name** | No | Yes (first + last token) |
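+
+Every PPID component occupies a fixed position, so assembling an identifier is a plain join on `-`. The following sketch relies on that reading of the format. `build_ppid` and its parameters are hypothetical. Tokens are upper-cased, and a mononym gets an empty last token, which produces the trailing `-`:
+
+```python
+def build_ppid(
+    kind: str,                        # "ID" or "PID"
+    first_loc: tuple[str, str, str],  # (country, region, place), e.g. ("NL", "NH", "AMS")
+    last_loc: tuple[str, str, str],
+    centuries: tuple[int, int],       # (first, last), e.g. (19, 20)
+    first_token: str,
+    last_token: str = "",             # empty for mononyms
+) -> str:
+    """Join the fixed-position PPID fields with '-'."""
+    if kind not in ("ID", "PID"):
+        raise ValueError("kind must be 'ID' or 'PID'")
+    parts = [
+        kind,
+        *first_loc,
+        *last_loc,
+        str(centuries[0]),
+        str(centuries[1]),
+        first_token.upper(),
+        last_token.upper(),
+    ]
+    return "-".join(parts)
+
+
+print(build_ppid("PID", ("NL", "NH", "AMS"), ("NL", "NH", "HAA"), (19, 20), "Jan", "Berg"))
+# PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+```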
+
+### 3.2 Conceptual Model
+
+| Concept | PiCo | PPID |
+|---------|------|------|
+| **Raw Observation** | `PersonObservation` | Observation (separate system) |
+| **Curated Identity** | `PersonReconstruction` | `PID` (promoted from `ID`) |
+| **Temporary State** | Not explicit | `ID` class |
+| **Permanent State** | All URIs persistent | `PID` class only |
+| **Provenance** | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
+| **Name Vocabulary** | PNV (Person Name Vocabulary) | Emic labels from sources |
+
+### 3.3 Persistence Philosophy
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **All identifiers persistent?** | Yes | No - only PID class |
+| **Temporary identifiers?** | No explicit concept | Yes - ID class |
+| **Promotion mechanism?** | N/A | ID → PID when criteria met |
+| **Epistemic uncertainty?** | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
+| **Living persons?** | Can have PersonReconstruction | Must remain ID until death |
+
+### 3.4 Geographic Handling
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **In identifier?** | No | Yes |
+| **In properties?** | Yes (birthPlace, deathPlace) | Also yes |
+| **Format** | Free text or URI | ISO 3166-1/2 + GeoNames |
+| **Historical mapping?** | Encouraged (link to thesaurus) | Required (historical → modern) |
+| **Example** | `sdo:birthPlace "Haarlem"` | `...-NL-NH-HAA-...` |
+
+### 3.5 Temporal Handling
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **In identifier?** | No | Yes (century range) |
+| **Date format** | ISO 8601 (xsd:date) | Century numbers |
+| **BCE support** | Via negative years | Via negative centuries (-5--4) |
+| **Precision** | Day-level possible | Century-level only in the identifier string |
+| **Example** | `sdo:birthDate "1860-03-31"^^xsd:date` | `...-19-20-...` |
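+
+Turning dates into the century range requires a convention that this comparison leaves open. The following sketch assumes strict historical counting: 1801 to 1900 is the 19th century, so 1900 maps to 19. It also assumes that BCE years are given as negative historical years without a year 0, so -1 is 1 BCE. A project could just as reasonably treat 1900 as the first year of the 20th century:
+
+```python
+def century_of(year: int) -> int:
+    """Historical century number; negative years are BCE (no year 0)."""
+    if year == 0:
+        raise ValueError("There is no year 0 in historical numbering")
+    if year > 0:
+        return (year - 1) // 100 + 1
+    return -((-year - 1) // 100 + 1)
+
+
+def century_range(first_year: int, last_year: int) -> str:
+    """Format the century range used in the identifier, e.g. '19-20'."""
+    return f"{century_of(first_year)}-{century_of(last_year)}"
+
+
+print(century_range(1860, 1926))  # 19-20  (Anna Maria Koppen, section 2.4)
+print(century_range(-450, 120))   # -5-2
+```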
+
+---
+
+## 4. Key Differences Explained
+
+### 4.1 Why PiCo Uses Opaque UUIDs
+
+PiCo's design goals (from GitHub README):
+
+1. **Successor to A2A**: Designed to replace XML-based Archive-to-Archive standard
+2. **Genealogical focus**: Primary use case is WieWasWie ancestor search
+3. **Linked Data**: Interoperability via RDF, not human-readable identifiers
+4. **Archive-centric**: Identifiers include archive code prefix
+
+PiCo's UUID approach is appropriate for:
+- Massive genealogical databases (millions of records)
+- Automated conversion from A2A
+- Machine-to-machine data exchange
+
+### 4.2 Why PPID Uses Semantic Identifiers
+
+PPID's design goals:
+
+1. **GHCID alignment**: Consistent identifier philosophy across the GLAM project
+2. **Heritage sector focus**: Staff of heritage institutions, historical figures
+3. **Human discovery**: Identifiers aid browsing and deduplication
+4. **Epistemic honesty**: Explicit distinction between ID (uncertain) and PID (verified)
+5. **Scholarly citation**: Identifiers can be meaningfully cited in publications
+
+PPID's semantic approach is appropriate for:
+- Smaller, curated datasets
+- Human curation workflows
+- Cross-system deduplication
+- Scholarly reference
+
+### 4.3 The ID/PID Distinction (Unique to PPID)
+
+PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:
+
+```
+PiCo:
+  PersonObservation (always permanent)
+      ↓ prov:wasDerivedFrom
+  PersonReconstruction (always permanent)
+
+PPID:
+  Observation (separate system, permanent)
+      ↓
+  ID (temporary, may change)
+      ↓ promotion when criteria met
+  PID (permanent, never changes)
+```
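+
+The ID → PID step is a one-way transition. The following minimal sketch assumes that promotion only swaps the `ID` prefix for `PID` and leaves the rest of the string intact. The promotion criteria themselves are out of scope here. `PersonRecord` and `promote` are hypothetical:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class PersonRecord:
+    identifier: str  # "ID-..." (temporary) or "PID-..." (permanent)
+
+    @property
+    def is_persistent(self) -> bool:
+        return self.identifier.startswith("PID-")
+
+    def promote(self) -> "PersonRecord":
+        """Return the PID form; only ID -> PID is allowed, never the reverse."""
+        if self.is_persistent:
+            raise ValueError("Already a PID; PIDs never change")
+        if not self.identifier.startswith("ID-"):
+            raise ValueError(f"Not a PPID: {self.identifier!r}")
+        return PersonRecord("P" + self.identifier)
+
+
+rec = PersonRecord("ID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")
+print(rec.promote().identifier)  # PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+```
+
+Making the record `frozen` means an existing identifier object cannot be edited in place. Promotion always produces a new record.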
+
+**Why this matters for heritage sector**:
+
+- **Living persons**: Cannot have verified death observation → must remain ID
+- **Incomplete records**: May never have enough data for PID promotion
+- **Ongoing research**: Archives not yet explored → cannot claim PID status
+- **Scholarly integrity**: Prevents overclaiming certainty
+
+---
+
+## 5. Integration Recommendations
+
+### 5.1 Adopt PiCo Ontological Distinctions
+
+PPID should use PiCo's class hierarchy:
+
+```turtle
+@prefix ppid: <https://ppid.org/> .
+@prefix pico: <https://personsincontext.org/model#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+
+# PPID extends PiCo
+ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
+ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .
+
+# PPID person records link to the source observations they were derived from
+ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
+    rdfs:range pico:PersonObservation .
+```
+
+### 5.2 Maintain PPID Semantic Identifier Format
+
+Do not adopt PiCo's opaque UUID format. Keep the semantic, GHCID-aligned format:
+
+```
+PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+```
+
+**Rationale**: GHCID project-wide consistency, human discoverability, scholarly citation.
+
+### 5.3 Use PNV for Name Properties
+
+Adopt PiCo's use of Person Name Vocabulary for structured name data:
+
+```turtle
+ppid:PID-... pnv:hasName [
+    a pnv:PersonName ;
+    pnv:literalName "Jan van den Berg" ;
+    pnv:givenName "Jan" ;
+    pnv:surnamePrefix "van den" ;
+    pnv:baseSurname "Berg"
+] .
+```
+
+### 5.4 Use PROV-O for Provenance
+
+Adopt PiCo's PROV-O patterns for reconstruction provenance:
+
+```turtle
+ppid:PID-NL-NH-AMS-...
+    prov:wasDerivedFrom <observation-1>, <observation-2> ;
+    prov:wasGeneratedBy [
+        a prov:Activity ;
+        prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
+        prov:wasAssociatedWith ppid:curator-001
+    ] .
+```
+
+### 5.5 Separate Observation Identifiers
+
+As noted in the revised PPID design, observations use a **different identifier system**:
+
+```
+{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}
+
+Example:
+NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
+```
+
+This is distinct from PiCo's `{archive}:{uuid}` but serves similar purposes.
+
+---
+
+## 6. Resolved Open Questions
+
+The questions left open in the PPID-GHCID alignment document are resolved as follows:
+
+### 6.1 BCE Date Handling
+
+**Resolution**: Use negative century numbers.
+
+```
+Format: {first_century}-{last_century}
+
+Examples:
+- 5th century BCE to 4th century BCE: "-5--4"
+- 1st century BCE to 1st century CE: "-1-1"
+- 5th century BCE to 3rd century CE: "-5-3"
+```
+
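+This mirrors the ISO 8601 expanded year representation, which marks years before 0000 with a minus sign. Note that ISO 8601 uses astronomical year numbering (0000 = 1 BCE, -0001 = 2 BCE), whereas these century numbers follow the historical convention, in which there is no century 0.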
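+
+`-` serves as both the range separator and the BCE sign, so a value such as `-1-1` cannot be parsed with a plain `split('-')`. The following sketch shows one unambiguous parse. Its regex requires each century to be a non-zero integer with an optional leading minus. `CENTURY_RANGE` and `parse_century_range` are illustrative names. The unsigned `\d{1,2}-\d{1,2}` century pattern used for PPID validation would need a similar change:
+
+```python
+import re
+
+# Optional minus sign on each side; the middle "-" is the separator
+CENTURY_RANGE = re.compile(r'(-?[1-9]\d?)-(-?[1-9]\d?)')
+
+
+def parse_century_range(text: str) -> tuple[int, int]:
+    """Parse '{first}-{last}' where either century may be negative (BCE)."""
+    m = CENTURY_RANGE.fullmatch(text)
+    if m is None:
+        raise ValueError(f"Invalid century range: {text!r}")
+    first, last = int(m.group(1)), int(m.group(2))
+    if last < first:
+        raise ValueError("Last century cannot be before first century")
+    return first, last
+
+
+for example in ("-5--4", "-1-1", "-5-3", "19-20"):
+    print(example, parse_century_range(example))
+# -5--4 (-5, -4)
+# -1-1 (-1, 1)
+# -5-3 (-5, 3)
+# 19-20 (19, 20)
+```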
+
+### 6.2 Non-Latin Script Transliteration
+
+**Resolution**: Apply the same transliteration rules as GHCID (documented in AGENTS.md).
+
+| Script | Standard |
+|--------|----------|
+| Cyrillic | ISO 9:1995 |
+| Chinese | Hanyu Pinyin (ISO 7098) |
+| Japanese | Modified Hepburn |
+| Korean | Revised Romanization |
+| Arabic | ISO 233-2/3 |
+| Hebrew | ISO 259-3 |
+| Greek | ISO 843 |
+
+### 6.3 Disputed Locations
+
+**Resolution**: Not a PPID concern - handled by ISO standardization.
+
+When historical locations are disputed:
+- Use the ISO-standardized modern location
+- Document the dispute in observation metadata
+- Do not encode uncertainty in the identifier itself
+
+### 6.4 Living Persons
+
+**Resolution**: Living persons are **always ID class** and can only be promoted to PID after death.
+
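+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Observation:
+    # Hypothetical minimal observation shape, for illustration only
+    is_death_record: bool = False  # e.g. a death certificate
+    is_post_death: bool = False    # e.g. a burial or probate record
+
+
+def check_other_criteria(observations: list[Observation]) -> bool:
+    """Placeholder for the remaining promotion criteria (not yet specified)."""
+    raise NotImplementedError
+
+
+def can_promote_to_pid(person_id: str, observations: list[Observation]) -> bool:
+    """
+    Check if ID can be promoted to PID.
+    
+    Living persons can NEVER be promoted.
+    """
+    # Any death or post-death observation shows the person is no longer living
+    has_death_evidence = any(
+        o.is_death_record or o.is_post_death for o in observations
+    )
+    
+    if not has_death_evidence:
+        # No death observation = person may be alive = cannot be PID
+        return False
+    
+    # Continue with other promotion criteria...
+    return check_other_criteria(observations)
+```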
+
+**Rationale**:
+1. PID requires verified last observation (death)
+2. Living persons have incomplete lifecycle
+3. Future observations may change identity assessment
+4. Privacy considerations for living individuals
+
+---
+
+## 7. Implementation Alignment
+
+### 7.1 Class Mapping
+
+| PiCo Class | PPID Equivalent | Notes |
+|------------|-----------------|-------|
+| `pico:Person` | (Container) | Not used directly |
+| `pico:PersonObservation` | Observation (separate system) | Different identifier format |
+| `pico:PersonReconstruction` | `ppid:PersonID` or `ppid:PersonPID` | Split by epistemic certainty |
+| `pico:Source` | `schema:ArchiveComponent` | Same as PiCo |
+| `pnv:PersonName` | `pnv:PersonName` | Adopt PNV |
+
+### 7.2 Property Mapping
+
+| PiCo Property | PPID Usage | Notes |
+|---------------|------------|-------|
+| `prov:hadPrimarySource` | Same | For observations |
+| `prov:wasDerivedFrom` | Same | ID/PID from observations |
+| `prov:wasGeneratedBy` | Same | Activity provenance |
+| `prov:wasRevisionOf` | Same | Version history |
+| `sdo:birthDate` | Same | In properties |
+| `sdo:birthPlace` | Same + in identifier | Dual representation |
+| `sdo:deathDate` | Same | In properties |
+| `sdo:deathPlace` | Same + in identifier | Dual representation |
+| `pico:hasRole` | Same | For observations |
+| `pico:hasAge` | Same | When birthDate unknown |
+
+### 7.3 Namespace Declarations
+
+```turtle
+@prefix ppid: <https://ppid.org/> .
+@prefix pico: <https://personsincontext.org/model#> .
+@prefix pnv: <https://w3id.org/pnv#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .
+@prefix sdo: <https://schema.org/> .
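+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix owl: <http://www.w3.org/2002/07/owl#> .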
+```
+
+---
+
+## 8. Conclusion
+
+### 8.1 What PPID Adopts from PiCo
+
+1. **PersonObservation/PersonReconstruction distinction** - Core ontological pattern
+2. **PROV-O provenance model** - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
+3. **Person Name Vocabulary (PNV)** - Structured name representation
+4. **Schema.org properties** - birthDate, deathDate, birthPlace, deathPlace, etc.
+5. **Source linking** - hadPrimarySource, holdingArchive
+
+### 8.2 What PPID Does Differently
+
+1. **Semantic identifier format** - Geographic-temporal-emic instead of opaque UUID
+2. **ID/PID epistemic distinction** - Explicit uncertainty modeling
+3. **Living person handling** - Must remain ID until death
+4. **GHCID alignment** - Consistent with heritage custodian identifier philosophy
+5. **Century range encoding** - Temporal disambiguation in identifier
+6. **Emic label tokens** - Name components in identifier for discoverability
+
+### 8.3 Interoperability Path
+
+PPID can interoperate with PiCo systems via:
+
+1. **OWL mappings**: `ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction`
+2. **SPARQL federation**: Query across PPID and PiCo endpoints
+3. **Bidirectional links**: `owl:sameAs` between PPIDs and PiCo `PersonReconstruction` identifiers (observations are linked via `prov:wasDerivedFrom` instead)
+4. **Profile negotiation**: Serve data in PiCo format via content negotiation
+
+---
+
+## 9. References
+
+### PiCo Resources
+- PiCo Specification: https://personsincontext.org/model
+- PiCo GitHub: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
+- Open Archives API: https://www.openarchieven.nl/api/docs/uri.php
+- CBG: https://cbg.nl/
+
+### Standards
+- Person Name Vocabulary (PNV): https://w3id.org/pnv
+- PROV-O: https://www.w3.org/TR/prov-o/
+- Schema.org: https://schema.org/
+
+### Related PPID Documents
+- [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md)
+- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+- [Identifier Structure Design](./05_identifier_structure_design.md)