diff --git a/docs/plan/person_pid/10_ppid_ghcid_alignment.md b/docs/plan/person_pid/10_ppid_ghcid_alignment.md
new file mode 100644
index 0000000000..a9fec96913
--- /dev/null
+++ b/docs/plan/person_pid/10_ppid_ghcid_alignment.md
@@ -0,0 +1,1150 @@
+# PPID-GHCID Alignment: Revised Identifier Structure
+
+**Version**: 0.1.0
+**Last Updated**: 2025-01-09
+**Status**: DRAFT - Supersedes opaque identifier design in [05_identifier_structure_design.md](./05_identifier_structure_design.md)
+**Related**: [GHCID Specification](../../GHCID_PID_SCHEME.md) | [PiCo Ontology](./03_pico_ontology_analysis.md)
+
+---
+
+## 1. Executive Summary
+
+This document proposes a **revised PPID structure** that aligns with GHCID's geographic-semantic identifier pattern while accommodating the unique challenges of person identification across historical records.
+
+### 1.1 Key Changes from Original Design
+
+| Aspect | Original (Doc 05) | Revised (This Document) |
+|--------|-------------------|-------------------------|
+| **Format** | Opaque hex (`POID-7a3b-c4d5-...`) | Semantic (`PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG`) |
+| **Type Distinction** | POID vs PRID | ID (temporary) vs PID (persistent) |
+| **Geographic** | None in identifier | Dual anchors: first + last observation |
+| **Temporal** | None in identifier | Century range |
+| **Name** | None in identifier | First + last token of emic label |
+| **Persistence** | Always persistent | May remain ID indefinitely |
+
+### 1.2 Design Philosophy
+
+The revised PPID follows the same principles as GHCID:
+
+1. **Human-readable semantic components** that aid discovery and deduplication
+2. **Geographic anchoring** to physical locations using GeoNames
+3. **Temporal anchoring** to enable disambiguation across time
+4. **Emic authenticity** using names from primary sources
+5. **Collision resolution** via full emic label suffix
+6. **Dual representation** as both semantic string and UUID/numeric
+
+---
+
+## 2. Identifier Type: ID vs PID
+
+### 2.1 The Epistemic Uncertainty Problem
+
+Unlike institutions (which typically have founding documents, legal registrations, and clear organizational boundaries), **persons in historical records often exist in epistemic uncertainty**:
+
+- Incomplete records (many records lost to time)
+- Ambiguous references (common names, no surnames)
+- Conflicting sources (different dates, spellings)
+- Undiscovered archives (unexplored record sets)
+
+### 2.2 Two-Class Identifier System
+
+| Type | Prefix | Description | Persistence | Promotion Path |
+|------|--------|-------------|-------------|----------------|
+| **ID** | `ID-` | Temporary identifier | May change | Can become PID |
+| **PID** | `PID-` | Persistent identifier | Permanent | Cannot revert to ID |
+
+### 2.3 Promotion Criteria: ID → PID
+
+An identifier can be promoted from ID to PID when ALL of the following are satisfied:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class PIDPromotionCriteria:
+    """
+    Criteria for promoting an ID to a PID.
+    ALL conditions must be True for promotion.
+    """
+
+    # Geographic anchors
+    first_observation_verified: bool  # Birth or equivalent
+    last_observation_verified: bool  # Death or equivalent
+
+    # Temporal anchors
+    century_range_established: bool  # From verified observations
+
+    # Identity anchors
+    emic_label_verified: bool  # From primary sources
+    no_unexplored_archives: bool  # Reasonable assumption
+
+    # Quality checks
+    no_unresolved_conflicts: bool  # No conflicting claims
+    multiple_corroborating_sources: bool  # At least 2 independent sources
+
+    def is_promotable(self) -> bool:
+        return all([
+            self.first_observation_verified,
+            self.last_observation_verified,
+            self.century_range_established,
+            self.emic_label_verified,
+            self.no_unexplored_archives,
+            self.no_unresolved_conflicts,
+            self.multiple_corroborating_sources,
+        ])
+```
+
+### 2.4 Permanent ID Status
+
+Some identifiers may **forever remain IDs** due to:
+
+- **Fragmentary records**: Only one surviving document mentions the person
+- **Uncertain dates**: Cannot establish century range
+- **Unknown location**: Cannot anchor geographically
+- **Anonymous figures**: No emic label recoverable
+- **Ongoing research**: Archives not yet explored
+
+This is acceptable and expected. An ID is still a valid identifier for internal use; it simply cannot be cited as a persistent identifier in scholarly work.
+
+---
+
+## 3. Identifier Structure
+
+### 3.1 Full Format Specification
+
+```
+{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}[-{FULL_EMIC}]
+  │     │    │    │    │    │    │    │    │    │         │
+  │     │    │    │    │    │    │    │    │    │         └── Collision suffix (optional)
+  │     │    │    │    │    │    │    │    │    └── Last Token of emic label
+  │     │    │    │    │    │    │    │    └── First Token of emic label
+  │     │    │    │    │    │    │    └── Century Range (e.g., 19-20)
+  │     │    │    │    │    │    └── Last observation Place (GeoNames 3-letter)
+  │     │    │    │    │    └── Last observation Region (ISO 3166-2)
+  │     │    │    │    └── Last observation Country (ISO 3166-1 alpha-2)
+  │     │    │    └── First observation Place (GeoNames 3-letter)
+  │     │    └── First observation Region (ISO 3166-2)
+  │     └── First observation Country (ISO 3166-1 alpha-2)
+  └── Type: ID or PID
+```
+
+### 3.2 Component Definitions
+
+| Component | Format | Description | Example |
+|-----------|--------|-------------|---------|
+| **TYPE** | `ID` or `PID` | Identifier class | `PID` |
+| **FC** | ISO 3166-1 α2 | First observation country (modern) | `NL` |
+| **FR** | ISO 3166-2 suffix | First observation region | `NH` |
+| **FP** | 3 letters | First observation place (GeoNames) | `AMS` |
+| **LC** | ISO 3166-1 α2 | Last observation country (modern) | `NL` |
+| **LR** | ISO 3166-2 suffix | Last observation region | `NH` |
+| **LP** | 3 letters | Last observation place (GeoNames) | `HAA` |
+| **CR** | `CC-CC` | Century range (CE) | `19-20` |
+| **FT** | UPPERCASE | First token of emic label | `JAN` |
+| **LT** | UPPERCASE | Last token of emic label | `BERG` |
+| **FULL_EMIC** | snake_case | Full emic label (collision only) | `jan_van_den_berg` |
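+
+The component table can be read as an assembly recipe. The following is a minimal sketch, not part of the specification: `PPIDComponents` and `build_ppid` are hypothetical names introduced here for illustration, and the sketch assumes every component has already been normalized according to §4.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class PPIDComponents:
+    """Hypothetical container for the components listed in the table."""
+    id_type: str          # TYPE: "ID" or "PID"
+    first_country: str    # FC
+    first_region: str     # FR
+    first_place: str      # FP
+    last_country: str     # LC
+    last_region: str      # LR
+    last_place: str       # LP
+    century_range: str    # CR, e.g. "19-20"
+    first_token: str      # FT
+    last_token: str = ""  # LT (empty for a mononym)
+
+
+def build_ppid(c: PPIDComponents, collision_suffix: str = "") -> str:
+    """Join the components with hyphens; append the suffix only when given."""
+    if c.id_type not in ("ID", "PID"):
+        raise ValueError(f"Unknown identifier type: {c.id_type!r}")
+    parts = [
+        c.id_type,
+        c.first_country, c.first_region, c.first_place,
+        c.last_country, c.last_region, c.last_place,
+        c.century_range,
+        c.first_token, c.last_token,
+    ]
+    ppid = "-".join(parts)
+    return f"{ppid}-{collision_suffix}" if collision_suffix else ppid
+
+
+jan = PPIDComponents("PID", "NL", "NH", "AMS", "NL", "NH", "HAA", "19-20", "JAN", "BERG")
+print(build_ppid(jan))                      # PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+print(build_ppid(jan, "jan_van_den_berg"))  # same base plus "-jan_van_den_berg"
+```
+
+An empty `last_token` leaves a trailing hyphen, which matches the mononym examples in §6.2.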
+
+### 3.3 Examples
+
+| Person | Full Emic Label | PPID |
+|--------|-----------------|------|
+| Jan van den Berg, born Amsterdam 1895, died Haarlem 1970 | Jan van den Berg | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| Rembrandt, born Leiden 1606, died Amsterdam 1669 | Rembrandt van Rijn | `PID-NL-ZH-LEI-NL-NH-AMS-17-17-REMBRANDT-RIJN` |
+| Maria Sibylla Merian, born Frankfurt 1647, died Amsterdam 1717 | Maria Sibylla Merian | `PID-DE-HE-FRA-NL-NH-AMS-17-18-MARIA-MERIAN` |
+| Unknown soldier, found Normandy, died 1944 | (unknown) | `ID-XX-XX-XXX-FR-NM-OMH-20-20-UNKNOWN-` |
+| Henry VIII, born London 1491, died London 1547 | Henry VIII | `PID-GB-ENG-LON-GB-ENG-LON-15-16-HENRY-VIII` |
+
+**Notes on Emic Labels**:
+- Always use **formal/complete emic names** from primary sources, not modern colloquial short forms
+- "Rembrandt" alone is a modern convention; the emic label from his lifetime was "Rembrandt van Rijn"
+- **Tussenvoegsels (particles)** like "van", "de", "den", "der", "van de", "van den", "van der" are **skipped** when extracting the last token (see §4.5)
+- This follows the same pattern as GHCID abbreviation rules (AGENTS.md Rule 8)
+
+---
+
+## 4. Component Rules
+
+### 4.1 First Observation (Birth or Earliest)
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional
+
+class ObservationType(Enum):
+    BIRTH_CERTIFICATE = "birth_certificate"    # Highest authority
+    BAPTISM_RECORD = "baptism_record"          # Common for pre-civil registration
+    BIRTH_STATEMENT = "birth_statement"        # Stated birth in other document
+    EARLIEST_REFERENCE = "earliest_reference"  # Earliest surviving mention
+    INFERRED = "inferred"                      # Inferred from context
+
+@dataclass
+class FirstObservation:
+    """
+    First observation of a person during their lifetime.
+    Ideally a birth record, but may be another early record.
+    """
+
+    observation_type: ObservationType
+
+    # Modern geographic codes (mapped from historical)
+    country_code: str  # ISO 3166-1 alpha-2
+    region_code: str  # ISO 3166-2 subdivision
+    place_code: str  # GeoNames 3-letter code
+
+    # Original historical reference
+    historical_place_name: str  # As named in source
+    historical_date: str  # As stated in source
+
+    # Mapping provenance
+    modern_mapping_method: str  # How historical → modern mapping done
+    geonames_id: Optional[int]  # GeoNames ID for place
+
+    # Quality indicators
+    is_birth_record: bool
+    can_assume_earliest: bool  # No unexplored archives likely
+    source_confidence: float  # 0.0 - 1.0
+
+    def is_valid_for_pid(self) -> bool:
+        """
+        Determine if this observation is valid for PID generation.
+        """
+        if self.is_birth_record:
+            return True
+
+        if self.observation_type == ObservationType.EARLIEST_REFERENCE:
+            # Must be able to assume this is actually the earliest
+            return self.can_assume_earliest and self.source_confidence >= 0.8
+
+        return False
+```
+
+### 4.2 Last Observation (Death or Latest During Lifetime)
+
+```python
+@dataclass
+class LastObservation:
+    """
+    Last observation of a person during their lifetime or immediately after death.
+    Ideally a death record, but may be the last known living reference.
+    """
+
+    # Reuses the enum from §4.1; death-specific members (e.g. DEATH_CERTIFICATE)
+    # would need to be added to it
+    observation_type: ObservationType
+
+    # Modern geographic codes
+    country_code: str
+    region_code: str
+    place_code: str
+
+    # Original historical reference
+    historical_place_name: str
+    historical_date: str
+
+    # Critical distinction
+    is_death_record: bool
+    is_lifetime_observation: bool  # True if person still alive at observation
+    is_immediate_post_death: bool  # First record after death
+
+    # Quality
+    can_assume_latest: bool
+    source_confidence: float
+
+    def is_valid_for_pid(self) -> bool:
+        if self.is_death_record:
+            return True
+
+        if self.is_immediate_post_death:
+            # First mention of death
+            return self.source_confidence >= 0.8
+
+        if self.is_lifetime_observation:
+            # Last known alive, but not death record
+            return self.can_assume_latest and self.source_confidence >= 0.8
+
+        return False
+```
+
+### 4.3 Geographic Mapping: Historical → Modern
+
+```python
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Optional, Tuple
+
+
+class ManualResearchRequired(Exception):
+    """Raised when a historical place cannot be mapped automatically."""
+
+
+@dataclass
+class HistoricalPlaceMapping:
+    """
+    Map historical place names to modern ISO/GeoNames codes.
+
+    Historical places must be mapped to their MODERN equivalents
+    as of the PPID generation date. This ensures stability even
+    when historical boundaries shifted.
+    """
+
+    # Historical input
+    historical_name: str
+    historical_date: str  # When the place was referenced
+
+    # Modern output (at PPID generation time)
+    modern_country_code: str  # ISO 3166-1 alpha-2
+    modern_region_code: str  # ISO 3166-2 suffix (e.g., "NH" not "NL-NH")
+    modern_place_code: str  # 3-letter code derived from the GeoNames name
+
+    # GeoNames reference
+    geonames_id: int
+    geonames_name: str  # Modern canonical name
+    geonames_feature_class: str  # P = populated place
+    geonames_feature_code: str  # PPL, PPLA, PPLC, etc.
+
+    # Mapping provenance
+    mapping_method: str  # "direct", "successor", "enclosing", "manual"
+    mapping_confidence: float
+    mapping_notes: str
+    ppid_generation_date: str  # When mapping was performed
+
+def map_historical_to_modern(
+    historical_name: str,
+    historical_date: str,
+    db
+) -> HistoricalPlaceMapping:
+    """
+    Map a historical place name to modern ISO/GeoNames codes.
+
+    Strategies (in order):
+    1. Direct match: Place still exists with same name
+    2. Successor: Place renamed but geographically same
+    3. Enclosing: Place absorbed into larger entity
+    4. Manual: Requires human research
+
+    Note: GeoNames admin1 codes are not ISO 3166-2 codes for every country
+    (for the Netherlands they are numeric), so admin1_code below must be
+    converted to the ISO 3166-2 suffix before it is used in a PPID.
+    """
+
+    # Strategy 1: Direct GeoNames lookup
+    direct_match = db.geonames_search(historical_name)
+    if direct_match and direct_match.is_populated_place:
+        return HistoricalPlaceMapping(
+            historical_name=historical_name,
+            historical_date=historical_date,
+            modern_country_code=direct_match.country_code,
+            modern_region_code=direct_match.admin1_code,
+            modern_place_code=generate_place_code(direct_match.name),
+            geonames_id=direct_match.geonames_id,
+            geonames_name=direct_match.name,
+            geonames_feature_class=direct_match.feature_class,
+            geonames_feature_code=direct_match.feature_code,
+            mapping_method="direct",
+            mapping_confidence=0.95,
+            mapping_notes="Direct GeoNames match",
+            ppid_generation_date=datetime.utcnow().isoformat()
+        )
+
+    # Strategy 2: Historical name lookup (renamed places)
+    # e.g., "Batavia" → "Jakarta"
+    historical_match = db.historical_place_names.get(historical_name)
+    if historical_match:
+        modern = db.geonames_by_id(historical_match.modern_geonames_id)
+        return HistoricalPlaceMapping(
+            historical_name=historical_name,
+            historical_date=historical_date,
+            modern_country_code=modern.country_code,
+            modern_region_code=modern.admin1_code,
+            modern_place_code=generate_place_code(modern.name),
+            geonames_id=modern.geonames_id,
+            geonames_name=modern.name,
+            geonames_feature_class=modern.feature_class,
+            geonames_feature_code=modern.feature_code,
+            mapping_method="successor",
+            mapping_confidence=0.90,
+            mapping_notes=f"Historical name '{historical_name}' → modern '{modern.name}'",
+            ppid_generation_date=datetime.utcnow().isoformat()
+        )
+
+    # Strategy 3: Geographic coordinates (if available from source)
+    # Reverse geocode to find enclosing modern settlement
+    # (not implemented in this draft)
+
+    # Strategy 4: Manual research required
+    raise ManualResearchRequired(
+        f"Cannot automatically map '{historical_name}' ({historical_date}) to modern location"
+    )
+
+
+def generate_place_code(place_name: str) -> str:
+    """
+    Generate 3-letter place code from GeoNames name.
+
+    Rules (same as GHCID):
+    - Single word: First 3 letters → "Amsterdam" → "AMS"
+    - Multi-word: Initials → "New York" → "NYO" (or "NYC" if registered)
+    - Dutch articles: Article initial + 2 from main → "Den Haag" → "DHA"
+    """
+    # Implementation follows GHCID rules
+    # See AGENTS.md: "SETTLEMENT STANDARDIZATION: GEONAMES IS AUTHORITATIVE"
+    # Fail loudly instead of silently returning None from a stub
+    raise NotImplementedError("generate_place_code follows the GHCID rules")
+```
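+
+The three place-code rules in the `generate_place_code` docstring can be sketched as follows. This is an illustration under stated assumptions, not the GHCID implementation: `sketch_place_code` and `DUTCH_ARTICLES` are hypothetical names, the article list is a small subset, padding multi-word initials from the last word is one reading of the "New York" → "NYO" example, and padding short names with `X` is an assumption. Registered overrides such as "NYC" are not modelled.
+
+```python
+import unicodedata
+
+# Assumption: an illustrative subset of Dutch articles, not the GHCID list.
+DUTCH_ARTICLES = {"de", "den", "het", "'t", "'s"}
+
+
+def _letters(word: str) -> str:
+    """Strip diacritics and non-letters, then uppercase."""
+    decomposed = unicodedata.normalize("NFD", word)
+    return "".join(c for c in decomposed if c.isascii() and c.isalpha()).upper()
+
+
+def sketch_place_code(place_name: str) -> str:
+    """Hypothetical 3-letter code following the three docstring rules."""
+    raw = place_name.split()
+    has_article = len(raw) > 1 and raw[0].lower() in DUTCH_ARTICLES
+    words = [w for w in map(_letters, raw) if w]
+    if not words:
+        raise ValueError(f"No letters in place name: {place_name!r}")
+
+    if len(words) == 1:
+        code = words[0][:3]                      # "Amsterdam" -> "AMS"
+    elif has_article:
+        code = words[0][0] + words[1][:2]        # "Den Haag" -> "DHA"
+    else:
+        initials = "".join(w[0] for w in words)  # "New York" -> "NY"
+        code = (initials + words[-1][1:])[:3]    # pad from last word -> "NYO"
+    return code.ljust(3, "X")                    # assumption: pad short names with X
+
+
+for name in ("Amsterdam", "New York", "Den Haag"):
+    print(name, "->", sketch_place_code(name))
+```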
+
+### 4.4 Century Range Calculation
+
+```python
+import re
+
+
+def calculate_century_range(
+    first_observation: FirstObservation,
+    last_observation: LastObservation
+) -> str:
+    """
+    Calculate the century range for a person's lifetime.
+
+    Returns format: "CC-CC" (e.g., "19-20" for 1850-1925)
+
+    Rules:
+    - Centuries are 1-indexed: 1-100 CE = 1st century, 1901-2000 = 20th century
+    - BCE dates: Use negative century numbers (e.g., "-5--4" for 5th-4th century BCE)
+    - Negative years are historical BCE years (-500 = 500 BCE, no year 0). This is
+      NOT ISO 8601 numbering, where year 0000 exists and -0499 is 500 BCE
+    - Range must be from verified observations
+    """
+
+    def year_to_century(year: int) -> int:
+        """
+        Convert year to century number.
+
+        Positive years (CE): 1-100 = century 1, 1901-2000 = century 20
+        Negative years (BCE): -500 to -401 = century -5
+
+        Note: Historical (BCE/CE) year numbering has no year 0.
+        Year 1 BCE is followed directly by year 1 CE.
+        """
+        if year == 0:
+            raise ValueError("Year 0 does not exist in historical year numbering")
+        if year > 0:
+            return ((year - 1) // 100) + 1
+        # BCE: year -500 → century -5, year -1 → century -1 (floor division)
+        return year // 100
+
+    def parse_year(date_str: str) -> int:
+        """Extract year from various date formats."""
+        # Handle: "1895", "1895-03-15", "March 1895", "c. 1895", etc.
+        # Also handle BCE: "-500", "500 BCE", "500 BC", "c. 500 BCE"
+
+        # Check for BCE indicators
+        bce_match = re.search(r'(\d+)\s*(BCE|BC|B\.C\.E?\.|v\.Chr\.)', date_str, re.IGNORECASE)
+        if bce_match:
+            return -int(bce_match.group(1))
+
+        # Check for a leading minus sign (negative year)
+        neg_match = re.match(r'\s*-(\d+)', date_str)
+        if neg_match:
+            return -int(neg_match.group(1))
+
+        # Standard positive year
+        match = re.search(r'\b(\d{4})\b', date_str)
+        if match:
+            return int(match.group(1))
+
+        # 3-digit year (ancient dates)
+        match = re.search(r'\b(\d{3})\b', date_str)
+        if match:
+            return int(match.group(1))
+
+        # Bare 1-2 digit year, e.g. "14" or "c. 14 CE" (only when nothing else is present)
+        match = re.fullmatch(r'\s*(?:c\.\s*)?(\d{1,2})\s*(?:CE|AD)?\s*', date_str, re.IGNORECASE)
+        if match:
+            return int(match.group(1))
+
+        raise ValueError(f"Cannot parse year from: {date_str}")
+
+    first_year = parse_year(first_observation.historical_date)
+    last_year = parse_year(last_observation.historical_date)
+
+    # Validation: compare years, not centuries, so that an inverted
+    # pair inside a single century is also rejected
+    if last_year < first_year:
+        raise ValueError(
+            f"Last observation ({last_year}) cannot be before "
+            f"first observation ({first_year})"
+        )
+
+    first_century = year_to_century(first_year)
+    last_century = year_to_century(last_year)
+
+    return f"{first_century}-{last_century}"
+
+
+# Examples (CE):
+# 1850 → century 19
+# 1925 → century 20
+# Range: "19-20"
+
+# 1606 → century 17
+# 1669 → century 17
+# Range: "17-17" (same century)
+
+# 1895 → century 19
+# 2005 → century 21
+# Range: "19-21" (centenarian)
+
+# Examples (BCE):
+# -500 (500 BCE) → century -5
+# -401 (401 BCE) → century -5
+# Range: "-5--5" (same century)
+
+# -469 (469 BCE, Socrates birth) → century -5
+# -399 (399 BCE, Socrates death) → century -4
+# Range: "-5--4"
+
+# -100 (100 BCE) → century -1
+# 14 (14 CE) → century 1
+# Range: "-1-1" (crossing BCE/CE boundary)
+```
+
+### 4.5 Emic Label Tokens
+
+```python
+from dataclasses import dataclass
+from typing import Optional, List
+import re
+
+@dataclass
+class EmicLabel:
+    """
+    The common contemporary emic label of a person.
+
+    "Emic" = from the insider perspective, as the person was known
+    during their lifetime in primary sources.
+
+    "Etic" = from the outsider perspective, how we refer to them now.
+
+    Prefer emic; fall back to etic only if emic unrecoverable.
+    """
+
+    full_label: str  # Complete emic label
+    first_token: str  # First word/token
+    last_token: str  # Last word/token (empty if mononym)
+
+    # Source provenance
+    source_type: str  # "primary" or "etic_fallback"
+    source_document: str  # Reference to source
+    source_date: str  # When source was created
+
+    # Quality
+    is_from_primary_source: bool
+    is_vernacular: bool  # From vernacular (non-official) source
+    confidence: float
+
+    @classmethod
+    def from_full_label(cls, label: str, **kwargs) -> 'EmicLabel':
+        """Parse full label into first and last tokens."""
+        tokens = tokenize_emic_label(label)
+
+        first_token = tokens[0].upper() if tokens else ""
+        last_token = tokens[-1].upper() if len(tokens) > 1 else ""
+
+        return cls(
+            full_label=label,
+            first_token=first_token,
+            last_token=last_token,
+            **kwargs
+        )
+
+
+def tokenize_emic_label(label: str) -> List[str]:
+    """
+    Tokenize an emic label into words.
+
+    Rules:
+    - Split on whitespace
+    - Preserve numeric tokens (e.g., "VIII" in "Henry VIII")
+    - Do NOT split compound words
+    - Case is preserved here; callers uppercase tokens for the identifier
+    """
+    # str.split() without arguments splits on any whitespace run
+    # and never yields empty tokens
+    return label.strip().split()
+
+
+def extract_name_tokens(
+    full_emic_label: str
+) -> tuple[str, str]:
+    """
+    Extract first and last tokens from emic label.
+
+    Rules:
+    1. First token: First word of the emic label
+    2. Last token: Last word AFTER skipping tussenvoegsels (name particles)
+
+    Tussenvoegsels are common prefixes in Dutch and other languages that are
+    NOT part of the surname proper. They are skipped when extracting the
+    last token (same as GHCID abbreviation rules - AGENTS.md Rule 8).
+
+    Examples:
+    - "Jan van den Berg" → ("JAN", "BERG")  # "van den" skipped
+    - "Rembrandt van Rijn" → ("REMBRANDT", "RIJN")  # "van" skipped
+    - "Henry VIII" → ("HENRY", "VIII")
+    - "Maria Sibylla Merian" → ("MARIA", "MERIAN")
+    - "Ludwig van Beethoven" → ("LUDWIG", "BEETHOVEN")  # "van" skipped
+    - "Vincent van Gogh" → ("VINCENT", "GOGH")  # "van" skipped
+    - "Leonardo da Vinci" → ("LEONARDO", "VINCI")  # "da" skipped
+    - "中村 太郎" → transliterated: ("NAKAMURA", "TARO")
+    """
+    # Tussenvoegsels (name particles) to skip when finding last token
+    # Following GHCID pattern (AGENTS.md Rule 8: Legal Form Filtering)
+    # Matching below is per whitespace token, so the multi-word entries
+    # only take effect through their single-word parts
+    TUSSENVOEGSELS = {
+        # Dutch
+        'van', 'de', 'den', 'der', 'het', "'t", 'te', 'ten', 'ter',
+        'van de', 'van den', 'van der', 'van het', "van 't",
+        'in de', 'in den', 'in het', "in 't",
+        'op de', 'op den', 'op het', "op 't",
+        # German
+        'von', 'vom', 'zu', 'zum', 'zur', 'von und zu',
+        # French
+        'de', 'du', 'des', 'de la', 'le', 'la', 'les',
+        # Italian
+        'da', 'di', 'del', 'della', 'dei', 'degli', 'delle',
+        # Spanish
+        'de', 'del', 'de la', 'de los', 'de las',
+        # Portuguese
+        'da', 'do', 'dos', 'das', 'de',
+    }
+
+    tokens = tokenize_emic_label(full_emic_label)
+
+    if len(tokens) == 0:
+        raise ValueError("Empty emic label")
+
+    first_token = tokens[0].upper()
+
+    if len(tokens) == 1:
+        # Mononym
+        last_token = ""
+    else:
+        # Find last token that is NOT a tussenvoegsel
+        # Work backwards from the end
+        last_token = ""
+        for token in reversed(tokens[1:]):  # Skip first token
+            token_lower = token.lower()
+            if token_lower not in TUSSENVOEGSELS:
+                last_token = token.upper()
+                break
+
+        # If all remaining tokens are tussenvoegsels, use the actual last token
+        if not last_token:
+            last_token = tokens[-1].upper()
+
+    # Normalize: remove diacritics, special characters
+    first_token = normalize_token(first_token)
+    last_token = normalize_token(last_token)
+
+    return (first_token, last_token)
+
+
+def normalize_token(token: str) -> str:
+    """
+    Normalize token for PPID.
+
+    - Remove diacritics (é → E)
+    - Uppercase
+    - Allow alphanumeric only (for Roman numerals like VIII)
+    - Non-Latin scripts must be transliterated BEFORE calling this
+      function; any remaining non-ASCII characters are dropped
+    """
+    import unicodedata
+
+    # NFD decomposition + remove combining marks
+    normalized = unicodedata.normalize('NFD', token)
+    ascii_token = ''.join(
+        c for c in normalized
+        if unicodedata.category(c) != 'Mn'
+    )
+
+    # Uppercase
+    ascii_token = ascii_token.upper()
+
+    # Keep only alphanumeric
+    ascii_token = re.sub(r'[^A-Z0-9]', '', ascii_token)
+
+    return ascii_token
+```
+
+### 4.6 Emic vs Etic Fallback
+
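+```python
+from dataclasses import dataclass
+from datetime import datetime
+from typing import List, Optional
+
+
+class EmicLabelNotYetResolvable(Exception):
+    """Raised while unexplored vernacular archives still block an etic fallback."""
+
+
+@dataclass
+class EmicLabelResolution:
+    """
+    Resolution of emic label for a person.
+
+    Priority:
+    1. Emic from primary sources (documents from their lifetime)
+    2. Etic fallback (only if emic truly unrecoverable)
+    """
+
+    resolved_label: EmicLabel
+    resolution_method: str  # "emic_primary", "emic_vernacular", "etic_fallback"
+    emic_search_exhausted: bool
+    vernacular_sources_checked: List[str]
+    fallback_justification: Optional[str]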
+
+def resolve_emic_label(
+    person_observations: List['PersonObservation'],
+    db
+) -> EmicLabelResolution:
+    """
+    Resolve the emic label for a person from their observations.
+
+    Rules:
+    1. Search all primary sources for emic names
+    2. Prefer most frequently used name in primary sources
+    3. Only use etic fallback if emic truly unrecoverable
+    4. Vernacular sources must have clear pedigrees
+    5. Oral traditions without documentation not valid
+    """
+
+    # Collect all name mentions from primary sources
+    emic_candidates = []
+
+    for obs in person_observations:
+        if obs.is_primary_source and obs.is_from_lifetime:
+            for claim in obs.claims:
+                if claim.claim_type in ('full_name', 'given_name', 'title'):
+                    emic_candidates.append({
+                        'label': claim.claim_value,
+                        'source': obs.source_url,
+                        'date': obs.source_date,
+                        'is_vernacular': obs.is_vernacular_source
+                    })
+
+    if emic_candidates:
+        # Find most common emic label
+        from collections import Counter
+        label_counts = Counter(c['label'] for c in emic_candidates)
+        most_common = label_counts.most_common(1)[0][0]
+
+        best_candidate = next(
+            c for c in emic_candidates if c['label'] == most_common
+        )
+
+        return EmicLabelResolution(
+            resolved_label=EmicLabel.from_full_label(
+                most_common,
+                source_type="primary",
+                source_document=best_candidate['source'],
+                source_date=best_candidate['date'],
+                is_from_primary_source=True,
+                is_vernacular=best_candidate['is_vernacular'],
+                confidence=0.95
+            ),
+            resolution_method="emic_primary",
+            emic_search_exhausted=True,
+            vernacular_sources_checked=[c['source'] for c in emic_candidates if c['is_vernacular']],
+            fallback_justification=None
+        )
+
+    # Check if etic fallback is justified
+    unexplored_vernacular = db.get_unexplored_vernacular_archives(person_observations)
+
+    if unexplored_vernacular:
+        raise EmicLabelNotYetResolvable(
+            f"Emic label not found in explored sources. "
+            f"Unexplored vernacular archives exist: {unexplored_vernacular}. "
+            f"Cannot use etic fallback until these are explored."
+        )
+
+    # Etic fallback (rare)
+    etic_label = db.get_most_common_etic_label(person_observations)
+
+    return EmicLabelResolution(
+        resolved_label=EmicLabel.from_full_label(
+            etic_label,
+            source_type="etic_fallback",
+            source_document="Modern scholarly consensus",
+            source_date=datetime.utcnow().isoformat(),
+            is_from_primary_source=False,
+            is_vernacular=False,
+            confidence=0.70
+        ),
+        resolution_method="etic_fallback",
+        emic_search_exhausted=True,
+        vernacular_sources_checked=[],
+        fallback_justification=(
+            "No emic label found in explored primary sources. "
+            "All known vernacular sources checked. "
+            "Using most common modern scholarly reference."
+        )
+    )
+```
+
+---
+
+## 5. Collision Handling
+
+### 5.1 Collision Detection
+
+Two PPIDs collide when all components except the collision suffix match:
+
+```python
+from typing import Set
+
+
+def detect_collision(new_ppid: str, existing_ppids: Set[str]) -> bool:
+    """
+    Check if new PPID collides with existing identifiers.
+
+    Collision = same base components (before any collision suffix).
+    """
+    base_new = get_base_ppid(new_ppid)
+
+    for existing in existing_ppids:
+        base_existing = get_base_ppid(existing)
+        if base_new == base_existing:
+            return True
+
+    return False
+
+def get_base_ppid(ppid: str) -> str:
+    """Extract base PPID without collision suffix."""
+    # Full PPID may have collision suffix after last token
+    # e.g., "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG-jan_van_den_berg"
+    # Base: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+
+    parts = ppid.split('-')
+
+    # Standard PPID splits into 11 parts:
+    # TYPE + 6 geo + 2 for the century range ("19", "20") + FT + LT
+    # The collision suffix uses underscores, so it adds exactly one part.
+    # This count assumes CE centuries; a negative (BCE) century adds hyphens.
+    if len(parts) > 11:
+        return '-'.join(parts[:11])
+
+    return ppid
+```
+
+### 5.2 Collision Resolution via Full Emic Label
+
+When collision occurs, append full emic label in snake_case:
+
+```python
+def resolve_collision(
+    base_ppid: str,
+    full_emic_label: str,
+    existing_ppids: Set[str]
+) -> str:
+    """
+    Resolve collision by appending full emic label.
+
+    Example:
+        Base: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+        Emic: "Jan van den Berg"
+        Result: "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG-jan_van_den_berg"
+    """
+    suffix = generate_collision_suffix(full_emic_label)
+    resolved = f"{base_ppid}-{suffix}"
+
+    # Check if still collides (extremely rare)
+    if resolved in existing_ppids:
+        # Add numeric discriminator
+        counter = 2
+        while f"{resolved}_{counter}" in existing_ppids:
+            counter += 1
+        resolved = f"{resolved}_{counter}"
+
+    return resolved
+
+def generate_collision_suffix(full_emic_label: str) -> str:
+    """
+    Generate collision suffix from full emic label.
+
+    Same rules as GHCID collision suffix:
+    - Convert to lowercase snake_case
+    - Remove diacritics
+    - Remove punctuation
+    """
+    import unicodedata
+    import re
+
+    # Normalize unicode
+    normalized = unicodedata.normalize('NFD', full_emic_label)
+    ascii_name = ''.join(
+        c for c in normalized
+        if unicodedata.category(c) != 'Mn'
+    )
+
+    # Lowercase
+    lowercase = ascii_name.lower()
+
+    # Remove punctuation
+    no_punct = re.sub(r"[''`\",.:;!?()[\]{}]", '', lowercase)
+
+    # Replace spaces with underscores
+    underscored = re.sub(r'\s+', '_', no_punct)
+
+    # Remove non-alphanumeric except underscore
+    clean = re.sub(r'[^a-z0-9_]', '', underscored)
+
+    # Collapse multiple underscores
+    final = re.sub(r'_+', '_', clean).strip('_')
+
+    return final
+```
+
+---
+
+## 6. Unknown Components: XX and XXX Placeholders
+
+### 6.1 When Components Are Unknown
+
+Unlike GHCID (where `XX`/`XXX` are temporary and require research), PPID may have permanently unknown components:
+
+| Scenario | Placeholder | Can be PID? |
+|----------|-------------|-------------|
+| Unknown birth country | `XX` | No (remains ID) |
+| Unknown birth region | `XX` | No (remains ID) |
+| Unknown birth place | `XXX` | No (remains ID) |
+| Unknown death country | `XX` | No (remains ID) |
+| Unknown death region | `XX` | No (remains ID) |
+| Unknown death place | `XXX` | No (remains ID) |
+| Unknown century | `XX-XX` | No (remains ID) |
+| Unknown first token | `UNKNOWN` | No (remains ID) |
+| Unknown last token | (empty) | Yes (if mononym) |
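+
+The placeholder table can be turned into a simple eligibility check. The sketch below is illustrative only: `has_unknown_components` and `may_be_pid` are hypothetical helper names, and the hyphen split assumes CE centuries (a negative BCE century would add extra hyphens). Passing this check is necessary but not sufficient, since the promotion criteria in §2.3 still apply.
+
+```python
+def has_unknown_components(ppid: str) -> bool:
+    """True if a geographic, century or first-token slot holds a placeholder."""
+    parts = ppid.split("-")
+    if len(parts) < 11:
+        raise ValueError(f"Not a PPID: {ppid!r}")
+    geo = parts[1:7]         # FC, FR, FP, LC, LR, LP
+    century = parts[7:9]     # the two halves of the century range
+    first_token = parts[9]
+    return (
+        any(code in ("XX", "XXX") for code in geo)
+        or "XX" in century
+        or first_token == "UNKNOWN"
+    )
+
+
+def may_be_pid(ppid: str) -> bool:
+    """An empty last token (mononym) is allowed; any other placeholder is not."""
+    return not has_unknown_components(ppid)
+
+
+print(may_be_pid("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"))
+print(may_be_pid("ID-XX-XX-XXX-FR-NM-OMH-20-20-UNKNOWN-"))
+```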
+
+### 6.2 ID Examples with Unknown Components
+
+```
+ID-XX-XX-XXX-FR-NM-OMH-20-20-UNKNOWN- # Unknown soldier, Normandy
+ID-NL-NH-AMS-XX-XX-XXX-17-17-REMBRANDT- # Rembrandt, death place unknown
+ID-XX-XX-XXX-XX-XX-XXX-XX-XX-ANONYMOUS- # Completely unknown person
+```
+
+---
+
+## 7. UUID and Numeric Generation
+
+### 7.1 Dual Representation (Same as GHCID)
+
+Every PPID has three representations: the semantic string and two identifiers derived from it.
+
+| Format | Purpose | Example (illustrative) |
+|--------|---------|---------|
+| **Semantic String** | Human-readable | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| **UUID v5** | Linked data, URIs | `a1b2c3d4-e5f6-5a1b-9c2d-3e4f5a6b7c8d` |
+| **Numeric (64-bit)** | Database keys, CSV | `1234567890123456789` |
+
+### 7.2 Generation Algorithm
+
+```python
+import uuid
+import hashlib
+
+# PPID namespace UUID (different from GHCID namespace)
+PPID_NAMESPACE = uuid.UUID('f47ac10b-58cc-4372-a567-0e02b2c3d479')
+
+def generate_ppid_identifiers(semantic_ppid: str) -> dict:
+    """
+    Generate all identifier formats from semantic PPID string.
+
+    Returns (values shown are illustrative placeholders):
+        {
+            'semantic': 'PID-NL-NH-AMS-...',
+            'uuid_v5': 'a1b2c3d4-...',
+            'numeric': 1234567890123456789
+        }
+    """
+    # UUID v5 from semantic string
+    ppid_uuid = uuid.uuid5(PPID_NAMESPACE, semantic_ppid)
+
+    # Numeric from the first 8 bytes of SHA-256: an UNSIGNED 64-bit value,
+    # which can exceed the range of a signed 64-bit database column
+    sha256 = hashlib.sha256(semantic_ppid.encode('utf-8')).digest()
+    numeric = int.from_bytes(sha256[:8], byteorder='big')
+
+    return {
+        'semantic': semantic_ppid,
+        'uuid_v5': str(ppid_uuid),
+        'numeric': numeric
+    }
+
+
+# Example (output values are illustrative placeholders, not computed):
+ppid = "PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG"
+identifiers = generate_ppid_identifiers(ppid)
+# {
+#     'semantic': 'PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG',
+#     'uuid_v5': 'a1b2c3d4-e5f6-5a1b-9c2d-3e4f5a6b7c8d',
+#     'numeric': 1234567890123456789
+# }
+```
+
+---
+
+## 8. Relationship to Person Observations
+
+### 8.1 Distinction: PPID vs Observation Identifiers
+
+| Identifier | Purpose | Structure | Persistence |
+|------------|---------|-----------|-------------|
+| **PPID** | Identify a person (reconstruction) | Geographic + temporal + emic | Permanent (if PID) |
+| **Observation ID** | Identify a specific source observation | GHCID-based + RiC-O | Permanent |
+
+### 8.2 Observation Identifier Structure (Forthcoming)
+
+Observation identifiers will use a different pattern:
+
+```
+{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RICO_RECORD_PATH}
+```
+
+Where:
+- **REPOSITORY_GHCID**: GHCID of the institution holding the record
+- **CREATOR_GHCID**: GHCID of the institution that created the record (may be same)
+- **RICO_RECORD_PATH**: RiC-O derived path to RecordSet/Record/RecordPart
+
+Example:
+```
+NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
+│               │               │
+│               │               └── RiC-O path: fonds/series/file/item
+│               └── Creator (same institution)
+└── Repository
+```
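+
+Because the observation identifier is still to be specified, the following is only a sketch of how the three-part pattern could be composed and parsed. `ObservationId` and `parse_observation_id` are hypothetical names, and the sketch assumes that a GHCID never contains a `/`, so the first two slashes delimit the repository and creator.
+
+```python
+from typing import NamedTuple
+
+
+class ObservationId(NamedTuple):
+    repository_ghcid: str
+    creator_ghcid: str
+    record_path: str  # RiC-O derived path; may itself contain "/"
+
+    def __str__(self) -> str:
+        return f"{self.repository_ghcid}/{self.creator_ghcid}/{self.record_path}"
+
+
+def parse_observation_id(value: str) -> ObservationId:
+    """Split on the first two slashes only; the remainder is the record path."""
+    parts = value.split("/", 2)
+    if len(parts) != 3 or not all(parts):
+        raise ValueError(f"Expected repository/creator/path, got: {value!r}")
+    return ObservationId(*parts)
+
+
+obs = parse_observation_id(
+    "NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045"
+)
+print(obs.repository_ghcid)  # NL-NH-HAA-A-NHA
+print(obs.record_path)       # burgerlijke-stand/geboorten/1895/003/045
+```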
+
+This is **separate from PPID** and will be specified in a future document.
+
+---
+
+## 9. Comparison with Original POID/PRID Design
+
+### 9.1 What Changes
+
+| Aspect | POID/PRID (Doc 05) | Revised PPID (This Doc) |
+|--------|-------------------|-------------------------|
+| **Identifier opacity** | Opaque (no semantic content) | Semantic (human-readable) |
+| **Geographic anchoring** | None | Dual (birth + death locations) |
+| **Temporal anchoring** | None | Century range |
+| **Name in identifier** | None | First + last token |
+| **Type prefix** | POID/PRID | ID/PID |
+| **Observation vs Person** | Different identifier types | Completely separate systems |
+| **UUID backing** | Primary | Secondary (derived) |
+| **Collision handling** | UUID collision (rare) | Semantic collision (more common) |
+
+### 9.2 What Stays the Same
+
+- Dual identifier generation (UUID + numeric)
+- Deterministic generation from input
+- Permanent persistence (once PID)
+- Integration with GHCID for institution links
+- Claim-based provenance model
+- PiCo ontology alignment
+
+### 9.3 Transition Plan
+
+If this revised structure is adopted:
+
+1. **Document 05** becomes historical reference
+2. **This document** becomes the authoritative identifier spec
+3. No existing identifiers need migration (this is a new system)
+4. Code examples in other documents need updates
+
+---
+
+## 10. Implementation Considerations
+
+### 10.1 Character Set and Length
+
+```python
+# Maximum lengths
+MAX_COUNTRY_CODE = 2 # ISO 3166-1 alpha-2
+MAX_REGION_CODE = 3 # ISO 3166-2 suffix (some are 3 chars)
+MAX_PLACE_CODE = 3 # GeoNames convention
+MAX_CENTURY_RANGE = 5 # "XX-XX"
+MAX_TOKEN_LENGTH = 20 # Reasonable limit for names
+MAX_COLLISION_SUFFIX = 50 # Full emic label
+
+# Maximum total PPID length (without collision suffix)
+# "PID-" + "XX-XXX-XXX-" * 2 + "XX-XX-" + "TOKEN-TOKEN"
+# = 4 + (2+1+3+1+3+1)*2 + 6 + (20+1+20) = 73 characters
+
+# With collision suffix ("-" + up to 50 characters): 124 characters max
+```
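
The length budget can be double-checked by computing it from the constants rather than by hand. The sketch below repeats the limits listed above and counts every hyphen separator explicitly; `max_ppid_length` is a hypothetical helper name.

```python
# Limits copied from the constants listed above.
MAX_COUNTRY_CODE = 2
MAX_REGION_CODE = 3
MAX_PLACE_CODE = 3
MAX_CENTURY_RANGE = 5
MAX_TOKEN_LENGTH = 20
MAX_COLLISION_SUFFIX = 50


def max_ppid_length(with_collision_suffix: bool = False) -> int:
    """Upper bound on PPID length, counting every hyphen separator."""
    prefix = len("PID-")
    # country + "-" + region + "-" + place + "-"
    location = MAX_COUNTRY_CODE + 1 + MAX_REGION_CODE + 1 + MAX_PLACE_CODE + 1
    century = MAX_CENTURY_RANGE + 1          # "XX-XX" + "-"
    tokens = MAX_TOKEN_LENGTH + 1 + MAX_TOKEN_LENGTH
    total = prefix + 2 * location + century + tokens
    if with_collision_suffix:
        total += 1 + MAX_COLLISION_SUFFIX    # "-" + full emic label
    return total


print(max_ppid_length())      # 73
print(max_ppid_length(True))  # 124
```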
+
+### 10.2 Validation Regex
+
+```python
+import re
+
+PPID_PATTERN = re.compile(
+    r'^(ID|PID)-'                   # Type
+    r'([A-Z]{2}|XX)-'               # First country
+    r'([A-Z]{2,3}|XX)-'             # First region
+    r'([A-Z]{3}|XXX)-'              # First place
+    r'([A-Z]{2}|XX)-'               # Last country
+    r'([A-Z]{2,3}|XX)-'             # Last region
+    r'([A-Z]{3}|XXX)-'              # Last place
+    r'(\d{1,2}-\d{1,2}|XX-XX)-'     # Century range
+    r'([A-Z0-9]+)-'                 # First token
+    r'([A-Z0-9]*)'                  # Last token (may be empty)
+    r'(-[a-z0-9_]+)?$'              # Collision suffix (optional)
+)
+
+def validate_ppid(ppid: str) -> tuple[bool, str]:
+    """Validate PPID format."""
+    if not PPID_PATTERN.match(ppid):
+        return False, "Invalid PPID format"
+
+    # Additional semantic validation
+    parts = ppid.split('-')
+
+    # Century range validation (parts[7] and parts[8] hold the two centuries)
+    if len(parts) >= 9:
+        century_range = f"{parts[7]}-{parts[8]}"
+        if century_range != "XX-XX":
+            try:
+                first_c, last_c = map(int, [parts[7], parts[8]])
+                if last_c < first_c:
+                    return False, "Last century cannot be before first century"
+                if first_c < 1 or last_c > 22:  # Reasonable bounds
+                    return False, "Century out of reasonable range"
+            except ValueError:
+                pass
+
+    return True, "Valid"
+```
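
Beyond a yes/no check, the same grammar can be used to split a PPID into named components. The following is a minimal sketch, assuming the grammar of the validation pattern above; the group names and `parse_ppid` are hypothetical. The `XX`/`XXX` placeholders need no separate alternative here because the letter classes already match them.

```python
import re
from typing import Optional

# Same grammar as the validation pattern, rewritten with named groups.
PPID_FIELDS = re.compile(
    r'(?P<type>ID|PID)-'
    r'(?P<first_country>[A-Z]{2})-'
    r'(?P<first_region>[A-Z]{2,3})-'
    r'(?P<first_place>[A-Z]{3})-'
    r'(?P<last_country>[A-Z]{2})-'
    r'(?P<last_region>[A-Z]{2,3})-'
    r'(?P<last_place>[A-Z]{3})-'
    r'(?P<century_range>\d{1,2}-\d{1,2}|XX-XX)-'
    r'(?P<first_token>[A-Z0-9]+)-'
    r'(?P<last_token>[A-Z0-9]*)'
    r'(?:-(?P<collision_suffix>[a-z0-9_]+))?'
)


def parse_ppid(ppid: str) -> Optional[dict[str, Optional[str]]]:
    """Return the named PPID components, or None if the string does not match."""
    match = PPID_FIELDS.fullmatch(ppid)
    return match.groupdict() if match else None


fields = parse_ppid("PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG")
print(fields["first_place"], fields["last_place"], fields["century_range"])
```

Using `fullmatch` instead of `^...$` with `match` also rejects a trailing newline, which `$` alone would tolerate.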
+
+---
+
+## 11. Open Questions
+
+### 11.1 BCE Dates
+
+How to handle persons from before Common Era?
+
+**Options**:
+1. Negative century numbers: `-5--4` for 5th-4th century BCE
+2. BCE prefix: `BCE5-BCE4`
+3. Separate identifier scheme for ancient persons
+
+### 11.2 Non-Latin Name Tokens
+
+How to handle names in non-Latin scripts?
+
+**Options**:
+1. Require transliteration (current approach)
+2. Allow Unicode tokens with normalization
+3. Dual representation (original + transliterated)
+
+### 11.3 Disputed Locations
+
+What if birth/death locations are historically disputed?
+
+**Options**:
+1. Use most likely location with note
+2. Use `XX`/`XXX` until resolved
+3. Create multiple IDs for each interpretation
+
+### 11.4 Living Persons
+
+How to handle persons still alive (no death observation)?
+
+**Options**:
+1. Cannot be PID until death
+2. Use `XX-XX-XXX` for death location, current century for range
+3. Separate identifier class for living persons
+
+---
+
+## 12. References
+
+### GHCID Documentation
+- [GHCID PID Scheme](../../GHCID_PID_SCHEME.md)
+- [AGENTS.md: Persistent Identifiers](../../AGENTS.md#persistent-identifiers-ghcid)
+
+### Related PPID Documents
+- [Original Identifier Structure (superseded)](./05_identifier_structure_design.md)
+- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+- [Cultural Naming Conventions](./04_cultural_naming_conventions.md)
+
+### Standards
+- ISO 3166-1: Country codes
+- ISO 3166-2: Subdivision codes
+- GeoNames: Geographic names database
diff --git a/docs/plan/person_pid/11_pico_ppid_comparison.md b/docs/plan/person_pid/11_pico_ppid_comparison.md
new file mode 100644
index 0000000000..ccc6a74413
--- /dev/null
+++ b/docs/plan/person_pid/11_pico_ppid_comparison.md
@@ -0,0 +1,475 @@
+# PiCo vs PPID: Comparative Analysis
+
+**Version**: 0.1.0
+**Last Updated**: 2025-01-09
+**Related**: [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md) | [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+
+---
+
+## 1. Executive Summary
+
+This document compares the **PiCo (Persons in Context)** ontology developed by CBG|Centrum voor Familiegeschiedenis with our proposed **PPID (Person Persistent Identifier)** system. The analysis is based on deep research into PiCo's implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.
+
+### 1.1 Key Finding
+
+PiCo and PPID serve **complementary purposes**:
+
+| System | Primary Purpose | Identifier Style | Scope |
+|--------|-----------------|------------------|-------|
+| **PiCo** | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
+| **PPID** | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |
+
+**Recommendation**: PPID should **adopt PiCo's ontological distinctions** (PersonObservation vs PersonReconstruction) while using its own **semantic identifier format** aligned with GHCID conventions.
+
+---
+
+## 2. PiCo Architecture (From Research)
+
+### 2.1 Core Classes
+
+From the PiCo specification at `personsincontext.org/model`:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                           PiCo MODEL                            │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                          Person                           │  │
+│  │           (Container class - not used directly)           │  │
+│  │                                                           │  │
+│  │   ┌──────────────────────┐     ┌──────────────────────┐   │  │
+│  │   │ PersonObservation    │     │ PersonReconstruction │   │  │
+│  │   │                      │     │                      │   │  │
+│  │   │ - Data as found      │     │ - Curated identity   │   │  │
+│  │   │   on Source          │     │ - Links multiple     │   │  │
+│  │   │ - hadPrimarySource   │     │   observations       │   │  │
+│  │   │ - hasRole            │     │ - wasDerivedFrom     │   │  │
+│  │   │ - hasAge             │     │ - wasGeneratedBy     │   │  │
+│  │   │ - hasOccupation      │     │ - wasRevisionOf      │   │  │
+│  │   └──────────────────────┘     └──────────────────────┘   │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                          Source                           │  │
+│  │                 (schema:ArchiveComponent)                 │  │
+│  │  - name, dateCreated, holdingArchive, associatedMedia     │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                     PersonName (PNV)                      │  │
+│  │  - literalName, givenName, baseSurname, surnamePrefix     │  │
+│  │  - patronym, initials                                     │  │
+│  └───────────────────────────────────────────────────────────┘  │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### 2.2 PiCo Identifier Structure in Open Archives
+
+From the Open Archives API documentation:
+
+```
+URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]
+
+Examples:
+- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
+- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c
+
+Components:
+- rat = Regionaal Archief Tilburg (3-letter archive code)
+- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
+- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
+```
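
The URI format above can be taken apart with a short regular expression. This is a sketch based only on the format line and the two examples shown; the lowercase three-letter archive code and the hexadecimal 8-4-4-4-12 UUID shape are assumptions drawn from those examples, and `OpenArchivesRef` / `parse_open_archives_uri` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the documented URI format; character classes are
# assumptions based on the two example URIs.
OPEN_ARCHIVES_URI = re.compile(
    r'https://www\.openarchieven\.nl/'
    r'(?P<archive>[a-z]{3}):'
    r'(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})'
    r'(?:/(?P<token>[^/]+))?'
)


class OpenArchivesRef(NamedTuple):
    archive: str           # e.g. "rat"
    uuid: str              # record UUID
    token: Optional[str]   # e.g. "ttl:pico", or None when absent


def parse_open_archives_uri(uri: str) -> OpenArchivesRef:
    """Split an Open Archives record URI into archive code, UUID and token."""
    match = OPEN_ARCHIVES_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"Not an Open Archives record URI: {uri!r}")
    return OpenArchivesRef(match["archive"], match["uuid"], match["token"])


ref = parse_open_archives_uri(
    "https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649/ttl:pico"
)
print(ref.archive, ref.token)  # rat ttl:pico
```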
+
+### 2.3 PiCo PersonObservation Example (Actual Data)
+
+From Open Archives API response:
+
+```turtle
+@prefix oa: .
+@prefix pico: .
+@prefix prov: .
+@prefix sdo: .
+
+oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
+    a pico:PersonObservation ;
+    prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
+    pico:hasRole "Moeder" ;
+    sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
+    sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
+    sdo:gender sdo:Female ;
+    sdo:name "Cornelia Verhulst" ;
+    sdo:familyName "Verhulst" ;
+    sdo:givenName "Cornelia" .
+```
+
+### 2.4 PiCo PersonReconstruction Example
+
+From PiCo specification:
+
+```turtle
+cbg:person_reconstruction_2
+    a pico:PersonReconstruction ;
+    sdo:name "Anna Maria Koppen" ;
+    sdo:familyName "Koppen" ;
+    sdo:givenName "Anna" ;
+    sdo:gender sdo:Female ;
+    sdo:birthPlace "Haarlem" ;
+    sdo:birthDate "1860-03-31"^^xsd:date ;
+    sdo:deathPlace "Detroit, VSA" ;
+    sdo:deathDate "1926"^^xsd:gYear ;
+    prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1,
+                        cbg:NL-HaCBG_1755_0341_142_po_1 ;
+    prov:wasGeneratedBy cbg:reconstruction_activity_01 .
+```
+
+---
+
+## 3. Detailed Comparison
+
+### 3.1 Identifier Format
+
+| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
+|--------|--------------------------|-----------------|
+| **Format** | `{archive}:{uuid}` | `{TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT}` |
+| **Example** | `rat:48c2b836-385f-11e0-bcd1-8edf61960649` | `PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG` |
+| **Human Readable** | No (opaque UUID) | Yes (semantic components) |
+| **Archive Prefix** | Yes (3-letter code) | No (implicit via source) |
+| **Geographic** | No | Yes (first + last observation locations) |
+| **Temporal** | No | Yes (century range) |
+| **Name** | No | Yes (first + last token) |
+
+### 3.2 Conceptual Model
+
+| Concept | PiCo | PPID |
+|---------|------|------|
+| **Raw Observation** | `PersonObservation` | Observation (separate system) |
+| **Curated Identity** | `PersonReconstruction` | `PID` (promoted from `ID`) |
+| **Temporary State** | Not explicit | `ID` class |
+| **Permanent State** | All URIs persistent | `PID` class only |
+| **Provenance** | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
+| **Name Vocabulary** | PNV (Person Name Vocabulary) | Emic labels from sources |
+
+### 3.3 Persistence Philosophy
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **All identifiers persistent?** | Yes | No - only PID class |
+| **Temporary identifiers?** | No explicit concept | Yes - ID class |
+| **Promotion mechanism?** | N/A | ID → PID when criteria met |
+| **Epistemic uncertainty?** | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
+| **Living persons?** | Can have PersonReconstruction | Must remain ID until death |
+
+### 3.4 Geographic Handling
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **In identifier?** | No | Yes |
+| **In properties?** | Yes (birthPlace, deathPlace) | Also yes |
+| **Format** | Free text or URI | ISO 3166-1/2 + GeoNames |
+| **Historical mapping?** | Encouraged (link to thesaurus) | Required (historical → modern) |
+| **Example** | `sdo:birthPlace "Haarlem"` | `...-NL-NH-HAA-...` |
+
+### 3.5 Temporal Handling
+
+| Aspect | PiCo | PPID |
+|--------|------|------|
+| **In identifier?** | No | Yes (century range) |
+| **Date format** | ISO 8601 (xsd:date) | Century numbers |
+| **BCE support** | Via negative years | Via negative centuries (-5--4) |
+| **Precision** | Day-level possible | Century-level only in ID |
+| **Example** | `sdo:birthDate "1860-03-31"^^xsd:date` | `...-19-20-...` |
+
+---
+
+## 4. Key Differences Explained
+
+### 4.1 Why PiCo Uses Opaque UUIDs
+
+PiCo's design goals (from GitHub README):
+
+1. **Successor to A2A**: Designed to replace XML-based Archive-to-Archive standard
+2. **Genealogical focus**: Primary use case is WieWasWie ancestor search
+3. **Linked Data**: Interoperability via RDF, not human-readable identifiers
+4. **Archive-centric**: Identifiers include archive code prefix
+
+PiCo's UUID approach is appropriate for:
+- Massive genealogical databases (millions of records)
+- Automated conversion from A2A
+- Machine-to-machine data exchange
+
+### 4.2 Why PPID Uses Semantic Identifiers
+
+PPID's design goals:
+
+1. **GHCID alignment**: Consistent identifier philosophy across GLAM project
+2. **Heritage sector focus**: Staff of heritage institutions, historical figures
+3. **Human discovery**: Identifiers aid browsing and deduplication
+4. **Epistemic honesty**: Explicit distinction between ID (uncertain) and PID (verified)
+5. **Scholarly citation**: Identifiers can be meaningfully cited in publications
+
+PPID's semantic approach is appropriate for:
+- Smaller, curated datasets
+- Human curation workflows
+- Cross-system deduplication
+- Scholarly reference
+
+### 4.3 The ID/PID Distinction (Unique to PPID)
+
+PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:
+
+```
+PiCo:
+  PersonObservation (always permanent)
+      ↓ prov:wasDerivedFrom
+  PersonReconstruction (always permanent)
+
+PPID:
+  Observation (separate system, permanent)
+      ↓
+  ID (temporary, may change)
+      ↓ promotion when criteria met
+  PID (permanent, never changes)
+```
+
+**Why this matters for heritage sector**:
+
+- **Living persons**: Cannot have verified death observation → must remain ID
+- **Incomplete records**: May never have enough data for PID promotion
+- **Ongoing research**: Archives not yet explored → cannot claim PID status
+- **Scholarly integrity**: Prevents overclaiming certainty
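
The ID → PID state model above can be illustrated with a small sketch. It assumes, purely for illustration, that promotion only swaps the `ID-` type prefix for `PID-` and leaves the semantic components untouched. The function name `promote` and the boolean `criteria_met` are hypothetical stand-ins for whatever promotion rules are eventually specified.

```python
def promote(identifier: str, criteria_met: bool) -> str:
    """Return the identifier after one promotion attempt.

    Assumption: promotion changes only the type prefix (ID- -> PID-).
    """
    if identifier.startswith("PID-"):
        # PIDs are permanent and never change.
        return identifier
    if not identifier.startswith("ID-"):
        raise ValueError(f"Not an ID or PID: {identifier!r}")
    if not criteria_met:
        # E.g. no verified death observation: stays a temporary ID.
        return identifier
    return "P" + identifier


living = "ID-NL-NH-AMS-XX-XX-XXX-20-21-JAN-BERG"
print(promote(living, criteria_met=False))
print(promote("ID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG", criteria_met=True))
```

The transition is one-way: there is no path from `PID-` back to `ID-`, which matches the "never changes" guarantee in the diagram.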
+
+---
+
+## 5. Integration Recommendations
+
+### 5.1 Adopt PiCo Ontological Distinctions
+
+PPID should use PiCo's class hierarchy:
+
+```turtle
+@prefix ppid: .
+@prefix pico: .
+
+# PPID extends PiCo
+ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
+ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .
+
+# PPID observations link to source observations
+ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
+ rdfs:range pico:PersonObservation .
+```
+
+### 5.2 Maintain PPID Semantic Identifier Format
+
+Do not adopt PiCo's opaque UUID format. Keep the semantic, GHCID-aligned format:
+
+```
+PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG
+```
+
+**Rationale**: GHCID project-wide consistency, human discoverability, scholarly citation.
+
+### 5.3 Use PNV for Name Properties
+
+Adopt PiCo's use of Person Name Vocabulary for structured name data:
+
+```turtle
+ppid:PID-... pnv:hasName [
+    a pnv:PersonName ;
+    pnv:literalName "Jan van den Berg" ;
+    pnv:givenName "Jan" ;
+    pnv:surnamePrefix "van den" ;
+    pnv:baseSurname "Berg"
+] .
+```
+
+### 5.4 Use PROV-O for Provenance
+
+Adopt PiCo's PROV-O patterns for reconstruction provenance:
+
+```turtle
+ppid:PID-NL-NH-AMS-...
+ prov:wasDerivedFrom , ;
+ prov:wasGeneratedBy [
+ a prov:Activity ;
+ prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
+ prov:wasAssociatedWith ppid:curator-001
+ ] .
+```
+
+### 5.5 Separate Observation Identifiers
+
+As noted in the revised PPID design, observations use a **different identifier system**:
+
+```
+{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}
+
+Example:
+NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045
+```
+
+This is distinct from PiCo's `{archive}:{uuid}` but serves similar purposes.
+
+---
+
+## 6. Resolved Open Questions
+
+Based on user clarifications:
+
+### 6.1 BCE Date Handling
+
+**Resolution**: Use negative century numbers.
+
+```
+Format: {first_century}-{last_century}
+
+Examples:
+- 5th century BCE to 4th century BCE: "-5--4"
+- 1st century BCE to 1st century CE: "-1-1"
+- 5th century BCE to 3rd century CE: "-5-3"
+```
+
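+This mirrors the signed-year convention of the ISO 8601 expanded representation, with one difference: ISO 8601 uses astronomical year numbering (year `0000` is 1 BCE), whereas century numbers here have no zero, so `-1` denotes the 1st century BCE.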
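
A signed century range can no longer be split on `-`, because the sign and the separator share the same character. The sketch below parses the three examples with a regular expression instead; `parse_century_range` is a hypothetical helper, the rejection of century `0` is an assumption (centuries are counted from 1 in both directions), and the `XX-XX` placeholder is deliberately not handled.

```python
import re

# Optional sign, one or two digits, separator, then the same again.
CENTURY_RANGE = re.compile(r'(-?\d{1,2})-(-?\d{1,2})')


def parse_century_range(value: str) -> tuple[int, int]:
    """Parse a century range such as '19-20' or '-5--4' into two integers."""
    match = CENTURY_RANGE.fullmatch(value)
    if match is None:
        raise ValueError(f"Invalid century range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first == 0 or last == 0:
        # Assumption: there is no century 0.
        raise ValueError("Century 0 does not exist")
    if last < first:
        raise ValueError("Last century cannot be before first century")
    return first, last


for example in ("-5--4", "-1-1", "-5-3", "19-20"):
    print(example, parse_century_range(example))
```

Any validator that relies on `ppid.split('-')` with fixed positions would need the same treatment once negative centuries are allowed.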
+
+### 6.2 Non-Latin Script Transliteration
+
+**Resolution**: Apply same transliteration rules as GHCID (documented in AGENTS.md).
+
+| Script | Standard |
+|--------|----------|
+| Cyrillic | ISO 9:1995 |
+| Chinese | Hanyu Pinyin (ISO 7098) |
+| Japanese | Modified Hepburn |
+| Korean | Revised Romanization |
+| Arabic | ISO 233-2/3 |
+| Hebrew | ISO 259-3 |
+| Greek | ISO 843 |
+
+### 6.3 Disputed Locations
+
+**Resolution**: Not a PPID concern - handled by ISO standardization.
+
+When historical locations are disputed:
+- Use the ISO-standardized modern location
+- Document the dispute in observation metadata
+- Do not encode uncertainty in the identifier itself
+
+### 6.4 Living Persons
+
+**Resolution**: Living persons are **always ID class** and can only be promoted to PID after death.
+
+```python
+def can_promote_to_pid(person_id: str, observations: list) -> bool:
+    """
+    Check if ID can be promoted to PID.
+
+    Living persons can NEVER be promoted.
+    """
+    # Check for death observation
+    death_obs = [o for o in observations if o.is_death_record or o.is_post_death]
+
+    if not death_obs:
+        # No death observation = person may be alive = cannot be PID
+        return False
+
+    # Continue with other promotion criteria...
+    return check_other_criteria(observations)
+```
+
+**Rationale**:
+1. PID requires verified last observation (death)
+2. Living persons have incomplete lifecycle
+3. Future observations may change identity assessment
+4. Privacy considerations for living individuals
+
+---
+
+## 7. Implementation Alignment
+
+### 7.1 Class Mapping
+
+| PiCo Class | PPID Equivalent | Notes |
+|------------|-----------------|-------|
+| `pico:Person` | (Container) | Not used directly |
+| `pico:PersonObservation` | Observation (separate system) | Different identifier format |
+| `pico:PersonReconstruction` | `ppid:PersonID` or `ppid:PersonPID` | Split by epistemic certainty |
+| `pico:Source` | `schema:ArchiveComponent` | Same as PiCo |
+| `pnv:PersonName` | `pnv:PersonName` | Adopt PNV |
+
+### 7.2 Property Mapping
+
+| PiCo Property | PPID Usage | Notes |
+|---------------|------------|-------|
+| `prov:hadPrimarySource` | Same | For observations |
+| `prov:wasDerivedFrom` | Same | PID/ID from observations |
+| `prov:wasGeneratedBy` | Same | Activity provenance |
+| `prov:wasRevisionOf` | Same | Version history |
+| `sdo:birthDate` | Same | In properties |
+| `sdo:birthPlace` | Same + in identifier | Dual representation |
+| `sdo:deathDate` | Same | In properties |
+| `sdo:deathPlace` | Same + in identifier | Dual representation |
+| `pico:hasRole` | Same | For observations |
+| `pico:hasAge` | Same | When birthDate unknown |
+
+### 7.3 Namespace Declarations
+
+```turtle
+@prefix ppid: .
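+@prefix pico: <https://personsincontext.org/model#> .
+@prefix pnv: <https://w3id.org/pnv#> .
+@prefix prov: <http://www.w3.org/ns/prov#> .
+@prefix sdo: <https://schema.org/> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .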
+```
+
+---
+
+## 8. Conclusion
+
+### 8.1 What PPID Adopts from PiCo
+
+1. **PersonObservation/PersonReconstruction distinction** - Core ontological pattern
+2. **PROV-O provenance model** - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
+3. **Person Name Vocabulary (PNV)** - Structured name representation
+4. **Schema.org properties** - birthDate, deathDate, birthPlace, deathPlace, etc.
+5. **Source linking** - hadPrimarySource, holdingArchive
+
+### 8.2 What PPID Does Differently
+
+1. **Semantic identifier format** - Geographic-temporal-emic instead of opaque UUID
+2. **ID/PID epistemic distinction** - Explicit uncertainty modeling
+3. **Living person handling** - Must remain ID until death
+4. **GHCID alignment** - Consistent with heritage custodian identifier philosophy
+5. **Century range encoding** - Temporal disambiguation in identifier
+6. **Emic label tokens** - Name components in identifier for discoverability
+
+### 8.3 Interoperability Path
+
+PPID can be fully interoperable with PiCo systems via:
+
+1. **RDFS/OWL mappings**: `ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction`
+2. **SPARQL federation**: Query across PPID and PiCo endpoints
+3. **Bidirectional links**: `owl:sameAs` between PPID and PiCo identifiers
+4. **Profile negotiation**: Serve data in PiCo format via content negotiation
+
+---
+
+## 9. References
+
+### PiCo Resources
+- PiCo Specification: https://personsincontext.org/model
+- PiCo GitHub: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
+- Open Archives API: https://www.openarchieven.nl/api/docs/uri.php
+- CBG: https://cbg.nl/
+
+### Standards
+- Person Name Vocabulary (PNV): https://w3id.org/pnv
+- PROV-O: https://www.w3.org/TR/prov-o/
+- Schema.org: https://schema.org/
+
+### Related PPID Documents
+- [PPID-GHCID Alignment](./10_ppid_ghcid_alignment.md)
+- [PiCo Ontology Analysis](./03_pico_ontology_analysis.md)
+- [Identifier Structure Design](./05_identifier_structure_design.md)