PiCo vs PPID: Comparative Analysis

Version: 0.1.0
Last Updated: 2025-01-09
Related: PPID-GHCID Alignment | PiCo Ontology Analysis


1. Executive Summary

This document compares the PiCo (Persons in Context) ontology developed by CBG|Centrum voor Familiegeschiedenis with our proposed PPID (Person Persistent Identifier) system. The analysis is based on deep research into PiCo's implementation in Open Archives (openarchieven.nl) and the WieWasWie platform.

1.1 Key Finding

PiCo and PPID serve complementary purposes:

| System | Primary Purpose | Identifier Style | Scope |
|---|---|---|---|
| PiCo | Data model for person observations in genealogical sources | Opaque UUIDs | Genealogical records (civil registries, church books) |
| PPID | Persistent identifiers for heritage sector persons | Semantic geographic-temporal | Heritage custodian staff and historical figures |

Recommendation: PPID should adopt PiCo's ontological distinctions (PersonObservation vs PersonReconstruction) while using its own semantic identifier format aligned with GHCID conventions.


2. PiCo Architecture (From Research)

2.1 Core Classes

From the PiCo specification at personsincontext.org/model:

┌──────────────────────────────────────────────────────────────────┐
│                            PiCo MODEL                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                           Person                           │  │
│  │           (Container class - not used directly)            │  │
│  │                                                            │  │
│  │  ┌────────────────────────┐    ┌────────────────────────┐  │  │
│  │  │ PersonObservation      │    │ PersonReconstruction   │  │  │
│  │  │                        │    │                        │  │  │
│  │  │ - Data as found        │    │ - Curated identity     │  │  │
│  │  │   on Source            │    │ - Links multiple       │  │  │
│  │  │ - hadPrimarySource     │    │   observations         │  │  │
│  │  │ - hasRole              │    │ - wasDerivedFrom       │  │  │
│  │  │ - hasAge               │    │ - wasGeneratedBy       │  │  │
│  │  │ - hasOccupation        │    │ - wasRevisionOf        │  │  │
│  │  └────────────────────────┘    └────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                           Source                           │  │
│  │  (schema:ArchiveComponent)                                 │  │
│  │  - name, dateCreated, holdingArchive, associatedMedia      │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                      PersonName (PNV)                      │  │
│  │  - literalName, givenName, baseSurname, surnamePrefix      │  │
│  │  - patronym, initials                                      │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

2.2 PiCo Identifier Structure in Open Archives

From the Open Archives API documentation:

URI Format: https://www.openarchieven.nl/{3-letter-archive-code}:{uuid}[/{token}]

Examples:
- https://www.openarchieven.nl/rat:48c2b836-385f-11e0-bcd1-8edf61960649
- https://www.openarchieven.nl/elo:f5169776-db74-70a3-51e3-20c15291429c

Components:
- rat = Regionaal Archief Tilburg (3-letter archive code)
- 48c2b836-385f-11e0-bcd1-8edf61960649 = UUID of the record
- /ttl:pico = Optional token for content negotiation (Turtle + PiCo profile)
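
The component breakdown above can be sketched as a small parser. This is an illustrative sketch only: `OpenArchivesRef` and `parse_open_archives_uri` are hypothetical names, not part of the Open Archives API, and the check that the archive code is exactly three characters long is taken from the URI format stated above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse
import uuid


@dataclass(frozen=True)
class OpenArchivesRef:
    """Hypothetical container for the parts of an Open Archives record URI."""
    archive_code: str        # e.g. "rat"
    record_uuid: uuid.UUID   # UUID of the record
    token: Optional[str]     # e.g. "ttl:pico", or None when absent


def parse_open_archives_uri(uri: str) -> OpenArchivesRef:
    """Split a record URI into archive code, UUID, and optional token."""
    path = urlparse(uri).path.lstrip("/")
    # The first path segment is "{code}:{uuid}"; anything after it is the token.
    record_part, _, token = path.partition("/")
    code, sep, raw_uuid = record_part.partition(":")
    if not sep or len(code) != 3:
        raise ValueError(f"not an Open Archives record URI: {uri!r}")
    # uuid.UUID raises ValueError if the second part is not a valid UUID.
    return OpenArchivesRef(code, uuid.UUID(raw_uuid), token or None)
```

Parsing the UUID with the standard `uuid` module rejects malformed record identifiers early instead of passing them on as opaque strings.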

2.3 PiCo PersonObservation Example (Actual Data)

From Open Archives API response:

@prefix oa: <https://www.openarchieven.nl/id/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .

oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f30464-3867-11e0-bcd1-8edf61960649
    a pico:PersonObservation ;
    prov:hadPrimarySource oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649 ;
    pico:hasRole "Moeder" ;
    sdo:children oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2ae9c-... ;
    sdo:spouse oa:record_rat_48c2b836-385f-11e0-bcd1-8edf61960649_Person_22f2da16-... ;
    sdo:gender sdo:Female ;
    sdo:name "Cornelia Verhulst" ;
    sdo:familyName "Verhulst" ;
    sdo:givenName "Cornelia" .

2.4 PiCo PersonReconstruction Example

From PiCo specification:

cbg:person_reconstruction_2
    a pico:PersonReconstruction ;
    sdo:name "Anna Maria Koppen" ;
    sdo:familyName "Koppen" ;
    sdo:givenName "Anna" ;
    sdo:gender sdo:Female ;
    sdo:birthPlace "Haarlem" ;
    sdo:birthDate "1860-03-31"^^xsd:date ;
    sdo:deathPlace "Detroit, VSA" ;
    sdo:deathDate "1926"^^xsd:gYear ;
    prov:wasDerivedFrom nha:huwelijksakte_1885_321_po_1, 
                        cbg:NL-HaCBG_1755_0341_142_po_1 ;
    prov:wasGeneratedBy cbg:reconstruction_activity_01 .

3. Detailed Comparison

3.1 Identifier Format

| Aspect | PiCo (CBG/Open Archives) | PPID (Proposed) |
|---|---|---|
| Format | {archive}:{uuid} | {TYPE}-{FC}-{FR}-{FP}-{LC}-{LR}-{LP}-{CR}-{FT}-{LT} |
| Example | rat:48c2b836-385f-11e0-bcd1-8edf61960649 | PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG |
| Human Readable | No (opaque UUID) | Yes (semantic components) |
| Archive Prefix | Yes (3-letter code) | No (implicit via source) |
| Geographic | No | Yes (birth + death locations) |
| Temporal | No | Yes (century range) |
| Name | No | Yes (first + last token) |
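
To make the PPID format concrete, the following sketch splits an identifier into its components. The component shapes in the regular expression (two-letter country, one to three alphanumeric characters for the region, three-letter place, signed century numbers, uppercase name tokens) are assumptions inferred from the single example above, and the field names are one hypothetical reading of the FC/FR/FP and LC/LR/LP abbreviations. Treat it as illustration, not as the normative grammar.

```python
import re
from typing import NamedTuple


class Ppid(NamedTuple):
    """Hypothetical field names for the PPID components."""
    kind: str            # TYPE: "ID" or "PID"
    first_country: str   # FC
    first_region: str    # FR
    first_place: str     # FP
    last_country: str    # LC
    last_region: str     # LR
    last_place: str      # LP
    first_century: int   # CR, first half
    last_century: int    # CR, second half
    first_token: str     # FT
    last_token: str      # LT


# Assumed component shapes; the optional "-" before each century number
# allows negative (BCE) centuries such as "-5--4".
PPID_RE = re.compile(
    r"^(ID|PID)"
    r"-([A-Z]{2})-([A-Z0-9]{1,3})-([A-Z]{3})"
    r"-([A-Z]{2})-([A-Z0-9]{1,3})-([A-Z]{3})"
    r"-(-?\d+)-(-?\d+)"
    r"-([A-Z]+)-([A-Z]+)$"
)


def parse_ppid(identifier: str) -> Ppid:
    """Split a PPID string into named components."""
    match = PPID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a PPID: {identifier!r}")
    g = match.groups()
    return Ppid(*g[:7], int(g[7]), int(g[8]), g[9], g[10])
```

Because the hyphen doubles as component separator and minus sign, a plain `split("-")` would misread BCE century ranges; anchoring the whole pattern avoids that.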

3.2 Conceptual Model

| Concept | PiCo | PPID |
|---|---|---|
| Raw Observation | PersonObservation | Observation (separate system) |
| Curated Identity | PersonReconstruction | PID (promoted from ID) |
| Temporary State | Not explicit | ID class |
| Permanent State | All URIs persistent | PID class only |
| Provenance | PROV-O (wasGeneratedBy, wasDerivedFrom) | PROV-O + XPath claims |
| Name Vocabulary | PNV (Person Name Vocabulary) | Emic labels from sources |

3.3 Persistence Philosophy

| Aspect | PiCo | PPID |
|---|---|---|
| All identifiers persistent? | Yes | No - only PID class |
| Temporary identifiers? | No explicit concept | Yes - ID class |
| Promotion mechanism? | N/A | ID → PID when criteria met |
| Epistemic uncertainty? | Implicit (multiple observations) | Explicit (ID vs PID distinction) |
| Living persons? | Can have PersonReconstruction | Must remain ID until death |

3.4 Geographic Handling

| Aspect | PiCo | PPID |
|---|---|---|
| In identifier? | No | Yes |
| In properties? | Yes (birthPlace, deathPlace) | Yes |
| Format | Free text or URI | ISO 3166-1/2 + GeoNames |
| Historical mapping? | Encouraged (link to thesaurus) | Required (historical → modern) |
| Example | sdo:birthPlace "Haarlem" | ...-NL-NH-HAA-... |

3.5 Temporal Handling

| Aspect | PiCo | PPID |
|---|---|---|
| In identifier? | No | Yes (century range) |
| Date format | ISO 8601 (xsd:date) | Century numbers |
| BCE support | Via negative years | Via negative centuries (-5--4) |
| Precision | Day-level possible | Century-level only in ID |
| Example | sdo:birthDate "1860-03-31"^^xsd:date | ...-19-20-... |
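
The step from a day-level date to a century number can be sketched as follows. This document does not spell out the derivation, so the sketch assumes conventional ordinal centuries (1801 to 1900 is the 19th century) and historical year numbering with no year 0. The function names are hypothetical.

```python
def century_of(year: int) -> int:
    """Map a historical year to a signed century number.

    `year` is positive for CE and negative for BCE; there is no year 0.
    """
    if year == 0:
        raise ValueError("historical year numbering has no year 0")
    # Years 1..100 fall in century 1, years 101..200 in century 2, and so on.
    century = (abs(year) - 1) // 100 + 1
    return century if year > 0 else -century


def century_range(first_year: int, last_year: int) -> str:
    """Format the century-range component from two years."""
    return f"{century_of(first_year)}-{century_of(last_year)}"
```

Under these assumptions, a birth in 1860 and a death in 1926 give the `19-20` component seen in the example identifier.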

4. Key Differences Explained

4.1 Why PiCo Uses Opaque UUIDs

PiCo's design goals (from GitHub README):

  1. Successor to A2A: Designed to replace XML-based Archive-to-Archive standard
  2. Genealogical focus: Primary use case is WieWasWie ancestor search
  3. Linked Data: Interoperability via RDF, not human-readable identifiers
  4. Archive-centric: Identifiers include archive code prefix

PiCo's UUID approach is appropriate for:

  • Massive genealogical databases (millions of records)
  • Automated conversion from A2A
  • Machine-to-machine data exchange

4.2 Why PPID Uses Semantic Identifiers

PPID's design goals:

  1. GHCID alignment: Consistent identifier philosophy across GLAM project
  2. Heritage sector focus: Staff of heritage institutions, historical figures
  3. Human discovery: Identifiers aid browsing and deduplication
  4. Epistemic honesty: Explicit distinction between ID (uncertain) and PID (verified)
  5. Scholarly citation: Identifiers can be meaningfully cited in publications

PPID's semantic approach is appropriate for:

  • Smaller, curated datasets
  • Human curation workflows
  • Cross-system deduplication
  • Scholarly reference

4.3 The ID/PID Distinction (Unique to PPID)

PiCo assumes all identifiers are permanent once created. PPID introduces explicit epistemic states:

PiCo:
  PersonObservation (always permanent)
      ↓ prov:wasDerivedFrom
  PersonReconstruction (always permanent)

PPID:
  Observation (separate system, permanent)
      ↓
  ID (temporary, may change)
      ↓ promotion when criteria met
  PID (permanent, never changes)
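
The one-way ID → PID lifecycle can be modelled as a small state holder. This is a sketch under stated assumptions: the class and method names are hypothetical, the `ID-` type prefix for temporary identifiers is assumed by analogy with the `PID-` example, and the promotion criteria are reduced to a single boolean supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class IdentifierClass(Enum):
    ID = "ID"    # temporary, may change
    PID = "PID"  # permanent, never changes


@dataclass
class PersonRecord:
    """Hypothetical record pairing an epistemic class with the semantic body."""
    identifier_class: IdentifierClass
    body: str  # the semantic components after the type prefix

    @property
    def identifier(self) -> str:
        return f"{self.identifier_class.value}-{self.body}"

    def update_body(self, new_body: str) -> None:
        """Revise the semantic components; only allowed while still an ID."""
        if self.identifier_class is IdentifierClass.PID:
            raise ValueError("a PID never changes")
        self.body = new_body

    def promote(self, criteria_met: bool) -> None:
        """Promote ID to PID; there is deliberately no demotion method."""
        if self.identifier_class is IdentifierClass.PID:
            return  # already permanent
        if not criteria_met:
            raise ValueError("promotion criteria not met")
        self.identifier_class = IdentifierClass.PID
```

Leaving out any demotion path makes permanence a property of the type itself rather than a convention callers must remember.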

Why this matters for heritage sector:

  • Living persons: Cannot have verified death observation → must remain ID
  • Incomplete records: May never have enough data for PID promotion
  • Ongoing research: Archives not yet explored → cannot claim PID status
  • Scholarly integrity: Prevents overclaiming certainty

5. Integration Recommendations

5.1 Adopt PiCo Ontological Distinctions

PPID should extend PiCo's class hierarchy rather than define a parallel one:

@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# PPID extends PiCo
ppid:PersonID rdfs:subClassOf pico:PersonReconstruction .
ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction .

# PPID reconstructions link to source observations
ppid:hasSourceObservation rdfs:subPropertyOf prov:wasDerivedFrom ;
    rdfs:range pico:PersonObservation .

5.2 Maintain PPID Semantic Identifier Format

Do not adopt PiCo's opaque UUID format. Keep the semantic, GHCID-aligned format:

PID-NL-NH-AMS-NL-NH-HAA-19-20-JAN-BERG

Rationale: GHCID project-wide consistency, human discoverability, scholarly citation.

5.3 Use PNV for Name Properties

Adopt PiCo's use of Person Name Vocabulary for structured name data:

ppid:PRID-... pnv:hasName [
    a pnv:PersonName ;
    pnv:literalName "Jan van den Berg" ;
    pnv:givenName "Jan" ;
    pnv:surnamePrefix "van den" ;
    pnv:baseSurname "Berg"
] .

5.4 Use PROV-O for Provenance

Adopt PiCo's PROV-O patterns for reconstruction provenance:

ppid:PID-NL-NH-AMS-...
    prov:wasDerivedFrom <observation-1>, <observation-2> ;
    prov:wasGeneratedBy [
        a prov:Activity ;
        prov:startedAtTime "2025-01-09T00:00:00"^^xsd:dateTime ;
        prov:wasAssociatedWith ppid:curator-001
    ] .

5.5 Separate Observation Identifiers

As noted in the revised PPID design, observations use a different identifier system:

{REPOSITORY_GHCID}/{CREATOR_GHCID}/{RiC-O-PATH}

Example:
NL-NH-HAA-A-NHA/NL-NH-HAA-A-NHA/burgerlijke-stand/geboorten/1895/003/045

This is distinct from PiCo's {archive}:{uuid} but serves similar purposes.
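
A minimal sketch of splitting and rebuilding such an observation identifier follows. It assumes that neither GHCID contains a `/`, so the first two slashes delimit the repository and creator and everything after them is the RiC-O path. The names `ObservationId`, `parse_observation_id`, and `format_observation_id` are hypothetical.

```python
from typing import NamedTuple


class ObservationId(NamedTuple):
    repository_ghcid: str  # institution holding the source
    creator_ghcid: str     # institution that created the source
    rico_path: str         # hierarchical path within the holdings


def parse_observation_id(identifier: str) -> ObservationId:
    """Split on the first two slashes; the remainder is the RiC-O path."""
    parts = identifier.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not an observation identifier: {identifier!r}")
    return ObservationId(*parts)


def format_observation_id(obs: ObservationId) -> str:
    """Inverse of parse_observation_id."""
    return "/".join(obs)
```

In the example above the repository and creator are the same institution, which is why the GHCID appears twice.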


6. Resolved Open Questions

Based on user clarifications:

6.1 BCE Date Handling

Resolution: Use negative century numbers.

Format: {first_century}-{last_century}

Examples:
- 5th century BCE to 4th century BCE: "-5--4"
- 1st century BCE to 1st century CE: "-1-1"
- 5th century BCE to 3rd century CE: "-5-3"

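This follows the sign convention of ISO 8601's expanded representation, which writes BCE years as negative numbers. The two schemes are not numerically identical: ISO 8601 uses astronomical year numbering, in which year 0000 is 1 BCE, whereas PPID century numbers have no century 0. Because the hyphen serves as both separator and minus sign, the range must be parsed as two signed integers rather than split on every hyphen.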
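
One way to read the range back unambiguously is a pattern that treats a leading hyphen on either number as a sign. This is a sketch, not the project's parser: `parse_century_range` is a hypothetical name, and the checks that reject century 0 and reversed ranges are assumptions that the resolution above does not state explicitly.

```python
import re

# Two signed integers joined by a single separator hyphen.
_RANGE_RE = re.compile(r"^(-?\d+)-(-?\d+)$")


def parse_century_range(text: str) -> tuple[int, int]:
    """Parse "{first_century}-{last_century}" into two signed integers."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a century range: {text!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if first == 0 or last == 0:
        raise ValueError("assumed: there is no century 0")
    if first > last:
        raise ValueError("assumed: first century must not follow last century")
    return first, last
```

All three examples above parse under this pattern, for instance `"-5--4"` yields `(-5, -4)`.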

6.2 Non-Latin Script Transliteration

Resolution: Apply same transliteration rules as GHCID (documented in AGENTS.md).

| Script | Standard |
|---|---|
| Cyrillic | ISO 9:1995 |
| Chinese | Hanyu Pinyin (ISO 7098) |
| Japanese | Modified Hepburn |
| Korean | Revised Romanization |
| Arabic | ISO 233-2/3 |
| Hebrew | ISO 259-3 |
| Greek | ISO 843 |

6.3 Disputed Locations

Resolution: Not a PPID concern - handled by ISO standardization.

When historical locations are disputed:

  • Use the ISO-standardized modern location
  • Document the dispute in observation metadata
  • Do not encode uncertainty in the identifier itself

6.4 Living Persons

Resolution: Living persons are always ID class and can only be promoted to PID after death.

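from dataclasses import dataclass


@dataclass
class Observation:
    """Minimal stand-in exposing only the flags this check reads."""
    is_death_record: bool = False
    is_post_death: bool = False


def check_other_criteria(observations: list[Observation]) -> bool:
    """Placeholder: the remaining promotion criteria are defined elsewhere."""
    raise NotImplementedError


def can_promote_to_pid(person_id: str, observations: list[Observation]) -> bool:
    """Check whether an ID can be promoted to a PID.

    Living persons can never be promoted.
    """
    # A death record, or any observation made after death, is required.
    has_death_obs = any(o.is_death_record or o.is_post_death for o in observations)
    if not has_death_obs:
        # No death observation: the person may be alive, so no PID.
        return False

    # Continue with the other promotion criteria.
    return check_other_criteria(observations)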

Rationale:

  1. PID requires verified last observation (death)
  2. Living persons have incomplete lifecycle
  3. Future observations may change identity assessment
  4. Privacy considerations for living individuals

7. Implementation Alignment

7.1 Class Mapping

| PiCo Class | PPID Equivalent | Notes |
|---|---|---|
| pico:Person | (Container) | Not used directly |
| pico:PersonObservation | Observation (separate system) | Different identifier format |
| pico:PersonReconstruction | ppid:PersonID or ppid:PersonPID | Split by epistemic certainty |
| pico:Source | schema:ArchiveComponent | Same as PiCo |
| pnv:PersonName | pnv:PersonName | Adopt PNV |

7.2 Property Mapping

| PiCo Property | PPID Usage | Notes |
|---|---|---|
| prov:hadPrimarySource | Same | For observations |
| prov:wasDerivedFrom | Same | PRID from POIDs |
| prov:wasGeneratedBy | Same | Activity provenance |
| prov:wasRevisionOf | Same | Version history |
| sdo:birthDate | Same | In properties |
| sdo:birthPlace | Same + in identifier | Dual representation |
| sdo:deathDate | Same | In properties |
| sdo:deathPlace | Same + in identifier | Dual representation |
| pico:hasRole | Same | For observations |
| pico:hasAge | Same | When birthDate unknown |

7.3 Namespace Declarations

@prefix ppid: <https://ppid.org/> .
@prefix pico: <https://personsincontext.org/model#> .
@prefix pnv: <https://w3id.org/pnv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo: <https://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

8. Conclusion

8.1 What PPID Adopts from PiCo

  1. PersonObservation/PersonReconstruction distinction - Core ontological pattern
  2. PROV-O provenance model - wasDerivedFrom, wasGeneratedBy, wasRevisionOf
  3. Person Name Vocabulary (PNV) - Structured name representation
  4. Schema.org properties - birthDate, deathDate, birthPlace, deathPlace, etc.
  5. Source linking - hadPrimarySource, holdingArchive

8.2 What PPID Does Differently

  1. Semantic identifier format - Geographic-temporal-emic instead of opaque UUID
  2. ID/PID epistemic distinction - Explicit uncertainty modeling
  3. Living person handling - Must remain ID until death
  4. GHCID alignment - Consistent with heritage custodian identifier philosophy
  5. Century range encoding - Temporal disambiguation in identifier
  6. Emic label tokens - Name components in identifier for discoverability

8.3 Interoperability Path

PPID can interoperate with PiCo systems via:

  1. RDFS/OWL mappings: ppid:PersonID and ppid:PersonPID rdfs:subClassOf pico:PersonReconstruction
  2. SPARQL federation: Query across PPID and PiCo endpoints
  3. Bidirectional links: owl:sameAs between PPID and PiCo identifiers
  4. Profile negotiation: Serve data in PiCo format via content negotiation
