PiCo Ontology Analysis

Version: 0.1.0
Last Updated: 2025-01-09
Source: https://github.com/CBG-Centrum-voor-familiegeschiedenis/PiCo
Related: Executive Summary | Claims and Provenance


1. Overview

PiCo (Persons in Context) is an ontology developed by CBG Centrum voor familiegeschiedenis (the Dutch Centre for Family History). It provides a conceptual framework for modeling persons in historical sources, with an explicit distinction between observations (what a source says) and reconstructions (what researchers conclude from the sources).

This distinction is fundamental to the PPID design and directly informs our two-level identifier architecture.


2. Core Philosophy

2.1 The Observation-Reconstruction Distinction

PiCo's central innovation is the explicit separation of:

| Concept | Definition | Example |
| --- | --- | --- |
| PersonObservation | A person as described in a specific source | "The baptism register states 'Johannes, son of Pieter'" |
| PersonReconstruction | A curated identity derived from one or more observations | "Johannes Pietersen van der Berg (1692-1756)" |

This mirrors the genealogical research process:

┌─────────────────────────────────────────────────────────────────┐
│                     RESEARCH WORKFLOW                            │
│                                                                  │
│  Source A          Source B          Source C                   │
│  (Baptism)         (Marriage)        (Burial)                   │
│      │                 │                 │                       │
│      ▼                 ▼                 ▼                       │
│  ┌────────┐       ┌────────┐       ┌────────┐                   │
│  │ Person │       │ Person │       │ Person │                   │
│  │ Obs. A │       │ Obs. B │       │ Obs. C │                   │
│  └────────┘       └────────┘       └────────┘                   │
│      │                 │                 │                       │
│      └────────────────┬┴─────────────────┘                       │
│                       │                                          │
│                       ▼  (researcher reasoning)                  │
│              ┌─────────────────┐                                 │
│              │     Person      │                                 │
│              │ Reconstruction  │                                 │
│              │ "Johannes..."   │                                 │
│              └─────────────────┘                                 │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
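The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not PiCo's or PPID's actual data model: the class and field names are assumptions, and the names recorded in the marriage and burial registers are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonObservation:
    """What one source says about a person; frozen, so it is never edited."""
    obs_id: str
    source: str
    name_as_recorded: str


@dataclass(frozen=True)
class PersonReconstruction:
    """A researcher's conclusion, linked back to the evidence behind it."""
    rec_id: str
    canonical_name: str
    derived_from: tuple[PersonObservation, ...]

    def sources(self) -> list[str]:
        """Trace the reconstruction back to its sources."""
        return [obs.source for obs in self.derived_from]


# Illustrative records; only the baptism wording comes from the table above.
baptism = PersonObservation("obs-a", "Baptism register", "Johannes, son of Pieter")
marriage = PersonObservation("obs-b", "Marriage register", "Johannes Pietersen")
burial = PersonObservation("obs-c", "Burial register", "Johannes van der Berg")

johannes = PersonReconstruction(
    rec_id="rec-1",
    canonical_name="Johannes Pietersen van der Berg (1692-1756)",
    derived_from=(baptism, marriage, burial),
)
print(johannes.sources())
```

Because observations are immutable and the reconstruction only points at them, a later change of mind affects the reconstruction alone; the evidence stays as recorded.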

2.2 Why This Matters

| Benefit | Description |
| --- | --- |
| Transparency | Clear separation of evidence from conclusions |
| Traceability | Every assertion traceable to source |
| Revision safety | New evidence can update reconstruction without losing observations |
| Scholarly integrity | Supports genealogical proof standards |
| Conflict handling | Contradictory sources can coexist |

3. Ontology Structure

3.1 Namespace and Prefixes

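@prefix picom: <https://personsincontext.org/model#> .
@prefix pnv:   <https://w3id.org/pnv#> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix bio:   <http://purl.org/vocab/bio/0.1/> .
@prefix schema: <http://schema.org/> .
# Standard prefixes used by the snippets in the sections below
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .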

3.2 Class Hierarchy

schema:Person
    │
    ├── picom:PersonObservation
    │       │
    │       └── (represents a person as found in a single source)
    │
    └── picom:PersonReconstruction
            │
            └── (represents a curated person identity)

3.3 Core Classes

PersonObservation

picom:PersonObservation a owl:Class ;
    rdfs:subClassOf schema:Person ;
    rdfs:label "Person Observation"@en ;
    rdfs:comment """A person as observed/described in a specific source. 
                    This represents what the source says, not necessarily 
                    what is true."""@en .

Key properties:

  • picom:hasName → Name as recorded in source
  • picom:hasRole → Role mentioned in source
  • picom:inRecord → Link to source document
  • prov:wasDerivedFrom → Source provenance

PersonReconstruction

picom:PersonReconstruction a owl:Class ;
    rdfs:subClassOf schema:Person ;
    rdfs:label "Person Reconstruction"@en ;
    rdfs:comment """A curated person identity constructed from one or more 
                    PersonObservations through research and reasoning."""@en .

Key properties:

  • prov:wasDerivedFrom → Links to source PersonObservations
  • picom:hasName → Canonical name form(s)
  • bio:birth / bio:death → Life events
  • picom:hasRole → Aggregated roles

4. Integration with Existing Ontologies

PiCo builds on established vocabularies rather than reinventing them:

4.1 Schema.org

| PiCo Usage | Schema.org Class/Property |
| --- | --- |
| Person base class | schema:Person |
| Birth date | schema:birthDate |
| Death date | schema:deathDate |
| Gender | schema:gender |
| Family name | schema:familyName |
| Given name | schema:givenName |

4.2 PROV-O (Provenance Ontology)

| PiCo Usage | PROV-O Property |
| --- | --- |
| Observation derived from source | prov:wasDerivedFrom |
| Reconstruction generated by activity | prov:wasGeneratedBy |
| Attribution to researcher | prov:wasAttributedTo |
| Revision tracking | prov:wasRevisionOf |

# Example: Reconstruction with provenance
<reconstruction/johannes-van-der-berg>
    a picom:PersonReconstruction ;
    prov:wasDerivedFrom <observation/baptism-1692-123> ;
    prov:wasDerivedFrom <observation/marriage-1715-456> ;
    prov:wasDerivedFrom <observation/burial-1756-789> ;
    prov:wasGeneratedBy <research-activity/cbg-2024-001> ;
    prov:wasAttributedTo <researcher/jan-jansen> .

4.3 BIO Vocabulary

| PiCo Usage | BIO Class/Property |
| --- | --- |
| Birth event | bio:Birth |
| Death event | bio:Death |
| Marriage | bio:Marriage |
| Event date | bio:date |
| Event place | bio:place |

4.4 PNV (Person Name Vocabulary)

PiCo uses PNV for structured name representation:

<observation/baptism-1692-123>
    picom:hasName [
        a pnv:PersonName ;
        pnv:givenName "Johannes" ;
        pnv:patronym "Pietersen" ;
        pnv:surnamePrefix "van der" ;
        pnv:baseSurname "Berg" ;
        pnv:literalName "Johannes Pietersen van der Berg"
    ] .

5. Person Name Vocabulary (PNV) Deep Dive

5.1 Background

PNV was developed to handle the complexity of Dutch historical names, but its patterns apply globally:

  • Patronymics: "Pietersen" (son of Pieter)
  • Surname prefixes: "van der", "de", "ten"
  • Multiple given names
  • Initials
  • Name changes over time

5.2 PNV Properties

| Property | Description | Example |
| --- | --- | --- |
| pnv:literalName | Full name as single string | "Johannes Pietersen van der Berg" |
| pnv:givenName | First/given name(s) | "Johannes" |
| pnv:patronym | Patronymic name | "Pietersen" |
| pnv:surnamePrefix | Particles before surname | "van der" |
| pnv:baseSurname | Core family name | "Berg" |
| pnv:surname | Combined prefix + baseSurname | "van der Berg" |
| pnv:initials | Initials only | "J.P." |
| pnv:infixTitle | Title within name | "graaf" (count) |
| pnv:disambiguatingDescription | Distinguishing info | "de oude" (the elder) |

5.3 Name Complexity Examples

Dutch with patronymic:

[ a pnv:PersonName ;
  pnv:givenName "Jan" ;
  pnv:patronym "Hendrikszoon" ;
  pnv:surnamePrefix "van" ;
  pnv:baseSurname "Amstel" ;
  pnv:literalName "Jan Hendrikszoon van Amstel" ] .

Spanish with two family names:

[ a pnv:PersonName ;
  pnv:givenName "María" ;
  pnv:givenName "Elena" ;
  pnv:baseSurname "García" ;
  pnv:baseSurname "López" ;
  pnv:literalName "María Elena García López" ] .

Icelandic patronymic (no surname):

[ a pnv:PersonName ;
  pnv:givenName "Björk" ;
  pnv:patronym "Guðmundsdóttir" ;
  pnv:literalName "Björk Guðmundsdóttir" ] .
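To make the relationship between the structured parts and the literal name concrete, here is a small Python sketch. The `PersonName` class and its field names are hypothetical and cover only a subset of the PNV properties. In source-driven data the literal name is normally recorded as found and the parts are derived from it; the sketch goes in the opposite direction purely to show how the parts line up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    """Hypothetical container mirroring a subset of the PNV properties."""
    given_names: tuple[str, ...] = ()
    patronym: str = ""
    surname_prefix: str = ""
    base_surnames: tuple[str, ...] = ()

    def literal_name(self) -> str:
        """Join the non-empty parts in the order used by the examples above."""
        parts = [*self.given_names, self.patronym,
                 self.surname_prefix, *self.base_surnames]
        return " ".join(part for part in parts if part)


dutch = PersonName(("Jan",), "Hendrikszoon", "van", ("Amstel",))
spanish = PersonName(("María", "Elena"), base_surnames=("García", "López"))
icelandic = PersonName(("Björk",), patronym="Guðmundsdóttir")

for name in (dutch, spanish, icelandic):
    print(name.literal_name())
```

Allowing several given names and several base surnames, and letting every part be empty, is what lets one structure cover all three naming traditions.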

6. Handling Uncertainty

6.1 Date Uncertainty

PiCo allows flexibility in date representation:

# Exact date known
<observation/birth-1692>
    bio:date "1692-03-15"^^xsd:date .

# Only year known
<observation/birth-approx>
    bio:date "1692"^^xsd:gYear .

# Estimated from age at death
<observation/birth-estimated>
    picom:estimatedBirthYear "1692"^^xsd:gYear ;
    picom:birthYearEstimationMethod "calculated from age 64 at death in 1756" .
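The "calculated from age at death" method is simple arithmetic, but it yields a range rather than a single year, because the source rarely says whether the birthday had already passed in the year of death. A minimal sketch (the function name is an assumption, not part of PiCo):

```python
def estimate_birth_years(death_year: int, age_at_death: int) -> tuple[int, int]:
    """Return the (earliest, latest) possible birth year.

    Someone aged N at death in year Y was born in Y - N if the birthday had
    already passed that year, or in Y - N - 1 if it had not.
    """
    latest = death_year - age_at_death
    return (latest - 1, latest)


earliest, latest = estimate_birth_years(death_year=1756, age_at_death=64)
print(earliest, latest)  # 1691 1692
```

The example above records 1692, the later of the two candidates; keeping the estimation method alongside the value lets a reader reconstruct the other one.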

6.2 Uncertain Identity Linkage

When observations might refer to the same person:

<observation/a> picom:possibleSameAs <observation/b> .
<observation/a> picom:certainSameAs <observation/c> .

6.3 Confidence Scores

PiCo supports confidence assertions:

<reconstruction/johannes>
    picom:hasConfidence [
        picom:confidenceValue 0.85 ;
        picom:confidenceMethod "probabilistic record linkage" ;
        picom:confidenceNote "High confidence based on matching name, date, and location"
    ] .
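One way to arrive at such a value, and to decide between the two link types from section 6.2, is a weighted agreement score. The sketch below is deliberately simpler than the probabilistic record linkage named in the example, and every weight and threshold in it is a hypothetical choice; PiCo does not prescribe any of them.

```python
# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS = {"name": 0.5, "date": 0.3, "location": 0.2}
CERTAIN_THRESHOLD = 0.95
POSSIBLE_THRESHOLD = 0.60


def confidence(agreements: dict[str, bool]) -> float:
    """Sum the weights of the fields on which two observations agree."""
    return sum(w for field, w in WEIGHTS.items() if agreements.get(field, False))


def linkage(score: float) -> str:
    """Map a score to one of the link types from section 6.2, or to no link."""
    if score >= CERTAIN_THRESHOLD:
        return "certainSameAs"
    if score >= POSSIBLE_THRESHOLD:
        return "possibleSameAs"
    return "noLink"


score = confidence({"name": True, "date": True, "location": False})
print(round(score, 2), linkage(score))
```

Whatever method is used, storing its name next to the value (as `picom:confidenceMethod` does above) keeps scores from different methods from being compared as if they were on the same scale.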

7. Role Modeling

7.1 Persons in Context

PiCo's name reflects its focus on persons in context, that is, on their roles and relationships:

<observation/baptism-1692-123>
    picom:hasRole [
        a picom:Role ;
        picom:roleType "child" ;
        picom:roleContext <event/baptism-1692>
    ] ;
    picom:hasRole [
        a picom:Role ;
        picom:roleType "son" ;
        picom:roleInRelationTo <observation/pieter-father>
    ] .

7.2 Role Types for Heritage Sector

| Role Type | Context | Example |
| --- | --- | --- |
| archivist | Institution employment | "Chief archivist at Noord-Hollands Archief" |
| curator | Collection management | "Curator of Dutch Masters" |
| director | Leadership | "Museum director 2010-2020" |
| donor | Collection contribution | "Donated family papers in 1985" |
| researcher | Academic work | "Visiting researcher" |
| subject | Collection content | "Person depicted in portrait" |
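Sources rarely use these role types verbatim: section 8.2 has "Senior Archivist" from one source and "Archivaris" from another for the same position. A keyword lookup is one simple way to map such labels onto the role types above. The mapping below is a hypothetical starting point covering three of the types, not a vocabulary defined by PiCo or PPID.

```python
from typing import Optional

# Hypothetical keywords (English and Dutch) for three of the role types above.
ROLE_KEYWORDS = {
    "archivist": ("archivist", "archivaris"),
    "curator": ("curator", "conservator"),
    "director": ("director", "directeur"),
}


def normalize_role(label: str) -> Optional[str]:
    """Return the first role type whose keyword occurs in the label, else None."""
    lowered = label.casefold()
    for role_type, keywords in ROLE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return role_type
    return None


for label in ("Senior Archivist", "Archivaris", "Museum director 2010-2020", "Volunteer"):
    print(label, "->", normalize_role(label))
```

Returning `None` for unknown labels, rather than guessing, keeps the original wording of the observation as the only record until someone extends the mapping.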

8. PPID Alignment with PiCo

8.1 Mapping PiCo to PPID

| PiCo Concept | PPID Implementation |
| --- | --- |
| picom:PersonObservation | POID (Person Observation ID) |
| picom:PersonReconstruction | PRID (Person Reconstruction ID) |
| prov:wasDerivedFrom | Links PRID → POIDs |
| pnv:PersonName | Structured name storage |
| picom:hasRole | Role at heritage institution |

8.2 Extended PPID Model

PPID extends PiCo for the heritage custodian context:

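@prefix ppid: <https://ppid.org/> .
@prefix picom: <https://personsincontext.org/model#> .
@prefix ghcid: <https://w3id.org/heritage/custodian/> .
@prefix pnv: <https://w3id.org/pnv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .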

# Person Observation (from LinkedIn)
ppid:POID-7a3b-c4d5-e6f7-8901 a picom:PersonObservation ;
    picom:hasName [
        pnv:givenName "Jan" ;
        pnv:surnamePrefix "van den" ;
        pnv:baseSurname "Berg" ;
        pnv:literalName "Jan van den Berg"
    ] ;
    picom:hasRole [
        picom:roleType "Senior Archivist" ;
        picom:roleAtInstitution ghcid:NL-NH-HAA-A-NHA
    ] ;
    prov:wasDerivedFrom <https://linkedin.com/in/jan-van-den-berg> ;
    ppid:retrievedOn "2025-01-09"^^xsd:date .

# Person Observation (from institutional website)
ppid:POID-8b4c-d5e6-f7g8-9012 a picom:PersonObservation ;
    picom:hasName [
        pnv:givenName "J." ;
        pnv:surnamePrefix "van den" ;
        pnv:baseSurname "Berg" ;
        pnv:literalName "J. van den Berg"
    ] ;
    picom:hasRole [
        picom:roleType "Archivaris" ;
        picom:roleAtInstitution ghcid:NL-NH-HAA-A-NHA
    ] ;
    prov:wasDerivedFrom <https://noord-hollandsarchief.nl/over-ons/medewerkers> ;
    ppid:retrievedOn "2025-01-08"^^xsd:date .

# Person Reconstruction (curated identity)
ppid:PRID-1234-5678-90ab-cdef a picom:PersonReconstruction ;
    picom:hasName [
        pnv:givenName "Jan" ;
        pnv:surnamePrefix "van den" ;
        pnv:baseSurname "Berg" ;
        pnv:literalName "Jan van den Berg"
    ] ;
    prov:wasDerivedFrom ppid:POID-7a3b-c4d5-e6f7-8901 ;
    prov:wasDerivedFrom ppid:POID-8b4c-d5e6-f7g8-9012 ;
    prov:wasGeneratedBy [
        a prov:Activity ;
        prov:wasAssociatedWith <agent/ppid-matcher> ;
        prov:endedAtTime "2025-01-09T10:30:00Z"^^xsd:dateTime
    ] ;
    ppid:employmentHistory [
        ppid:institution ghcid:NL-NH-HAA-A-NHA ;
        ppid:role "Senior Archivist" ;
        ppid:startDate "2015"^^xsd:gYear ;
        ppid:endDate "present"
    ] .

9. Implementation Considerations

9.1 When to Create POID vs PRID

| Scenario | Create |
| --- | --- |
| Extract person from LinkedIn | POID |
| Extract person from institutional website | POID |
| Extract person from archival document | POID |
| Match multiple POIDs to single identity | PRID |
| User claims "these are the same person" | PRID linking POIDs |

9.2 PRID Creation Rules

A PRID should be created when:

  1. Single authoritative source: One high-quality POID with comprehensive data
  2. Multiple matched POIDs: Algorithm or human determines multiple observations refer to the same person
  3. External identifier exists: Person has ORCID, ISNI, or Wikidata ID
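The three rules above can be expressed as a small predicate. This sketch assumes that any one rule is sufficient on its own, which the list implies but does not state outright; the `Candidate` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical summary of the evidence for one potential identity."""
    matched_poids: int = 0                # POIDs judged to refer to this person
    has_authoritative_poid: bool = False  # one high-quality, comprehensive POID
    external_ids: dict[str, str] = field(default_factory=dict)  # ORCID, ISNI, Wikidata


def prid_reasons(candidate: Candidate) -> list[str]:
    """Return the names of the rules above that the candidate satisfies."""
    reasons = []
    if candidate.has_authoritative_poid:
        reasons.append("single authoritative source")
    if candidate.matched_poids >= 2:
        reasons.append("multiple matched POIDs")
    if candidate.external_ids:
        reasons.append("external identifier exists")
    return reasons


def should_create_prid(candidate: Candidate) -> bool:
    """Create a PRID as soon as at least one rule applies (an assumption)."""
    return bool(prid_reasons(candidate))


print(prid_reasons(Candidate(matched_poids=2)))
print(should_create_prid(Candidate(matched_poids=1)))
```

Returning the list of satisfied rules, not just a boolean, gives the reconstruction a ready-made justification that can be stored with its provenance.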

9.3 Handling Updates

# Original reconstruction
ppid:PRID-1234-5678-90ab-cdef a picom:PersonReconstruction ;
    prov:generatedAtTime "2025-01-09T10:30:00Z"^^xsd:dateTime .

# Updated reconstruction (new evidence)
ppid:PRID-1234-5678-90ab-cdef-v2 a picom:PersonReconstruction ;
    prov:wasRevisionOf ppid:PRID-1234-5678-90ab-cdef ;
    prov:wasDerivedFrom ppid:POID-7a3b-c4d5-e6f7-8901 ;
    prov:wasDerivedFrom ppid:POID-8b4c-d5e6-f7g8-9012 ;
    prov:wasDerivedFrom ppid:POID-new-observation ;  # New evidence
    prov:generatedAtTime "2025-01-15T14:00:00Z"^^xsd:dateTime .
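The update pattern above, in which a revision is a new resource pointing back at its predecessor, can be sketched as follows. The class, its fields, and the rule that version 1 carries no suffix are assumptions read off the example, not a documented PPID scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reconstruction:
    """Hypothetical in-memory form of one version of a PRID."""
    base_id: str
    version: int
    derived_from: tuple[str, ...]
    generated_at: str
    revision_of: Optional[str] = None

    @property
    def prid(self) -> str:
        # Version 1 carries no suffix, matching the example above.
        return self.base_id if self.version == 1 else f"{self.base_id}-v{self.version}"


def revise(current: Reconstruction, new_poid: str, generated_at: str) -> Reconstruction:
    """Return a new version that adds one observation; `current` is untouched."""
    return Reconstruction(
        base_id=current.base_id,
        version=current.version + 1,
        derived_from=current.derived_from + (new_poid,),
        generated_at=generated_at,
        revision_of=current.prid,
    )


v1 = Reconstruction(
    "PRID-1234-5678-90ab-cdef", 1,
    ("POID-7a3b-c4d5-e6f7-8901", "POID-8b4c-d5e6-f7g8-9012"),
    "2025-01-09T10:30:00Z",
)
v2 = revise(v1, "POID-new-observation", "2025-01-15T14:00:00Z")
print(v2.prid, "revises", v2.revision_of)
```

Keeping the base identifier and version number separate avoids stacked suffixes such as `-v2-v3` when a revision is itself revised.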

10. Gaps in PiCo for PPID

While PiCo provides an excellent foundation, PPID needs extensions:

| Gap | PPID Extension |
| --- | --- |
| Web source provenance | Add XPath, retrieval timestamp, HTML archival |
| Confidence scoring standards | Define confidence scale and methods |
| Heritage sector roles | Vocabulary for archivist, curator, director, etc. |
| Institution linking | Integration with GHCID |
| Living person data protection | GDPR-compliant access controls |

These extensions are detailed in 07_claims_and_provenance.md and 08_implementation_guidelines.md.

