glam/.opencode/rules/ppid-birth-date-enrichment-rule.md
2026-01-09 18:26:58 +01:00

Rule 44: PPID Birth Date Enrichment and Unknown Date Notation

Version: 1.0.0
Created: 2025-01-09
Status: ACTIVE
Related: PPID-GHCID Alignment | EDTF Specification


1. Summary

When birth/death dates are missing from person entity sources, agents MUST:

  1. Search for dates using Exa Search and Linkup tools
  2. Record all enrichment data as web claims with provenance
  3. If not found, use EDTF-compliant notation for estimated/unknown dates
  4. Never fabricate specific dates without source evidence

2. Enrichment Workflow

2.1 Required Search Before Using Unknown Notation

Before marking a date as unknown, agents MUST attempt enrichment:

Person Entity (missing birth_date)
    ↓
1. Search Exa: "{full_name} born birth date"
    ↓
2. Search Exa: "{full_name} {known_employer}"
    ↓
3. Search Linkup: "{full_name} biography"
    ↓
4. If found → Record as web_claim with provenance
    ↓
5. If NOT found → Use EDTF unknown notation
    ↓
6. Record enrichment_attempt in metadata

2.2 Enrichment Search Requirements

| Search Tool | Query Pattern | When to Use |
|---|---|---|
| exa_web_search_exa | "{name}" born birthday birth date year | Primary search |
| exa_linkedin_search_exa | "{name}" at "{employer}" | For work context |
| linkup_linkup-search | "{name}" biography personal | Deep research |
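
The query patterns above can be assembled mechanically from a person record. The following is a minimal sketch, not part of the rule itself: the function name `build_birth_date_queries` and the employer value in the usage loop are hypothetical, and the function only builds query strings; it does not call any search tool.

```python
def build_birth_date_queries(full_name, employer=None):
    """Return (tool, query) pairs in the order the table lists them."""
    quoted = f'"{full_name}"'
    queries = [("exa_web_search_exa", f"{quoted} born birthday birth date year")]
    if employer:
        # The work-context search only makes sense when an employer is known.
        queries.append(("exa_linkedin_search_exa", f'{quoted} at "{employer}"'))
    queries.append(("linkup_linkup-search", f"{quoted} biography personal"))
    return queries


# "Example Museum" is a made-up employer used only for illustration.
for tool, query in build_birth_date_queries("Jan van Berg", "Example Museum"):
    print(tool, "->", query)
```

Keeping the tool name next to each query makes it straightforward to copy both into `search_tools_used` and `queries_tried` afterwards.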

2.3 Recording Successful Enrichment

When birth date is found, record as web claim:

web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://example.org/person/bio"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
    confidence_score: 0.85
    notes: "Found in biography section"

2.4 Recording Failed Enrichment Attempts

Always record that enrichment was attempted:

enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    search_agent: "opencode-claude-sonnet-4"
    search_tools_used:
      - exa_web_search_exa
      - linkup_linkup-search
    queries_tried:
      - '"Jan van Berg" born birthday'
      - '"Jan van Berg" biography'
    result: "NOT_FOUND"
    notes: "No publicly available birth date found after comprehensive search"
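
One way to produce this record consistently is a small helper that fills in the fixed fields. This is a sketch under assumptions: the helper name `failed_birth_date_search` and its parameters are hypothetical; only the output keys come from the example above.

```python
from datetime import datetime, timezone


def failed_birth_date_search(agent, tools, queries, when=None):
    """Build the enrichment_metadata entry for a search that found nothing."""
    # Normalise to UTC so the timestamp can carry a literal "Z" suffix.
    moment = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "birth_date_search": {
            "attempted": True,
            "search_date": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "search_agent": agent,
            "search_tools_used": list(tools),
            "queries_tried": list(queries),
            "result": "NOT_FOUND",
        }
    }


record = failed_birth_date_search(
    "opencode-claude-sonnet-4",
    ["exa_web_search_exa", "linkup_linkup-search"],
    ['"Jan van Berg" born birthday', '"Jan van Berg" biography'],
)
print(record["birth_date_search"]["result"])
```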

3. EDTF-Compliant Unknown Date Notation

3.1 Standard: Extended Date/Time Format (EDTF)

This project follows the Library of Congress EDTF Specification (ISO 8601-2:2019) for representing uncertain, approximate, and unspecified dates.

Key EDTF Characters:

| Character | Meaning | EDTF Level | Example |
|---|---|---|---|
| X | Unspecified digit | Level 1+ | 19XX = some year 1900-1999 |
| ~ | Approximate (circa) | Level 1+ | 1985~ = circa 1985 |
| ? | Uncertain | Level 1+ | 1985? = possibly 1985 |
| % | Uncertain AND approximate | Level 1+ | 1985% = possibly circa 1985 |
| S | Significant digits | Level 2 | 1950S2 = 1900-1999, estimated 1950 |
| [..] | One of set | Level 2 | [1970,1980] = either 1970 or 1980 |
| {..} | All of set | Level 2 | {1970..1980} = all years 1970-1980 |

3.2 Unspecified Date Components (X Notation)

Use X to replace unknown digits:

| Known Information | EDTF Format | Meaning |
|---|---|---|
| Only decade known (1970s) | 197X | Some year 1970-1979 |
| Only century known (1900s) | 19XX | Some year 1900-1999 |
| Year unknown entirely | XXXX | Year unknown |
| Year known, month unknown | 1985-XX | Some month in 1985 |
| Year+month known, day unknown | 1985-04-XX | Some day in April 1985 |
| Year known, month+day unknown | 1985-XX-XX | Some day in 1985 |
| Only decade known, month+day unknown | 197X-XX-XX | Some day in 1970-1979 |
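
The table rows follow one pattern: keep the known leading digits and pad the rest with X. A minimal sketch of that padding, assuming a hypothetical helper `edtf_with_unknowns` whose `precision` argument selects how many components are written:

```python
def edtf_with_unknowns(year_digits="", month=None, day=None, precision="year"):
    """Pad unknown digits with X; precision is 'year', 'month' or 'day'."""
    if len(year_digits) > 4 or any(c not in "0123456789" for c in year_digits):
        raise ValueError("year_digits must be 0-4 leading digits of the year")
    if precision not in ("year", "month", "day"):
        raise ValueError("precision must be 'year', 'month' or 'day'")
    parts = [year_digits.ljust(4, "X")]
    if precision in ("month", "day"):
        parts.append(f"{month:02d}" if month is not None else "XX")
    if precision == "day":
        parts.append(f"{day:02d}" if day is not None else "XX")
    return "-".join(parts)


print(edtf_with_unknowns("197"))                          # decade only
print(edtf_with_unknowns("1985", 4, precision="day"))     # day unknown
```

The helper does not check that a month or day value is in range; it only illustrates where the X placeholders go.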

3.3 Multiple Possible Decades (Set Notation)

When the decade is uncertain but constrained to specific options:

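| Scenario | EDTF Format | Meaning |
|---|---|---|
| Born in 1970s OR 1980s | [197X,198X] | One of: some year in 1970s or 1980s |
| Born in specific years | [1975,1985] | Either 1975 or 1985 |
| Born 1970-1985 range | [1970..1985] | One year between 1970 and 1985 (the interval form 1970/1985 denotes a span of time, not a single uncertain date) |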

3.4 Estimated Dates with Significant Digits

When you can estimate a year with confidence bounds:

1975S2  = Estimated 1975, significant to 2 digits (1900-1999)
1975S3  = Estimated 1975, significant to 3 digits (1970-1979)

This is useful when you can estimate based on career timeline (e.g., "started working 1998, likely born 1970s").
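
To check which years a significant-digits value actually covers, the range can be computed from the digit count. This is an illustrative sketch; the function name `significant_digit_range` is hypothetical and it handles only plain four-digit years of the form shown above.

```python
import re


def significant_digit_range(edtf):
    """Return (first_year, last_year) implied by a value such as '1975S2'."""
    match = re.fullmatch(r"(\d{4})S([1-4])", edtf)
    if match is None:
        raise ValueError(f"not a four-digit significant-digits year: {edtf!r}")
    year, digits = int(match.group(1)), int(match.group(2))
    span = 10 ** (4 - digits)   # 2 digits -> 100 years, 3 digits -> 10 years
    first = year - year % span
    return first, first + span - 1


print(significant_digit_range("1975S2"))  # (1900, 1999)
print(significant_digit_range("1975S3"))  # (1970, 1979)
```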

3.5 Living Persons - Birth Date Estimation

For living persons in LinkedIn data, estimate birth decade from:

  1. Graduation year (if available): Subtract ~22 years for bachelor's degree
  2. Career start (first job): Subtract ~22-25 years
  3. Current role seniority: "Senior" roles suggest 35+ years old

# Example: Person graduated 2010
birth_date_estimate:
  edtf: "1988S2"  # Estimated 1988, significant to 2 digits (1900-1999); 1988S3 would narrow this to 1980-1989
  estimation_method: "graduation_year_inference"
  estimation_basis: "Graduated bachelor's 2010, estimated birth ~1988"
  confidence: 0.60
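
The graduation-year heuristic is simple arithmetic, sketched below. The function name and its default arguments are assumptions for illustration; the subtraction of about 22 years and the output field names follow the example above. The confidence value is left to the caller because the rule does not define how to derive it.

```python
def estimate_birth_from_graduation(graduation_year, typical_age=22, significant_digits=2):
    """Estimate a birth year from a bachelor's graduation year."""
    year = graduation_year - typical_age
    return {
        "edtf": f"{year}S{significant_digits}",
        "estimation_method": "graduation_year_inference",
        "estimation_basis": (
            f"Graduated bachelor's {graduation_year}, estimated birth ~{year}"
        ),
    }


print(estimate_birth_from_graduation(2010)["edtf"])  # 1988S2
```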

4. PPID Format with Unknown Dates

4.1 PPID Date Component Rules

The PPID format includes birth and death dates:

{TYPE}_{FL}_{FD}_{LL}_{LD}_{NT}
              │       │
              │       └── Last Date (death) - EDTF format
              └── First Date (birth) - EDTF format

4.2 Examples with Unknown Components

| Scenario | PPID Example |
|---|---|
| All known | PID_NL-NH-AMS_1985-03-15_NL-NH-HAA_2020-08-22_JAN-BERG |
| Birth year only | ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG |
| Birth decade only | ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_JAN-BERG |
| Nothing known | ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_JAN-BERG |
| Living person | ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG |
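
The examples suggest that each missing component is replaced by a fixed placeholder. A minimal sketch of that assembly follows. It rests on assumptions: the function `build_ppid` is hypothetical, the reading of FL and LL as first and last location is inferred from the examples rather than stated in this section, and the choice between the `ID` and `PID` prefixes is left to the caller because this section does not define it.

```python
UNKNOWN_LOCATION = "XX-XX-XXX"  # placeholder seen in the examples above
UNKNOWN_DATE = "XXXX"           # EDTF for a fully unknown year


def build_ppid(name_token, type_prefix="ID", first_location=None,
               first_date=None, last_location=None, last_date=None):
    """Join the six PPID components, substituting placeholders for gaps."""
    return "_".join([
        type_prefix,
        first_location or UNKNOWN_LOCATION,
        first_date or UNKNOWN_DATE,
        last_location or UNKNOWN_LOCATION,
        last_date or UNKNOWN_DATE,
        name_token,
    ])


print(build_ppid("JAN-BERG", first_location="NL-NH-AMS", first_date="1985"))
```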

4.3 Filename Safety

Not all EDTF characters are filename-safe:

| Character | Filename Safe? | Notes |
|---|---|---|
| X | YES | Uppercase letter |
| ~ | YES | Allowed on macOS/Linux/Windows |
| ? | NO | Not allowed on Windows |
| % | CAUTION | URL encoding issues |
| [ ] | CAUTION | Shell escaping issues |
| , | YES | Allowed |
| / | NO | Directory separator |
| \| | NO | Shell pipe; not allowed on Windows |

Recommendation: For filenames, use only:

  • X for unknown digits
  • ~ for approximate (suffix only)
  • Avoid ?, %, [], /, | in filenames

When set notation [..] is needed, store in metadata but use simplified form in filename:

  • Filename: ID_XX-XX-XXX_197X_... (simplified)
  • Metadata: birth_date_edtf: "[1975,1985]" (full EDTF)
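
A quick check for the characters this section says to avoid can catch unsafe values before they reach a filename. The sketch below only detects them; the name `unsafe_filename_chars` is hypothetical, and deriving the simplified filename form is not attempted because the rule does not spell out that mapping.

```python
# Characters the recommendation above says to keep out of filenames.
UNSAFE_IN_FILENAMES = set("?%[]/|")


def unsafe_filename_chars(value):
    """Return the sorted list of discouraged characters found in value."""
    return sorted(set(value) & UNSAFE_IN_FILENAMES)


print(unsafe_filename_chars("197X"))         # []
print(unsafe_filename_chars("[1975,1985]"))  # ['[', ']']
```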

5. Decision Tree

┌─────────────────────────────────────────┐
│ Person entity missing birth_date        │
└─────────────────┬───────────────────────┘
                  ▼
┌─────────────────────────────────────────┐
│ Search Exa + Linkup for birth date      │
└─────────────────┬───────────────────────┘
                  ▼
          ┌───────┴───────┐
          │ Date found?   │
          └───────┬───────┘
        YES       │       NO
          ▼       │       ▼
┌─────────────────┐   ┌─────────────────────────────┐
│ Record as       │   │ Can estimate from career?   │
│ web_claim with  │   └───────────┬─────────────────┘
│ provenance      │         YES   │   NO
└─────────────────┘           ▼   │   ▼
                  ┌───────────────┐   ┌───────────────┐
                  │ Use EDTF      │   │ Use XXXX      │
                  │ estimate:     │   │ (unknown)     │
                  │ 1988S2 or     │   │               │
                  │ 198X          │   │               │
                  └───────────────┘   └───────────────┘

6. Examples

6.1 Fully Unknown (No Enrichment Found)

# Person: Nora Ruijs (student, no public birth info)
ppid: ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_NORA-RUIJS

birth_date:
  edtf: "XXXX"
  precision: "unknown"
  
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    result: "NOT_FOUND"

6.2 Decade Estimated from Career

# Person: Senior curator, started career 1995
ppid: ID_NL-NH-AMS_197X_XX-XX-XXX_XXXX_JAN-BERG

birth_date:
  edtf: "197X"
  edtf_full: "1972S3"  # Estimated 1972, significant to 3 digits
  precision: "decade"
  estimation_method: "career_start_inference"
  estimation_basis: "Career started 1995 as junior curator, estimated age 23"

6.3 Multiple Possible Decades

# Person: Could be born 1970s or 1980s based on conflicting sources
ppid: ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_MARIA-SILVA  # Simplified for filename

birth_date:
  edtf: "[197X,198X]"  # Full EDTF with set notation
  edtf_filename: "197X"  # Simplified for filename (earlier estimate)
  precision: "decade_uncertain"
  notes: "Sources conflict: LinkedIn suggests 1980s, university bio suggests 1970s"

6.4 Exact Date Found via Enrichment

# Person: Birth date found on institutional bio page
ppid: ID_NL-NH-AMS_1985-03-15_XX-XX-XXX_XXXX_JAN-BERG

birth_date:
  edtf: "1985-03-15"
  precision: "day"
  
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://museum.nl/team/jan-berg"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"

7. Anti-Patterns

7.1 FORBIDDEN: Fabricating Dates

# WRONG - No source, no search attempted
birth_date:
  edtf: "1985-03-15"  # Where did this come from?!

7.2 FORBIDDEN: Using Non-EDTF Notation

# WRONG - Not EDTF compliant
birth_date: "197~8~"      # Invalid notation
birth_date: "1970s"       # Use 197X instead
birth_date: "circa 1985"  # Use 1985~ instead
birth_date: "unknown"     # Use XXXX instead

# WRONG - No search attempted
birth_date:
  edtf: "XXXX"
  # No enrichment_metadata showing search was attempted!

8. Validation Rules

  1. Search Required: Cannot use XXXX without enrichment_metadata.birth_date_search.attempted: true
  2. EDTF Compliance: All dates must parse as valid EDTF (use validator)
  3. Filename Safety: PPID filenames must avoid ?, %, [], /, |
  4. Provenance Required: All found dates must have web_claims with source
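
Rules 1, 3 and 4 can be approximated with a few dictionary checks. The following is a rough sketch, not a reference validator: `validate_record` and its error labels are hypothetical, "found date" is approximated as a date with no X placeholders or qualifiers, and rule 2 (full EDTF parsing) is left to a dedicated EDTF validator.

```python
import re

UNSAFE_IN_FILENAMES = set("?%[]/|")


def validate_record(record):
    """Return a list of rule labels that the record appears to violate."""
    errors = []
    birth = record.get("birth_date") or {}
    edtf = birth.get("edtf") if isinstance(birth, dict) else birth
    search = record.get("enrichment_metadata", {}).get("birth_date_search", {})

    # Rule 1: XXXX is only allowed after a recorded search attempt.
    if edtf == "XXXX" and search.get("attempted") is not True:
        errors.append("search_required")

    # Rule 3: the PPID doubles as a filename.
    if set(record.get("ppid", "")) & UNSAFE_IN_FILENAMES:
        errors.append("filename_safety")

    # Rule 4: a fully specified date needs a sourced birth_date web claim.
    is_exact = bool(edtf and re.fullmatch(r"\d{4}(-\d{2}){0,2}", edtf))
    has_source = any(
        claim.get("claim_type") == "birth_date" and claim.get("source_url")
        for claim in record.get("web_claims", [])
    )
    if is_exact and not has_source:
        errors.append("provenance_required")
    return errors


print(validate_record({"birth_date": {"edtf": "1985-03-15"}}))
```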

9. References