Rule 44: PPID Birth Date Enrichment and Unknown Date Notation
Version: 1.0.0
Created: 2025-01-09
Status: ACTIVE
Related: PPID-GHCID Alignment | EDTF Specification
1. Summary
When birth/death dates are missing from person entity sources, agents MUST:
- Search for dates using Exa Search and Linkup tools
- Record all enrichment data as web claims with provenance
- If not found, use EDTF-compliant notation for estimated/unknown dates
- Never fabricate specific dates without source evidence
2. Enrichment Workflow
2.1 Required Search Before Using Unknown Notation
Before marking a date as unknown, agents MUST attempt enrichment:
Person Entity (missing birth_date)
↓
1. Search Exa: "{full_name} born birth date"
↓
2. Search Exa: "{full_name} {known_employer}"
↓
3. Search Linkup: "{full_name} biography"
↓
4. If found → Record as web_claim with provenance
↓
5. If NOT found → Use EDTF unknown notation
↓
6. Record enrichment_attempt in metadata
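The search order above can be sketched as a small driver. This is a minimal illustration, not the project's implementation: `enrich_birth_date` and the `SearchFn` callable type are hypothetical, and the real Exa and Linkup tools are stood in for by plain functions that take a query and return a date string or `None`.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a search tool: query in, date string or None out.
SearchFn = Callable[[str], Optional[str]]


def enrich_birth_date(full_name: str, known_employer: str,
                      exa: SearchFn, linkup: SearchFn) -> Dict[str, object]:
    """Run the three searches in order and report what happened."""
    steps = [
        ("exa", exa, f"{full_name} born birth date"),
        ("exa", exa, f"{full_name} {known_employer}"),
        ("linkup", linkup, f"{full_name} biography"),
    ]
    queries_tried: List[str] = []
    for tool, search, query in steps:
        queries_tried.append(query)
        found = search(query)
        if found is not None:
            # Step 4: the caller records this as a web_claim with provenance.
            return {"result": "FOUND", "birth_date": found,
                    "tool": tool, "queries_tried": queries_tried}
    # Steps 5 and 6: fall back to EDTF unknown notation, keep the attempt log.
    return {"result": "NOT_FOUND", "birth_date": "XXXX",
            "tool": None, "queries_tried": queries_tried}
```

The returned `queries_tried` list is what would feed the `enrichment_metadata` record, so a failed search still leaves evidence that it was attempted.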
2.2 Enrichment Search Requirements
| Search Tool | Query Pattern | When to Use |
|---|---|---|
| `exa_web_search_exa` | `"{name}" born birthday birth date year` | Primary search |
| `exa_linkedin_search_exa` | `"{name}" at "{employer}"` | For work context |
| `linkup_linkup-search` | `"{name}" biography personal` | Deep research |
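As an illustration of how the query patterns in the table could be filled in, the sketch below formats each pattern for a given person. The tool names and patterns are taken from the table; the `build_queries` helper itself is hypothetical, as is the choice to skip the work-context search when no employer is known.

```python
from typing import Dict, Optional

# Patterns copied from the table above, keyed by tool name.
QUERY_PATTERNS = {
    "exa_web_search_exa": '"{name}" born birthday birth date year',
    "exa_linkedin_search_exa": '"{name}" at "{employer}"',
    "linkup_linkup-search": '"{name}" biography personal',
}


def build_queries(name: str, employer: Optional[str] = None) -> Dict[str, str]:
    """Return one query string per tool for this person."""
    queries = {}
    for tool, pattern in QUERY_PATTERNS.items():
        if "{employer}" in pattern and not employer:
            continue  # assumption: no employer means no work-context search
        queries[tool] = pattern.format(name=name, employer=employer)
    return queries
```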
2.3 Recording Successful Enrichment
When birth date is found, record as web claim:
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://example.org/person/bio"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
    confidence_score: 0.85
    notes: "Found in biography section"
2.4 Recording Failed Enrichment Attempts
Always record that enrichment was attempted:
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    search_agent: "opencode-claude-sonnet-4"
    search_tools_used:
      - exa_web_search_exa
      - linkup_linkup-search
    queries_tried:
      - '"Jan van Berg" born birthday'
      - '"Jan van Berg" biography'
    result: "NOT_FOUND"
    notes: "No publicly available birth date found after comprehensive search"
3. EDTF-Compliant Unknown Date Notation
3.1 Standard: Extended Date/Time Format (EDTF)
This project follows the Library of Congress EDTF Specification (ISO 8601-2:2019) for representing uncertain, approximate, and unspecified dates.
Key EDTF Characters:
| Character | Meaning | EDTF Level | Example |
|---|---|---|---|
| `X` | Unspecified digit | Level 1+ | 19XX = some year 1900-1999 |
| `~` | Approximate (circa) | Level 1+ | 1985~ = circa 1985 |
| `?` | Uncertain | Level 1+ | 1985? = possibly 1985 |
| `%` | Uncertain AND approximate | Level 1+ | 1985% = possibly circa 1985 |
| `S` | Significant digits | Level 2 | 1950S2 = 1900-1999, estimated 1950 |
| `[..]` | One of set | Level 2 | [1970,1980] = either 1970 or 1980 |
| `{..}` | All of set | Level 2 | {1970..1980} = all years 1970-1980 |
3.2 Unspecified Date Components (X Notation)
Use X to replace unknown digits:
| Known Information | EDTF Format | Meaning |
|---|---|---|
| Only decade known (1970s) | `197X` | Some year 1970-1979 |
| Only century known (1900s) | `19XX` | Some year 1900-1999 |
| Year unknown entirely | `XXXX` | Year unknown |
| Year known, month unknown | `1985-XX` | Some month in 1985 |
| Year+month known, day unknown | `1985-04-XX` | Some day in April 1985 |
| Year known, month+day unknown | `1985-XX-XX` | Some day in 1985 |
| Only decade and final digit known | `197X-XX-XX` or use set | 1970-1979 |
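To make the meaning of the X notation concrete, the following sketch turns the year part of such a value into the range of years it covers. The `year_bounds` helper is hypothetical and only handles four-character years whose unspecified digits are on the right, as in the table; it is not a full EDTF parser.

```python
import re
from typing import Tuple


def year_bounds(edtf: str) -> Tuple[int, int]:
    """Return (earliest, latest) year covered by the year part of an EDTF date."""
    year = edtf.split("-")[0]  # ignore month and day components
    if len(year) != 4 or not re.fullmatch(r"\d*X*", year):
        raise ValueError(f"unsupported year component: {year!r}")
    # Each X can be any digit, so 0 gives the lower bound and 9 the upper.
    return int(year.replace("X", "0")), int(year.replace("X", "9"))
```

For example, `year_bounds("197X")` covers 1970 through 1979, while `year_bounds("1985-XX")` is pinned to 1985 because only the month is unspecified.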
3.3 Multiple Possible Decades (Set Notation)
When the decade is uncertain but constrained to specific options:
| Scenario | EDTF Format | Meaning |
|---|---|---|
| Born in 1970s OR 1980s | `[197X,198X]` | One of: some year in 1970s or 1980s |
| Born in specific years | `[1975,1985]` | Either 1975 or 1985 |
| Born 1970-1985 range | `1970/1985` | Interval: between 1970 and 1985 |
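A short sketch of how a "one of" set could be checked against a candidate year is given below. The helper names are hypothetical, and the code deliberately handles only comma-separated four-character members such as those in the table, not the `..` range syntax that EDTF also allows inside sets.

```python
from typing import List


def set_members(edtf: str) -> List[str]:
    """Split a one-of set like "[197X,198X]" into its members."""
    if not (edtf.startswith("[") and edtf.endswith("]")):
        raise ValueError(f"not a one-of set: {edtf!r}")
    return [m.strip() for m in edtf[1:-1].split(",") if m.strip()]


def matches_member(year: int, member: str) -> bool:
    """True if every digit of the year equals the member digit or the member has X."""
    text = f"{year:04d}"
    return len(member) == 4 and all(m in ("X", t) for m, t in zip(member, text))


def year_in_set(year: int, edtf: str) -> bool:
    return any(matches_member(year, m) for m in set_members(edtf))
```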
3.4 Estimated Dates with Significant Digits
When you can estimate a year with confidence bounds:
1975S2 = Estimated 1975, significant to 2 digits (1900-1999)
1975S3 = Estimated 1975, significant to 3 digits (1970-1979)
This is useful when you can estimate based on career timeline (e.g., "started working 1998, likely born 1970s").
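The ranges quoted for `1975S2` and `1975S3` follow from simple arithmetic: with N significant digits in a four-digit year, the remaining 4 - N digits are free. A minimal sketch, assuming four-digit years only and using a hypothetical helper name:

```python
import re
from typing import Tuple


def significant_digit_bounds(edtf: str) -> Tuple[int, int, int]:
    """Return (estimate, earliest, latest) for a value like "1975S2"."""
    match = re.fullmatch(r"(\d{4})S([1-4])", edtf)
    if match is None:
        raise ValueError(f"unsupported significant-digits value: {edtf!r}")
    year, digits = int(match.group(1)), int(match.group(2))
    span = 10 ** (4 - digits)   # 100 years for S2, 10 years for S3
    low = year - year % span    # round down to the start of the span
    return year, low, low + span - 1
```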
3.5 Living Persons - Birth Date Estimation
For living persons in LinkedIn data, estimate birth decade from:
- Graduation year (if available): Subtract ~22 years for bachelor's degree
- Career start (first job): Subtract ~22-25 years
- Current role seniority: "Senior" roles suggest 35+ years old
# Example: Person graduated 2010
birth_date_estimate:
  edtf: "1988S2" # Estimated 1988, significant to 2 digits (1900-1999); "1988S3" would narrow this to 1980-1989
  estimation_method: "graduation_year_inference"
  estimation_basis: "Graduated bachelor's 2010, estimated birth ~1988"
  confidence: 0.60
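The graduation-year heuristic can be written down as a small function. This is only a sketch: `estimate_from_graduation` is a hypothetical name, the 22-year offset is the rule of thumb from the list above, and the fixed 0.60 confidence simply mirrors the example record rather than any calibrated value.

```python
from typing import Dict, Union


def estimate_from_graduation(graduation_year: int,
                             age_at_graduation: int = 22
                             ) -> Dict[str, Union[str, float]]:
    """Build a birth_date_estimate record from a bachelor's graduation year."""
    year = graduation_year - age_at_graduation
    return {
        "edtf": f"{year}S2",  # same S2 form as the example record
        "estimation_method": "graduation_year_inference",
        "estimation_basis": (
            f"Graduated bachelor's {graduation_year}, estimated birth ~{year}"
        ),
        "confidence": 0.60,   # assumption: copied from the example above
    }
```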
4. PPID Format with Unknown Dates
4.1 PPID Date Component Rules
The PPID format includes birth and death dates:
{TYPE}_{FL}_{FD}_{LL}_{LD}_{NT}
             │         │
             │         └── Last Date (death) - EDTF format
             └── First Date (birth) - EDTF format
4.2 Examples with Unknown Components
| Scenario | PPID Example |
|---|---|
| All known | PID_NL-NH-AMS_1985-03-15_NL-NH-HAA_2020-08-22_JAN-BERG |
| Birth year only | ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG |
| Birth decade only | ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_JAN-BERG |
| Nothing known | ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_JAN-BERG |
| Living person | ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG |
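The examples in the table can be reproduced by joining the six components with underscores and substituting placeholders for anything unknown. The sketch below assumes that `FL` and `LL` are location codes and `NT` is a name token, which is how the examples read; the `build_ppid` helper and its parameter names are hypothetical.

```python
from typing import Optional

UNKNOWN_LOCATION = "XX-XX-XXX"  # placeholder used in the examples above
UNKNOWN_DATE = "XXXX"


def build_ppid(id_type: str, name_token: str,
               first_location: Optional[str] = None,
               first_date: Optional[str] = None,
               last_location: Optional[str] = None,
               last_date: Optional[str] = None) -> str:
    """Join PPID components, filling unknown ones with X placeholders."""
    parts = [
        id_type,
        first_location or UNKNOWN_LOCATION,
        first_date or UNKNOWN_DATE,
        last_location or UNKNOWN_LOCATION,
        last_date or UNKNOWN_DATE,
        name_token,
    ]
    return "_".join(parts)
```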
4.3 Filename Safety
Not every EDTF character is filename-safe:
| Character | Filename Safe? | Notes |
|---|---|---|
| `X` | YES | Uppercase letter |
| `~` | YES | Allowed on macOS/Linux/Windows |
| `?` | NO | Not allowed on Windows |
| `%` | CAUTION | URL encoding issues |
| `[ ]` | CAUTION | Shell escaping issues |
| `,` | YES | Allowed |
| `/` | NO | Directory separator |
| `\|` | CAUTION | Shell pipe, Windows disallowed |
Recommendation: For filenames, use only:
- `X` for unknown digits
- `~` for approximate (suffix only)
- Avoid `?`, `%`, `[]`, `/`, `|` in filenames

When set notation `[..]` is needed, store it in metadata but use a simplified form in the filename:
- Filename: `ID_XX-XX-XXX_197X_...` (simplified)
- Metadata: `birth_date_edtf: "[1975,1985]"` (full EDTF)
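One way to derive the simplified filename form is sketched below. The reduction rule is an assumption inferred from the examples in this document (a set collapses to the decade of its first-listed member, and trailing `?` or `%` qualifiers are dropped); the `simplify_for_filename` helper is hypothetical, and the full EDTF value should still be kept in metadata.

```python
def simplify_for_filename(edtf: str) -> str:
    """Reduce an EDTF value to a form without ? % [ ] / | characters."""
    if "/" in edtf or "|" in edtf:
        # Intervals and other forms have no obvious simplification here.
        raise ValueError(f"cannot simplify for filename: {edtf!r}")
    if edtf.startswith("[") and edtf.endswith("]"):
        first = edtf[1:-1].split(",")[0].strip()
        # Assumption: keep the first-listed member at decade precision.
        return first[:3] + "X"
    # Drop uncertainty qualifiers; "~" is kept because it is filename-safe.
    return edtf.rstrip("?%")
```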
5. Decision Tree
┌─────────────────────────────────────────┐
│ Person entity missing birth_date        │
└─────────────────┬───────────────────────┘
                  ▼
┌─────────────────────────────────────────┐
│ Search Exa + Linkup for birth date      │
└─────────────────┬───────────────────────┘
                  ▼
          ┌───────┴───────┐
          │  Date found?  │
          └───────┬───────┘
              YES │ NO
         ▼        │                ▼
┌─────────────────┐ ┌─────────────────────────────┐
│ Record as       │ │ Can estimate from career?   │
│ web_claim with  │ └───────────┬─────────────────┘
│ provenance      │         YES │ NO
└─────────────────┘    ▼        │        ▼
               ┌───────────────┐ ┌───────────────┐
               │ Use EDTF      │ │ Use XXXX      │
               │ estimate:     │ │ (unknown)     │
               │ 1988S2 or     │ │               │
               │ 198X          │ │               │
               └───────────────┘ └───────────────┘
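The three outcomes of the tree map onto a simple priority order. The sketch below is illustrative only; `choose_birth_date` and the `record_as` labels are hypothetical names, not fields defined by this rule.

```python
from typing import Dict, Optional


def choose_birth_date(found_date: Optional[str],
                      career_estimate: Optional[str]) -> Dict[str, str]:
    """Pick the EDTF value to store, following the decision tree."""
    if found_date:
        # A sourced date wins and must be recorded as a web claim.
        return {"edtf": found_date, "record_as": "web_claim"}
    if career_estimate:
        # No source, but the career timeline supports an estimate.
        return {"edtf": career_estimate, "record_as": "estimate"}
    # Nothing found and nothing to estimate from.
    return {"edtf": "XXXX", "record_as": "unknown"}
```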
6. Examples
6.1 Fully Unknown (No Enrichment Found)
# Person: Nora Ruijs (student, no public birth info)
ppid: ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_NORA-RUIJS
birth_date:
  edtf: "XXXX"
  precision: "unknown"
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    result: "NOT_FOUND"
6.2 Decade Estimated from Career
# Person: Senior curator, started career 1995
ppid: ID_NL-NH-AMS_197X_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "197X"
  edtf_full: "1972S3" # Estimated 1972, significant to 3 digits
  precision: "decade"
  estimation_method: "career_start_inference"
  estimation_basis: "Career started 1995 as junior curator, estimated age 23"
6.3 Multiple Possible Decades
# Person: Could be born 1970s or 1980s based on conflicting sources
ppid: ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_MARIA-SILVA # Simplified for filename
birth_date:
  edtf: "[197X,198X]" # Full EDTF with set notation
  edtf_filename: "197X" # Simplified for filename (earlier estimate)
  precision: "decade_uncertain"
  notes: "Sources conflict: LinkedIn suggests 1980s, university bio suggests 1970s"
6.4 Exact Date Found via Enrichment
# Person: Birth date found on institutional bio page
ppid: ID_NL-NH-AMS_1985-03-15_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "1985-03-15"
  precision: "day"
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://museum.nl/team/jan-berg"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
7. Anti-Patterns
7.1 FORBIDDEN: Fabricating Dates
# WRONG - No source, no search attempted
birth_date:
  edtf: "1985-03-15" # Where did this come from?!
7.2 FORBIDDEN: Using Non-EDTF Notation
# WRONG - Not EDTF compliant
birth_date: "197~8~" # Invalid notation
birth_date: "1970s" # Use 197X instead
birth_date: "circa 1985" # Use 1985~ instead
birth_date: "unknown" # Use XXXX instead
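Free-text values like the ones above can often be converted mechanically before they reach a record. The following sketch covers only the three replacements listed in the comments; `to_edtf` is a hypothetical helper, and anything it does not recognise is rejected rather than guessed.

```python
import re


def to_edtf(text: str) -> str:
    """Convert a few common free-text date forms to EDTF, or raise."""
    value = text.strip().lower()
    if value == "unknown":
        return "XXXX"
    decade = re.fullmatch(r"(\d{3})0s", value)      # e.g. "1970s"
    if decade:
        return decade.group(1) + "X"
    circa = re.fullmatch(r"circa\s+(\d{4})", value)  # e.g. "circa 1985"
    if circa:
        return circa.group(1) + "~"
    raise ValueError(f"cannot convert to EDTF: {text!r}")
```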
7.3 FORBIDDEN: Skipping Enrichment Search
# WRONG - No search attempted
birth_date:
  edtf: "XXXX"
# No enrichment_metadata showing search was attempted!
8. Validation Rules
- Search Required: Cannot use `XXXX` without `enrichment_metadata.birth_date_search.attempted: true`
- EDTF Compliance: All dates must parse as valid EDTF (use validator)
- Filename Safety: PPID filenames must avoid `?`, `%`, `[]`, `/`, `|`
- Provenance Required: All found dates must have `web_claims` with source
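Three of these rules can be checked with a few lines of code over a parsed record. The sketch below is a hypothetical `validate_person` helper operating on a plain dictionary shaped like the examples in section 6. It assumes that a fully specified year, year-month, or full date counts as a "found" date, and it leaves EDTF compliance to a dedicated validator.

```python
import re
from typing import Any, Dict, List

UNSAFE_FILENAME_CHARS = set("?%[]/|")


def validate_person(record: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    edtf = record.get("birth_date", {}).get("edtf", "")
    search = record.get("enrichment_metadata", {}).get("birth_date_search", {})

    # Search Required: XXXX is only allowed after a recorded search attempt.
    if edtf == "XXXX" and search.get("attempted") is not True:
        errors.append("search_required")

    # Filename Safety: the PPID is used as a filename.
    bad = sorted(UNSAFE_FILENAME_CHARS & set(record.get("ppid", "")))
    if bad:
        errors.append("filename_unsafe: " + "".join(bad))

    # Provenance Required: assumption, exact dates are treated as "found".
    is_exact = re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", edtf) is not None
    claims = record.get("web_claims", [])
    if is_exact and not any(c.get("source_url") for c in claims):
        errors.append("provenance_required")
    return errors
```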