Rule 47: Disambiguation Entity Profiles - Prevent Repeated Entity Resolution Errors
Status: CRITICAL
Summary
When entity resolution determines that a web source describes a different person with a similar name, create a PPID profile for that person in data/person/. The PPID system is universal - ANY person who ever lived can have a profile, regardless of heritage relevance.
The Universal PPID Principle
In principle, all persons on Earth should be assigned PPIDs - whether or not they are active in the heritage field. This includes:
- Heritage workers (curators, archivists, librarians, etc.)
- Non-heritage professionals (actors, doctors, athletes, etc.)
- Historical persons (deceased individuals from any era)
- Public figures and private individuals
The heritage_relevance field indicates whether someone works in the heritage sector, but does NOT determine whether they can have a profile. Anyone can have a PPID.
The Problem
During entity resolution, we often discover that web search results describe a different person with a similar name:
| Heritage Profile | Namesake Discovered | Why Different |
|---|---|---|
| Carmen Juliá (UK curator) | Carmen Julia Álvarez (Venezuelan actress) | Different profession, location, timeline |
| Jan de Vries (Rijksmuseum curator) | Jan de Vries (footballer) | Different profession |
| Robert Ritter (heritage worker) | Robert Ritter (Nazi doctor, 1901-1951) | Different era, profession |
Without creating a profile for the namesake, future enrichment attempts may:
- Re-discover the same namesake
- Waste time re-investigating
- Risk attributing false claims again
The Solution: Create PPID Profiles for Namesakes
When entity resolution proves two entities are different, create a regular PPID profile for the namesake:
- Use standard PPID naming convention (no special prefix)
- Set heritage_relevance.is_heritage_relevant: false
- Document the disambiguation in BOTH profiles
Example: Venezuelan Actress Profile
{
"ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
"profile_data": {
"full_name": "Carmen Julia Álvarez",
"profession": "actress",
"nationality": "Venezuelan",
"birth_year": 1952,
"birth_location": "Caracas, Venezuela",
"active_period": "1970s-2000s"
},
"heritage_relevance": {
"is_heritage_relevant": false,
"relevance_score": 0.0,
"reason": "Entertainment industry professional - actress in film and television"
},
"disambiguation_notes": {
"commonly_confused_with": [
{
"ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
"name": "Carmen Juliá",
"profession": "curator",
"employer": "New Contemporaries",
"location": "UK",
"why_different": "Different profession (actress vs curator), different location (Venezuela vs UK), overlapping active periods in incompatible roles"
}
],
"disambiguation_note": "This is the Venezuelan actress, NOT the UK-based art curator."
},
"web_claims": [
{
"claim_type": "birth_year",
"claim_value": 1952,
"provenance": {
"source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
"retrieved_on": "2026-01-11T14:30:00Z",
"retrieval_agent": "manual-human-curator"
}
},
{
"claim_type": "profession",
"claim_value": "actress",
"provenance": {
"source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
"retrieved_on": "2026-01-11T14:30:00Z",
"retrieval_agent": "manual-human-curator"
}
}
],
"extraction_metadata": {
"created_at": "2026-01-11T15:00:00Z",
"created_by": "manual-human-curator",
"creation_reason": "Created during entity resolution to distinguish from heritage worker Carmen Juliá"
}
}
Update the Heritage Profile Too
The heritage profile should also reference the disambiguation:
{
"ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
"profile_data": {
"full_name": "Carmen Juliá",
"headline": "Curator at New Contemporaries"
},
"heritage_relevance": {
"is_heritage_relevant": true,
"relevance_score": 0.85
},
"disambiguation_notes": {
"known_namesakes": [
{
"ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
"name": "Carmen Julia Álvarez",
"profession": "actress",
"location": "Venezuela",
"why_not_same_person": "Different profession, location, timeline"
}
],
"disambiguation_warning": "Web searches for 'Carmen Julia' return data about Venezuelan actress Carmen Julia Álvarez (born 1952). This is a DIFFERENT person."
}
}
When to Create Namesake Profiles
Create a PPID profile for a namesake when:
- Entity resolution proves they are a different person
- They are notable enough to appear in search results repeatedly (Wikipedia, IMDB, news)
- The confusion risk is high (similar name, some overlapping attributes)
Do NOT create profiles for:
- Random social media accounts with no notable presence
- Obvious mismatches unlikely to recur in searches
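The criteria above can be sketched as a small predicate. This is only an illustration: the NamesakeCandidate type and its field names are hypothetical, and it assumes all three "create" criteria must hold together, which the rule does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class NamesakeCandidate:
    """Hypothetical summary of what entity resolution found about a namesake."""
    proven_different: bool     # entity resolution proved a different person
    notable_sources: int       # number of notable sources (Wikipedia, IMDB, news)
    confusion_risk_high: bool  # similar name plus some overlapping attributes


def should_create_namesake_profile(candidate: NamesakeCandidate) -> bool:
    """Return True only when every criterion holds (assumed to be an AND)."""
    return (
        candidate.proven_different
        and candidate.notable_sources > 0
        and candidate.confusion_risk_high
    )
```

A random social media account would be modelled here as notable_sources == 0, so the predicate returns False for it.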
Benefits
- Universal person database: Any person can have a PPID
- Prevents repeated mistakes: Future enrichment can check for known namesakes
- Bidirectional linking: Both profiles reference each other
- Consistent data model: No special file naming or profile types needed
- Audit trail: Documents why profiles were created
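As a rough sketch of how future enrichment might check for known namesakes, the helper below compares facts extracted from a web source against the known_namesakes entries of a heritage profile. The function name, the source_facts dictionary, and the choice to compare only profession and location are assumptions made for illustration; a hit is a flag for review, not proof of identity.

```python
def matching_namesakes(profile: dict, source_facts: dict) -> list:
    """Return known namesakes that share a profession or location with the source.

    `source_facts` is a hypothetical dict such as {"profession": "actress"}.
    """
    notes = profile.get("disambiguation_notes", {})
    hits = []
    for entry in notes.get("known_namesakes", []):
        for key in ("profession", "location"):
            known = entry.get(key)
            seen = source_facts.get(key)
            # Case-insensitive comparison; missing values never match.
            if known and seen and known.casefold() == seen.casefold():
                hits.append(entry)
                break
    return hits
```

If the returned list is non-empty, the source probably describes an already documented namesake and its claims should not be attached to the heritage profile without further checks.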
Workflow
Step 1: During Entity Resolution
When you reject a claim due to identity mismatch with a notable namesake:
1. Document WHY the source describes a different person
2. Check if the namesake is notable (Wikipedia, IMDB, frequent search results)
3. If notable → Create PPID profile for the namesake
4. Link both profiles via disambiguation_notes
Step 2: Create Namesake Profile
Use standard PPID naming:
ID_{birth-location}_{birth-decade}_{current-location}_{death-decade}_{NAME}.json
Example: ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ.json
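A minimal sketch of assembling a PPID from the pattern above follows. The helper names are hypothetical, and the name handling (strip diacritics, uppercase, join with hyphens) is inferred from the examples in this rule rather than taken from a PPID specification. The template names decades while the example carries 1952, so the sketch passes the date value through unchanged.

```python
import unicodedata

UNKNOWN_LOCATION = "XX-XX-XXX"  # placeholder seen in the examples of this rule
UNKNOWN_DATE = "XXXX"


def name_slug(full_name: str) -> str:
    """Uppercase the name, drop diacritics, and join the words with hyphens."""
    decomposed = unicodedata.normalize("NFKD", full_name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "-".join(stripped.upper().split())


def build_ppid(birth_location, birth_period, current_location, death_period, full_name):
    """Assemble a PPID string; None becomes the matching unknown placeholder."""
    parts = [
        birth_location or UNKNOWN_LOCATION,
        birth_period or UNKNOWN_DATE,
        current_location or UNKNOWN_LOCATION,
        death_period or UNKNOWN_DATE,
        name_slug(full_name),
    ]
    return "ID_" + "_".join(parts)
```

Appending ".json" to the returned string gives the file name used in data/person/.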
Step 3: Update Both Profiles
- Namesake profile: Add commonly_confused_with pointing to heritage profile
- Heritage profile: Add known_namesakes pointing to namesake profile
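The two updates in Step 3 could be applied together so that neither side of the link is forgotten. The function below is a sketch on plain dictionaries shaped like the JSON examples in this rule; link_namesakes is a hypothetical name, and it copies only the ppid, name, and reason fields.

```python
def link_namesakes(heritage: dict, namesake: dict, reason: str) -> None:
    """Add cross-references to both profiles in place; safe to call repeatedly."""
    known = heritage.setdefault("disambiguation_notes", {}).setdefault(
        "known_namesakes", []
    )
    confused = namesake.setdefault("disambiguation_notes", {}).setdefault(
        "commonly_confused_with", []
    )

    # Heritage profile -> namesake profile
    if not any(e.get("ppid") == namesake["ppid"] for e in known):
        known.append({
            "ppid": namesake["ppid"],
            "name": namesake["profile_data"]["full_name"],
            "why_not_same_person": reason,
        })

    # Namesake profile -> heritage profile
    if not any(e.get("ppid") == heritage["ppid"] for e in confused):
        confused.append({
            "ppid": heritage["ppid"],
            "name": heritage["profile_data"]["full_name"],
            "why_different": reason,
        })
```

Checking for an existing ppid before appending keeps the lists free of duplicates when the same namesake is rediscovered later.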
Historical Persons
Historical persons (deceased) can also have PPID profiles:
{
"ppid": "ID_DE-XX-XXX_1901_DE-XX-XXX_1951_ROBERT-RITTER",
"profile_data": {
"full_name": "Robert Ritter",
"profession": "physician",
"birth_year": 1901,
"death_year": 1951,
"nationality": "German",
"historical_note": "Nazi-era physician involved in racial hygiene programs"
},
"heritage_relevance": {
"is_heritage_relevant": false,
"relevance_score": 0.0
},
"disambiguation_notes": {
"commonly_confused_with": [
{
"ppid": "ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_ROBERT-RITTER",
"name": "Robert Ritter",
"profession": "heritage worker",
"why_different": "Different era - historical figure (1901-1951) vs living heritage professional"
}
]
}
}
Related Rules
- Rule 46: Entity Resolution - Names Are NEVER Sufficient
- Rule 21: Data Fabrication is Strictly Prohibited
- Rule 26: Person Data Provenance - Web Claims for Staff Information
Summary
The PPID system is universal. When you discover during entity resolution that a web source describes a different person:
- Create a regular PPID profile for the namesake (actress, historical figure, etc.)
- Set heritage_relevance.is_heritage_relevant: false (unless they happen to also work in heritage)
- Link both profiles via disambiguation_notes
- Use standard PPID naming - no special prefixes needed
This builds a comprehensive person database while preventing entity resolution errors.