# Rule 47: Disambiguation Entity Profiles - Prevent Repeated Entity Resolution Errors

## Status: CRITICAL

## Summary

When entity resolution determines that a web source describes a **different person** with a similar name, **create a PPID profile for that person** in `data/person/`. The PPID system is universal - ANY person who ever lived can have a profile, regardless of heritage relevance.

---

## The Universal PPID Principle

**In principle, all persons on Earth should be assigned PPIDs** - whether or not they are active in the heritage field.

This includes:

- Heritage workers (curators, archivists, librarians, etc.)
- Non-heritage professionals (actors, doctors, athletes, etc.)
- Historical persons (deceased individuals from any era)
- Public figures and private individuals

The `heritage_relevance` field indicates whether someone works in the heritage sector, but does NOT determine whether they can have a profile. **Anyone can have a PPID.**

---

## The Problem

During entity resolution, we often discover that web search results describe a **different person** with a similar name:

| Heritage Profile | Namesake Discovered | Why Different |
|------------------|---------------------|---------------|
| Carmen Juliá (UK curator) | Carmen Julia Álvarez (Venezuelan actress) | Different profession, location, timeline |
| Jan de Vries (Rijksmuseum curator) | Jan de Vries (footballer) | Different profession |
| Robert Ritter (heritage worker) | Robert Ritter (Nazi doctor, 1901-1951) | Different era, profession |

Without creating a profile for the namesake, future enrichment attempts may:

1. Re-discover the same namesake
2. Waste time re-investigating
3. Risk attributing false claims again

---

## The Solution: Create PPID Profiles for Namesakes

When entity resolution proves two entities are different, **create a regular PPID profile for the namesake**:

1. Use standard PPID naming convention (no special prefix)
2. Set `heritage_relevance.is_heritage_relevant: false`
3. Document the disambiguation in BOTH profiles

---

## Example: Venezuelan Actress Profile

```json
{
  "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
  "profile_data": {
    "full_name": "Carmen Julia Álvarez",
    "profession": "actress",
    "nationality": "Venezuelan",
    "birth_year": 1952,
    "birth_location": "Caracas, Venezuela",
    "active_period": "1970s-2000s"
  },
  "heritage_relevance": {
    "is_heritage_relevant": false,
    "relevance_score": 0.0,
    "reason": "Entertainment industry professional - actress in film and television"
  },
  "disambiguation_notes": {
    "commonly_confused_with": [
      {
        "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
        "name": "Carmen Juliá",
        "profession": "curator",
        "employer": "New Contemporaries",
        "location": "UK",
        "why_different": "Different profession (actress vs curator), different location (Venezuela vs UK), overlapping active periods in incompatible roles"
      }
    ],
    "disambiguation_note": "This is the Venezuelan actress, NOT the UK-based art curator."
  },
  "web_claims": [
    {
      "claim_type": "birth_year",
      "claim_value": 1952,
      "provenance": {
        "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
        "retrieved_on": "2026-01-11T14:30:00Z",
        "retrieval_agent": "manual-human-curator"
      }
    },
    {
      "claim_type": "profession",
      "claim_value": "actress",
      "provenance": {
        "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
        "retrieved_on": "2026-01-11T14:30:00Z",
        "retrieval_agent": "manual-human-curator"
      }
    }
  ],
  "extraction_metadata": {
    "created_at": "2026-01-11T15:00:00Z",
    "created_by": "manual-human-curator",
    "creation_reason": "Created during entity resolution to distinguish from heritage worker Carmen Juliá"
  }
}
```

---

## Update the Heritage Profile Too

The heritage profile should also reference the disambiguation:

```json
{
  "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
  "profile_data": {
    "full_name": "Carmen Juliá",
    "headline": "Curator at New Contemporaries"
  },
  "heritage_relevance": {
    "is_heritage_relevant": true,
    "relevance_score": 0.85
  },
  "disambiguation_notes": {
    "known_namesakes": [
      {
        "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
        "name": "Carmen Julia Álvarez",
        "profession": "actress",
        "location": "Venezuela",
        "why_not_same_person": "Different profession, location, timeline"
      }
    ],
    "disambiguation_warning": "Web searches for 'Carmen Julia' return data about Venezuelan actress Carmen Julia Álvarez (born 1952). This is a DIFFERENT person."
  }
}
```

---

## When to Create Namesake Profiles

Create a PPID profile for a namesake when:

1. **Entity resolution proves they are a different person**
2. **They are notable enough** to appear in search results repeatedly (Wikipedia, IMDB, news)
3. **The confusion risk is high** (similar name, some overlapping attributes)

**Do NOT create profiles for**:

- Random social media accounts with no notable presence
- Obvious mismatches unlikely to recur in searches

---

## Benefits

1. **Universal person database**: Any person can have a PPID
2. **Prevents repeated mistakes**: Future enrichment can check for known namesakes
3. **Bidirectional linking**: Both profiles reference each other
4. **Consistent data model**: No special file naming or profile types needed
5. **Audit trail**: Documents why profiles were created

---

## Workflow

### Step 1: During Entity Resolution

When you reject a claim due to identity mismatch with a notable namesake:

```
1. Document WHY the source describes a different person
2. Check if the namesake is notable (Wikipedia, IMDB, frequent search results)
3. If notable → Create PPID profile for the namesake
4. Link both profiles via disambiguation_notes
```

### Step 2: Create Namesake Profile

Use standard PPID naming:

```
ID_{birth-location}_{birth-decade}_{current-location}_{death-decade}_{NAME}.json
```

Example: `ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ.json`

### Step 3: Update Both Profiles

- Namesake profile: Add `commonly_confused_with` pointing to heritage profile
- Heritage profile: Add `known_namesakes` pointing to namesake profile

---

## Historical Persons

Historical persons (deceased) can also have PPID profiles:

```json
{
  "ppid": "ID_DE-XX-XXX_1901_DE-XX-XXX_1951_ROBERT-RITTER",
  "profile_data": {
    "full_name": "Robert Ritter",
    "profession": "physician",
    "birth_year": 1901,
    "death_year": 1951,
    "nationality": "German",
    "historical_note": "Nazi-era physician involved in racial hygiene programs"
  },
  "heritage_relevance": {
    "is_heritage_relevant": false,
    "relevance_score": 0.0
  },
  "disambiguation_notes": {
    "commonly_confused_with": [
      {
        "ppid": "ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_ROBERT-RITTER",
        "name": "Robert Ritter",
        "profession": "heritage worker",
        "why_different": "Different era - historical figure (1901-1951) vs living heritage professional"
      }
    ]
  }
}
```

---

## Related Rules

- **Rule 46**: Entity Resolution - Names Are NEVER Sufficient
- **Rule 21**: Data Fabrication is Strictly Prohibited
- **Rule 26**: Person Data Provenance - Web Claims for Staff Information

---

## Summary

**The PPID system is universal.** When you discover during entity resolution that a web source describes a different person:

1. **Create a regular PPID profile** for the namesake (actress, historical figure, etc.)
2. **Set `heritage_relevance.is_heritage_relevant: false`** (unless they happen to also work in heritage)
3. **Link both profiles** via `disambiguation_notes`
4. **Use standard PPID naming** - no special prefixes needed

This builds a comprehensive person database while preventing entity resolution errors.
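
---

## Illustrative Sketch: Linking Both Profiles

The bidirectional link from Step 3 of the workflow can be sketched in Python. This is a minimal illustration, not part of the project's tooling: the function names `link_namesake_profiles` and `is_known_namesake` are hypothetical, and the sketch assumes profiles are already loaded as plain dictionaries with the `ppid`, `profile_data.full_name`, `heritage_relevance`, and `disambiguation_notes` fields used in the examples above. Reading and writing the JSON files in `data/person/` is left out.

```python
from copy import deepcopy


def link_namesake_profiles(heritage, namesake, why_different):
    """Return updated copies of both profiles with reciprocal disambiguation notes.

    Hypothetical helper: the inputs are not mutated, and repeated calls
    do not create duplicate entries.
    """
    heritage = deepcopy(heritage)
    namesake = deepcopy(namesake)

    # Heritage profile: record the namesake under known_namesakes.
    known = heritage.setdefault("disambiguation_notes", {}).setdefault(
        "known_namesakes", []
    )
    if not any(entry.get("ppid") == namesake["ppid"] for entry in known):
        known.append(
            {
                "ppid": namesake["ppid"],
                "name": namesake["profile_data"]["full_name"],
                "why_not_same_person": why_different,
            }
        )

    # Namesake profile: point back at the heritage profile.
    confused = namesake.setdefault("disambiguation_notes", {}).setdefault(
        "commonly_confused_with", []
    )
    if not any(entry.get("ppid") == heritage["ppid"] for entry in confused):
        confused.append(
            {
                "ppid": heritage["ppid"],
                "name": heritage["profile_data"]["full_name"],
                "why_different": why_different,
            }
        )

    # Default the namesake to non-heritage, but keep an existing value
    # (the namesake may happen to work in heritage as well).
    namesake.setdefault(
        "heritage_relevance",
        {"is_heritage_relevant": False, "relevance_score": 0.0},
    )
    return heritage, namesake


def is_known_namesake(heritage, candidate_ppid):
    """Check whether a PPID is already recorded as a namesake of this profile."""
    notes = heritage.get("disambiguation_notes", {})
    return any(
        entry.get("ppid") == candidate_ppid
        for entry in notes.get("known_namesakes", [])
    )
```

A future enrichment run could call `is_known_namesake` before attributing a web claim, so that a source already identified as describing the namesake is skipped instead of being re-investigated.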