Person Enrichment Cleanup Report
Date: 2026-01-11 02:53
Summary
Critical entity resolution failures were discovered in person profile enrichment. The enrichment script attributed data belonging to different people with similar names to our person profiles.
Issues Found
| Issue | Count | Example |
|---|---|---|
| Birth years from wrong persons | 122 | Carmen Juliá born 1952 (was Venezuelan actress, not UK curator) |
| Wikipedia articles about different people | 42 | Robert Ritter = Nazi doctor, not heritage worker |
| Genealogy sources (historical namesakes) | 8 | Birth year 1922 from geni.com |
| ResearchGate/Academia from wrong researchers | 80+ | Carmen Julia Navarro = hydrogeologist, not curator |
| Social media from random accounts | 150+ | Instagram accounts of different people |
Cleanup Actions
- Removed all enriched birth_year claims (234 claims, 124 files)
- Removed all social_connection claims (spouse, family from Wikipedia)
- Removed all social_media_content claims (Instagram follower counts)
- Removed claims from high-risk sources (Wikipedia, IMDB, ResearchGate, Academia.edu, Instagram, TikTok)
Total claims removed: 540+
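The removal pass above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run: the claim schema (dicts with `claim_type` and `source` keys) and the function name `clean_claims` are assumptions, and the source labels are lowercase stand-ins for the high-risk sources listed above.

```python
# Hypothetical claim schema: {"claim_type": str, "source": str, "value": ...}
# Claim types removed wholesale in this cleanup.
REMOVED_CLAIM_TYPES = {"birth_year", "social_connection", "social_media_content"}

# Lowercase labels for the high-risk sources named in the report (assumed spelling).
HIGH_RISK_SOURCES = {
    "wikipedia", "imdb", "researchgate", "academia.edu", "instagram", "tiktok",
}


def clean_claims(claims):
    """Split claims into (kept, removed) using the two cleanup criteria."""
    kept, removed = [], []
    for claim in claims:
        bad_type = claim.get("claim_type") in REMOVED_CLAIM_TYPES
        bad_source = str(claim.get("source", "")).lower() in HIGH_RISK_SOURCES
        if bad_type or bad_source:
            removed.append(claim)  # candidates for the audit trail
        else:
            kept.append(claim)
    return kept, removed
```

Returning the removed claims alongside the kept ones makes it straightforward to write an audit trail before anything is deleted from a profile file.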
Rule 46 Added
Added new critical rule to prevent future entity resolution failures:
Rule 46: Entity Resolution - Names Are NEVER Sufficient
Key requirements:
- Similar or identical names are NEVER sufficient for entity resolution
- At least 3 of 5 identity attributes must match (career, employer, location, age, education)
- Any conflicting signal (e.g., "actress" vs "curator") = automatic rejection
- Genealogy sites = ALWAYS reject
- Wikipedia = reject unless 4/5 attributes match
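One way to read these requirements as a decision procedure is sketched below. It is an illustration under stated assumptions, not the implementation in the rules file: the names `compare` and `accept_candidate`, the dict-based profile shape, the source labels, and the use of exact equality to decide whether an attribute matches are all hypothetical. A real check would need fuzzier comparison, for example an age tolerance.

```python
ATTRIBUTES = ("career", "employer", "location", "age", "education")

# Assumed source labels; the report names geni.com as a genealogy source.
GENEALOGY_SOURCES = {"geni.com"}
STRICT_SOURCES = {"wikipedia"}  # needs 4 of 5 matches instead of 3


def compare(profile: dict, candidate: dict) -> tuple[int, int]:
    """Return (matches, conflicts) over the five identity attributes."""
    matches = conflicts = 0
    for attr in ATTRIBUTES:
        ours, theirs = profile.get(attr), candidate.get(attr)
        if ours is None or theirs is None:
            continue  # unknown on either side: neither a match nor a conflict
        if ours == theirs:  # simplification: exact equality
            matches += 1
        else:
            conflicts += 1
    return matches, conflicts


def accept_candidate(profile: dict, candidate: dict, source: str) -> bool:
    """Apply the Rule 46 checks; the name is deliberately never consulted."""
    if source in GENEALOGY_SOURCES:
        return False  # genealogy sites: always reject
    matches, conflicts = compare(profile, candidate)
    if conflicts > 0:
        return False  # any conflicting signal: automatic rejection
    required = 4 if source in STRICT_SOURCES else 3
    return matches >= required
```

Under this sketch, a candidate whose career is "actress" is rejected against a profile whose career is "curator" regardless of how many other attributes agree, and a candidate with only two known matching attributes is rejected even when nothing conflicts.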
Files Modified
- .opencode/rules/entity-resolution-no-heuristics.md - Enhanced with stricter requirements
- AGENTS.md - Added Rule 46 summary
- data/person/_birth_year_removal_log.json - Audit trail of removed claims
- 207 person profile files cleaned
Remaining Work
The remaining 609 enriched claims (position, education, hobby, award) may still have entity resolution issues but are lower risk. Future enrichment MUST implement the entity resolution validation in Rule 46.