Commit graph

2 commits

Author SHA1 Message Date
kempersc
9a395f3dbe fix: improve birth year extraction to avoid date suffix false positives
- Skip YYYYMMDD and YYMMDD date patterns at end of email
- Skip digit sequences longer than 4 characters
- Require non-digit before 4-digit years at end
- Add knid.nl/kabelnoord.nl to consumer domains (Friesland ISP)
- Add 11 missing regional archive domains to HERITAGE_DOMAIN_MAP
- Update recalculation script to re-extract email semantics

Results:
- 3,151 false birth years removed
- 'Likely wrong person' reduced from 533 to 325 (-39%)
- 2,944 candidates' scores boosted
2026-01-13 22:37:10 +01:00
kempersc
92b490d690 edit slots 2026-01-13 20:35:11 +01:00
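
The three extraction rules listed in commit 9a395f3dbe (skip trailing YYYYMMDD/YYMMDD dates, skip digit runs longer than 4 characters, require a non-digit before a trailing 4-digit year) might be sketched roughly as follows. This is not the repository's code: the function name `extract_birth_year`, the `MIN_YEAR`/`MAX_YEAR` bounds, and the choice to inspect only the part before the `@` are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical plausibility bounds; the commit does not state the real ones.
MIN_YEAR, MAX_YEAR = 1920, 2010

# Matches the whole run of digits at the very end of the local part.
TRAILING_DIGITS = re.compile(r"(\d+)$")


def extract_birth_year(email: str) -> Optional[int]:
    """Return a plausible 4-digit birth year from the end of an email's local part."""
    local = email.split("@", 1)[0]
    match = TRAILING_DIGITS.search(local)
    if match is None:
        return None

    run = match.group(1)
    # Runs longer than 4 digits are skipped. This also covers the
    # YYMMDD (6 digits) and YYYYMMDD (8 digits) date suffixes.
    if len(run) > 4:
        return None
    # This sketch only handles 4-digit years.
    if len(run) != 4:
        return None
    # Require a non-digit character directly before the year. Because the
    # regex captures the full digit run, only the "nothing before it" case
    # remains to be rejected here.
    if match.start() == 0:
        return None

    year = int(run)
    return year if MIN_YEAR <= year <= MAX_YEAR else None
```

Under these assumptions, `extract_birth_year("j.devries1985@example.nl")` yields `1985`, while `jan19850312@example.nl`, `jan850312@example.nl`, and `piet12345@example.nl` all yield `None`.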