Commit graph

3 commits

Author SHA1 Message Date
kempersc
9339de2cfb data(person): process 44,512 heritage-relevant profiles from entity extractions
Processing Summary:
- Scanned 94,716 LinkedIn entity files
- Identified 44,512 heritage-relevant individuals (47%)
- Created 1,430 new PPID-formatted profiles
- Updated 43,070 existing profiles with entity data
- Final count: 40,731 person profiles

Profile updates include:
- Merged web_claims with full provenance
- Added/updated heritage_relevance scoring
- Added affiliation data with custodian references
- Added inferred birth decades with provenance chains (Rule 45)

All data preserved per Rule 5 (additive only)
2026-01-10 14:01:29 +01:00
kempersc
855fff5962 data(person): resolve PPID locations and enrich profiles
- Rename 512 person files from XX-XX-XXX placeholders to proper GeoNames locations
- Update 2,463 profiles with enriched data
- Add 512 new person profiles (AU, international heritage professionals)
- PPID format: ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}
2026-01-09 21:09:28 +01:00
kempersc
9e67d0f967 enrich profiles
2026-01-09 20:35:19 +01:00
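
Commit 855fff5962 gives the PPID format as ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME} and mentions XX-XX-XXX location placeholders. A minimal sketch of how such an identifier could be assembled and checked for unresolved locations follows. The names PpidParts, build_ppid and has_placeholder_location are hypothetical, as are the example field values, and the sketch assumes that none of the fields before NAME contain an underscore; the repository's actual tooling may work differently.

```python
from dataclasses import dataclass

# Placeholder form for an unresolved location, as quoted in the commit message.
PLACEHOLDER_LOC = "XX-XX-XXX"


@dataclass
class PpidParts:
    """Hypothetical container for the five variable PPID fields."""
    birth_loc: str
    decade: str
    work_loc: str
    custodian: str
    name: str


def build_ppid(parts: PpidParts) -> str:
    """Join the fields in the order given by the commit message."""
    return "_".join(
        ["ID", parts.birth_loc, parts.decade, parts.work_loc,
         parts.custodian, parts.name]
    )


def has_placeholder_location(ppid: str) -> bool:
    """Return True if the birth or work location is still a placeholder.

    Assumes the fields before NAME contain no underscores, so the
    locations sit at positions 1 and 3 after splitting on "_".
    """
    fields = ppid.split("_")
    return len(fields) >= 6 and PLACEHOLDER_LOC in (fields[1], fields[3])


# Illustrative values only; none of these come from the repository.
example = build_ppid(
    PpidParts(
        birth_loc=PLACEHOLDER_LOC,
        decade="XXXX",
        work_loc="NL-NH-AMS",
        custodian="EXAMPLE",
        name="JANE-DOE",
    )
)
```

A check like has_placeholder_location could be used to list the files that still need the kind of rename described in that commit.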