Merge data from PENDING files (with XX-XXX placeholders) into their
corresponding enriched custodian records with proper GHCIDs.
Countries affected:
- DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.)
- ES: 1 institution (Biblioteca Nacional de España)
- FR: 1 institution (NMO)
- ID: 18 Indonesian museums and archives
- NL: 111 Dutch institutions across all provinces
- US: 1 institution (ARCA)
The PENDING files are deleted after merge; originals archived in
data/custodian/archive/pending_merged_20250109/
Merged LinkedIn-extracted staff sections from PENDING files into their
corresponding proper GHCID custodian files. This consolidates data from
two extraction sources:
- Existing enriched files: Google Maps, Museum Register, YouTube, etc.
- PENDING files: LinkedIn staff data extraction
Files modified:
- 28 custodian files enriched with staff data
- 35 PENDING files deleted (merged into proper locations)
- Originals archived to archive/pending_duplicates_20250109/
Key institutions enriched:
- Rijksmuseum (NL-NH-AMS-M-RM)
- Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
- Amsterdam Museum (NL-NH-AMS-M-AM)
- Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
- Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
- And 23 more museums/archives across NL
New scripts:
- scripts/merge_staff_data.py: Automated staff data merger
- scripts/categorize_pending_files.py: PENDING file analysis utility
- Add digital platform discovery data with provenance
- Cleanup duplicate/incorrect custodian entries
- Add GHCID collision resolution suffixes where needed
- Update person entity profiles with career history