kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	eaf80ec756	data(custodian): merge PENDING collision files into existing custodians	2026-01-09 18:33:00 +01:00
    Merge staff data from 7 PENDING files into their matching custodian records:
    - NL-XX-XXX-PENDING-SPOT-GRONINGEN → NL-GR-GRO-M-SG (SPOT Groningen, 120 staff)
    - NL-XX-XXX-PENDING-DIENST-UITVOERING-ONDERWIJS → NL-GR-GRO-O-DUO
    - NL-XX-XXX-PENDING-ANNE-FRANK-STICHTING → NL-NH-AMS-M-AFS
    - NL-XX-XXX-PENDING-ALLARD-PIERSON → NL-NH-AMS-M-AP
    - NL-XX-XXX-PENDING-STICHTING-JOODS-HISTORISCH-MUSEUM → NL-NH-AMS-M-JHM
    - NL-XX-XXX-PENDING-MINISTERIE-VAN-BUITENLANDSE-ZAKEN → NL-ZH-DHA-O-MBZ
    - NL-XX-XXX-PENDING-MINISTERIE-VAN-JUSTITIE-EN-VEILIGHEID → NL-ZH-DHA-O-MJV
    Originals archived in data/custodian/archive/pending_collisions_20250109/
    Add scripts/merge_collision_files.py for reproducible merging
kempersc	bd06e4f864	data(custodian): merge 135 PENDING files into existing enriched records	2026-01-09 18:25:56 +01:00
    Merge data from PENDING files (with XX-XXX placeholders) into their corresponding enriched custodian records with proper GHCIDs.
    Countries affected:
    - DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.)
    - ES: 1 institution (Biblioteca Nacional de España)
    - FR: 1 institution (NMO)
    - ID: 18 Indonesian museums and archives
    - NL: 111 Dutch institutions across all provinces
    - US: 1 institution (ARCA)
    The PENDING files are deleted after merge; originals archived in data/custodian/archive/pending_merged_20250109/
kempersc	14be18e7c4	feat(data): merge staff data from 30 more PENDING files into enriched custodians	2026-01-09 15:42:32 +01:00
    Batch 2 of PENDING file resolution:
    - Merged LinkedIn staff data from 30 PENDING files into matching enriched custodians
    - Archived processed PENDING files to data/custodian/archive/pending_merged_20250109/
    - Notable merges: ASML (994 staff), BBB (117), Apenheul (100), BOEI (93)
    Files merged include:
    - Corporate: ASML, BOS Foundation, Constructing the Limes
    - Museums: Allard Pierson, Apenheul, various regional museums
    - Research: Catholic Documentation Centre, Creating Cultures of Care
    - Cultural orgs: Cultuur Ondernemen, CultuurOost, CultuurKwadraat
    This continues the effort to consolidate PENDING files (1283 remaining).
kempersc	1f723fd5d7	feat(data): merge staff data from 35 PENDING files into enriched custodians	2026-01-09 14:51:17 +01:00
    Merged LinkedIn-extracted staff sections from PENDING files into their corresponding proper GHCID custodian files.
    This consolidates data from two extraction sources:
    - Existing enriched files: Google Maps, Museum Register, YouTube, etc.
    - PENDING files: LinkedIn staff data extraction
    Files modified:
    - 28 custodian files enriched with staff data
    - 35 PENDING files deleted (merged into proper locations)
    - Originals archived to archive/pending_duplicates_20250109/
    Key institutions enriched:
    - Rijksmuseum (NL-NH-AMS-M-RM)
    - Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
    - Amsterdam Museum (NL-NH-AMS-M-AM)
    - Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
    - Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
    - And 23 more museums/archives across NL
    New scripts:
    - scripts/merge_staff_data.py: Automated staff data merger
    - scripts/categorize_pending_files.py: PENDING file analysis utility
kempersc	17a94613f3	data(custodian): resolve 57 PENDING files to proper GHCID locations	2026-01-09 12:19:19 +01:00
    Resolved NL-XX-XXX-PENDING files to proper regional GHCIDs:
    - 57 new files with proper location codes (city, region)
    - Cities include: Amsterdam, Rotterdam, Utrecht, Leiden, Groningen, etc.
    - 34 original PENDING files archived to archive/pending_duplicates_20250109/
    Examples:
    - NL-XX-XXX-PENDING-AMSTERDAM-MUSEUM → NL-NH-AMS-M-AM (Amsterdam Museum)
    - NL-XX-XXX-PENDING-GRONINGEN-MUSEUM → NL-GR-GRO-M-GM (Groninger Museum)
    - NL-XX-XXX-PENDING-KUNSTHAL-ROTTERDAM → NL-ZH-ROT-G-KR (Kunsthal Rotterdam)
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	0c1d19e98b	enrich entries	2025-12-23 13:27:35 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	23b1d8ee5f	clean up GHCID	2025-12-17 11:58:40 +01:00
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	181b1cf705	data: enrich Dutch heritage custodians (DR, FL, FR, GE, GR, LI provinces)	2025-12-15 01:34:38 +01:00
    - Add digital platform discovery data with provenance
    - Cleanup duplicate/incorrect custodian entries
    - Add GHCID collision resolution suffixes where needed
    - Update person entity profiles with career history
kempersc	41959f0766	correct HCID!	2025-12-10 13:01:13 +01:00

12 commits