glam/data
kempersc 1f723fd5d7 feat(data): merge staff data from 35 PENDING files into enriched custodians
Merged LinkedIn-extracted staff sections from PENDING files into their
corresponding proper GHCID custodian files. This consolidates data from
two extraction sources:
- Existing enriched files: Google Maps, Museum Register, YouTube, etc.
- PENDING files: LinkedIn staff data extraction

Files modified:
- 28 custodian files enriched with staff data
- 35 PENDING files deleted (merged into proper locations)
- Originals archived to archive/pending_duplicates_20250109/

Key institutions enriched:
- Rijksmuseum (NL-NH-AMS-M-RM)
- Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
- Amsterdam Museum (NL-NH-AMS-M-AM)
- Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
- Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
- And 23 more museums/archives across NL

New scripts:
- scripts/merge_staff_data.py: Automated staff data merger
- scripts/categorize_pending_files.py: PENDING file analysis utility
2026-01-09 14:51:17 +01:00
..
custodian feat(data): merge staff data from 35 PENDING files into enriched custodians 2026-01-09 14:51:17 +01:00
custodian.backup.20251230
custodian_sample
entity_annotation
examples
extracted
google_maps_enrichment
instances
intangible_heritage
isil
json
jsonld
manual_enrichment
museum_register_nl
nde
ontology
raw
rdf
reference
reports
review
test
training
unified
validation
web/lap_gaza_report_2024
wikidata
wikpedia/Destruction_of_cultural_heritage_during_the_Israeli_invasion_of_the_Gaza_Strip
collision_edge_case_analysis.md
deduplication_improvement_summary.md
dutch_collision_report.txt
dutch_collision_stats.json
dutch_deduplication_report.txt
dutch_institutions_with_ghcids.yaml
extraction_checkpoint.json
failed_crawl_urls.txt
failed_crawl_urls_round1_backup.txt
failed_crawl_urls_round3_backup.txt
failed_crawl_urls_round4.txt
ISIL-codes_2025-08-01.csv
linkedin_locations.json
mexican_geography_analysis.yaml
missing_annotations_checkpoint.json
NDE-logo-RGB-basis-nl-blauw.png
reenrich_queue.json
sparql_templates.yaml feat(rag): add database routing to 8 more factual query templates 2026-01-09 12:33:41 +01:00
temp_conv1_artifact2.md
temp_conv2_artifact1.md
temp_mexican_conv1.json
temp_mexican_conv2.json
unenriched_urls_round2.txt
xxx_matches.json