kempersc
40bd3cb8f5
data(custodian): add emic_name fields and remove duplicate files with name suffixes
- Add emic_name, name_language, and standardized_name to 1,781 custodian files
- Remove 2,239 duplicate files that had name suffixes in filename
- Consolidate data into base GHCID files per PID stability rules
- Part of UNESCO Memory of the World custodian enrichment
2025-12-08 14:57:34 +01:00
kempersc
f284e87d13
feat: add 24,963 heritage custodian records from global extraction
Major batch addition of heritage institution data:
- Japan: 12,077 institutions (libraries, museums, archives)
- Czechia: 6,760 institutions
- Switzerland: 2,390 institutions
- Belgium: 448 institutions
- Belarus: 257 institutions
- Austria: 249 institutions (with corrected GHCIDs)
- Argentina: 235 institutions (bibliotecas populares)
- Brazil: 155 institutions
- Mexico: 110 institutions
- Bulgaria: 98 institutions
- Chile: 83 institutions
- Egypt: 50 institutions
- Additional records from VN, NL, GE, KR, GB, FR, US, IN, and other countries
All records include:
- Standardized GHCID identifiers (alphabetic-only abbreviations)
- GeoNames-resolved location data
- ISO 3166-2 region codes
- Provenance metadata with extraction timestamps
2025-12-07 14:24:48 +01:00