Commit graph

8 commits

Author SHA1 Message Date
kempersc
bc6ad46bfa enrich CZ and JP profiles 2025-12-30 23:03:03 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
59963c8d3f Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%)
- JP: 4,496 processed (37.2% of 12,096)  COMPLETE
- CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease
- CH, NL, BE, AT, BR: 100% complete
- Total: 12,833 of 31,772 files (40.4%)
- Using crawl4ai favicon extraction
2025-12-26 13:42:21 +01:00
kempersc
cab712659d recover location blocks 2025-12-09 11:34:56 +01:00
kempersc
62fdd35321 Refactor code structure for improved readability and maintainability 2025-12-09 11:15:51 +01:00
kempersc
bf7c773955 edit Japanese entries 2025-12-09 09:16:19 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
f284e87d13 feat: add 24,963 heritage custodian records from global extraction
Major batch addition of heritage institution data:
- Japan: 12,077 institutions (libraries, museums, archives)
- Czechia: 6,760 institutions
- Switzerland: 2,390 institutions
- Belgium: 448 institutions
- Belarus: 257 institutions
- Austria: 249 institutions (with corrected GHCIDs)
- Argentina: 235 institutions (bibliotecas populares)
- Brazil: 155 institutions
- Mexico: 110 institutions
- Bulgaria: 98 institutions
- Chile: 83 institutions
- Egypt: 50 institutions
- And additional records from VN, NL, GE, KR, GB, FR, US, IN, etc.

All records include:
- Standardized GHCID identifiers (alphabetic-only abbreviations)
- GeoNames-resolved location data
- ISO 3166-2 region codes
- Provenance metadata with extraction timestamps
2025-12-07 14:24:48 +01:00