Commit graph

10 commits

Author SHA1 Message Date
kempersc
90b402dba6 enrich AR and Czech files 2025-12-30 23:01:01 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
84904e344b Make AGENTS more succinct by referring to opencode rules & enrich custodians 2025-12-28 14:56:35 +01:00
kempersc
7b42d720d5 geocode: add coordinates to CZ, BY, CH, FR, ES custodian files from GeoNames (1145 files) 2025-12-09 16:41:41 +01:00
kempersc
d20978dcbe normalize: add canonical location blocks (batch 5) 2025-12-09 14:39:02 +01:00
kempersc
bb41287730 normalize: add canonical location blocks (batch 1) 2025-12-09 13:17:11 +01:00
kempersc
85a951bbea normalize: add canonical location blocks to 586 files
- Fixed 469 JP files missing location: blocks (had data in original_entry.locations)
- Fixed 117 additional JP files found in second pass
- 1 EG file skipped (no location source data available)
- Total files with location: blocks now 27,459 out of 27,511 (99.8%)
- Also includes YAML formatting standardization (line wrapping)

Recovery from data loss in commit 62fdd35321 is now complete.
2025-12-09 12:17:34 +01:00
kempersc
b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
40bd3cb8f5 data(custodian): add emic_name fields and remove duplicate files with name suffixes
- Add emic_name, name_language, and standardized_name to 1,781 custodian files
- Remove 2,239 duplicate files that had name suffixes in filename
- Consolidate data into base GHCID files per PID stability rules
- Part of UNESCO Memory of the World custodian enrichment
2025-12-08 14:57:34 +01:00