Commit graph

10 commits

Author SHA1 Message Date
kempersc d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc aca68ea47f remove ambiguous web-claims 2025-12-21 00:01:54 +01:00
kempersc cc61d99acf geocode: add coordinates to BG and EG custodian files
- BG: Add lat/lon from existing GeoNames IDs (28 files)
- EG: Map city codes to GeoNames (CAI→Cairo, ALX→Alexandria, etc.) (28 files)
- Fix malformed EG-IS-\`A\`-O-SCA.yaml → EG-IS-ISM-O-SCA.yaml
- Overall coverage: 96.4% → 96.6%
2025-12-09 21:59:58 +01:00
kempersc d20978dcbe normalize: add canonical location blocks (batch 5) 2025-12-09 14:39:02 +01:00
kempersc bb41287730 normalize: add canonical location blocks (batch 1) 2025-12-09 13:17:11 +01:00
kempersc 85a951bbea normalize: add canonical location blocks to 586 files
- Fixed 469 JP files missing location: blocks (had data in original_entry.locations)
- Fixed 117 additional JP files found in second pass
- 1 EG file skipped (no location source data available)
- Total files with location: blocks now 27,459 out of 27,511 (99.8%)
- Also includes YAML formatting standardization (line wrapping)

Recovery from data loss in commit 62fdd35321 is now complete.
2025-12-09 12:17:34 +01:00
kempersc b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc 131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc 0c4c378e06 fix(data): clean up YAML structure in BE/EG custodian files (450 files)
Remove redundant ch_annotator metadata and duplicate ghcid_history entries
that were causing YAML parsing issues. Files now have cleaner, more
consistent structure while preserving all essential data.
2025-12-07 18:46:42 +01:00
kempersc f284e87d13 feat: add 24,963 heritage custodian records from global extraction
Major batch addition of heritage institution data:
- Japan: 12,077 institutions (libraries, museums, archives)
- Czechia: 6,760 institutions
- Switzerland: 2,390 institutions
- Belgium: 448 institutions
- Belarus: 257 institutions
- Austria: 249 institutions (with corrected GHCIDs)
- Argentina: 235 institutions (bibliotecas populares)
- Brazil: 155 institutions
- Mexico: 110 institutions
- Bulgaria: 98 institutions
- Chile: 83 institutions
- Egypt: 50 institutions
- Additional records from VN, NL, GE, KR, GB, FR, US, IN, etc.

All records include:
- Standardized GHCID identifiers (alphabetic-only abbreviations)
- GeoNames-resolved location data
- ISO 3166-2 region codes
- Provenance metadata with extraction timestamps
2025-12-07 14:24:48 +01:00