Commit graph

8 commits

Author SHA1 Message Date
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
0938cce6cf feat(loaders): update DuckLake and TypeDB loaders with relation support 2025-12-08 15:00:14 +01:00
kempersc
6a6557bbe8 feat(enrichment): add emic name enrichment and update CustodianName schema
- Add emic_name, name_language, standardized_name to CustodianName
- Add scripts for enriching custodian emic names from Wikidata
- Add YouTube and Google Maps enrichment scripts
- Update DuckLake loader for new schema fields
2025-12-08 14:58:50 +01:00
kempersc
7e3559f7e5 add new entries 2025-12-07 23:08:02 +01:00
kempersc
18874e6070 fix(scripts): normalize org_type codes in DuckLake loader
- Handle single-letter GLAM type codes (G, L, A, M, O, R, C, etc.)
- Handle legacy GRP.HER.* format
- Support compound types like 'M,F' -> 'MUSEUM,FEATURES'
- Fix type hint syntax for Python 3.10+
2025-12-07 19:21:14 +01:00
kempersc
90a1f20271 chore: add YAML history fix scripts and update ducklake/deploy tooling
- Add fix_yaml_history.py and fix_yaml_history_v2.py for cleaning up
  malformed ghcid_history entries with duplicate/redundant data
- Update load_custodians_to_ducklake.py for DuckDB lakehouse loading
- Update migrate_web_archives.py for web archive management
- Update deploy.sh with improvements
- Ignore entire data/ducklake/ directory (generated databases)
2025-12-07 18:45:52 +01:00
kempersc
ee4e57bc75 add new entries 2025-12-07 00:26:01 +01:00
kempersc
1635625032 added web annotations 2025-12-06 19:50:04 +01:00