Commit graph

14 commits

Author SHA1 Message Date
kempersc
f9b950fa24 chore: ignore data/person/ directory (98K+ WCMS profiles) 2026-01-11 20:07:36 +01:00
kempersc
0a888ec682 chore: add node_modules to .gitignore and remove from tracking
- Add node_modules/ and .pnpm-store/ to .gitignore
- Remove 76k node_modules files from git tracking
- Update frontend manifest
2026-01-11 00:41:21 +01:00
kempersc
3eb097d92e data(person): enrich 64 person profiles with comprehensive metadata
- Add inferred birth dates using EDTF notation
- Add inferred birth/current settlements
- Enrich employment history with temporal data
- Add heritage sector relevance scores
- Improve PPID component tracking
- Update .gitignore with large file patterns (warc, nt, trix, geonames.db)
2026-01-11 00:38:09 +01:00
kempersc
349f31ae6f enrich custodian profiles 2026-01-02 02:10:18 +01:00
kempersc
0c1d19e98b enrich entries 2025-12-23 13:27:35 +01:00
kempersc
23b1d8ee5f clean up GHCID 2025-12-17 11:58:40 +01:00
kempersc
e0dd847491 extend ontology 2025-12-16 20:27:39 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
90a1f20271 chore: add YAML history fix scripts and update ducklake/deploy tooling
- Add fix_yaml_history.py and fix_yaml_history_v2.py for cleaning up
  malformed ghcid_history entries with duplicate/redundant data
- Update load_custodians_to_ducklake.py for DuckDB lakehouse loading
- Update migrate_web_archives.py for web archive management
- Update deploy.sh with improvements
- Ignore entire data/ducklake/ directory (generated databases)
2025-12-07 18:45:52 +01:00
kempersc
0b06af0fb6 chore: mark unused function and ignore ducklake databases 2025-12-07 14:28:12 +01:00
kempersc
d661947830 update enriched entries 2025-12-03 17:38:46 +01:00
kempersc
f3c149b1bb update entries 2025-11-30 23:30:29 +01:00
kempersc
3c80de87e0 add isil entries 2025-11-19 23:25:22 +01:00