Commit graph

13 commits

Author SHA1 Message Date
kempersc
99430c2a70 add new entries and semantic routing 2025-12-17 10:11:56 +01:00
kempersc
52ae711c56 add timespans 2025-12-16 09:02:52 +01:00
kempersc
cb56aa7e40 enrich all custodian timespan 2025-12-15 22:31:41 +01:00
kempersc
d9892dba6f fix: handle single-vector Qdrant collections and multi-collection embedding dimensions
- Fixed _vector_search() to check uses_named_vectors() before adding 'using' parameter
- Fixed _person_vector_search() to detect person collection vector size and use appropriate model
- Resolves 'Not existing vector name error: openai_1536' for single-vector collections
- Resolves embedding dimension mismatch between heritage_custodians (1536-dim) and heritage_persons (384-dim)
2025-12-15 10:31:39 +01:00
kempersc
3820f2fc92 chore: Add data reports, infra scripts, and API updates
- Data quality reports for Dutch custodians
- Name mismatch detection reports
- Failed crawl URL tracking
- Caddy configuration updates
- Monitor script for chunk 404 errors
- API endpoint improvements
2025-12-15 01:48:08 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
505c12601a Add test script for PiCo extraction from Arabic waqf documents
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
2025-12-12 17:50:17 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
1b1cfbfca0 enrich custodians 2025-12-11 22:32:09 +01:00
kempersc
b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc
bf7c773955 edit Japanese entries 2025-12-09 09:16:19 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
1635625032 added web annotations 2025-12-06 19:50:04 +01:00