kempersc/glam - Forgejo: Beyond coding. We Forge.

Author	SHA1	Message	Date
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	68c5aa2724	feat(api): Add heritage person classification and RAG retry logic - Add GLAMORCUBESFIXPHDNT heritage type detection for person profiles - Two-stage classification: blocklist non-heritage orgs, then match keywords - Special handling for Digital (D) type: requires heritage org context - Add career_history heritage_relevant and heritage_type fields - Add exponential backoff retry for Anthropic API overload errors - Fix DSPy 3.x async context with dspy.context() wrapper	2025-12-15 01:31:54 +01:00
kempersc	c6aee998db	correct person labels	2025-12-14 17:29:39 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
kempersc	d1c9aebd84	feat(rag): Add hybrid language detection and enhanced ontology mapping Implement Heritage RAG pipeline enhancements: 1. Ontology Mapping (new file: ontology_mapping.py) - Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default - HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy - FastText-based ML detection with 0.6 confidence threshold - Support for Dutch, French, German, Spanish, Italian, Portuguese, English - Dynamic synonym extraction from LinkML enum values - 93 comprehensive tests (all passing) 2. Schema Loader Enhancements (schema_loader.py) - Language-tagged multilingual synonym extraction for DSPy signatures - Enhanced enum value parsing with annotations support - Better error handling for malformed schema files 3. DSPy Heritage RAG (dspy_heritage_rag.py) - Fixed all 10 mypy type errors - Enhanced type annotations throughout - Improved query routing with multilingual support 4. Dependencies (pyproject.toml) - Added fast-langdetect ^1.0.0 (primary language detection) - Added types-pyyaml ^6.0.12 (mypy type stubs) Tests: 93 new tests for ontology_mapping, all passing Mypy: Clean (no type errors)	2025-12-14 15:55:18 +01:00
kempersc	505c12601a	Add test script for PiCo extraction from Arabic waqf documents - Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents. - The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results. - Added comprehensive logging for API responses, extraction results, and validation errors. - Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.	2025-12-12 17:50:17 +01:00
kempersc	b1f93b6f22	enrich person profiles	2025-12-12 12:51:10 +01:00
kempersc	1b1cfbfca0	enrich custodians	2025-12-11 22:32:09 +01:00
kempersc	d4906abae4	update postgis data	2025-12-10 23:51:51 +01:00
kempersc	be3fbac601	enrich entries and persons	2025-12-10 18:04:25 +01:00
kempersc	41959f0766	correct HCID!	2025-12-10 13:01:13 +01:00
kempersc	131e3ca259	normalise custodian entries	2025-12-09 07:56:35 +01:00
kempersc	35066eb5eb	fix(typedb): improve concept serialization for frontend compatibility - Add 'type' alias alongside '_type' for frontend compatibility - Handle both bytes and string IID formats from driver - Add 'id' alias alongside '_iid' for consistency	2025-12-08 14:58:35 +01:00
kempersc	7e3559f7e5	add new entries	2025-12-07 23:08:02 +01:00
kempersc	49141c005b	feat(api): add PostGIS boundary REST endpoints Add new endpoints for querying administrative boundaries: - GET /boundaries/countries - List all countries with boundary data - GET /boundaries/countries/{code}/admin1 - List provinces/states - GET /boundaries/countries/{code}/admin2 - List municipalities - GET /boundaries/lookup?lat=&lon= - Point-in-polygon lookup - GET /boundaries/admin2/{id}/geojson - Get single boundary GeoJSON - GET /boundaries/countries/{code}/admin2/geojson - Get all admin2 as GeoJSON - GET /boundaries/stats - Get boundary statistics Uses proper joins through country_id/admin1_id foreign keys.	2025-12-07 18:56:11 +01:00
kempersc	d9325c0bb5	feat: add web archives integration and improve enrichment scripts Backend: - Attach web_archives.duckdb as read-only database in DuckLake - Create views for web_archives, web_pages, web_claims in heritage schema Scripts: - enrich_cities_google.py: Add batch processing and retry logic - migrate_web_archives.py: Improve schema handling and error recovery Frontend: - DuckLakePanel: Add web archives query support - Database.css: Improve layout for query results display	2025-12-07 17:49:07 +01:00
kempersc	ee4e57bc75	add new entries	2025-12-07 00:26:01 +01:00
kempersc	1635625032	added web annotations	2025-12-06 19:50:04 +01:00

18 commits