glam/backend
kempersc d1c9aebd84 feat(rag): Add hybrid language detection and enhanced ontology mapping
Implement Heritage RAG pipeline enhancements:

1. Ontology Mapping (new file: ontology_mapping.py)
   - Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default
   - HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy
   - FastText-based ML detection with 0.6 confidence threshold
   - Support for Dutch, French, German, Spanish, Italian, Portuguese, English
   - Dynamic synonym extraction from LinkML enum values
   - 93 comprehensive tests (all passing)

2. Schema Loader Enhancements (schema_loader.py)
   - Language-tagged multilingual synonym extraction for DSPy signatures
   - Enhanced enum value parsing with annotations support
   - Better error handling for malformed schema files

3. DSPy Heritage RAG (dspy_heritage_rag.py)
   - Fixed all 10 mypy type errors
   - Enhanced type annotations throughout
   - Improved query routing with multilingual support

4. Dependencies (pyproject.toml)
   - Added fast-langdetect ^1.0.0 (primary language detection)
   - Added types-pyyaml ^6.0.12 (mypy type stubs)
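
The detection cascade from item 1 (heritage vocabulary -> ML detector -> English default) could be sketched roughly as follows. This is a minimal illustration and not the code in ontology_mapping.py: the vocabulary entries, the `detect_language` name, the `>=` comparison against the 0.6 threshold, and the injectable `ml_detect` callable (standing in for a wrapper around fast-langdetect) are all assumptions.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical excerpt: the real HERITAGE_VOCABULARY (~40 terms) is not shown in the commit.
HERITAGE_VOCABULARY = {
    "erfgoed": "nl",
    "patrimoine": "fr",
    "denkmal": "de",
    "archivo": "es",
    "archivio": "it",
}

# Languages listed in the commit message: Dutch, French, German, Spanish,
# Italian, Portuguese, English.
SUPPORTED_LANGUAGES = {"nl", "fr", "de", "es", "it", "pt", "en"}
CONFIDENCE_THRESHOLD = 0.6  # threshold stated in the commit message
DEFAULT_LANGUAGE = "en"

# Assumed shape of the ML stage: text in, (language code, confidence) out.
MLDetector = Callable[[str], Tuple[str, float]]


def detect_language(text: str, ml_detect: Optional[MLDetector] = None) -> str:
    """Return a language code using vocabulary -> ML detector -> default."""
    # Stage 1: a domain-specific term decides the language immediately.
    for token in re.findall(r"\w+", text.lower()):
        lang = HERITAGE_VOCABULARY.get(token)
        if lang is not None:
            return lang

    # Stage 2: accept the ML result only if it is supported and confident.
    if ml_detect is not None:
        try:
            lang, score = ml_detect(text)
        except Exception:
            # Treat a failing detector as "no answer" and fall through.
            lang, score = DEFAULT_LANGUAGE, 0.0
        if lang in SUPPORTED_LANGUAGES and score >= CONFIDENCE_THRESHOLD:
            return lang

    # Stage 3: English default.
    return DEFAULT_LANGUAGE
```

Passing the detector in as a callable keeps the sketch independent of any particular fast-langdetect version and lets tests replace the model with a stub.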

Tests: 93 new tests for ontology_mapping, all passing
Mypy: Clean (no type errors)
2025-12-14 15:55:18 +01:00
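
The language-tagged synonym extraction from LinkML enum values mentioned in items 1 and 2 might look roughly like the sketch below. It works on an already-parsed schema dictionary and is not the code in schema_loader.py. The `synonyms_<lang>` annotation key convention, the comma-separated value format, the `extract_synonyms` name, and the enum name in the usage example are all hypothetical. Only the `enums` / `permissible_values` / `annotations` nesting follows the LinkML schema layout.

```python
from collections import defaultdict
from typing import Any, Dict, List


def extract_synonyms(schema: Dict[str, Any], enum_name: str) -> Dict[str, Dict[str, List[str]]]:
    """Return {enum_value: {language_code: [synonyms]}} for one enum."""
    enum = (schema.get("enums") or {}).get(enum_name) or {}
    values = enum.get("permissible_values") or {}
    result: Dict[str, Dict[str, List[str]]] = {}
    for value, meta in values.items():
        by_lang: Dict[str, List[str]] = defaultdict(list)
        # A permissible value with no body parses as None; tolerate that.
        annotations = (meta or {}).get("annotations") or {}
        for key, raw in annotations.items():
            # Assumed convention: annotation keys such as "synonyms_nl".
            if not key.startswith("synonyms_"):
                continue
            lang = key[len("synonyms_"):]
            for term in str(raw).split(","):
                term = term.strip().lower()
                if term and term not in by_lang[lang]:
                    by_lang[lang].append(term)
        result[value] = dict(by_lang)
    return result


# Usage with a hypothetical enum, as yaml.safe_load might return it.
example_schema = {
    "enums": {
        "CustodianTypeEnum": {
            "permissible_values": {
                "MUSEUM": {
                    "annotations": {
                        "synonyms_nl": "museum, musea",
                        "synonyms_fr": "musée",
                    }
                },
                "ARCHIVE": None,
            }
        }
    }
}
synonyms = extract_synonyms(example_schema, "CustodianTypeEnum")
```

Reading synonyms from the schema at load time, rather than hard-coding them, keeps the vocabulary in step with the enum definitions.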
Name      Last commit                                                             Date
ducklake  feat: add web archives integration and improve enrichment scripts       2025-12-07 17:49:07 +01:00
postgres  Add test script for PiCo extraction from Arabic waqf documents          2025-12-12 17:50:17 +01:00
rag       feat(rag): Add hybrid language detection and enhanced ontology mapping  2025-12-14 15:55:18 +01:00
typedb    enrich custodians                                                       2025-12-11 22:32:09 +01:00
valkey    update postgis data                                                     2025-12-10 23:51:51 +01:00