kempersc
d1c9aebd84
feat(rag): Add hybrid language detection and enhanced ontology mapping

Implement Heritage RAG pipeline enhancements:
1. Ontology Mapping (new file: ontology_mapping.py)
- Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default
- HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy
- FastText-based ML detection with 0.6 confidence threshold
- Support for Dutch, French, German, Spanish, Italian, Portuguese, English
- Dynamic synonym extraction from LinkML enum values
- 93 comprehensive tests (all passing)
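The three-stage detection order above (heritage vocabulary, then fast-langdetect, then English) can be sketched as follows. This is a minimal illustration, not the code in ontology_mapping.py. The vocabulary entries are hypothetical samples, since the commit only says the real dict has ~40 terms. The fast-langdetect call is also swapped out for an injected `ml_detect` callable that returns a `(language, confidence)` pair. That is an assumption made so the sketch runs without downloading a FastText model.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Hypothetical sample entries; the real HERITAGE_VOCABULARY (~40 terms) is not shown.
HERITAGE_VOCABULARY: dict[str, str] = {
    "archief": "nl",
    "archiv": "de",
    "musée": "fr",
    "archivo": "es",
    "archivio": "it",
    "arquivo": "pt",
}
SUPPORTED_LANGUAGES = {"nl", "fr", "de", "es", "it", "pt", "en"}
CONFIDENCE_THRESHOLD = 0.6

# Assumed detector shape: text -> (language code, confidence in [0, 1]).
MLDetector = Callable[[str], tuple[str, float]]


def detect_language(text: str, ml_detect: Optional[MLDetector] = None) -> str:
    # Stage 1: exact, lower-cased word lookup in the domain vocabulary.
    for token in re.findall(r"\w+", text.lower()):
        lang = HERITAGE_VOCABULARY.get(token)
        if lang is not None:
            return lang
    # Stage 2: ML guess, accepted only for a supported language at or above the threshold.
    if ml_detect is not None:
        lang, score = ml_detect(text)
        if lang in SUPPORTED_LANGUAGES and score >= CONFIDENCE_THRESHOLD:
            return lang
    # Stage 3: English default.
    return "en"


print(detect_language("Wo ist das Archiv?"))  # de
print(detect_language("opening hours"))       # en
```

In practice `ml_detect` would be a thin adapter around fast-langdetect. It is omitted here because its exact shape depends on the fast-langdetect release in use. Checking the vocabulary first lets a known heritage term override the ML guess. That is likely useful for short queries, where statistical detectors tend to be least reliable.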
2. Schema Loader Enhancements (schema_loader.py)
- Language-tagged multilingual synonym extraction for DSPy signatures
- Enhanced enum value parsing with annotations support
- Better error handling for malformed schema files
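One way to picture language-tagged synonym extraction from LinkML enums is the sketch below. It works on an already-parsed schema dict, such as the result of `yaml.safe_load` on a schema file. The annotation layout is an assumption. Each permissible value is taken to carry a hypothetical `synonyms` annotation that maps language codes to terms, written either in LinkML's compact `key: value` form or in the expanded `tag`/`value` form. The real schema layout is not shown in the commit. Missing or malformed entries are skipped rather than raising an error. The value name itself is added as an English term, which is also an assumption.

```python
from __future__ import annotations

from typing import Any

# Hypothetical annotation key; the real schema's layout is not shown in the commit.
SYNONYM_ANNOTATION = "synonyms"


def _annotation_value(raw: Any) -> Any:
    # Annotations may be compact (`key: value`) or expanded
    # (`key: {tag: key, value: ...}`); unwrap the expanded form.
    if isinstance(raw, dict) and "value" in raw:
        return raw["value"]
    return raw


def extract_enum_synonyms(
    schema: dict[str, Any], enum_name: str
) -> dict[str, dict[str, list[str]]]:
    """Return {permissible_value: {language_code: [synonyms, ...]}} for one enum."""
    enums = schema.get("enums") or {}
    enum_def = enums.get(enum_name) if isinstance(enums, dict) else None
    if not isinstance(enum_def, dict):
        raise KeyError(f"enum {enum_name!r} not found in schema")
    values = enum_def.get("permissible_values") or {}
    if not isinstance(values, dict):
        values = {}
    result: dict[str, dict[str, list[str]]] = {}
    for value_name, value_def in values.items():
        by_lang: dict[str, list[str]] = {}
        # A permissible value may have no body (None) or malformed annotations; skip those.
        raw_annotations = value_def.get("annotations") if isinstance(value_def, dict) else None
        annotations = raw_annotations if isinstance(raw_annotations, dict) else {}
        synonyms = _annotation_value(annotations.get(SYNONYM_ANNOTATION))
        if isinstance(synonyms, dict):
            for lang, terms in synonyms.items():
                if isinstance(terms, str):
                    terms = [terms]
                if isinstance(terms, list):
                    by_lang[str(lang)] = [str(t) for t in terms]
        # Assumption: the enum value name doubles as an English term (MUSEUM -> "museum").
        english = by_lang.setdefault("en", [])
        label = str(value_name).lower().replace("_", " ")
        if label not in english:
            english.insert(0, label)
        result[str(value_name)] = by_lang
    return result
```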
3. DSPy Heritage RAG (dspy_heritage_rag.py)
- Fixed all 10 mypy type errors
- Enhanced type annotations throughout
- Improved query routing with multilingual support
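Multilingual query routing needs some way to map terms in any supported language back to a canonical enum value. Below is a minimal sketch. It assumes a synonym table of the shape `{canonical_value: {language: [terms]}}`. The sample data and both function names are hypothetical. The sketch builds a reverse index and returns canonical values in the order they appear in the query.

```python
from __future__ import annotations

import re


def build_reverse_index(synonyms: dict[str, dict[str, list[str]]]) -> dict[str, str]:
    """Map every case-folded synonym, in any language, to its canonical value."""
    index: dict[str, str] = {}
    for canonical, by_lang in synonyms.items():
        for terms in by_lang.values():
            for term in terms:
                # First canonical value wins if two values share a synonym.
                index.setdefault(term.casefold(), canonical)
    return index


def map_query_terms(query: str, index: dict[str, str]) -> list[str]:
    """Return canonical values whose synonyms occur as whole words, in query order."""
    text = query.casefold()
    hits: list[tuple[int, str]] = []
    for term, canonical in index.items():
        match = re.search(rf"\b{re.escape(term)}\b", text)
        if match:
            hits.append((match.start(), canonical))
    ordered: list[str] = []
    for _, canonical in sorted(hits):
        if canonical not in ordered:
            ordered.append(canonical)
    return ordered


# Hypothetical sample data.
SYNONYMS = {
    "MUSEUM": {"en": ["museum"], "nl": ["museum"], "de": ["Museum"]},
    "ARCHIVE": {"en": ["archive", "record office"], "nl": ["archief"], "fr": ["archives"]},
    "LIBRARY": {"en": ["library"], "fr": ["bibliothèque"]},
}
INDEX = build_reverse_index(SYNONYMS)
print(map_query_terms("Welk archief ligt naast het museum?", INDEX))  # ['ARCHIVE', 'MUSEUM']
```

Whole-word regex matching lets multi-word synonyms such as "record office" match. The cost is a scan of every index entry per query. A router could then dispatch on the returned canonical types. The commit does not describe how dspy_heritage_rag.py actually routes queries.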
4. Dependencies (pyproject.toml)
- Added fast-langdetect ^1.0.0 (primary language detection)
- Added types-pyyaml ^6.0.12 (mypy type stubs)
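Assume the constraints in pyproject.toml use Poetry's caret syntax. Then `^1.0.0` allows any fast-langdetect 1.x release, and `^6.0.12` allows types-pyyaml from 6.0.12 up to, but not including, 7.0.0. The helper below is an illustrative expansion of Poetry's caret rule. It bumps the leftmost non-zero component that was written, or the last written component if all are zero. It ignores pre-release suffixes.

```python
from __future__ import annotations


def caret_range(constraint: str) -> tuple[str, str]:
    """Expand a Poetry-style caret constraint into (lower bound, exclusive upper bound)."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    # Pre-release or local suffixes (e.g. "1.0.0b1") are not handled: int() rejects them.
    parts = [int(p) for p in constraint[1:].split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 version components: {constraint!r}")
    lower = parts + [0] * (3 - len(parts))
    # Bump the leftmost non-zero component that was written;
    # if every written component is zero, bump the last written one.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = lower[:bump] + [lower[bump] + 1] + [0] * (2 - bump)
    return (
        ">=" + ".".join(map(str, lower)),
        "<" + ".".join(map(str, upper)),
    )


print(caret_range("^1.0.0"))   # ('>=1.0.0', '<2.0.0')
print(caret_range("^6.0.12"))  # ('>=6.0.12', '<7.0.0')
```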
Tests: 93 new tests for ontology_mapping, all passing
Mypy: Clean (no type errors)
2025-12-14 15:55:18 +01:00