Implement Heritage RAG pipeline enhancements:

1. Ontology Mapping (new file: ontology_mapping.py)
   - Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default
   - HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy
   - FastText-based ML detection with 0.6 confidence threshold
   - Support for Dutch, French, German, Spanish, Italian, Portuguese, English
   - Dynamic synonym extraction from LinkML enum values
   - 93 comprehensive tests (all passing)

2. Schema Loader Enhancements (schema_loader.py)
   - Language-tagged multilingual synonym extraction for DSPy signatures
   - Enhanced enum value parsing with annotations support
   - Better error handling for malformed schema files

3. DSPy Heritage RAG (dspy_heritage_rag.py)
   - Fixed all 10 mypy type errors
   - Enhanced type annotations throughout
   - Improved query routing with multilingual support

4. Dependencies (pyproject.toml)
   - Added fast-langdetect ^1.0.0 (primary language detection)
   - Added types-pyyaml ^6.0.12 (mypy type stubs)

Tests: 93 new tests for ontology_mapping, all passing
Mypy: Clean (no type errors)

| | | |
|---|---|---|
| annotators | ||
| exporters | ||
| extractors | ||
| fixtures | ||
| geocoding | ||
| identifiers | ||
| parsers | ||
| rag | ||
| scrapers | ||
| __init__.py | ||
| test_bibliographic_module.py | ||
| test_legal_form_migration.py | ||
| test_mermaid_generation.py | ||
| test_nlp_extractor.py | ||
| test_partnership_rdf_integration.py | ||
| test_temporal_validation.py | ||
| test_transliteration.py | ||
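
The hybrid language detection named in the commit message (heritage vocabulary -> fast-langdetect -> English default) could look roughly like the sketch below. This is not the code from ontology_mapping.py: the vocabulary entries are a made-up sample, `detect_language` and the `ml_detector` parameter are hypothetical names, and the ML step is passed in as a plain callable returning `(language, score)` so the sketch does not assume any particular fast-langdetect call signature. Treating a score equal to 0.6 as accepted is also an assumption.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical sample entries; the real HERITAGE_VOCABULARY (~40 terms)
# is not shown in the source.
HERITAGE_VOCABULARY: Dict[str, str] = {
    "archief": "nl",
    "erfgoed": "nl",
    "patrimoine": "fr",
    "denkmal": "de",
    "bibliothek": "de",
}

# The seven languages listed in the commit message.
SUPPORTED_LANGUAGES = {"nl", "fr", "de", "es", "it", "pt", "en"}
CONFIDENCE_THRESHOLD = 0.6  # threshold stated in the commit message
DEFAULT_LANGUAGE = "en"

# Assumed shape of the ML step: text in, (language code, confidence) out.
MLDetector = Callable[[str], Tuple[str, float]]


def detect_language(text: str, ml_detector: Optional[MLDetector] = None) -> str:
    """Return a language code using vocabulary -> ML detector -> default."""
    # Step 1: a domain term in the text decides the language immediately.
    for token in re.findall(r"\w+", text.lower()):
        lang = HERITAGE_VOCABULARY.get(token)
        if lang is not None:
            return lang

    # Step 2: ask the ML detector, but only trust confident, supported results.
    if ml_detector is not None:
        try:
            lang, score = ml_detector(text)
        except Exception:
            return DEFAULT_LANGUAGE  # detector failure falls through to default
        if score >= CONFIDENCE_THRESHOLD and lang in SUPPORTED_LANGUAGES:
            return lang

    # Step 3: nothing matched with enough confidence.
    return DEFAULT_LANGUAGE
```

Passing the detector as an argument keeps tests for the fallback order independent of the FastText model; a thin wrapper around fast-langdetect would be supplied in production code.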