glam/tests
kempersc d1c9aebd84 feat(rag): Add hybrid language detection and enhanced ontology mapping
Implement Heritage RAG pipeline enhancements:

1. Ontology Mapping (new file: ontology_mapping.py)
   - Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default
   - HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy
   - FastText-based ML detection with 0.6 confidence threshold
   - Support for Dutch, French, German, Spanish, Italian, Portuguese, English
   - Dynamic synonym extraction from LinkML enum values
   - 93 comprehensive tests (all passing)
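
   The detection cascade in item 1 can be sketched roughly as below. This is
   a minimal illustration, not the code in ontology_mapping.py: the
   vocabulary entries, the SUPPORTED set, the detect_language name, and the
   injected ml_detect callable (a stand-in for the fast-langdetect call) are
   all assumptions, as is treating the 0.6 threshold as inclusive.

```python
import re
from typing import Callable, Optional, Tuple

# Illustrative subset only; the real HERITAGE_VOCABULARY (~40 terms) is not
# shown in this commit message, so these entries are assumptions.
HERITAGE_VOCABULARY = {
    "bibliotheek": "nl",
    "erfgoed": "nl",
    "patrimoine": "fr",
    "stadtarchiv": "de",
}

# Language codes for the seven languages listed in the commit message.
SUPPORTED = {"nl", "fr", "de", "es", "it", "pt", "en"}
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical detector interface: text -> (language code, confidence score).
Detector = Callable[[str], Tuple[str, float]]


def detect_language(text: str, ml_detect: Optional[Detector] = None) -> str:
    """Return a language code using vocabulary -> ML detector -> English."""
    # Stage 1: a domain-specific term decides the language immediately.
    for token in re.findall(r"\w+", text.lower()):
        lang = HERITAGE_VOCABULARY.get(token)
        if lang is not None:
            return lang

    # Stage 2: accept the ML guess only if it is supported and confident.
    if ml_detect is not None:
        lang, score = ml_detect(text)
        if lang in SUPPORTED and score >= CONFIDENCE_THRESHOLD:
            return lang

    # Stage 3: fall back to English.
    return "en"
```

   Passing the detector in as a callable keeps the sketch runnable without
   the fast-langdetect dependency and makes each stage easy to test in
   isolation.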

2. Schema Loader Enhancements (schema_loader.py)
   - Language-tagged multilingual synonym extraction for DSPy signatures
   - Enhanced enum value parsing with annotations support
   - Better error handling for malformed schema files
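
   As a rough sketch of language-tagged synonym extraction from enum
   annotations: the function below walks an already-loaded schema dict with
   the LinkML enums / permissible_values layout. The "synonyms_<lang>"
   annotation key convention, the comma-separated value format, and the
   extract_synonyms name are hypothetical; schema_loader.py may store and
   parse these differently.

```python
from collections import defaultdict
from typing import Any, Dict, List

PREFIX = "synonyms_"  # hypothetical annotation key prefix, e.g. "synonyms_nl"


def extract_synonyms(
    schema: Dict[str, Any], enum_name: str
) -> Dict[str, Dict[str, List[str]]]:
    """Map each permissible value of an enum to {language: [synonyms]}."""
    # Tolerate missing or empty sections instead of raising on malformed input.
    enum = (schema.get("enums") or {}).get(enum_name) or {}
    values = enum.get("permissible_values") or {}

    result: Dict[str, Dict[str, List[str]]] = {}
    for value, body in values.items():
        by_lang: Dict[str, List[str]] = defaultdict(list)
        annotations = (body or {}).get("annotations") or {}
        for key, raw in annotations.items():
            if not key.startswith(PREFIX):
                continue
            lang = key[len(PREFIX):]
            # Assumed format: a comma-separated string of synonyms.
            by_lang[lang].extend(
                s.strip() for s in str(raw).split(",") if s.strip()
            )
        result[value] = dict(by_lang)
    return result
```

   A value with no annotations (or an empty body) yields an empty mapping
   rather than an error, which is one way to handle incomplete schema files.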

3. DSPy Heritage RAG (dspy_heritage_rag.py)
   - Fixed all 10 mypy type errors
   - Enhanced type annotations throughout
   - Improved query routing with multilingual support

4. Dependencies (pyproject.toml)
   - Added fast-langdetect ^1.0.0 (primary language detection)
   - Added types-pyyaml ^6.0.12 (mypy type stubs)

Tests: 93 new tests for ontology_mapping, all passing
Mypy: Clean (no type errors)
2025-12-14 15:55:18 +01:00
annotators improve annotator 2025-12-05 16:25:39 +01:00
exporters Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
extractors Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
fixtures Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
geocoding Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
identifiers Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
parsers annotation standards added 2025-12-05 15:30:23 +01:00
rag feat(rag): Add hybrid language detection and enhanced ontology mapping 2025-12-14 15:55:18 +01:00
scrapers add isil entries 2025-11-19 23:25:22 +01:00
__init__.py Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
test_bibliographic_module.py Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
test_legal_form_migration.py updated schemata 2025-11-21 22:12:33 +01:00
test_mermaid_generation.py Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats 2025-11-22 14:33:51 +01:00
test_nlp_extractor.py annotation standards added 2025-12-05 15:30:23 +01:00
test_partnership_rdf_integration.py Add comprehensive tests for NLP institution extraction and RDF partnership integration 2025-11-19 23:20:47 +01:00
test_temporal_validation.py Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams 2025-11-22 23:01:13 +01:00
test_transliteration.py feat(ghcid): add diacritics normalization and transliteration scripts 2025-12-08 14:59:28 +01:00