kempersc
47e8226595
feat(tests): Complete DSPy GitOps testing framework
...
- Layer 1: 35 unit tests (no LLM required)
- Layer 2: 56 DSPy module tests with LLM
- Layer 3: 10 integration tests with Oxigraph
- Layer 4: Comprehensive evaluation suite
Fixed:
- Coordinate queries to use schema:location -> blank node pattern
- Golden query expected intent for location questions
- Health check test filtering in Layer 4
Added GitHub Actions workflow for CI/CD evaluation
2026-01-11 20:04:33 +01:00
kempersc
d1c9aebd84
feat(rag): Add hybrid language detection and enhanced ontology mapping
...
Implement Heritage RAG pipeline enhancements:
1. Ontology Mapping (new file: ontology_mapping.py)
- Hybrid language detection: heritage vocabulary -> fast-langdetect -> English default
- HERITAGE_VOCABULARY dict (~40 terms) for domain-specific accuracy
- FastText-based ML detection with 0.6 confidence threshold
- Support for Dutch, French, German, Spanish, Italian, Portuguese, English
- Dynamic synonym extraction from LinkML enum values
- 93 comprehensive tests (all passing)
2. Schema Loader Enhancements (schema_loader.py)
- Language-tagged multilingual synonym extraction for DSPy signatures
- Enhanced enum value parsing with annotations support
- Better error handling for malformed schema files
3. DSPy Heritage RAG (dspy_heritage_rag.py)
- Fixed all 10 mypy type errors
- Enhanced type annotations throughout
- Improved query routing with multilingual support
4. Dependencies (pyproject.toml)
- Added fast-langdetect ^1.0.0 (primary language detection)
- Added types-pyyaml ^6.0.12 (mypy type stubs)
Tests: 93 new tests for ontology_mapping, all passing
Mypy: Clean (no type errors)
2025-12-14 15:55:18 +01:00
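A minimal sketch of the hybrid detection order described in d1c9aebd84: heritage vocabulary lookup first, then an ML detector accepted only at confidence >= 0.6 and only for the supported languages, then an English default. This is an illustration, not the contents of ontology_mapping.py. The ML step is a pluggable callable returning (language, score) instead of a direct fast-langdetect call, because that library's return format is not shown here. The names detect_language and ml_detector and the sample vocabulary entries are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical sample entries; the real HERITAGE_VOCABULARY is not reproduced here.
HERITAGE_VOCABULARY: dict[str, str] = {
    "archief": "nl",
    "bibliothèque": "fr",
    "archiv": "de",
}

# Languages listed in the commit message.
SUPPORTED_LANGUAGES = {"nl", "fr", "de", "es", "it", "pt", "en"}
CONFIDENCE_THRESHOLD = 0.6
DEFAULT_LANGUAGE = "en"

# Assumed detector shape: returns (language_code, confidence) or None.
Detector = Callable[[str], Optional[tuple[str, float]]]


def detect_language(text: str, ml_detector: Optional[Detector] = None) -> str:
    """Hypothetical hybrid detector: vocabulary -> ML detector -> English."""
    # Step 1: a domain-specific term decides immediately.
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        if word in HERITAGE_VOCABULARY:
            return HERITAGE_VOCABULARY[word]

    # Step 2: trust the ML detector only if it is confident and the
    # language is one the pipeline supports.
    if ml_detector is not None:
        result = ml_detector(text)
        if result is not None:
            lang, score = result
            if score >= CONFIDENCE_THRESHOLD and lang in SUPPORTED_LANGUAGES:
                return lang

    # Step 3: fall back to English.
    return DEFAULT_LANGUAGE
```

Checking the vocabulary first means short queries such as "archief Utrecht" resolve without relying on a statistical model that is less reliable on a few words.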
kempersc
505c12601a
Add test script for PiCo extraction from Arabic waqf documents
...
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for the API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
2025-12-12 17:50:17 +01:00
kempersc
41959f0766
correct HCID!
2025-12-10 13:01:13 +01:00
kempersc
1635625032
added web annotations
2025-12-06 19:50:04 +01:00
kempersc
3a242370fc
annotation standards added
2025-12-05 15:30:23 +01:00
kempersc
3c80de87e0
add ISIL entries
2025-11-19 23:25:22 +01:00