Implements a state machine to filter streaming tokens:
- Only stream tokens from the 'answer' field to the frontend
- Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields
- Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content
This fixes the issue where raw DSPy signature field markers were displayed in the chat interface instead of clean answer text; a sketch of the filter follows.
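
A minimal sketch of such a filter, assuming DSPy's `[[ ## field ## ]]` marker format; the class name and buffering details are illustrative, not the production implementation:

```python
import re

MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")
MAX_MARKER_LEN = 24  # longest marker, e.g. "[[ ## follow_up ## ]]", is 21 chars

class AnswerTokenFilter:
    """Pass through only 'answer'-field text; drop other fields and markers."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_answer = False

    def feed(self, token: str) -> str:
        self.buffer += token
        out = []
        # State transitions: consume every complete field marker in the buffer.
        while (m := MARKER.search(self.buffer)):
            if self.in_answer:
                out.append(self.buffer[: m.start()])
            self.in_answer = m.group(1) == "answer"
            self.buffer = self.buffer[m.end():]
        # Hold back a tail that could be the start of a marker split across
        # tokens; emit or drop the rest depending on the current field.
        tail = self.buffer[-MAX_MARKER_LEN:]
        cut = tail.find("[")
        safe = len(self.buffer) if cut == -1 else len(self.buffer) - len(tail) + cut
        if self.in_answer:
            out.append(self.buffer[:safe])
        self.buffer = self.buffer[safe:]
        return "".join(out)

    def flush(self) -> str:
        """Release any held-back tail at end of stream."""
        tail, self.buffer = self.buffer, ""
        return tail if self.in_answer else ""
```

Each streamed token goes through `feed()`, which withholds any text that could be the start of a marker split across tokens, so partial markers never leak to the frontend.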

Corrects the heritage ontology used in generated SPARQL:
- Use the hc: <https://w3id.org/heritage/custodian/> prefix
- Use hc:institutionType with single-letter codes (M, L, A, etc.)
- Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.)
- Update all SPARQL examples to use the correct ontology (see the query sketch after this list)
- Align with actual RDF data in Oxigraph
- Update HeritageSPARQLGenerator docstring with correct prefixes
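
As a rough illustration of the query shape these bullets describe (revised again by the next change), a minimal example; `hc:country` and the overall graph layout are assumptions, not the project's actual schema:

```python
# Hypothetical query sketch; see the caveats above.
INSTITUTIONS_IN_NL = """
PREFIX hc: <https://w3id.org/heritage/custodian/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?inst WHERE {
  ?inst hc:institutionType "M" ;   # M = museum (single-letter code)
        hc:country wd:Q55 .        # Q55 = Netherlands (Wikidata URI)
}
"""
```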

Replaces the custom ontology terms with standard CIDOC-CRM and W3C Organization Ontology terms:
- Change main class from hc:Custodian to crm:E39_Actor
- Change type property from hcp:institutionType to org:classification
- Update type values from single letters to full names (MUSEUM, ARCHIVE, etc.)
- Add rate limit handling with exponential backoff for 429 errors (see the retry sketch below)
- Fix test_live_rag.py sample queries to use correct ontology
- Update optimized_models instructions with correct prefixes
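
After this remapping, the same sketch becomes the following; whether org:classification takes a literal or a concept URI in the actual data is an assumption based on the bullets above:

```python
# Hypothetical query sketch in the revised ontology.
MUSEUMS = """
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX org: <http://www.w3.org/ns/org#>

SELECT ?actor WHERE {
  ?actor a crm:E39_Actor ;
         org:classification "MUSEUM" .   # full name replaces single-letter "M"
}
"""
```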

Adds heritage classification to person profiles:
- Add GLAM (gallery/library/archive/museum) heritage type detection for person profiles
- Two-stage classification: blocklist non-heritage orgs, then match heritage keywords (see the classifier sketch after this list)
- Special handling for Digital (D) type: requires heritage org context
- Add heritage_relevant and heritage_type fields to career_history entries
- Add exponential backoff retry for Anthropic API overload errors
- Fix DSPy 3.x async context with dspy.context() wrapper
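
A hedged sketch of the two-stage classifier; only the flow (blocklist, then keyword match, with the Digital constraint) follows the bullets above, while the blocklist terms, keyword table, and code values are illustrative assumptions:

```python
NON_HERITAGE = ("bank", "insurance", "consultancy", "software")  # stage 1 blocklist
HERITAGE_KEYWORDS = {
    "museum": "MUSEUM",
    "archive": "ARCHIVE",
    "library": "LIBRARY",
    "digital": "D",  # Digital type, valid only in a heritage-org context
}

def classify_heritage_type(org_name: str, in_heritage_org: bool = False) -> str | None:
    name = org_name.lower()
    # Stage 1: block organizations that are clearly not heritage institutions.
    if any(term in name for term in NON_HERITAGE):
        return None
    # Stage 2: match heritage keywords against the organization name.
    for term, code in HERITAGE_KEYWORDS.items():
        if term in name:
            if code == "D" and not in_heritage_org:
                continue  # Digital (D) requires heritage org context
            return code
    return None
```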
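
For the 429 and overload retries, a sketch assuming the official `anthropic` SDK, whose `APIStatusError` exposes the HTTP status code; the delay schedule and retry count are assumptions:

```python
import random
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_with_backoff(make_request, max_retries: int = 5):
    """Retry 429 (rate limited) and 529 (overloaded) responses with backoff."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except anthropic.APIStatusError as err:
            if err.status_code not in (429, 529) or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter, capped at 30 seconds.
            time.sleep(min(2 ** attempt + random.random(), 30))
```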
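
And for the async fix, a sketch of the `dspy.context()` wrapper pattern, assuming DSPy 3.x's async `acall()` entry point; the model name is an illustrative choice:

```python
import dspy

async def run_module(module: dspy.Module, **inputs):
    # In DSPy 3.x, global dspy.settings.configure() cannot be called from
    # inside async tasks; dspy.context() scopes the settings to this block.
    with dspy.context(lm=dspy.LM("anthropic/claude-sonnet-4-20250514")):
        return await module.acall(**inputs)
```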

- Implement a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for the API token, structured prompts for the GLM API, and validation of extraction results.
- Add comprehensive logging of API responses, extraction results, and validation errors.
- Include a sample Arabic waqf text for testing, following the PiCo ontology pattern (see the skeleton below).
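
A minimal skeleton of the pieces these bullets describe, with a hypothetical env variable name and required-field list:

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("test_pico_arabic_waqf")

# Hypothetical variable name; the script's actual env handling may differ.
API_TOKEN = os.environ.get("GLM_API_TOKEN")
if not API_TOKEN:
    raise SystemExit("GLM_API_TOKEN is not set")

def validate_observation(obs: dict) -> list[str]:
    """Check an extracted person observation for required PiCo-style fields."""
    # The required-field list is an assumption for illustration.
    errors = [f"missing {field}" for field in ("name", "role") if not obs.get(field)]
    for err in errors:
        log.error("validation error: %s", err)
    return errors
```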