kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	b4d1a7677f	feat: Migrate has_air_changes_per_hour to specifies_or_specified and create AirChanges and Ventilation classes	2026-01-27 09:03:22 +01:00
kempersc	b32efc208e	feat(schema): migrate collection_focus to has_or_had_category; archive collection_focus slot	2026-01-19 16:56:34 +01:00
kempersc	416aa407cc	Add new slots for financial and heritage documentation - Introduced total expense, total frames analyzed, total investment, total liability, total net asset, and traditional product slots to enhance financial reporting capabilities. - Added transition types detected, treatment description, type hypothesis, typical condition, typical HTTP methods, typical response formats, and typical scope slots for improved heritage documentation. - Implemented user community, verified, web observation, WhatsApp business likelihood, wikidata equivalent, and wikidata mapping slots to enrich institutional data representation. - Established has_or_had_asset, has_or_had_budget, has_or_had_expense, and is_or_was_threatened_by slots to capture asset, budget, expense relationships, and threats to heritage forms.	2026-01-15 19:35:39 +01:00
kempersc	3fb27c15e2	Refactor and archive deprecated slots; update migration records - Removed deprecated slots: storage_security_level, version_number, video_comment, visiting_hour, was_asserted_by, was_revision_of, writing_system. - Archived corresponding YAML files for deprecated slots with detailed migration notes. - Updated slot definitions for has_collection and encompassing_body to reflect new naming conventions and temporal patterns. - Enhanced metadata extraction in index_persons_qdrant.py to include WCMS registration and data sources. - Modified hybrid_retriever and multi_embedding_retriever to support filtering by WCMS registration status.	2026-01-15 13:16:59 +01:00
kempersc	b8914761b8	standardise slots	2026-01-14 09:51:14 +01:00
kempersc	e3adb4ed60	feat: Introduce Overview, RealnessStatus, and WebLink classes with comprehensive documentation and migration notes - Added Overview class to represent structured collections of web links, including detailed descriptions, examples, and ontology alignments. - Introduced RealnessStatus class to classify data as real or synthetic, with rich provenance and temporal semantics. - Created WebLink class for representing hyperlinks with associated metadata, enhancing structured link representation. - Established new slots: has_or_had_comprehensive_overview, is_or_was_real, and includes_or_included to support the new classes and improve data modeling. - Migrated existing slots to new structures, ensuring compliance with RiC-O naming conventions and enhancing specificity. - Updated annotations and examples across all new classes and slots for clarity and usability.	2026-01-14 09:32:14 +01:00
kempersc	b30711fcfb	update slots	2026-01-14 09:05:54 +01:00
kempersc	ac36b80476	feat(rag): add companion queries for count templates Add companion_query support to fetch full entity records alongside aggregate count queries. Enables displaying results on map/list when asking 'how many museums in Amsterdam?' Backend changes: - Add companion_query, companion_query_region, companion_query_country fields to TemplateDefinition and TemplateMatchResult - Add render_template_string() for raw companion query rendering Template changes: - Add companion queries to count_institutions_by_type_and_location for settlement, region, and country level queries - Returns institution URI, name, coordinates, city for visualization	2026-01-10 18:44:06 +01:00
kempersc	6c19ef8661	feat(rag): add Rule 46 epistemic provenance tracking Track full lineage of RAG responses: WHERE data comes from, WHEN it was retrieved, HOW it was processed (SPARQL/vector/LLM). Backend changes: - Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution - Integrate provenance into MultiSourceRetriever.merge_results() - Return epistemic_provenance in DSPyQueryResponse Frontend changes: - Pass EpistemicProvenance through useMultiDatabaseRAG hook - Display provenance in ConversationPage (for cache transparency) Schema fixes: - Fix truncated example in has_observation.yaml slot definition References: - Pavlyshyn's Context Graphs and Data Traces paper - LinkML ProvenanceBlock schema pattern	2026-01-10 18:42:43 +01:00
kempersc	9e67d0f967	enrich profiles	2026-01-09 20:35:19 +01:00
kempersc	12fed83d6e	fix(rag): preserve count value for COUNT queries in non-streaming endpoint - Detect COUNT queries by checking for 'count' key in SPARQL results - Skip institution transformation for COUNT queries to preserve count value - Fixes bug where 'Hoeveel archieven in Utrecht?' returned 1 instead of 10 - COUNT queries now correctly extract integer count from SPARQL response	2026-01-09 18:57:40 +01:00
kempersc	8a7ed757b8	fix(rag): use SPARQL results for COUNT queries in streaming fast-path - Fix bug where COUNT queries showed Qdrant result count (10) instead of actual SPARQL count (e.g., 204 museums in Noord-Holland) - Use sparql_results for count extraction in factual query fast-path - Also fix fallback COUNT/LIST handling to use sparql_results	2026-01-09 18:47:56 +01:00
kempersc	c0d31b3905	fix(rag): add fallback imports for semantic_router and temporal_intent Support both relative and absolute imports for running as module or script.	2026-01-09 18:26:40 +01:00
kempersc	ce66a294e5	fix(rag): transform SPARQL results to match frontend metadata format for map coordinates - Convert flat SPARQL results {lat, lon} to nested {metadata: {latitude, longitude}} - Parse string coordinates to float values - Add city/country/institution_type from template slots - Enables ChatMapPanel to render map markers correctly	2026-01-09 15:49:18 +01:00
kempersc	787f4dacb0	feat(rag): implement database routing in query endpoint Log database routing decisions and add databases_used to response metadata. When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.	2026-01-09 12:15:49 +01:00
kempersc	4d5641b6c5	feat(rag): add database routing configuration to templates - Add 'databases' field to TemplateDefinition and TemplateMatchResult - Support values: 'oxigraph' (SPARQL/KG), 'qdrant' (vector search) - Add helper methods use_oxigraph() and use_qdrant() - Default to both databases for backward compatibility - Allows templates to skip vector search for factual/geographic queries	2026-01-09 11:54:17 +01:00
kempersc	c88fd3af70	Refactor code structure for improved readability and maintainability	2026-01-09 11:05:26 +01:00
kempersc	6608a207d4	update frontend	2026-01-08 15:56:28 +01:00
kempersc	0b0ea75070	feat(rag): add factual query fast path - skip LLM for count/list queries - Add ontology cache warming at startup in lifespan() function - Add is_factual_query() detection in template_sparql.py (12 templates) - Add factual_result and sparql_query fields to DSPyQueryResponse - Skip LLM generation for factual templates (count, list, compare) - Execute SPARQL directly and return results as table (~15s → ~2s latency) - Update ConversationPanel.tsx to render factual results table - Add CSS styling for factual results with green theme For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL results ARE the answer - no need for expensive LLM prose generation.	2026-01-08 13:34:23 +01:00
kempersc	99dc608826	Refactor RAG to template-based SPARQL generation Major architectural changes based on Formica et al. (2023) research: - Add TemplateClassifier for deterministic SPARQL template matching - Add SlotExtractor with synonym resolution for slot values - Add TemplateInstantiator using Jinja2 for query rendering - Refactor dspy_heritage_rag.py to use template system - Update main.py with streamlined pipeline - Fix semantic_router.py ordering issues - Add comprehensive metrics tracking Template-based approach achieves 65% precision vs 10% LLM-only per Formica et al. research on SPARQL generation.	2026-01-07 22:04:43 +01:00
kempersc	98c42bf272	Fix LinkML URI conflicts and generate RDF outputs - Fix scope_note → finding_aid_scope_note in FindingAid.yaml - Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead) - Remove duplicate rico_record_set_type from class_metadata_slots.yaml - Fix range types for equals_string compatibility (uriorcurie → string) - Move class names from close_mappings to see_also in 10 RecordSetTypes files - Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context - Sync schemas to frontend/public/schemas/ Files: 1,151 changed (includes prior CustodianType migration)	2026-01-07 12:32:59 +01:00
kempersc	11983014bb	Enhance specificity scoring system integration with existing infrastructure - Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework. - Added detailed mapping of SPARQL templates to context templates for improved specificity filtering. - Implemented wrapper patterns around existing classifiers to extend functionality without duplication. - Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality. - Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.	2026-01-05 17:37:49 +01:00
kempersc	2dca28d8c1	enrich CH entries with mission statements	2026-01-04 13:12:32 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	1d8fd68e3a	backup custodian web profiles	2025-12-30 23:53:16 +01:00
kempersc	30b701a5ec	backup HC data	2025-12-30 23:41:15 +01:00
kempersc	90b402dba6	enrich AR and Czech files	2025-12-30 23:01:01 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	cdb633b0c9	enrich custodian entries with logo	2025-12-27 02:15:17 +01:00
kempersc	6af5009444	enrich entries	2025-12-26 21:41:18 +01:00
kempersc	59963c8d3f	Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%) - JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE - CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease - CH, NL, BE, AT, BR: 100% complete - Total: 12,833 of 31,772 files (40.4%) - Using crawl4ai favicon extraction	2025-12-26 13:42:21 +01:00
kempersc	fb7993e3af	fix: filter DSPy field markers from streaming output Implements a state machine to filter streaming tokens: - Only stream tokens from the 'answer' field to the frontend - Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields - Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content This fixes the issue where raw DSPy signature field markers were being displayed in the chat interface instead of clean answer text.	2025-12-26 03:11:44 +01:00
kempersc	6ab0b19ae2	Logo enrichment batch: CZ+260, JP+260 - 11,663 files (36.7%) - CZ: 2,810 processed (33.3% of 8,432) - JP: 3,336 processed (27.6% of 12,096) - Total: 11,663 of 31,772 (36.7%) - Using crawl4ai favicon extraction	2025-12-25 19:23:41 +01:00
kempersc	717ee3408a	Logo enrichment batch: JP+771, CZ+380 - 10,913 files (34%) - JP: 2,846 processed (24% of 12,096) - CZ: 2,550 processed (30% of 8,432) - CH, NL, BE, AT, BR: 100% complete - Total: 10,913 of 31,772 files (34%) - Using crawl4ai favicon extraction	2025-12-25 13:44:26 +01:00
kempersc	38292d1918	enrich: logo enrichment for JP custodians (1350 processed, 10746 remaining)	2025-12-23 20:56:21 +01:00
kempersc	5e8a432ef0	enrich Japanese and Dutch custodians	2025-12-23 18:08:45 +01:00
kempersc	0c1d19e98b	enrich entries	2025-12-23 13:27:35 +01:00
kempersc	879cddc47e	fix(rag): update HeritageSPARQLGenerator with correct ontology - Use hc: <https://w3id.org/heritage/custodian/> prefix - Use hc:institutionType with single-letter codes (M, L, A, etc.) - Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.) - Update all SPARQL examples to use correct ontology - Align with actual RDF data in Oxigraph	2025-12-22 22:32:08 +01:00
kempersc	8e97a7beca	fix(rag): correct SPARQL ontology prefixes for LinkML schema - Update HeritageSPARQLGenerator docstring with correct prefixes - Change main class from hc:Custodian to crm:E39_Actor - Change type property from hcp:institutionType to org:classification - Update type values from single letters to full names (MUSEUM, ARCHIVE, etc.) - Add rate limit handling with exponential backoff for 429 errors - Fix test_live_rag.py sample queries to use correct ontology - Update optimized_models instructions with correct prefixes	2025-12-22 21:31:08 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	23b1d8ee5f	clean up GHCID	2025-12-17 11:58:40 +01:00
kempersc	99430c2a70	add new entries and semantic routing	2025-12-17 10:11:56 +01:00
kempersc	e0dd847491	extend ontology	2025-12-16 20:27:39 +01:00
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	68c5aa2724	feat(api): Add heritage person classification and RAG retry logic - Add GLAMORCUBESFIXPHDNT heritage type detection for person profiles - Two-stage classification: blocklist non-heritage orgs, then match keywords - Special handling for Digital (D) type: requires heritage org context - Add career_history heritage_relevant and heritage_type fields - Add exponential backoff retry for Anthropic API overload errors - Fix DSPy 3.x async context with dspy.context() wrapper	2025-12-15 01:31:54 +01:00
kempersc	c6aee998db	correct person labels	2025-12-14 17:29:39 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
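
Commit ce66a294e5 in the log above describes reshaping flat SPARQL rows ({lat, lon}) into the nested {metadata: {latitude, longitude}} form the frontend map expects, with string coordinates parsed to floats and city/country/institution_type taken from template slots. The following is only a minimal sketch of that idea, not the repository's code: the function names `to_float` and `transform_sparql_row`, the `name` key, and the shape of the `slots` dictionary are all assumptions made for illustration.

```python
from typing import Any, Optional


def to_float(value: Any) -> Optional[float]:
    """Parse a coordinate that may arrive as a string; None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def transform_sparql_row(row: dict, slots: dict) -> dict:
    """Nest a flat SPARQL row under 'metadata' (hypothetical helper, assumed keys)."""
    return {
        "name": row.get("name"),
        "metadata": {
            # Flat 'lat'/'lon' strings become float 'latitude'/'longitude'.
            "latitude": to_float(row.get("lat")),
            "longitude": to_float(row.get("lon")),
            # Values taken from the matched template's slots, if present.
            "city": slots.get("city"),
            "country": slots.get("country"),
            "institution_type": slots.get("institution_type"),
        },
    }


if __name__ == "__main__":
    row = {"name": "Example Museum", "lat": "52.37", "lon": "4.89"}
    print(transform_sparql_row(row, {"city": "Amsterdam", "institution_type": "M"}))
```

Returning None for unparsable coordinates, rather than raising, is one possible design choice; it lets a caller skip a marker without discarding the rest of the record.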

64 commits
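
Commit fb7993e3af describes a state machine that streams only the 'answer' field to the frontend and removes DSPy field markers such as '[[ ## answer ## ]]'. A rough sketch of how such a filter could work is given below. It is an illustration under assumptions, not the committed implementation: the function name `stream_answer`, the regular expression, and the buffering strategy are hypothetical, and the sketch assumes a marker may be split across several tokens.

```python
import re
from typing import Iterable, Iterator

# Marker format taken from the commit message, e.g. "[[ ## answer ## ]]".
MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def stream_answer(tokens: Iterable[str]) -> Iterator[str]:
    """Yield only text belonging to the 'answer' field (hypothetical sketch)."""
    current = None  # name of the field currently being read
    buffer = ""
    for token in tokens:
        buffer += token
        # Consume every complete marker now present in the buffer.
        while True:
            match = MARKER.search(buffer)
            if match is None:
                break
            before = buffer[:match.start()]
            if current == "answer" and before:
                yield before
            current = match.group(1)
            buffer = buffer[match.end():]
        # Hold back a tail that could still turn into a marker.
        idx = buffer.rfind("[[")
        if idx == -1 and buffer.endswith("["):
            idx = len(buffer) - 1
        ready, buffer = (buffer, "") if idx == -1 else (buffer[:idx], buffer[idx:])
        if current == "answer" and ready:
            yield ready
    # End of stream: flush whatever was held back.
    if current == "answer" and buffer:
        yield buffer


if __name__ == "__main__":
    demo = ["[[ ## reason", "ing ## ]]", "thinking", "[[ ## ans", "wer ## ]]",
            "Hello ", "world", "[", "[ ## citations ## ]]", "src"]
    print("".join(stream_answer(demo)))
```

The trade-off in this sketch is that answer text containing a literal "[[" is held back until the next marker or the end of the stream, so it is delayed but not dropped.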