kempersc/glam

Author	SHA1	Message	Date
kempersc	8c42292235	Add new classes and slots to the ontology - Introduced GeospatialLocation class for specific geospatial locations. - Added HandsOnFacility class representing facilities for hands-on experiences. - Created Hyponym class for narrower terms or instances. - Added ImagingEquipment class for imaging-related equipment. - Introduced LoadingDock class for loading dock facilities. - Created LocalCollection class for locally held collections. - Added Locker class for storage lockers available to visitors/staff. - Introduced MichelinStarRating class for Michelin star ratings. - Created MicrofilmReader class for equipment used to read microfilms. - Added OperationalArchive class for archives containing operational records. - Introduced OperationalUnit class for operational units within organizations. - Added has_or_had_archive slot for associating archives with entities. - Created has_or_had_rating slot for ratings assigned to entities. - Introduced has_or_had_section slot for sections or units within organizations. - Added has_geospatial_location slot linking nominal places to precise geospatial coordinates.	2026-01-27 22:17:11 +01:00
kempersc	80eb3d969c	Add new slots for heritage custodian ontology - Introduced `has_api_version`, `has_appellation_language`, `has_appellation_type`, `has_appellation_value`, `has_applicable_country`, `has_application_deadline`, `has_application_opening_date`, `has_appraisal_note`, `has_approval_date`, `has_archdiocese_name`, `has_architectural_style`, `has_archival_reference`, `has_archive_description`, `has_archive_memento_uri`, `has_archive_name`, `has_archive_path`, `has_archive_search_score`, `has_arrangement`, `has_arrangement_level`, `has_arrangement_note`, `has_articles_archival_stage`, `has_articles_document_format`, `has_articles_document_url`, `has_articles_of_association`, `has_or_had_altitude`, `has_or_had_annotation`, `has_or_had_arrangement`, `has_or_had_document`, `has_or_had_reason`, `has_or_had_style`, `is_or_was_amended_through`, `is_or_was_approved_on`, `is_or_was_archived_as`, `is_or_was_due_on`, `is_or_was_opened_on`, and `is_or_was_used_in` slots. - Each slot includes detailed descriptions, range specifications, and appropriate mappings to existing ontologies.	2026-01-27 10:07:16 +01:00
kempersc	b4d1a7677f	feat: Migrate has_air_changes_per_hour to specifies_or_specified and create AirChanges and Ventilation classes	2026-01-27 09:03:22 +01:00
kempersc	b32efc208e	feat(schema): migrate collection_focus to has_or_had_category; archive collection_focus slot	2026-01-19 16:56:34 +01:00
kempersc	416aa407cc	Add new slots for financial and heritage documentation - Introduced total expense, total frames analyzed, total investment, total liability, total net asset, and traditional product slots to enhance financial reporting capabilities. - Added transition types detected, treatment description, type hypothesis, typical condition, typical HTTP methods, typical response formats, and typical scope slots for improved heritage documentation. - Implemented user community, verified, web observation, WhatsApp business likelihood, wikidata equivalent, and wikidata mapping slots to enrich institutional data representation. - Established has_or_had_asset, has_or_had_budget, has_or_had_expense, and is_or_was_threatened_by slots to capture asset, budget, expense relationships, and threats to heritage forms.	2026-01-15 19:35:39 +01:00
kempersc	3fb27c15e2	Refactor and archive deprecated slots; update migration records - Removed deprecated slots: storage_security_level, version_number, video_comment, visiting_hour, was_asserted_by, was_revision_of, writing_system. - Archived corresponding YAML files for deprecated slots with detailed migration notes. - Updated slot definitions for has_collection and encompassing_body to reflect new naming conventions and temporal patterns. - Enhanced metadata extraction in index_persons_qdrant.py to include WCMS registration and data sources. - Modified hybrid_retriever and multi_embedding_retriever to support filtering by WCMS registration status.	2026-01-15 13:16:59 +01:00
kempersc	e3adb4ed60	feat: Introduce Overview, RealnessStatus, and WebLink classes with comprehensive documentation and migration notes - Added Overview class to represent structured collections of web links, including detailed descriptions, examples, and ontology alignments. - Introduced RealnessStatus class to classify data as real or synthetic, with rich provenance and temporal semantics. - Created WebLink class for representing hyperlinks with associated metadata, enhancing structured link representation. - Established new slots: has_or_had_comprehensive_overview, is_or_was_real, and includes_or_included to support the new classes and improve data modeling. - Migrated existing slots to new structures, ensuring compliance with RiC-O naming conventions and enhancing specificity. - Updated annotations and examples across all new classes and slots for clarity and usability.	2026-01-14 09:32:14 +01:00
kempersc	6c19ef8661	feat(rag): add Rule 46 epistemic provenance tracking Track full lineage of RAG responses: WHERE data comes from, WHEN it was retrieved, HOW it was processed (SPARQL/vector/LLM). Backend changes: - Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution - Integrate provenance into MultiSourceRetriever.merge_results() - Return epistemic_provenance in DSPyQueryResponse Frontend changes: - Pass EpistemicProvenance through useMultiDatabaseRAG hook - Display provenance in ConversationPage (for cache transparency) Schema fixes: - Fix truncated example in has_observation.yaml slot definition References: - Pavlyshyn's Context Graphs and Data Traces paper - LinkML ProvenanceBlock schema pattern	2026-01-10 18:42:43 +01:00
kempersc	12fed83d6e	fix(rag): preserve count value for COUNT queries in non-streaming endpoint - Detect COUNT queries by checking for 'count' key in SPARQL results - Skip institution transformation for COUNT queries to preserve count value - Fixes bug where 'Hoeveel archieven in Utrecht?' returned 1 instead of 10 - COUNT queries now correctly extract integer count from SPARQL response	2026-01-09 18:57:40 +01:00
kempersc	ce66a294e5	fix(rag): transform SPARQL results to match frontend metadata format for map coordinates - Convert flat SPARQL results {lat, lon} to nested {metadata: {latitude, longitude}} - Parse string coordinates to float values - Add city/country/institution_type from template slots - Enables ChatMapPanel to render map markers correctly	2026-01-09 15:49:18 +01:00
kempersc	787f4dacb0	feat(rag): implement database routing in query endpoint Log database routing decisions and add databases_used to response metadata. When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.	2026-01-09 12:15:49 +01:00
kempersc	c88fd3af70	Refactor code structure for improved readability and maintainability	2026-01-09 11:05:26 +01:00
kempersc	6608a207d4	update frontend	2026-01-08 15:56:28 +01:00
kempersc	0b0ea75070	feat(rag): add factual query fast path - skip LLM for count/list queries - Add ontology cache warming at startup in lifespan() function - Add is_factual_query() detection in template_sparql.py (12 templates) - Add factual_result and sparql_query fields to DSPyQueryResponse - Skip LLM generation for factual templates (count, list, compare) - Execute SPARQL directly and return results as table (~15s → ~2s latency) - Update ConversationPanel.tsx to render factual results table - Add CSS styling for factual results with green theme For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL results ARE the answer - no need for expensive LLM prose generation.	2026-01-08 13:34:23 +01:00
kempersc	99dc608826	Refactor RAG to template-based SPARQL generation Major architectural changes based on Formica et al. (2023) research: - Add TemplateClassifier for deterministic SPARQL template matching - Add SlotExtractor with synonym resolution for slot values - Add TemplateInstantiator using Jinja2 for query rendering - Refactor dspy_heritage_rag.py to use template system - Update main.py with streamlined pipeline - Fix semantic_router.py ordering issues - Add comprehensive metrics tracking Template-based approach achieves 65% precision vs 10% LLM-only per Formica et al. research on SPARQL generation.	2026-01-07 22:04:43 +01:00
kempersc	11983014bb	Enhance specificity scoring system integration with existing infrastructure - Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework. - Added detailed mapping of SPARQL templates to context templates for improved specificity filtering. - Implemented wrapper patterns around existing classifiers to extend functionality without duplication. - Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality. - Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.	2026-01-05 17:37:49 +01:00
kempersc	2dca28d8c1	enrich CH entries with mission statements	2026-01-04 13:12:32 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	59963c8d3f	Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%) - JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE - CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease - CH, NL, BE, AT, BR: 100% complete - Total: 12,833 of 31,772 files (40.4%) - Using crawl4ai favicon extraction	2025-12-26 13:42:21 +01:00
kempersc	6ab0b19ae2	Logo enrichment batch: CZ+260, JP+260 - 11,663 files (36.7%) - CZ: 2,810 processed (33.3% of 8,432) - JP: 3,336 processed (27.6% of 12,096) - Total: 11,663 of 31,772 (36.7%) - Using crawl4ai favicon extraction	2025-12-25 19:23:41 +01:00
kempersc	717ee3408a	Logo enrichment batch: JP+771, CZ+380 - 10,913 files (34%) - JP: 2,846 processed (24% of 12,096) - CZ: 2,550 processed (30% of 8,432) - CH, NL, BE, AT, BR: 100% complete - Total: 10,913 of 31,772 files (34%) - Using crawl4ai favicon extraction	2025-12-25 13:44:26 +01:00
kempersc	38292d1918	enrich: logo enrichment for JP custodians (1350 processed, 10746 remaining)	2025-12-23 20:56:21 +01:00
kempersc	5e8a432ef0	enrich Japanese and Dutch custodians	2025-12-23 18:08:45 +01:00
kempersc	0c1d19e98b	enrich entries	2025-12-23 13:27:35 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	23b1d8ee5f	clean up GHCID	2025-12-17 11:58:40 +01:00
kempersc	99430c2a70	add new entries and semantic routing	2025-12-17 10:11:56 +01:00
kempersc	68c5aa2724	feat(api): Add heritage person classification and RAG retry logic - Add GLAMORCUBESFIXPHDNT heritage type detection for person profiles - Two-stage classification: blocklist non-heritage orgs, then match keywords - Special handling for Digital (D) type: requires heritage org context - Add career_history heritage_relevant and heritage_type fields - Add exponential backoff retry for Anthropic API overload errors - Fix DSPy 3.x async context with dspy.context() wrapper	2025-12-15 01:31:54 +01:00
kempersc	c6aee998db	correct person labels	2025-12-14 17:29:39 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
kempersc	505c12601a	Add test script for PiCo extraction from Arabic waqf documents - Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents. - The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results. - Added comprehensive logging for API responses, extraction results, and validation errors. - Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.	2025-12-12 17:50:17 +01:00
kempersc	b1f93b6f22	enrich person profiles	2025-12-12 12:51:10 +01:00
kempersc	1b1cfbfca0	enrich custodians	2025-12-11 22:32:09 +01:00

37 commits
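
Commits ce66a294e5 and 12fed83d6e describe two small result-handling steps: nesting flat SPARQL coordinates under a metadata key with float values, and detecting COUNT queries by a 'count' key so the integer is preserved. The sketch below illustrates both under stated assumptions; it is not the repository's code. The function names, the list-of-dicts result shape, and the choice to drop unparseable coordinates are all hypothetical, and the city/country/institution_type enrichment from template slots is left out.

```python
from typing import Any

Row = dict[str, Any]


def to_frontend_record(row: Row) -> Row:
    """Nest flat {lat, lon} values as {metadata: {latitude, longitude}}.

    Hypothetical helper: the real transformation in the repository may differ.
    """
    metadata: dict[str, float] = {}
    for source_key, target_key in (("lat", "latitude"), ("lon", "longitude")):
        raw = row.get(source_key)
        if raw is None:
            continue
        try:
            # SPARQL bindings often arrive as strings, so parse them to floats.
            metadata[target_key] = float(raw)
        except (TypeError, ValueError):
            # Assumption: skip unparseable coordinates instead of guessing.
            continue
    # Keep every non-coordinate field at the top level, unchanged.
    record: Row = {k: v for k, v in row.items() if k not in ("lat", "lon")}
    record["metadata"] = metadata
    return record


def is_count_result(rows: list[Row]) -> bool:
    """Treat a result set as a COUNT query when its first row has a 'count' key."""
    return bool(rows) and "count" in rows[0]


def extract_count(rows: list[Row]) -> int:
    """Return the integer count from the first row of a COUNT result."""
    return int(rows[0]["count"])


def transform_results(rows: list[Row]) -> Row:
    """Skip the institution transformation for COUNT queries (assumed shape)."""
    if is_count_result(rows):
        return {"count": extract_count(rows)}
    return {"institutions": [to_frontend_record(row) for row in rows]}
```

With this shape, a single-row COUNT result is returned as its integer value rather than being counted as one institution record, which is the kind of mix-up the second commit reports fixing.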