kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	aee76fcc7f	backup html content	2025-12-31 02:36:38 +01:00
kempersc	b7701c8a8e	backup person profiles	2025-12-31 00:04:09 +01:00
kempersc	7108cb1483	backup person profiles	2025-12-31 00:00:25 +01:00
kempersc	38dcd2ce9c	Restore YAML files for Museum Dokkum and Gemeente Smallingerland with enriched data and provenance tracking	2025-12-30 23:58:21 +01:00
kempersc	1d8fd68e3a	backup custodian web profiles	2025-12-30 23:53:16 +01:00
kempersc	f6a5962c3b	backup person profiles	2025-12-30 23:48:50 +01:00
kempersc	cbf88d2a6d	backup person profiles	2025-12-30 23:44:57 +01:00
kempersc	30b701a5ec	backup HC data	2025-12-30 23:41:15 +01:00
kempersc	c417d0c758	Refactor code structure for improved readability and maintainability	2025-12-30 23:38:18 +01:00
kempersc	fb0daab718	backup JP profiles	2025-12-30 23:24:30 +01:00
kempersc	b42d6bf5d2	backup CZ and JP	2025-12-30 23:19:38 +01:00
kempersc	45e873ec0a	enrich JP BE AR profiles	2025-12-30 23:07:03 +01:00
kempersc	bc6ad46bfa	enrich CZ and JP profiles	2025-12-30 23:03:03 +01:00
kempersc	90b402dba6	enrich AR and Czech files	2025-12-30 23:01:01 +01:00
kempersc	f753d7277f	Add country code extraction for location validation in Google Places API	2025-12-30 03:45:29 +01:00
kempersc	cefc847056	Remove custodian entry for Leica AG from YAML file	2025-12-30 03:44:25 +01:00
kempersc	9159ff35db	Add custodian entry for Leica AG with data contamination fixes and location corrections	2025-12-30 03:43:47 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	4cf3fe8a07	Logo enrichment batch: JP+170 (5,166/12,096 = 42.7%) - 14,503 total (45.6%)	2025-12-27 13:17:40 +01:00
kempersc	3447a9cc6c	Logo enrichment batch: JP+440 (4,996/12,096 = 41.3%) - 14,333 total (45.1%)	2025-12-27 12:20:53 +01:00
kempersc	cdb633b0c9	enrich custodian entries with logo	2025-12-27 02:15:17 +01:00
kempersc	fd91fec63f	Logo enrichment batch: JP+320, 13,603 total (42.8%) - JP: 4,516/12,096 (37.4%) ✅ NEW COMMIT - CZ: 3,820/8,432 (45.3%) - batches 7-16 running - CH, NL, BE, AT, BR: 100% complete - Total: 13,603/31,772 (42.8%) - Using crawl4ai favicon extraction	2025-12-26 23:25:40 +01:00
kempersc	2104a90f22	Logo enrichment COMPLETE: CZ 3,820 (45.3%) - CZ: 3,820/8,432 files processed (45.3%) - 9 parallel batches completed (500 files each) - NL person entities added (4 staff profiles) - scripts/discover_websites_crawl4ai.py modified - Using crawl4ai favicon extraction	2025-12-26 21:45:14 +01:00
kempersc	6af5009444	enrich entries	2025-12-26 21:41:18 +01:00
kempersc	ca219340f2	enrich entries	2025-12-26 14:30:31 +01:00
kempersc	59963c8d3f	Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%) - JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE - CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease - CH, NL, BE, AT, BR: 100% complete - Total: 12,833 of 31,772 files (40.4%) - Using crawl4ai favicon extraction	2025-12-26 13:42:21 +01:00
kempersc	fb7993e3af	fix: filter DSPy field markers from streaming output Implements a state machine to filter streaming tokens: - Only stream tokens from the 'answer' field to the frontend - Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields - Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content This fixes the issue where raw DSPy signature field markers were being displayed in the chat interface instead of clean answer text.	2025-12-26 03:11:44 +01:00
kempersc	6b9fa33767	Logo enrichment batch: CZ+500, JP+170 - 12,513 files (39.4%) - CZ: 2,820 processed (33.4% of 8,432) - JP: 4,176 processed (34.5% of 12,096) - Total: 12,513 of 31,772 (39.4%) - CZ batch completed: 500 files, 52 logos found - JP batch crashed during run (4,176 files before crash) - Using crawl4ai favicon extraction	2025-12-26 02:03:48 +01:00
kempersc	63400392ff	Fix CZ-52-PAB-L-IPVVZOVI logo: use primary_logo.png instead of favicon.ico - Primary logo (logo.png) identified via crawl4ai direct scraping - Favicon (favicon.ico) retained as secondary asset - Updated claims: primary_logo_url + favicon_url - Summary shows: has_primary_logo: true, total_claims: 2	2025-12-25 21:01:05 +01:00
kempersc	6ab0b19ae2	Logo enrichment batch: CZ+260, JP+260 - 11,663 files (36.7%) - CZ: 2,810 processed (33.3% of 8,432) - JP: 3,336 processed (27.6% of 12,096) - Total: 11,663 of 31,772 (36.7%) - Using crawl4ai favicon extraction	2025-12-25 19:23:41 +01:00
kempersc	717ee3408a	Logo enrichment batch: JP+771, CZ+380 - 10,913 files (34%) - JP: 2,846 processed (24% of 12,096) - CZ: 2,550 processed (30% of 8,432) - CH, NL, BE, AT, BR: 100% complete - Total: 10,913 of 31,772 files (34%) - Using crawl4ai favicon extraction	2025-12-25 13:44:26 +01:00
kempersc	d5f2f542ce	feat(archief-assistent): preserve SPARQL queries in semantic cache - Add sparqlQuery field to CachedResponse interface - Extract SPARQL query before cache storage (not after) - Include sparqlQuery in cache HIT message objects - Handle both snake_case (server) and camelCase field names SPARQL queries are now displayed for both fresh API responses and cached responses, improving debugging and transparency.	2025-12-24 22:26:22 +01:00
kempersc	c3387ef3f1	Logo enrichment batch: CZ +380, JP +125, AR +28 files - CZ: 2,170 processed (26% of 8,432) - JP: 2,075 processed (17% of 12,096) - AR: Started processing - Total checkpoint: 9,762 files across all countries - Using crawl4ai favicon extraction	2025-12-24 12:50:20 +01:00
kempersc	57de5e4b11	CZ logo enrichment: 1,790 files processed (21%) - Added logo_enrichment to 771 Czech custodian files - 87% logo hit rate using crawl4ai favicon extraction - Total checkpoint: 9,257 files across all countries - CZ remaining: 6,642 files	2025-12-24 02:41:26 +01:00
kempersc	ce1f80d024	enrich: logo enrichment progress (CZ: 220, JP: 1600)	2025-12-23 22:08:43 +01:00
kempersc	4f6ca92084	enrich: logo enrichment progress (JP: 1500, CZ: 40 started)	2025-12-23 21:37:10 +01:00
kempersc	8036eb5a3f	enrich: logo enrichment for JP custodians (1490 processed, 10606 remaining)	2025-12-23 21:17:45 +01:00
kempersc	38292d1918	enrich: logo enrichment for JP custodians (1350 processed, 10746 remaining)	2025-12-23 20:56:21 +01:00
kempersc	54869589d1	fix(linkml-viewer): 3D cube visualization bugs - prevent click-to-filter and parse JSON custodian_types - Remove onFaceClick prop from CustodianTypeIndicator3D in class/slot/enum detail views to prevent accidental filtering when clicking decorative cubes (Bug 3) - Add parseCustodianTypesAnnotation() helper to handle JSON-stringified arrays like '["A"]' in YAML annotations, fixing Bug 2 where all 19 letters appeared on every cube - Legend bar retains onTypeClick for intentional filtering functionality	2025-12-23 20:40:32 +01:00
kempersc	5e8a432ef0	enrich japanese and dutch custodians	2025-12-23 18:08:45 +01:00
kempersc	a1fb6344e7	enriching custodian data	2025-12-23 17:26:29 +01:00
kempersc	0c1d19e98b	enrich entries	2025-12-23 13:27:35 +01:00
kempersc	879cddc47e	fix(rag): update HeritageSPARQLGenerator with correct ontology - Use hc: <https://w3id.org/heritage/custodian/> prefix - Use hc:institutionType with single-letter codes (M, L, A, etc.) - Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.) - Update all SPARQL examples to use correct ontology - Align with actual RDF data in Oxigraph	2025-12-22 22:32:08 +01:00
kempersc	0860f6094d	fix(sparql): correct ontology in dspy_sparql.py to match actual RDF data - Use crm:E39_Actor instead of glam:HeritageCustodian - Use hc:institutionType with single-letter codes (M, L, A, etc.) - Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.) - Use skos:prefLabel for institution names - Update ONTOLOGY_CONTEXT with correct examples	2025-12-22 22:22:07 +01:00
kempersc	8e97a7beca	fix(rag): correct SPARQL ontology prefixes for LinkML schema - Update HeritageSPARQLGenerator docstring with correct prefixes - Change main class from hc:Custodian to crm:E39_Actor - Change type property from hcp:institutionType to org:classification - Update type values from single letters to full names (MUSEUM, ARCHIVE, etc.) - Add rate limit handling with exponential backoff for 429 errors - Fix test_live_rag.py sample queries to use correct ontology - Update optimized_models instructions with correct prefixes	2025-12-22 21:31:08 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	23b1d8ee5f	clean up GHCID	2025-12-17 11:58:40 +01:00

194 commits
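
Commit fb7993e3af describes a state machine that streams only the 'answer' field to the frontend and strips DSPy field markers such as '[[ ## answer ## ]]'. The repository's actual implementation is not shown in this listing, so the following is only a minimal sketch of that idea. The names AnswerStreamFilter and filter_stream are hypothetical, the 'completed' field is an assumption, and the sketch assumes markers can be split across streamed chunks.

```python
from typing import Iterable, Iterator, List, Optional

# Field names are taken from the commit message; "completed" is an assumption
# added for illustration and may not match the real signature.
FIELDS = ("reasoning", "answer", "citations", "confidence", "follow_up", "completed")
MARKERS = {f"[[ ## {name} ## ]]": name for name in FIELDS}


class AnswerStreamFilter:
    """Hypothetical filter: passes through text of one field, drops the rest."""

    def __init__(self, wanted: str = "answer") -> None:
        self.wanted = wanted
        self.current: Optional[str] = None  # field we are inside; None before any marker
        self.buffer = ""  # text held back because it may be a split marker

    def _emit(self, text: str, out: List[str]) -> None:
        # Only text belonging to the wanted field reaches the caller.
        if text and self.current == self.wanted:
            out.append(text)

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk and return the text that is safe to show."""
        self.buffer += chunk
        out: List[str] = []
        while self.buffer:
            start = self.buffer.find("[")
            if start == -1:
                # No bracket at all, so nothing here can be a marker.
                self._emit(self.buffer, out)
                self.buffer = ""
                break
            self._emit(self.buffer[:start], out)
            self.buffer = self.buffer[start:]
            marker = next((m for m in MARKERS if self.buffer.startswith(m)), None)
            if marker is not None:
                # Complete marker: switch state and drop the marker itself.
                self.current = MARKERS[marker]
                self.buffer = self.buffer[len(marker):]
            elif any(m.startswith(self.buffer) for m in MARKERS):
                # Could be the first half of a marker; wait for the next chunk.
                break
            else:
                # An ordinary "[" in the text, not a marker.
                self._emit("[", out)
                self.buffer = self.buffer[1:]
        return "".join(out)

    def finish(self) -> str:
        """Flush text that was held back when the stream ended."""
        out: List[str] = []
        self._emit(self.buffer, out)
        self.buffer = ""
        return "".join(out)


def filter_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield only the wanted field's text from an iterable of streamed chunks."""
    stream_filter = AnswerStreamFilter()
    for chunk in chunks:
        text = stream_filter.feed(chunk)
        if text:
            yield text
    tail = stream_filter.finish()
    if tail:
        yield tail
```

Usage would look like "".join(filter_stream(chunks)).strip(), where chunks is whatever iterable of token strings the language model client yields. Matching against a fixed set of marker strings keeps the split-marker check to a simple prefix test, at the cost of having to list every field name up front.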