- Rename 512 person files from XX-XX-XXX placeholders to proper GeoNames locations
- Update 2,463 profiles with enriched data
- Add 512 new person profiles (AU, international heritage professionals)
- PPID format: ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}
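The PPID assembly can be sketched as plain string composition. This is a hypothetical helper, not the actual generator: the field order follows the format above, but the name normalization (uppercase, hyphenated) is an assumption.

```python
def build_ppid(person_id, birth_loc, decade, work_loc, custodian, name):
    """Assemble a PPID from its components (sketch; normalization
    of the NAME segment is an assumption, not the real generator)."""
    slug = name.upper().replace(" ", "-")
    return f"{person_id}_{birth_loc}_{decade}_{work_loc}_{custodian}_{slug}"

# e.g. build_ppid("P0042", "NL-NH-AMS", "1960", "NL-ZH-ROT",
#                 "NL-ZH-ROT-M-MMR", "Jan de Vries")
```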
- Add 'Compare' toggle button next to slots with slot_usage overrides
- Show generic slot definition vs class-specific override in 3-column grid
- Highlight changed properties with green 'changed' badge
- Display '(inherited)' when override matches generic definition
- Display '(not defined)' when generic has no value for property
- Compare: range, description, required, multivalued, slot_uri, pattern, identifier
- Full i18n support (Dutch/English translations)
- Responsive design: stacks vertically on mobile (<640px)
- Detect COUNT queries by checking for 'count' key in SPARQL results
- Skip institution transformation for COUNT queries to preserve count value
- Fixes bug where 'Hoeveel archieven in Utrecht?' ("How many archives in Utrecht?") returned 1 instead of 10
- COUNT queries now correctly extract integer count from SPARQL response
- Fix bug where COUNT queries showed the Qdrant result count (10) instead of
  the actual SPARQL count (e.g., 204 museums in Noord-Holland)
- Use sparql_results for count extraction in factual query fast-path
- Also fix fallback COUNT/LIST handling to use sparql_results
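The COUNT fast-path above boils down to pulling the integer out of the SPARQL bindings instead of counting returned rows. A minimal sketch, assuming the result rows are flattened to plain dicts (the internal shape is an assumption):

```python
def extract_count(sparql_results):
    """Return the integer COUNT if any binding row carries a 'count'
    key, else None. Row shape (flat dicts keyed by variable name)
    is an assumption about the internal representation."""
    for row in sparql_results:
        if "count" in row:
            return int(row["count"])
    return None

# A COUNT query yields one row like {"count": "204"}; a LIST query
# yields rows without a 'count' key, so the fast-path is skipped.
```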
- Add display_name and name_romanized fields to all 7,948 person profiles
- Resolve UNKNOWN-UNKNOWN collision group (Hebrew/Arabic names now properly romanize)
- Hebrew names like אבישי דנינו now generate PPID AVISHI-DANINO instead of UNKNOWN-UNKNOWN
- Collision count reduced from 82 to 81 groups
Regenerated using generate_ppids.py with unidecode support (commit abe30cb)
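The romanization step can be illustrated with the stdlib alone. Note this NFKD sketch only handles Latin diacritics; Hebrew and Arabic scripts have no ASCII decomposition, which is exactly why generate_ppids.py needs the third-party unidecode library — without it such names collapse to UNKNOWN, as shown here:

```python
import unicodedata

def romanize_latin(name):
    """Strip diacritics via NFKD decomposition (Latin scripts only).
    Non-Latin scripts yield no ASCII and fall back to UNKNOWN; the
    real script uses unidecode to transliterate those instead."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii").strip()
    return ascii_name.upper().replace(" ", "-") or "UNKNOWN"
```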
Merge data from PENDING files (with XX-XXX placeholders) into their
corresponding enriched custodian records with proper GHCIDs.
Countries affected:
- DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.)
- ES: 1 institution (Biblioteca Nacional de España)
- FR: 1 institution (NMO)
- ID: 18 Indonesian museums and archives
- NL: 111 Dutch institutions across all provinces
- US: 1 institution (ARCA)
The PENDING files are deleted after merge; originals archived in
data/custodian/archive/pending_merged_20250109/
- Add green 'slot_usage' badge for slots with class-specific overrides
- Add ✦ markers next to properties that are overridden vs inherited
- Add green left border styling for slots with slot_usage
- Add i18n translations (nl/en) for override indicators
- Merge generic slot definitions with class-specific slot_usage properties
This helps users understand which slot properties come from the generic
slot definition vs which are overridden at the class level via slot_usage.
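The merge-and-diff logic behind the badges can be sketched as a dict overlay: slot_usage values win, and any key whose value differs from the generic definition is flagged as changed (driving the 'changed' badge vs '(inherited)' display). Field names here are illustrative:

```python
def merge_slot(generic, slot_usage):
    """Overlay class-level slot_usage on the generic slot definition;
    report which properties actually changed. Sketch of the merge
    described above, not the actual frontend code."""
    merged = {**generic, **slot_usage}
    changed = {k for k, v in slot_usage.items() if generic.get(k) != v}
    return merged, changed

generic = {"range": "string", "required": False, "description": "A name"}
usage = {"required": True, "description": "A name"}
merged, changed = merge_slot(generic, usage)
# 'required' gets the changed badge; 'description' displays '(inherited)'
```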
Identified 125 institutions from LinkedIn staff extraction that are NOT Dutch:
- FR: 45 (French museums, archives, libraries)
- ID: 14 (Indonesian institutions)
- GB: 14 (British institutions)
- DE: 13 (German museums, foundations)
- BE: 11 (Belgian museums)
- IT: 6 (Italian institutions)
- AU: 6 (Australian archives, museums)
- Plus smaller counts from IN, US, ES, CH, DK, AT, SA, NO, IL
These files have staff data from LinkedIn company pages but need
GHCID resolution (currently XX-XXX placeholders for region/city).
Dutch PENDING files remain: 1,283
Add final two chapters of the Person PID (PPID) design document:
- 08_implementation_guidelines.md: Database architecture, API design,
data ingestion pipeline, GHCID integration, security, performance,
technology stack, deployment, and monitoring specifications
- 09_governance_and_sustainability.md: Data governance policies,
quality assurance, sustainability planning, community engagement,
legal considerations, and long-term maintenance strategies
Merged LinkedIn-extracted staff sections from PENDING files into their
corresponding proper GHCID custodian files. This consolidates data from
two extraction sources:
- Existing enriched files: Google Maps, Museum Register, YouTube, etc.
- PENDING files: LinkedIn staff data extraction
Files modified:
- 28 custodian files enriched with staff data
- 35 PENDING files deleted (merged into proper locations)
- Originals archived to archive/pending_duplicates_20250109/
Key institutions enriched:
- Rijksmuseum (NL-NH-AMS-M-RM)
- Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
- Amsterdam Museum (NL-NH-AMS-M-AM)
- Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
- Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
- And 23 more museums/archives across NL
New scripts:
- scripts/merge_staff_data.py: Automated staff data merger
- scripts/categorize_pending_files.py: PENDING file analysis utility
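The core of the staff merge can be sketched as deduplicated list concatenation. Matching on a case-insensitive name key is an assumption; the real merge_staff_data.py may use a different identity criterion:

```python
def merge_staff(enriched, pending):
    """Append LinkedIn staff entries not already present in the
    enriched file (matched case-insensitively on 'name'; the
    matching key is an assumption about merge_staff_data.py)."""
    seen = {p["name"].lower() for p in enriched}
    return enriched + [p for p in pending if p["name"].lower() not in seen]
```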
Add databases: ["oxigraph"] to 5 more templates that don't benefit from vector search:
- count_institutions_by_type_location
- compare_locations
- find_by_founding
- find_custodians_by_budget_threshold
- find_institutions_by_founding_date
Total templates with Oxigraph-only routing: 10
Script to resolve NL-XX-XXX-PENDING files that have city names in the filename:
- Looks up city in GeoNames database
- Updates YAML with location data (city, region, country)
- Generates proper GHCID with UUID v5/v8
- Renames files to match new GHCID
- Archives original PENDING files for reference
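The lookup-and-derive steps above can be sketched with the stdlib uuid module. The filename convention, the GHCID shape, and the UUID namespace are all assumptions for illustration; the real script's GeoNames lookup and v5/v8 namespace are not shown here:

```python
import uuid

# Hypothetical namespace; the real script's UUID v5/v8 namespace differs.
GHCID_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "ghcid.example.org")

def resolve_pending(filename, geonames):
    """Take the city name embedded in a PENDING filename, look it up
    in a GeoNames-derived table, build the GHCID prefix, and derive
    a stable UUID v5. Filename convention is an assumption."""
    city = filename.removesuffix(".yaml").split("-")[-1]
    loc = geonames.get(city)
    if loc is None:
        return None  # skip PENDING files with no location data
    ghcid = f"NL-{loc['region']}-{loc['city_code']}"
    return ghcid, uuid.uuid5(GHCID_NS, ghcid)
```

UUID v5 is deterministic for a given namespace and name, so re-running the script on the same file yields the same identifier.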
Log database routing decisions and add databases_used to response metadata.
When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.
- Rename NL-FR-HOO-M-MT.yaml → NL-FR-TER-M-MT.yaml
- HOO (Hooghalen) → TER (Terschelling) - correct island location
- Institution is on Terschelling island, not in Drenthe
- Add --type/-t flag to specify institution type (A, G, H, I, L, M, N, O, R, S, T, U, X, ALL)
- Default still Type I (Intangible Heritage) for backward compatibility
- Skip PENDING files that have no location data
- Update help text with all supported types
- Add 'databases' field to TemplateDefinition and TemplateMatchResult
- Support values: 'oxigraph' (SPARQL/KG), 'qdrant' (vector search)
- Add helper methods use_oxigraph() and use_qdrant()
- Default to both databases for backward compatibility
- Allows templates to skip vector search for factual/geographic queries
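A minimal sketch of the routing field and its helpers, assuming a dataclass-style TemplateDefinition (field and method names follow the commit; the surrounding class internals are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class TemplateDefinition:
    """Template with database routing; defaults to both backends
    for backward compatibility, per the commit above."""
    name: str
    databases: list = field(default_factory=lambda: ["oxigraph", "qdrant"])

    def use_oxigraph(self):
        return "oxigraph" in self.databases

    def use_qdrant(self):
        return "qdrant" in self.databases

# A factual/geographic template opts out of vector search:
t = TemplateDefinition("compare_locations", databases=["oxigraph"])
# t.use_qdrant() is False, so the Qdrant query is skipped entirely.
```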
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml
Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
Audit of 188 Type I custodian files revealed:
- 62 false matches (33%) detected and corrected
- Categories: domain mismatch (39), name mismatch (8), wrong location (6),
wrong org type (5), different entity (3), different event (3)
- Documents why Google Maps fails for intangible heritage:
virtual orgs, person-based heritage, volunteer networks, event-based orgs
This validates KIEN as TIER_1_AUTHORITATIVE for Type I custodians.
Manual review of remaining Type I custodian files without official websites
identified additional false matches in these categories:
Wrong organization type:
- Bird catchers vs bird watchers association
- Heritage org vs webshop
- Regional org vs specific local entity
- Federation vs single member association
- Bell ringers org vs church building
Wrong location:
- Amsterdam org matched to Den Haag
- Haarlem org matched to Apeldoorn
- Rotterdam org matched to Amstelveen
- Dutch org matched to Suriname (!)
- Giethoorn event matched to Belt-Schutsloot
- Duindorp bonfire matched to Scheveningen
Different event/entity:
- Horse racing org vs summer festival
- Street name vs organization
- Heritage foundation vs specific local fair
Total Type I false matches fixed: 62 of 188 files (33%)
- Add ontology cache warming at startup in lifespan() function
- Add is_factual_query() detection in template_sparql.py (12 templates)
- Add factual_result and sparql_query fields to DSPyQueryResponse
- Skip LLM generation for factual templates (count, list, compare)
- Execute SPARQL directly and return results as table (~15s → ~2s latency)
- Update ConversationPanel.tsx to render factual results table
- Add CSS styling for factual results with green theme
For queries like 'hoeveel archieven zijn er in Den Haag' ("how many archives
are there in Den Haag"), the SPARQL results ARE the answer - no need for
expensive LLM prose generation.
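The fast-path branch can be sketched as follows. The template names and response keys mirror the commits above; the function shapes are assumptions:

```python
FACTUAL_TEMPLATES = {  # illustrative subset of the 12 factual templates
    "count_institutions_by_type_location",
    "compare_locations",
    "find_institutions_by_founding_date",
}

def is_factual_query(template_name):
    return template_name in FACTUAL_TEMPLATES

def answer(template_name, run_sparql, run_llm):
    """For factual templates, return SPARQL rows directly as
    factual_result (~2s path); otherwise fall through to LLM
    prose generation (~15s path). Sketch, not the real pipeline."""
    rows = run_sparql()
    if is_factual_query(template_name):
        return {"factual_result": rows, "answer": None}
    return {"factual_result": None, "answer": run_llm(rows)}
```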
Additional Type I custodian files with obvious name mismatches between
KIEN registry entries and Google Maps results. These couldn't be
auto-detected via domain mismatch because they lack official websites.
Fixes:
- Dick Timmerman (person) → carpentry business
- Ria Bos (cigar maker) → money transfer agent
- Stichting Kracom (Krampuslauf) → Happy Caps retail
- Fed. Nederlandse Vertelorganisaties → NET Foundation
- Stichting dodenherdenking Alphen → wrong memorial
- Sao Joao Rotterdam → Heemraadsplein (location not org)
- sport en spel (heritage) → equipment rental
- Eiertikken Ommen → restaurant
Also adds detection and fix scripts for Google Maps false matches.
Per Rule 40 (KIEN authoritative source), Google Maps frequently returns
false matches for intangible heritage organizations. These are virtual
networks without commercial storefronts.
Changes:
- Mark google_maps_enrichment.status as FALSE_MATCH
- Preserve original data in original_false_match for audit trail
- Add correction_timestamp and correction_agent provenance
- Special handling for NL-GE-TIE-I-M (Stichting MOZA): also fixed
YouTube false match (Mozart channel) and removed ~1750 lines of
irrelevant video data
Detection method: Domain mismatch between Google Maps website field
and official KIEN registry website.
Major architectural changes based on Formica et al. (2023) research:
- Add TemplateClassifier for deterministic SPARQL template matching
- Add SlotExtractor with synonym resolution for slot values
- Add TemplateInstantiator using Jinja2 for query rendering
- Refactor dspy_heritage_rag.py to use template system
- Update main.py with streamlined pipeline
- Fix semantic_router.py ordering issues
- Add comprehensive metrics tracking
Template-based approach achieves 65% precision vs 10% LLM-only
per Formica et al. research on SPARQL generation.
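The classifier → slot extractor → instantiator pipeline reduces to rendering a fixed query skeleton with extracted slot values. The real system uses Jinja2; this sketch substitutes the stdlib string.Template to stay self-contained, and the template text and slot names are illustrative:

```python
from string import Template  # real pipeline uses Jinja2; stdlib stand-in

TEMPLATES = {
    "count_institutions_by_type_location": Template(
        "SELECT (COUNT(?inst) AS ?count) WHERE { "
        "?inst a $type_uri ; :locatedIn $location_uri . }"
    ),
}

def instantiate(template_name, slots):
    """Render a deterministic SPARQL query from a matched template
    and extracted slot values (miniature of the template system)."""
    return TEMPLATES[template_name].substitute(slots)

q = instantiate("count_institutions_by_type_location",
                {"type_uri": ":Archive", "location_uri": ":DenHaag"})
```

Because the query skeleton is fixed, precision depends only on classification and slot extraction, not on free-form LLM generation.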
Clean up old generated RDF/OWL files that have been superseded by
the clean schema deployed to Oxigraph (commit f8421a2903).
Only the current deployed schema should be regenerated as needed.