kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	99dc608826	Refactor RAG to template-based SPARQL generation Major architectural changes based on Formica et al. (2023) research: - Add TemplateClassifier for deterministic SPARQL template matching - Add SlotExtractor with synonym resolution for slot values - Add TemplateInstantiator using Jinja2 for query rendering - Refactor dspy_heritage_rag.py to use template system - Update main.py with streamlined pipeline - Fix semantic_router.py ordering issues - Add comprehensive metrics tracking Template-based approach achieves 65% precision vs 10% LLM-only per Formica et al. research on SPARQL generation.	2026-01-07 22:04:43 +01:00
kempersc	9b769f1ca2	Update manifest timestamp and minor class fixes	2026-01-07 22:04:29 +01:00
kempersc	181991940f	Add new LinkML schema modules for specificity and Wikidata alignment New classes: - SpecificityAnnotation: Track class relevance scores per template - TemplateSpecificityScores: 10 conversation template scores - WikidataAlignment: Link classes to Wikidata entities - DualClassLink: Model dual-aspect class relationships New enums: - WikidataMappingTypeEnum: exact/close/narrow/broad/related mappings - DualClassPatternEnum: place-custodian, collection-custodian patterns New slots (44 files): - Specificity slots: score, rationale, agent, timestamp - Template scores: archive_search, museum_search, library_search, etc. - Wikidata slots: entity_id, label, mapping_type, rationale - Multilingual labels: label_nl, label_de, label_fr, label_es, etc. - Custodian type annotations: custodian_types, rationale, primary - SKOS hierarchy: skos_broader, skos_narrower, skos_related	2026-01-07 22:03:58 +01:00
kempersc	7f792d0250	Remove outdated RDF schema files Clean up old generated RDF/OWL files that have been superseded by the clean schema deployed to Oxigraph (commit `f8421a2903`). Only the current deployed schema should be regenerated as needed.	2026-01-07 22:03:26 +01:00
kempersc	81da4ede50	Add comprehensive slot visualization to LinkML viewer - Add standalone Slots section in visual view alongside Classes and Enums - Display slot_uri, range, identifier badge, description, pattern - Show examples with value/description pairs - Color-coded SKOS mapping tags (exact/close/narrow/broad/related) - Yellow highlighted comments section - Custodian type filtering works with slots - Shared renderSlotDetails() function for consistency	2026-01-07 22:03:08 +01:00
kempersc	f8421a2903	Deploy clean RDF schema to Oxigraph - 288,857 triples with 0 malformed URIs - Archive 48 old timestamped RDF files - Fix relative IRIs by adding hc: namespace prefix - Fix file path references in seeAlso predicates - Deployed to sparql.glam-ontology.org triplestore - 11,250 distinct OWL classes - Schema includes all base ontologies (CIDOC-CRM, RiC-O, CPOV, etc.)	2026-01-07 16:53:01 +01:00
kempersc	d19822f958	Remove redundant sections from class descriptions - Created cleanup_class_descriptions_v2.py script using text-based regex - Removed 134 class files' redundant sections: - dual_class_pattern: 80 occurrences - ontological_alignment: 35 occurrences - ontology_alignment_upper: 33 occurrences - multilingual_labels: 26 occurrences - glamorcubes_category: 6 occurrences - example_structure: 6 occurrences - Fixed ArchiveOrganizationType.yaml parse error after cleanup - Added 49 new slot definition files - All 395 class files validate as correct YAML - Deployed to bronhouder.nl/linkml	2026-01-07 13:50:14 +01:00
kempersc	dfa667c90f	Fix LinkML schema for valid RDF generation with proper slot_uri Summary: - Create 46 missing slot definition files with proper slot_uri values - Add slot imports to main schema (01_custodian_name_modular.yaml) - Fix YAML examples sections in 116+ class and slot files - Fix PersonObservation.yaml examples section (nested objects → string literals) Technical changes: - All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS) - Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF - gen-owl now produces valid Turtle with 153,166 triples New slot files (46): - RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc. - Scope slots: scope_includes, scope_excludes, archive_scope - Organization slots: organization_type, governance_authority, area_served - Platform slots: platform_type_category, portal_type_category - Social media slots: social_media_platform_category, post_type_* - Type hierarchy slots: broader_type, narrower_types, custodian_type_broader - Wikidata slots: wikidata_equivalent, wikidata_mapping Generated output: - schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB) - Validated with rdflib: 153,166 triples, no malformed URIs	2026-01-07 13:48:03 +01:00
kempersc	98c42bf272	Fix LinkML URI conflicts and generate RDF outputs - Fix scope_note → finding_aid_scope_note in FindingAid.yaml - Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead) - Remove duplicate rico_record_set_type from class_metadata_slots.yaml - Fix range types for equals_string compatibility (uriorcurie → string) - Move class names from close_mappings to see_also in 10 RecordSetTypes files - Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context - Sync schemas to frontend/public/schemas/ Files: 1,151 changed (includes prior CustodianType migration)	2026-01-07 12:32:59 +01:00
kempersc	6c6810fa43	Replace CustodianTypeCodeEnum with CustodianType class references - Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml - Update custodian_types slot to use uriorcurie range (references CustodianType subclasses) - Update custodian_types_primary slot similarly - Add migration note for legacy string format ['A'] vs new URI format Per Rule 9: Enum-to-Class Promotion - Single Source of Truth	2026-01-06 12:37:40 +01:00
kempersc	b34992b1d3	Migrate all 293 class files to ontology-aligned slots Extends migration to all class types (museums, libraries, galleries, etc.) New slots added to class_metadata_slots.yaml: - RiC-O: rico_record_set_type, rico_organizational_principle, rico_has_or_had_holder, rico_note - Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt - Scope: scope_includes, scope_excludes, custodian_only, organizational_level, geographic_restriction - Notes: privacy_note, preservation_note, legal_note Migration script now handles 30+ annotation types. All migrated schemas pass linkml-validate. Total: 387 class files now use proper slots instead of annotations.	2026-01-06 12:24:54 +01:00
kempersc	aa763dab25	Migrate 94 archive class annotations to ontology-aligned slots - Add migration script: scripts/migrate_annotations_to_slots.py - Convert custodian_types, wikidata, skos_broader, specificity_* annotations - Replace with proper slots mapped to SKOS, PROV-O, RiC-O predicates - Add ../slots/class_metadata_slots import to all migrated files - Remove AcademicArchive_refactored.yaml (main file now migrated) - Sync changes to frontend/public/schemas/ Migration converts: - custodian_types → hc:custodianTypes slot - wikidata/wikidata_label → wikidata_alignment structured slot - skos_broader → skos:broader slot - specificity_* → specificity_annotation structured slot - dual_class_pattern → dual_class_link structured slot - template_specificity → template_specificity slot All 94 migrated schemas pass linkml-validate.	2026-01-06 11:25:37 +01:00
kempersc	f37f5208ca	Copy class metadata slots to frontend public folder for deployment	2026-01-06 11:17:12 +01:00
kempersc	bc562bd68d	Add class metadata slots to replace annotations with ontology-aligned predicates - Add class_metadata_slots.yaml with slots for: - GLAMORCUBESFIXPHDNT custodian type classification (hc:custodianTypes) - Wikidata alignment (wdt:P31, skos:mappingRelation) - SKOS hierarchical relationships (skos:broader, skos:narrower) - Dual-class pattern linking (rdfs:seeAlso) - Specificity scoring for RAG (prov:generatedAtTime, prov:wasAttributedTo) - Collection holdings (rico:isOrWasHolderOf) - Add AcademicArchive_refactored.yaml demonstrating slot-based approach - Add migration guide documenting annotation-to-slot mappings Ontology sources: SKOS, PROV-O, Dublin Core, RiC-O, Wikidata	2026-01-06 11:16:49 +01:00
kempersc	11983014bb	Enhance specificity scoring system integration with existing infrastructure - Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework. - Added detailed mapping of SPARQL templates to context templates for improved specificity filtering. - Implemented wrapper patterns around existing classifiers to extend functionality without duplication. - Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality. - Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.	2026-01-05 17:37:49 +01:00
kempersc	41d8905661	Fix Turtle parser multi-line string handling for PiCo ontology - Fixed bug where closing triple-quotes (""") would incorrectly re-trigger multi-line string detection, causing subsequent class definitions to be skipped - Added lineToProcess variable to track which portion of line to process after closing a multi-line string, preventing re-detection of opening quotes - Moved UML large diagram confirmation logic from OntologyViewerPage to UMLVisualization component for better encapsulation - PiCo ontology now correctly shows all 8 classes instead of 2 Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo	2026-01-05 11:25:43 +01:00
kempersc	242bc8bb35	Add new slots for heritage custodian entities - Created deliverables_slot for expected or achieved deliverable outputs. - Introduced event_id_slot for persistent unique event identifiers. - Added follow_up_date_slot for scheduled follow-up action dates. - Implemented object_ref_slot for references to heritage objects. - Established price_slot for price information across entities. - Added price_currency_slot for currency codes in price information. - Created protocol_slot for API protocol specifications. - Introduced provenance_text_slot for full provenance entry text. - Added record_type_slot for classification of record types. - Implemented response_formats_slot for supported API response formats. - Established status_slot for current status of entities or activities. - Added FactualCountDisplay component for displaying count query results. - Introduced ReplyTypeIndicator component for visualizing reply types. - Created approval_date_slot for formal approval dates. - Added authentication_required_slot for API authentication status. - Implemented capacity_items_slot for maximum storage capacity. - Established conservation_lab_slot for conservation laboratory information. - Added cost_usd_slot for API operation costs in USD.	2026-01-05 00:49:05 +01:00
kempersc	89001fbc53	compact header controls on OntologyViewer and QueryBuilder pages	2026-01-04 17:29:34 +01:00
kempersc	eb61f45de2	compact UML controls toolbar to fit single line when sidebar collapsed	2026-01-04 17:21:53 +01:00
kempersc	2dca28d8c1	enrich CH entries with mission statements	2026-01-04 13:12:32 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	aee76fcc7f	backup html content	2025-12-31 02:36:38 +01:00
kempersc	b7701c8a8e	backup person profiles	2025-12-31 00:04:09 +01:00
kempersc	7108cb1483	backup person profiles	2025-12-31 00:00:25 +01:00
kempersc	38dcd2ce9c	Restore YAML files for Museum Dokkum and Gemeente Smallingerland with enriched data and provenance tracking	2025-12-30 23:58:21 +01:00
kempersc	1d8fd68e3a	backup custodian web profiles	2025-12-30 23:53:16 +01:00
kempersc	f6a5962c3b	backup person profiles	2025-12-30 23:48:50 +01:00
kempersc	cbf88d2a6d	backup person profiles	2025-12-30 23:44:57 +01:00
kempersc	30b701a5ec	backup HC data	2025-12-30 23:41:15 +01:00
kempersc	c417d0c758	Refactor code structure for improved readability and maintainability	2025-12-30 23:38:18 +01:00
kempersc	fb0daab718	backup JP profiles	2025-12-30 23:24:30 +01:00
kempersc	b42d6bf5d2	backup CZ and JP	2025-12-30 23:19:38 +01:00
kempersc	45e873ec0a	enrich JP BE AR profiles	2025-12-30 23:07:03 +01:00
kempersc	bc6ad46bfa	enrich CZ and JP profiles	2025-12-30 23:03:03 +01:00
kempersc	90b402dba6	enrich AR and Czech files	2025-12-30 23:01:01 +01:00
kempersc	f753d7277f	Add country code extraction for location validation in Google Places API	2025-12-30 03:45:29 +01:00
kempersc	cefc847056	Remove custodian entry for Leica AG from YAML file	2025-12-30 03:44:25 +01:00
kempersc	9159ff35db	Add custodian entry for Leica AG with data contamination fixes and location corrections	2025-12-30 03:43:47 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	4cf3fe8a07	Logo enrichment batch: JP+170 (5,166/12,096 = 42.7%) - 14,503 total (45.6%)	2025-12-27 13:17:40 +01:00
kempersc	3447a9cc6c	Logo enrichment batch: JP+440 (4,996/12,096 = 41.3%) - 14,333 total (45.1%)	2025-12-27 12:20:53 +01:00
kempersc	cdb633b0c9	enrich custodian entries with logo	2025-12-27 02:15:17 +01:00
kempersc	fd91fec63f	Logo enrichment batch: JP+320, 13,603 total (42.8%) - JP: 4,516/12,096 (37.4%) ✅ NEW COMMIT - CZ: 3,820/8,432 (45.3%) - batches 7-16 running - CH, NL, BE, AT, BR: 100% complete - Total: 13,603/31,772 (42.8%) - Using crawl4ai favicon extraction	2025-12-26 23:25:40 +01:00
kempersc	2104a90f22	Logo enrichment COMPLETE: CZ 3,820 (45.3%) - CZ: 3,820/8,432 files processed (45.3%) - 9 parallel batches completed (500 files each) - NL person entities added (4 staff profiles) - scripts/discover_websites_crawl4ai.py modified - Using crawl4ai favicon extraction	2025-12-26 21:45:14 +01:00
kempersc	6af5009444	enrich entries	2025-12-26 21:41:18 +01:00
kempersc	ca219340f2	enrich entries	2025-12-26 14:30:31 +01:00
kempersc	59963c8d3f	Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%) - JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE - CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease - CH, NL, BE, AT, BR: 100% complete - Total: 12,833 of 31,772 files (40.4%) - Using crawl4ai favicon extraction	2025-12-26 13:42:21 +01:00
kempersc	fb7993e3af	fix: filter DSPy field markers from streaming output Implements a state machine to filter streaming tokens: - Only stream tokens from the 'answer' field to the frontend - Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields - Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content This fixes the issue where raw DSPy signature field markers were being displayed in the chat interface instead of clean answer text.	2025-12-26 03:11:44 +01:00
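
The streaming fix in commit fb7993e3af describes a state machine that forwards only tokens from the 'answer' field and drops markers such as '[[ ## answer ## ]]'. The following is a minimal sketch of that idea, not the code from the commit: the class name `AnswerStreamFilter`, the `MARKER` regex, and the `MAX_MARKER_LEN` bound are all hypothetical, and the marker shape is assumed from the single example quoted in the message. It buffers a trailing fragment that might be the start of a marker so that markers split across chunks are still recognised.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed marker shape, inferred from the example '[[ ## answer ## ]]'.
MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")
MAX_MARKER_LEN = 64  # hypothetical upper bound on a marker's length


class AnswerStreamFilter:
    """Pass through text of one field only; drop markers and other fields."""

    def __init__(self, wanted: str = "answer") -> None:
        self.wanted = wanted
        self.field: Optional[str] = None  # field currently being streamed
        self.buffer = ""                  # text not yet classified

    @staticmethod
    def _split_tail(text: str) -> Tuple[str, str]:
        """Split text into (ready, held); held may be an unfinished marker."""
        idx = text.rfind("[[")
        if idx != -1 and "]]" not in text[idx:] and len(text) - idx < MAX_MARKER_LEN:
            return text[:idx], text[idx:]
        if text.endswith("["):
            return text[:-1], "["
        return text, ""

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk and return the text safe to display."""
        self.buffer += chunk
        out = []
        match = MARKER.search(self.buffer)
        while match:
            # Text before a marker belongs to the field that was active so far.
            if self.field == self.wanted:
                out.append(self.buffer[:match.start()])
            self.field = match.group(1)  # state transition to the new field
            self.buffer = self.buffer[match.end():]
            match = MARKER.search(self.buffer)
        ready, self.buffer = self._split_tail(self.buffer)
        if self.field == self.wanted:
            out.append(ready)
        return "".join(out)

    def flush(self) -> str:
        """Call at end of stream to release any held-back text."""
        ready, self.buffer = self.buffer, ""
        return ready if self.field == self.wanted else ""


def filter_stream(chunks: Iterable[str]) -> str:
    """Convenience wrapper: run all chunks through one filter instance."""
    flt = AnswerStreamFilter()
    return "".join(flt.feed(c) for c in chunks) + flt.flush()
```

In a real streaming endpoint each `feed()` result would be sent to the client as it is produced; `filter_stream` only joins the pieces to make the behaviour easy to check.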

215 commits