kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	7ec4e05dd4	feat(merge): add script to merge PENDING files by matching emic names with existing files	2026-01-09 16:42:55 +01:00
kempersc	7f53ec6074	docs(person_pid): add PPID-GHCID alignment and PiCo comparison docs	2026-01-09 15:57:26 +01:00
kempersc	a51c8c400c	data(pending): add 125 international PENDING custodian files with proper country codes Identified 125 institutions from LinkedIn staff extraction that are NOT Dutch: - FR: 45 (French museums, archives, libraries) - ID: 14 (Indonesian institutions) - GB: 14 (British institutions) - DE: 13 (German museums, foundations) - BE: 11 (Belgian museums) - IT: 6 (Italian institutions) - AU: 6 (Australian archives, museums) - Plus smaller counts from IN, US, ES, CH, DK, AT, SA, NO, IL These files have staff data from LinkedIn company pages but need GHCID resolution (currently XX-XXX placeholders for region/city). Dutch PENDING files remain: 1,283	2026-01-09 15:55:31 +01:00
kempersc	ce66a294e5	fix(rag): transform SPARQL results to match frontend metadata format for map coordinates - Convert flat SPARQL results {lat, lon} to nested {metadata: {latitude, longitude}} - Parse string coordinates to float values - Add city/country/institution_type from template slots - Enables ChatMapPanel to render map markers correctly	2026-01-09 15:49:18 +01:00
kempersc	14be18e7c4	feat(data): merge staff data from 30 more PENDING files into enriched custodians Batch 2 of PENDING file resolution: - Merged LinkedIn staff data from 30 PENDING files into matching enriched custodians - Archived processed PENDING files to data/custodian/archive/pending_merged_20250109/ - Notable merges: ASML (994 staff), BBB (117), Apenheul (100), BOEI (93) Files merged include: - Corporate: ASML, BOS Foundation, Constructing the Limes - Museums: Allard Pierson, Apenheul, various regional museums - Research: Catholic Documentation Centre, Creating Cultures of Care - Cultural orgs: Cultuur Ondernemen, CultuurOost, CultuurKwadraat This continues the effort to consolidate PENDING files (1283 remaining).	2026-01-09 15:42:32 +01:00
kempersc	5ab9dd8ea2	docs(person_pid): add implementation guidelines and governance docs Add final two chapters of the Person PID (PPID) design document: - 08_implementation_guidelines.md: Database architecture, API design, data ingestion pipeline, GHCID integration, security, performance, technology stack, deployment, and monitoring specifications - 09_governance_and_sustainability.md: Data governance policies, quality assurance, sustainability planning, community engagement, legal considerations, and long-term maintenance strategies	2026-01-09 14:51:57 +01:00
kempersc	1f723fd5d7	feat(data): merge staff data from 35 PENDING files into enriched custodians Merged LinkedIn-extracted staff sections from PENDING files into their corresponding proper GHCID custodian files. This consolidates data from two extraction sources: - Existing enriched files: Google Maps, Museum Register, YouTube, etc. - PENDING files: LinkedIn staff data extraction Files modified: - 28 custodian files enriched with staff data - 35 PENDING files deleted (merged into proper locations) - Originals archived to archive/pending_duplicates_20250109/ Key institutions enriched: - Rijksmuseum (NL-NH-AMS-M-RM) - Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA) - Amsterdam Museum (NL-NH-AMS-M-AM) - Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA) - Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR) - And 23 more museums/archives across NL New scripts: - scripts/merge_staff_data.py: Automated staff data merger - scripts/categorize_pending_files.py: PENDING file analysis utility	2026-01-09 14:51:17 +01:00
kempersc	2c2a312e0a	feat(rag): add database routing to 8 more factual query templates Add databases: ["oxigraph"] to skip vector search for deterministic queries: - count_institutions_by_type_location (count) - count_institutions_by_type (aggregation) - find_institutions_by_founding_date (temporal) - find_custodians_by_budget_threshold (financial) - compare_locations (comparative) - find_by_founding (temporal) - events_in_period (temporal events) - institutions_by_founding_decade (temporal aggregation) Total templates with oxigraph-only routing: 12	2026-01-09 12:33:41 +01:00
kempersc	b9c30fc970	feat(rag): extend database routing to count, temporal, and financial templates Add databases: ["oxigraph"] to 5 more templates that don't benefit from vector search: - count_institutions_by_type_location - compare_locations - find_by_founding - find_custodians_by_budget_threshold - find_institutions_by_founding_date Total templates with Oxigraph-only routing: 10	2026-01-09 12:32:28 +01:00
kempersc	17a94613f3	data(custodian): resolve 57 PENDING files to proper GHCID locations Resolved NL-XX-XXX-PENDING files to proper regional GHCIDs: - 57 new files with proper location codes (city, region) - Cities include: Amsterdam, Rotterdam, Utrecht, Leiden, Groningen, etc. - 34 original PENDING files archived to archive/pending_duplicates_20250109/ Examples: - NL-XX-XXX-PENDING-AMSTERDAM-MUSEUM → NL-NH-AMS-M-AM (Amsterdam Museum) - NL-XX-XXX-PENDING-GRONINGEN-MUSEUM → NL-GR-GRO-M-GM (Groninger Museum) - NL-XX-XXX-PENDING-KUNSTHAL-ROTTERDAM → NL-ZH-ROT-G-KR (Kunsthal Rotterdam)	2026-01-09 12:19:19 +01:00
kempersc	e313744cf6	feat(scripts): add resolve_pending_locations.py for GHCID resolution Script to resolve NL-XX-XXX-PENDING files that have city names in filename: - Looks up city in GeoNames database - Updates YAML with location data (city, region, country) - Generates proper GHCID with UUID v5/v8 - Renames files to match new GHCID - Archives original PENDING files for reference	2026-01-09 12:18:46 +01:00
kempersc	787f4dacb0	feat(rag): implement database routing in query endpoint Log database routing decisions and add databases_used to response metadata. When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.	2026-01-09 12:15:49 +01:00
kempersc	35a057981c	chore(frontend): sync schema files with custodian_type → has_or_had_custodian_type refactor - Remove deprecated slots: custodian_type.yaml, custodian_types.yaml, custodian_type_broader/narrower/related.yaml, custodian_types_primary/rationale.yaml - Add new unified slot: has_or_had_custodian_type.yaml - Sync all 236+ class files with updated slot references - Update manifest.json	2026-01-09 12:15:32 +01:00
kempersc	76644f55f5	feat(rag): add database routing to geographic query templates Add databases: ["oxigraph"] to 4 geographic templates to skip vector search: - list_institutions_by_type_city - list_institutions_by_type_region - list_institutions_by_type_country - list_institutions_in_city Also add documentation explaining database routing configuration in _metadata.	2026-01-09 11:56:18 +01:00
kempersc	5255128159	fix(data): correct GHCID locations for 4 heritage custodians Location corrections based on GeoNames reverse geocoding: - NL-FR-LAN-S-L → NL-FR-DKN-S-L (Historische Werkgroep Kynhout: De Knipe) - NL-LI-HEE-A-CRGR → NL-LI-MAA-A-CRGR (Centrum Regionale Geschiedenis: Maastricht) - NL-NB-MID-S-M → NL-NB-BER-S-M (Heemkundekring De Plaets: Berlicum) - NL-OV-NIJ-A-GH → NL-OV-HEL-A-GH (Gemeente Hellendoorn: Hellendoorn)	2026-01-09 11:55:08 +01:00
kempersc	e128727b13	fix(data): correct GHCID location for Museumreddingboot Terschelling - Rename NL-FR-HOO-M-MT.yaml → NL-FR-TER-M-MT.yaml - HOO (Hooghalen) → TER (Terschelling) - correct island location - Institution is on Terschelling island, not in Drenthe	2026-01-09 11:54:37 +01:00
kempersc	933deb337c	refactor(scripts): generalize GHCID location fixer for all institution types - Add --type/-t flag to specify institution type (A, G, H, I, L, M, N, O, R, S, T, U, X, ALL) - Default still Type I (Intangible Heritage) for backward compatibility - Skip PENDING files that have no location data - Update help text with all supported types	2026-01-09 11:54:28 +01:00
kempersc	4d5641b6c5	feat(rag): add database routing configuration to templates - Add 'databases' field to TemplateDefinition and TemplateMatchResult - Support values: 'oxigraph' (SPARQL/KG), 'qdrant' (vector search) - Add helper methods use_oxigraph() and use_qdrant() - Default to both databases for backward compatibility - Allows templates to skip vector search for factual/geographic queries	2026-01-09 11:54:17 +01:00
kempersc	c88fd3af70	Refactor code structure for improved readability and maintainability	2026-01-09 11:05:26 +01:00
kempersc	0393b321c9	refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43) - Migrate 236+ class files from custodian_types to has_or_had_custodian_type - Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related - Update main schema and manifest imports - Fix Custodian.yaml class to use new slot - Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml Rules applied: - Rule 39: RiC-O naming convention (hasOrHad pattern) - Rule 43: Slot nouns must be singular (multivalued:true for cardinality) - Rule 38: Slot centralization with semantic URI	2026-01-09 10:55:21 +01:00
kempersc	508b858e16	docs(Rule 40): Add empirical validation showing 33% Google Maps error rate for Type I Audit of 188 Type I custodian files revealed: - 62 false matches (33%) detected and corrected - Categories: domain mismatch (39), name mismatch (8), wrong location (6), wrong org type (5), different entity (3), different event (3) - Documents why Google Maps fails for intangible heritage: virtual orgs, person-based heritage, volunteer networks, event-based orgs This validates KIEN as TIER_1_AUTHORITATIVE for Type I custodians.	2026-01-08 16:47:17 +01:00
kempersc	6608a207d4	update frontend	2026-01-08 15:56:28 +01:00
kempersc	9d68ed8c2e	fix: mark 15 more Google Maps false matches via comprehensive review Manual review of remaining Type I custodian files without official websites identified additional false matches in these categories: Wrong organization type: - Bird catchers vs bird watchers association - Heritage org vs webshop - Regional org vs specific local entity - Federation vs single member association - Bell ringers org vs church building Wrong location: - Amsterdam org matched to Den Haag - Haarlem org matched to Apeldoorn - Rotterdam org matched to Amstelveen - Dutch org matched to Suriname (!) - Giethoorn event matched to Belt-Schutsloot - Duindorp bonfire matched to Scheveningen Different event/entity: - Horse racing org vs summer festival - Street name vs organization - Heritage foundation vs specific local fair Total Type I false matches fixed: 62 of 188 files (33%)	2026-01-08 15:21:31 +01:00
kempersc	0b0ea75070	feat(rag): add factual query fast path - skip LLM for count/list queries - Add ontology cache warming at startup in lifespan() function - Add is_factual_query() detection in template_sparql.py (12 templates) - Add factual_result and sparql_query fields to DSPyQueryResponse - Skip LLM generation for factual templates (count, list, compare) - Execute SPARQL directly and return results as table (~15s → ~2s latency) - Update ConversationPanel.tsx to render factual results table - Add CSS styling for factual results with green theme For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL results ARE the answer - no need for expensive LLM prose generation.	2026-01-08 13:34:23 +01:00
kempersc	85d9cee82f	fix: mark 8 more Google Maps false matches detected via name mismatch Additional Type I custodian files with obvious name mismatches between KIEN registry entries and Google Maps results. These couldn't be auto-detected via domain mismatch because they lack official websites. Fixes: - Dick Timmerman (person) → carpentry business - Ria Bos (cigar maker) → money transfer agent - Stichting Kracom (Krampuslauf) → Happy Caps retail - Fed. Nederlandse Vertelorganisaties → NET Foundation - Stichting dodenherdenking Alphen → wrong memorial - Sao Joao Rotterdam → Heemraadsplein (location not org) - sport en spel (heritage) → equipment rental - Eiertikken Ommen → restaurant Also adds detection and fix scripts for Google Maps false matches.	2026-01-08 13:26:53 +01:00
kempersc	b2b21abe2b	fix: mark 39 Google Maps false matches for Type I intangible heritage custodians Per Rule 40 (KIEN authoritative source), Google Maps frequently returns false matches for intangible heritage organizations. These are virtual networks without commercial storefronts. Changes: - Mark google_maps_enrichment.status as FALSE_MATCH - Preserve original data in original_false_match for audit trail - Add correction_timestamp and correction_agent provenance - Special handling for NL-GE-TIE-I-M (Stichting MOZA): also fixed YouTube false match (Mozart channel) and removed ~1750 lines of irrelevant video data Detection method: Domain mismatch between Google Maps website field and official KIEN registry website.	2026-01-08 12:16:39 +01:00
kempersc	53ffed3531	Add TemplateSpecificityScores import to SpecificityAnnotation	2026-01-07 22:05:42 +01:00
kempersc	20374b9032	Archive monolithic class_metadata_slots before modular refactoring	2026-01-07 22:05:21 +01:00
kempersc	30b9cb9d14	Add SOTA analysis and update design pattern documentation - Add prompt-query_template_mapping/SOTA_analysis.md with Formica et al. research - Update GraphRAG design patterns documentation - Update temporal semantic hypergraph documentation	2026-01-07 22:05:01 +01:00
kempersc	99dc608826	Refactor RAG to template-based SPARQL generation Major architectural changes based on Formica et al. (2023) research: - Add TemplateClassifier for deterministic SPARQL template matching - Add SlotExtractor with synonym resolution for slot values - Add TemplateInstantiator using Jinja2 for query rendering - Refactor dspy_heritage_rag.py to use template system - Update main.py with streamlined pipeline - Fix semantic_router.py ordering issues - Add comprehensive metrics tracking Template-based approach achieves 65% precision vs 10% LLM-only per Formica et al. research on SPARQL generation.	2026-01-07 22:04:43 +01:00
kempersc	9b769f1ca2	Update manifest timestamp and minor class fixes	2026-01-07 22:04:29 +01:00
kempersc	181991940f	Add new LinkML schema modules for specificity and Wikidata alignment New classes: - SpecificityAnnotation: Track class relevance scores per template - TemplateSpecificityScores: 10 conversation template scores - WikidataAlignment: Link classes to Wikidata entities - DualClassLink: Model dual-aspect class relationships New enums: - WikidataMappingTypeEnum: exact/close/narrow/broad/related mappings - DualClassPatternEnum: place-custodian, collection-custodian patterns New slots (44 files): - Specificity slots: score, rationale, agent, timestamp - Template scores: archive_search, museum_search, library_search, etc. - Wikidata slots: entity_id, label, mapping_type, rationale - Multilingual labels: label_nl, label_de, label_fr, label_es, etc. - Custodian type annotations: custodian_types, rationale, primary - SKOS hierarchy: skos_broader, skos_narrower, skos_related	2026-01-07 22:03:58 +01:00
kempersc	7f792d0250	Remove outdated RDF schema files Clean up old generated RDF/OWL files that have been superseded by the clean schema deployed to Oxigraph (commit `f8421a2903`). Only the current deployed schema should be regenerated as needed.	2026-01-07 22:03:26 +01:00
kempersc	81da4ede50	Add comprehensive slot visualization to LinkML viewer - Add standalone Slots section in visual view alongside Classes and Enums - Display slot_uri, range, identifier badge, description, pattern - Show examples with value/description pairs - Color-coded SKOS mapping tags (exact/close/narrow/broad/related) - Yellow highlighted comments section - Custodian type filtering works with slots - Shared renderSlotDetails() function for consistency	2026-01-07 22:03:08 +01:00
kempersc	f8421a2903	Deploy clean RDF schema to Oxigraph - 288,857 triples with 0 malformed URIs - Archive 48 old timestamped RDF files - Fix relative IRIs by adding hc: namespace prefix - Fix file path references in seeAlso predicates - Deployed to sparql.glam-ontology.org triplestore - 11,250 distinct OWL classes - Schema includes all base ontologies (CIDOC-CRM, RiC-O, CPOV, etc.)	2026-01-07 16:53:01 +01:00
kempersc	d19822f958	Remove redundant sections from class descriptions - Created cleanup_class_descriptions_v2.py script using text-based regex - Removed 134 class files' redundant sections: - dual_class_pattern: 80 occurrences - ontological_alignment: 35 occurrences - ontology_alignment_upper: 33 occurrences - multilingual_labels: 26 occurrences - glamorcubes_category: 6 occurrences - example_structure: 6 occurrences - Fixed ArchiveOrganizationType.yaml parse error after cleanup - Added 49 new slot definition files - All 395 class files validate as correct YAML - Deployed to bronhouder.nl/linkml	2026-01-07 13:50:14 +01:00
kempersc	dfa667c90f	Fix LinkML schema for valid RDF generation with proper slot_uri Summary: - Create 46 missing slot definition files with proper slot_uri values - Add slot imports to main schema (01_custodian_name_modular.yaml) - Fix YAML examples sections in 116+ class and slot files - Fix PersonObservation.yaml examples section (nested objects → string literals) Technical changes: - All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS) - Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF - gen-owl now produces valid Turtle with 153,166 triples New slot files (46): - RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc. - Scope slots: scope_includes, scope_excludes, archive_scope - Organization slots: organization_type, governance_authority, area_served - Platform slots: platform_type_category, portal_type_category - Social media slots: social_media_platform_category, post_type_* - Type hierarchy slots: broader_type, narrower_types, custodian_type_broader - Wikidata slots: wikidata_equivalent, wikidata_mapping Generated output: - schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB) - Validated with rdflib: 153,166 triples, no malformed URIs	2026-01-07 13:48:03 +01:00
kempersc	98c42bf272	Fix LinkML URI conflicts and generate RDF outputs - Fix scope_note → finding_aid_scope_note in FindingAid.yaml - Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead) - Remove duplicate rico_record_set_type from class_metadata_slots.yaml - Fix range types for equals_string compatibility (uriorcurie → string) - Move class names from close_mappings to see_also in 10 RecordSetTypes files - Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context - Sync schemas to frontend/public/schemas/ Files: 1,151 changed (includes prior CustodianType migration)	2026-01-07 12:32:59 +01:00
kempersc	6c6810fa43	Replace CustodianTypeCodeEnum with CustodianType class references - Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml - Update custodian_types slot to use uriorcurie range (references CustodianType subclasses) - Update custodian_types_primary slot similarly - Add migration note for legacy string format ['A'] vs new URI format Per Rule 9: Enum-to-Class Promotion - Single Source of Truth	2026-01-06 12:37:40 +01:00
kempersc	b34992b1d3	Migrate all 293 class files to ontology-aligned slots Extends migration to all class types (museums, libraries, galleries, etc.) New slots added to class_metadata_slots.yaml: - RiC-O: rico_record_set_type, rico_organizational_principle, rico_has_or_had_holder, rico_note - Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt - Scope: scope_includes, scope_excludes, custodian_only, organizational_level, geographic_restriction - Notes: privacy_note, preservation_note, legal_note Migration script now handles 30+ annotation types. All migrated schemas pass linkml-validate. Total: 387 class files now use proper slots instead of annotations.	2026-01-06 12:24:54 +01:00
kempersc	aa763dab25	Migrate 94 archive class annotations to ontology-aligned slots - Add migration script: scripts/migrate_annotations_to_slots.py - Convert custodian_types, wikidata, skos_broader, specificity_* annotations - Replace with proper slots mapped to SKOS, PROV-O, RiC-O predicates - Add ../slots/class_metadata_slots import to all migrated files - Remove AcademicArchive_refactored.yaml (main file now migrated) - Sync changes to frontend/public/schemas/ Migration converts: - custodian_types → hc:custodianTypes slot - wikidata/wikidata_label → wikidata_alignment structured slot - skos_broader → skos:broader slot - specificity_* → specificity_annotation structured slot - dual_class_pattern → dual_class_link structured slot - template_specificity → template_specificity slot All 94 migrated schemas pass linkml-validate.	2026-01-06 11:25:37 +01:00
kempersc	f37f5208ca	Copy class metadata slots to frontend public folder for deployment	2026-01-06 11:17:12 +01:00
kempersc	bc562bd68d	Add class metadata slots to replace annotations with ontology-aligned predicates - Add class_metadata_slots.yaml with slots for: - GLAMORCUBESFIXPHDNT custodian type classification (hc:custodianTypes) - Wikidata alignment (wdt:P31, skos:mappingRelation) - SKOS hierarchical relationships (skos:broader, skos:narrower) - Dual-class pattern linking (rdfs:seeAlso) - Specificity scoring for RAG (prov:generatedAtTime, prov:wasAttributedTo) - Collection holdings (rico:isOrWasHolderOf) - Add AcademicArchive_refactored.yaml demonstrating slot-based approach - Add migration guide documenting annotation-to-slot mappings Ontology sources: SKOS, PROV-O, Dublin Core, RiC-O, Wikidata	2026-01-06 11:16:49 +01:00
kempersc	11983014bb	Enhance specificity scoring system integration with existing infrastructure - Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework. - Added detailed mapping of SPARQL templates to context templates for improved specificity filtering. - Implemented wrapper patterns around existing classifiers to extend functionality without duplication. - Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality. - Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.	2026-01-05 17:37:49 +01:00
kempersc	41d8905661	Fix Turtle parser multi-line string handling for PiCo ontology - Fixed bug where closing triple-quotes (""") would incorrectly re-trigger multi-line string detection, causing subsequent class definitions to be skipped - Added lineToProcess variable to track which portion of line to process after closing a multi-line string, preventing re-detection of opening quotes - Moved UML large diagram confirmation logic from OntologyViewerPage to UMLVisualization component for better encapsulation - PiCo ontology now correctly shows all 8 classes instead of 2 Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo	2026-01-05 11:25:43 +01:00
kempersc	242bc8bb35	Add new slots for heritage custodian entities - Created deliverables_slot for expected or achieved deliverable outputs. - Introduced event_id_slot for persistent unique event identifiers. - Added follow_up_date_slot for scheduled follow-up action dates. - Implemented object_ref_slot for references to heritage objects. - Established price_slot for price information across entities. - Added price_currency_slot for currency codes in price information. - Created protocol_slot for API protocol specifications. - Introduced provenance_text_slot for full provenance entry text. - Added record_type_slot for classification of record types. - Implemented response_formats_slot for supported API response formats. - Established status_slot for current status of entities or activities. - Added FactualCountDisplay component for displaying count query results. - Introduced ReplyTypeIndicator component for visualizing reply types. - Created approval_date_slot for formal approval dates. - Added authentication_required_slot for API authentication status. - Implemented capacity_items_slot for maximum storage capacity. - Established conservation_lab_slot for conservation laboratory information. - Added cost_usd_slot for API operation costs in USD.	2026-01-05 00:49:05 +01:00
kempersc	89001fbc53	compact header controls on OntologyViewer and QueryBuilder pages	2026-01-04 17:29:34 +01:00
kempersc	eb61f45de2	compact UML controls toolbar to fit single line when sidebar collapsed	2026-01-04 17:21:53 +01:00
kempersc	2dca28d8c1	enrich CH entries with mission statements	2026-01-04 13:12:32 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
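
Commit ce66a294e5 above describes reshaping flat SPARQL rows of the form {lat, lon} into nested {metadata: {latitude, longitude}} records, parsing string coordinates to floats and copying city/country/institution_type from template slots. The following is a minimal sketch of that reshaping, not the repository's code: the function names `to_frontend_result` and `_to_float`, and the assumption that rows and slots are plain dicts, are illustrative only.

```python
from typing import Any, Optional


def _to_float(value: Any) -> Optional[float]:
    """Parse a coordinate that may arrive as a string; return None if unusable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def to_frontend_result(row: dict, slots: Optional[dict] = None) -> dict:
    """Reshape one flat SPARQL row into a nested result (hypothetical helper)."""
    slots = slots or {}
    metadata = {
        "latitude": _to_float(row.get("lat")),
        "longitude": _to_float(row.get("lon")),
    }
    # Copy location context from the template slots when it is present.
    for key in ("city", "country", "institution_type"):
        if key in slots:
            metadata[key] = slots[key]
    # Keep the remaining fields at the top level; coordinates move under metadata.
    result = {k: v for k, v in row.items() if k not in ("lat", "lon")}
    result["metadata"] = metadata
    return result
```

Returning None for an unparseable coordinate, rather than raising, is a design choice of this sketch; a map component can then skip markers whose latitude or longitude is missing.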

244 commits
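
The database routing introduced in commits 4d5641b6c5 and 787f4dacb0 (a `databases` field on `TemplateDefinition`, helper methods `use_oxigraph()` and `use_qdrant()`, a default of both databases, and `databases_used` in the response metadata) could look roughly like the sketch below. Only those names come from the commit messages; the `id` field, the `plan_retrieval` function, and the step labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TemplateDefinition:
    """Simplified stand-in: only the routing-related fields are modelled."""
    id: str  # assumed field name
    # Default to both backends, as the commit message states.
    databases: List[str] = field(default_factory=lambda: ["oxigraph", "qdrant"])

    def use_oxigraph(self) -> bool:
        return "oxigraph" in self.databases

    def use_qdrant(self) -> bool:
        return "qdrant" in self.databases


def plan_retrieval(template: TemplateDefinition) -> Dict[str, List[str]]:
    """Decide which retrieval steps to run (hypothetical helper)."""
    steps: List[str] = []
    if template.use_oxigraph():
        steps.append("sparql")
    if template.use_qdrant():
        # Skipped when the template lists only "oxigraph".
        steps.append("vector_search")
    return {"steps": steps, "databases_used": list(template.databases)}
```

Under these assumptions, a template declared with `databases=["oxigraph"]` yields only the SPARQL step, while a template that omits the field keeps both steps.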