kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	b8914761b8	standardise slots	2026-01-14 09:51:14 +01:00
kempersc	92b490d690	edit slots	2026-01-13 20:35:11 +01:00
kempersc	f74513e8ef	feat: Enhance entity resolution with email semantics and review merging - Updated `entity_review.py` to map email semantic fields from JSON. - Expanded `email_semantics.py` with additional museum mappings. - Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings. - Added a backup JSON file for entity resolution candidates. - Created `enrich_email_semantics.py` to enrich candidates with email semantic signals. - Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.	2026-01-13 16:43:56 +01:00
kempersc	1fb924c412	feat: add ontology mappings to LinkML schema and enhance entity resolution Schema enhancements (443 files): - Add class_uri with proper ontology references (schema:, prov:, skos:, rico:) - Add close_mappings, related_mappings per Rule 50 convention - Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel) - Improve descriptions with ontology mapping rationale - Add prefixes blocks to all schema modules Entity Resolution improvements: - Add entity_resolution module with email semantics parsing - Enhance build_entity_resolution.py with email-based matching signals - Extend Entity Review API with filtering by signal types and count - Add candidates caching and indexing for performance - Add ReviewLoginPage component New rules and documentation: - Add Rule 51: No Hallucinated Ontology References - Add .opencode/rules/no-hallucinated-ontology-references.md - Add .opencode/rules/slot-ontology-mapping-reference.md - Add adms.ttl and dqv.ttl ontology files Frontend ontology support: - Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology	2026-01-13 13:51:02 +01:00
kempersc	3b35f4aea5	Refactor code structure for improved readability and maintainability	2026-01-12 18:31:31 +01:00
kempersc	355d8be51d	centralise slots	2026-01-12 14:33:56 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	ca219340f2	enrich entries	2025-12-26 14:30:31 +01:00
kempersc	0860f6094d	fix(sparql): correct ontology in dspy_sparql.py to match actual RDF data - Use crm:E39_Actor instead of glam:HeritageCustodian - Use hc:institutionType with single-letter codes (M, L, A, etc.) - Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.) - Use skos:prefLabel for institution names - Update ONTOLOGY_CONTEXT with correct examples	2025-12-22 22:22:07 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	99430c2a70	add new entries and semantic routing	2025-12-17 10:11:56 +01:00
kempersc	52ae711c56	add timespans	2025-12-16 09:02:52 +01:00
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	d9892dba6f	fix: handle single-vector Qdrant collections and multi-collection embedding dimensions - Fixed _vector_search() to check uses_named_vectors() before adding 'using' parameter - Fixed _person_vector_search() to detect person collection vector size and use appropriate model - Resolves 'Not existing vector name error: openai_1536' for single-vector collections - Resolves embedding dimension mismatch between heritage_custodians (1536-dim) and heritage_persons (384-dim)	2025-12-15 10:31:39 +01:00
kempersc	3820f2fc92	chore: Add data reports, infra scripts, and API updates - Data quality reports for Dutch custodians - Name mismatch detection reports - Failed crawl URL tracking - Caddy configuration updates - Monitor script for chunk 404 errors - API endpoint improvements	2025-12-15 01:48:08 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
kempersc	505c12601a	Add test script for PiCo extraction from Arabic waqf documents - Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents. - The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results. - Added comprehensive logging for API responses, extraction results, and validation errors. - Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.	2025-12-12 17:50:17 +01:00
kempersc	b1f93b6f22	enrich person profiles	2025-12-12 12:51:10 +01:00
kempersc	1b1cfbfca0	enrich custodians	2025-12-11 22:32:09 +01:00
kempersc	b61271220b	enrich entries	2025-12-09 10:46:43 +01:00
kempersc	bf7c773955	edit Japanese entries	2025-12-09 09:16:19 +01:00
kempersc	131e3ca259	normalise custodian entries	2025-12-09 07:56:35 +01:00
kempersc	1635625032	added web annotations	2025-12-06 19:50:04 +01:00

26 commits