kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	b927bc4b43	Update manifest.json and migrate approved_by slot to is_or_was_approved_by; add includes_or_included slot to InformationCarrier; remove bookplate slot and archive it	2026-01-14 15:05:37 +01:00
kempersc	21c207c9da	Refactor schema slots and classes for improved clarity and structure - Migrated `archived_at` to `is_or_was_archived_at` in AuxiliaryDigitalPlatform, WebObservation, and other relevant classes to better reflect historical archival status. - Removed `bold_id` slot and replaced it with `has_or_had_identifier` linked to the new `BOLDIdentifier` class in BiologicalObject. - Introduced `Bookplate` and `Approver` classes to enhance provenance tracking and ownership documentation. - Updated `InformationCarrier` to replace `bookplate` with `includes_or_included` for better representation of ownership marks. - Added new slots `is_or_was_approved_by` and `is_or_was_archived_at` to capture historical approval and archival locations. - Archived old slot definitions for `archived_at` and `bold_id` to maintain schema integrity. - Enhanced LinkedIn profile extraction functionality by integrating Linkup API alongside Exa API.	2026-01-14 13:28:33 +01:00
kempersc	60e66d60f9	Add new slots and classes for enhanced documentation and availability tracking - Introduced `is_or_was_created_through` slot to indicate content creation methods, replacing previous boolean flags. - Added `is_or_was_required` slot for generic temporal boolean requirements, aligning with Schema.org. - Created `AutoGeneration` class to represent automatic content generation, capturing methods and provenance. - Established `AvailabilityStatus` class to model resource availability with temporal validity. - Developed `Documentation` class for structured documentation resources, replacing domain-specific slots. - Implemented `Taxon` class for biological classification in natural history collections. - Archived previous slots related to API availability and documentation, ensuring a clean schema. - Enhanced existing slots with detailed descriptions and examples for clarity and usability.	2026-01-14 13:09:31 +01:00
kempersc	b13674400f	Refactor schema slots and classes for improved organization and clarity - Removed deprecated slots: appraisal_notes, branch_id, is_or_was_real. - Introduced new slots: has_or_had_notes, has_or_had_provenance. - Created Notes class to encapsulate note-related metadata. - Archived removed slots and classes in accordance with the new archive folder convention. - Updated slot_fixes.yaml to reflect migration status and details. - Enhanced documentation for new slots and classes, ensuring compliance with ontology alignment. - Added new slots for note content, date, and type to support the Notes class.	2026-01-14 12:14:07 +01:00
kempersc	b8914761b8	standardise slots	2026-01-14 09:51:14 +01:00
kempersc	b30711fcfb	update slots	2026-01-14 09:05:54 +01:00
kempersc	9a395f3dbe	fix: improve birth year extraction to avoid date suffix false positives - Skip YYYYMMDD and YYMMDD date patterns at end of email - Skip digit sequences longer than 4 characters - Require non-digit before 4-digit years at end - Add knid.nl/kabelnoord.nl to consumer domains (Friesland ISP) - Add 11 missing regional archive domains to HERITAGE_DOMAIN_MAP - Update recalculation script to re-extract email semantics Results: - 3,151 false birth years removed - 'Likely wrong person' reduced from 533 to 325 (-39%) - 2,944 candidates' scores boosted	2026-01-13 22:37:10 +01:00
kempersc	833bb56833	feat(entity-resolution): expand consumer email domain list Add additional Dutch ISP domains for better filtering: - gmail.nl, icloud.nl, aol.nl, aol.com - telfortglasvezel.nl, worldonline.nl, delta.nl, lijbrandt.nl - t-mobilethuis.nl, compaqnet.nl, filternet.nl, onsmail.nl, box.nl - mailinator.com (disposable email)	2026-01-13 20:54:34 +01:00
kempersc	6a3616beac	feat(entity-resolution): expand Dutch heritage domain mappings Add domain mappings for better email-based entity matching: - Government: noord-holland.nl, amsterdam.nl, rotterdam.nl, denhaag.nl, hoorn.nl, hhnk.nl, rijksoverheid.nl, politie.nl, kadaster.nl, rvo.nl, rivm.nl, staatsbosbeheer.nl, vng.nl - Museums: maritiemmuseum.nl, paleishetloo.nl, slotloevestein.nl - Universities: student.vu.nl, cdh.leidenuniv.nl, jur.ru.nl, student.ru.nl, student.tudelft.nl, eshcc.eur.nl, wur.nl, ou.nl - Hogescholen: hva.nl, student.hu.nl, student.fontys.nl Also remove deprecated activity_id.yaml slot file	2026-01-13 20:53:49 +01:00
kempersc	92b490d690	edit slots	2026-01-13 20:35:11 +01:00
kempersc	f74513e8ef	feat: Enhance entity resolution with email semantics and review merging - Updated `entity_review.py` to map email semantic fields from JSON. - Expanded `email_semantics.py` with additional museum mappings. - Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings. - Added a backup JSON file for entity resolution candidates. - Created `enrich_email_semantics.py` to enrich candidates with email semantic signals. - Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.	2026-01-13 16:43:56 +01:00
kempersc	1fb924c412	feat: add ontology mappings to LinkML schema and enhance entity resolution Schema enhancements (443 files): - Add class_uri with proper ontology references (schema:, prov:, skos:, rico:) - Add close_mappings, related_mappings per Rule 50 convention - Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel) - Improve descriptions with ontology mapping rationale - Add prefixes blocks to all schema modules Entity Resolution improvements: - Add entity_resolution module with email semantics parsing - Enhance build_entity_resolution.py with email-based matching signals - Extend Entity Review API with filtering by signal types and count - Add candidates caching and indexing for performance - Add ReviewLoginPage component New rules and documentation: - Add Rule 51: No Hallucinated Ontology References - Add .opencode/rules/no-hallucinated-ontology-references.md - Add .opencode/rules/slot-ontology-mapping-reference.md - Add adms.ttl and dqv.ttl ontology files Frontend ontology support: - Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology	2026-01-13 13:51:02 +01:00
kempersc	3b35f4aea5	Refactor code structure for improved readability and maintainability	2026-01-12 18:31:31 +01:00
kempersc	355d8be51d	centralise slots	2026-01-12 14:33:56 +01:00
kempersc	66ab2908d0	fix: remove deprecated AnnotationMotivationEnum, add European surname data - Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths) - Add French, Italian, Polish, Spanish surname datasets for entity resolution - Update name_commonality.py with expanded European surname detection - Triggers GitOps workflow to test Forgejo Actions runner	2026-01-11 16:03:18 +01:00
kempersc	fd792fce2c	Refactor code structure for improved readability and maintainability	2026-01-11 15:27:14 +01:00
kempersc	55ef2a831d	feat(data): add Belgian surnames dataset with metadata and surname counts	2026-01-11 13:50:20 +01:00
kempersc	7d09e4179c	Add US surnames dataset from 2010 Census with metadata and surname counts	2026-01-11 12:28:58 +01:00
kempersc	dfb4744dc7	Evaluate data enrichments of persons	2026-01-11 12:15:27 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	ca219340f2	enrich entries	2025-12-26 14:30:31 +01:00
kempersc	0860f6094d	fix(sparql): correct ontology in dspy_sparql.py to match actual RDF data - Use crm:E39_Actor instead of glam:HeritageCustodian - Use hc:institutionType with single-letter codes (M, L, A, etc.) - Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.) - Use skos:prefLabel for institution names - Update ONTOLOGY_CONTEXT with correct examples	2025-12-22 22:22:07 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	99430c2a70	add new entries and semantic routing	2025-12-17 10:11:56 +01:00
kempersc	52ae711c56	add timespans	2025-12-16 09:02:52 +01:00
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	d9892dba6f	fix: handle single-vector Qdrant collections and multi-collection embedding dimensions - Fixed _vector_search() to check uses_named_vectors() before adding 'using' parameter - Fixed _person_vector_search() to detect person collection vector size and use appropriate model - Resolves 'Not existing vector name error: openai_1536' for single-vector collections - Resolves embedding dimension mismatch between heritage_custodians (1536-dim) and heritage_persons (384-dim)	2025-12-15 10:31:39 +01:00
kempersc	3820f2fc92	chore: Add data reports, infra scripts, and API updates - Data quality reports for Dutch custodians - Name mismatch detection reports - Failed crawl URL tracking - Caddy configuration updates - Monitor script for chunk 404 errors - API endpoint improvements	2025-12-15 01:48:08 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
kempersc	505c12601a	Add test script for PiCo extraction from Arabic waqf documents - Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents. - The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results. - Added comprehensive logging for API responses, extraction results, and validation errors. - Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.	2025-12-12 17:50:17 +01:00
kempersc	b1f93b6f22	enrich person profiles	2025-12-12 12:51:10 +01:00
kempersc	1b1cfbfca0	enrich custodians	2025-12-11 22:32:09 +01:00
kempersc	b61271220b	enrich entries	2025-12-09 10:46:43 +01:00
kempersc	bf7c773955	edit Japanese entries	2025-12-09 09:16:19 +01:00
kempersc	131e3ca259	normalise custodian entries	2025-12-09 07:56:35 +01:00
kempersc	ee4e57bc75	add new entries	2025-12-07 00:26:01 +01:00
kempersc	1635625032	added web annotations	2025-12-06 19:50:04 +01:00
kempersc	4da64eeebf	improve annotator	2025-12-05 16:25:39 +01:00
kempersc	e38fb4613b	improve annotation prompt	2025-12-05 15:51:39 +01:00
kempersc	3a242370fc	annotation standards added	2025-12-05 15:30:23 +01:00
kempersc	d661947830	update enriched entries	2025-12-03 17:38:46 +01:00
kempersc	097d116b72	enrich entries	2025-12-01 16:06:34 +01:00
kempersc	f3c149b1bb	update entries	2025-11-30 23:30:29 +01:00
kempersc	fa5680f0dd	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats - Introduced custodian_hub_v3.mmd, custodian_hub_v4_final.mmd, and custodian_hub_v5_FINAL.mmd for Mermaid representation. - Created custodian_hub_FINAL.puml and custodian_hub_v3.puml for PlantUML representation. - Defined entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument. - Established relationships and associations between entities, including temporal extents, observations, and reconstruction activities. - Incorporated enumerations for various types, statuses, and classifications relevant to custodians and their activities.	2025-11-22 14:33:51 +01:00
kempersc	3c80de87e0	add isil entries	2025-11-19 23:25:22 +01:00
kempersc	e5a532a8bc	Add comprehensive tests for NLP institution extraction and RDF partnership integration - Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive). - Added tests for extracted entities and result handling to validate the extraction process. - Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format. - Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns. - Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.	2025-11-19 23:20:47 +01:00

49 commits
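
Commit 9a395f3dbe lists three rules for pulling a birth year out of an email address: skip trailing YYYYMMDD and YYMMDD dates, skip digit runs longer than four characters, and require a non-digit before a trailing four-digit year. The repository code is not shown here, so the following is only a minimal sketch of how those rules might combine. The function name `extract_birth_year` and the plausibility window `MIN_YEAR`/`MAX_YEAR` are hypothetical and do not come from the commit.

```python
import re
from typing import Optional

# Hypothetical plausibility window; the commit message does not state one.
MIN_YEAR = 1930
MAX_YEAR = 2010

# The maximal run of ASCII digits at the very end of the local part.
_TRAILING_DIGITS = re.compile(r"[0-9]+$")


def extract_birth_year(email: str) -> Optional[int]:
    """Return a plausible birth year from the end of the local part, else None."""
    local_part = email.split("@", 1)[0]
    match = _TRAILING_DIGITS.search(local_part)
    if match is None:
        return None  # no trailing digits at all

    digits = match.group(0)
    # The run is maximal, so whatever precedes it is a non-digit or the start.
    # Requiring exactly four digits therefore rejects YYYYMMDD (8 digits),
    # YYMMDD (6 digits) and any other run longer than four characters.
    if len(digits) != 4:
        return None

    year = int(digits)
    if MIN_YEAR <= year <= MAX_YEAR:
        return year
    return None


if __name__ == "__main__":
    for address in (
        "jan.devries1985@example.nl",
        "jan19850312@example.nl",
        "jan850312@example.nl",
        "info@example.nl",
    ):
        print(address, extract_birth_year(address))
```

Taking the whole trailing digit run first and then checking its length covers all three rules with one test, because a date suffix can never be mistaken for a four-digit year once the run is measured as a whole.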