Commit graph

26 commits

Author SHA1 Message Date
kempersc
b8914761b8 standardise slots 2026-01-14 09:51:14 +01:00
kempersc
92b490d690 edit slots 2026-01-13 20:35:11 +01:00
kempersc
f74513e8ef feat: Enhance entity resolution with email semantics and review merging
- Updated `entity_review.py` to map email semantic fields from JSON.
- Expanded `email_semantics.py` with additional museum mappings.
- Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings.
- Added a backup JSON file for entity resolution candidates.
- Created `enrich_email_semantics.py` to enrich candidates with email semantic signals.
- Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.
2026-01-13 16:43:56 +01:00
kempersc
1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology
2026-01-13 13:51:02 +01:00
kempersc
3b35f4aea5 Refactor code structure for improved readability and maintainability 2026-01-12 18:31:31 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
4f0cafe98a enrich HC profiles 2026-01-02 02:11:04 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
84904e344b Make AGENTS more succint by referring to opencode rules & enrich custodians 2025-12-28 14:56:35 +01:00
kempersc
ca219340f2 enrich entries 2025-12-26 14:30:31 +01:00
kempersc
0860f6094d fix(sparql): correct ontology in dspy_sparql.py to match actual RDF data
- Use crm:E39_Actor instead of glam:HeritageCustodian
- Use hc:institutionType with single-letter codes (M, L, A, etc.)
- Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.)
- Use skos:prefLabel for institution names
- Update ONTOLOGY_CONTEXT with correct examples
2025-12-22 22:22:07 +01:00
kempersc
7a056fa746 enrich entries 2025-12-21 22:12:34 +01:00
kempersc
aca68ea47f remove a,bihguous web-claims 2025-12-21 00:01:54 +01:00
kempersc
99430c2a70 add new entries and semantic routing 2025-12-17 10:11:56 +01:00
kempersc
52ae711c56 add timespans 2025-12-16 09:02:52 +01:00
kempersc
cb56aa7e40 enrich all custodian timespan 2025-12-15 22:31:41 +01:00
kempersc
d9892dba6f fix: handle single-vector Qdrant collections and multi-collection embedding dimensions
- Fixed _vector_search() to check uses_named_vectors() before adding 'using' parameter
- Fixed _person_vector_search() to detect person collection vector size and use appropriate model
- Resolves 'Not existing vector name error: openai_1536' for single-vector collections
- Resolves embedding dimension mismatch between heritage_custodians (1536-dim) and heritage_persons (384-dim)
2025-12-15 10:31:39 +01:00
kempersc
3820f2fc92 chore: Add data reports, infra scripts, and API updates
- Data quality reports for Dutch custodians
- Name mismatch detection reports
- Failed crawl URL tracking
- Caddy configuration updates
- Monitor script for chunk 404 errors
- API endpoint improvements
2025-12-15 01:48:08 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
505c12601a Add test script for PiCo extraction from Arabic waqf documents
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
2025-12-12 17:50:17 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
1b1cfbfca0 enrich custodians 2025-12-11 22:32:09 +01:00
kempersc
b61271220b enrich entries 2025-12-09 10:46:43 +01:00
kempersc
bf7c773955 edit Japanese entries 2025-12-09 09:16:19 +01:00
kempersc
131e3ca259 normalise custodian entries 2025-12-09 07:56:35 +01:00
kempersc
1635625032 added web annotations 2025-12-06 19:50:04 +01:00