Commit graph

6 commits

Author SHA1 Message Date
kempersc
b30711fcfb update slots 2026-01-14 09:05:54 +01:00
kempersc
9a395f3dbe fix: improve birth year extraction to avoid date suffix false positives
- Skip YYYYMMDD and YYMMDD date patterns at end of email
- Skip digit sequences longer than 4 characters
- Require non-digit before 4-digit years at end
- Add knid.nl/kabelnoord.nl to consumer domains (Friesland ISP)
- Add 11 missing regional archive domains to HERITAGE_DOMAIN_MAP
- Update recalculation script to re-extract email semantics

Results:
- 3,151 false birth years removed
- 'Likely wrong person' reduced from 533 to 325 (-39%)
- 2,944 candidates' scores boosted
2026-01-13 22:37:10 +01:00
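
The three extraction rules in commit 9a395f3dbe can be illustrated with a minimal sketch. It is not the repository's implementation: the function name `extract_birth_year` and the plausible-year bounds `MIN_YEAR`/`MAX_YEAR` are assumptions made for illustration, and two-digit years are deliberately left out.

```python
import re
from typing import Optional

# Hypothetical bounds for a plausible birth year; the commit does not state them.
MIN_YEAR, MAX_YEAR = 1920, 2010

# A 4-digit group at the very end of the local part that is NOT preceded by
# another digit. The lookbehind rejects longer digit runs, which covers
# YYYYMMDD (8 digits), YYMMDD (6 digits) and any other run longer than 4.
_TRAILING_YEAR = re.compile(r"(?<!\d)(\d{4})$")


def extract_birth_year(email: str) -> Optional[int]:
    """Return a plausible birth year from the end of the local part, else None."""
    local_part = email.split("@", 1)[0]
    match = _TRAILING_YEAR.search(local_part)
    if match is None:
        return None
    year = int(match.group(1))
    return year if MIN_YEAR <= year <= MAX_YEAR else None


if __name__ == "__main__":
    for address in ("jan.devries1985@example.nl",   # year suffix
                    "jan19850312@example.nl",       # YYYYMMDD suffix
                    "jan850312@example.nl",         # YYMMDD suffix
                    "jan12345@example.nl"):         # 5-digit run
        print(address, extract_birth_year(address))
```

Only the first address yields a year; the date-like and over-long suffixes return `None`, which is the class of false positive the commit describes removing.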
kempersc
833bb56833 feat(entity-resolution): expand consumer email domain list
Add additional Dutch ISP domains for better filtering:
- gmail.nl, icloud.nl, aol.nl, aol.com
- telfortglasvezel.nl, worldonline.nl, delta.nl, lijbrandt.nl
- t-mobilethuis.nl, compaqnet.nl, filternet.nl, onsmail.nl, box.nl
- mailinator.com (disposable email)
2026-01-13 20:54:34 +01:00
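
A consumer-domain filter of the kind commit 833bb56833 expands might look like the sketch below. The set holds only a subset of the domains named in the commit message, and the names `CONSUMER_DOMAINS` and `is_consumer_email` are assumptions, not identifiers confirmed by the source.

```python
# Subset of the domains listed in the commit message; the real list is longer.
CONSUMER_DOMAINS = frozenset({
    "gmail.nl", "icloud.nl", "aol.nl", "aol.com",
    "worldonline.nl", "delta.nl", "box.nl",
    "mailinator.com",  # disposable email
})


def is_consumer_email(email: str) -> bool:
    """True when the address uses a known consumer or disposable domain."""
    _, separator, domain = email.rpartition("@")
    if not separator:
        # No "@" at all: not a usable email address.
        return False
    return domain.strip().lower() in CONSUMER_DOMAINS


if __name__ == "__main__":
    print(is_consumer_email("someone@Delta.nl"))       # consumer ISP domain
    print(is_consumer_email("someone@kadaster.nl"))    # institutional domain
```

An address on a consumer domain carries no information about the person's employer, so such matches would presumably be excluded from institution-based signals.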
kempersc
6a3616beac feat(entity-resolution): expand Dutch heritage domain mappings
Add domain mappings for better email-based entity matching:
- Government: noord-holland.nl, amsterdam.nl, rotterdam.nl, denhaag.nl,
  hoorn.nl, hhnk.nl, rijksoverheid.nl, politie.nl, kadaster.nl, rvo.nl,
  rivm.nl, staatsbosbeheer.nl, vng.nl
- Museums: maritiemmuseum.nl, paleishetloo.nl, slotloevestein.nl
- Universities: student.vu.nl, cdh.leidenuniv.nl, jur.ru.nl, student.ru.nl,
  student.tudelft.nl, eshcc.eur.nl, wur.nl, ou.nl
- Hogescholen: hva.nl, student.hu.nl, student.fontys.nl

Also remove deprecated activity_id.yaml slot file
2026-01-13 20:53:49 +01:00
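
As a rough sketch of how a domain mapping like the one in commit 6a3616beac could be consulted: the dictionary below maps a few of the listed domains to the category headings used in the commit message. The real `HERITAGE_DOMAIN_MAP` values are not shown in the source, and the parent-domain fallback is an assumption (the commit lists subdomains such as `student.vu.nl` explicitly, so the original may well use exact matching only).

```python
from typing import Optional

# Illustrative entries only; values here are category labels, which is an
# assumption about the shape of the real map.
HERITAGE_DOMAIN_MAP = {
    "amsterdam.nl": "government",
    "kadaster.nl": "government",
    "maritiemmuseum.nl": "museum",
    "paleishetloo.nl": "museum",
    "student.vu.nl": "university",
    "wur.nl": "university",
    "hva.nl": "hogeschool",
}


def lookup_heritage_category(email: str) -> Optional[str]:
    """Return the mapped category for the email's domain, else None."""
    _, separator, domain = email.rpartition("@")
    if not separator:
        return None
    labels = domain.strip().lower().split(".")
    # Try the full domain first, then progressively shorter parent domains,
    # stopping before the bare top-level domain.
    for start in range(len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in HERITAGE_DOMAIN_MAP:
            return HERITAGE_DOMAIN_MAP[candidate]
    return None


if __name__ == "__main__":
    print(lookup_heritage_category("a@student.vu.nl"))
    print(lookup_heritage_category("b@library.wur.nl"))
    print(lookup_heritage_category("c@example.com"))
```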
kempersc
f74513e8ef feat: Enhance entity resolution with email semantics and review merging
- Updated `entity_review.py` to map email semantic fields from JSON.
- Expanded `email_semantics.py` with additional museum mappings.
- Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings.
- Added a backup JSON file for entity resolution candidates.
- Created `enrich_email_semantics.py` to enrich candidates with email semantic signals.
- Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.
2026-01-13 16:43:56 +01:00
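
The review-merging step mentioned in commit f74513e8ef (carrying reviewed decisions from a backup into freshly generated candidates) could be sketched as follows. The field names `candidate_id` and `review` and the function `merge_reviews` are hypothetical; the actual JSON layout used by `merge_entity_reviews.py` is not shown in the source.

```python
from typing import Any, Dict, List

Candidate = Dict[str, Any]


def merge_reviews(backup: List[Candidate], fresh: List[Candidate]) -> List[Candidate]:
    """Copy existing review decisions from `backup` onto matching `fresh` candidates."""
    # Index only the backup candidates that actually carry a decision.
    reviewed = {
        cand["candidate_id"]: cand["review"]
        for cand in backup
        if cand.get("review") is not None
    }
    merged = []
    for cand in fresh:
        out = dict(cand)  # shallow copy so the input list is left untouched
        if out.get("review") is None and cand["candidate_id"] in reviewed:
            out["review"] = reviewed[cand["candidate_id"]]
        merged.append(out)
    return merged


if __name__ == "__main__":
    backup = [{"candidate_id": "c1", "review": "match"},
              {"candidate_id": "c2", "review": None}]
    fresh = [{"candidate_id": "c1", "score": 0.9},
             {"candidate_id": "c3", "score": 0.4}]
    print(merge_reviews(backup, fresh))
```

Keying on a stable identifier means regenerated scores and signals are kept from the new candidates, while manual decisions survive the regeneration.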
kempersc
1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology
2026-01-13 13:51:02 +01:00