Commit graph

18 commits

Author SHA1 Message Date
kempersc
3c4f7acf87 test(archief-assistent): update E2E tests for entity extraction cache
- Simplify cache spec assertions after structured matching implementation
- Refactor map-panel spec for better test isolation and reliability
- Remove redundant geographic false positive tests (handled by entity extraction)
2026-01-10 12:55:22 +01:00
kempersc
7fbff2ff5f feat(archief-assistent): add entity extraction to semantic cache
Prevent geographic false positives in cache lookups. Queries like
"musea in Amsterdam" vs "musea in Noord-Holland" have ~93%
embedding similarity but completely different answers.

Changes:
- Add ExtractedEntities interface for structured cache keys
- Implement fast entity extraction (<5ms, no LLM) with regex patterns
- Extract institution types (GLAMORCUBESFIXPHDNT), locations, and intent
- Generate structured cache keys (e.g., "count:M:amsterdam")
- Raise similarity threshold from 0.85 to 0.97 to match backend DSPy
- Add 'structured' match method to CacheLookupResult

The entity extractor recognizes:
- 19 institution types (Dutch + English patterns)
- 12 Dutch provinces with ISO 3166-2:NL codes
- Major Dutch cities with settlement codes
- Query intents (count, list, info)

This ensures geographic queries get different cache entries even when
embeddings are highly similar.
2026-01-10 10:33:21 +01:00
kempersc
519b0b47a8 Add Playwright test results JSON file with initial test suite and failure details 2026-01-09 21:33:31 +01:00
kempersc
004d342935 chore: minor updates and evaluation results
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
2026-01-09 21:10:55 +01:00
kempersc
ea35da02dc test(archief-assistent): add Playwright E2E test suite
- Add chat.spec.ts for RAG query testing
- Add count-queries.spec.ts for aggregation validation
- Add map-panel.spec.ts for geographic feature testing
- Add cache.spec.ts for response caching verification
- Add auth.setup.ts for authentication handling
- Configure playwright.config.ts for multi-browser testing
- Tests run against production archief.support
2026-01-09 21:09:56 +01:00
kempersc
97f85e0050 deps(archief-assistent): add playwright for E2E testing
- Add @playwright/test as dev dependency
- Alphabetize dependencies list
2026-01-09 21:06:12 +01:00
kempersc
9e67d0f967 enrich profiles 2026-01-09 20:35:19 +01:00
kempersc
c88fd3af70 Refactor code structure for improved readability and maintainability 2026-01-09 11:05:26 +01:00
kempersc
6608a207d4 update frontend 2026-01-08 15:56:28 +01:00
kempersc
98c42bf272 Fix LinkML URI conflicts and generate RDF outputs
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
2026-01-07 12:32:59 +01:00
kempersc
2dca28d8c1 enrich CH entries with mission statements 2026-01-04 13:12:32 +01:00
kempersc
4f0cafe98a enrich HC profiles 2026-01-02 02:11:04 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
84904e344b Make AGENTS more succint by referring to opencode rules & enrich custodians 2025-12-28 14:56:35 +01:00
kempersc
cdb633b0c9 enrich custodian entries with logo 2025-12-27 02:15:17 +01:00
kempersc
d5f2f542ce feat(archief-assistent): preserve SPARQL queries in semantic cache
- Add sparqlQuery field to CachedResponse interface
- Extract SPARQL query before cache storage (not after)
- Include sparqlQuery in cache HIT message objects
- Handle both snake_case (server) and camelCase field names

SPARQL queries are now displayed for both fresh API responses
and cached responses, improving debugging and transparency.
2025-12-24 22:26:22 +01:00
kempersc
0c1d19e98b enrich entries 2025-12-23 13:27:35 +01:00
kempersc
aca68ea47f remove a,bihguous web-claims 2025-12-21 00:01:54 +01:00