Prevent geographic false positives in cache lookups. Queries like "musea in Amsterdam" vs "musea in Noord-Holland" have ~93% embedding similarity but completely different answers. Changes: - Add ExtractedEntities interface for structured cache keys - Implement fast entity extraction (<5ms, no LLM) with regex patterns - Extract institution types (GLAMORCUBESFIXPHDNT), locations, and intent - Generate structured cache keys (e.g., "count:M:amsterdam") - Raise similarity threshold from 0.85 to 0.97 to match backend DSPy - Add 'structured' match method to CacheLookupResult The entity extractor recognizes: - 19 institution types (Dutch + English patterns) - 12 Dutch provinces with ISO 3166-2:NL codes - Major Dutch cities with settlement codes - Query intents (count, list, info) This ensures geographic queries get different cache entries even when embeddings are highly similar. |
||
|---|---|---|
| .. | ||
| backend | ||
| e2e | ||
| node_modules | ||
| playwright-report | ||
| public | ||
| src | ||
| test-results | ||
| tests | ||
| deploy-backend.sh | ||
| index.html | ||
| package.json | ||
| playwright-results.json | ||
| playwright.config.ts | ||
| tsconfig.app.json | ||
| tsconfig.json | ||
| tsconfig.node.json | ||
| vite.config.ts | ||