Prevent geographic false positives in cache lookups. Queries like "musea in Amsterdam" vs "musea in Noord-Holland" have ~93% embedding similarity but completely different answers. Changes: - Add ExtractedEntities interface for structured cache keys - Implement fast entity extraction (<5ms, no LLM) with regex patterns - Extract institution types (GLAMORCUBESFIXPHDNT), locations, and intent - Generate structured cache keys (e.g., "count:M:amsterdam") - Raise similarity threshold from 0.85 to 0.97 to match backend DSPy - Add 'structured' match method to CacheLookupResult The entity extractor recognizes: - 19 institution types (Dutch + English patterns) - 12 Dutch provinces with ISO 3166-2:NL codes - Major Dutch cities with settlement codes - Query intents (count, list, info) This ensures geographic queries get different cache entries even when embeddings are highly similar. |
||
|---|---|---|
| .. | ||
| BrowsePage.tsx | ||
| ChatPage.tsx | ||
| LoginPage.tsx | ||
| MapPage.tsx | ||
| OntologyPage.tsx | ||
| RulesPage.tsx | ||
| StatsPage.tsx | ||