# Rule 46: Ontology-Driven Cache Segmentation

🚨 **CRITICAL**: The semantic cache MUST use vocabulary derived from LinkML `*Type.yaml` and `*Types.yaml` schema files to extract entities for cache key generation. Hardcoded regex patterns are deprecated.

**Status**: Implemented (Evolved v2.0)
**Version**: 2.0 (Epistemological Evolution)
**Updated**: 2026-01-10

## Evolution Overview

Rule 46 v2.0 incorporates insights from Volodymyr Pavlyshyn's work on agentic memory systems:

1. **Epistemic Provenance** (Phase 1) - Track WHERE, WHEN, HOW data originated
2. **Topological Distance** (Phase 2) - Use ontology structure, not just embeddings
3. **Holarchic Cache** (Phase 3) - Entries as holons with up/down links
4. **Message Passing** (Phase 4, planned) - Smalltalk-style introspectable cache
5. **Clarity Trading** (Phase 5, planned) - Block ambiguous queries from cache

## Epistemic Provenance

Every cached response carries epistemological metadata:

```typescript
interface EpistemicProvenance {
  // Where the data came from (further source values are elided in this rule: ...)
  dataSource: 'ISIL_REGISTRY' | 'WIKIDATA' | 'CUSTODIAN_YAML' | 'LLM_INFERENCE';
  dataTier: 1 | 2 | 3 | 4;        // TIER_1_AUTHORITATIVE → TIER_4_INFERRED
  sourceTimestamp: string;        // When the source data was captured
  derivationChain: string[];      // ["SPARQL:Qdrant", "RAG:retrieve", "LLM:generate"]
  revalidationPolicy: 'static' | 'daily' | 'weekly' | 'on_access';
}
```

**Benefit**: Users see "This answer is from TIER_1 ISIL registry data, captured 2025-01-08".

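As a minimal sketch of how this metadata could be consumed, the snippet below renders the user-facing message and decides whether an entry is due for revalidation. The helper names (`describeProvenance`, `needsRevalidation`) and the maximum ages attached to each policy are assumptions made for illustration; the rule itself does not define them.

```typescript
// Hypothetical helpers; names and policy semantics are illustrative assumptions.
type DataTier = 1 | 2 | 3 | 4;
type RevalidationPolicy = 'static' | 'daily' | 'weekly' | 'on_access';

interface ProvenanceSummary {
  dataSource: string;
  dataTier: DataTier;
  sourceTimestamp: string; // ISO 8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed maximum age per policy: 'static' never expires, 'on_access' always does.
const MAX_AGE_MS: Record<RevalidationPolicy, number> = {
  static: Infinity,
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
  on_access: 0,
};

// Build the sentence shown to the user from the provenance fields.
function describeProvenance(p: ProvenanceSummary): string {
  const capturedOn = p.sourceTimestamp.slice(0, 10); // keep the YYYY-MM-DD part
  return `This answer is from TIER_${p.dataTier} ${p.dataSource} data, captured ${capturedOn}`;
}

// True when the entry is at least as old as its policy allows.
function needsRevalidation(
  policy: RevalidationPolicy,
  sourceTimestamp: string,
  now: Date,
): boolean {
  const ageMs = now.getTime() - Date.parse(sourceTimestamp);
  return ageMs >= MAX_AGE_MS[policy];
}
```

Keeping the policy-to-age mapping in a single table means a new policy value only needs one extra entry there.
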
## Topological Distance

Beyond embedding similarity, cache matching considers **structural distance** in the type hierarchy:

```
                   HeritageCustodian (*)
                             │
          ┌──────────────────┼──────────────────┐
          ▼                  ▼                  ▼
   MuseumType (M)     ArchiveType (A)    LibraryType (L)
          │                  │                  │
     ┌────┴────┐        ┌────┴────┐        ┌────┴────┐
     ▼         ▼        ▼         ▼        ▼         ▼
 ArtMuseum  History Municipal   State    Public   Academic
```

**Combined Similarity Formula**:
```typescript
const finalScore = 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance);
```

**Benefit**: A cached "art museum" answer is penalized when the query asks about a "natural history museum". Even at 95% embedding similarity, the topological term lowers the final score, so the pair can fall below the match threshold.

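The formula can be sketched as follows. The rule does not say how `topologicalDistance` is computed or normalized, so this example assumes one plausible choice: the number of edges between two types via their lowest common ancestor, divided by the longest possible path in a tree of depth `maxDepth`. The `PARENT` table is a hand-written fragment of the hierarchy above, not data loaded from the schema.

```typescript
// Parent links for a fragment of the hierarchy shown above (illustrative only).
const PARENT: Record<string, string | null> = {
  HeritageCustodian: null,
  MuseumType: 'HeritageCustodian',
  ArchiveType: 'HeritageCustodian',
  ArtMuseum: 'MuseumType',
  History: 'MuseumType',
  Municipal: 'ArchiveType',
};

// Walk from a node up to the root, collecting every node on the way.
function pathToRoot(node: string): string[] {
  const path: string[] = [];
  let current: string | null = node;
  while (current !== null) {
    path.push(current);
    current = PARENT[current] ?? null;
  }
  return path;
}

// Assumed normalization: edges via the lowest common ancestor / (2 * maxDepth).
function topologicalDistance(a: string, b: string, maxDepth = 2): number {
  const stepsFromA = new Map(pathToRoot(a).map((n, i) => [n, i] as const));
  const pathB = pathToRoot(b);
  for (let j = 0; j < pathB.length; j++) {
    const i = stepsFromA.get(pathB[j]);
    if (i !== undefined) return Math.min(1, (i + j) / (2 * maxDepth));
  }
  return 1; // no common ancestor: treat as maximally distant
}

// Weighted combination from the rule: 70% embedding, 30% ontology structure.
function finalScore(embeddingSimilarity: number, distance: number): number {
  return 0.7 * embeddingSimilarity + 0.3 * (1 - distance);
}
```

Under these assumptions, two museum subtypes with 0.95 embedding similarity score about 0.815, and a museum subtype paired with an archive subtype scores about 0.665, so a threshold between those values would accept neither or only the closer pair depending on where it is set.
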
## Holarchic Cache Structure

Cache entries are **holons** - simultaneously complete AND parts of aggregates:

| Level | Example | Aggregates |
|-------|---------|------------|
| Micro | "Rijksmuseum details" | None |
| Meso | "Museums in Amsterdam" | List of micro holons |
| Macro | "Heritage in Noord-Holland" | All meso holons in region |

```typescript
interface CachedQuery {
  // ... existing fields ...
  holonLevel?: 'micro' | 'meso' | 'macro';
  participatesIn?: string[];  // Higher-level cache keys
  aggregates?: string[];      // Lower-level entries
}
```

## Problem Statement

The ArchiefAssistent semantic cache prevents geographic false positives using entity extraction:

```
Query: "Hoeveel musea in Amsterdam?"
Cached: "Hoeveel musea in Noord-Holland?"
Result: BLOCKED (location mismatch) ✅
```

However, the current implementation uses **hardcoded regex patterns**:

```typescript
// DEPRECATED: Hardcoded patterns in semantic-cache.ts
const INSTITUTION_PATTERNS: Record<InstitutionTypeCode, RegExp> = {
  M: /\b(muse(um|a|ums?)|musea)/i,
  A: /\b(archie[fv]en?|archives?|archief)/i,
  // ... 19 patterns to maintain manually
};
```

**Problems with hardcoded patterns**:
1. **Maintenance burden** - Every new institution type requires code changes
2. **Missing subtypes** - "kunstmuseum" vs "museum" should cache separately
3. **No multilingual support** - Only Dutch/English, misses German/French labels
4. **Duplication** - Same vocabulary exists in LinkML schemas
5. **No record type awareness** - "burgerlijke stand" queries mixed with general archive queries

## Solution: Schema-Derived Vocabulary

The LinkML schema already contains rich vocabulary:

| Schema File | Content | Cache Utility |
|-------------|---------|---------------|
| `CustodianType.yaml` | 19 top-level types | Primary segmentation (M/A/L/G...) |
| `MuseumType.yaml` | 187+ museum subtypes | Subtype segmentation |
| `ArchiveOrganizationType.yaml` | 144+ archive subtypes | Subtype segmentation |
| `*RecordSetTypes.yaml` | Record type taxonomies | Finding aids specificity |

### Vocabulary Sources in Schema

1. **`type_label`** - Multilingual labels via `skos:prefLabel`
2. **`structured_aliases`** - Language-tagged alternative names
3. **`keywords`** - Search terms for entity recognition
4. **`wikidata_entity`** - Linked Data identifiers

## Architecture

### Overview: Two-Tier Embedding Hierarchy

The system uses a **hierarchical embedding approach** for fast semantic routing:

1. **Tier 1: Types File Embeddings** - Which category? (Museum vs Archive vs Library)
2. **Tier 2: Individual Type Embeddings** - Which specific type? (ArtMuseum vs NaturalHistoryMuseum)

```
┌─────────────────────────────────────────────────────────────────────────┐
│  BUILD TIME: Extract vocabulary + generate embeddings                   │
│                                                                         │
│  schemas/20251121/linkml/modules/classes/*Type.yaml                     │
│  schemas/20251121/linkml/modules/classes/*Types.yaml                    │
│                          ↓                                              │
│  scripts/extract-types-vocab.ts                                         │
│                          ↓                                              │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  types-vocab.json                                                 │  │
│  │  ├── tier1Embeddings: { MuseumType: [...], ArchiveType: [...] }   │  │
│  │  ├── tier2Embeddings: { ArtMuseum: [...], MunicipalArchive: [...]}│  │
│  │  └── termLog: { "kunstmuseum": { type: "M", subtype: "ART_MUSEUM"}│  │
│  └───────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼ (loaded at runtime)
┌─────────────────────────────────────────────────────────────────────────┐
│  RUNTIME: Two-Tier Semantic Routing                                     │
│                                                                         │
│  Query: "Hoeveel gemeentearchieven in Amsterdam?"                       │
│                          ↓                                              │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  TIER 1: Types File Selection                                   │    │
│  │  Query embedding vs Tier1 embeddings (19 categories)            │    │
│  │  Result: ArchiveOrganizationType (similarity: 0.89)             │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                          ↓                                              │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  TIER 2: Specific Type Selection                                │    │
│  │  Query embedding vs Tier2 embeddings (144 archive subtypes)     │    │
│  │  Result: MunicipalArchive (similarity: 0.94)                    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                          ↓                                              │
│  Structured cache key: "count:a.municipal_archive:amsterdam"            │
└─────────────────────────────────────────────────────────────────────────┘
```

### Tier 1: Types File Embeddings

Each Types file (e.g., `MuseumType.yaml`, `ArchiveOrganizationType.yaml`) gets ONE embedding
representing the **accumulated vocabulary** of all types within that file.

**Embedding Text Construction**:
```
MuseumType: museum musea kunstmuseum art museum natural history museum
            science museum open-air museum ecomuseum virtual museum
            heritage farm national museum regional museum university museum
            [... all keywords from all 187 subtypes ...]
```

**Purpose**: Fast first-pass filter to identify which GLAMORCUBESFIXPHDNT category the query relates to.

| Types File | Code | Accumulated Terms Count |
|------------|------|------------------------|
| MuseumType | M | ~500+ terms from 187 subtypes |
| ArchiveOrganizationType | A | ~400+ terms from 144 subtypes |
| LibraryType | L | ~200+ terms from subtypes |
| GalleryType | G | ~100+ terms from subtypes |
| ... | ... | ... |

### Tier 2: Individual Type Embeddings

Each **specific type** within a Types file gets its own embedding from its accumulated terms.

**Embedding Text Construction**:
```
MunicipalArchive: gemeentearchief stadsarchief city archive municipal archive
                  town archive local government records burgerlijke stand
                  bevolkingsregister council minutes building permits
                  [... all keywords + structured_aliases + labels ...]
```

**Purpose**: Precise subtype identification after Tier 1 narrows the category.

### Term Log Structure

A lookup table mapping every extracted term to its type/subtype:

```json
{
  "termLog": {
    "kunstmuseum": {
      "typeCode": "M",
      "typeName": "MuseumType",
      "subtypeName": "ART_MUSEUM",
      "wikidata": "Q207694",
      "language": "nl"
    },
    "art museum": {
      "typeCode": "M",
      "typeName": "MuseumType",
      "subtypeName": "ART_MUSEUM",
      "wikidata": "Q207694",
      "language": "en"
    },
    "gemeentearchief": {
      "typeCode": "A",
      "typeName": "ArchiveOrganizationType",
      "subtypeName": "MUNICIPAL_ARCHIVE",
      "wikidata": "Q8362876",
      "language": "nl"
    }
  }
}
```

**Purpose**:
1. Fast O(1) keyword lookup (no embedding needed for exact matches)
2. Audit trail of which terms map to which types
3. Debugging which queries match which types

### Runtime Lookup Strategy

```typescript
async function extractEntitiesWithEmbeddings(query: string): Promise<ExtractedEntities> {
  const vocab = await loadTypesVocabulary();
  const normalized = query.toLowerCase();

  // FAST PATH: Check termLog for keyword matches.
  // Prefer the longest matching term so that "kunstmuseum" wins over the
  // generic "museum" it contains.
  const matchedTerm = Object.keys(vocab.termLog)
    .filter((term) => normalized.includes(term))
    .sort((a, b) => b.length - a.length)[0];

  if (matchedTerm) {
    const mapping = vocab.termLog[matchedTerm];
    return {
      institutionType: mapping.typeCode,
      institutionSubtype: mapping.subtypeName ?? null,
      recordSetType: mapping.recordSetType ?? null,
      subtypeWikidata: mapping.wikidata ?? null,
      // ... location and intent extraction
    };
  }

  // SLOW PATH: Embedding-based semantic matching
  const queryEmbedding = await generateEmbedding(query);

  // Tier 1: Find best matching Types file
  let bestType: string | null = null;
  let bestTypeSimilarity = 0;
  for (const [typeName, typeEmbedding] of Object.entries(vocab.tier1Embeddings)) {
    const similarity = cosineSimilarity(queryEmbedding, typeEmbedding);
    if (similarity > bestTypeSimilarity && similarity > 0.7) {
      bestTypeSimilarity = similarity;
      bestType = typeName;
    }
  }

  if (!bestType) return {}; // No type matched

  // Tier 2: Find best matching subtype within the Types file.
  // tier1Embeddings is keyed by class name ("MuseumType") while institutionTypes
  // is keyed by type code ("M"), so resolve the code via className.
  const typeEntry = Object.values(vocab.institutionTypes)
    .find((entry) => entry.className === bestType);
  if (!typeEntry) return {}; // Class name not present in the vocabulary
  const typeCode = typeEntry.code;

  let bestSubtype: string | null = null;
  let bestSubtypeSimilarity = 0;

  for (const [subtypeName, subtypeEmbedding] of Object.entries(vocab.tier2Embeddings[typeCode] || {})) {
    const similarity = cosineSimilarity(queryEmbedding, subtypeEmbedding);
    if (similarity > bestSubtypeSimilarity && similarity > 0.75) {
      bestSubtypeSimilarity = similarity;
      bestSubtype = subtypeName;
    }
  }

  return {
    institutionType: typeCode,
    institutionSubtype: bestSubtype,
    // ... location and intent extraction
  };
}
```

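The lookup above relies on a `cosineSimilarity` helper that this rule does not show. A minimal sketch of such a helper follows; returning 0 for a zero-length vector and throwing on mismatched dimensions are choices made here, not behavior taken from `semantic-cache.ts`.

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Result lies in [-1, 1]; 1 means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumed convention: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check is useful in this setup because the query embedding is produced at runtime while the type embeddings come from a file generated earlier, possibly with a different model.
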
### Embedding Model Choice

For build-time embedding generation, use the same model as the semantic cache:

| Option | Model | Dimensions | Quality |
|--------|-------|------------|---------|
| **Primary** | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 384 | Good multilingual |
| Fallback | `all-MiniLM-L6-v2` | 384 | English-focused |
| High Quality | `multilingual-e5-large` | 1024 | Best multilingual |

**Build-time generation**: Type and subtype embeddings are generated ONCE at build time and stored in JSON.
At runtime only the query itself needs an embedding, and only when the termLog fast path finds no match.

## TypesVocabulary JSON Structure

Generated at build time with **pre-computed embeddings**:

```json
{
  "version": "2026-01-10T12:00:00Z",
  "schemaVersion": "20251121",
  "embeddingModel": "paraphrase-multilingual-MiniLM-L12-v2",
  "embeddingDimensions": 384,

  "tier1Embeddings": {
    "MuseumType": [0.023, -0.045, 0.087, ...],
    "ArchiveOrganizationType": [0.012, 0.056, -0.034, ...],
    "LibraryType": [-0.034, 0.089, 0.012, ...],
    "GalleryType": [0.045, -0.023, 0.067, ...]
  },

  "tier2Embeddings": {
    "M": {
      "ART_MUSEUM": [0.034, -0.056, 0.078, ...],
      "NATURAL_HISTORY_MUSEUM": [0.045, 0.023, -0.089, ...],
      "SCIENCE_MUSEUM": [0.067, -0.012, 0.045, ...]
    },
    "A": {
      "MUNICIPAL_ARCHIVE": [0.089, 0.034, -0.056, ...],
      "NATIONAL_ARCHIVE": [0.012, -0.078, 0.045, ...],
      "CHURCH_ARCHIVE": [-0.023, 0.067, 0.034, ...]
    }
  },

  "termLog": {
    "kunstmuseum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "language": "nl"},
    "art museum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "language": "en"},
    "gemeentearchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "language": "nl"},
    "stadsarchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "language": "nl"},
    "city archive": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "language": "en"},
    "burgerlijke stand": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "language": "nl"},
    "geboorteakte": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "language": "nl"}
  },

  "institutionTypes": {
    "M": {
      "code": "M",
      "className": "MuseumType",
      "baseWikidata": "Q33506",
      "accumulatedTerms": "museum musea kunstmuseum art museum natural history museum science museum open-air museum ecomuseum virtual museum heritage farm national museum regional museum university museum...",
      "keywords": {
        "nl": ["museum", "musea"],
        "en": ["museum", "museums"],
        "de": ["Museum", "Museen"]
      },
      "subtypes": {
        "ART_MUSEUM": {
          "className": "ArtMuseum",
          "wikidata": "Q207694",
          "accumulatedTerms": "kunstmuseum art museum kunstmusea art museums fine art museum visual arts museum painting gallery sculpture museum",
          "keywords": {
            "nl": ["kunstmuseum", "kunstmusea"],
            "en": ["art museum", "art museums"]
          }
        },
        "NATURAL_HISTORY_MUSEUM": {
          "className": "NaturalHistoryMuseum",
          "wikidata": "Q559049",
          "accumulatedTerms": "natuurhistorisch museum natuurmuseum natural history museum science museum fossils taxidermy specimens geology biology",
          "keywords": {
            "nl": ["natuurhistorisch museum", "natuurmuseum"],
            "en": ["natural history museum"]
          }
        }
      }
    },
    "A": {
      "code": "A",
      "className": "ArchiveOrganizationType",
      "baseWikidata": "Q166118",
      "accumulatedTerms": "archief archieven archive archives gemeentearchief stadsarchief nationaal archief rijksarchief church archive company archive film archive...",
      "keywords": {
        "nl": ["archief", "archieven"],
        "en": ["archive", "archives"]
      },
      "subtypes": {
        "MUNICIPAL_ARCHIVE": {
          "className": "MunicipalArchive",
          "wikidata": "Q8362876",
          "accumulatedTerms": "gemeentearchief stadsarchief municipal archive city archive town archive local government records civil registry population register building permits council minutes",
          "keywords": {
            "nl": ["gemeentearchief", "stadsarchief", "gemeentelijke archiefdienst"],
            "en": ["municipal archive", "city archive", "town archive"]
          }
        },
        "NATIONAL_ARCHIVE": {
          "className": "NationalArchive",
          "wikidata": "Q1188452",
          "accumulatedTerms": "nationaal archief rijksarchief national archive state archive government records national records federal archive",
          "keywords": {
            "nl": ["nationaal archief", "rijksarchief"],
            "en": ["national archive", "state archive"]
          }
        }
      }
    }
  },

  "recordSetTypes": {
    "CIVIL_REGISTRY": {
      "className": "CivilRegistrySeries",
      "accumulatedTerms": "burgerlijke stand geboorteakte huwelijksakte overlijdensakte bevolkingsregister civil registry birth records marriage records death records population register vital records genealogy",
      "keywords": {
        "nl": ["burgerlijke stand", "geboorteakte", "huwelijksakte", "overlijdensakte", "bevolkingsregister"],
        "en": ["civil registry", "birth records", "marriage records", "death records"]
      }
    },
    "COUNCIL_GOVERNANCE": {
      "className": "CouncilGovernanceFonds",
      "accumulatedTerms": "gemeenteraad raadsnotulen raadsbesluit verordening council minutes ordinances resolutions bylaws municipal council town council city council",
      "keywords": {
        "nl": ["gemeenteraad", "raadsnotulen", "raadsbesluit", "verordening"],
        "en": ["council minutes", "ordinances", "resolutions"]
      }
    }
  }
}
```

### Key Additions for Embedding Support

| Field | Purpose |
|-------|---------|
| `tier1Embeddings` | Pre-computed embeddings for each Types file (19 categories) |
| `tier2Embeddings` | Pre-computed embeddings for each subtype (500+ types) |
| `termLog` | Fast O(1) lookup table for exact keyword matches |
| `accumulatedTerms` | Raw text used to generate embeddings (for debugging/regeneration) |
| `embeddingModel` | Model used to generate embeddings (for reproducibility) |

## Enhanced ExtractedEntities Interface

```typescript
export interface ExtractedEntities {
  // Existing fields
  institutionType?: InstitutionTypeCode | null;
  location?: string | null;
  locationType?: 'city' | 'province' | null;
  intent?: 'count' | 'list' | 'info' | null;

  // NEW: Ontology-derived fields
  institutionSubtype?: string | null;  // e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM'
  recordSetType?: string | null;       // e.g., 'CIVIL_REGISTRY', 'COUNCIL_GOVERNANCE'
  subtypeWikidata?: string | null;     // e.g., 'Q8362876' for LOD integration
}
```

## Enhanced Cache Key Format

```
{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location}

Examples:
- "count:m:amsterdam"              # Basic museum count
- "count:m.art_museum:amsterdam"   # Art museum count (subtype)
- "list:a.municipal_archive:nh"    # Municipal archives in Noord-Holland
- "query:a:civil_registry:utrecht" # Civil registry in Utrecht
- "info:a.national_archive::nl"    # National archive info (no location filter)
```

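A key in this format could be assembled roughly as shown below. `CacheKeyParts` and `buildCacheKey` are hypothetical names, and the sketch assumes that optional segments are simply omitted when absent, which matches the first four examples; it does not attempt to reproduce the empty `::` segment of the last one.

```typescript
// Hypothetical input shape; field names mirror ExtractedEntities where possible.
interface CacheKeyParts {
  intent: string;
  institutionType: string;
  institutionSubtype?: string | null;
  recordSetType?: string | null;
  location: string;
}

// Build "{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location}".
function buildCacheKey(parts: CacheKeyParts): string {
  // The subtype is attached to the type segment with a dot.
  let typeSegment = parts.institutionType;
  if (parts.institutionSubtype) {
    typeSegment += `.${parts.institutionSubtype}`;
  }

  const segments = [parts.intent, typeSegment];
  // The record set type gets its own colon-separated segment when present.
  if (parts.recordSetType) {
    segments.push(parts.recordSetType);
  }
  segments.push(parts.location);

  // Keys are lowercase in all examples of this rule.
  return segments.join(':').toLowerCase();
}
```

For instance, `buildCacheKey({ intent: 'count', institutionType: 'M', institutionSubtype: 'ART_MUSEUM', location: 'Amsterdam' })` yields `"count:m.art_museum:amsterdam"`.
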
## Implementation Files

| File | Purpose |
|------|---------|
| `scripts/extract-types-vocab.ts` | Build-time vocabulary extraction from LinkML |
| `apps/archief-assistent/public/types-vocab.json` | Generated vocabulary file |
| `apps/archief-assistent/src/lib/types-vocabulary.ts` | Runtime vocabulary loader |
| `apps/archief-assistent/src/lib/semantic-cache.ts` | Updated entity extraction |

## Build Integration

Add to `apps/archief-assistent/package.json`:

```json
{
  "scripts": {
    "prebuild": "tsx ../../scripts/extract-types-vocab.ts",
    "build": "vite build"
  }
}
```

## Keyword Extraction Priority

When extracting keywords from schema files:

1. **`keywords`** array (highest priority) - Explicit search terms
2. **`structured_aliases.literal_form`** - Multilingual alternative names
3. **`type_label`** - Preferred labels per language
4. **Class name conversion** - `MunicipalArchive` → "municipal archive"

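The priority order above can be sketched as a merge that keeps the first occurrence of each term. The `SchemaClassTerms` shape is a simplification invented for this example (flat string arrays rather than the real LinkML structures), and `collectTerms` and `classNameToTerm` are hypothetical names rather than functions from `extract-types-vocab.ts`.

```typescript
// Simplified, hypothetical view of the terms found on one schema class.
interface SchemaClassTerms {
  className: string;
  keywords?: string[];
  structuredAliases?: string[]; // literal_form values
  typeLabels?: string[];
}

// Priority 4: "MunicipalArchive" → "municipal archive".
function classNameToTerm(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, '$1 $2').toLowerCase();
}

// Merge all sources in priority order, lowercased and de-duplicated,
// so that a term keeps the position of its highest-priority source.
function collectTerms(def: SchemaClassTerms): string[] {
  const ordered = [
    ...(def.keywords ?? []),
    ...(def.structuredAliases ?? []),
    ...(def.typeLabels ?? []),
    classNameToTerm(def.className),
  ];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of ordered) {
    const term = raw.trim().toLowerCase();
    if (term !== '' && !seen.has(term)) {
      seen.add(term);
      result.push(term);
    }
  }
  return result;
}
```

Joining the result with spaces would give the kind of accumulated text used for the embeddings, although the exact construction in the extraction script may differ.
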
## Cache Segmentation Rules

### Rule 1: Subtype Specificity

Queries with **specific subtypes** should NOT match **generic type** cache entries:

```
Query: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Cached: "musea in Amsterdam" → key: "count:m:amsterdam"
Result: MISS (subtype mismatch) ✅
```

### Rule 2: Record Set Type Isolation

Queries about **specific record types** should cache separately:

```
Query: "burgerlijke stand Utrecht" → key: "query:a:civil_registry:utrecht"
Cached: "archieven in Utrecht" → key: "list:a:utrecht"
Result: MISS (record set type mismatch) ✅
```

### Rule 3: No Subtype-to-Type Fallback

Generic queries must NOT be served from subtype cache entries, because a subtype result covers only a subset of what the generic query asks for:

```
Query: "musea in Amsterdam" → key: "count:m:amsterdam"
Cached: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Result: MISS (don't return subset for superset query)
```

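Following the Result lines of the three examples above, a cached entry is only reusable when every segmentation field agrees with the query. A minimal sketch of that check is given below; `SegmentEntities` and `canServeFromCache` are hypothetical names, and treating `undefined` and `null` as the same "not specified" value is an assumption.

```typescript
// Hypothetical subset of ExtractedEntities that takes part in segmentation.
interface SegmentEntities {
  intent?: string | null;
  institutionType?: string | null;
  institutionSubtype?: string | null;
  recordSetType?: string | null;
  location?: string | null;
}

const SEGMENT_FIELDS = [
  'intent',
  'institutionType',
  'institutionSubtype',
  'recordSetType',
  'location',
] as const;

// A hit requires equality on every field; a missing value only matches
// another missing value, so generic and specific entries never mix.
function canServeFromCache(query: SegmentEntities, cached: SegmentEntities): boolean {
  return SEGMENT_FIELDS.every(
    (field) => (query[field] ?? null) === (cached[field] ?? null),
  );
}
```

Because a missing subtype only matches a missing subtype, the same function rejects both directions: a specific query against a generic entry and a generic query against a specific entry.
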
## Migration Notes

1. **Backwards Compatible**: Existing cache entries without `institutionSubtype` continue to work
2. **Gradual Rollout**: New cache entries get subtype, old entries remain valid
3. **Cache Clear**: Consider clearing cache after deployment to ensure consistency

## Validation

Run E2E tests to verify:

```bash
cd apps/archief-assistent
npm run test:e2e
```

Key test cases:
- Geographic isolation (Amsterdam ≠ Rotterdam ≠ Noord-Holland)
- Subtype isolation (kunstmuseum ≠ museum)
- Record set isolation (burgerlijke stand ≠ archive)
- Intent isolation (count ≠ list ≠ info)

## References

- **Rule 41**: Types classes define SPARQL template variables
- **Rule 0b**: Type/Types file naming convention
- **CustodianType.yaml**: Base taxonomy definition
- **AGENTS.md**: GLAMORCUBESFIXPHDNT taxonomy documentation

---

**Created**: 2026-01-10
**Author**: OpenCode Agent
**Status**: Implemented (v2.0)

## References

- Pavlyshyn, V. "Context Graphs and Data Traces: Building Epistemology Layers for Agentic Memory"
- Pavlyshyn, V. "The Shape of Knowledge: Topology Theory for Knowledge Graphs"
- Pavlyshyn, V. "Beyond Hierarchy: Why Agentic AI Systems Need Holarchies"
- Pavlyshyn, V. "Smalltalk: The Language That Changed Everything"
- Pavlyshyn, V. "Clarity Traders: Beyond Vibe Coding"