kempersc
ad74d8379e
feat(scripts): improve types-vocab extraction to derive all vocabulary from schema
- Remove hardcoded type mappings, derive dynamically from LinkML
- Extract keywords from annotations, structured_aliases, and comments
- Add rename_plural_slot.py utility for schema slot renaming
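A minimal sketch of the keyword extraction described above, assuming the schema has already been parsed into plain objects. The `LinkMLElement` shape, the `collectKeywords` name, and the comma/semicolon splitting rule are illustrative assumptions, not the script's actual code:

```typescript
// Assumed, simplified shape of a parsed LinkML class or enum value.
// Only the three sources named in the commit message are modelled.
interface LinkMLElement {
  annotations?: Record<string, string | { value: string }>;
  structured_aliases?: Array<{ literal_form: string }>;
  comments?: string[];
}

// Hypothetical helper: gather lowercase, de-duplicated keywords from
// annotations, structured_aliases, and comments.
function collectKeywords(el: LinkMLElement): string[] {
  const seen: Record<string, true> = {};
  const add = (raw: string): void => {
    // Assumption: multi-valued strings are comma- or semicolon-separated.
    for (const part of raw.split(/[,;]/)) {
      const kw = part.trim().toLowerCase();
      if (kw.length > 0) seen[kw] = true;
    }
  };

  const annotations = el.annotations ?? {};
  for (const key of Object.keys(annotations)) {
    const v = annotations[key];
    add(typeof v === "string" ? v : v.value);
  }
  for (const alias of el.structured_aliases ?? []) add(alias.literal_form);
  for (const comment of el.comments ?? []) add(comment);

  return Object.keys(seen).sort();
}

// Illustrative input; the values are made up for the example.
const sampleElement: LinkMLElement = {
  annotations: { keywords: "kunstmuseum, art museum" },
  structured_aliases: [{ literal_form: "Kunstmuseum" }],
  comments: ["Museum voor beeldende kunst"],
};
const sampleKeywords = collectKeywords(sampleElement);
```

Because every keyword comes from the schema element itself, no hardcoded type mapping is needed in a sketch like this.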
2026-01-10 15:37:52 +01:00
kempersc
13938c92ca
chore(schemas): sync LinkML schemas to frontend apps
Copies authoritative schemas from schemas/20251121/ to:
- frontend/public/schemas/20251121/
- apps/archief-assistent/public/schemas/20251121/
This ensures slot definitions with corrected ontology property
references (commit 2808dad6cd) are available to frontend apps.
2026-01-10 15:02:25 +01:00
kempersc
e5a08a353d
enrich person profiles
2026-01-10 14:14:04 +01:00
kempersc
f2bc2d54cb
feat(archief-assistent): integrate ontology-driven vocabulary into semantic cache
Implements Rule 46: Ontology-Driven Cache Segmentation
Semantic Cache Enhancements:
- Add institutionSubtype, recordSetType, wikidataEntity to ExtractedEntities
- Add extractionMethod field to track vocabulary vs regex extraction
- Implement async extractEntitiesWithVocabulary() using term log
- Maintain sync regex fallback for cache key generation (<5ms)
Build Pipeline:
- Add prebuild hook to regenerate types-vocab.json from LinkML schemas
- Extract vocabulary from *Type.yaml and *Types.yaml schema files
- Generate GLAMORCUBESFIXPHDNT code mappings automatically
New Script:
- scripts/extract-types-vocab.ts: extracts vocabulary from LinkML schemas
  - Supports --skip-embeddings flag for faster builds
  - Outputs to apps/archief-assistent/public/types-vocab.json
This enables richer cache segmentation using ontology-derived subtypes
(e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM') instead of just top-level
GLAMORCUBESFIXPHDNT codes.
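A rough sketch of how the synchronous regex fallback could feed a segmented cache key. The field names `institutionSubtype`, `recordSetType`, `wikidataEntity`, and `extractionMethod` come from this commit message. Everything else (the `institutionType` field, the patterns, the `A`/`M` codes, the key layout, and the function names) is assumed for illustration only:

```typescript
// Hypothetical entity shape; null means "not detected".
interface ExtractedEntities {
  institutionType: string | null; // assumed top-level type code
  institutionSubtype: string | null; // e.g. "MUNICIPAL_ARCHIVE"
  recordSetType: string | null;
  wikidataEntity: string | null;
  extractionMethod: "vocabulary" | "regex";
}

// Illustrative patterns only; the real vocabulary is derived from the schemas.
// More specific subtypes are listed before their top-level type.
const FALLBACK_PATTERNS: Array<{ code: string; subtype: string | null; pattern: RegExp }> = [
  { code: "A", subtype: "MUNICIPAL_ARCHIVE", pattern: /\b(gemeentearchief|municipal archive)\b/i },
  { code: "M", subtype: "ART_MUSEUM", pattern: /\b(kunstmuseum|art museum)\b/i },
  { code: "A", subtype: null, pattern: /\b(archief|archive)\b/i },
  { code: "M", subtype: null, pattern: /\bmuseum\b/i },
];

// Synchronous fallback: first matching pattern wins.
function extractEntitiesSync(query: string): ExtractedEntities {
  for (const { code, subtype, pattern } of FALLBACK_PATTERNS) {
    if (pattern.test(query)) {
      return {
        institutionType: code,
        institutionSubtype: subtype,
        recordSetType: null,
        wikidataEntity: null,
        extractionMethod: "regex",
      };
    }
  }
  return {
    institutionType: null,
    institutionSubtype: null,
    recordSetType: null,
    wikidataEntity: null,
    extractionMethod: "regex",
  };
}

// Assumed key layout: type | subtype | record set type, "*" for unknown.
function cacheSegmentKey(e: ExtractedEntities): string {
  return [e.institutionType ?? "*", e.institutionSubtype ?? "*", e.recordSetType ?? "*"].join("|");
}
```

Under these assumptions, a query mentioning a "kunstmuseum" lands in a different cache segment than one that only mentions a "museum", which is the finer segmentation the commit describes.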
2026-01-10 13:30:30 +01:00
kempersc
01b9d77566
feat(archief-assistent): add ontology-driven types vocabulary for cache segmentation
Add LinkML-derived vocabulary for semantic cache entity extraction (Rule 46):
- types-vocab.json: 10,142 lines of institution type vocabulary from LinkML
  - 19 GLAMORCUBESFIXPHDNT type codes with Dutch/English/German/French labels
  - Includes subtypes (kunstmuseum, rijksmuseum, streekarchief, etc.)
  - Extracted from CustodianType.yaml and CustodianTypes.yaml
- types-vocabulary.ts: TypeScript module for entity extraction
  - Exports INSTITUTION_TYPES with regex patterns per type code
  - Replaces hardcoded patterns with schema-derived vocabulary
  - Supports multilingual matching
- Rule 46 documentation (.opencode/rules/)
  - Specifies vocabulary extraction workflow
  - Defines cache key generation algorithm
  - Migration path from hardcoded patterns
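One way the per-type regex patterns could be built from multilingual labels is sketched below. The `TypeVocab` shape, the sample labels, the `M`/`A` codes, and the helper names are assumptions made for the example and do not reflect the actual layout of types-vocab.json or types-vocabulary.ts:

```typescript
// Hypothetical vocabulary shape: type code -> language -> labels.
type TypeVocab = Record<string, Record<string, string[]>>;

// Made-up sample; the real file is generated from the LinkML schemas.
const SAMPLE_VOCAB: TypeVocab = {
  M: { nl: ["museum", "kunstmuseum", "rijksmuseum"], en: ["museum", "art museum"] },
  A: { nl: ["archief", "streekarchief"], en: ["archive"], de: ["Archiv"], fr: ["archives"] },
};

// Escape regex syntax characters so labels are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive pattern per type code from all languages.
function buildPatterns(vocab: TypeVocab): Record<string, RegExp> {
  const out: Record<string, RegExp> = {};
  for (const code of Object.keys(vocab)) {
    const seen: Record<string, true> = {};
    for (const lang of Object.keys(vocab[code])) {
      for (const label of vocab[code][lang]) seen[label.toLowerCase()] = true;
    }
    // Longest labels first so "kunstmuseum" is tried before "museum".
    const labels = Object.keys(seen).sort((a, b) => b.length - a.length);
    const alternation = labels.map(escapeRegExp).join("|");
    // Unicode-aware word boundaries via lookarounds (plain \b is ASCII-only).
    out[code] = new RegExp(`(?<![\\p{L}\\p{N}])(?:${alternation})(?![\\p{L}\\p{N}])`, "iu");
  }
  return out;
}

// Return the sorted type codes whose pattern matches the text.
function matchTypeCodes(text: string, patterns: Record<string, RegExp>): string[] {
  return Object.keys(patterns).filter((code) => patterns[code].test(text)).sort();
}

const samplePatterns = buildPatterns(SAMPLE_VOCAB);
```

Generating the alternation from data rather than writing it by hand is what lets a schema change propagate to matching without touching the module.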
2026-01-10 12:57:03 +01:00
kempersc
4f0cafe98a
enrich HC profiles
2026-01-02 02:11:04 +01:00
kempersc
d64f857aa9
add sparql validator and RAG injector
2025-12-30 03:43:31 +01:00
kempersc
aca68ea47f
remove ambiguous web-claims
2025-12-21 00:01:54 +01:00