Commit graph

280 commits

Author SHA1 Message Date
kempersc
e5a08a353d enrich person profiles 2026-01-10 14:14:04 +01:00
kempersc
9339de2cfb data(person): process 44,512 heritage-relevant profiles from entity extractions
Processing Summary:
- Scanned 94,716 LinkedIn entity files
- Identified 44,512 heritage-relevant individuals (47%)
- Created 1,430 new PPID-formatted profiles
- Updated 43,070 existing profiles with entity data
- Final count: 40,731 person profiles

Profile updates include:
- Merged web_claims with full provenance
- Added/updated heritage_relevance scoring
- Added affiliation data with custodian references
- Added inferred birth decades with provenance chains (Rule 45)

All data preserved per Rule 5 (additive only)
2026-01-10 14:01:29 +01:00
kempersc
3a15f2bdaa feat(scripts): add entity-to-PPID processing script
- Processes 94,716 LinkedIn entity files from data/custodian/person/entity/
- Identifies heritage-relevant profiles (47% of total)
- Generates PPID-formatted filenames with inferred locations/dates
- Merges with existing profiles, preserving all provenance data
- Applies Rules 12, 20, 27, 44, 45 for person data architecture
- Fixed edge case: handle null education/experience arrays
2026-01-10 13:58:06 +01:00
kempersc
57e77c8b19 chore(deps): add tsx, yaml, and @types/node for schema extraction script
Dependencies for scripts/extract-types-vocab.ts:
- tsx: TypeScript execution for Node.js scripts
- yaml: Parse LinkML schema files
- @types/node: TypeScript definitions for Node.js APIs
2026-01-10 13:33:12 +01:00
kempersc
0845d9f30e feat(scripts): add person enrichment and slot mapping utilities
Person Enrichment Scripts:
- enrich_person_comprehensive.py: Full-featured web search enrichment via Linkup
  with Rule 6/21/26/34/35 compliance (dual timestamps, no fabrication)
- enrich_ppids_linkup.py: Batch PPID enrichment pipeline
- extract_persons_with_provenance.py: Extract person data from LinkedIn HTML
  with XPath provenance tracking

LinkML Slot Management:
- update_slot_mappings.py: Update slots for RiC-O naming (Rule 39) and
  semantic URI requirements (Rule 38)
- update_class_slot_references.py: Update class files referencing renamed slots
- validate_slot_mappings.py: Validate slot definitions against ontology rules

All scripts follow established project conventions for provenance and
ontology alignment.
2026-01-10 13:32:32 +01:00
kempersc
6f3cf95492 data(person): fix data quality issues and PPID corrections
Data Quality Corrections:
- TIRANA-ADISUNA: Fix erroneous death_year claim (was education end date 2016,
  not death). Set is_living=true. Reassess heritage_relevance=false (tourism
  ministry is not a GLAM institution)
- ALEX-ALSEMGEEST: Rename from NL-ZH-TH (The Hague) to NL-ZH-ROT (Rotterdam)
  based on verified birth location. Update birth year to 1980

Profile Enrichments (5 profiles with XX-XX-XXX placeholders):
- Add web claims with proper provenance timestamps
- Add LinkedIn-verified education and position claims
- Document correction rationale in modification_reason

Heritage Relevance Reassessments:
- Government ministries (Tourism, etc.) marked as non-heritage
- Only GLAM institutions (Galleries, Libraries, Archives, Museums) qualify
2026-01-10 13:31:39 +01:00
kempersc
f2bc2d54cb feat(archief-assistent): integrate ontology-driven vocabulary into semantic cache
Implements Rule 46: Ontology-Driven Cache Segmentation

Semantic Cache Enhancements:
- Add institutionSubtype, recordSetType, wikidataEntity to ExtractedEntities
- Add extractionMethod field to track vocabulary vs regex extraction
- Implement async extractEntitiesWithVocabulary() using term log
- Maintain sync regex fallback for cache key generation (<5ms)

Build Pipeline:
- Add prebuild hook to regenerate types-vocab.json from LinkML schemas
- Extract vocabulary from *Type.yaml and *Types.yaml schema files
- Generate GLAMORCUBESFIXPHDNT code mappings automatically

New Script:
- scripts/extract-types-vocab.ts - Extracts vocabulary from LinkML schemas
- Supports --skip-embeddings flag for faster builds
- Outputs to apps/archief-assistent/public/types-vocab.json

This enables richer cache segmentation using ontology-derived subtypes
(e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM') instead of just top-level
GLAMORCUBESFIXPHDNT codes.
2026-01-10 13:30:30 +01:00
kempersc
2808dad6cd fix(linkml): correct invalid ontology property references in slot definitions
- confidence_score: prov:confidence doesn't exist → hc:confidenceScore
- deliverables: schema:result doesn't exist → hc:deliverables
- circumstances_of_death: wikidata:P1196 is an identifier, not a predicate → hc:circumstancesOfDeath
- deceased: schema:deathDate wrong semantics for boolean → hc:deceased
- death_place: fix sdo prefix to schema, remove wd:P20 as exact mapping
- date_of_death: wikidata:P570 is an identifier, not a predicate
- martyred: correct prefix inconsistencies
- given_name/literal_name: fix sdo→schema prefix
- occupation/religion/status: standardize prefix declarations

Add comments documenting why Wikidata properties (P-numbers) cannot be
used as slot_uri (they are entity identifiers, not RDF predicates).
2026-01-10 13:29:55 +01:00
kempersc
49f4054802 data(person/entity): add 83,845 LinkedIn profile extractions from company pages
Bulk extraction of heritage professional profiles from LinkedIn company pages
using extract_persons_with_provenance.py script.

Key characteristics:
- Source: LinkedIn company 'People' pages for heritage institutions
- File format: {linkedin-slug}_{timestamp}.json
- Total size: ~3.6GB
- Includes: profile_data, heritage_relevance, affiliations, web_claims
- Provenance: Full XPath + archived HTML references (Rule 6 compliant)
- Dual timestamps: statement_created_at + source_archived_at (Rule 35)

Extraction metadata includes:
- extraction_agent: extract_persons_with_provenance.py
- source_file: Original archived HTML filename
- source_archived_at: When LinkedIn page was captured
- schema_version: 1.0.0

Note: URL-encoded filenames preserve international characters (Arabic,
Hebrew, Chinese, Turkish, accented Latin, etc.)
2026-01-10 13:27:08 +01:00
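The `{linkedin-slug}_{timestamp}.json` naming above can be sketched as follows. This is an illustration only: the function name `entity_filename` and the compact UTC timestamp layout are assumptions, since the commit does not state the exact timestamp format.

```python
from datetime import datetime, timezone
from urllib.parse import quote, unquote


def entity_filename(linkedin_slug: str, archived_at: datetime) -> str:
    """Build '{linkedin-slug}_{timestamp}.json' with a URL-encoded slug."""
    # safe="" also percent-encodes "/" so a slug can never add a path separator.
    encoded_slug = quote(linkedin_slug, safe="")
    # Assumed timestamp layout (compact UTC); the real format is not documented here.
    stamp = archived_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{encoded_slug}_{stamp}.json"


# Hypothetical slug with accented Latin characters.
example = entity_filename(
    "zo\u00eb-\u00f6zg\u00fcr",
    datetime(2026, 1, 10, 12, 27, 8, tzinfo=timezone.utc),
)
# The original slug is recoverable by decoding the part before the first "_".
original_slug = unquote(example.split("_")[0])
```

Percent-encoding keeps filenames ASCII-only on disk while staying reversible, which is presumably why the note says international characters are preserved.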
kempersc
01b9d77566 feat(archief-assistent): add ontology-driven types vocabulary for cache segmentation
Add LinkML-derived vocabulary for semantic cache entity extraction (Rule 46):

- types-vocab.json: 10,142 lines of institution type vocabulary from LinkML
  - 19 GLAMORCUBESFIXPHDNT type codes with Dutch/English/German/French labels
  - Includes subtypes (kunstmuseum, rijksmuseum, streekarchief, etc.)
  - Extracted from CustodianType.yaml and CustodianTypes.yaml

- types-vocabulary.ts: TypeScript module for entity extraction
  - Exports INSTITUTION_TYPES with regex patterns per type code
  - Replaces hardcoded patterns with schema-derived vocabulary
  - Supports multilingual matching

- Rule 46 documentation (.opencode/rules/)
  - Specifies vocabulary extraction workflow
  - Defines cache key generation algorithm
  - Migration path from hardcoded patterns
2026-01-10 12:57:03 +01:00
kempersc
30cd8842d9 data(person): update profiles with web claims and PPID corrections
- Rename SENNAY-GHEBREAB profile: NL-ZH-ROT → ET-XX-ADD (Ethiopian birth)
- Enrich profiles with inferred birth decades and settlements
- Add web claims provenance for enriched data
- Update 16 profiles with improved location resolution

Files: +1 new (renamed), 16 modified, 1 deleted
2026-01-10 12:56:28 +01:00
kempersc
095a3f949c refactor(linkml): apply RiC-O slot naming conventions to /schemas/ (Rule 39)
Apply same RiC-O-style slot naming refactor to /schemas/20251121/linkml/
that was previously applied to frontend/public/schemas/:

- Add 'has_' prefix for possession predicates
- Add 'is_or_was_' prefix for temporal inverse relationships
- Add 'has_or_had_' for bidirectional temporal relations
- Add new slots: is_or_was_aggregated_by, is_or_was_allocated_by, etc.
- Update count slots with proper descriptions

This ensures consistency between the source schema directory and the
frontend-served schemas.

514 files changed, +6,325 insertions, -4,255 deletions
2026-01-10 12:55:45 +01:00
kempersc
3c4f7acf87 test(archief-assistent): update E2E tests for entity extraction cache
- Simplify cache spec assertions after structured matching implementation
- Refactor map-panel spec for better test isolation and reliability
- Remove redundant geographic false positive tests (handled by entity extraction)
2026-01-10 12:55:22 +01:00
kempersc
5eaab2bd30 data(person): enrich heritage professional profiles with web claims
Batch enrichment of 3,728 person profiles with additional data:
- Birth decade inference from education/career history
- Location resolution for inferred birth settlements
- Web claims with full provenance (source_url, retrieved_on)
- Organizational subdivision extraction
- Heritage relevance scoring

Also includes:
- 14 profile renames for PPID format corrections
- Updated _manifest.json with extraction statistics
- New _extraction_log.txt and _extraction_summary.json

Enrichment follows AGENTS.md rules:
- Rule 44: EDTF unknown date notation (XXXX, 196X, etc.)
- Rule 45: Inferred data with explicit provenance
- Rule 30: Confidence scoring (0.50-0.95)
- Rule 31: Organizational subdivision extraction

35,052 files changed, +4,507,411 insertions, -63,118 deletions
2026-01-10 10:35:20 +01:00
kempersc
8a475d5c02 refactor(linkml): apply RiC-O slot naming conventions (Rule 39)
Rename slots to follow Records in Contexts (RiC-O) style naming:
- Add 'has_' prefix for possession predicates (has_acquisition_method)
- Add 'is_or_was_' prefix for temporal relationships
- Add 'has_or_had_' for bidirectional temporal relations

Key changes across 496 schema files:
- acquisition_method → has_acquisition_method
- acquisition_date → has_acquisition_date
- acquisition_source → has_acquisition_source
- access_policy_ref → has_access_policy_reference
- arrangement → has_arrangement
- parent_custodian → is_or_was_suborganization_of (hierarchy)
- parent_custodian → associated_custodian (event association)

Also adds new slots following RiC-O patterns:
- is_or_was_aggregated_by
- is_or_was_allocated_by
- is_or_was_archive_department_of
- was_approved_by, was_archived_at, was_asserted_by

This aligns with AGENTS.md Rule 39: Slot Naming Convention (RiC-O Style)
for accurate temporal semantics in heritage custodian ontology.

Net change: +2,063 lines (new slots added, old patterns consolidated)
2026-01-10 10:33:51 +01:00
kempersc
7fbff2ff5f feat(archief-assistent): add entity extraction to semantic cache
Prevent geographic false positives in cache lookups. Queries like
"musea in Amsterdam" vs "musea in Noord-Holland" have ~93%
embedding similarity but completely different answers.

Changes:
- Add ExtractedEntities interface for structured cache keys
- Implement fast entity extraction (<5ms, no LLM) with regex patterns
- Extract institution types (GLAMORCUBESFIXPHDNT), locations, and intent
- Generate structured cache keys (e.g., "count:M:amsterdam")
- Raise similarity threshold from 0.85 to 0.97 to match backend DSPy
- Add 'structured' match method to CacheLookupResult

The entity extractor recognizes:
- 19 institution types (Dutch + English patterns)
- 12 Dutch provinces with ISO 3166-2:NL codes
- Major Dutch cities with settlement codes
- Query intents (count, list, info)

This ensures geographic queries get different cache entries even when
embeddings are highly similar.
2026-01-10 10:33:21 +01:00
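A minimal sketch of the structured cache key idea from this commit, written in Python for brevity although the actual module is TypeScript. The pattern tables, the `structured_cache_key` name, and the trigger words are illustrative assumptions; only the key shape (`count:M:amsterdam`) and the three intents come from the commit message.

```python
import re
from typing import Optional

# Illustrative subset; the commit mentions 19 type codes and all 12 provinces.
TYPE_PATTERNS = {
    "M": re.compile(r"\b(musea|museum|museums)\b"),
    "A": re.compile(r"\b(archief|archieven|archive|archives)\b"),
}
# Hypothetical location list; a substring match is enough for this sketch.
LOCATIONS = ["noord-holland", "amsterdam", "utrecht"]
# Assumed trigger words per intent; "info" is the fallback.
INTENT_PATTERNS = {
    "count": re.compile(r"\b(hoeveel|how many|aantal)\b"),
    "list": re.compile(r"\b(welke|which|list)\b"),
}


def structured_cache_key(query: str) -> Optional[str]:
    """Return 'intent:type:location', or None when no structured key applies."""
    q = query.lower()
    intent = next((name for name, p in INTENT_PATTERNS.items() if p.search(q)), "info")
    type_code = next((code for code, p in TYPE_PATTERNS.items() if p.search(q)), None)
    location = next((loc for loc in LOCATIONS if loc in q), None)
    if type_code is None or location is None:
        return None  # caller falls back to embedding similarity only
    return f"{intent}:{type_code}:{location}"


key_city = structured_cache_key("Hoeveel musea in Amsterdam?")
key_province = structured_cache_key("Hoeveel musea in Noord-Holland?")
```

The two queries have near-identical embeddings, but their keys differ, so a lookup that compares keys before similarity will not return one query's answer for the other.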
kempersc
519b0b47a8 Add Playwright test results JSON file with initial test suite and failure details 2026-01-09 21:33:31 +01:00
kempersc
004d342935 chore: minor updates and evaluation results
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
2026-01-09 21:10:55 +01:00
kempersc
dd0ee2cf11 feat(scripts): expand university location mappings and add web enrichment
- enrich_ppids.py: Add 40+ Dutch universities and hogescholen to location mapping
- enrich_ppids_web.py: New script for web-based PPID enrichment
- resolve_pending_known_orgs.py: Updates for pending org resolution
2026-01-09 21:10:14 +01:00
kempersc
ea35da02dc test(archief-assistent): add Playwright E2E test suite
- Add chat.spec.ts for RAG query testing
- Add count-queries.spec.ts for aggregation validation
- Add map-panel.spec.ts for geographic feature testing
- Add cache.spec.ts for response caching verification
- Add auth.setup.ts for authentication handling
- Configure playwright.config.ts for multi-browser testing
- Tests run against production archief.support
2026-01-09 21:09:56 +01:00
kempersc
855fff5962 data(person): resolve PPID locations and enrich profiles
- Rename 512 person files from XX-XX-XXX placeholders to proper GeoNames locations
- Update 2,463 profiles with enriched data
- Add 512 new person profiles (AU, international heritage professionals)
- PPID format: ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}
2026-01-09 21:09:28 +01:00
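The PPID layout `ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}` can be assembled roughly as below. The `PersonKey` class, the `XXX` custodian placeholder, and the use of EDTF-style decades (`196X`, `XXXX`) in the decade field are assumptions for illustration; the real generator lives in generate_ppids.py and may differ.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN_LOCATION = "XX-XX-XXX"  # placeholder form used in the commit messages
UNKNOWN_DECADE = "XXXX"         # EDTF-style fully unknown date


def decade_notation(birth_year: Optional[int]) -> str:
    """Reduce a year to a decade, e.g. 1964 -> '196X'; None -> 'XXXX'."""
    if birth_year is None:
        return UNKNOWN_DECADE
    return f"{birth_year // 10}X"


@dataclass
class PersonKey:
    name_token: str                      # e.g. 'JAN-JANSEN' (hypothetical)
    birth_year: Optional[int] = None
    birth_location: str = UNKNOWN_LOCATION
    work_location: str = UNKNOWN_LOCATION
    custodian: str = "XXX"               # assumed placeholder, not from the source


def build_ppid(key: PersonKey) -> str:
    """Join the components in the order given by the documented format."""
    return "_".join([
        "ID",
        key.birth_location,
        decade_notation(key.birth_year),
        key.work_location,
        key.custodian,
        key.name_token,
    ])


ppid = build_ppid(PersonKey("JAN-JANSEN", birth_year=1964, birth_location="NL-ZH-ROT"))
```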
kempersc
eb122e2532 data(custodian): remove 380 PENDING files after collision merge
PENDING files were merged into existing custodian records in commit eaf80ec.
These temporary collision placeholder files are no longer needed.
2026-01-09 21:06:22 +01:00
kempersc
97f85e0050 deps(archief-assistent): add playwright for E2E testing
- Add @playwright/test as dev dependency
- Alphabetize dependencies list
2026-01-09 21:06:12 +01:00
kempersc
f7bd3e9edc feat(linkml-viewer): add slot_usage side-by-side comparison view
- Add 'Compare' toggle button next to slots with slot_usage overrides
- Show generic slot definition vs class-specific override in 3-column grid
- Highlight changed properties with green 'changed' badge
- Display '(inherited)' when override matches generic definition
- Display '(not defined)' when generic has no value for property
- Compare: range, description, required, multivalued, slot_uri, pattern, identifier
- Full i18n support (Dutch/English translations)
- Responsive design: stacks vertically on mobile (<640px)
2026-01-09 21:02:14 +01:00
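The comparison logic described above might look like the following sketch, shown in Python although the viewer itself is a frontend component. The `compare_slot` name and the row shape are hypothetical, and treating a property that is absent from `slot_usage` as inherited is an assumption.

```python
from typing import Any

# Properties compared, as listed in the commit message.
COMPARED = ("range", "description", "required", "multivalued",
            "slot_uri", "pattern", "identifier")


def compare_slot(generic: dict[str, Any], slot_usage: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one row per property: generic value, override value, changed flag."""
    rows = []
    for prop in COMPARED:
        generic_value = generic.get(prop)
        # A property missing from slot_usage falls back to the generic value.
        override_value = slot_usage.get(prop, generic_value)
        changed = override_value != generic_value
        rows.append({
            "property": prop,
            "generic": "(not defined)" if generic_value is None else generic_value,
            "override": override_value if changed else "(inherited)",
            "changed": changed,  # drives the 'changed' badge
        })
    return rows


rows = compare_slot(
    {"range": "string", "required": False},
    {"range": "CustodianType", "pattern": "^[A-Z]$"},
)
by_prop = {row["property"]: row for row in rows}
```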
kempersc
9e67d0f967 enrich profiles 2026-01-09 20:35:19 +01:00
kempersc
12fed83d6e fix(rag): preserve count value for COUNT queries in non-streaming endpoint
- Detect COUNT queries by checking for 'count' key in SPARQL results
- Skip institution transformation for COUNT queries to preserve count value
- Fixes bug where 'Hoeveel archieven in Utrecht?' returned 1 instead of 10
- COUNT queries now correctly extract integer count from SPARQL response
2026-01-09 18:57:40 +01:00
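A rough sketch of the COUNT detection this fix describes. It assumes the SPARQL result arrives as a list of row dictionaries and accepts both a flattened value and the standard SPARQL JSON binding shape (`{"value": "10"}`); the function name `extract_count` is hypothetical.

```python
from typing import Any, Optional


def extract_count(sparql_rows: list[dict[str, Any]]) -> Optional[int]:
    """Return the integer count for a COUNT query, or None for other queries."""
    if not sparql_rows or "count" not in sparql_rows[0]:
        return None  # not a COUNT query: continue with institution transformation
    raw = sparql_rows[0]["count"]
    # SPARQL JSON wraps literals as {"type": "literal", "value": "10"}.
    if isinstance(raw, dict):
        raw = raw.get("value")
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


count_flat = extract_count([{"count": "10"}])
count_binding = extract_count([{"count": {"type": "literal", "value": "204"}}])
not_a_count = extract_count([{"name": "Het Utrechts Archief"}])
```

Checking for the key before any row transformation matters because a COUNT result is a single row: counting rows instead of reading the value yields 1.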
kempersc
8a7ed757b8 fix(rag): use SPARQL results for COUNT queries in streaming fast-path
- Fix bug where COUNT queries showed Qdrant result count (10) instead of
  actual SPARQL count (e.g., 204 musea in Noord-Holland)
- Use sparql_results for count extraction in factual query fast-path
- Also fix fallback COUNT/LIST handling to use sparql_results
2026-01-09 18:47:56 +01:00
kempersc
eaf80ec756 data(custodian): merge PENDING collision files into existing custodians
Merge staff data from 7 PENDING files into their matching custodian records:
- NL-XX-XXX-PENDING-SPOT-GRONINGEN → NL-GR-GRO-M-SG (SPOT Groningen, 120 staff)
- NL-XX-XXX-PENDING-DIENST-UITVOERING-ONDERWIJS → NL-GR-GRO-O-DUO
- NL-XX-XXX-PENDING-ANNE-FRANK-STICHTING → NL-NH-AMS-M-AFS
- NL-XX-XXX-PENDING-ALLARD-PIERSON → NL-NH-AMS-M-AP
- NL-XX-XXX-PENDING-STICHTING-JOODS-HISTORISCH-MUSEUM → NL-NH-AMS-M-JHM
- NL-XX-XXX-PENDING-MINISTERIE-VAN-BUITENLANDSE-ZAKEN → NL-ZH-DHA-O-MBZ
- NL-XX-XXX-PENDING-MINISTERIE-VAN-JUSTITIE-EN-VEILIGHEID → NL-ZH-DHA-O-MJV

Originals archived in data/custodian/archive/pending_collisions_20250109/
Add scripts/merge_collision_files.py for reproducible merging
2026-01-09 18:33:00 +01:00
kempersc
e9c9aefc37 data(person): regenerate PPIDs with unidecode support for non-Latin scripts
- Add display_name and name_romanized fields to all 7,948 person profiles
- Resolve UNKNOWN-UNKNOWN collision group (Hebrew/Arabic names now properly romanize)
- Hebrew names like אבישי דנינו now generate PPID AVISHI-DANINO instead of UNKNOWN-UNKNOWN
- Collision count reduced from 82 to 81 groups

Regenerated using generate_ppids.py with unidecode support (commit abe30cb)
2026-01-09 18:31:53 +01:00
kempersc
04791a7a91 fix(ppid): fix unidecode import reference typo 2026-01-09 18:29:36 +01:00
kempersc
c45367c60f data(custodian): resolve more PENDING files with proper GHCIDs
Additional batch of PENDING file resolutions:
- DK: Aalborg Teater
- FR: Airborne Museum, ALCA Nouvelle-Aquitaine
- NL: 12 institutions (CODA Apeldoorn, Airborne Museum Arnhem, etc.)
- SA: Saudi Arabia Ministry of Culture

Files renamed from NL-XX-XXX-PENDING-* to proper country/region codes.
2026-01-09 18:29:09 +01:00
kempersc
abe30cb302 feat(ppid): add unidecode support for non-Latin script transliteration
Add optional unidecode dependency to handle Hebrew, Arabic, Chinese,
and other non-Latin scripts when generating Person Persistent IDs.
2026-01-09 18:28:41 +01:00
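One way an optional unidecode dependency could be wired in is sketched below. The function names and the `UNKNOWN-UNKNOWN` fallback rule are assumptions modelled on the neighbouring commit messages, not the actual generate_ppids.py code. Without unidecode, the standard-library fallback only strips accents from Latin text, so Hebrew or Arabic names would still end up as `UNKNOWN-UNKNOWN`.

```python
import re
import unicodedata

try:
    from unidecode import unidecode  # optional third-party dependency
except ImportError:
    unidecode = None


def romanize(text: str) -> str:
    """Transliterate to ASCII, using unidecode when it is installed."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback: decompose accented characters and drop anything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def name_token(display_name: str) -> str:
    """Build the upper-case, hyphen-joined name part of a PPID."""
    parts = re.findall(r"[A-Z0-9]+", romanize(display_name).upper())
    # Assumed behaviour: nothing usable left means the unknown placeholder.
    return "-".join(parts) if parts else "UNKNOWN-UNKNOWN"


token = name_token("Jos\u00e9 Garc\u00eda")
```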
kempersc
932ec5438c add person profiles with PPID 2026-01-09 18:26:58 +01:00
kempersc
c0d31b3905 fix(rag): add fallback imports for semantic_router and temporal_intent
Support both relative and absolute imports for running as module or script.
2026-01-09 18:26:40 +01:00
kempersc
bd06e4f864 data(custodian): merge 135 PENDING files into existing enriched records
Merge data from PENDING files (with XX-XXX placeholders) into their
corresponding enriched custodian records with proper GHCIDs.

Countries affected:
- DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.)
- ES: 1 institution (Biblioteca Nacional de España)
- FR: 1 institution (NMO)
- ID: 18 Indonesian museums and archives
- NL: 111 Dutch institutions across all provinces
- US: 1 institution (ARCA)

The PENDING files are deleted after merge; originals archived in
data/custodian/archive/pending_merged_20250109/
2026-01-09 18:25:56 +01:00
kempersc
1ad717767a feat(linkml-viewer): add visual indicators for slot_usage overrides
- Add green 'slot_usage' badge for slots with class-specific overrides
- Add ✦ markers next to properties that are overridden vs inherited
- Add green left border styling for slots with slot_usage
- Add i18n translations (nl/en) for override indicators
- Merge generic slot definitions with class-specific slot_usage properties

This helps users understand which slot properties come from the generic
slot definition vs which are overridden at the class level via slot_usage.
2026-01-09 18:23:21 +01:00
kempersc
7ec4e05dd4 feat(merge): add script to merge PENDING files by matching emic names with existing files 2026-01-09 16:42:55 +01:00
kempersc
7f53ec6074 docs(person_pid): add PPID-GHCID alignment and PiCo comparison docs 2026-01-09 15:57:26 +01:00
kempersc
a51c8c400c data(pending): add 125 international PENDING custodian files with proper country codes
Identified 125 institutions from LinkedIn staff extraction that are NOT Dutch:
- FR: 45 (French museums, archives, libraries)
- ID: 14 (Indonesian institutions)
- GB: 14 (British institutions)
- DE: 13 (German museums, foundations)
- BE: 11 (Belgian museums)
- IT: 6 (Italian institutions)
- AU: 6 (Australian archives, museums)
- Plus smaller counts from IN, US, ES, CH, DK, AT, SA, NO, IL

These files have staff data from LinkedIn company pages but need
GHCID resolution (currently XX-XXX placeholders for region/city).

Dutch PENDING files remain: 1,283
2026-01-09 15:55:31 +01:00
kempersc
ce66a294e5 fix(rag): transform SPARQL results to match frontend metadata format for map coordinates
- Convert flat SPARQL results {lat, lon} to nested {metadata: {latitude, longitude}}
- Parse string coordinates to float values
- Add city/country/institution_type from template slots
- Enables ChatMapPanel to render map markers correctly
2026-01-09 15:49:18 +01:00
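The reshaping described in this fix can be sketched as follows. The `name` field, the `to_map_result` function, and the slot dictionary keys are assumptions; only the flat `{lat, lon}` input and the nested `{metadata: {latitude, longitude}}` output come from the commit message.

```python
from typing import Any, Optional


def _as_float(value: Any) -> Optional[float]:
    """Parse a coordinate that may arrive as a string; None if unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def to_map_result(row: dict[str, Any], slots: dict[str, Any]) -> dict[str, Any]:
    """Convert one flat SPARQL row into the nested shape the map panel reads."""
    return {
        "name": row.get("name"),  # hypothetical field name
        "metadata": {
            "latitude": _as_float(row.get("lat")),
            "longitude": _as_float(row.get("lon")),
            # These values come from the query template slots, not from SPARQL.
            "city": slots.get("city"),
            "country": slots.get("country"),
            "institution_type": slots.get("institution_type"),
        },
    }


marker = to_map_result(
    {"name": "Rijksmuseum", "lat": "52.36", "lon": "4.885"},
    {"city": "Amsterdam", "country": "NL", "institution_type": "M"},
)
```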
kempersc
14be18e7c4 feat(data): merge staff data from 30 more PENDING files into enriched custodians
Batch 2 of PENDING file resolution:
- Merged LinkedIn staff data from 30 PENDING files into matching enriched custodians
- Archived processed PENDING files to data/custodian/archive/pending_merged_20250109/
- Notable merges: ASML (994 staff), BBB (117), Apenheul (100), BOEI (93)

Files merged include:
- Corporate: ASML, BOS Foundation, Constructing the Limes
- Museums: Allard Pierson, Apenheul, various regional museums
- Research: Catholic Documentation Centre, Creating Cultures of Care
- Cultural orgs: Cultuur Ondernemen, CultuurOost, CultuurKwadraat

This continues the effort to consolidate PENDING files (1,283 remaining).
2026-01-09 15:42:32 +01:00
kempersc
5ab9dd8ea2 docs(person_pid): add implementation guidelines and governance docs
Add final two chapters of the Person PID (PPID) design document:

- 08_implementation_guidelines.md: Database architecture, API design,
  data ingestion pipeline, GHCID integration, security, performance,
  technology stack, deployment, and monitoring specifications

- 09_governance_and_sustainability.md: Data governance policies,
  quality assurance, sustainability planning, community engagement,
  legal considerations, and long-term maintenance strategies
2026-01-09 14:51:57 +01:00
kempersc
1f723fd5d7 feat(data): merge staff data from 35 PENDING files into enriched custodians
Merged LinkedIn-extracted staff sections from PENDING files into their
corresponding proper GHCID custodian files. This consolidates data from
two extraction sources:
- Existing enriched files: Google Maps, Museum Register, YouTube, etc.
- PENDING files: LinkedIn staff data extraction

Files modified:
- 28 custodian files enriched with staff data
- 35 PENDING files deleted (merged into proper locations)
- Originals archived to archive/pending_duplicates_20250109/

Key institutions enriched:
- Rijksmuseum (NL-NH-AMS-M-RM)
- Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA)
- Amsterdam Museum (NL-NH-AMS-M-AM)
- Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA)
- Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR)
- And 23 more museums/archives across NL

New scripts:
- scripts/merge_staff_data.py: Automated staff data merger
- scripts/categorize_pending_files.py: PENDING file analysis utility
2026-01-09 14:51:17 +01:00
kempersc
2c2a312e0a feat(rag): add database routing to 8 more factual query templates
Add databases: ["oxigraph"] to skip vector search for deterministic queries:
- count_institutions_by_type_location (count)
- count_institutions_by_type (aggregation)
- find_institutions_by_founding_date (temporal)
- find_custodians_by_budget_threshold (financial)
- compare_locations (comparative)
- find_by_founding (temporal)
- events_in_period (temporal events)
- institutions_by_founding_decade (temporal aggregation)

Total templates with oxigraph-only routing: 12
2026-01-09 12:33:41 +01:00
kempersc
b9c30fc970 feat(rag): extend database routing to count, temporal, and financial templates
Add databases: ["oxigraph"] to 5 more templates that don't benefit from vector search:
- count_institutions_by_type_location
- compare_locations
- find_by_founding
- find_custodians_by_budget_threshold
- find_institutions_by_founding_date

Total templates with Oxigraph-only routing: 10
2026-01-09 12:32:28 +01:00
kempersc
17a94613f3 data(custodian): resolve 57 PENDING files to proper GHCID locations
Resolved NL-XX-XXX-PENDING files to proper regional GHCIDs:
- 57 new files with proper location codes (city, region)
- Cities include: Amsterdam, Rotterdam, Utrecht, Leiden, Groningen, etc.
- 34 original PENDING files archived to archive/pending_duplicates_20250109/

Examples:
- NL-XX-XXX-PENDING-AMSTERDAM-MUSEUM → NL-NH-AMS-M-AM (Amsterdam Museum)
- NL-XX-XXX-PENDING-GRONINGEN-MUSEUM → NL-GR-GRO-M-GM (Groninger Museum)
- NL-XX-XXX-PENDING-KUNSTHAL-ROTTERDAM → NL-ZH-ROT-G-KR (Kunsthal Rotterdam)
2026-01-09 12:19:19 +01:00
kempersc
e313744cf6 feat(scripts): add resolve_pending_locations.py for GHCID resolution
Script to resolve NL-XX-XXX-PENDING files that have city names in filename:
- Looks up city in GeoNames database
- Updates YAML with location data (city, region, country)
- Generates proper GHCID with UUID v5/v8
- Renames files to match new GHCID
- Archives original PENDING files for reference
2026-01-09 12:18:46 +01:00
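For the UUID v5 part of this step, a deterministic identifier can be derived from the GHCID string with the standard library. The namespace below is a hypothetical placeholder, not the project's real namespace, and the v8 variant is left out because its custom layout is not described in this log.

```python
import uuid

# Hypothetical namespace; the project's actual namespace UUID is not shown here.
GHCID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/ghcid/")


def ghcid_uuid(ghcid: str) -> uuid.UUID:
    """Derive a name-based (SHA-1) UUID v5 from a GHCID string."""
    return uuid.uuid5(GHCID_NAMESPACE, ghcid)


first = ghcid_uuid("NL-NH-AMS-M-AM")
second = ghcid_uuid("NL-NH-AMS-M-AM")
other = ghcid_uuid("NL-GR-GRO-M-GM")
```

Because UUID v5 is a hash of namespace plus name, re-running the resolver on the same GHCID yields the same UUID, which keeps renames reproducible.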
kempersc
787f4dacb0 feat(rag): implement database routing in query endpoint
Log database routing decisions and add databases_used to response metadata.
When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.
2026-01-09 12:15:49 +01:00
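The routing rule in this commit can be illustrated with a small sketch. The function names, the callback parameters, and the default of querying both stores when a template has no `databases` entry are assumptions; the `databases: ["oxigraph"]` setting and the `databases_used` metadata field come from the commit message.

```python
from typing import Any, Callable

# Assumed default: query both stores when the template does not restrict them.
ALL_DATABASES = ("oxigraph", "qdrant")


def databases_for(template: dict[str, Any]) -> list[str]:
    """Read the template's database list, falling back to all databases."""
    return list(template.get("databases") or ALL_DATABASES)


def run_query(
    template: dict[str, Any],
    run_sparql: Callable[[], list],
    run_vector_search: Callable[[], list],
) -> dict[str, Any]:
    """Run only the backends the template asks for and report which ran."""
    dbs = databases_for(template)
    results = {
        "sparql": run_sparql() if "oxigraph" in dbs else [],
        # Skipped entirely for oxigraph-only templates.
        "vector": run_vector_search() if "qdrant" in dbs else [],
    }
    return {"results": results, "metadata": {"databases_used": dbs}}


calls: list[str] = []
response = run_query(
    {"databases": ["oxigraph"]},
    lambda: calls.append("sparql") or [{"count": 12}],
    lambda: calls.append("vector") or [],
)
```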
kempersc
35a057981c chore(frontend): sync schema files with custodian_type → has_or_had_custodian_type refactor
- Remove deprecated slots: custodian_type.yaml, custodian_types.yaml,
  custodian_type_broader/narrower/related.yaml, custodian_types_primary/rationale.yaml
- Add new unified slot: has_or_had_custodian_type.yaml
- Sync all 236+ class files with updated slot references
- Update manifest.json
2026-01-09 12:15:32 +01:00
kempersc
76644f55f5 feat(rag): add database routing to geographic query templates
Add databases: ["oxigraph"] to 4 geographic templates to skip vector search:
- list_institutions_by_type_city
- list_institutions_by_type_region
- list_institutions_by_type_country
- list_institutions_in_city

Also add documentation explaining database routing configuration in _metadata.
2026-01-09 11:56:18 +01:00