- Introduced GeospatialLocation class for specific geospatial locations.
- Added HandsOnFacility class representing facilities for hands-on experiences.
- Created Hyponym class for narrower terms or instances.
- Added ImagingEquipment class for imaging-related equipment.
- Introduced LoadingDock class for loading dock facilities.
- Created LocalCollection class for locally held collections.
- Added Locker class for storage lockers available to visitors/staff.
- Introduced MichelinStarRating class for Michelin star ratings.
- Created MicrofilmReader class for equipment used to read microfilms.
- Added OperationalArchive class for archives containing operational records.
- Introduced OperationalUnit class for operational units within organizations.
- Added has_or_had_archive slot for associating archives with entities.
- Created has_or_had_rating slot for ratings assigned to entities.
- Introduced has_or_had_section slot for sections or units within organizations.
- Added has_geospatial_location slot linking nominal places to precise geospatial coordinates.
- Introduced total expense, total frames analyzed, total investment, total liability, total net asset, and traditional product slots to enhance financial reporting capabilities.
- Added transition types detected, treatment description, type hypothesis, typical condition, typical HTTP methods, typical response formats, and typical scope slots for improved heritage documentation.
- Implemented user community, verified, web observation, WhatsApp business likelihood, wikidata equivalent, and wikidata mapping slots to enrich institutional data representation.
- Established has_or_had_asset, has_or_had_budget, has_or_had_expense, and is_or_was_threatened_by slots to capture asset, budget, expense relationships, and threats to heritage forms.
- Removed deprecated slots: storage_security_level, version_number, video_comment, visiting_hour, was_asserted_by, was_revision_of, writing_system.
- Archived corresponding YAML files for deprecated slots with detailed migration notes.
- Updated slot definitions for has_collection and encompassing_body to reflect new naming conventions and temporal patterns.
- Enhanced metadata extraction in index_persons_qdrant.py to include WCMS registration and data sources.
- Modified hybrid_retriever and multi_embedding_retriever to support filtering by WCMS registration status.
- Added Overview class to represent structured collections of web links, including detailed descriptions, examples, and ontology alignments.
- Introduced RealnessStatus class to classify data as real or synthetic, with rich provenance and temporal semantics.
- Created WebLink class for representing hyperlinks with associated metadata, enhancing structured link representation.
- Established new slots: has_or_had_comprehensive_overview, is_or_was_real, and includes_or_included to support the new classes and improve data modeling.
- Migrated existing slots to new structures, ensuring compliance with RiC-O naming conventions and enhancing specificity.
- Updated annotations and examples across all new classes and slots for clarity and usability.
Add companion_query support to fetch full entity records alongside
aggregate count queries. Enables displaying results on map/list when
asking 'how many museums in Amsterdam?'
Backend changes:
- Add companion_query, companion_query_region, companion_query_country
fields to TemplateDefinition and TemplateMatchResult
- Add render_template_string() for raw companion query rendering
Template changes:
- Add companion queries to count_institutions_by_type_and_location
for settlement, region, and country level queries
- Returns institution URI, name, coordinates, city for visualization
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).
Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse
Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)
Schema fixes:
- Fix truncated example in has_observation.yaml slot definition
References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
- Detect COUNT queries by checking for 'count' key in SPARQL results
- Skip institution transformation for COUNT queries to preserve count value
- Fixes bug where 'Hoeveel archieven in Utrecht?' returned 1 instead of 10
- COUNT queries now correctly extract integer count from SPARQL response
- Fix bug where COUNT queries showed Qdrant result count (10) instead of
actual SPARQL count (e.g., 204 musea in Noord-Holland)
- Use sparql_results for count extraction in factual query fast-path
- Also fix fallback COUNT/LIST handling to use sparql_results
Log database routing decisions and add databases_used to response metadata.
When template specifies databases: ["oxigraph"], Qdrant vector search is skipped.
- Add 'databases' field to TemplateDefinition and TemplateMatchResult
- Support values: 'oxigraph' (SPARQL/KG), 'qdrant' (vector search)
- Add helper methods use_oxigraph() and use_qdrant()
- Default to both databases for backward compatibility
- Allows templates to skip vector search for factual/geographic queries
- Add ontology cache warming at startup in lifespan() function
- Add is_factual_query() detection in template_sparql.py (12 templates)
- Add factual_result and sparql_query fields to DSPyQueryResponse
- Skip LLM generation for factual templates (count, list, compare)
- Execute SPARQL directly and return results as table (~15s → ~2s latency)
- Update ConversationPanel.tsx to render factual results table
- Add CSS styling for factual results with green theme
For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL
results ARE the answer - no need for expensive LLM prose generation.
Major architectural changes based on Formica et al. (2023) research:
- Add TemplateClassifier for deterministic SPARQL template matching
- Add SlotExtractor with synonym resolution for slot values
- Add TemplateInstantiator using Jinja2 for query rendering
- Refactor dspy_heritage_rag.py to use template system
- Update main.py with streamlined pipeline
- Fix semantic_router.py ordering issues
- Add comprehensive metrics tracking
Template-based approach achieves 65% precision vs 10% LLM-only
per Formica et al. research on SPARQL generation.
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
Implements a state machine to filter streaming tokens:
- Only stream tokens from the 'answer' field to the frontend
- Skip tokens from 'reasoning', 'citations', 'confidence', 'follow_up' fields
- Remove DSPy field markers like '[[ ## answer ## ]]' from streamed content
This fixes the issue where raw DSPy signature field markers were being
displayed in the chat interface instead of clean answer text.
- Use hc: <https://w3id.org/heritage/custodian/> prefix
- Use hc:institutionType with single-letter codes (M, L, A, etc.)
- Use Wikidata URIs for countries (Q55=NL, Q31=BE, etc.)
- Update all SPARQL examples to use correct ontology
- Align with actual RDF data in Oxigraph
- Update HeritageSPARQLGenerator docstring with correct prefixes
- Change main class from hc:Custodian to crm:E39_Actor
- Change type property from hcp:institutionType to org:classification
- Update type values from single letters to full names (MUSEUM, ARCHIVE, etc.)
- Add rate limit handling with exponential backoff for 429 errors
- Fix test_live_rag.py sample queries to use correct ontology
- Update optimized_models instructions with correct prefixes
- Add GLAMORCUBESFIXPHDNT heritage type detection for person profiles
- Two-stage classification: blocklist non-heritage orgs, then match keywords
- Special handling for Digital (D) type: requires heritage org context
- Add career_history heritage_relevant and heritage_type fields
- Add exponential backoff retry for Anthropic API overload errors
- Fix DSPy 3.x async context with dspy.context() wrapper