Track the full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, and HOW it was processed (SPARQL/vector/LLM).
Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse
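A minimal sketch of how the three types named above might fit together, assuming a dataclass-based design. The field names, the tier members, and the `merge_provenance` helper are hypothetical illustrations; the actual contents of provenance.py and MultiSourceRetriever.merge_results() are not shown in this log and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class DataTier(Enum):
    # Hypothetical tier members; the real enum values are not given here.
    AUTHORITATIVE = 1
    DERIVED = 2
    GENERATED = 3


@dataclass
class SourceAttribution:
    source: str             # WHERE the data came from (hypothetical field)
    method: str             # HOW it was processed: "sparql", "vector", "llm"
    tier: DataTier
    retrieved_at: datetime  # WHEN it was retrieved


@dataclass
class EpistemicProvenance:
    sources: List[SourceAttribution] = field(default_factory=list)
    from_cache: bool = False

    def lowest_tier(self) -> Optional[DataTier]:
        """Return the least trustworthy tier among all sources, if any."""
        if not self.sources:
            return None
        return max((s.tier for s in self.sources), key=lambda t: t.value)


def merge_provenance(attributions: List[SourceAttribution]) -> EpistemicProvenance:
    """Combine per-source attributions, dropping repeated (source, method) pairs."""
    seen = set()
    merged = []
    for attr in attributions:
        key = (attr.source, attr.method)
        if key not in seen:
            seen.add(key)
            merged.append(attr)
    return EpistemicProvenance(sources=merged)
```

Reporting the lowest tier is one possible way to let the frontend flag responses that rely partly on generated rather than authoritative data.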
Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)
Schema fixes:
- Fix truncated example in has_observation.yaml slot definition
References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
- Institution Browser: multi-select for types and countries
- URL query param sync for shareable filter URLs
- New utility: countryNames.ts with flag emoji support
- New utility: imageProxy.ts for image URL handling
- New component: SearchableMultiSelect dropdown
- Career timeline CSS and component updates
- Media gallery improvements
- Lazy load error boundary component
- Version check utility
- Add refresh button to assistant messages for re-running queries with fresh results
- Highlight refresh button (amber) for cached responses to draw attention
- Add spinning icon animation while refreshing
- Fix cache clear to return detailed success/failure status separately for the local and shared caches
- Add bypass cache toggle that forces fresh queries (one-shot, resets after query)
- Add Dutch/English translations for new UI elements
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for the API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
- Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations.
- Documented rationale, examples, and implementation guidelines for the filtering process.
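A rough sketch of what a LEGAL-FORM-FILTER step could look like. The list of legal forms and the function name `filter_legal_form` are assumptions for illustration only; the documented rule defines the authoritative list and edge cases.

```python
import re

# Illustrative, incomplete list of legal-form designations (an assumption,
# not the list from the documented rule).
LEGAL_FORMS = ["Stichting", "Vereniging", "B.V.", "N.V.", "GmbH"]

# Match a legal form only when it stands alone, not inside a longer word.
_LEGAL_FORM_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(form) for form in LEGAL_FORMS) + r")(?!\w)",
    flags=re.IGNORECASE,
)


def filter_legal_form(name: str) -> str:
    """Remove legal-form designations from a custodian name."""
    stripped = _LEGAL_FORM_RE.sub(" ", name)
    # Collapse the whitespace left behind and trim stray separators.
    stripped = re.sub(r"\s+", " ", stripped).strip(" ,-")
    # If nothing remains, keep the original rather than return an empty name.
    return stripped or name
```

For example, under these assumptions `filter_legal_form("Stichting Rijksmuseum")` yields `"Rijksmuseum"`.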
docs: Create README for value standardization rules
- Established a README outlining the value standardization rules that apply to Heritage Custodian classes.
- Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution.
feat: Implement transliteration standards for non-Latin scripts
- Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration.
- Included detailed guidelines for various scripts and languages, along with implementation examples.
feat: Define XPath provenance rules for web observations
- Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources.
- Established a workflow for archiving websites and verifying claims against archived HTML.
chore: Update records lifecycle diagram
- Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians.
- Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.
Database Panels:
- Add D3.js force-directed graph visualization to Oxigraph and TypeDB panels
- Add 'Explore' tab with class/entity browser, graph/table toggle, and search
- Add data explorer to PostgreSQL panel with table browser, pagination, search, export
- Fix SPARQL variable naming bug in Oxigraph getGraphData() function
- Add node details panel showing selected entity attributes
- Add zoom/pan controls and node coloring by entity type
Map Features:
- Add TimelineSlider component for temporal filtering of institutions
- Support dual-handle range slider with decade histogram
- Add quick presets (Ancient, Medieval, Modern, Contemporary)
- Show institution density visualization by founding decade
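The decade histogram and range filter behind the slider can be sketched as follows. This is written in Python for illustration, while the actual TimelineSlider is a frontend component; the `founding_year` key and both function names are hypothetical.

```python
from collections import Counter


def decade_histogram(institutions):
    """Count institutions per founding decade, skipping unknown years."""
    years = (inst.get("founding_year") for inst in institutions)
    return Counter((year // 10) * 10 for year in years if year is not None)


def filter_by_range(institutions, start_year, end_year):
    """Keep institutions founded within [start_year, end_year], inclusive."""
    return [
        inst
        for inst in institutions
        if inst.get("founding_year") is not None
        and start_year <= inst["founding_year"] <= end_year
    ]
```

The two slider handles would supply `start_year` and `end_year`; institutions without a known founding year are excluded from both the histogram and the filtered result in this sketch.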
Hooks:
- Extend useOxigraph with getGraphData() for graph visualization
- Extend useTypeDB with getGraphData() for graph visualization
- Extend usePostgreSQL with getTableData() and exportTableData()
- Improve useDuckLakeInstitutions with temporal filtering support
Styles:
- Add HeritageDashboard.css with shared panel styling
- Add TimelineSlider.css for timeline component styling
Replace static netherlands_municipalities_simplified.geojson with dynamic
PostGIS API call to /boundaries/countries/NL/admin2/geojson.
Transform API response properties to expected format:
- API: {code, name, name_local, admin1_code, admin1_name}
- Expected: {code, naam, provincieCode, provincieNaam}
This ensures NL boundary data comes from the authoritative PostGIS
database rather than a static file that could become outdated.
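The property mapping described above can be sketched like this. It is shown in Python for illustration, while the real transform runs in the frontend; the function names are hypothetical, and the choice of `name_local` (falling back to `name`) as the source of `naam` is an assumption, since the message does not say which field is used.

```python
def to_expected_properties(api_props: dict) -> dict:
    """Map API boundary properties to the format the map code expects."""
    return {
        "code": api_props["code"],
        # Assumption: prefer the local name and fall back to `name`.
        "naam": api_props.get("name_local") or api_props["name"],
        "provincieCode": api_props["admin1_code"],
        "provincieNaam": api_props["admin1_name"],
    }


def transform_feature_collection(geojson: dict) -> dict:
    """Return a copy of a GeoJSON FeatureCollection with remapped properties."""
    return {
        **geojson,
        "features": [
            {**feature, "properties": to_expected_properties(feature["properties"])}
            for feature in geojson.get("features", [])
        ],
    }
```

Geometry and other top-level keys pass through untouched, so only the `properties` object of each feature changes shape.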
New React hook that fetches administrative boundaries from the PostGIS API:
- Supports international boundaries (NL, JP, CZ, DE, BE, CH, AT, etc.)
- Caches admin1, admin2, and GeoJSON data
- Provides point-in-polygon lookup
- Includes utility functions for filtering boundaries by code/name
- Replaces static GeoJSON file loading pattern
- Fix polygon rendering by using static paint properties instead of data-driven ones
- Add ensureSourceAndLayers() helper for reliable layer management
- Use setPaintProperty() for historical vs modern styling distinction
- Improve Database page layout with back buttons and cleaner navigation
- Add ResizableNestedTable component for DuckLake data display
- Optimize spacing and layout in Database.css
- Update schema manifest
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
- Implemented three independent aspects for custodians: CustodianLegalStatus, CustodianName, and CustodianPlace.
- Renamed CustodianReconstruction to CustodianLegalStatus and updated all references.
- Created new components for CustodianPlace and PlaceSpecificityEnum.
- Removed direct links from CustodianObservation to Custodian, aligning with PROV-O standards.
- Generated a comprehensive example instance demonstrating the new architecture.
- Updated documentation to reflect changes and provide guidance on multi-aspect modeling.
- Added a React hook for managing IndexedDB operations, including storing and loading transformation results.
- Created a complete YAML example for Rijksmuseum, illustrating the integration of all three aspects.