Infrastructure changes to enable automatic frontend deployment when schemas change:
- Add .forgejo/workflows/deploy-frontend.yml workflow triggered by:
- Changes to frontend/** or schemas/20251121/linkml/**
- Manual workflow dispatch
- Rewrite generate-schema-manifest.cjs to properly scan all schema directories
- Recursively scans classes, enums, slots, modules directories
- Uses singular category names (class, enum, slot) matching TypeScript types
- Includes all 4 main schemas at root level
- Skips archive directories and backup files
- Update schema-loader.ts to match new manifest format
- Add SchemaCategory interface
- Update SchemaManifest to use categories as array
- Add flattenCategories() helper function
- Add getSchemaCategories() and getSchemaCategoriesSync() functions
The workflow builds frontend with updated manifest and deploys to bronhouder.nl
- Update VideoAnnotation class with new motivation type references
- Add AnnotationMotivationType and AnnotationMotivationTypes class files
- Add motivation_type slots (description, id, name)
- Archive deprecated AnnotationMotivationEnum
- Update slot references for derived_from_entity, has_observation, has_person_observation
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).
Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse
Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)
Schema fixes:
- Fix truncated example in has_observation.yaml slot definition
References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
Copies authoritative schemas from schemas/20251121/ to:
- frontend/public/schemas/20251121/
- apps/archief-assistent/public/schemas/20251121/
This ensures slot definitions with corrected ontology property
references (commit 2808dad6cd) are available to frontend apps.
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
- Add 'Compare' toggle button next to slots with slot_usage overrides
- Show generic slot definition vs class-specific override in 3-column grid
- Highlight changed properties with green 'changed' badge
- Display '(inherited)' when override matches generic definition
- Display '(not defined)' when generic has no value for property
- Compare: range, description, required, multivalued, slot_uri, pattern, identifier
- Full i18n support (Dutch/English translations)
- Responsive design: stacks vertically on mobile (<640px)
- Add green 'slot_usage' badge for slots with class-specific overrides
- Add ✦ markers next to properties that are overridden vs inherited
- Add green left border styling for slots with slot_usage
- Add i18n translations (nl/en) for override indicators
- Merge generic slot definitions with class-specific slot_usage properties
This helps users understand which slot properties come from the generic
slot definition vs which are overridden at the class level via slot_usage.
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml
Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
- Add ontology cache warming at startup in lifespan() function
- Add is_factual_query() detection in template_sparql.py (12 templates)
- Add factual_result and sparql_query fields to DSPyQueryResponse
- Skip LLM generation for factual templates (count, list, compare)
- Execute SPARQL directly and return results as table (~15s → ~2s latency)
- Update ConversationPanel.tsx to render factual results table
- Add CSS styling for factual results with green theme
For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL
results ARE the answer - no need for expensive LLM prose generation.
- Remove deprecated CustodianTypeCodeEnum from class_metadata_slots.yaml
- Update custodian_types slot to use uriorcurie range (references CustodianType subclasses)
- Update custodian_types_primary slot similarly
- Add migration note for legacy string format ['A'] vs new URI format
Per Rule 9: Enum-to-Class Promotion - Single Source of Truth
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
- Fixed bug where closing triple-quotes (""") would incorrectly re-trigger
multi-line string detection, causing subsequent class definitions to be skipped
- Added lineToProcess variable to track which portion of line to process after
closing a multi-line string, preventing re-detection of opening quotes
- Moved UML large diagram confirmation logic from OntologyViewerPage to
UMLVisualization component for better encapsulation
- PiCo ontology now correctly shows all 8 classes instead of 2
Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
- Remove onFaceClick prop from CustodianTypeIndicator3D in class/slot/enum detail views
to prevent accidental filtering when clicking decorative cubes (Bug 3)
- Add parseCustodianTypesAnnotation() helper to handle JSON-stringified arrays like '["A"]'
in YAML annotations, fixing Bug 2 where all 19 letters appeared on every cube
- Legend bar retains onTypeClick for intentional filtering functionality
- Update SPARQL CONSTRUCT query to use correct ontology namespaces:
- hc: https://w3id.org/heritage/custodian/ (was nde)
- Use nde:Custodian type (was crm:E39_Actor)
- Use schema:location + geo:lat/long (was crm predicates)
- Remove LIMIT 500 clause to fetch all results
- Show all node types by default instead of random single type
- Fixes issue where Knowledge Graph showed incomplete/random data
- Larger default size (700x550) for better readability
- Resizable from all 8 edges/corners with visual SE grip indicator
- Clearer button icons (18px, strokeWidth 2.5)
- Draggable, minimizable, pinnable panel
- Dark theme and mobile responsive support
- Institution Browser: multi-select for types and countries
- URL query param sync for shareable filter URLs
- New utility: countryNames.ts with flag emoji support
- New utility: imageProxy.ts for image URL handling
- New component: SearchableMultiSelect dropdown
- Career timeline CSS and component updates
- Media gallery improvements
- Lazy load error boundary component
- Version check utility
- Add refresh button to assistant messages for re-running queries with fresh results
- Highlight refresh button (amber) for cached responses to draw attention
- Add spinning icon animation while refreshing
- Fix cache clear to return detailed success/failure status for local vs shared cache
- Add bypass cache toggle that forces fresh queries (one-shot, resets after query)
- Add Dutch/English translations for new UI elements
- Add SyncPanel component with bilingual (NL/EN) support
- Add relative URL handling for production (bronhouder.nl)
- Integrate SyncPanel into Database page
- Show sync status for all 4 databases (DuckLake, PostgreSQL, Oxigraph, Qdrant)
- Support dry-run mode and file limit options
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
- Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations.
- Documented rationale, examples, and implementation guidelines for the filtering process.
docs: Create README for value standardization rules
- Established a comprehensive README outlining various value standardization rules applicable to Heritage Custodian classes.
- Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution.
feat: Implement transliteration standards for non-Latin scripts
- Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration.
- Included detailed guidelines for various scripts and languages, along with implementation examples.
feat: Define XPath provenance rules for web observations
- Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources.
- Established a workflow for archiving websites and verifying claims against archived HTML.
chore: Update records lifecycle diagram
- Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians.
- Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.
Database Panels:
- Add D3.js force-directed graph visualization to Oxigraph and TypeDB panels
- Add 'Explore' tab with class/entity browser, graph/table toggle, and search
- Add data explorer to PostgreSQL panel with table browser, pagination, search, export
- Fix SPARQL variable naming bug in Oxigraph getGraphData() function
- Add node details panel showing selected entity attributes
- Add zoom/pan controls and node coloring by entity type
Map Features:
- Add TimelineSlider component for temporal filtering of institutions
- Support dual-handle range slider with decade histogram
- Add quick presets (Ancient, Medieval, Modern, Contemporary)
- Show institution density visualization by founding decade
Hooks:
- Extend useOxigraph with getGraphData() for graph visualization
- Extend useTypeDB with getGraphData() for graph visualization
- Extend usePostgreSQL with getTableData() and exportTableData()
- Improve useDuckLakeInstitutions with temporal filtering support
Styles:
- Add HeritageDashboard.css with shared panel styling
- Add TimelineSlider.css for timeline component styling
Replace static netherlands_municipalities_simplified.geojson with dynamic
PostGIS API call to /boundaries/countries/NL/admin2/geojson.
Transform API response properties to expected format:
- API: {code, name, name_local, admin1_code, admin1_name}
- Expected: {code, naam, provincieCode, provincieNaam}
This ensures NL boundary data comes from the authoritative PostGIS
database rather than a static file that could become outdated.
- Add search bar to filter table data across all columns
- Filter web archive claims by selected page
- Include source_page in claim queries for filtering
- Fix TypeScript unused parameter warning
New React hook that fetches administrative boundaries from the PostGIS API:
- Supports international boundaries (NL, JP, CZ, DE, BE, CH, AT, etc.)
- Caches admin1, admin2, and GeoJSON data
- Provides point-in-polygon lookup
- Includes utility functions for filtering boundaries by code/name
- Replaces static GeoJSON file loading pattern
- Fix polygon rendering with static paint properties instead of data-driven
- Add ensureSourceAndLayers() helper for reliable layer management
- Use setPaintProperty() for historical vs modern styling distinction
- Improve Database page layout with back buttons and cleaner navigation
- Add ResizableNestedTable component for DuckLake data display
- Optimize spacing and layout in Database.css
- Update schema manifest
- Introduced `llm_extract_archiveslab.py` script for entity and relationship extraction using LLMAnnotator with GLAM-NER v1.7.0.
- Replaced regex-based extraction with generative LLM inference.
- Added functions for loading markdown content, converting annotation sessions to dictionaries, and generating extraction statistics.
- Implemented comprehensive logging of extraction results, including counts of entities, relationships, and specific types like heritage institutions and persons.
- Results and statistics are saved in JSON format for further analysis.
- Implemented `generate_mermaid_with_instances.py` to create ER diagrams that include all classes, relationships, enum values, and instance data.
- Loaded instance data from YAML files and enriched enum definitions with meaningful annotations.
- Configured output paths for generated diagrams in both frontend and schema directories.
- Added support for excluding technical classes and limiting the number of displayed enum and instance values for readability.