Infrastructure changes to enable automatic frontend deployment when schemas change:
- Add .forgejo/workflows/deploy-frontend.yml workflow, triggered by:
  - Changes to frontend/** or schemas/20251121/linkml/**
  - Manual workflow dispatch
- Rewrite generate-schema-manifest.cjs to scan all schema directories correctly:
  - Recursively scans the classes, enums, slots, and modules directories
  - Uses singular category names (class, enum, slot) matching the TypeScript types
  - Includes all 4 main schemas at the root level
  - Skips archive directories and backup files
- Update schema-loader.ts to match the new manifest format:
  - Add SchemaCategory interface
  - Update SchemaManifest to store categories as an array
  - Add flattenCategories() helper function
  - Add getSchemaCategories() and getSchemaCategoriesSync() functions
The workflow builds the frontend with the updated manifest and deploys it to bronhouder.nl.
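A rough sketch of the manifest layout described above, written as a pure function over relative paths so it can be tried without touching the filesystem. Everything here is an assumption for illustration: the `files`/`path` fields, the `main` and `module` category names, the rule that the top-level directory decides the category, and the `buildManifest` name. It is not the actual generate-schema-manifest.cjs or schema-loader.ts code, and getSchemaCategories() is not modelled.

```typescript
// Hypothetical manifest shapes loosely modelled on the names in this commit
// (SchemaCategory, SchemaManifest, flattenCategories); the fields are assumptions.
type CategoryName = "main" | "class" | "enum" | "slot" | "module";

interface SchemaFile {
  name: string;
  path: string;
  category: CategoryName;
}

interface SchemaCategory {
  name: CategoryName;
  files: SchemaFile[];
}

interface SchemaManifest {
  categories: SchemaCategory[];
}

// Plural directory names map to the singular category names used in the types.
const DIR_TO_CATEGORY: Record<string, CategoryName | undefined> = {
  classes: "class",
  enums: "enum",
  slots: "slot",
  modules: "module",
};

// Fixed output order so the manifest is stable between builds.
const CATEGORY_ORDER: CategoryName[] = ["main", "class", "enum", "slot", "module"];

export function buildManifest(relPaths: string[]): SchemaManifest {
  const byCategory: Partial<Record<CategoryName, SchemaFile[]>> = {};
  for (const relPath of relPaths) {
    const parts = relPath.split("/");
    // Only *.yaml files count, which also drops backup copies such as Foo.yaml.bak
    // (assumed naming); any path under an "archive" directory is skipped.
    if (!relPath.endsWith(".yaml") || parts.includes("archive")) continue;
    // Root-level files are the main schemas; otherwise the top-level directory decides.
    const category = parts.length === 1 ? "main" : DIR_TO_CATEGORY[parts[0]];
    if (category === undefined) continue;
    const bucket: SchemaFile[] = byCategory[category] ?? [];
    bucket.push({ name: parts[parts.length - 1].replace(/\.yaml$/, ""), path: relPath, category });
    byCategory[category] = bucket;
  }
  const categories: SchemaCategory[] = [];
  for (const name of CATEGORY_ORDER) {
    const files = byCategory[name];
    if (files !== undefined) categories.push({ name, files });
  }
  return { categories };
}

// Flatten the category array back into a single list of schema files.
export function flattenCategories(categories: SchemaCategory[]): SchemaFile[] {
  const files: SchemaFile[] = [];
  for (const category of categories) files.push(...category.files);
  return files;
}
```

Keeping categories as an ordered array (rather than an object keyed by category) gives the loader a deterministic iteration order, and flattenCategories() recovers the flat list when a caller does not care about grouping.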
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).
Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse
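To make the WHERE/WHEN/HOW lineage concrete, here is a hypothetical TypeScript view of what a merged provenance record might look like once it reaches the frontend. The field names, the numeric tier convention (1 = most authoritative), and the "weakest tier wins" merge rule are all assumptions for illustration; the real provenance.py classes and MultiSourceRetriever.merge_results() may differ.

```typescript
// Hypothetical mirror of the provenance types named in this commit.
// Field names and the tier numbering are assumptions, not the provenance.py schema.
type RetrievalMethod = "sparql" | "vector" | "llm";

// Assumed convention: tier 1 is the most authoritative, higher numbers are weaker.
type DataTier = 1 | 2 | 3;

interface SourceAttribution {
  source: string; // WHERE: e.g. a SPARQL endpoint or vector collection name
  retrievedAt: string; // WHEN: ISO 8601 timestamp
  method: RetrievalMethod; // HOW: the retrieval path that produced the results
  tier: DataTier;
}

interface EpistemicProvenance {
  sources: SourceAttribution[];
  methods: RetrievalMethod[]; // distinct methods, in first-seen order
  weakestTier: DataTier; // a merged answer is treated as only as strong as its weakest source
  latestRetrievalAt: string;
}

// Merge per-source attributions into one provenance record for a RAG response.
export function mergeProvenance(attributions: SourceAttribution[]): EpistemicProvenance {
  if (attributions.length === 0) {
    throw new Error("cannot build provenance without at least one source");
  }
  const methods: RetrievalMethod[] = [];
  let weakestTier: DataTier = attributions[0].tier;
  let latest = attributions[0].retrievedAt;
  for (const a of attributions) {
    if (!methods.includes(a.method)) methods.push(a.method);
    if (a.tier > weakestTier) weakestTier = a.tier;
    if (Date.parse(a.retrievedAt) > Date.parse(latest)) latest = a.retrievedAt;
  }
  return { sources: [...attributions], methods, weakestTier, latestRetrievalAt: latest };
}
```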
Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)
Schema fixes:
- Fix truncated example in has_observation.yaml slot definition
References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
Copies authoritative schemas from schemas/20251121/ to:
- frontend/public/schemas/20251121/
- apps/archief-assistent/public/schemas/20251121/
This ensures slot definitions with corrected ontology property
references (commit 2808dad6cd) are available to frontend apps.
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
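The auth.setup.ts change (no hardcoded credential defaults) boils down to failing fast when a variable is missing. A minimal sketch, assuming a helper named `requireEnv` (hypothetical) that takes the environment map as a parameter so it is easy to test:

```typescript
// Read a required variable from an environment map and fail loudly if it is absent
// or blank, instead of falling back to a hardcoded default credential.
export function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a Playwright setup file this might be called as `requireEnv(process.env, "E2E_USERNAME")`; the variable name is illustrative. Throwing at setup time makes a misconfigured CI run fail immediately rather than logging in with a stale default account.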
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml
Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
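The custodian_types migration could be scripted roughly as below. This is a hypothetical sketch (the function name is illustrative), not the script behind this commit. The word-boundary regex is what keeps it safe to rerun: `has_or_had_custodian_type` and `custodian_type_broader` never contain the whole token `custodian_types`.

```typescript
// Rename whole-token occurrences of the deprecated slot in a class file's YAML text.
// "\b" stops at non-word characters, and "_" is a word character, so
// custodian_types_x or has_or_had_custodian_type are never matched.
export function migrateCustodianTypeSlot(yamlText: string): string {
  return yamlText.replace(/\bcustodian_types\b/g, "has_or_had_custodian_type");
}
```

A plain text replace also rewrites mentions inside descriptions and comments, so the resulting diff still needs review; a YAML-aware rewrite of `slots:` and `slot_usage:` keys would avoid that.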
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
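The wrapper pattern mentioned above can be illustrated with a small decorator. The class names come from this commit, but their interfaces here are hypothetical: the sketch assumes the existing classifier returns an intent plus an optional SPARQL template id, and that the mapper resolves that id to a context template (falling back to "general", an assumed default).

```typescript
// Hypothetical interfaces: the real SpecificityAwareClassifier and
// SPARQLToContextMapper signatures are not shown in this commit.
interface Classification {
  intent: string;
  sparqlTemplate?: string;
}

interface Classifier {
  classify(question: string): Classification;
}

// Maps a SPARQL template id to the context template used for specificity filtering.
export class SPARQLToContextMapper {
  private readonly mapping: Map<string, string>;

  constructor(mapping: Record<string, string>) {
    this.mapping = new Map(Object.entries(mapping));
  }

  contextFor(sparqlTemplate: string | undefined): string {
    if (sparqlTemplate === undefined) return "general";
    return this.mapping.get(sparqlTemplate) ?? "general";
  }
}

// Wrapper around an existing classifier: it delegates the actual classification
// and only adds the context template, so no classification logic is duplicated.
export class SpecificityAwareClassifier implements Classifier {
  private readonly inner: Classifier;
  private readonly mapper: SPARQLToContextMapper;

  constructor(inner: Classifier, mapper: SPARQLToContextMapper) {
    this.inner = inner;
    this.mapper = mapper;
  }

  classify(question: string): Classification & { contextTemplate: string } {
    const base = this.inner.classify(question);
    return { ...base, contextTemplate: this.mapper.contextFor(base.sparqlTemplate) };
  }
}
```

Because the wrapper satisfies the same `Classifier` interface, it can be swapped in wherever the original classifier is used.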
- Fixed a bug where closing triple quotes (""") incorrectly re-triggered multi-line string detection, causing subsequent class definitions to be skipped
- Added a lineToProcess variable that tracks which portion of the line remains to be processed after a multi-line string closes, so the closing quotes are not re-detected as an opening
- Moved the confirmation logic for large UML diagrams from OntologyViewerPage into the UMLVisualization component for better encapsulation
- PiCo ontology now correctly shows all 8 classes instead of 2
Deployed and verified on https://bronhouder.nl/ontology?ontology=PiCo
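A minimal sketch of the lineToProcess idea, assuming a line-by-line scanner that must ignore text inside """ ... """ literals (as in Turtle long strings). The function name is hypothetical, and escaped quotes and ''' strings are not handled. The key point is that after a closing """ the scan continues on the remainder of the line, so the same quotes can never be read again as an opening.

```typescript
const TRIPLE_QUOTE = '"""';

// Returns each line with the contents of """...""" strings removed, so later stages
// (e.g. class detection) only see text outside multi-line literals.
export function stripTripleQuotedStrings(lines: string[]): string[] {
  const result: string[] = [];
  let inString = false;
  for (const line of lines) {
    // lineToProcess is the part of the current line not yet scanned. It is advanced
    // past every """ found, so a closing quote is consumed exactly once.
    let lineToProcess = line;
    let kept = "";
    while (lineToProcess.length > 0) {
      const idx = lineToProcess.indexOf(TRIPLE_QUOTE);
      if (idx === -1) {
        if (!inString) kept += lineToProcess;
        break;
      }
      if (!inString) kept += lineToProcess.slice(0, idx);
      inString = !inString;
      lineToProcess = lineToProcess.slice(idx + TRIPLE_QUOTE.length);
    }
    result.push(kept);
  }
  return result;
}
```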
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
- Institution Browser: multi-select for types and countries
- URL query param sync for shareable filter URLs
- New utility: countryNames.ts with flag emoji support
- New utility: imageProxy.ts for image URL handling
- New component: SearchableMultiSelect dropdown
- Career timeline CSS and component updates
- Media gallery improvements
- Lazy load error boundary component
- Version check utility
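Two of the utilities above are small enough to sketch. The flag emoji trick is standard Unicode: each letter of an ISO 3166-1 alpha-2 code maps to a "regional indicator" symbol starting at U+1F1E6, and a pair of them renders as a flag on platforms that support flag emoji. The filter-to-URL part assumes multi-select values are stored as repeated query parameters; the function names, the `InstitutionFilters` shape, and the `type`/`country` parameter names are hypothetical, not the actual countryNames.ts or Institution Browser code.

```typescript
// U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; B..Z follow consecutively.
const REGIONAL_INDICATOR_A = 0x1f1e6;

export function countryFlag(isoCode: string): string {
  const code = isoCode.toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) return ""; // not an alpha-2 code
  return String.fromCodePoint(
    REGIONAL_INDICATOR_A + code.charCodeAt(0) - 65,
    REGIONAL_INDICATOR_A + code.charCodeAt(1) - 65,
  );
}

// Hypothetical filter state for the Institution Browser.
interface InstitutionFilters {
  types: string[];
  countries: string[];
}

// Multi-select values become repeated query params, e.g. ?type=museum&type=archive.
export function filtersToQuery(filters: InstitutionFilters): string {
  const params = new URLSearchParams();
  for (const t of filters.types) params.append("type", t);
  for (const c of filters.countries) params.append("country", c);
  return params.toString();
}

export function queryToFilters(query: string): InstitutionFilters {
  const params = new URLSearchParams(query); // a leading "?" is ignored
  return { types: params.getAll("type"), countries: params.getAll("country") };
}
```

In the browser, the query could be kept in sync with `history.replaceState(null, "", "?" + filtersToQuery(filters))`, and read back on load from `window.location.search`, which makes the filtered view shareable as a plain URL.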
- Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents.
- The script includes environment variable handling for the API token, structured prompts for the GLM API, and validation of the extraction results.
- Added comprehensive logging for API responses, extraction results, and validation errors.
- Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.
- Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations.
- Documented rationale, examples, and implementation guidelines for the filtering process.
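A minimal sketch of what a LEGAL-FORM-FILTER step might look like, assuming token-based matching against a list of legal form designations. The function name and the tiny sample list are hypothetical; the documented rule, not this code, defines which designations are removed.

```typescript
// Hypothetical, deliberately small sample of legal form designations; a real
// filter would need a curated per-country list.
const LEGAL_FORMS = new Set(["stichting", "b.v.", "n.v.", "ltd", "ltd.", "inc", "inc.", "gmbh", "e.v."]);

// Drops whole tokens that are legal forms (ignoring case and a trailing comma),
// then trims leftover separators at the end of the name.
export function filterLegalForm(custodianName: string): string {
  const kept = custodianName
    .split(/\s+/)
    .filter((token) => token.length > 0 && !LEGAL_FORMS.has(token.replace(/,$/, "").toLowerCase()));
  return kept.join(" ").replace(/[\s,]+$/, "");
}
```

Token matching misses multi-word designations (for example "Société Anonyme") and could remove a word that is genuinely part of an emic name, so a production version would likely need language-specific prefix and suffix rules.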
docs: Create README for value standardization rules
- Established a comprehensive README outlining various value standardization rules applicable to Heritage Custodian classes.
- Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution.
feat: Implement transliteration standards for non-Latin scripts
- Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration.
- Included detailed guidelines for various scripts and languages, along with implementation examples.
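As one concrete instance of the TRANSLIT-ISO idea, the sketch below assumes ISO 9:1995 for Russian Cyrillic and a hypothetical abbreviation rule (first letter of each transliterated word, diacritics stripped, uppercased). Both function names and the abbreviation rule are illustrative only; the actual GHCID abbreviation procedure and the per-script standards are defined in the rule document.

```typescript
// ISO 9:1995 mapping for the Russian Cyrillic alphabet (lowercase keys only;
// uppercase output is derived in transliterateIso9).
const ISO9: Record<string, string | undefined> = {
  "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë", "ж": "ž",
  "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
  "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c",
  "ч": "č", "ш": "š", "щ": "ŝ", "ъ": "ʺ", "ы": "y", "ь": "ʹ", "э": "è", "ю": "û", "я": "â",
};

export function transliterateIso9(text: string): string {
  return text
    .split("")
    .map((ch) => {
      const lower = ch.toLowerCase();
      const mapped = ISO9[lower];
      if (mapped === undefined) return ch; // not Russian Cyrillic: keep unchanged
      return ch === lower ? mapped : mapped.toUpperCase();
    })
    .join("");
}

// Hypothetical abbreviation rule: first letter of each transliterated word,
// with combining diacritics removed after NFD decomposition, uppercased.
export function abbreviateEmicName(emicName: string): string {
  return transliterateIso9(emicName)
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word[0])
    .join("")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toUpperCase();
}
```

Transliterating from the emic name first, rather than from an English exonym, keeps the abbreviation tied to the institution's own name; for example "Государственный Эрмитаж" yields "GE" under these assumptions.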
feat: Define XPath provenance rules for web observations
- Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources.
- Established a workflow for archiving websites and verifying claims against archived HTML.
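A hypothetical sketch of how the XPATH-PROVENANCE requirement could be checked in code. The `WebClaim` field names and both function names are assumptions. The structural check runs anywhere; the re-verification step uses the browser's DOMParser and Document.evaluate() to confirm that the stored XPath still points at the claimed text in the archived HTML.

```typescript
// Hypothetical shape of a web claim; the field names are illustrative only.
interface WebClaim {
  claimValue: string;
  sourceUrl: string;
  archivedHtmlPath: string; // where the archived copy of the page is stored
  xpath: string; // absolute XPath to the element the claim was read from
}

// Returns the reasons a claim fails the provenance requirement (empty if it passes).
export function provenanceProblems(claim: Partial<WebClaim>): string[] {
  const problems: string[] = [];
  if (!claim.sourceUrl) problems.push("missing sourceUrl");
  if (!claim.archivedHtmlPath) problems.push("missing archivedHtmlPath");
  if (!claim.xpath) problems.push("missing xpath");
  else if (!claim.xpath.startsWith("/")) problems.push("xpath must be absolute");
  return problems;
}

// Browser-only: does the XPath still resolve to the claimed text in the archived HTML?
export function claimMatchesArchive(claim: WebClaim, archivedHtml: string): boolean {
  const doc = new DOMParser().parseFromString(archivedHtml, "text/html");
  const text = doc.evaluate(claim.xpath, doc, null, XPathResult.STRING_TYPE, null).stringValue;
  const normalize = (s: string) => s.replace(/\s+/g, " ").trim();
  return normalize(text) === normalize(claim.claimValue);
}
```

Checking against the archived copy rather than the live page means a claim stays verifiable even after the source website changes.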
chore: Update records lifecycle diagram
- Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians.
- Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.