Commit graph

142 commits

Author SHA1 Message Date
kempersc
fcf36f9a11 fix: prevent ontology popup flash by using useLayoutEffect for centering
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m56s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Successful in 11m5s
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Successful in 12m50s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Successful in 10m51s
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Successful in 11m59s
DSPy RAG Evaluation / Quality Gate (push) Successful in 1s
2026-01-13 20:38:21 +01:00
kempersc
92b490d690 edit slots 2026-01-13 20:35:11 +01:00
kempersc
2907c0372a feat: add Getty AAT support and resolve PREMIS/BIBFRAME URIs to human-readable LOC docs
- Add Getty AAT (Art & Architecture Thesaurus) vocabulary support
  - fetchGettyAATEntity() fetches term info from vocab.getty.edu JSON-LD API
  - Extracts English labels, scope notes, and aliases
  - Shows 'concept' term type for SKOS concepts

- Add getHumanReadableUrl() to map RDF URIs to documentation pages
  - PREMIS 3.0: http://www.loc.gov/premis/rdf/v3/X → id.loc.gov HTML docs
  - BIBFRAME: http://id.loc.gov/ontologies/bibframe/X → id.loc.gov HTML docs
  - Uses c_ prefix for classes, p_ for properties

- Add Getty vocabulary prefixes (aat:, tgn:, ulan:)
- Add ontology badge colors for PREMIS 3, LOCN, Getty AAT
2026-01-13 19:14:31 +01:00
kempersc
fc63164335 fix: close ontology popup when navigating to different LinkML schema files
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m17s
When users click on a different class, enum, or slot in the sidebar,
the ontology term popup now automatically closes. This prevents the
popup from persisting and showing stale information from the
previously viewed schema element.
2026-01-13 18:22:49 +01:00
kempersc
635beca582 sync: update frontend schema copies with duplicate mapping fixes (Rule 52)
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-13 18:13:54 +01:00
kempersc
3b676f3ea5 fix: remove duplicate ontology mappings rendering in LinkML viewer
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
The slot details section was rendering close_mappings, narrow_mappings,
broad_mappings, and related_mappings twice each. This caused the mappings
to appear duplicated on pages like /linkml?class=AcademicArchive.

Removed 68 lines of duplicate JSX code.
2026-01-13 18:11:54 +01:00
kempersc
8a3c907f59 fix: resolve CIDOC-CRM relative URIs, add EDM ontology, render HTML in descriptions
Some checks failed
Deploy Frontend / build-and-deploy (push) Has been cancelled
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Successful in 11m12s
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Successful in 12m55s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Successful in 10m24s
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Successful in 11m31s
DSPy RAG Evaluation / Quality Gate (push) Successful in 2s
- Fix resolveUri() to handle bare local names like 'E27_Site' used by CIDOC-CRM
  (previously only handled URIs starting with '#')
- Add EDM (Europeana Data Model) ontology to frontend
  - Copy edm.owl to frontend/public/ontology/
  - Register in ONTOLOGY_FILES array
  - Add 'edm' prefix to STANDARD_PREFIXES
  - Add EDM color to ONTOLOGY_COLORS
- Render HTML content in ontology descriptions safely using DOMPurify
  - Sanitize HTML to allow only safe tags (a, br, em, strong, etc.)
  - Fix Schema.org relative links to absolute URLs
  - Add target='_blank' to external links
2026-01-13 16:50:40 +01:00
kempersc
f74513e8ef feat: Enhance entity resolution with email semantics and review merging
- Updated `entity_review.py` to map email semantic fields from JSON.
- Expanded `email_semantics.py` with additional museum mappings.
- Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings.
- Added a backup JSON file for entity resolution candidates.
- Created `enrich_email_semantics.py` to enrich candidates with email semantic signals.
- Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.
2026-01-13 16:43:56 +01:00
kempersc
21ed120ac2 fix: correct hallucinated RiC-O terms and add locn ontology
RiC-O hallucinated terms fixed:
- FindingAidType.yaml: rico:FindingAidType → rico:DocumentaryFormType
- has_acquisition_method.yaml: rico:hasOrHadActivityType → prov:wasGeneratedBy
- has_activity_type.yaml: rico:hasOrHadActivityType → dcterms:type
- has_arrangement.yaml: rico:hasOrHadArrangement → dcterms:description
- has_or_had_finding_aid.yaml: rico:isDescribedBy → rico:isOrWasDescribedBy

The following terms do NOT exist in RiC-O 1.1:
- rico:FindingAidType (use rico:DocumentaryFormType)
- rico:hasOrHadActivityType (no equivalent)
- rico:hasOrHadArrangement (no equivalent)
- rico:isDescribedBy (correct form: rico:isOrWasDescribedBy)

Added LOCN ontology support:
- Copied locn.ttl to frontend/public/ontology/
- Added LOCN to ONTOLOGY_FILES in ontology-loader.ts
- Added locn prefix to OntologyTermPopup.tsx
- LOCN (http://www.w3.org/ns/locn#) is W3C Location Core Vocabulary
  for addresses and geometry (used by locn:Address)
2026-01-13 16:42:32 +01:00
kempersc
6781073d06 fix: add @base directive support for Turtle/RDF parsing
The VCard ontology file (and 3 others) use @base directive with relative URIs
like <#Address>. The Turtle parser was not extracting @base or resolving
relative URIs against it.

Changes:
- Extract @base directive in first pass alongside @prefix
- Add baseUri parameter to expandUri() function
- Handle relative URIs starting with # (resolve against base)
- Handle empty relative URI <> (returns base URI itself)
- Pass baseUri through to processSubject() function

This fixes the 'Term not found' error for vcard:Address and similar terms
that use relative URI notation in their ontology definitions.

Affected ontologies: vcard.rdf, prov.ttl, era_ontology.ttl, ebg-ontology.ttl
2026-01-13 15:54:29 +01:00
kempersc
f2b10fca19 fix: correct hallucinated PREMIS terms and Schema.org namespace mismatch
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m48s
PREMIS ontology fixes (8 schema files):
- Replace invalid premis:hasRepresentation with dcterms:hasFormat
- Replace invalid premis:hasAccessRestriction with odrl:hasPolicy
- Replace invalid premis:hasPreservationPolicy with dcterms:conformsTo
- Replace invalid premis:hasAccessPolicy with dcterms:accessRights
- Replace invalid premis:hasStoragePolicy with dcterms:conformsTo
- Replace invalid premis:ProcessingStatus with skos:Concept
- Add proper close_mappings to valid PREMIS classes (premis:Representation, etc.)
- Document hallucinated terms in Rule 51 (AGENTS.md) for future prevention

Schema.org namespace fixes (3 frontend files):
- Update OntologyTermPopup.tsx: add normalizeSchemaOrgUri() function
- Update ontology-loader.ts: change schema prefix to https://schema.org/
- Update linkml-schema-service.ts: change schema prefix to https://schema.org/
- The schemaorg.owl file uses https:// but code was using http://

These changes ensure ontology term lookups work correctly for Schema.org
terms and that LinkML schema files only reference valid ontology predicates.
2026-01-13 14:16:33 +01:00
kempersc
1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology
2026-01-13 13:51:02 +01:00
kempersc
c5fb9ec88e feat: add route for Entity Review page with lazy loading 2026-01-13 01:49:43 +01:00
kempersc
8d7aca0f98 Refactor code structure for improved readability and maintainability 2026-01-12 19:13:35 +01:00
kempersc
3b35f4aea5 Refactor code structure for improved readability and maintainability 2026-01-12 18:31:31 +01:00
kempersc
846a6cdcec Add new Record Set Types for various archival collections
- Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType.
- Each new type includes appropriate metadata, slots, and relationships to existing classes.
- Implemented a script to detect and fix Type class violations in LinkML files.
2026-01-12 15:20:29 +01:00
kempersc
5807840bbc fix: update generated timestamp in manifest.json
Some checks failed
Deploy Frontend / build-and-deploy (push) Successful in 4m2s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 6s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 2s
2026-01-12 14:46:05 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
f497be98d1 chore: update schema manifest after slot centralization
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m54s
2026-01-11 23:28:03 +01:00
kempersc
da5660cf4c chore: update schema manifest timestamp 2026-01-11 23:19:52 +01:00
kempersc
5d3d8530b0 chore: trigger DSPy eval workflow
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m13s
2026-01-11 22:40:23 +01:00
kempersc
56c373bba8 Implement fast WCMS migration script with state file checkpointing and batch processing 2026-01-11 22:26:37 +01:00
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
kempersc
3e6c2367ad feat(linkml-viewer): UX improvements - entry counts, deep links, settings persistence
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m4s
- Add entry count badge next to schema file name showing (xC, yE, zS) counts
- Add tooltip explaining LinkML file names vs class names
- Remove redundant section headers (Classes, Enums, Slots collapsible sections)
- Add URL params for enum (?enum=) and slot (?slot=) deep linking
- Persist category filters, dev tools visibility, and legend visibility to localStorage
- Set 'Main Schema' filter to OFF by default (confusing for users)
- Add Rule 48: Class files must not define inline slots
2026-01-11 21:42:35 +01:00
kempersc
eff3153f3f feat(schema): add Environmental Zone Type slot definitions
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m56s
Add 4 slot files for EnvironmentalZoneType class:
- environmental_zone_type_id: URI identifier slot
- environmental_zone_type_code: code slot for zone type codes
- environmental_zone_type_label: human-readable label
- environmental_zone_type_description: detailed description

Update manifest.json with new slot count (2084 slots total)
2026-01-11 21:22:44 +01:00
kempersc
95d79d0078 fix: update manifest with new generated timestamp and file counts; add EnvironmentalZoneType classes and new slot requirements
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m51s
2026-01-11 21:15:49 +01:00
kempersc
10bb5b69c5 Add Environmental Zone Type Enumeration and related slots
- Introduced EnvironmentalZoneTypeEnum.yaml to classify climate-controlled storage zones with detailed descriptions and recommended conditions for various materials.
- Created slots for environmental zone type code, description, ID, label, and HC preset URI to facilitate structured data representation.
- Implemented boolean slots for specific environmental requirements including dark storage, dust-free environment, ESD protection, and UV filtering, referencing relevant ISO standards.
- Enhanced documentation for each slot to clarify usage and preservation context.
2026-01-11 21:14:59 +01:00
kempersc
66ab2908d0 fix: remove deprecated AnnotationMotivationEnum, add European surname data
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 3m21s
- Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths)
- Add French, Italian, Polish, Spanish surname datasets for entity resolution
- Update name_commonality.py with expanded European surname detection
- Triggers GitOps workflow to test Forgejo Actions runner
2026-01-11 16:03:18 +01:00
kempersc
fd792fce2c Refactor code structure for improved readability and maintainability
Some checks failed
Deploy Frontend / build-and-deploy (push) Has been cancelled
2026-01-11 15:27:14 +01:00
kempersc
0f7fbf1ca0 feat(ci): add Forgejo Actions workflow for auto-deploy on LinkML schema changes
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
Infrastructure changes to enable automatic frontend deployment when schemas change:

- Add .forgejo/workflows/deploy-frontend.yml workflow triggered by:
  - Changes to frontend/** or schemas/20251121/linkml/**
  - Manual workflow dispatch

- Rewrite generate-schema-manifest.cjs to properly scan all schema directories
  - Recursively scans classes, enums, slots, modules directories
  - Uses singular category names (class, enum, slot) matching TypeScript types
  - Includes all 4 main schemas at root level
  - Skips archive directories and backup files

- Update schema-loader.ts to match new manifest format
  - Add SchemaCategory interface
  - Update SchemaManifest to use categories as array
  - Add flattenCategories() helper function
  - Add getSchemaCategories() and getSchemaCategoriesSync() functions

The workflow builds frontend with updated manifest and deploys to bronhouder.nl
2026-01-11 14:16:57 +01:00
kempersc
329b341bb1 refactor(schema): sync AnnotationMotivationType changes to frontend public schemas
- Update VideoAnnotation class with new motivation type references
- Add AnnotationMotivationType and AnnotationMotivationTypes class files
- Add motivation_type slots (description, id, name)
- Archive deprecated AnnotationMotivationEnum
- Update slot references for derived_from_entity, has_observation, has_person_observation
2026-01-11 14:16:39 +01:00
kempersc
9726cc7917 feat(frontend): Add AnnotationMotivationType to LinkML schema manifest
Add new AnnotationMotivationType and AnnotationMotivationTypes to the
SCHEMA_FILES array so they appear in the /linkml viewer.
2026-01-11 13:56:11 +01:00
kempersc
0a888ec682 chore: add node_modules to .gitignore and remove from tracking
- Add node_modules/ and .pnpm-store/ to .gitignore
- Remove 76k node_modules files from git tracking
- Update frontend manifest
2026-01-11 00:41:21 +01:00
kempersc
a4184cb805 feat(infra): add webhook-based schema deployment pipeline
- Add FastAPI webhook receiver for Forgejo push events
- Add setup script for server deployment
- Add Caddy snippet for webhook endpoint
- Add local sync-schemas.sh helper script
- Sync frontend schemas with source (archived deprecated slots)

Infrastructure scripts staged for optional webhook deployment.
Current deployment uses: ./infrastructure/deploy.sh --frontend
2026-01-10 21:45:02 +01:00
kempersc
6c19ef8661 feat(rag): add Rule 46 epistemic provenance tracking
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).

Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse

Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)

Schema fixes:
- Fix truncated example in has_observation.yaml slot definition

References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
2026-01-10 18:42:43 +01:00
kempersc
28c3aaf33f enrich profiles 2026-01-10 17:31:02 +01:00
kempersc
13938c92ca chore(schemas): sync LinkML schemas to frontend apps
Copies authoritative schemas from schemas/20251121/ to:
- frontend/public/schemas/20251121/
- apps/archief-assistent/public/schemas/20251121/

This ensures slot definitions with corrected ontology property
references (commit 2808dad6cd) are available to frontend apps.
2026-01-10 15:02:25 +01:00
kempersc
8a475d5c02 refactor(linkml): apply RiC-O slot naming conventions (Rule 39)
Rename slots to follow Records in Contexts (RiC-O) style naming:
- Add 'has_' prefix for possession predicates (has_acquisition_method)
- Add 'is_or_was_' prefix for temporal relationships
- Add 'has_or_had_' for bidirectional temporal relations

Key changes across 496 schema files:
- acquisition_method → has_acquisition_method
- acquisition_date → has_acquisition_date
- acquisition_source → has_acquisition_source
- access_policy_ref → has_access_policy_reference
- arrangement → has_arrangement
- parent_custodian → is_or_was_suborganization_of (hierarchy)
- parent_custodian → associated_custodian (event association)

Also adds new slots following RiC-O patterns:
- is_or_was_aggregated_by
- is_or_was_allocated_by
- is_or_was_archive_department_of
- was_approved_by, was_archived_at, was_asserted_by

This aligns with AGENTS.md Rule 39: Slot Naming Convention (RiC-O Style)
for accurate temporal semantics in heritage custodian ontology.

Net change: +2,063 lines (new slots added, old patterns consolidated)
2026-01-10 10:33:51 +01:00
kempersc
004d342935 chore: minor updates and evaluation results
- auth.setup.ts: require env vars for test credentials (no hardcoded defaults)
- manifest.json: update schema manifest
- full_evaluation_results.json: add RAG evaluation results
- petra-links.json: update birth date from web claim
2026-01-09 21:10:55 +01:00
kempersc
f7bd3e9edc feat(linkml-viewer): add slot_usage side-by-side comparison view
- Add 'Compare' toggle button next to slots with slot_usage overrides
- Show generic slot definition vs class-specific override in 3-column grid
- Highlight changed properties with green 'changed' badge
- Display '(inherited)' when override matches generic definition
- Display '(not defined)' when generic has no value for property
- Compare: range, description, required, multivalued, slot_uri, pattern, identifier
- Full i18n support (Dutch/English translations)
- Responsive design: stacks vertically on mobile (<640px)
2026-01-09 21:02:14 +01:00
kempersc
9e67d0f967 enrich profiles 2026-01-09 20:35:19 +01:00
kempersc
932ec5438c add person profiles with PPID 2026-01-09 18:26:58 +01:00
kempersc
1ad717767a feat(linkml-viewer): add visual indicators for slot_usage overrides
- Add green 'slot_usage' badge for slots with class-specific overrides
- Add ✦ markers next to properties that are overridden vs inherited
- Add green left border styling for slots with slot_usage
- Add i18n translations (nl/en) for override indicators
- Merge generic slot definitions with class-specific slot_usage properties

This helps users understand which slot properties come from the generic
slot definition vs which are overridden at the class level via slot_usage.
2026-01-09 18:23:21 +01:00
kempersc
35a057981c chore(frontend): sync schema files with custodian_type → has_or_had_custodian_type refactor
- Remove deprecated slots: custodian_type.yaml, custodian_types.yaml,
  custodian_type_broader/narrower/related.yaml, custodian_types_primary/rationale.yaml
- Add new unified slot: has_or_had_custodian_type.yaml
- Sync all 236+ class files with updated slot references
- Update manifest.json
2026-01-09 12:15:32 +01:00
kempersc
c88fd3af70 Refactor code structure for improved readability and maintainability 2026-01-09 11:05:26 +01:00
kempersc
0393b321c9 refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43)
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml

Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
2026-01-09 10:55:21 +01:00
kempersc
6608a207d4 update frontend 2026-01-08 15:56:28 +01:00
kempersc
0b0ea75070 feat(rag): add factual query fast path - skip LLM for count/list queries
- Add ontology cache warming at startup in lifespan() function
- Add is_factual_query() detection in template_sparql.py (12 templates)
- Add factual_result and sparql_query fields to DSPyQueryResponse
- Skip LLM generation for factual templates (count, list, compare)
- Execute SPARQL directly and return results as table (~15s → ~2s latency)
- Update ConversationPanel.tsx to render factual results table
- Add CSS styling for factual results with green theme

For queries like 'hoeveel archieven zijn er in Den Haag', the SPARQL
results ARE the answer - no need for expensive LLM prose generation.
2026-01-08 13:34:23 +01:00
kempersc
9b769f1ca2 Update manifest timestamp and minor class fixes 2026-01-07 22:04:29 +01:00
kempersc
81da4ede50 Add comprehensive slot visualization to LinkML viewer
- Add standalone Slots section in visual view alongside Classes and Enums
- Display slot_uri, range, identifier badge, description, pattern
- Show examples with value/description pairs
- Color-coded SKOS mapping tags (exact/close/narrow/broad/related)
- Yellow highlighted comments section
- Custodian type filtering works with slots
- Shared renderSlotDetails() function for consistency
2026-01-07 22:03:08 +01:00