Commit graph

335 commits

Author SHA1 Message Date
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
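Rule 48, as stated in commit 174a420c08, could be checked mechanically along the following lines. This is a minimal sketch, not the repository's centralize_inline_slots.py script: the function name `find_inline_slots` is hypothetical, and it assumes that "inline" means either a top-level `slots:` mapping or a per-class `attributes:` mapping in an already-parsed class file (a plain dict).

```python
from typing import Any


def find_inline_slots(schema: dict[str, Any]) -> list[str]:
    """Return the names of slots defined inline in a parsed class file.

    Assumption: a violation is either a top-level `slots:` mapping or a
    per-class `attributes:` mapping. Slot names listed under a class's
    `slots:` key are references, not definitions, and are ignored.
    """
    found: list[str] = []
    # Top-level slot definitions inside a class file.
    found.extend((schema.get("slots") or {}).keys())
    # Attribute definitions nested inside each class.
    for class_name, class_def in (schema.get("classes") or {}).items():
        for attr_name in (class_def or {}).get("attributes") or {}:
            found.append(f"{class_name}.{attr_name}")
    return sorted(found)


# Hypothetical class file that defines slots inline.
violating = {
    "classes": {
        "FindingAid": {
            "slots": ["definition"],
            "attributes": {"note": {"range": "string"}},
        }
    },
    "slots": {"definition": {"range": "string"}},
}

# Hypothetical class file that only imports and references slots.
compliant = {
    "imports": ["../slots/definition"],
    "classes": {"FindingAid": {"slots": ["definition"]}},
}

print(find_inline_slots(violating))  # ['FindingAid.note', 'definition']
print(find_inline_slots(compliant))  # []
```

A pre-commit hook or CI step could fail whenever the returned list is non-empty for any class file.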
kempersc
3e6c2367ad feat(linkml-viewer): UX improvements - entry counts, deep links, settings persistence
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m4s
- Add entry count badge next to schema file name showing (xC, yE, zS) counts
- Add tooltip explaining LinkML file names vs class names
- Remove redundant section headers (Classes, Enums, Slots collapsible sections)
- Add URL params for enum (?enum=) and slot (?slot=) deep linking
- Persist category filters, dev tools visibility, and legend visibility to localStorage
- Set 'Main Schema' filter to OFF by default (confusing for users)
- Add Rule 48: Class files must not define inline slots
2026-01-11 21:42:35 +01:00
kempersc
eff3153f3f feat(schema): add Environmental Zone Type slot definitions
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m56s
Add 4 slot files for EnvironmentalZoneType class:
- environmental_zone_type_id: URI identifier slot
- environmental_zone_type_code: code slot for zone type codes
- environmental_zone_type_label: human-readable label
- environmental_zone_type_description: detailed description

Update manifest.json with new slot count (2084 slots total)
2026-01-11 21:22:44 +01:00
kempersc
7e9df1d600 chore(ci): remove GitHub dspy-eval workflow (replaced by Forgejo workflow) 2026-01-11 21:20:05 +01:00
kempersc
8470bf5860 feat(ci): add DSPy RAG evaluation workflow for Forgejo
Some checks failed
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 6m24s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 1s
Implements 4-layer testing pyramid:
- Layer 1: Fast unit tests (no LLM, ~5 min)
- Layer 2: DSPy module tests with LLM (~20 min)
- Layer 3: Integration tests via SSH tunnel to Oxigraph
- Layer 4: Comprehensive evaluation (nightly)

Includes:
- SSH tunnel setup for Oxigraph access
- Quality gate checks
- JUnit XML output for test results
- Scheduled nightly runs at 2 AM UTC
- Manual trigger with evaluation level selection
2026-01-11 21:19:40 +01:00
kempersc
95d79d0078 fix: update manifest with new generated timestamp and file counts; add EnvironmentalZoneType classes and new slot requirements
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m51s
2026-01-11 21:15:49 +01:00
kempersc
10bb5b69c5 Add Environmental Zone Type Enumeration and related slots
- Introduced EnvironmentalZoneTypeEnum.yaml to classify climate-controlled storage zones with detailed descriptions and recommended conditions for various materials.
- Created slots for environmental zone type code, description, ID, label, and HC preset URI to facilitate structured data representation.
- Implemented boolean slots for specific environmental requirements including dark storage, dust-free environment, ESD protection, and UV filtering, referencing relevant ISO standards.
- Enhanced documentation for each slot to clarify usage and preservation context.
2026-01-11 21:14:59 +01:00
kempersc
f9b950fa24 chore: ignore data/person/ directory (98K+ WCMS profiles) 2026-01-11 20:07:36 +01:00
kempersc
47e8226595 feat(tests): Complete DSPy GitOps testing framework
- Layer 1: 35 unit tests (no LLM required)
- Layer 2: 56 DSPy module tests with LLM
- Layer 3: 10 integration tests with Oxigraph
- Layer 4: Comprehensive evaluation suite

Fixed:
- Coordinate queries to use schema:location -> blank node pattern
- Golden query expected intent for location questions
- Health check test filtering in Layer 4

Added GitHub Actions workflow for CI/CD evaluation
2026-01-11 20:04:33 +01:00
kempersc
fce186b649 enrich person profiles 2026-01-11 18:08:40 +01:00
kempersc
a79d95fbf9 fix(ci): add jq to system dependencies and remove stale submodule entries
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m34s
- Add jq to apt-get install for deployment verification step
- Remove orphaned submodule entries (exa-mcp-server-source, mcp-wikidata) from git index
- Rename 'Install rsync' step to 'Install system dependencies'
2026-01-11 17:29:27 +01:00
kempersc
44469d3e4a chore: trigger workflow to test SSH key secret fix
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 4m51s
2026-01-11 17:09:14 +01:00
kempersc
0b29e7e805 chore: trigger GitOps workflow to test SSH key fix 2026-01-11 17:08:26 +01:00
kempersc
e94b58a289 fix(ci): install rsync in CI container
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 4m48s
The node:20-bookworm image doesn't include rsync, which is needed
for the sync-schemas npm script.
2026-01-11 17:02:24 +01:00
kempersc
29ef609465 fix(ci): let pnpm version be read from package.json packageManager field
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 1m23s
The pnpm/action-setup action detects the pnpm version from package.json's
packageManager field automatically. Specifying a version in the workflow
causes a conflict.
2026-01-11 17:00:02 +01:00
kempersc
03a506382d fix(ci): include pnpm workspace files in sparse checkout
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 32s
The pnpm-lock.yaml and other workspace files are at the repository root,
not in frontend/. Add them to sparse checkout for pnpm install to work.
2026-01-11 16:58:22 +01:00
kempersc
5aeda1c195 fix(ci): disable pnpm caching due to path resolution issues
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 50s
The setup-node action fails to cache pnpm dependencies because the
store path /workspace/kempersc/glam/.pnpm-store/v3 can't be resolved.
Disabling caching for now to get the build working.
2026-01-11 16:43:49 +01:00
kempersc
b2b80cdad8 fix(ci): use pnpm instead of npm for workspace:* dependency support
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 59s
The frontend uses pnpm workspaces with 'workspace:*' protocol that npm
doesn't support. This updates the workflow to:
- Install pnpm using pnpm/action-setup
- Use pnpm for install, sync-schemas, generate-manifest, and build
- Cache pnpm dependencies using pnpm-lock.yaml
2026-01-11 16:41:57 +01:00
kempersc
b91be82af2 fix(ci): use sparse checkout to avoid large data/ directory
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 5m59s
The repository has 314K+ files including backup data that exceeds
the CI runner's disk space. This change uses sparse checkout to only
fetch frontend/ and schemas/ directories needed for the build.
2026-01-11 16:32:58 +01:00
kempersc
66ab2908d0 fix: remove deprecated AnnotationMotivationEnum, add European surname data
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 3m21s
- Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths)
- Add French, Italian, Polish, Spanish surname datasets for entity resolution
- Update name_commonality.py with expanded European surname detection
- Triggers GitOps workflow to test Forgejo Actions runner
2026-01-11 16:03:18 +01:00
kempersc
fd792fce2c Refactor code structure for improved readability and maintainability
Some checks failed
Deploy Frontend / build-and-deploy (push) Has been cancelled
2026-01-11 15:27:14 +01:00
kempersc
055fd890ff test: verify pre-commit hook regenerates manifest
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:21:58 +01:00
kempersc
3a661d6013 fix(schema): regenerate manifest to remove stale AnnotationMotivationEnum reference
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
The old enum was properly archived to modules/enums/archive/ with .deprecated
suffix per Rule 9, but the manifest wasn't regenerated. Now correctly shows
only AnnotationMotivationType.yaml and AnnotationMotivationTypes.yaml.
2026-01-11 15:16:50 +01:00
kempersc
b6e069a1d5 chore(schema): bump version to 0.9.12 - test webhook deployment
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:08:35 +01:00
kempersc
0f7fbf1ca0 feat(ci): add Forgejo Actions workflow for auto-deploy on LinkML schema changes
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
Infrastructure changes to enable automatic frontend deployment when schemas change:

- Add .forgejo/workflows/deploy-frontend.yml workflow triggered by:
  - Changes to frontend/** or schemas/20251121/linkml/**
  - Manual workflow dispatch

- Rewrite generate-schema-manifest.cjs to properly scan all schema directories
  - Recursively scans classes, enums, slots, modules directories
  - Uses singular category names (class, enum, slot) matching TypeScript types
  - Includes all 4 main schemas at root level
  - Skips archive directories and backup files

- Update schema-loader.ts to match new manifest format
  - Add SchemaCategory interface
  - Update SchemaManifest to use categories as array
  - Add flattenCategories() helper function
  - Add getSchemaCategories() and getSchemaCategoriesSync() functions

The workflow builds frontend with updated manifest and deploys to bronhouder.nl
2026-01-11 14:16:57 +01:00
kempersc
329b341bb1 refactor(schema): sync AnnotationMotivationType changes to frontend public schemas
- Update VideoAnnotation class with new motivation type references
- Add AnnotationMotivationType and AnnotationMotivationTypes class files
- Add motivation_type slots (description, id, name)
- Archive deprecated AnnotationMotivationEnum
- Update slot references for derived_from_entity, has_observation, has_person_observation
2026-01-11 14:16:39 +01:00
kempersc
9726cc7917 feat(frontend): Add AnnotationMotivationType to LinkML schema manifest
Add new AnnotationMotivationType and AnnotationMotivationTypes to the
SCHEMA_FILES array so they appear in the /linkml viewer.
2026-01-11 13:56:11 +01:00
kempersc
55ef2a831d feat(data): add Belgian surnames dataset with metadata and surname counts 2026-01-11 13:50:20 +01:00
kempersc
be8b14f6ac refactor: Convert AnnotationMotivationEnum to Type/Types class hierarchy
- Create AnnotationMotivationType abstract base class (oa:Motivation)
- Create 10 concrete motivation subclasses in AnnotationMotivationTypes.yaml:
  - 6 W3C Web Annotation standard: classifying, describing, identifying,
    tagging, linking, commenting
  - 4 heritage-specific: accessibility, discovery, preservation, research
- Update has_annotation_motivation slot to use AnnotationMotivationType range
- Update VideoAnnotation.yaml imports and remove inline enum
- Archive deprecated AnnotationMotivationEnum.yaml
- Add motivation_type_id, motivation_type_name, motivation_type_description slots

Follows Rule 0b (Type/Types naming convention) and Rule 9 (enum-to-class promotion)
2026-01-11 13:48:28 +01:00
kempersc
7d09e4179c Add US surnames dataset from 2010 Census with metadata and surname counts 2026-01-11 12:28:58 +01:00
kempersc
dfb4744dc7 Evaluate data enrichments of persons 2026-01-11 12:15:27 +01:00
kempersc
49a8c341b5 chore(data): update geonames database journal file 2026-01-11 02:51:52 +01:00
kempersc
170fd73c49 feat(agents): update critical rules section to include entity resolution guidelines 2026-01-11 02:51:18 +01:00
kempersc
556cc6c294 Add workspace configuration for Git and Gitea integration
- Disable GitHub integration.
- Configure Git settings including path and autofetch options.
- Add Gitea instance URL and repository details.
- Enable YAML support for LinkML schemas with validation.
- Define file associations for YAML files.
- Recommend essential extensions for development and exclude unwanted ones.
2026-01-11 02:50:39 +01:00
kempersc
b3e57e709c Refactor code structure for improved readability and maintainability 2026-01-11 02:24:34 +01:00
kempersc
3c3be47e32 feat(infra): add fast push-based schema sync to production
- Replace slow Forgejo→Server git pull with direct local rsync
- Add git-push-schemas.sh wrapper script for manual pushes
- Add post-commit hook for automatic schema sync
- Fix YAML syntax errors in slot comment blocks
- Update deploy-webhook.py to use master branch
2026-01-11 01:22:47 +01:00
kempersc
0df26a6e44 data(person): additional person profile enrichments 2026-01-11 00:41:59 +01:00
kempersc
0a888ec682 chore: add node_modules to .gitignore and remove from tracking
- Add node_modules/ and .pnpm-store/ to .gitignore
- Remove 76k node_modules files from git tracking
- Update frontend manifest
2026-01-11 00:41:21 +01:00
kempersc
3eb097d92e data(person): enrich 64 person profiles with comprehensive metadata
- Add inferred birth dates using EDTF notation
- Add inferred birth/current settlements
- Enrich employment history with temporal data
- Add heritage sector relevance scores
- Improve PPID component tracking
- Update .gitignore with large file patterns (warc, nt, trix, geonames.db)
2026-01-11 00:38:09 +01:00
kempersc
3dd1f11059 chore: sync repository to self-hosted Forgejo 2026-01-11 00:12:16 +01:00
kempersc
a4184cb805 feat(infra): add webhook-based schema deployment pipeline
- Add FastAPI webhook receiver for Forgejo push events
- Add setup script for server deployment
- Add Caddy snippet for webhook endpoint
- Add local sync-schemas.sh helper script
- Sync frontend schemas with source (archived deprecated slots)

Infrastructure scripts staged for optional webhook deployment.
Current deployment uses: ./infrastructure/deploy.sh --frontend
2026-01-10 21:45:02 +01:00
kempersc
f02cffe1e8 refactor(schema): migrate 5 deprecated slots to temporal naming convention
Migrate slots to follow RiC-O-style temporal naming (Rule 39):
- accepts_external_work → accepts_or_accepted_external_work
- accepts_visiting_scholars → accepts_or_accepted_visiting_scholar
- accepts_payment_methods → accepts_or_accepted_payment_method
- access → has_or_had_access_condition
- access_policy_ref → has_or_had_access_policy_reference

Updated classes to use new slot names:
- ConservationLab.yaml
- ResearchCenter.yaml
- GiftShop.yaml
- ArchiveReference.yaml
- FindingAid.yaml
- Collection.yaml

Archived deprecated slots to schemas/20251121/linkml/archive/slots/
with _archived_20260110 suffix per Rule 9 (enum-to-class principle).
2026-01-10 21:09:29 +01:00
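The five renames listed in commit f02cffe1e8 can be expressed as a lookup table. The sketch below is illustrative only: the mapping is taken from the commit message, while the helper `migrate_slot_names` and the sample slot list are hypothetical and not part of the repository.

```python
# Old slot name -> new temporal slot name, as listed in the commit message.
SLOT_RENAMES: dict[str, str] = {
    "accepts_external_work": "accepts_or_accepted_external_work",
    "accepts_visiting_scholars": "accepts_or_accepted_visiting_scholar",
    "accepts_payment_methods": "accepts_or_accepted_payment_method",
    "access": "has_or_had_access_condition",
    "access_policy_ref": "has_or_had_access_policy_reference",
}


def migrate_slot_names(slots: list[str]) -> list[str]:
    """Replace deprecated slot names, leaving all other names untouched."""
    return [SLOT_RENAMES.get(name, name) for name in slots]


# Hypothetical slot list from a class file.
old_slots = ["access", "definition", "access_policy_ref"]
print(migrate_slot_names(old_slots))
```

Matching on whole slot names rather than substrings matters here, because `access` is a prefix of `access_policy_ref` and a plain text replacement would corrupt the longer name.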
kempersc
ac36b80476 feat(rag): add companion queries for count templates
Add companion_query support to fetch full entity records alongside
aggregate count queries. Enables displaying results on map/list when
asking 'how many museums in Amsterdam?'

Backend changes:
- Add companion_query, companion_query_region, companion_query_country
  fields to TemplateDefinition and TemplateMatchResult
- Add render_template_string() for raw companion query rendering

Template changes:
- Add companion queries to count_institutions_by_type_and_location
  for settlement, region, and country level queries
- Returns institution URI, name, coordinates, city for visualization
2026-01-10 18:44:06 +01:00
kempersc
f8b4ecad7d data(person): enrich 7 person profiles with detailed employment history
Update heritage professional profiles with:
- Separate role entries for different positions at same institution
- Employment date ranges (start_date, end_date)
- Updated observed_on timestamps
- Direct LinkedIn profile URLs as source

Profiles updated:
- Antoinet Nijssen (Noord-Hollands Archief)
- Anna Lakmaker
- Annelies Reus
- Marianne Hamersma
- Marcel Auwers
- Hans Felius
- Nico Vriend
2026-01-10 18:43:27 +01:00
kempersc
6c19ef8661 feat(rag): add Rule 46 epistemic provenance tracking
Track full lineage of RAG responses: WHERE data comes from, WHEN it was
retrieved, HOW it was processed (SPARQL/vector/LLM).

Backend changes:
- Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution
- Integrate provenance into MultiSourceRetriever.merge_results()
- Return epistemic_provenance in DSPyQueryResponse

Frontend changes:
- Pass EpistemicProvenance through useMultiDatabaseRAG hook
- Display provenance in ConversationPage (for cache transparency)

Schema fixes:
- Fix truncated example in has_observation.yaml slot definition

References:
- Pavlyshyn's Context Graphs and Data Traces paper
- LinkML ProvenanceBlock schema pattern
2026-01-10 18:42:43 +01:00
kempersc
54dd4a9803 docs(server): add SERVER_OPERATIONS.md for Hetzner cx32 deployment
Document server disk architecture, PyTorch CPU-only setup, service
management, and recovery procedures learned from disk space crisis.

- Document dual-disk architecture (root /: 75GB, /mnt/data: 49GB)
- PyTorch CPU-only installation via --index-url whl/cpu
- Custodian data symlink: /mnt/data/custodian → /var/lib/glam/api/data/
- Service restart procedures for Oxigraph, GLAM API, Qdrant, etc.
- Emergency recovery commands for disk space crises
2026-01-10 18:42:15 +01:00
kempersc
28c3aaf33f enrich profiles 2026-01-10 17:31:02 +01:00
kempersc
bd257c52f4 data(person): update 2 additional profiles 2026-01-10 15:39:12 +01:00
kempersc
2f33e6a230 data(person): update DR-STAPEL profile 2026-01-10 15:38:37 +01:00
kempersc
cce484c6b8 feat(archief-assistent): enhance semantic cache with ontology-driven vocabulary
- Integrate tier-2 embeddings from types-vocab.json
- Add segment-based caching for improved retrieval
- Update tests and documentation
2026-01-10 15:38:11 +01:00