Commit graph

353 commits

Author SHA1 Message Date
kempersc
a7c06ea653 chore: trigger dspy-eval workflow for debugging
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 2m3s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 5s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 2s
2026-01-12 19:15:07 +01:00
kempersc
8d7aca0f98 Refactor code structure for improved readability and maintainability 2026-01-12 19:13:35 +01:00
kempersc
3b35f4aea5 Refactor code structure for improved readability and maintainability 2026-01-12 18:31:31 +01:00
kempersc
846a6cdcec Add new Record Set Types for various archival collections
- Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType.
- Each new type includes appropriate metadata, slots, and relationships to existing classes.
- Implemented a script to detect and fix Type class violations in LinkML files.
2026-01-12 15:20:29 +01:00
kempersc
5807840bbc fix: update generated timestamp in manifest.json
Some checks failed
Deploy Frontend / build-and-deploy (push) Successful in 4m2s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 6s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 2s
2026-01-12 14:46:05 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
f2b2481272 chore: trigger dspy-eval workflow (touch workflow file)
Some checks failed
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 5s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 2s
2026-01-12 11:17:49 +01:00
kempersc
81e614098f fix(caddy): disable browser caching for schema files
Add Cache-Control: no-cache, must-revalidate for /schemas/* paths
This ensures frontend always fetches fresh YAML content after updates
2026-01-11 23:38:08 +01:00
kempersc
070c87af7b refactor(migrate_wcms_resume): use recursive glob to find user JSON files and skip macOS hidden files 2026-01-11 23:32:27 +01:00
kempersc
f497be98d1 chore: update schema manifest after slot centralization
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m54s
2026-01-11 23:28:03 +01:00
kempersc
0d5d48568d refactor(schema): centralize slot definitions per Rule 38
- Remove slot_uri, description, mappings from slot_usage sections
- Move these properties to centralized slot files in modules/slots/
- Keep only class-specific overrides in slot_usage (required, inlined, examples)
- Update 1,499 centralized slot files with enriched definitions
- Clean 188 class files

Violations fixed:
- slot_uri in slot_usage: 1,676 → 0
- description in slot_usage: 2,287 → 0 (moved to centralized)

Schema still validates: 816 classes, 2028 slots, 127 enums
2026-01-11 23:27:17 +01:00
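The slot_usage cleanup described in commit 0d5d48568d can be sketched in Python as follows. This is only an illustration of the split between class-specific overrides and centralized properties, not the project's actual migration script: the function names, the sample slot names, and the choice to treat every key ending in `_mappings` as a mapping are assumptions.

```python
from typing import Any

# Properties the commit message says move out of slot_usage into the
# centralized slot files. Assumption: "mappings" means any "*_mappings" key.
CENTRALIZED_KEYS = {"slot_uri", "description"}


def is_centralized(key: str) -> bool:
    """Return True for properties that belong in the centralized slot file."""
    return key in CENTRALIZED_KEYS or key.endswith("_mappings")


def split_slot_usage(
    slot_usage: dict[str, dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], dict[str, dict[str, Any]]]:
    """Split slot_usage into (kept overrides, properties to centralize)."""
    kept: dict[str, dict[str, Any]] = {}
    moved: dict[str, dict[str, Any]] = {}
    for slot_name, props in slot_usage.items():
        props = props or {}
        to_move = {k: v for k, v in props.items() if is_centralized(k)}
        to_keep = {k: v for k, v in props.items() if not is_centralized(k)}
        if to_move:
            moved[slot_name] = to_move
        if to_keep:
            # Class-specific overrides such as required, inlined, examples.
            kept[slot_name] = to_keep
    return kept, moved


# Hypothetical slot_usage section of one class file.
example = {
    "has_label": {
        "slot_uri": "rdfs:label",
        "description": "A label.",
        "required": True,
    },
    "has_note": {"description": "A note."},
}
kept, moved = split_slot_usage(example)
```

In this sketch `kept` retains only the `required` override for `has_label`, while `moved` collects the `slot_uri` and `description` values that would be merged into the files under modules/slots/.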
kempersc
da5660cf4c chore: update schema manifest timestamp 2026-01-11 23:19:52 +01:00
kempersc
5d3d8530b0 chore: trigger DSPy eval workflow
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m13s
2026-01-11 22:40:23 +01:00
kempersc
75d53a006e chore: trigger DSPy eval workflow after runner restart 2026-01-11 22:39:14 +01:00
kempersc
02d5b61e40 fix(ci): use Python container image instead of setup-python action
Some checks failed
Deploy Frontend / build-and-deploy (push) Successful in 4m3s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 12s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 1s
- Switch all jobs to use python:3.11-slim container
- Remove setup-python action (not cached on Forgejo runner)
- Add apt-get install for openssh-client and curl in SSH tunnel jobs
- Increased timeout for unit-tests to 10 minutes
- Remove unused PYTHON_VERSION env var
2026-01-11 22:28:49 +01:00
kempersc
56c373bba8 Implement fast WCMS migration script with state file checkpointing and batch processing 2026-01-11 22:26:37 +01:00
kempersc
8856be1085 chore: trigger DSPy eval workflow
Some checks failed
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 5m9s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 2s
2026-01-11 22:18:17 +01:00
kempersc
888439ede2 chore: trigger DSPy eval workflow test 2026-01-11 22:18:05 +01:00
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
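A check for Rule 48 as stated in commit 174a420c08 might look like the Python sketch below. It works on an already-parsed class file (a plain dict) and is not the centralize_inline_slots.py script itself. The function name and sample data are hypothetical, and it assumes that an "inline slot" is either a top-level `slots:` mapping in the class file or an `attributes:` mapping on a class.

```python
def find_inline_slots(class_file: dict) -> list[str]:
    """Return the names of slots defined inline in a parsed class file."""
    found: list[str] = []

    # Assumption: a top-level `slots:` mapping means the class file defines
    # slots itself instead of importing them from modules/slots/.
    for slot_name in class_file.get("slots") or {}:
        found.append(slot_name)

    # Assumption: class-level `attributes:` also count as inline definitions.
    # A class-level `slots:` list only references slot names, so it is fine.
    for class_name, class_def in (class_file.get("classes") or {}).items():
        for attr_name in (class_def or {}).get("attributes") or {}:
            found.append(f"{class_name}.{attr_name}")

    return sorted(found)


# Hypothetical class file that only references an imported slot.
compliant = {
    "imports": ["../slots/definition"],
    "classes": {"FindingAid": {"slots": ["definition"]}},
}

# Hypothetical class file with two kinds of inline definition.
violating = {
    "classes": {
        "FindingAid": {
            "slots": ["definition"],
            "attributes": {"local_note": {"range": "string"}},
        }
    },
    "slots": {"definition": {"range": "string"}},
}

compliant_result = find_inline_slots(compliant)
violating_result = find_inline_slots(violating)
```

An empty result means the file passes under these assumptions; any returned name is a candidate for moving into its own file in modules/slots/.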
kempersc
3e6c2367ad feat(linkml-viewer): UX improvements - entry counts, deep links, settings persistence
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m4s
- Add entry count badge next to schema file name showing (xC, yE, zS) counts
- Add tooltip explaining LinkML file names vs class names
- Remove redundant section headers (Classes, Enums, Slots collapsible sections)
- Add URL params for enum (?enum=) and slot (?slot=) deep linking
- Persist category filters, dev tools visibility, and legend visibility to localStorage
- Set 'Main Schema' filter to OFF by default (confusing for users)
- Add Rule 48: Class files must not define inline slots
2026-01-11 21:42:35 +01:00
kempersc
eff3153f3f feat(schema): add Environmental Zone Type slot definitions
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m56s
Add 4 slot files for EnvironmentalZoneType class:
- environmental_zone_type_id: URI identifier slot
- environmental_zone_type_code: code slot for zone type codes
- environmental_zone_type_label: human-readable label
- environmental_zone_type_description: detailed description

Update manifest.json with new slot count (2084 slots total)
2026-01-11 21:22:44 +01:00
kempersc
7e9df1d600 chore(ci): remove GitHub dspy-eval workflow (replaced by Forgejo workflow) 2026-01-11 21:20:05 +01:00
kempersc
8470bf5860 feat(ci): add DSPy RAG evaluation workflow for Forgejo
Some checks failed
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Failing after 6m24s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Has been skipped
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Has been skipped
DSPy RAG Evaluation / Quality Gate (push) Failing after 1s
Implements 4-layer testing pyramid:
- Layer 1: Fast unit tests (no LLM, ~5 min)
- Layer 2: DSPy module tests with LLM (~20 min)
- Layer 3: Integration tests via SSH tunnel to Oxigraph
- Layer 4: Comprehensive evaluation (nightly)

Includes:
- SSH tunnel setup for Oxigraph access
- Quality gate checks
- JUnit XML output for test results
- Scheduled nightly runs at 2 AM UTC
- Manual trigger with evaluation level selection
2026-01-11 21:19:40 +01:00
kempersc
95d79d0078 fix: update manifest with new generated timestamp and file counts; add EnvironmentalZoneType classes and new slot requirements
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m51s
2026-01-11 21:15:49 +01:00
kempersc
10bb5b69c5 Add Environmental Zone Type Enumeration and related slots
- Introduced EnvironmentalZoneTypeEnum.yaml to classify climate-controlled storage zones with detailed descriptions and recommended conditions for various materials.
- Created slots for environmental zone type code, description, ID, label, and HC preset URI to facilitate structured data representation.
- Implemented boolean slots for specific environmental requirements including dark storage, dust-free environment, ESD protection, and UV filtering, referencing relevant ISO standards.
- Enhanced documentation for each slot to clarify usage and preservation context.
2026-01-11 21:14:59 +01:00
kempersc
f9b950fa24 chore: ignore data/person/ directory (98K+ WCMS profiles) 2026-01-11 20:07:36 +01:00
kempersc
47e8226595 feat(tests): Complete DSPy GitOps testing framework
- Layer 1: 35 unit tests (no LLM required)
- Layer 2: 56 DSPy module tests with LLM
- Layer 3: 10 integration tests with Oxigraph
- Layer 4: Comprehensive evaluation suite

Fixed:
- Coordinate queries to use schema:location -> blank node pattern
- Golden query expected intent for location questions
- Health check test filtering in Layer 4

Added GitHub Actions workflow for CI/CD evaluation
2026-01-11 20:04:33 +01:00
kempersc
fce186b649 enrich person profiles 2026-01-11 18:08:40 +01:00
kempersc
a79d95fbf9 fix(ci): add jq to system dependencies and remove stale submodule entries
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m34s
- Add jq to apt-get install for deployment verification step
- Remove orphaned submodule entries (exa-mcp-server-source, mcp-wikidata) from git index
- Rename 'Install rsync' step to 'Install system dependencies'
2026-01-11 17:29:27 +01:00
kempersc
44469d3e4a chore: trigger workflow to test SSH key secret fix
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 4m51s
2026-01-11 17:09:14 +01:00
kempersc
0b29e7e805 chore: trigger GitOps workflow to test SSH key fix 2026-01-11 17:08:26 +01:00
kempersc
e94b58a289 fix(ci): install rsync in CI container
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 4m48s
The node:20-bookworm image doesn't include rsync which is needed
for the sync-schemas npm script.
2026-01-11 17:02:24 +01:00
kempersc
29ef609465 fix(ci): let pnpm version be read from package.json packageManager field
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 1m23s
The pnpm/action-setup detects version from package.json's packageManager
field automatically. Specifying version in workflow causes conflict.
2026-01-11 17:00:02 +01:00
kempersc
03a506382d fix(ci): include pnpm workspace files in sparse checkout
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 32s
The pnpm-lock.yaml and other workspace files are at the repository root,
not in frontend/. Add them to sparse checkout for pnpm install to work.
2026-01-11 16:58:22 +01:00
kempersc
5aeda1c195 fix(ci): disable pnpm caching due to path resolution issues
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 50s
The setup-node action fails to cache pnpm dependencies because the
store path /workspace/kempersc/glam/.pnpm-store/v3 can't be resolved.
Disabling caching for now to get the build working.
2026-01-11 16:43:49 +01:00
kempersc
b2b80cdad8 fix(ci): use pnpm instead of npm for workspace:* dependency support
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 59s
The frontend uses pnpm workspaces with 'workspace:*' protocol that npm
doesn't support. This updates the workflow to:
- Install pnpm using pnpm/action-setup
- Use pnpm for install, sync-schemas, generate-manifest, and build
- Cache pnpm dependencies using pnpm-lock.yaml
2026-01-11 16:41:57 +01:00
kempersc
b91be82af2 fix(ci): use sparse checkout to avoid large data/ directory
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 5m59s
The repository has 314K+ files including backup data that exceeds
the CI runner's disk space. This change uses sparse checkout to only
fetch frontend/ and schemas/ directories needed for the build.
2026-01-11 16:32:58 +01:00
kempersc
66ab2908d0 fix: remove deprecated AnnotationMotivationEnum, add European surname data
Some checks failed
Deploy Frontend / build-and-deploy (push) Failing after 3m21s
- Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths)
- Add French, Italian, Polish, Spanish surname datasets for entity resolution
- Update name_commonality.py with expanded European surname detection
- Triggers GitOps workflow to test Forgejo Actions runner
2026-01-11 16:03:18 +01:00
kempersc
fd792fce2c Refactor code structure for improved readability and maintainability
Some checks failed
Deploy Frontend / build-and-deploy (push) Has been cancelled
2026-01-11 15:27:14 +01:00
kempersc
055fd890ff test: verify pre-commit hook regenerates manifest
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:21:58 +01:00
kempersc
3a661d6013 fix(schema): regenerate manifest to remove stale AnnotationMotivationEnum reference
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
The old enum was properly archived to modules/enums/archive/ with .deprecated
suffix per Rule 9, but the manifest wasn't regenerated. Now correctly shows
only AnnotationMotivationType.yaml and AnnotationMotivationTypes.yaml.
2026-01-11 15:16:50 +01:00
kempersc
b6e069a1d5 chore(schema): bump version to 0.9.12 - test webhook deployment
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
2026-01-11 15:08:35 +01:00
kempersc
0f7fbf1ca0 feat(ci): add Forgejo Actions workflow for auto-deploy on LinkML schema changes
Some checks are pending
Deploy Frontend / build-and-deploy (push) Waiting to run
Infrastructure changes to enable automatic frontend deployment when schemas change:

- Add .forgejo/workflows/deploy-frontend.yml workflow triggered by:
  - Changes to frontend/** or schemas/20251121/linkml/**
  - Manual workflow dispatch

- Rewrite generate-schema-manifest.cjs to properly scan all schema directories
  - Recursively scans classes, enums, slots, modules directories
  - Uses singular category names (class, enum, slot) matching TypeScript types
  - Includes all 4 main schemas at root level
  - Skips archive directories and backup files

- Update schema-loader.ts to match new manifest format
  - Add SchemaCategory interface
  - Update SchemaManifest to use categories as array
  - Add flattenCategories() helper function
  - Add getSchemaCategories() and getSchemaCategoriesSync() functions

The workflow builds frontend with updated manifest and deploys to bronhouder.nl
2026-01-11 14:16:57 +01:00
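The manifest scan described in commit 0f7fbf1ca0 (recursive scan, singular category names, categories as an array, archive directories and backup files skipped) could be sketched in Python as follows. The real generator is generate-schema-manifest.cjs, so this is an illustration only. The manifest field names, the `.yaml` filter, the backup-file test, and the sample paths are all assumptions, and the "modules" directory and root-level main schemas are left out.

```python
import tempfile
from pathlib import Path

# Directory name -> singular category name, as the commit message describes.
CATEGORY_BY_DIR = {"classes": "class", "enums": "enum", "slots": "slot"}


def is_skipped(rel_path: Path) -> bool:
    """Skip archive directories and backup files (detection rules assumed)."""
    in_archive = any(part.startswith("archive") for part in rel_path.parts[:-1])
    name = rel_path.name.lower()
    is_backup = ".bak" in name or "backup" in name
    return in_archive or is_backup


def build_categories(root: Path) -> list[dict]:
    """Recursively scan root and return categories as an array of dicts."""
    files_by_category: dict[str, list[str]] = {
        name: [] for name in CATEGORY_BY_DIR.values()
    }
    for path in sorted(root.rglob("*.yaml")):
        rel = path.relative_to(root)
        if is_skipped(rel):
            continue
        # The first matching directory in the path decides the category.
        for part in rel.parts[:-1]:
            if part in CATEGORY_BY_DIR:
                files_by_category[CATEGORY_BY_DIR[part]].append(rel.as_posix())
                break
    return [{"name": n, "files": f} for n, f in files_by_category.items()]


# Build a small throwaway tree with hypothetical file names and scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        "modules/classes/VideoAnnotation.yaml",
        "modules/enums/archive/AnnotationMotivationEnum.yaml",
        "modules/slots/definition.yaml",
        "modules/slots/definition.backup.yaml",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n")
    categories = build_categories(root)
```

Returning the categories as a list rather than a keyed object mirrors the "categories as array" change noted for schema-loader.ts; a consumer can then flatten the list into a single file list if needed.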
kempersc
329b341bb1 refactor(schema): sync AnnotationMotivationType changes to frontend public schemas
- Update VideoAnnotation class with new motivation type references
- Add AnnotationMotivationType and AnnotationMotivationTypes class files
- Add motivation_type slots (description, id, name)
- Archive deprecated AnnotationMotivationEnum
- Update slot references for derived_from_entity, has_observation, has_person_observation
2026-01-11 14:16:39 +01:00
kempersc
9726cc7917 feat(frontend): Add AnnotationMotivationType to LinkML schema manifest
Add new AnnotationMotivationType and AnnotationMotivationTypes to the
SCHEMA_FILES array so they appear in the /linkml viewer.
2026-01-11 13:56:11 +01:00
kempersc
55ef2a831d feat(data): add Belgian surnames dataset with metadata and surname counts 2026-01-11 13:50:20 +01:00
kempersc
be8b14f6ac refactor: Convert AnnotationMotivationEnum to Type/Types class hierarchy
- Create AnnotationMotivationType abstract base class (oa:Motivation)
- Create 10 concrete motivation subclasses in AnnotationMotivationTypes.yaml:
  - 6 W3C Web Annotation standard: classifying, describing, identifying,
    tagging, linking, commenting
  - 4 heritage-specific: accessibility, discovery, preservation, research
- Update has_annotation_motivation slot to use AnnotationMotivationType range
- Update VideoAnnotation.yaml imports and remove inline enum
- Archive deprecated AnnotationMotivationEnum.yaml
- Add motivation_type_id, motivation_type_name, motivation_type_description slots

Follows Rule 0b (Type/Types naming convention) and Rule 9 (enum-to-class promotion)
2026-01-11 13:48:28 +01:00
kempersc
7d09e4179c Add US surnames dataset from 2010 Census with metadata and surname counts 2026-01-11 12:28:58 +01:00
kempersc
dfb4744dc7 Evaluate data enrichments of persons 2026-01-11 12:15:27 +01:00
kempersc
49a8c341b5 chore(data): update geonames database journal file 2026-01-11 02:51:52 +01:00