kempersc/glam - Forgejo: Beyond coding. We Forge.

Author	SHA1	Message	Date
kempersc	e94b58a289	fix(ci): install rsync in CI container. The node:20-bookworm image doesn't include rsync, which is needed for the sync-schemas npm script. [CI: Deploy Frontend / build-and-deploy (push), failing after 4m48s]	2026-01-11 17:02:24 +01:00
kempersc	29ef609465	fix(ci): let pnpm version be read from package.json packageManager field. The pnpm/action-setup action detects the version from package.json's packageManager field automatically; specifying a version in the workflow causes a conflict. [CI: Deploy Frontend / build-and-deploy (push), failing after 1m23s]	2026-01-11 17:00:02 +01:00
kempersc	03a506382d	fix(ci): include pnpm workspace files in sparse checkout. The pnpm-lock.yaml and other workspace files are at the repository root, not in frontend/. Add them to the sparse checkout so that pnpm install works. [CI: Deploy Frontend / build-and-deploy (push), failing after 32s]	2026-01-11 16:58:22 +01:00
kempersc	5aeda1c195	fix(ci): disable pnpm caching due to path resolution issues. The setup-node action fails to cache pnpm dependencies because the store path /workspace/kempersc/glam/.pnpm-store/v3 can't be resolved. Disabling caching for now to get the build working. [CI: Deploy Frontend / build-and-deploy (push), failing after 50s]	2026-01-11 16:43:49 +01:00
kempersc	b2b80cdad8	fix(ci): use pnpm instead of npm for workspace:* dependency support. The frontend uses pnpm workspaces with the 'workspace:*' protocol, which npm doesn't support. This updates the workflow to: - Install pnpm using pnpm/action-setup - Use pnpm for install, sync-schemas, generate-manifest, and build - Cache pnpm dependencies using pnpm-lock.yaml [CI: Deploy Frontend / build-and-deploy (push), failing after 59s]	2026-01-11 16:41:57 +01:00
kempersc	b91be82af2	fix(ci): use sparse checkout to avoid large data/ directory. The repository has 314K+ files, including backup data, which exceeds the CI runner's disk space. This change uses sparse checkout to fetch only the frontend/ and schemas/ directories needed for the build. [CI: Deploy Frontend / build-and-deploy (push), failing after 5m59s]	2026-01-11 16:32:58 +01:00
kempersc	66ab2908d0	fix: remove deprecated AnnotationMotivationEnum, add European surname data. - Move deprecated AnnotationMotivationEnum to archive-deprecated/ (outside served paths) - Add French, Italian, Polish, Spanish surname datasets for entity resolution - Update name_commonality.py with expanded European surname detection - Triggers GitOps workflow to test Forgejo Actions runner [CI: Deploy Frontend / build-and-deploy (push), failing after 3m21s]	2026-01-11 16:03:18 +01:00
kempersc	fd792fce2c	Refactor code structure for improved readability and maintainability [CI: Deploy Frontend / build-and-deploy (push), cancelled]	2026-01-11 15:27:14 +01:00
kempersc	055fd890ff	test: verify pre-commit hook regenerates manifest [CI: Deploy Frontend / build-and-deploy (push), waiting to run]	2026-01-11 15:21:58 +01:00
kempersc	3a661d6013	fix(schema): regenerate manifest to remove stale AnnotationMotivationEnum reference. The old enum was properly archived to modules/enums/archive/ with a .deprecated suffix per Rule 9, but the manifest wasn't regenerated. The manifest now correctly shows only AnnotationMotivationType.yaml and AnnotationMotivationTypes.yaml. [CI: Deploy Frontend / build-and-deploy (push), waiting to run]	2026-01-11 15:16:50 +01:00
kempersc	b6e069a1d5	chore(schema): bump version to 0.9.12 - test webhook deployment [CI: Deploy Frontend / build-and-deploy (push), waiting to run]	2026-01-11 15:08:35 +01:00
kempersc	0f7fbf1ca0	feat(ci): add Forgejo Actions workflow for auto-deploy on LinkML schema changes. Infrastructure changes to enable automatic frontend deployment when schemas change: - Add .forgejo/workflows/deploy-frontend.yml workflow triggered by: - Changes to frontend/ or schemas/20251121/linkml/ - Manual workflow dispatch - Rewrite generate-schema-manifest.cjs to properly scan all schema directories - Recursively scans classes, enums, slots, modules directories - Uses singular category names (class, enum, slot) matching TypeScript types - Includes all 4 main schemas at root level - Skips archive directories and backup files - Update schema-loader.ts to match new manifest format - Add SchemaCategory interface - Update SchemaManifest to use categories as array - Add flattenCategories() helper function - Add getSchemaCategories() and getSchemaCategoriesSync() functions The workflow builds frontend with updated manifest and deploys to bronhouder.nl [CI: Deploy Frontend / build-and-deploy (push), waiting to run]	2026-01-11 14:16:57 +01:00
kempersc	329b341bb1	refactor(schema): sync AnnotationMotivationType changes to frontend public schemas - Update VideoAnnotation class with new motivation type references - Add AnnotationMotivationType and AnnotationMotivationTypes class files - Add motivation_type slots (description, id, name) - Archive deprecated AnnotationMotivationEnum - Update slot references for derived_from_entity, has_observation, has_person_observation	2026-01-11 14:16:39 +01:00
kempersc	9726cc7917	feat(frontend): Add AnnotationMotivationType to LinkML schema manifest Add new AnnotationMotivationType and AnnotationMotivationTypes to the SCHEMA_FILES array so they appear in the /linkml viewer.	2026-01-11 13:56:11 +01:00
kempersc	55ef2a831d	feat(data): add Belgian surnames dataset with metadata and surname counts	2026-01-11 13:50:20 +01:00
kempersc	be8b14f6ac	refactor: Convert AnnotationMotivationEnum to Type/Types class hierarchy - Create AnnotationMotivationType abstract base class (oa:Motivation) - Create 10 concrete motivation subclasses in AnnotationMotivationTypes.yaml: - 6 W3C Web Annotation standard: classifying, describing, identifying, tagging, linking, commenting - 4 heritage-specific: accessibility, discovery, preservation, research - Update has_annotation_motivation slot to use AnnotationMotivationType range - Update VideoAnnotation.yaml imports and remove inline enum - Archive deprecated AnnotationMotivationEnum.yaml - Add motivation_type_id, motivation_type_name, motivation_type_description slots Follows Rule 0b (Type/Types naming convention) and Rule 9 (enum-to-class promotion)	2026-01-11 13:48:28 +01:00
kempersc	7d09e4179c	Add US surnames dataset from 2010 Census with metadata and surname counts	2026-01-11 12:28:58 +01:00
kempersc	dfb4744dc7	Evaluate data enrichments of persons	2026-01-11 12:15:27 +01:00
kempersc	49a8c341b5	chore(data): update geonames database journal file	2026-01-11 02:51:52 +01:00
kempersc	170fd73c49	feat(agents): update critical rules section to include entity resolution guidelines	2026-01-11 02:51:18 +01:00
kempersc	556cc6c294	Add workspace configuration for Git and Gitea integration - Set up GitHub integration to be disabled. - Configure Git settings including path and autofetch options. - Add Gitea instance URL and repository details. - Enable YAML support for LinkML schemas with validation. - Define file associations for YAML files. - Recommend essential extensions for development and exclude unwanted ones.	2026-01-11 02:50:39 +01:00
kempersc	b3e57e709c	Refactor code structure for improved readability and maintainability	2026-01-11 02:24:34 +01:00
kempersc	3c3be47e32	feat(infra): add fast push-based schema sync to production - Replace slow Forgejo→Server git pull with direct local rsync - Add git-push-schemas.sh wrapper script for manual pushes - Add post-commit hook for automatic schema sync - Fix YAML syntax errors in slot comment blocks - Update deploy-webhook.py to use master branch	2026-01-11 01:22:47 +01:00
kempersc	0df26a6e44	data(person): additional person profile enrichments	2026-01-11 00:41:59 +01:00
kempersc	0a888ec682	chore: add node_modules to .gitignore and remove from tracking - Add node_modules/ and .pnpm-store/ to .gitignore - Remove 76k node_modules files from git tracking - Update frontend manifest	2026-01-11 00:41:21 +01:00
kempersc	3eb097d92e	data(person): enrich 64 person profiles with comprehensive metadata - Add inferred birth dates using EDTF notation - Add inferred birth/current settlements - Enrich employment history with temporal data - Add heritage sector relevance scores - Improve PPID component tracking - Update .gitignore with large file patterns (warc, nt, trix, geonames.db)	2026-01-11 00:38:09 +01:00
kempersc	3dd1f11059	chore: sync repository to self-hosted Forgejo	2026-01-11 00:12:16 +01:00
kempersc	a4184cb805	feat(infra): add webhook-based schema deployment pipeline - Add FastAPI webhook receiver for Forgejo push events - Add setup script for server deployment - Add Caddy snippet for webhook endpoint - Add local sync-schemas.sh helper script - Sync frontend schemas with source (archived deprecated slots) Infrastructure scripts staged for optional webhook deployment. Current deployment uses: ./infrastructure/deploy.sh --frontend	2026-01-10 21:45:02 +01:00
kempersc	f02cffe1e8	refactor(schema): migrate 5 deprecated slots to temporal naming convention Migrate slots to follow RiC-O-style temporal naming (Rule 39): - accepts_external_work → accepts_or_accepted_external_work - accepts_visiting_scholars → accepts_or_accepted_visiting_scholar - accepts_payment_methods → accepts_or_accepted_payment_method - access → has_or_had_access_condition - access_policy_ref → has_or_had_access_policy_reference Updated classes to use new slot names: - ConservationLab.yaml - ResearchCenter.yaml - GiftShop.yaml - ArchiveReference.yaml - FindingAid.yaml - Collection.yaml Archived deprecated slots to schemas/20251121/linkml/archive/slots/ with _archived_20260110 suffix per Rule 9 (enum-to-class principle).	2026-01-10 21:09:29 +01:00
kempersc	ac36b80476	feat(rag): add companion queries for count templates Add companion_query support to fetch full entity records alongside aggregate count queries. Enables displaying results on map/list when asking 'how many museums in Amsterdam?' Backend changes: - Add companion_query, companion_query_region, companion_query_country fields to TemplateDefinition and TemplateMatchResult - Add render_template_string() for raw companion query rendering Template changes: - Add companion queries to count_institutions_by_type_and_location for settlement, region, and country level queries - Returns institution URI, name, coordinates, city for visualization	2026-01-10 18:44:06 +01:00
kempersc	f8b4ecad7d	data(person): enrich 7 person profiles with detailed employment history Update heritage professional profiles with: - Separate role entries for different positions at same institution - Employment date ranges (start_date, end_date) - Updated observed_on timestamps - Direct LinkedIn profile URLs as source Profiles updated: - Antoinet Nijssen (Noord-Hollands Archief) - Anna Lakmaker - Annelies Reus - Marianne Hamersma - Marcel Auwers - Hans Felius - Nico Vriend	2026-01-10 18:43:27 +01:00
kempersc	6c19ef8661	feat(rag): add Rule 46 epistemic provenance tracking Track full lineage of RAG responses: WHERE data comes from, WHEN it was retrieved, HOW it was processed (SPARQL/vector/LLM). Backend changes: - Add provenance.py with EpistemicProvenance, DataTier, SourceAttribution - Integrate provenance into MultiSourceRetriever.merge_results() - Return epistemic_provenance in DSPyQueryResponse Frontend changes: - Pass EpistemicProvenance through useMultiDatabaseRAG hook - Display provenance in ConversationPage (for cache transparency) Schema fixes: - Fix truncated example in has_observation.yaml slot definition References: - Pavlyshyn's Context Graphs and Data Traces paper - LinkML ProvenanceBlock schema pattern	2026-01-10 18:42:43 +01:00
kempersc	54dd4a9803	docs(server): add SERVER_OPERATIONS.md for Hetzner cx32 deployment Document server disk architecture, PyTorch CPU-only setup, service management, and recovery procedures learned from disk space crisis. - Document dual-disk architecture (/: root 75GB, /mnt/data: 49GB) - PyTorch CPU-only installation via --index-url whl/cpu - Custodian data symlink: /mnt/data/custodian → /var/lib/glam/api/data/ - Service restart procedures for Oxigraph, GLAM API, Qdrant, etc. - Emergency recovery commands for disk space crises	2026-01-10 18:42:15 +01:00
kempersc	28c3aaf33f	enrich profiles	2026-01-10 17:31:02 +01:00
kempersc	bd257c52f4	data(person): update 2 additional profiles	2026-01-10 15:39:12 +01:00
kempersc	2f33e6a230	data(person): update DR-STAPEL profile	2026-01-10 15:38:37 +01:00
kempersc	cce484c6b8	feat(archief-assistent): enhance semantic cache with ontology-driven vocabulary - Integrate tier-2 embeddings from types-vocab.json - Add segment-based caching for improved retrieval - Update tests and documentation	2026-01-10 15:38:11 +01:00
kempersc	ad74d8379e	feat(scripts): improve types-vocab extraction to derive all vocabulary from schema - Remove hardcoded type mappings, derive dynamically from LinkML - Extract keywords from annotations, structured_aliases, and comments - Add rename_plural_slot.py utility for schema slot renaming	2026-01-10 15:37:52 +01:00
kempersc	ec18e1810d	data(person): enrich 7 profiles with detailed affiliations and GHCIDs - Add GHCID references to custodian affiliations - Add start dates for employment periods - Expand heritage type classifications (A→[A,F]) - Add detailed rationales based on career history - Add full_initials from archival publications	2026-01-10 15:36:49 +01:00
kempersc	626bd3a095	refactor(schemas): apply naming conventions to 261 class files - Apply Rule 39: RiC-O style hasOrHad/isOrWas for temporal slots - Apply Rule 43: Singular noun convention (keywords → keyword) - Update slot references to match renamed slot files - Maintain schema integrity across all class definitions	2026-01-10 15:36:33 +01:00
kempersc	94bfc9061e	refactor(schemas): consolidate slot definitions and remove 305 redundant files - Apply Rule 39: RiC-O style temporal naming (hasOrHad, isOrWas) - Apply Rule 43: Singular noun convention for slot names - Remove duplicate slot definitions consolidated into centralized files - Net reduction: 6,162 lines across 305 deleted files	2026-01-10 15:36:13 +01:00
kempersc	13938c92ca	chore(schemas): sync LinkML schemas to frontend apps Copies authoritative schemas from schemas/20251121/ to: - frontend/public/schemas/20251121/ - apps/archief-assistent/public/schemas/20251121/ This ensures slot definitions with corrected ontology property references (commit `2808dad6cd`) are available to frontend apps.	2026-01-10 15:02:25 +01:00
kempersc	e5a08a353d	enrich person profiles	2026-01-10 14:14:04 +01:00
kempersc	9339de2cfb	data(person): process 44,512 heritage-relevant profiles from entity extractions Processing Summary: - Scanned 94,716 LinkedIn entity files - Identified 44,512 heritage-relevant individuals (47%) - Created 1,430 new PPID-formatted profiles - Updated 43,070 existing profiles with entity data - Final count: 40,731 person profiles Profile updates include: - Merged web_claims with full provenance - Added/updated heritage_relevance scoring - Added affiliation data with custodian references - Added inferred birth decades with provenance chains (Rule 45) All data preserved per Rule 5 (additive only)	2026-01-10 14:01:29 +01:00
kempersc	3a15f2bdaa	feat(scripts): add entity-to-PPID processing script - Processes 94,716 LinkedIn entity files from data/custodian/person/entity/ - Identifies heritage-relevant profiles (47% of total) - Generates PPID-formatted filenames with inferred locations/dates - Merges with existing profiles, preserving all provenance data - Applies Rules 12, 20, 27, 44, 45 for person data architecture - Fixed edge case: handle null education/experience arrays	2026-01-10 13:58:06 +01:00
kempersc	57e77c8b19	chore(deps): add tsx, yaml, and @types/node for schema extraction script Dependencies for scripts/extract-types-vocab.ts: - tsx: TypeScript execution for Node.js scripts - yaml: Parse LinkML schema files - @types/node: TypeScript definitions for Node.js APIs	2026-01-10 13:33:12 +01:00
kempersc	0845d9f30e	feat(scripts): add person enrichment and slot mapping utilities Person Enrichment Scripts: - enrich_person_comprehensive.py: Full-featured web search enrichment via Linkup with Rule 6/21/26/34/35 compliance (dual timestamps, no fabrication) - enrich_ppids_linkup.py: Batch PPID enrichment pipeline - extract_persons_with_provenance.py: Extract person data from LinkedIn HTML with XPath provenance tracking LinkML Slot Management: - update_slot_mappings.py: Update slots for RiC-O naming (Rule 39) and semantic URI requirements (Rule 38) - update_class_slot_references.py: Update class files referencing renamed slots - validate_slot_mappings.py: Validate slot definitions against ontology rules All scripts follow established project conventions for provenance and ontology alignment.	2026-01-10 13:32:32 +01:00
kempersc	6f3cf95492	data(person): fix data quality issues and PPID corrections Data Quality Corrections: - TIRANA-ADISUNA: Fix erroneous death_year claim (was education end date 2016, not death). Set is_living=true. Reassess heritage_relevance=false (tourism ministry is not a GLAM institution) - ALEX-ALSEMGEEST: Rename from NL-ZH-TH (The Hague) to NL-ZH-ROT (Rotterdam) based on verified birth location. Update birth year to 1980 Profile Enrichments (5 profiles with XX-XX-XXX placeholders): - Add web claims with proper provenance timestamps - Add LinkedIn-verified education and position claims - Document correction rationale in modification_reason Heritage Relevance Reassessments: - Government ministries (Tourism, etc.) marked as non-heritage - Only GLAM institutions (Galleries, Libraries, Archives, Museums) qualify	2026-01-10 13:31:39 +01:00
kempersc	f2bc2d54cb	feat(archief-assistent): integrate ontology-driven vocabulary into semantic cache Implements Rule 46: Ontology-Driven Cache Segmentation Semantic Cache Enhancements: - Add institutionSubtype, recordSetType, wikidataEntity to ExtractedEntities - Add extractionMethod field to track vocabulary vs regex extraction - Implement async extractEntitiesWithVocabulary() using term log - Maintain sync regex fallback for cache key generation (<5ms) Build Pipeline: - Add prebuild hook to regenerate types-vocab.json from LinkML schemas - Extract vocabulary from Type.yaml and Types.yaml schema files - Generate GLAMORCUBESFIXPHDNT code mappings automatically New Script: - scripts/extract-types-vocab.ts - Extracts vocabulary from LinkML schemas - Supports --skip-embeddings flag for faster builds - Outputs to apps/archief-assistent/public/types-vocab.json This enables richer cache segmentation using ontology-derived subtypes (e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM') instead of just top-level GLAMORCUBESFIXPHDNT codes.	2026-01-10 13:30:30 +01:00
kempersc	2808dad6cd	fix(linkml): correct invalid ontology property references in slot definitions - confidence_score: prov:confidence doesn't exist → hc:confidenceScore - deliverables: schema:result doesn't exist → hc:deliverables - circumstances_of_death: wikidata:P1196 is identifier, not predicate → hc:circumstancesOfDeath - deceased: schema:deathDate wrong semantics for boolean → hc:deceased - death_place: fix sdo prefix to schema, remove wd:P20 as exact mapping - date_of_death: wikidata:P570 is identifier, not predicate - martyred: correct prefix inconsistencies - given_name/literal_name: fix sdo→schema prefix - occupation/religion/status: standardize prefix declarations Add comments documenting why Wikidata properties (P-numbers) cannot be used as slot_uri (they are entity identifiers, not RDF predicates).	2026-01-10 13:29:55 +01:00
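
The rule stated in commit 2808dad6cd (Wikidata P-numbers should not be used as slot_uri) could be checked mechanically. The following is a minimal sketch, not code from the repository: the function name, the set of prefixes, and the in-memory slot dictionaries are all assumptions made for illustration, and the sample values only mirror the kind of correction the commit message describes.

```python
import re

# Assumption: these CURIE prefixes denote Wikidata namespaces in the schema.
WIKIDATA_PREFIXES = {"wikidata", "wd"}
# A Wikidata property identifier is "P" followed by digits, e.g. P570.
P_NUMBER = re.compile(r"^P\d+$")


def flag_wikidata_slot_uris(slots):
    """Return the names of slots whose slot_uri is a Wikidata P-number CURIE.

    `slots` maps a slot name to a dict that may contain a "slot_uri" key
    (a hypothetical, simplified stand-in for parsed LinkML slot files).
    """
    flagged = []
    for name, definition in slots.items():
        slot_uri = definition.get("slot_uri", "")
        prefix, sep, local = slot_uri.partition(":")
        if sep and prefix in WIKIDATA_PREFIXES and P_NUMBER.match(local):
            flagged.append(name)
    return flagged


# Illustrative sample data only; not read from the actual schema files.
sample_slots = {
    "date_of_death": {"slot_uri": "wikidata:P570"},
    "confidence_score": {"slot_uri": "hc:confidenceScore"},
    "deceased": {"slot_uri": "hc:deceased"},
}

print(flag_wikidata_slot_uris(sample_slots))
```

In a real setup the dictionaries would presumably be loaded from the slot YAML files, and the check could run alongside a script such as validate_slot_mappings.py; how that script works internally is not shown in this log.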

322 commits