kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	fcb704c97e	Update generated timestamp in manifest.json	2026-01-29 17:55:50 +01:00
kempersc	1516d509cf	Add metadata to LinkML class definitions and update prefixes - Added `id`, `name`, `title`, and `description` fields to multiple LinkML class YAML files. - Standardized prefixes across all class definitions. - Introduced a new script `fix_linkml_metadata.py` to automate the addition of metadata to class files. - Updated existing class files to ensure compliance with the new metadata structure.	2026-01-29 17:40:47 +01:00
kempersc	7cf10084b4	Implement scripts for schema modifications and ontology verification - Added `fix_dual_class_link.py` to remove dual class link references from specified YAML files. - Created `fix_specific_ghosts.py` to apply specific replacements in YAML files based on defined mappings. - Introduced `migrate_staff_count.py` to migrate staff count references to a new structure in specified YAML files. - Developed `migrate_type_slots.py` to replace type-related slots with new identifiers across YAML files. - Implemented `scan_ghost_references.py` to identify and report ghost references to archived slots and classes in YAML files. - Added `verify_ontology_terms.py` to verify the presence of ontology terms in specified ontology files against schema definitions.	2026-01-29 17:10:25 +01:00
kempersc	1f8776bef4	Update schemas and slots with new mappings and descriptions - Updated manifest.json with new generated timestamp. - Added close mappings to APIRequest and Administration classes. - Renamed slots in AccessPolicy to has_or_had_embargo_end_date and has_or_had_embargo_reason. - Changed class_uri for Accumulation to rico:AccumulationRelation and updated description. - Added exact mappings to Altitude, AppellationType, and ArchitecturalStyle classes. - Removed deprecated slots from CollectionManagementSystem and updated has_or_had_type. - Added new slots for has_or_had_embargo_end_date and has_or_had_embargo_reason. - Updated slot definitions for has_or_had_assessment, has_or_had_sequence_index, and others with new URIs and mappings. - Removed unused slots end_seconds and end_time. - Added new slot definitions for has_or_had_exhibition_type, has_or_had_extent_text, and is_or_was_documented_by.	2026-01-29 13:33:23 +01:00
kempersc	c60b523f29	Implement feature X to enhance user experience and fix bug Y in module Z	2026-01-29 00:12:27 +01:00
kempersc	f800e198ff	Refactor code structure for improved readability and maintainability	2026-01-28 01:11:55 +01:00
kempersc	4c3978ab2f	feat: Migrate community_significance and frame_sample_rate slots to new structures - Removed community_significance slot and migrated its functionality to has_or_had_significance, utilizing the Significance class for structured representation. - Introduced has_or_had_significance slot with detailed examples and descriptions. - Archived community_significance slot and its YAML file. - Removed frame_sample_rate slot, migrating its functionality to the analyzes_or_analyzed slot, now supporting the VideoFrame class for frame analysis. - Created VideoFrame class to encapsulate frame analysis parameters, including sample rate and total frames processed. - Updated relevant schemas and examples to reflect these changes, ensuring compliance with migration rules. - Regenerated manifest to include new structures and updated counts.	2026-01-22 15:51:02 +01:00
kempersc	24cddb82dc	enrich ppid profiles	2026-01-16 12:50:50 +01:00
kempersc	7424b85352	Add new slots for heritage custodian entities - Introduced setpoint_max, setpoint_min, setpoint_tolerance, setpoint_type, setpoint_unit, setpoint_value, temperature_target, track_id, typical_http_methods, typical_metadata_standard, typical_response_formats, typical_scope, typical_technical_feature, unit_code, unit_symbol, unit_type, wikidata_entity, wikidata_equivalent, and wikidata_id slots. - Each slot includes a unique identifier, name, title, description, and annotations for custodian types and specificity score.	2026-01-16 01:04:38 +01:00
kempersc	c2629f6d29	Fix LinkML schema validation errors (0 errors, 30 warnings) Schema Migration Fixes: - Fix YAML import indentation in ~650 slot files (linkml:types and enum imports) - Rename slot reference: has_or_had_holds_record_set_type → hold_or_held_record_set_type (70+ archive class files, main schema, manifest.json) - Fix ProvenanceBlock.yaml: remove invalid any_of range, use string with multivalued - Fix has_or_had_provenance.yaml: remove nested template_specificity from annotations Validation Status: - 0 errors (was multiple import/reference errors) - 30 warnings (missing descriptions on inline slots, intentional SCREAMING_CASE names) Files changed: ~3,850 (slots, classes, main schema, manifest)	2026-01-15 23:21:38 +01:00
kempersc	0cc8c8ca8f	Add archived slot definitions for various attributes in the HC ontology - Introduced new YAML files for slots including typical_scope, typical_technical_feature, unit_affiliation, used, used_by, user_community, verified, web_observation, whatsapp_business_likelihood, wikidata_alignment, wikidata, wikidata_entity, wikidata_equivalent, wikidata_id, wikidata_mapping, stores_or_stored, and time_of_destruction. - Each slot includes detailed descriptions, mappings, and examples to enhance the ontology's semantic structure. - Migrated and centralized the 'stores_object' slot into 'stores_or_stored' to comply with RiC-O naming conventions. - Added comprehensive documentation for temporal-aware slots to support better data integration and querying capabilities.	2026-01-15 20:44:51 +01:00
kempersc	416aa407cc	Add new slots for financial and heritage documentation - Introduced total expense, total frames analyzed, total investment, total liability, total net asset, and traditional product slots to enhance financial reporting capabilities. - Added transition types detected, treatment description, type hypothesis, typical condition, typical HTTP methods, typical response formats, and typical scope slots for improved heritage documentation. - Implemented user community, verified, web observation, WhatsApp business likelihood, wikidata equivalent, and wikidata mapping slots to enrich institutional data representation. - Established has_or_had_asset, has_or_had_budget, has_or_had_expense, and is_or_was_threatened_by slots to capture asset, budget, expense relationships, and threats to heritage forms.	2026-01-15 19:35:39 +01:00
kempersc	3fb27c15e2	Refactor and archive deprecated slots; update migration records - Removed deprecated slots: storage_security_level, version_number, video_comment, visiting_hour, was_asserted_by, was_revision_of, writing_system. - Archived corresponding YAML files for deprecated slots with detailed migration notes. - Updated slot definitions for has_collection and encompassing_body to reflect new naming conventions and temporal patterns. - Enhanced metadata extraction in index_persons_qdrant.py to include WCMS registration and data sources. - Modified hybrid_retriever and multi_embedding_retriever to support filtering by WCMS registration status.	2026-01-15 13:16:59 +01:00
kempersc	043ea868b5	fix(schema): Resolve broken imports after slot migration - Fix empty import list elements (- # comment pattern) in Laptop, Expenses, FunctionType, Overview, WebLink, Photography classes - Replace valid_from/valid_to slots with temporal_extent in class slots lists - Update slot_usage to use temporal_extent with TimeSpan range - Update examples to use temporal_extent with begin_of_the_begin/end_of_the_end - Fix typo is_or_was_is_or_was_archived_at → is_or_was_archived_at in WebObservation - Add TimeSpan imports to classes using temporal_extent - Fix relative import paths for Timestamp in temporal slots - Fix CustodianIdentifier → Identifier imports in FundingAgenda, ReadingRoomAnnex Schema validates successfully with 902 classes and 2043 slots.	2026-01-15 12:25:27 +01:00
kempersc	6c3fa6b5a3	Remove deprecated slots and add new slot definitions for enhanced data modeling - Deleted obsolete slot definitions for work_location and workshop_space. - Introduced new TaxonName class to represent scientific taxonomic names with detailed attributes. - Archived existing slots related to surname_prefix, target_name, taxon_name, terminal_count, text_region_count, title, title_proper, total_chapter, total_characters_extracted, total_connections_extracted, track_name, transcript_format, traveling_venue, type_label, type_status, typical_responsibility, unesco_domain, unesco_inscription_year, unesco_list_status, uniform_title, unit_name, used_by_custodian, uv_filtered_required, valid_from_geo, valid_to_geo, validation_status, variant_of_name, verification_date, viability_status, within_auxiliary_place, and within_place. - Updated slot descriptions and structures to improve clarity and compliance with standards.	2026-01-15 11:42:35 +01:00
kempersc	b30711fcfb	update slots	2026-01-14 09:05:54 +01:00
kempersc	9a395f3dbe	fix: improve birth year extraction to avoid date suffix false positives - Skip YYYYMMDD and YYMMDD date patterns at end of email - Skip digit sequences longer than 4 characters - Require non-digit before 4-digit years at end - Add knid.nl/kabelnoord.nl to consumer domains (Friesland ISP) - Add 11 missing regional archive domains to HERITAGE_DOMAIN_MAP - Update recalculation script to re-extract email semantics Results: - 3,151 false birth years removed - 'Likely wrong person' reduced from 533 to 325 (-39%) - 2,944 candidates' scores boosted	2026-01-13 22:37:10 +01:00
kempersc	92b490d690	edit slots	2026-01-13 20:35:11 +01:00
kempersc	f74513e8ef	feat: Enhance entity resolution with email semantics and review merging - Updated `entity_review.py` to map email semantic fields from JSON. - Expanded `email_semantics.py` with additional museum mappings. - Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings. - Added a backup JSON file for entity resolution candidates. - Created `enrich_email_semantics.py` to enrich candidates with email semantic signals. - Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.	2026-01-13 16:43:56 +01:00
kempersc	1fb924c412	feat: add ontology mappings to LinkML schema and enhance entity resolution Schema enhancements (443 files): - Add class_uri with proper ontology references (schema:, prov:, skos:, rico:) - Add close_mappings, related_mappings per Rule 50 convention - Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel) - Improve descriptions with ontology mapping rationale - Add prefixes blocks to all schema modules Entity Resolution improvements: - Add entity_resolution module with email semantics parsing - Enhance build_entity_resolution.py with email-based matching signals - Extend Entity Review API with filtering by signal types and count - Add candidates caching and indexing for performance - Add ReviewLoginPage component New rules and documentation: - Add Rule 51: No Hallucinated Ontology References - Add .opencode/rules/no-hallucinated-ontology-references.md - Add .opencode/rules/slot-ontology-mapping-reference.md - Add adms.ttl and dqv.ttl ontology files Frontend ontology support: - Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology	2026-01-13 13:51:02 +01:00
kempersc	846a6cdcec	Add new Record Set Types for various archival collections - Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType. - Each new type includes appropriate metadata, slots, and relationships to existing classes. - Implemented a script to detect and fix Type class violations in LinkML files.	2026-01-12 15:20:29 +01:00
kempersc	355d8be51d	centralise slots	2026-01-12 14:33:56 +01:00
kempersc	070c87af7b	refactor(migrate_wcms_resume): use recursive glob to find user JSON files and skip macOS hidden files	2026-01-11 23:32:27 +01:00
kempersc	56c373bba8	Implement fast WCMS migration script with state file checkpointing and batch processing	2026-01-11 22:26:37 +01:00
kempersc	174a420c08	refactor(schema): centralize 1515 inline slot definitions per Rule 48 - Remove inline slot definitions from 144 class files - Create 7 new centralized slot files in modules/slots/: - custodian_type_broader.yaml - custodian_type_narrower.yaml - custodian_type_related.yaml - definition.yaml - finding_aid_access_restriction.yaml - finding_aid_description.yaml - finding_aid_temporal_coverage.yaml - Add centralize_inline_slots.py automation script - Update manifest with new timestamp Rule 48: Class files must NOT define inline slots - all slots must be imported from modules/slots/ directory. Note: Pre-existing IdentifierFormat duplicate class definition (in Standard.yaml and IdentifierFormat.yaml) not addressed in this commit - requires separate schema refactor.	2026-01-11 22:02:14 +01:00
kempersc	fce186b649	enrich person profiles	2026-01-11 18:08:40 +01:00
kempersc	fd792fce2c	Refactor code structure for improved readability and maintainability	2026-01-11 15:27:14 +01:00
kempersc	55ef2a831d	feat(data): add Belgian surnames dataset with metadata and surname counts	2026-01-11 13:50:20 +01:00
kempersc	7d09e4179c	Add US surnames dataset from 2010 Census with metadata and surname counts	2026-01-11 12:28:58 +01:00
kempersc	dfb4744dc7	Evaluate data enrichments of persons	2026-01-11 12:15:27 +01:00
kempersc	556cc6c294	Add workspace configuration for Git and Gitea integration - Set up GitHub integration to be disabled. - Configure Git settings including path and autofetch options. - Add Gitea instance URL and repository details. - Enable YAML support for LinkML schemas with validation. - Define file associations for YAML files. - Recommend essential extensions for development and exclude unwanted ones.	2026-01-11 02:50:39 +01:00
kempersc	b3e57e709c	Refactor code structure for improved readability and maintainability	2026-01-11 02:24:34 +01:00
kempersc	0df26a6e44	data(person): additional person profile enrichments	2026-01-11 00:41:59 +01:00
kempersc	3eb097d92e	data(person): enrich 64 person profiles with comprehensive metadata - Add inferred birth dates using EDTF notation - Add inferred birth/current settlements - Enrich employment history with temporal data - Add heritage sector relevance scores - Improve PPID component tracking - Update .gitignore with large file patterns (warc, nt, trix, geonames.db)	2026-01-11 00:38:09 +01:00
kempersc	28c3aaf33f	enrich profiles	2026-01-10 17:31:02 +01:00
kempersc	ad74d8379e	feat(scripts): improve types-vocab extraction to derive all vocabulary from schema - Remove hardcoded type mappings, derive dynamically from LinkML - Extract keywords from annotations, structured_aliases, and comments - Add rename_plural_slot.py utility for schema slot renaming	2026-01-10 15:37:52 +01:00
kempersc	e5a08a353d	enrich person profiles	2026-01-10 14:14:04 +01:00
kempersc	3a15f2bdaa	feat(scripts): add entity-to-PPID processing script - Processes 94,716 LinkedIn entity files from data/custodian/person/entity/ - Identifies heritage-relevant profiles (47% of total) - Generates PPID-formatted filenames with inferred locations/dates - Merges with existing profiles, preserving all provenance data - Applies Rules 12, 20, 27, 44, 45 for person data architecture - Fixed edge case: handle null education/experience arrays	2026-01-10 13:58:06 +01:00
kempersc	0845d9f30e	feat(scripts): add person enrichment and slot mapping utilities Person Enrichment Scripts: - enrich_person_comprehensive.py: Full-featured web search enrichment via Linkup with Rule 6/21/26/34/35 compliance (dual timestamps, no fabrication) - enrich_ppids_linkup.py: Batch PPID enrichment pipeline - extract_persons_with_provenance.py: Extract person data from LinkedIn HTML with XPath provenance tracking LinkML Slot Management: - update_slot_mappings.py: Update slots for RiC-O naming (Rule 39) and semantic URI requirements (Rule 38) - update_class_slot_references.py: Update class files referencing renamed slots - validate_slot_mappings.py: Validate slot definitions against ontology rules All scripts follow established project conventions for provenance and ontology alignment.	2026-01-10 13:32:32 +01:00
kempersc	f2bc2d54cb	feat(archief-assistent): integrate ontology-driven vocabulary into semantic cache Implements Rule 46: Ontology-Driven Cache Segmentation Semantic Cache Enhancements: - Add institutionSubtype, recordSetType, wikidataEntity to ExtractedEntities - Add extractionMethod field to track vocabulary vs regex extraction - Implement async extractEntitiesWithVocabulary() using term log - Maintain sync regex fallback for cache key generation (<5ms) Build Pipeline: - Add prebuild hook to regenerate types-vocab.json from LinkML schemas - Extract vocabulary from Type.yaml and Types.yaml schema files - Generate GLAMORCUBESFIXPHDNT code mappings automatically New Script: - scripts/extract-types-vocab.ts - Extracts vocabulary from LinkML schemas - Supports --skip-embeddings flag for faster builds - Outputs to apps/archief-assistent/public/types-vocab.json This enables richer cache segmentation using ontology-derived subtypes (e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM') instead of just top-level GLAMORCUBESFIXPHDNT codes.	2026-01-10 13:30:30 +01:00
kempersc	dd0ee2cf11	feat(scripts): expand university location mappings and add web enrichment - enrich_ppids.py: Add 40+ Dutch universities and hogescholen to location mapping - enrich_ppids_web.py: New script for web-based PPID enrichment - resolve_pending_known_orgs.py: Updates for pending org resolution	2026-01-09 21:10:14 +01:00
kempersc	9e67d0f967	enrich profiles	2026-01-09 20:35:19 +01:00
kempersc	eaf80ec756	data(custodian): merge PENDING collision files into existing custodians Merge staff data from 7 PENDING files into their matching custodian records: - NL-XX-XXX-PENDING-SPOT-GRONINGEN → NL-GR-GRO-M-SG (SPOT Groningen, 120 staff) - NL-XX-XXX-PENDING-DIENST-UITVOERING-ONDERWIJS → NL-GR-GRO-O-DUO - NL-XX-XXX-PENDING-ANNE-FRANK-STICHTING → NL-NH-AMS-M-AFS - NL-XX-XXX-PENDING-ALLARD-PIERSON → NL-NH-AMS-M-AP - NL-XX-XXX-PENDING-STICHTING-JOODS-HISTORISCH-MUSEUM → NL-NH-AMS-M-JHM - NL-XX-XXX-PENDING-MINISTERIE-VAN-BUITENLANDSE-ZAKEN → NL-ZH-DHA-O-MBZ - NL-XX-XXX-PENDING-MINISTERIE-VAN-JUSTITIE-EN-VEILIGHEID → NL-ZH-DHA-O-MJV Originals archived in data/custodian/archive/pending_collisions_20250109/ Add scripts/merge_collision_files.py for reproducible merging	2026-01-09 18:33:00 +01:00
kempersc	04791a7a91	fix(ppid): fix unidecode import reference typo	2026-01-09 18:29:36 +01:00
kempersc	abe30cb302	feat(ppid): add unidecode support for non-Latin script transliteration Add optional unidecode dependency to handle Hebrew, Arabic, Chinese, and other non-Latin scripts when generating Person Persistent IDs.	2026-01-09 18:28:41 +01:00
kempersc	932ec5438c	add person profiles with PPID	2026-01-09 18:26:58 +01:00
kempersc	7ec4e05dd4	feat(merge): add script to merge PENDING files by matching emic names with existing files	2026-01-09 16:42:55 +01:00
kempersc	1f723fd5d7	feat(data): merge staff data from 35 PENDING files into enriched custodians Merged LinkedIn-extracted staff sections from PENDING files into their corresponding proper GHCID custodian files. This consolidates data from two extraction sources: - Existing enriched files: Google Maps, Museum Register, YouTube, etc. - PENDING files: LinkedIn staff data extraction Files modified: - 28 custodian files enriched with staff data - 35 PENDING files deleted (merged into proper locations) - Originals archived to archive/pending_duplicates_20250109/ Key institutions enriched: - Rijksmuseum (NL-NH-AMS-M-RM) - Stedelijk Museum Amsterdam (NL-NH-AMS-M-SMA) - Amsterdam Museum (NL-NH-AMS-M-AM) - Regionaal Archief Alkmaar (NL-NH-ALK-A-RAA) - Maritiem Museum Rotterdam (NL-ZH-ROT-M-MMR) - And 23 more museums/archives across NL New scripts: - scripts/merge_staff_data.py: Automated staff data merger - scripts/categorize_pending_files.py: PENDING file analysis utility	2026-01-09 14:51:17 +01:00
kempersc	e313744cf6	feat(scripts): add resolve_pending_locations.py for GHCID resolution Script to resolve NL-XX-XXX-PENDING files that have city names in filename: - Looks up city in GeoNames database - Updates YAML with location data (city, region, country) - Generates proper GHCID with UUID v5/v8 - Renames files to match new GHCID - Archives original PENDING files for reference	2026-01-09 12:18:46 +01:00
kempersc	933deb337c	refactor(scripts): generalize GHCID location fixer for all institution types - Add --type/-t flag to specify institution type (A, G, H, I, L, M, N, O, R, S, T, U, X, ALL) - Default still Type I (Intangible Heritage) for backward compatibility - Skip PENDING files that have no location data - Update help text with all supported types	2026-01-09 11:54:28 +01:00
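
The birth-year rules listed in commit 9a395f3dbe (skip trailing YYYYMMDD and YYMMDD dates, skip digit runs longer than four characters, require a non-digit before a trailing 4-digit year) can be illustrated with a small sketch. This is not the repository's code: the function name `extract_birth_year`, the plausibility bounds `MIN_YEAR` and `MAX_YEAR`, and the choice to look only at the local part of the address are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical plausibility bounds; the commit message does not state a range.
MIN_YEAR = 1930
MAX_YEAR = 2010

# A non-digit followed by exactly four digits at the very end of the local part.
# Because the character before the four digits must be a non-digit, longer
# trailing runs such as 19850312 (YYYYMMDD), 850312 (YYMMDD) or 12345 never match.
_TRAILING_YEAR = re.compile(r"\D(\d{4})$")


def extract_birth_year(email: str) -> Optional[int]:
    """Return a candidate birth year from the end of an email's local part, or None."""
    local_part = email.split("@", 1)[0]
    match = _TRAILING_YEAR.search(local_part)
    if match is None:
        return None
    year = int(match.group(1))
    # Reject 4-digit suffixes that are not plausible birth years (assumed bounds).
    if MIN_YEAR <= year <= MAX_YEAR:
        return year
    return None


if __name__ == "__main__":
    for address in (
        "j.devries1985@example.nl",   # plain trailing year
        "jan19850312@example.nl",     # YYYYMMDD suffix, skipped
        "jan850312@example.nl",       # YYMMDD suffix, skipped
        "piet12345@example.nl",       # digit run longer than four, skipped
    ):
        print(address, extract_birth_year(address))
```

A single anchored pattern covers all three rules at once in this sketch; the actual script may well check them separately.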

151 commits
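
Commit 070c87af7b describes using a recursive glob to find user JSON files while skipping macOS hidden files. A minimal sketch of that idea follows, assuming that "hidden" means any path component starting with a dot (for example `._user.json` resource forks or `.DS_Store`). The function name `find_user_json_files` is hypothetical and does not come from the repository.

```python
from pathlib import Path
from typing import List


def find_user_json_files(root: str) -> List[Path]:
    """Recursively collect *.json files under root, skipping dot-prefixed entries."""
    base = Path(root)
    found = []
    for path in base.rglob("*.json"):
        # Inspect only the components below root, so a dot in root itself is ignored.
        relative_parts = path.relative_to(base).parts
        if any(part.startswith(".") for part in relative_parts):
            continue  # hidden file, or a file inside a hidden directory
        if path.is_file():
            found.append(path)
    return sorted(found)
```

Filtering on every relative path component, rather than on the file name alone, also drops regular JSON files that sit inside a hidden directory; whether the real script does the same is not stated in the commit message.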