kempersc/glam

Author	SHA1	Message	Date
kempersc	54b26343c9	Add initial version of QUDT ontology file	2026-01-17 00:08:39 +01:00
kempersc	24cddb82dc	enrich ppid profiles	2026-01-16 12:50:50 +01:00
kempersc	7424b85352	Add new slots for heritage custodian entities - Introduced setpoint_max, setpoint_min, setpoint_tolerance, setpoint_type, setpoint_unit, setpoint_value, temperature_target, track_id, typical_http_methods, typical_metadata_standard, typical_response_formats, typical_scope, typical_technical_feature, unit_code, unit_symbol, unit_type, wikidata_entity, wikidata_equivalent, and wikidata_id slots. - Each slot includes a unique identifier, name, title, description, and annotations for custodian types and specificity score.	2026-01-16 01:04:38 +01:00
kempersc	f9f3cc8e74	fix: resolve YAML import indentation and add missing slot descriptions Schema Improvements: - Fix YAML import indentation across 800+ class files (sed: '^- ../' → ' - ../') - Add descriptions to 26 inline slots missing them (lint warnings) - Fix malformed imports in BirthPlace.yaml and CustodianObservation.yaml Validation Results: - linkml-lint: 4 warnings (intentional SCREAMING_CASE tier names) - gen-owl: SUCCESS (164,069 lines generated) - gen-json-schema: SUCCESS (9.4MB generated) Files affected: 1,034 files, +23,908 -15,200 lines	2026-01-16 00:09:28 +01:00
kempersc	043ea868b5	fix(schema): Resolve broken imports after slot migration - Fix empty import list elements (- # comment pattern) in Laptop, Expenses, FunctionType, Overview, WebLink, Photography classes - Replace valid_from/valid_to slots with temporal_extent in class slots lists - Update slot_usage to use temporal_extent with TimeSpan range - Update examples to use temporal_extent with begin_of_the_begin/end_of_the_end - Fix typo is_or_was_is_or_was_archived_at → is_or_was_archived_at in WebObservation - Add TimeSpan imports to classes using temporal_extent - Fix relative import paths for Timestamp in temporal slots - Fix CustodianIdentifier → Identifier imports in FundingAgenda, ReadingRoomAnnex Schema validates successfully with 902 classes and 2043 slots.	2026-01-15 12:25:27 +01:00
kempersc	b13674400f	Refactor schema slots and classes for improved organization and clarity - Removed deprecated slots: appraisal_notes, branch_id, is_or_was_real. - Introduced new slots: has_or_had_notes, has_or_had_provenance. - Created Notes class to encapsulate note-related metadata. - Archived removed slots and classes in accordance with the new archive folder convention. - Updated slot_fixes.yaml to reflect migration status and details. - Enhanced documentation for new slots and classes, ensuring compliance with ontology alignment. - Added new slots for note content, date, and type to support the Notes class.	2026-01-14 12:14:07 +01:00
kempersc	b30711fcfb	update slots	2026-01-14 09:05:54 +01:00
kempersc	d51bba5003	data: update entity resolution confidence scores Regenerated confidence scores with updated scoring algorithm: - Total candidates: 78,746 - Adjusted: 2,832 (was 3,869) - Boosted: 2,499 (was 3,192) - Penalized: 333 (was 677) - Likely wrong person: 533 - Reviews preserved: 57 Confidence scoring version: 2.0	2026-01-13 21:54:18 +01:00
kempersc	92b490d690	edit slots	2026-01-13 20:35:11 +01:00
kempersc	f74513e8ef	feat: Enhance entity resolution with email semantics and review merging - Updated `entity_review.py` to map email semantic fields from JSON. - Expanded `email_semantics.py` with additional museum mappings. - Introduced a new rule in `.opencode/rules/no-duplicate-ontology-mappings.md` to prevent duplicate ontology mappings. - Added a backup JSON file for entity resolution candidates. - Created `enrich_email_semantics.py` to enrich candidates with email semantic signals. - Developed `merge_entity_reviews.py` to merge reviewed decisions from a backup into new candidates.	2026-01-13 16:43:56 +01:00
kempersc	1fb924c412	feat: add ontology mappings to LinkML schema and enhance entity resolution Schema enhancements (443 files): - Add class_uri with proper ontology references (schema:, prov:, skos:, rico:) - Add close_mappings, related_mappings per Rule 50 convention - Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel) - Improve descriptions with ontology mapping rationale - Add prefixes blocks to all schema modules Entity Resolution improvements: - Add entity_resolution module with email semantics parsing - Enhance build_entity_resolution.py with email-based matching signals - Extend Entity Review API with filtering by signal types and count - Add candidates caching and indexing for performance - Add ReviewLoginPage component New rules and documentation: - Add Rule 51: No Hallucinated Ontology References - Add .opencode/rules/no-hallucinated-ontology-references.md - Add .opencode/rules/slot-ontology-mapping-reference.md - Add adms.ttl and dqv.ttl ontology files Frontend ontology support: - Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology	2026-01-13 13:51:02 +01:00
kempersc	846a6cdcec	Add new Record Set Types for various archival collections - Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType. - Each new type includes appropriate metadata, slots, and relationships to existing classes. - Implemented a script to detect and fix Type class violations in LinkML files.	2026-01-12 15:20:29 +01:00
kempersc	355d8be51d	centralise slots	2026-01-12 14:33:56 +01:00
kempersc	070c87af7b	refactor(migrate_wcms_resume): use recursive glob to find user JSON files and skip macOS hidden files	2026-01-11 23:32:27 +01:00
kempersc	56c373bba8	Implement fast WCMS migration script with state file checkpointing and batch processing	2026-01-11 22:26:37 +01:00
kempersc	fce186b649	enrich person profiles	2026-01-11 18:08:40 +01:00
kempersc	fd792fce2c	Refactor code structure for improved readability and maintainability	2026-01-11 15:27:14 +01:00
kempersc	55ef2a831d	feat(data): add Belgian surnames dataset with metadata and surname counts	2026-01-11 13:50:20 +01:00
kempersc	7d09e4179c	Add US surnames dataset from 2010 Census with metadata and surname counts	2026-01-11 12:28:58 +01:00
kempersc	dfb4744dc7	Evaluate data enrichments of persons	2026-01-11 12:15:27 +01:00
kempersc	49a8c341b5	chore(data): update geonames database journal file	2026-01-11 02:51:52 +01:00
kempersc	170fd73c49	feat(agents): update critical rules section to include entity resolution guidelines	2026-01-11 02:51:18 +01:00
kempersc	556cc6c294	Add workspace configuration for Git and Gitea integration - Set up GitHub integration to be disabled. - Configure Git settings including path and autofetch options. - Add Gitea instance URL and repository details. - Enable YAML support for LinkML schemas with validation. - Define file associations for YAML files. - Recommend essential extensions for development and exclude unwanted ones.	2026-01-11 02:50:39 +01:00
kempersc	b3e57e709c	Refactor code structure for improved readability and maintainability	2026-01-11 02:24:34 +01:00
kempersc	0df26a6e44	data(person): additional person profile enrichments	2026-01-11 00:41:59 +01:00
kempersc	3eb097d92e	data(person): enrich 64 person profiles with comprehensive metadata - Add inferred birth dates using EDTF notation - Add inferred birth/current settlements - Enrich employment history with temporal data - Add heritage sector relevance scores - Improve PPID component tracking - Update .gitignore with large file patterns (warc, nt, trix, geonames.db)	2026-01-11 00:38:09 +01:00
kempersc	ac36b80476	feat(rag): add companion queries for count templates Add companion_query support to fetch full entity records alongside aggregate count queries. Enables displaying results on map/list when asking 'how many museums in Amsterdam?' Backend changes: - Add companion_query, companion_query_region, companion_query_country fields to TemplateDefinition and TemplateMatchResult - Add render_template_string() for raw companion query rendering Template changes: - Add companion queries to count_institutions_by_type_and_location for settlement, region, and country level queries - Returns institution URI, name, coordinates, city for visualization	2026-01-10 18:44:06 +01:00
kempersc	f8b4ecad7d	data(person): enrich 7 person profiles with detailed employment history Update heritage professional profiles with: - Separate role entries for different positions at same institution - Employment date ranges (start_date, end_date) - Updated observed_on timestamps - Direct LinkedIn profile URLs as source Profiles updated: - Antoinet Nijssen (Noord-Hollands Archief) - Anna Lakmaker - Annelies Reus - Marianne Hamersma - Marcel Auwers - Hans Felius - Nico Vriend	2026-01-10 18:43:27 +01:00
kempersc	28c3aaf33f	enrich profiles	2026-01-10 17:31:02 +01:00
kempersc	bd257c52f4	data(person): update 2 additional profiles	2026-01-10 15:39:12 +01:00
kempersc	2f33e6a230	data(person): update DR-STAPEL profile	2026-01-10 15:38:37 +01:00
kempersc	ec18e1810d	data(person): enrich 7 profiles with detailed affiliations and GHCIDs - Add GHCID references to custodian affiliations - Add start dates for employment periods - Expand heritage type classifications (A→[A,F]) - Add detailed rationales based on career history - Add full_initials from archival publications	2026-01-10 15:36:49 +01:00
kempersc	e5a08a353d	enrich person profiles	2026-01-10 14:14:04 +01:00
kempersc	9339de2cfb	data(person): process 44,512 heritage-relevant profiles from entity extractions Processing Summary: - Scanned 94,716 LinkedIn entity files - Identified 44,512 heritage-relevant individuals (47%) - Created 1,430 new PPID-formatted profiles - Updated 43,070 existing profiles with entity data - Final count: 40,731 person profiles Profile updates include: - Merged web_claims with full provenance - Added/updated heritage_relevance scoring - Added affiliation data with custodian references - Added inferred birth decades with provenance chains (Rule 45) All data preserved per Rule 5 (additive only)	2026-01-10 14:01:29 +01:00
kempersc	6f3cf95492	data(person): fix data quality issues and PPID corrections Data Quality Corrections: - TIRANA-ADISUNA: Fix erroneous death_year claim (was education end date 2016, not death). Set is_living=true. Reassess heritage_relevance=false (tourism ministry is not a GLAM institution) - ALEX-ALSEMGEEST: Rename from NL-ZH-TH (The Hague) to NL-ZH-ROT (Rotterdam) based on verified birth location. Update birth year to 1980 Profile Enrichments (5 profiles with XX-XX-XXX placeholders): - Add web claims with proper provenance timestamps - Add LinkedIn-verified education and position claims - Document correction rationale in modification_reason Heritage Relevance Reassessments: - Government ministries (Tourism, etc.) marked as non-heritage - Only GLAM institutions (Galleries, Libraries, Archives, Museums) qualify	2026-01-10 13:31:39 +01:00
kempersc	49f4054802	data(person/entity): add 83,845 LinkedIn profile extractions from company pages Bulk extraction of heritage professional profiles from LinkedIn company pages using extract_persons_with_provenance.py script. Key characteristics: - Source: LinkedIn company 'People' pages for heritage institutions - File format: {linkedin-slug}_{timestamp}.json - Total size: ~3.6GB - Includes: profile_data, heritage_relevance, affiliations, web_claims - Provenance: Full XPath + archived HTML references (Rule 6 compliant) - Dual timestamps: statement_created_at + source_archived_at (Rule 35) Extraction metadata includes: - extraction_agent: extract_persons_with_provenance.py - source_file: Original archived HTML filename - source_archived_at: When LinkedIn page was captured - schema_version: 1.0.0 Note: URL-encoded filenames preserve international characters (Arabic, Hebrew, Chinese, Turkish, accented Latin, etc.)	2026-01-10 13:27:08 +01:00
kempersc	30cd8842d9	data(person): update profiles with web claims and PPID corrections - Rename SENNAY-GHEBREAB profile: NL-ZH-ROT → ET-XX-ADD (Ethiopian birth) - Enrich profiles with inferred birth decades and settlements - Add web claims provenance for enriched data - Update 16 profiles with improved location resolution Files: +1 new (renamed), 16 modified, 1 deleted	2026-01-10 12:56:28 +01:00
kempersc	5eaab2bd30	data(person): enrich heritage professional profiles with web claims Batch enrichment of 3,728 person profiles with additional data: - Birth decade inference from education/career history - Location resolution for inferred birth settlements - Web claims with full provenance (source_url, retrieved_on) - Organizational subdivision extraction - Heritage relevance scoring Also includes: - 14 profile renames for PPID format corrections - Updated _manifest.json with extraction statistics - New _extraction_log.txt and _extraction_summary.json Enrichment follows AGENTS.md rules: - Rule 44: EDTF unknown date notation (XXXX, 196X, etc.) - Rule 45: Inferred data with explicit provenance - Rule 30: Confidence scoring (0.50-0.95) - Rule 31: Organizational subdivision extraction 35,052 files changed, +4,507,411 insertions, -63,118 deletions	2026-01-10 10:35:20 +01:00
kempersc	519b0b47a8	Add Playwright test results JSON file with initial test suite and failure details	2026-01-09 21:33:31 +01:00
kempersc	004d342935	chore: minor updates and evaluation results - auth.setup.ts: require env vars for test credentials (no hardcoded defaults) - manifest.json: update schema manifest - full_evaluation_results.json: add RAG evaluation results - petra-links.json: update birth date from web claim	2026-01-09 21:10:55 +01:00
kempersc	855fff5962	data(person): resolve PPID locations and enrich profiles - Rename 512 person files from XX-XX-XXX placeholders to proper GeoNames locations - Update 2,463 profiles with enriched data - Add 512 new person profiles (AU, international heritage professionals) - PPID format: ID_{birth-loc}_{decade}_{work-loc}_{custodian}_{NAME}	2026-01-09 21:09:28 +01:00
kempersc	eb122e2532	data(custodian): remove 380 PENDING files after collision merge PENDING files were merged into existing custodian records in commit `eaf80ec`. These temporary collision placeholder files are no longer needed.	2026-01-09 21:06:22 +01:00
kempersc	9e67d0f967	enrich profiles	2026-01-09 20:35:19 +01:00
kempersc	eaf80ec756	data(custodian): merge PENDING collision files into existing custodians Merge staff data from 7 PENDING files into their matching custodian records: - NL-XX-XXX-PENDING-SPOT-GRONINGEN → NL-GR-GRO-M-SG (SPOT Groningen, 120 staff) - NL-XX-XXX-PENDING-DIENST-UITVOERING-ONDERWIJS → NL-GR-GRO-O-DUO - NL-XX-XXX-PENDING-ANNE-FRANK-STICHTING → NL-NH-AMS-M-AFS - NL-XX-XXX-PENDING-ALLARD-PIERSON → NL-NH-AMS-M-AP - NL-XX-XXX-PENDING-STICHTING-JOODS-HISTORISCH-MUSEUM → NL-NH-AMS-M-JHM - NL-XX-XXX-PENDING-MINISTERIE-VAN-BUITENLANDSE-ZAKEN → NL-ZH-DHA-O-MBZ - NL-XX-XXX-PENDING-MINISTERIE-VAN-JUSTITIE-EN-VEILIGHEID → NL-ZH-DHA-O-MJV Originals archived in data/custodian/archive/pending_collisions_20250109/ Add scripts/merge_collision_files.py for reproducible merging	2026-01-09 18:33:00 +01:00
kempersc	e9c9aefc37	data(person): regenerate PPIDs with unidecode support for non-Latin scripts - Add display_name and name_romanized fields to all 7948 person profiles - Resolve UNKNOWN-UNKNOWN collision group (Hebrew/Arabic names now properly romanize) - Hebrew names like אבישי דנינו now generate PPID AVISHI-DANINO instead of UNKNOWN-UNKNOWN - Collision count reduced from 82 to 81 groups Regenerated using generate_ppids.py with unidecode support (commit `abe30cb`)	2026-01-09 18:31:53 +01:00
kempersc	c45367c60f	data(custodian): resolve more PENDING files with proper GHCIDs Additional batch of PENDING file resolutions: - DK: Aalborg Teater - FR: Airborne Museum, ALCA Nouvelle-Aquitaine - NL: 12 institutions (CODA Apeldoorn, Airborne Museum Arnhem, etc.) - SA: Saudi Arabia Ministry of Culture Files renamed from NL-XX-XXX-PENDING-* to proper country/region codes.	2026-01-09 18:29:09 +01:00
kempersc	932ec5438c	add person profiles with PPID	2026-01-09 18:26:58 +01:00
kempersc	bd06e4f864	data(custodian): merge 135 PENDING files into existing enriched records Merge data from PENDING files (with XX-XXX placeholders) into their corresponding enriched custodian records with proper GHCIDs. Countries affected: - DE: 4 institutions (Deutsche Stiftung, Jewish Museum Berlin, etc.) - ES: 1 institution (Biblioteca Nacional de España) - FR: 1 institution (NMO) - ID: 18 Indonesian museums and archives - NL: 111 Dutch institutions across all provinces - US: 1 institution (ARCA) The PENDING files are deleted after merge; originals archived in data/custodian/archive/pending_merged_20250109/	2026-01-09 18:25:56 +01:00
kempersc	a51c8c400c	data(pending): add 125 international PENDING custodian files with proper country codes Identified 125 institutions from LinkedIn staff extraction that are NOT Dutch: - FR: 45 (French museums, archives, libraries) - ID: 14 (Indonesian institutions) - GB: 14 (British institutions) - DE: 13 (German museums, foundations) - BE: 11 (Belgian museums) - IT: 6 (Italian institutions) - AU: 6 (Australian archives, museums) - Plus smaller counts from IN, US, ES, CH, DK, AT, SA, NO, IL These files have staff data from LinkedIn company pages but need GHCID resolution (currently XX-XXX placeholders for region/city). Dutch PENDING files remain: 1,283	2026-01-09 15:55:31 +01:00
kempersc	14be18e7c4	feat(data): merge staff data from 30 more PENDING files into enriched custodians Batch 2 of PENDING file resolution: - Merged LinkedIn staff data from 30 PENDING files into matching enriched custodians - Archived processed PENDING files to data/custodian/archive/pending_merged_20250109/ - Notable merges: ASML (994 staff), BBB (117), Apenheul (100), BOEI (93) Files merged include: - Corporate: ASML, BOS Foundation, Constructing the Limes - Museums: Allard Pierson, Apenheul, various regional museums - Research: Catholic Documentation Centre, Creating Cultures of Care - Cultural orgs: Cultuur Ondernemen, CultuurOost, CultuurKwadraat This continues the effort to consolidate PENDING files (1283 remaining).	2026-01-09 15:42:32 +01:00

205 commits