kempersc/glam - Forgejo

Author	SHA1	Message	Date
kempersc	242bc8bb35	Add new slots for heritage custodian entities - Created deliverables_slot for expected or achieved deliverable outputs. - Introduced event_id_slot for persistent unique event identifiers. - Added follow_up_date_slot for scheduled follow-up action dates. - Implemented object_ref_slot for references to heritage objects. - Established price_slot for price information across entities. - Added price_currency_slot for currency codes in price information. - Created protocol_slot for API protocol specifications. - Introduced provenance_text_slot for full provenance entry text. - Added record_type_slot for classification of record types. - Implemented response_formats_slot for supported API response formats. - Established status_slot for current status of entities or activities. - Added FactualCountDisplay component for displaying count query results. - Introduced ReplyTypeIndicator component for visualizing reply types. - Created approval_date_slot for formal approval dates. - Added authentication_required_slot for API authentication status. - Implemented capacity_items_slot for maximum storage capacity. - Established conservation_lab_slot for conservation laboratory information. - Added cost_usd_slot for API operation costs in USD.	2026-01-05 00:49:05 +01:00
kempersc	2dca28d8c1	enrich CH entries with mission statements	2026-01-04 13:12:32 +01:00
kempersc	4f0cafe98a	enrich HC profiles	2026-01-02 02:11:04 +01:00
kempersc	349f31ae6f	enrich custodian profiles	2026-01-02 02:10:18 +01:00
kempersc	aee76fcc7f	backup html content	2025-12-31 02:36:38 +01:00
kempersc	b7701c8a8e	backup person profiles	2025-12-31 00:04:09 +01:00
kempersc	7108cb1483	backup person profiles	2025-12-31 00:00:25 +01:00
kempersc	38dcd2ce9c	Restore YAML files for Museum Dokkum and Gemeente Smallingerland with enriched data and provenance tracking	2025-12-30 23:58:21 +01:00
kempersc	1d8fd68e3a	backup custodian web profiles	2025-12-30 23:53:16 +01:00
kempersc	f6a5962c3b	backup person profiles	2025-12-30 23:48:50 +01:00
kempersc	cbf88d2a6d	backup person profiles	2025-12-30 23:44:57 +01:00
kempersc	30b701a5ec	backup HC data	2025-12-30 23:41:15 +01:00
kempersc	c417d0c758	Refactor code structure for improved readability and maintainability	2025-12-30 23:38:18 +01:00
kempersc	fb0daab718	backup JP profiles	2025-12-30 23:24:30 +01:00
kempersc	b42d6bf5d2	backup CZ and JP	2025-12-30 23:19:38 +01:00
kempersc	45e873ec0a	enrich JP BE AR profiles	2025-12-30 23:07:03 +01:00
kempersc	bc6ad46bfa	enrich CZ and JP profiles	2025-12-30 23:03:03 +01:00
kempersc	90b402dba6	enrich AR and Czech files	2025-12-30 23:01:01 +01:00
kempersc	cefc847056	Remove custodian entry for Leica AG from YAML file	2025-12-30 03:44:25 +01:00
kempersc	9159ff35db	Add custodian entry for Leica AG with data contamination fixes and location corrections	2025-12-30 03:43:47 +01:00
kempersc	d64f857aa9	add sparql validator and RAG injector	2025-12-30 03:43:31 +01:00
kempersc	84904e344b	Make AGENTS more succinct by referring to opencode rules & enrich custodians	2025-12-28 14:56:35 +01:00
kempersc	4cf3fe8a07	Logo enrichment batch: JP+170 (5,166/12,096 = 42.7%) - 14,503 total (45.6%)	2025-12-27 13:17:40 +01:00
kempersc	3447a9cc6c	Logo enrichment batch: JP+440 (4,996/12,096 = 41.3%) - 14,333 total (45.1%)	2025-12-27 12:20:53 +01:00
kempersc	cdb633b0c9	enrich custodian entries with logo	2025-12-27 02:15:17 +01:00
kempersc	fd91fec63f	Logo enrichment batch: JP+320, 13,603 total (42.8%) - JP: 4,516/12,096 (37.4%) ✅ NEW COMMIT - CZ: 3,820/8,432 (45.3%) - batches 7-16 running - CH, NL, BE, AT, BR: 100% complete - Total: 13,603/31,772 (42.8%) - Using crawl4ai favicon extraction	2025-12-26 23:25:40 +01:00
kempersc	2104a90f22	Logo enrichment COMPLETE: CZ 3,820 (45.3%) - CZ: 3,820/8,432 files processed (45.3%) - 9 parallel batches completed (500 files each) - NL person entities added (4 staff profiles) - scripts/discover_websites_crawl4ai.py modified - Using crawl4ai favicon extraction	2025-12-26 21:45:14 +01:00
kempersc	6af5009444	enrich entries	2025-12-26 21:41:18 +01:00
kempersc	59963c8d3f	Logo enrichment batch: JP+300, CZ-0 - 12,833 files (40.4%) - JP: 4,496 processed (37.2% of 12,096) ✅ COMPLETE - CZ: 2,820 processed (33.4% of 8,432) - batch completed, slight decrease - CH, NL, BE, AT, BR: 100% complete - Total: 12,833 of 31,772 files (40.4%) - Using crawl4ai favicon extraction	2025-12-26 13:42:21 +01:00
kempersc	6b9fa33767	Logo enrichment batch: CZ+500, JP+170 - 12,513 files (39.4%) - CZ: 2,820 processed (33.4% of 8,432) - JP: 4,176 processed (34.5% of 12,096) - Total: 12,513 of 31,772 (39.4%) - CZ batch completed: 500 files, 52 logos found - JP batch crashed during run (4,176 files before crash) - Using crawl4ai favicon extraction	2025-12-26 02:03:48 +01:00
kempersc	63400392ff	Fix CZ-52-PAB-L-IPVVZOVI logo: use primary_logo.png instead of favicon.ico - Primary logo (logo.png) identified via crawl4ai direct scraping - Favicon (favicon.ico) retained as secondary asset - Updated claims: primary_logo_url + favicon_url - Summary shows: has_primary_logo: true, total_claims: 2	2025-12-25 21:01:05 +01:00
kempersc	6ab0b19ae2	Logo enrichment batch: CZ+260, JP+260 - 11,663 files (36.7%) - CZ: 2,810 processed (33.3% of 8,432) - JP: 3,336 processed (27.6% of 12,096) - Total: 11,663 of 31,772 (36.7%) - Using crawl4ai favicon extraction	2025-12-25 19:23:41 +01:00
kempersc	717ee3408a	Logo enrichment batch: JP+771, CZ+380 - 10,913 files (34%) - JP: 2,846 processed (24% of 12,096) - CZ: 2,550 processed (30% of 8,432) - CH, NL, BE, AT, BR: 100% complete - Total: 10,913 of 31,772 files (34%) - Using crawl4ai favicon extraction	2025-12-25 13:44:26 +01:00
kempersc	c3387ef3f1	Logo enrichment batch: CZ +380, JP +125, AR +28 files - CZ: 2,170 processed (26% of 8,432) - JP: 2,075 processed (17% of 12,096) - AR: Started processing - Total checkpoint: 9,762 files across all countries - Using crawl4ai favicon extraction	2025-12-24 12:50:20 +01:00
kempersc	57de5e4b11	CZ logo enrichment: 1,790 files processed (21%) - Added logo_enrichment to 771 Czech custodian files - 87% logo hit rate using crawl4ai favicon extraction - Total checkpoint: 9,257 files across all countries - CZ remaining: 6,642 files	2025-12-24 02:41:26 +01:00
kempersc	ce1f80d024	enrich: logo enrichment progress (CZ: 220, JP: 1600)	2025-12-23 22:08:43 +01:00
kempersc	4f6ca92084	enrich: logo enrichment progress (JP: 1500, CZ: 40 started)	2025-12-23 21:37:10 +01:00
kempersc	8036eb5a3f	enrich: logo enrichment for JP custodians (1490 processed, 10606 remaining)	2025-12-23 21:17:45 +01:00
kempersc	38292d1918	enrich: logo enrichment for JP custodians (1350 processed, 10746 remaining)	2025-12-23 20:56:21 +01:00
kempersc	5e8a432ef0	enrich japanese and dutch custodians	2025-12-23 18:08:45 +01:00
kempersc	a1fb6344e7	enriching custodian data	2025-12-23 17:26:29 +01:00
kempersc	0c1d19e98b	enrich entries	2025-12-23 13:27:35 +01:00
kempersc	7a056fa746	enrich entries	2025-12-21 22:12:34 +01:00
kempersc	aca68ea47f	remove ambiguous web-claims	2025-12-21 00:01:54 +01:00
kempersc	23b1d8ee5f	clean up GHCID	2025-12-17 11:58:40 +01:00
kempersc	99430c2a70	add new entries and semantic routing	2025-12-17 10:11:56 +01:00
kempersc	e0dd847491	extend ontology	2025-12-16 20:27:39 +01:00
kempersc	b0416efc7d	enrich custodians and persons	2025-12-16 11:57:34 +01:00
kempersc	52ae711c56	add timespans	2025-12-16 09:02:52 +01:00
kempersc	b1340e30c8	add timespan	2025-12-15 22:35:35 +01:00
kempersc	cb56aa7e40	enrich all custodian timespan	2025-12-15 22:31:41 +01:00
kempersc	525662ea16	data: fix remaining person entity profiles	2025-12-15 01:48:33 +01:00
kempersc	3820f2fc92	chore: Add data reports, infra scripts, and API updates - Data quality reports for Dutch custodians - Name mismatch detection reports - Failed crawl URL tracking - Caddy configuration updates - Monitor script for chunk 404 errors - API endpoint improvements	2025-12-15 01:48:08 +01:00
kempersc	70c30a52d4	data: update person entity profiles with heritage classification	2025-12-15 01:47:42 +01:00
kempersc	181b1cf705	data: enrich Dutch heritage custodians (DR, FL, FR, GE, GR, LI provinces) - Add digital platform discovery data with provenance - Cleanup duplicate/incorrect custodian entries - Add GHCID collision resolution suffixes where needed - Update person entity profiles with career history	2025-12-15 01:34:38 +01:00
kempersc	1d26cade66	correct person labels	2025-12-14 17:58:55 +01:00
kempersc	c6aee998db	correct person labels	2025-12-14 17:29:39 +01:00
kempersc	c50c35fd3a	enrich person custodian	2025-12-14 17:09:55 +01:00
kempersc	505c12601a	Add test script for PiCo extraction from Arabic waqf documents - Implemented a new script `test_pico_arabic_waqf.py` to test the GLM annotator's ability to extract person observations from Arabic historical documents. - The script includes environment variable handling for API token, structured prompts for the GLM API, and validation of extraction results. - Added comprehensive logging for API responses, extraction results, and validation errors. - Included a sample Arabic waqf text for testing purposes, following the PiCo ontology pattern.	2025-12-12 17:50:17 +01:00
kempersc	b1f93b6f22	enrich person profiles	2025-12-12 12:51:10 +01:00
kempersc	03263f67d6	moved web archives	2025-12-12 00:40:26 +01:00
kempersc	1b1cfbfca0	enrich custodians	2025-12-11 22:32:09 +01:00
kempersc	d4906abae4	update postgis data	2025-12-10 23:51:51 +01:00
kempersc	be3fbac601	enrich entries and persons	2025-12-10 18:04:25 +01:00
kempersc	41959f0766	correct HCID!	2025-12-10 13:01:13 +01:00
kempersc	c4b0f17a43	geocode: complete 100% coverage - add coordinates to final 26 files (CZ, BE, AR, LB, ML)	2025-12-10 01:07:34 +01:00
kempersc	82e58f6d40	geocode: add coordinates to 29 custodian files via Wikidata P131/P159 lookups	2025-12-10 01:04:29 +01:00
kempersc	6e2c36413e	geocode: add coordinates to 540 Japanese custodian files using postal codes - Download GeoNames JP postal code database (142K entries) - Create geocode_japan_postal.py with postal code lookup - Handle unicode hyphen variants in postal codes - Add manual mappings for remote Tokyo islands (Hachijojima, Miyakejima) - Implement prefix fallback for company postal codes - Total JP files geocoded: 540 (99.81% coverage) This brings overall geocoding coverage from 97.84% to 99.81%	2025-12-10 00:27:33 +01:00
kempersc	251b5eee68	geocode: add coordinates to 26 more custodian files - Improved city name cleaning: - Roman numeral district suffixes (Kolín V. -> Kolín) - City + country suffixes (Genève 4 - Suisse -> Genève) - Czech postal notation (p. Luka nad Jihlavou -> Luka nad Jihlavou) - Historical city names (Gottwaldov -> Zlín, renamed 1990) - Manual mappings for Swiss districts (Lugano Massagno -> Lugano)	2025-12-09 22:47:32 +01:00
kempersc	35e1686160	geocode: add coordinates to 69 custodian files across multiple countries Countries updated: AR, AT, BG, BR, CA, CL, CN, CU, FI, GE, IR, JO, KG, KR, LB, LI, LV, MX, MY, NI, NL, PS, PY, SX, TM, VN - Manual city name mappings for transliteration variants - St. Pölten -> Sankt Pölten (AT) - Gaza City -> Gaza (PS) - Beit Hanoun -> Bayt Hanun (PS) - Veliko Tarnovo via geonames_id (BG)	2025-12-09 22:44:12 +01:00
kempersc	ef9607d991	geocode: add coordinates to 80 Czech custodian files - Handle Czech address patterns: - House numbers with čp./č.p. prefix - X nad/pod Y town names (rivers/landmarks) - Hyphenated district names (Město-Část) - Trailing numbers and suffixes	2025-12-09 22:41:09 +01:00
kempersc	dee7a4c7d9	geocode: add coordinates to 147 Swiss custodian files - Improved city name normalization to handle: - St. Gallen / St.Gallen -> Sankt Gallen - Canton suffixes (Buchs SG, Brugg AG) - Hyphenated districts (Bernex - Genève) - Postal codes with slashes (Ecublens/VD) - German prepositions (Hausen b. Brugg) - Created scripts/geocode_from_city_name.py for unified geocoding	2025-12-09 22:38:33 +01:00
kempersc	cc61d99acf	geocode: add coordinates to BG and EG custodian files - BG: Add lat/lon from existing GeoNames IDs (28 files) - EG: Map city codes to GeoNames (CAI→Cairo, ALX→Alexandria, etc.) (28 files) - Fix malformed EG-IS-\`A\`-O-SCA.yaml → EG-IS-ISM-O-SCA.yaml - Overall coverage: 96.4% → 96.6%	2025-12-09 21:59:58 +01:00
kempersc	2137c522db	geocode: add coordinates to JP compound cities and CZ files from GeoNames - JP: Handle Gun/Cho/Machi/Mura compound city names (2615 files) - CZ: Map city codes to GeoNames entries (667 files) - Overall coverage: 84.5% → 96.4%	2025-12-09 21:49:40 +01:00
kempersc	92b5e58ef3	geocode: add coordinates to AT, BE, DE, GB, PL, UA, US custodian files from GeoNames	2025-12-09 20:38:34 +01:00
kempersc	3a6ead8fde	feat: Add legal form filtering rule for CustodianName - Introduced LEGAL-FORM-FILTER rule to standardize CustodianName by removing legal form designations. - Documented rationale, examples, and implementation guidelines for the filtering process. docs: Create README for value standardization rules - Established a comprehensive README outlining various value standardization rules applicable to Heritage Custodian classes. - Categorized rules into Name Standardization, Geographic Standardization, Web Observation, and Schema Evolution. feat: Implement transliteration standards for non-Latin scripts - Added TRANSLIT-ISO rule to ensure GHCID abbreviations are generated from emic names using ISO standards for transliteration. - Included detailed guidelines for various scripts and languages, along with implementation examples. feat: Define XPath provenance rules for web observations - Created XPATH-PROVENANCE rule mandating XPath pointers for claims extracted from web sources. - Established a workflow for archiving websites and verifying claims against archived HTML. chore: Update records lifecycle diagram - Generated a new Mermaid diagram illustrating the records lifecycle for heritage custodians. - Included phases for active records, inactive archives, and processed heritage collections with key relationships and classifications.	2025-12-09 16:58:41 +01:00
kempersc	7b42d720d5	geocode: add coordinates to CZ, BY, CH, FR, ES custodian files from GeoNames (1145 files)	2025-12-09 16:41:41 +01:00
kempersc	b54904ad0a	fix: normalize YAML null formatting in Eye Filmmuseum file	2025-12-09 16:34:12 +01:00
kempersc	2c25ed6a96	geocode: add coordinates to JP custodian files from GeoNames (batch 2 - remaining 3639 files)	2025-12-09 16:33:29 +01:00
kempersc	9bc454cdbf	geocode: add coordinates to JP custodian files from GeoNames (batch 1 - 3000 files)	2025-12-09 16:32:01 +01:00
kempersc	982620ba0c	normalize: add canonical location blocks (batch 7 - final) - Final 42 files updated - Normalization complete: all 27,511 custodian files have location block - 15,419 files have coordinates with coordinate_provenance - 12,092 files have address-only location blocks	2025-12-09 14:57:33 +01:00
kempersc	e28576ee65	normalize: add canonical location blocks (batch 6) - 2,546 files updated with location blocks - All 27,511 custodian files now have location: block - 15,421 files have coordinates with coordinate_provenance - 12,090 files have address-only location blocks	2025-12-09 14:44:03 +01:00
kempersc	d20978dcbe	normalize: add canonical location blocks (batch 5)	2025-12-09 14:39:02 +01:00
kempersc	3f60aa6238	normalize: add canonical location blocks (batch 4 - final)	2025-12-09 14:18:15 +01:00
kempersc	5b3d4d1ed5	normalize: add canonical location blocks (batch 3)	2025-12-09 14:14:13 +01:00
kempersc	b739ad4e61	normalize: add canonical location blocks (batch 2)	2025-12-09 13:28:59 +01:00
kempersc	bb41287730	normalize: add canonical location blocks (batch 1)	2025-12-09 13:17:11 +01:00
kempersc	a7321b1bb9	reconstruct location blocks	2025-12-09 12:25:16 +01:00
kempersc	85a951bbea	normalize: add canonical location blocks to 586 files - Fixed 469 JP files missing location: blocks (had data in original_entry.locations) - Fixed 117 additional JP files found in second pass - 1 EG file skipped (no location source data available) - Total files with location: blocks now 27,459 out of 27,511 (99.8%) - Also includes YAML formatting standardization (line wrapping) Recovery from data loss in commit `62fdd35321` is now complete.	2025-12-09 12:17:34 +01:00
kempersc	cab712659d	recover location blocks	2025-12-09 11:34:56 +01:00
kempersc	62fdd35321	Refactor code structure for improved readability and maintainability	2025-12-09 11:15:51 +01:00
kempersc	b61271220b	enrich entries	2025-12-09 10:46:43 +01:00
kempersc	bf7c773955	edit Japanese entries	2025-12-09 09:16:19 +01:00
kempersc	c283daa1a2	normalise dutch entries	2025-12-09 08:02:27 +01:00
kempersc	609866886a	enrich entries	2025-12-09 07:58:14 +01:00
kempersc	131e3ca259	normalise custodian entries	2025-12-09 07:56:35 +01:00
kempersc	40bd3cb8f5	data(custodian): add emic_name fields and remove duplicate files with name suffixes - Add emic_name, name_language, and standardized_name to 1,781 custodian files - Remove 2,239 duplicate files that had name suffixes in filename - Consolidate data into base GHCID files per PID stability rules - Part of UNESCO Memory of the World custodian enrichment	2025-12-08 14:57:34 +01:00
kempersc	7e3559f7e5	add new entries	2025-12-07 23:08:02 +01:00
kempersc	0c4c378e06	fix(data): clean up YAML structure in BE/EG custodian files (450 files) Remove redundant ch_annotator metadata and duplicate ghcid_history entries that were causing YAML parsing issues. Files now have cleaner, more consistent structure while preserving all essential data.	2025-12-07 18:46:42 +01:00
kempersc	d9325c0bb5	feat: add web archives integration and improve enrichment scripts Backend: - Attach web_archives.duckdb as read-only database in DuckLake - Create views for web_archives, web_pages, web_claims in heritage schema Scripts: - enrich_cities_google.py: Add batch processing and retry logic - migrate_web_archives.py: Improve schema handling and error recovery Frontend: - DuckLakePanel: Add web archives query support - Database.css: Improve layout for query results display	2025-12-07 17:49:07 +01:00

191 commits
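The logo-enrichment commits (e.g. 4cf3fe8a07, fd91fec63f, 57de5e4b11) report progress as processed/total with a percentage rounded to one decimal place. A minimal Python sketch of that arithmetic, assuming the totals quoted in those messages (JP 12,096, CZ 8,432, all countries 31,772); the function names are hypothetical and not taken from the repository:

```python
# Per-country totals as quoted in the logo-enrichment commit messages.
TOTALS = {"JP": 12_096, "CZ": 8_432, "ALL": 31_772}


def progress(processed: int, total: int) -> str:
    """Format 'processed/total (pct%)' with pct rounded to one decimal place."""
    pct = round(100 * processed / total, 1)
    return f"{processed:,}/{total:,} ({pct}%)"


def remaining(processed: int, total: int) -> int:
    """Files still to process, as in '1490 processed, 10606 remaining'."""
    return total - processed


if __name__ == "__main__":
    print(progress(5_166, TOTALS["JP"]))    # 5,166/12,096 (42.7%)
    print(progress(14_503, TOTALS["ALL"]))  # 14,503/31,772 (45.6%)
    print(remaining(1_490, TOTALS["JP"]))   # 10606
```

Checked this way, the figures in 4cf3fe8a07 and 57de5e4b11 are consistent (8,432 - 1,790 = 6,642 CZ files remaining), while the JP figure in fd91fec63f, 4,516/12,096, rounds to 37.3% rather than the quoted 37.4%.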
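Commits 251b5eee68, dee7a4c7d9 and 35e1686160 list the city-name cleaning applied before GeoNames lookup. The sketch below reproduces only the examples quoted there, using a few regular expressions and a manual mapping table. It is not the code of scripts/geocode_from_city_name.py; the rule order, the canton list and the blanket "St." to "Sankt" expansion (right for German-speaking places such as St. Gallen or St. Pölten, wrong for French "Saint-") are assumptions. Cases such as "Bernex - Genève" and "Hausen b. Brugg", whose target form the log does not state, are left out.

```python
import re

# Manual mappings quoted in the commit messages (historical names,
# transliteration variants, districts folded into their city).
MANUAL = {
    "Gottwaldov": "Zlín",  # renamed 1990
    "Lugano Massagno": "Lugano",
    "Gaza City": "Gaza",
    "Beit Hanoun": "Bayt Hanun",
}

# Two-letter Swiss canton codes (assumed list for suffix stripping).
CANTONS = {
    "AG", "AI", "AR", "BE", "BL", "BS", "FR", "GE", "GL", "GR", "JU", "LU", "NE",
    "NW", "OW", "SG", "SH", "SO", "SZ", "TG", "TI", "UR", "VD", "VS", "ZG", "ZH",
}


def normalize_city(raw: str) -> str:
    name = raw.strip()
    # "Genève 4 - Suisse" -> "Genève 4": drop a trailing country suffix.
    name = re.sub(r"\s+-\s+(?:Suisse|Schweiz|Svizzera)$", "", name)
    # "p. Luka nad Jihlavou" -> "Luka nad Jihlavou": Czech postal notation.
    name = re.sub(r"^p\.\s*", "", name)
    # "Buchs SG" / "Ecublens/VD" -> drop a trailing canton code.
    m = re.match(r"^(.*?)(?:\s+|/)([A-Z]{2})$", name)
    if m and m.group(2) in CANTONS:
        name = m.group(1)
    # "Kolín V." -> "Kolín": Roman numeral district suffix.
    name = re.sub(r"\s+[IVXLC]+\.?$", "", name)
    # "Genève 4" -> "Genève": trailing district number.
    name = re.sub(r"\s+\d+$", "", name)
    # "St. Gallen" / "St.Gallen" -> "Sankt Gallen" (German-speaking places only).
    name = re.sub(r"^St\.\s*", "Sankt ", name)
    return MANUAL.get(name, name)
```

Keeping the one-off renames in a lookup table, rather than in more regular expressions, mirrors the split the commits describe between pattern-based cleaning and manual mappings.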
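Commit 6e2c36413e geocodes Japanese custodians from the GeoNames JP postal-code file, handling Unicode hyphen variants and falling back to a prefix when a company postal code is not listed. A hedged Python sketch of those two steps, assuming codes are keyed as 7 bare digits and that the fallback averages all known codes sharing the longest matching prefix. Both choices, the helper names and the toy coordinates are illustrative, not what geocode_japan_postal.py does:

```python
import unicodedata
from statistics import mean
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]


def clean_postal(raw: str) -> str:
    """Reduce a Japanese postal code to its bare ASCII digits.

    NFKC folds full-width digits and the full-width hyphen-minus to ASCII;
    every other non-digit (U+2010 hyphen, en dash, minus sign, the katakana
    prolonged sound mark, spaces, the 〒 postal mark) is then dropped.
    """
    text = unicodedata.normalize("NFKC", raw)
    return "".join(ch for ch in text if "0" <= ch <= "9")


def lookup(raw: str, table: Dict[str, Coord]) -> Optional[Coord]:
    """Exact match first, then the centroid of the longest matching prefix."""
    code = clean_postal(raw)
    if code in table:
        return table[code]
    # Hypothetical fallback for codes missing from the table (e.g. company
    # codes): shorten the prefix until some known codes share it.
    for n in range(len(code) - 1, 2, -1):
        prefix = code[:n]
        hits = [xy for key, xy in table.items() if key.startswith(prefix)]
        if hits:
            return (mean(lat for lat, _ in hits), mean(lon for _, lon in hits))
    return None


if __name__ == "__main__":
    toy = {"1000001": (35.0, 139.0), "1000002": (35.5, 139.5)}  # toy values
    print(lookup("１００－０００１", toy))  # (35.0, 139.0)
    print(lookup("100-0099", toy))    # (35.25, 139.25) via prefix "10000"
```

A prefix match only yields an area-level position, so coordinates obtained this way are worth flagging as approximate.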
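Commit 3a6ead8fde introduces the LEGAL-FORM-FILTER rule, which standardizes CustodianName by removing legal-form designations (compare the custodian entry for Leica AG in 9159ff35db). A minimal sketch, assuming a small hypothetical list of designations that are stripped only at the start ("Stichting") or end ("AG", "GmbH", "B.V.", ...) of a name; the rule's own documentation defines the actual list and edge cases:

```python
import re

# Hypothetical subset of legal-form designations, for illustration only.
PREFIXES = ["Stichting"]
SUFFIXES = ["GmbH", "AG", "B.V.", "N.V.", "e.V.", "Ltd.", "Inc."]

_PREFIX_RE = re.compile(r"^(?:%s)\s+" % "|".join(map(re.escape, PREFIXES)))
_SUFFIX_RE = re.compile(r"[\s,]+(?:%s)$" % "|".join(map(re.escape, SUFFIXES)))


def filter_legal_form(name: str) -> str:
    """Remove one leading and one trailing legal-form designation."""
    original = name.strip()
    stripped = _SUFFIX_RE.sub("", _PREFIX_RE.sub("", original))
    # Keep the original if stripping would leave nothing behind.
    return stripped or original
```

Anchoring both patterns to the ends of the name leaves a designation that appears inside a name untouched, and a name that consists only of a designation is returned unchanged.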