kempersc/glam

Author	SHA1	Message	Date
kempersc	e45c1a3c85	feat(scripts): add city enrichment and location resolution utilities Enrichment scripts for country-specific city data: - enrich_austrian_cities.py, enrich_belgian_cities.py, enrich_belgian_v2.py - enrich_bulgarian_cities.py, enrich_czech_cities.py, enrich_czech_cities_fast.py - enrich_japanese_cities.py, enrich_swiss_isil_cities.py, enrich_cities_google.py Location resolution utilities: - resolve_cities_from_file_coords.py - Resolve cities using coordinates in filenames - resolve_cities_wikidata.py - Use Wikidata P131 for city resolution - resolve_country_codes.py - Standardize country codes - resolve_cz_xx_regions.py - Fix Czech XX region codes - resolve_locations_by_name.py - Name-based location lookup - resolve_regions_from_city.py - Derive regions from city data - update_ghcid_with_geonames.py - Update GHCIDs with GeoNames data CH-Annotator integration: - create_custodian_from_ch_annotator.py - Create custodians from annotations - add_ch_annotator_location_claims.py - Add location claims - extract_locations_ch_annotator.py - Extract locations from annotations Migration and fixes: - migrate_egyptian_from_ch.py - Migrate Egyptian data - migrate_web_archives.py - Migrate web archive data - fix_belgian_cities.py - Fix Belgian city data	2025-12-07 14:26:59 +01:00
kempersc	4825f57951	feat(frontend): improve werkgebied display and database UI - Fix polygon rendering with static paint properties instead of data-driven - Add ensureSourceAndLayers() helper for reliable layer management - Use setPaintProperty() for historical vs modern styling distinction - Improve Database page layout with back buttons and cleaner navigation - Add ResizableNestedTable component for DuckLake data display - Optimize spacing and layout in Database.css - Update schema manifest	2025-12-07 14:26:37 +01:00
kempersc	f284e87d13	feat: add 24,963 heritage custodian records from global extraction Major batch addition of heritage institution data: - Japan: 12,077 institutions (libraries, museums, archives) - Czechia: 6,760 institutions - Switzerland: 2,390 institutions - Belgium: 448 institutions - Belarus: 257 institutions - Austria: 249 institutions (with corrected GHCIDs) - Argentina: 235 institutions (bibliotecas populares) - Brazil: 155 institutions - Mexico: 110 institutions - Bulgaria: 98 institutions - Chile: 83 institutions - Egypt: 50 institutions - And additional records from VN, NL, GE, KR, GB, FR, US, IN, etc. All records include: - Standardized GHCID identifiers (alphabetic-only abbreviations) - GeoNames-resolved location data - ISO 3166-2 region codes - Provenance metadata with extraction timestamps	2025-12-07 14:24:48 +01:00
kempersc	63a6bccd9b	fix: remove custodian files with invalid GHCID special characters Remove 229 custodian YAML files containing invalid characters in GHCIDs: - Ampersand (&) in abbreviations (e.g., BM&HS, UNL&AG, DR&IMSM) - Parentheses in abbreviations (e.g., WHO(RA, VK(, SL() - Unicode characters in filenames (Ö, Ä, Å, É, İ, Ż, etc.) These files are replaced with corrected versions using alphabetic-only abbreviations per AGENTS.md Rule 8 (Special Characters MUST Be Excluded). Related scripts updated for location resolution.	2025-12-07 14:23:50 +01:00
kempersc	ee4e57bc75	add new entries	2025-12-07 00:26:01 +01:00
kempersc	1635625032	added web annotations	2025-12-06 19:50:04 +01:00
kempersc	55e2cd2340	feat: implement LLM-based extraction for Archives Lab content - Introduced `llm_extract_archiveslab.py` script for entity and relationship extraction using LLMAnnotator with GLAM-NER v1.7.0. - Replaced regex-based extraction with generative LLM inference. - Added functions for loading markdown content, converting annotation sessions to dictionaries, and generating extraction statistics. - Implemented comprehensive logging of extraction results, including counts of entities, relationships, and specific types like heritage institutions and persons. - Results and statistics are saved in JSON format for further analysis.	2025-12-05 23:16:21 +01:00
kempersc	4da64eeebf	improve annotator	2025-12-05 16:25:39 +01:00
kempersc	e38fb4613b	improve annotation prompt	2025-12-05 15:51:39 +01:00
kempersc	3a242370fc	annotation standards added	2025-12-05 15:30:23 +01:00
kempersc	d661947830	update enriched entries	2025-12-03 17:38:46 +01:00
kempersc	ef89b1213a	validate enrichments	2025-12-02 14:36:01 +01:00
kempersc	8ebca2f845	add pid	2025-12-02 00:00:45 +01:00
kempersc	4b833d20b2	add pids	2025-12-01 23:55:55 +01:00
kempersc	7dce283c17	Add new enums for PersonalCollectionType, ResearchCenterType, and TasteScentHeritage classifications; implement validation script for custodian names against authoritative sources	2025-12-01 18:39:22 +01:00
kempersc	48a2b26f59	feat: Add script to generate Mermaid ER diagrams with instance data from LinkML schemas - Implemented `generate_mermaid_with_instances.py` to create ER diagrams that include all classes, relationships, enum values, and instance data. - Loaded instance data from YAML files and enriched enum definitions with meaningful annotations. - Configured output paths for generated diagrams in both frontend and schema directories. - Added support for excluding technical classes and limiting the number of displayed enum and instance values for readability.	2025-12-01 16:58:03 +01:00
kempersc	097d116b72	enrich entries	2025-12-01 16:06:34 +01:00
kempersc	2497e5913f	enrich entries	2025-12-01 00:37:24 +01:00
kempersc	f3c149b1bb	update entries	2025-11-30 23:30:29 +01:00
kempersc	ff92698c7a	Implement feature X to enhance user experience and fix bug Y in module Z	2025-11-30 23:25:05 +01:00
kempersc	d623f0af4a	store archived websites	2025-11-29 20:40:46 +01:00
kempersc	572ccd5daf	archive websites	2025-11-29 18:18:04 +01:00
kempersc	0ab8f24a6b	archive websites	2025-11-29 18:05:16 +01:00
kempersc	da1eae6597	Refactor code structure for improved readability and maintainability	2025-11-29 12:27:39 +01:00
kempersc	30162e6526	Add script to validate KB library entries and generate enrichment report - Implemented a Python script to validate KB library YAML files for required fields and data quality. - Analyzed enrichment coverage from Wikidata and Google Maps, generating statistics. - Created a comprehensive markdown report summarizing validation results and enrichment quality. - Included error handling for file loading and validation processes. - Generated JSON statistics for further analysis.	2025-11-28 14:48:33 +01:00
kempersc	5cdce584b2	Add complete schema for heritage custodian observation reconstruction - Introduced a comprehensive class diagram for the heritage custodian observation reconstruction schema. - Defined multiple classes including AllocationAgency, ArchiveOrganizationType, AuxiliaryDigitalPlatform, and others, with relevant attributes and relationships. - Established inheritance and associations among classes to represent complex relationships within the schema. - Generated on 2025-11-28, version 0.9.0, excluding the Container class.	2025-11-28 13:13:23 +01:00
kempersc	0d1741c55e	Refactor code structure for improved readability and maintainability	2025-11-28 11:44:21 +01:00
kempersc	37886f0433	Refactor code structure for improved readability and maintainability	2025-11-27 17:43:14 +01:00
kempersc	5ef8ccac51	Add script to enrich NDE Register NL entries with Wikidata data - Implemented a Python script that fetches and enriches entries from the NDE Register using data from Wikidata. - Utilized the Wikibase REST API and SPARQL endpoints for data retrieval. - Added logging for tracking progress and errors during the enrichment process. - Configured rate limiting based on authentication status for API requests. - Created a structured output in YAML format, including detailed enrichment data. - Generated a log file summarizing the enrichment process and results.	2025-11-27 13:30:00 +01:00
kempersc	cd0ff5b9c7	wrap up example list	2025-11-27 10:58:53 +01:00
kempersc	a6cbce1749	feat: Implement intersection calculation for UML diagram node links	2025-11-27 10:58:45 +01:00
kempersc	e99b1e644e	feat: Add platform_description slot for detailed auxiliary platform information	2025-11-26 10:18:16 +01:00
kempersc	e2eb7aa5cf	feat: Add auxiliary slots and enums for places and digital platforms	2025-11-26 10:09:06 +01:00
kempersc	eff2f47f6f	Add auxiliary enums and slots for digital platforms and physical locations - Created AuxiliaryDigitalPlatformTypeEnum.yaml to classify types of secondary digital platforms. - Created AuxiliaryPlaceTypeEnum.yaml to classify types of secondary physical locations. - Added OrganizationBranchTypeEnum.yaml for formal organizational branches at auxiliary locations. - Introduced auxiliary_places.yaml slot to link CustodianPlace to subordinate physical locations. - Introduced auxiliary_platforms.yaml slot to link DigitalPlatform to subordinate digital properties. - Added located_at.yaml slot to connect OrganizationalStructure to physical locations.	2025-11-25 15:06:43 +01:00
kempersc	a5a66eb547	add classes	2025-11-25 12:48:07 +01:00
kempersc	3ff0e33bf9	Add UML diagrams and scripts for custodian schema - Created PlantUML diagrams for custodian types, full schema, legal status, and organizational structure. - Implemented a script to generate GraphViz DOT diagrams from OWL/RDF ontology files. - Developed a script to generate UML diagrams from modular LinkML schema, supporting both Mermaid and PlantUML formats. - Enhanced class definitions and relationships in UML diagrams to reflect the latest schema updates.	2025-11-23 23:05:33 +01:00
kempersc	67657c39b6	feat: Complete Country Class Implementation and Hypernyms Removal - Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata. - Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms. - Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types. - Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings. - Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm. - Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.	2025-11-23 13:09:38 +01:00
kempersc	6eb18700f0	Add SHACL validation shapes and validation script for Heritage Custodian Ontology - Created SHACL shapes for validating temporal consistency and bidirectional relationships in custodial collections and staff observations. - Implemented a Python script to validate RDF data against the defined SHACL shapes using the pyshacl library. - Added command-line interface for validation with options for specifying data formats and output reports. - Included detailed error handling and reporting for validation results.	2025-11-22 23:22:10 +01:00
kempersc	2761857b0d	Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams - Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams. - Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams. - Added two new PlantUML files for custodian multi-aspect diagrams.	2025-11-22 23:01:13 +01:00
kempersc	8907aa6213	feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model - Implemented three independent aspects for custodians: CustodianLegalStatus, CustodianName, and CustodianPlace. - Renamed CustodianReconstruction to CustodianLegalStatus and updated all references. - Created new components for CustodianPlace and PlaceSpecificityEnum. - Removed direct links from CustodianObservation to Custodian, aligning with PROV-O standards. - Generated comprehensive example instance demonstrating the new architecture. - Updated documentation to reflect changes and provide guidance on multi-aspect modeling. - Added React hook for managing IndexedDB operations, including storing and loading transformation results. - Created complete YAML example for Rijksmuseum, illustrating the integration of all three aspects.	2025-11-22 15:40:17 +01:00
kempersc	94d1054f4a	Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.	2025-11-22 15:35:35 +01:00
kempersc	fa5680f0dd	Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats - Introduced custodian_hub_v3.mmd, custodian_hub_v4_final.mmd, and custodian_hub_v5_FINAL.mmd for Mermaid representation. - Created custodian_hub_FINAL.puml and custodian_hub_v3.puml for PlantUML representation. - Defined entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument. - Established relationships and associations between entities, including temporal extents, observations, and reconstruction activities. - Incorporated enumerations for various types, statuses, and classifications relevant to custodians and their activities.	2025-11-22 14:33:51 +01:00
kempersc	284b575e88	Add UML diagrams for Custodian Hub v2 in Mermaid and PlantUML formats - Introduced a new Mermaid diagram for Custodian Hub v2, detailing entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument. - Established relationships between entities, including temporal extents, derivations, and revisions. - Added a comprehensive PlantUML diagram reflecting the same structure and relationships, including enumerations for various types and statuses relevant to custodians and observations. - Enhanced documentation to clarify the hub architecture pattern and its implications for data integrity and source authority.	2025-11-21 22:30:07 +01:00
kempersc	edb1e07941	updated schemata	2025-11-21 22:12:33 +01:00
kempersc	176a7479f9	Add comprehensive ontology mapping rules and update project mission - Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus - Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines: * Ontology consultation workflows (Rule 1) * Wikidata entity mapping procedures (Rule 2) * Multi-aspect modeling requirements (Rule 3) * Temporal independence documentation (Rule 4) * Property research workflows (Rule 5) * Decision trees for ontology selection (Rule 6-7) * Quality assurance checklists (Rule 8-9) * Agent collaboration protocols (Rule 10) - Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide Key principles established: 1. Wikidata Q-numbers are NOT ontology classes (must be mapped) 2. Every heritage entity has multiple aspects with independent temporal lifecycles 3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth 4. Custom properties forbidden when ontology equivalents exist Example: 'Mansion' (Q1802963) requires modeling as: - Place aspect (crm:E27_Site, construction→present) - Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present) - Legal form aspect (org:FormalOrganization, registration→present) - Collections aspect (crm:E78_Curated_Holding, accession→present) - People aspect (picom:PersonObservation, employment periods) - Temporal events (crm:E10_Transfer_of_Custody for custody changes) All agents MUST read ontology files before schema design.	2025-11-20 23:09:02 +01:00
kempersc	e6684e815b	feat: Enhance hyponyms with additional labels and types for better classification	2025-11-20 07:52:23 +01:00
kempersc	38354539a6	feat: Add comprehensive harvester for Thüringen archives - Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de. - Extracted data includes addresses, emails, phones, directors, collection sizes, opening hours, histories, and more. - Introduced structured data parsing and error handling for robust data extraction. - Added rate limiting to respect server load and improve scraping efficiency. - Results are saved in a JSON format with detailed metadata about the extraction process.	2025-11-20 00:25:45 +01:00
kempersc	3c80de87e0	add isil entries	2025-11-19 23:25:22 +01:00
kempersc	e5a532a8bc	Add comprehensive tests for NLP institution extraction and RDF partnership integration - Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive). - Added tests for extracted entities and result handling to validate the extraction process. - Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format. - Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns. - Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.	2025-11-19 23:20:47 +01:00
kempersc	5e9f54bd91	Deduplicate Brazilian institutions (212→121) - Merged 91 duplicate Brazilian institution records - Improved Wikidata coverage from 26.4% to 38.8% (+12.4pp) - Created intelligent merge strategy: - Prefer records with higher confidence scores - Merge locations (prefer most complete) - Combine all unique identifiers - Combine all unique digital platforms - Combine all unique collections - Add provenance notes documenting merges - Create backup before deduplication - Generate comprehensive deduplication report Dataset changes: - Total institutions: 13,502 → 13,411 - Brazilian institutions: 212 → 121 - Coverage: 47/121 institutions with Q-numbers (38.8%)	2025-11-11 22:08:34 +01:00
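
The merge strategy listed in commit 5e9f54bd91 (prefer the higher-confidence record, keep the most complete location, combine unique identifiers, platforms and collections, add a provenance note) could be sketched roughly as below. This is an illustrative sketch only, not the repository's actual script: the field names (`id`, `confidence`, `locations`, `identifiers`, `digital_platforms`, `collections`, `provenance_notes`) and the helper functions are assumptions made for the example, as is the choice to keep a single "most complete" location.

```python
from copy import deepcopy


def completeness(location: dict) -> int:
    """Count non-empty fields; used to pick the most complete location."""
    return sum(1 for v in location.values() if v not in (None, "", [], {}))


def union_unique(*lists: list) -> list:
    """Concatenate lists, dropping duplicates while keeping first-seen order."""
    seen, out = set(), []
    for items in lists:
        for item in items:
            # Dicts are unhashable, so compare them by a key-sorted repr.
            key = repr(sorted(item.items())) if isinstance(item, dict) else repr(item)
            if key not in seen:
                seen.add(key)
                out.append(item)
    return out


def merge_records(records: list[dict]) -> dict:
    """Merge duplicate records into one (hypothetical field names)."""
    # The record with the highest confidence score becomes the base.
    ranked = sorted(records, key=lambda r: r.get("confidence", 0.0), reverse=True)
    merged = deepcopy(ranked[0])

    # Assumption: keep only the single most complete location across all records.
    locations = [loc for r in ranked for loc in r.get("locations", [])]
    if locations:
        merged["locations"] = [max(locations, key=completeness)]

    # Combine all unique identifiers, digital platforms and collections.
    for field in ("identifiers", "digital_platforms", "collections"):
        merged[field] = union_unique(*(r.get(field, []) for r in ranked))

    # Document the merge in a provenance note.
    absorbed = ", ".join(str(r.get("id", "?")) for r in ranked[1:])
    merged.setdefault("provenance_notes", []).append(f"Merged duplicates: {absorbed}")
    return merged
```

A caller would group candidate duplicates first (for example by normalized name and city, which the commit does not specify) and pass each group to `merge_records`. Backing up the dataset beforehand, as the commit message describes, stays outside this sketch.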

51 commits