kempersc/glam - Forgejo: Beyond coding. We Forge.

8 commits 2 branches 0 tags 2.5 GiB

Author	SHA1	Message	Date
kempersc	edb1e07941	updated schemata	2025-11-21 22:12:33 +01:00
kempersc	176a7479f9	Add comprehensive ontology mapping rules and update project mission - Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus - Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines: * Ontology consultation workflows (Rule 1) * Wikidata entity mapping procedures (Rule 2) * Multi-aspect modeling requirements (Rule 3) * Temporal independence documentation (Rule 4) * Property research workflows (Rule 5) * Decision trees for ontology selection (Rule 6-7) * Quality assurance checklists (Rule 8-9) * Agent collaboration protocols (Rule 10) - Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide Key principles established: 1. Wikidata Q-numbers are NOT ontology classes (must be mapped) 2. Every heritage entity has multiple aspects with independent temporal lifecycles 3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth 4. Custom properties forbidden when ontology equivalents exist Example: 'Mansion' (Q1802963) requires modeling as: - Place aspect (crm:E27_Site, construction→present) - Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present) - Legal form aspect (org:FormalOrganization, registration→present) - Collections aspect (crm:E78_Curated_Holding, accession→present) - People aspect (picom:PersonObservation, employment periods) - Temporal events (crm:E10_Transfer_of_Custody for custody changes) All agents MUST read ontology files before schema design.	2025-11-20 23:09:02 +01:00
kempersc	e6684e815b	feat: Enhance hyponyms with additional labels and types for better classification	2025-11-20 07:52:23 +01:00
kempersc	38354539a6	feat: Add comprehensive harvester for Thüringen archives - Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de. - Extracted data includes addresses, emails, phones, directors, collection sizes, opening hours, histories, and more. - Introduced structured data parsing and error handling for robust data extraction. - Added rate limiting to respect server load and improve scraping efficiency. - Results are saved in a JSON format with detailed metadata about the extraction process.	2025-11-20 00:25:45 +01:00
kempersc	3c80de87e0	add isil entries	2025-11-19 23:25:22 +01:00
kempersc	e5a532a8bc	Add comprehensive tests for NLP institution extraction and RDF partnership integration - Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive). - Added tests for extracted entities and result handling to validate the extraction process. - Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format. - Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns. - Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.	2025-11-19 23:20:47 +01:00
kempersc	5e9f54bd91	Deduplicate Brazilian institutions (212→121) - Merged 91 duplicate Brazilian institution records - Improved Wikidata coverage from 26.4% to 38.8% (+12.4pp) - Created intelligent merge strategy: - Prefer records with higher confidence scores - Merge locations (prefer most complete) - Combine all unique identifiers - Combine all unique digital platforms - Combine all unique collections - Add provenance notes documenting merges - Create backup before deduplication - Generate comprehensive deduplication report Dataset changes: - Total institutions: 13,502 → 13,411 - Brazilian institutions: 212 → 121 - Coverage: 47/121 institutions with Q-numbers (38.8%)	2025-11-11 22:08:34 +01:00
kempersc	59c99bfb26	Brazil Batch 10: Enrich 8 institutions (26.4% coverage) - Add Wikidata Q-numbers to 8 Brazilian institutions - Coverage: 56/212 institutions (26.4%, +5.6pp gain) - All Q-numbers validated via Wikidata authenticated API - Largest single batch gain yet - Note: Duplicate entries detected, deduplication needed Q-numbers added: - Q10333651 - Museu da Borracha - Q10387829 - UFAC Repository - Q10345196 - Parque Memorial Quilombo dos Palmares - Q1434444 - Teatro Amazonas - Q116921020 - Centro Cultural dos Povos da Amazônia - Q7894381 - UNIFAP - Q16496091 - Arquivo Público do Estado da Bahia - Q56695457 - Museu de Arqueologia e Etnologia da UFPR	2025-11-11 22:05:43 +01:00