Commit graph

37 commits

Author SHA1 Message Date
kempersc
7dce283c17 Add new enums for PersonalCollectionType, ResearchCenterType, and TasteScentHeritage classifications; implement validation script for custodian names against authoritative sources 2025-12-01 18:39:22 +01:00
kempersc
48a2b26f59 feat: Add script to generate Mermaid ER diagrams with instance data from LinkML schemas
- Implemented `generate_mermaid_with_instances.py` to create ER diagrams that include all classes, relationships, enum values, and instance data.
- Loaded instance data from YAML files and enriched enum definitions with meaningful annotations.
- Configured output paths for generated diagrams in both frontend and schema directories.
- Added support for excluding technical classes and limiting the number of displayed enum and instance values for readability.
2025-12-01 16:58:03 +01:00
kempersc
097d116b72 enrich entries 2025-12-01 16:06:34 +01:00
kempersc
2497e5913f enrich entries 2025-12-01 00:37:24 +01:00
kempersc
f3c149b1bb update entries 2025-11-30 23:30:29 +01:00
kempersc
ff92698c7a Implement feature X to enhance user experience and fix bug Y in module Z 2025-11-30 23:25:05 +01:00
kempersc
d623f0af4a store archived websites 2025-11-29 20:40:46 +01:00
kempersc
572ccd5daf archive websites 2025-11-29 18:18:04 +01:00
kempersc
0ab8f24a6b archive websites 2025-11-29 18:05:16 +01:00
kempersc
da1eae6597 Refactor code structure for improved readability and maintainability 2025-11-29 12:27:39 +01:00
kempersc
30162e6526 Add script to validate KB library entries and generate enrichment report
- Implemented a Python script to validate KB library YAML files for required fields and data quality.
- Analyzed enrichment coverage from Wikidata and Google Maps, generating statistics.
- Created a comprehensive markdown report summarizing validation results and enrichment quality.
- Included error handling for file loading and validation processes.
- Generated JSON statistics for further analysis.
2025-11-28 14:48:33 +01:00
kempersc
5cdce584b2 Add complete schema for heritage custodian observation reconstruction
- Introduced a comprehensive class diagram for the heritage custodian observation reconstruction schema.
- Defined multiple classes including AllocationAgency, ArchiveOrganizationType, AuxiliaryDigitalPlatform, and others, with relevant attributes and relationships.
- Established inheritance and associations among classes to represent complex relationships within the schema.
- Generated on 2025-11-28, version 0.9.0, excluding the Container class.
2025-11-28 13:13:23 +01:00
kempersc
0d1741c55e Refactor code structure for improved readability and maintainability 2025-11-28 11:44:21 +01:00
kempersc
37886f0433 Refactor code structure for improved readability and maintainability 2025-11-27 17:43:14 +01:00
kempersc
5ef8ccac51 Add script to enrich NDE Register NL entries with Wikidata data
- Implemented a Python script that fetches and enriches entries from the NDE Register using data from Wikidata.
- Utilized the Wikibase REST API and SPARQL endpoints for data retrieval.
- Added logging for tracking progress and errors during the enrichment process.
- Configured rate limiting based on authentication status for API requests.
- Created a structured output in YAML format, including detailed enrichment data.
- Generated a log file summarizing the enrichment process and results.
2025-11-27 13:30:00 +01:00
kempersc
cd0ff5b9c7 wrap up example list 2025-11-27 10:58:53 +01:00
kempersc
a6cbce1749 feat: Implement intersection calculation for UML diagram node links 2025-11-27 10:58:45 +01:00
kempersc
e99b1e644e feat: Add platform_description slot for detailed auxiliary platform information 2025-11-26 10:18:16 +01:00
kempersc
e2eb7aa5cf feat: Add auxiliary slots and enums for places and digital platforms 2025-11-26 10:09:06 +01:00
kempersc
eff2f47f6f Add auxiliary enums and slots for digital platforms and physical locations
- Created AuxiliaryDigitalPlatformTypeEnum.yaml to classify types of secondary digital platforms.
- Created AuxiliaryPlaceTypeEnum.yaml to classify types of secondary physical locations.
- Added OrganizationBranchTypeEnum.yaml for formal organizational branches at auxiliary locations.
- Introduced auxiliary_places.yaml slot to link CustodianPlace to subordinate physical locations.
- Introduced auxiliary_platforms.yaml slot to link DigitalPlatform to subordinate digital properties.
- Added located_at.yaml slot to connect OrganizationalStructure to physical locations.
2025-11-25 15:06:43 +01:00
kempersc
a5a66eb547 add classes 2025-11-25 12:48:07 +01:00
kempersc
3ff0e33bf9 Add UML diagrams and scripts for custodian schema
- Created PlantUML diagrams for custodian types, full schema, legal status, and organizational structure.
- Implemented a script to generate GraphViz DOT diagrams from OWL/RDF ontology files.
- Developed a script to generate UML diagrams from modular LinkML schema, supporting both Mermaid and PlantUML formats.
- Enhanced class definitions and relationships in UML diagrams to reflect the latest schema updates.
2025-11-23 23:05:33 +01:00
kempersc
67657c39b6 feat: Complete Country Class Implementation and Hypernyms Removal
- Created the Country class with ISO 3166-1 alpha-2 and alpha-3 codes, ensuring minimal design without additional metadata.
- Integrated the Country class into CustodianPlace and LegalForm schemas to support country-specific feature types and legal forms.
- Removed duplicate keys in FeatureTypeEnum.yaml, resulting in 294 unique feature types.
- Eliminated "Hypernyms:" text from FeatureTypeEnum descriptions, verifying that semantic relationships are now conveyed through ontology mappings.
- Created example instance file demonstrating integration of Country with CustodianPlace and LegalForm.
- Updated documentation to reflect the completion of the Country class implementation and hypernyms removal.
2025-11-23 13:09:38 +01:00
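
As a rough illustration of the minimal Country design described in commit 67657c39b6, here is a sketch in Python. The actual class is defined in the LinkML schema, which this log does not show, so the field names `alpha_2` and `alpha_3` and the validation rules are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """Hypothetical stand-in for the schema's Country class."""

    # Minimal by design: only the two ISO 3166-1 codes, no names or other metadata.
    alpha_2: str
    alpha_3: str

    def __post_init__(self):
        # Shape check only: this does not verify that the codes are actually assigned.
        if not re.fullmatch(r"[A-Z]{2}", self.alpha_2):
            raise ValueError(f"not an alpha-2 shaped code: {self.alpha_2!r}")
        if not re.fullmatch(r"[A-Z]{3}", self.alpha_3):
            raise ValueError(f"not an alpha-3 shaped code: {self.alpha_3!r}")


netherlands = Country(alpha_2="NL", alpha_3="NLD")
```

Keeping the class frozen and limited to the two codes mirrors the "minimal design without additional metadata" wording of the commit; anything richer, such as country names, would have to come from elsewhere.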
kempersc
6eb18700f0 Add SHACL validation shapes and validation script for Heritage Custodian Ontology
- Created SHACL shapes for validating temporal consistency and bidirectional relationships in custodial collections and staff observations.
- Implemented a Python script to validate RDF data against the defined SHACL shapes using the pyshacl library.
- Added command-line interface for validation with options for specifying data formats and output reports.
- Included detailed error handling and reporting for validation results.
2025-11-22 23:22:10 +01:00
kempersc
2761857b0d Add scripts for converting OWL/Turtle ontology to Mermaid and PlantUML diagrams
- Implemented `owl_to_mermaid.py` to convert OWL/Turtle files into Mermaid class diagrams.
- Implemented `owl_to_plantuml.py` to convert OWL/Turtle files into PlantUML class diagrams.
- Added two new PlantUML files for custodian multi-aspect diagrams.
2025-11-22 23:01:13 +01:00
kempersc
8907aa6213 feat: Refactor Heritage Custodian Ontology to Multi-Aspect Model
- Implemented three independent aspects for custodians: CustodianLegalStatus, CustodianName, and CustodianPlace.
- Renamed CustodianReconstruction to CustodianLegalStatus and updated all references.
- Created new components for CustodianPlace and PlaceSpecificityEnum.
- Removed direct links from CustodianObservation to Custodian, aligning with PROV-O standards.
- Generated comprehensive example instance demonstrating the new architecture.
- Updated documentation to reflect changes and provide guidance on multi-aspect modeling.
- Added React hook for managing IndexedDB operations, including storing and loading transformation results.
- Created complete YAML example for Rijksmuseum, illustrating the integration of all three aspects.
2025-11-22 15:40:17 +01:00
kempersc
94d1054f4a Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls. 2025-11-22 15:35:35 +01:00
kempersc
fa5680f0dd Add initial versions of custodian hub UML diagrams in Mermaid and PlantUML formats
- Introduced custodian_hub_v3.mmd, custodian_hub_v4_final.mmd, and custodian_hub_v5_FINAL.mmd for Mermaid representation.
- Created custodian_hub_FINAL.puml and custodian_hub_v3.puml for PlantUML representation.
- Defined entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument.
- Established relationships and associations between entities, including temporal extents, observations, and reconstruction activities.
- Incorporated enumerations for various types, statuses, and classifications relevant to custodians and their activities.
2025-11-22 14:33:51 +01:00
kempersc
284b575e88 Add UML diagrams for Custodian Hub v2 in Mermaid and PlantUML formats
- Introduced a new Mermaid diagram for Custodian Hub v2, detailing entities such as CustodianReconstruction, Identifier, TimeSpan, Agent, CustodianName, CustodianObservation, ReconstructionActivity, Appellation, ConfidenceMeasure, Custodian, LanguageCode, and SourceDocument.
- Established relationships between entities, including temporal extents, derivations, and revisions.
- Added a comprehensive PlantUML diagram reflecting the same structure and relationships, including enumerations for various types and statuses relevant to custodians and observations.
- Enhanced documentation to clarify the hub architecture pattern and its implications for data integrity and source authority.
2025-11-21 22:30:07 +01:00
kempersc
edb1e07941 updated schemata 2025-11-21 22:12:33 +01:00
kempersc
176a7479f9 Add comprehensive ontology mapping rules and update project mission
- Update AGENTS.md with PROJECT CORE MISSION section emphasizing ontology engineering focus
- Create .opencode/agent/ontology-mapping-rules.md (665 lines) with detailed guidelines:
  * Ontology consultation workflows (Rule 1)
  * Wikidata entity mapping procedures (Rule 2)
  * Multi-aspect modeling requirements (Rule 3)
  * Temporal independence documentation (Rule 4)
  * Property research workflows (Rule 5)
  * Decision trees for ontology selection (Rule 6-7)
  * Quality assurance checklists (Rule 8-9)
  * Agent collaboration protocols (Rule 10)
- Create ONTOLOGY_RULES_SUMMARY.md as quick reference guide

Key principles established:
1. Wikidata Q-numbers are NOT ontology classes (must be mapped)
2. Every heritage entity has multiple aspects with independent temporal lifecycles
3. Base ontologies (CPOV, TOOI, CIDOC-CRM, RiC-O, Schema.org, PiCo) are source of truth
4. Custom properties forbidden when ontology equivalents exist

Example: 'Mansion' (Q1802963) requires modeling as:
- Place aspect (crm:E27_Site, construction→present)
- Custodian aspect (cpov:PublicOrganisation OR schema:Museum, founding→present)
- Legal form aspect (org:FormalOrganization, registration→present)
- Collections aspect (crm:E78_Curated_Holding, accession→present)
- People aspect (picom:PersonObservation, employment periods)
- Temporal events (crm:E10_Transfer_of_Custody for custody changes)

All agents MUST read ontology files before schema design.
2025-11-20 23:09:02 +01:00
kempersc
e6684e815b feat: Enhance hyponyms with additional labels and types for better classification 2025-11-20 07:52:23 +01:00
kempersc
38354539a6 feat: Add comprehensive harvester for Thüringen archives
- Implemented a new script to extract full metadata from 149 archive detail pages on archive-in-thueringen.de.
- Extracted data includes addresses, emails, phones, directors, collection sizes, opening hours, histories, and more.
- Introduced structured data parsing and error handling for robust data extraction.
- Added rate limiting to respect server load and improve scraping efficiency.
- Results are saved in a JSON format with detailed metadata about the extraction process.
2025-11-20 00:25:45 +01:00
kempersc
3c80de87e0 add isil entries 2025-11-19 23:25:22 +01:00
kempersc
e5a532a8bc Add comprehensive tests for NLP institution extraction and RDF partnership integration
- Introduced `test_nlp_extractor.py` with unit tests for the InstitutionExtractor, covering various extraction patterns (ISIL, Wikidata, VIAF, city names) and ensuring proper classification of institutions (museum, library, archive).
- Added tests for extracted entities and result handling to validate the extraction process.
- Created `test_partnership_rdf_integration.py` to validate the end-to-end process of extracting partnerships from a conversation and exporting them to RDF format.
- Implemented tests for temporal properties in partnerships and ensured compliance with W3C Organization Ontology patterns.
- Verified that extracted partnerships are correctly linked with PROV-O provenance metadata.
2025-11-19 23:20:47 +01:00
kempersc
5e9f54bd91 Deduplicate Brazilian institutions (212→121)
- Merged 91 duplicate Brazilian institution records
- Improved Wikidata coverage from 26.4% to 38.8% (+12.4pp)
- Created intelligent merge strategy:
  - Prefer records with higher confidence scores
  - Merge locations (prefer most complete)
  - Combine all unique identifiers
  - Combine all unique digital platforms
  - Combine all unique collections
- Add provenance notes documenting merges
- Create backup before deduplication
- Generate comprehensive deduplication report

Dataset changes:
- Total institutions: 13,502 → 13,411
- Brazilian institutions: 212 → 121
- Coverage: 47/121 institutions with Q-numbers (38.8%)
2025-11-11 22:08:34 +01:00
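
The merge strategy listed in commit 5e9f54bd91 can be sketched roughly as below. This is not the repository's script: the `Institution` record shape, the field names, and the helper functions are all hypothetical, and the sketch only mirrors the rules stated in the commit message (prefer the higher confidence score, prefer the most complete location, combine unique identifiers, platforms, and collections, and add a provenance note).

```python
from dataclasses import dataclass, field


@dataclass
class Institution:
    # Hypothetical record shape; the real dataset schema is not shown in this log.
    name: str
    confidence: float = 0.0
    location: dict = field(default_factory=dict)
    identifiers: list = field(default_factory=list)
    platforms: list = field(default_factory=list)
    collections: list = field(default_factory=list)
    provenance_notes: list = field(default_factory=list)


def _union(first, second):
    """Combine two lists, keeping order and dropping exact duplicates."""
    combined = []
    for item in first + second:
        if item not in combined:
            combined.append(item)
    return combined


def _completeness(location):
    """Count the location fields that actually hold a value."""
    return sum(1 for value in location.values() if value)


def merge_records(a, b):
    """Merge two duplicate records into one, following the listed rules."""
    # Prefer the record with the higher confidence score as the base.
    base, other = (a, b) if a.confidence >= b.confidence else (b, a)
    # Prefer the most complete location, whichever record it comes from.
    location = max(base.location, other.location, key=_completeness)
    return Institution(
        name=base.name,
        confidence=base.confidence,
        location=dict(location),
        identifiers=_union(base.identifiers, other.identifiers),
        platforms=_union(base.platforms, other.platforms),
        collections=_union(base.collections, other.collections),
        # Document the merge so the result stays traceable to both sources.
        provenance_notes=_union(base.provenance_notes, other.provenance_notes)
        + [f"Merged duplicate record: {other.name}"],
    )


# Placeholder data, invented for illustration only.
first = Institution("Example Museum", 0.9, {"city": "Example City"}, ["ID-1"])
second = Institution(
    "Example Museum (copy)",
    0.7,
    {"city": "Example City", "country": "BR"},
    ["ID-1", "ID-2"],
)
merged = merge_records(first, second)
```

In this sketch the merged record keeps the name and confidence of the higher-scoring entry but takes the fuller location from the other one, which is one plausible reading of "merge locations (prefer most complete)".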
kempersc
59c99bfb26 Brazil Batch 10: Enrich 8 institutions (26.4% coverage)
- Add Wikidata Q-numbers to 8 Brazilian institutions
- Coverage: 56/212 institutions (26.4%, +5.6pp gain)
- All Q-numbers validated via Wikidata authenticated API
- Largest single batch gain yet
- Note: Duplicate entries detected, deduplication needed

Q-numbers added:
- Q10333651 - Museu da Borracha
- Q10387829 - UFAC Repository
- Q10345196 - Parque Memorial Quilombo dos Palmares
- Q1434444 - Teatro Amazonas
- Q116921020 - Centro Cultural dos Povos da Amazônia
- Q7894381 - UNIFAP
- Q16496091 - Arquivo Público do Estado da Bahia
- Q56695457 - Museu de Arqueologia e Etnologia da UFPR
2025-11-11 22:05:43 +01:00