Commit graph

63 commits

Author SHA1 Message Date
kempersc
01e382b53b Update generated timestamp in manifest.json for accuracy 2026-02-01 02:04:00 +01:00
kempersc
ca4a54181e Refactor schema files to improve clarity and maintainability
- Updated WorldCatIdentifier.yaml to remove unnecessary description and ensure consistent formatting.
- Enhanced WorldHeritageSite.yaml by breaking long description into multiple lines for better readability and removed unused attributes.
- Simplified WritingSystem.yaml by removing redundant attributes and ensuring consistent formatting.
- Cleaned up XPathScore.yaml by removing unnecessary attributes and ensuring consistent formatting.
- Improved YoutubeChannel.yaml by breaking long description into multiple lines for better readability.
- Enhanced YoutubeEnrichment.yaml by breaking long description into multiple lines for better readability.
- Updated YoutubeVideo.yaml to break long description into multiple lines and removed legacy field name.
- Refined has_or_had_affiliation.yaml by removing unnecessary comments and ensuring clarity.
- Cleaned up is_or_was_retrieved_at.yaml by removing unnecessary comments and ensuring clarity.
- Added rules for generic slots and avoiding rough edits in schema files to maintain structural integrity.
- Introduced changes_or_changed_through.yaml to define a new slot for linking entities to change events.
2026-01-31 00:46:23 +01:00
kempersc
f7bf1cc5ae Refactor schema slots and classes
- Deleted obsolete slot definitions: statement_summary, statement_text, statement_type, status_name, supersede_articles, supersede_condition, supersede_name, temporal_dynamics, total_amount, typical_contents, use_cases, was_acquired_through, was_fetched_at, was_retrieved_at.
- Updated existing slot definitions for states_or_stated to enhance clarity and structure.
- Introduced new classes: Article, ConditionofAccess, FinancialStatementType, MaximumQuantity, Series, Summary, Type, and their respective slots to improve schema organization and usability.
- Added new slots: changes_or_changed_through, has_or_had_condition_of_access, has_or_had_heritage_type, is_or_was_part_of_series, is_or_was_retrieved_at, maximum_of_maximum to capture additional metadata and relationships.
2026-01-30 00:29:31 +01:00
kempersc
f3c0586d09 Refactor schema and slots for improved clarity and organization
- Updated `has_or_had_url` slot to allow a broader range of values by changing its range from `uriorcurie` to `Any`.
- Removed obsolete slots: `house_number`, `html_file`, `html_snapshot_path`, and `http_status_code`.
- Introduced new classes: `CeasingEvent`, `FileLocation`, `FilePath`, `HTMLFile`, `HTTPStatusCode`, `HouseNumber`, `MaximumHumidity`, `MinimumHumidity`, `TargetHumidity`, and `WKT` to better represent various concepts.
- Migrated existing slots to new structures, ensuring alignment with RiC-O naming conventions.
- Added new slots: `ceases_or_ceased_through`, `has_or_had_file_location`, `has_or_had_file_path`, `has_or_had_http_status`, and `is_or_was_observed_by` to capture additional metadata.
- Enhanced descriptions and annotations for clarity and context.
2026-01-28 10:49:49 +01:00
kempersc
7ea7e3d0d7 feat: Add new ontology and schema classes for Heritage and related concepts
- Introduced new classes: Heritage, HeritagePractice, HeritageRelevanceAssessment, HeritageRelevanceScore, HolySiteType, Mandate.
- Added slots for heritage-related attributes including has_or_had_confidence_measure, has_or_had_related_heritage_form, heritage_education, heritage_employer, heritage_mandate, heritage_practice, and more.
- Migrated existing attributes and ensured compliance with RiC-O naming conventions.
- Enhanced documentation and descriptions for clarity and usability.
- Archived previous versions of slots and classes to maintain schema integrity.
2026-01-28 08:06:56 +01:00
kempersc
f800e198ff Refactor code structure for improved readability and maintainability 2026-01-28 01:11:55 +01:00
kempersc
80eb3d969c Add new slots for heritage custodian ontology
- Introduced `has_api_version`, `has_appellation_language`, `has_appellation_type`, `has_appellation_value`, `has_applicable_country`, `has_application_deadline`, `has_application_opening_date`, `has_appraisal_note`, `has_approval_date`, `has_archdiocese_name`, `has_architectural_style`, `has_archival_reference`, `has_archive_description`, `has_archive_memento_uri`, `has_archive_name`, `has_archive_path`, `has_archive_search_score`, `has_arrangement`, `has_arrangement_level`, `has_arrangement_note`, `has_articles_archival_stage`, `has_articles_document_format`, `has_articles_document_url`, `has_articles_of_association`, `has_or_had_altitude`, `has_or_had_annotation`, `has_or_had_arrangement`, `has_or_had_document`, `has_or_had_reason`, `has_or_had_style`, `is_or_was_amended_through`, `is_or_was_approved_on`, `is_or_was_archived_as`, `is_or_was_due_on`, `is_or_was_opened_on`, and `is_or_was_used_in` slots.
- Each slot includes detailed descriptions, range specifications, and appropriate mappings to existing ontologies.
2026-01-27 10:07:16 +01:00
kempersc
3da90b940e feat(schema): complete multiple slot_fixes.yaml migrations
Session 2026-01-19: Completed remaining migrations per Rules 53/56/60.

Major migrations:
1. claim_type → has_or_had_type + ClaimType/ClaimTypes (60+ concrete types in 11 categories)
2. circumstances_of_death → is_deceased + DeceasedStatus + CauseOfDeath
3. claims_count → has_or_had_quantity + Quantity (with based_on_claim for provenance)
4. classification_status → has_or_had_type + ClassificationStatusType

Created files:
- ClaimType.yaml, ClaimTypes.yaml (abstract base + 60+ concrete subclasses)
- DeceasedStatus.yaml, CauseOfDeath.yaml, CauseOfDeathTypeEnum.yaml
- ClassificationStatus.yaml, ClassificationStatusType.yaml, ClassificationStatusTypes.yaml
- CITESAppendix.yaml, City.yaml, CertaintyLevel.yaml
- is_deceased.yaml, is_or_was_caused_by.yaml, based_on_claim.yaml

Archived slots:
- claim_type, circumstances_of_death, claims_count, classification_status

Added Rule 60 to AGENTS.md: No Migration Deferral - agents MUST execute all migrations.

All 527 slot_fixes.yaml entries now complete (100%).
2026-01-19 13:05:53 +01:00
kempersc
4a277d7d42 standardise slots 2026-01-19 00:09:28 +01:00
kempersc
4319f38c05 Add archived slots for audience size, audience type, and capacity metrics
- Created new YAML files for audience size and audience type slots, defining their properties and annotations.
- Added archived capacity slots including cubic meters, linear meters, item count, and descriptions, with appropriate URIs and ranges.
- Introduced a template specificity slot for context-aware RAG filtering.
- Consolidated capacity-related slots into a unified structure, including has_or_had_capacity, capacity_type, and capacity_value, with detailed descriptions and examples.
2026-01-17 18:53:23 +01:00
kempersc
46757be964 Refactor ontology schema: Migrate slots and update references
- Replaced deprecated slot 'broader_type' with 'has_or_had_hypernym' in MuseumType, OrganizationBranch, and ResearchOrganizationType schemas, ensuring all references are updated accordingly.
- Removed obsolete slots: 'binding_description', 'binding_type', 'borrower', 'borrower_contact', 'bounding_box', 'branch_description', 'branch_type', and 'taxonomic_rank', archiving them for future reference.
- Introduced new generic slots: 'has_or_had_contact_point', 'has_or_had_geographic_extent', and 'has_or_had_rank' to standardize contact and spatial information, aligning with RiC-O naming conventions.
- Updated slot_fixes.yaml to reflect migration status and ensure immutability of revision entries.
- Enhanced documentation and examples for new slots to facilitate understanding and usage.
2026-01-17 15:18:34 +01:00
kempersc
d47bb5b097 standardise slots 2026-01-16 18:57:52 +01:00
kempersc
db389ed0a3 Refactor schema slots to resolve OWL ambiguity and enhance flexibility
- Updated ranges for multiple slots from `string` to `uriorcurie` to address OWL "Ambiguous type" warnings and allow for URI/CURIE references.
- Removed specialized slots for subtitle and transcript formats, consolidating them under broader predicates.
- Introduced new slots for structured descriptions, observation source documents, and entity statuses to improve data modeling.
- Implemented Rule 54 to broaden generic predicate ranges instead of creating bespoke predicates, promoting schema reuse and reducing complexity.
- Added a script for generating OWL ontology with type-object handling to ensure consistent ObjectProperty treatment for polymorphic slots.
2026-01-16 15:06:36 +01:00
kempersc
9f9c69f7b8 docs: Add Rule 53 for slot_fixes.yaml migration workflow
- Document that slot_fixes.yaml revision section is authoritative
- Add link_branch explanation for nested class attributes
- Clarify that deprecated slots must be fully removed, not just annotated
2026-01-14 19:57:08 +01:00
kempersc
b13674400f Refactor schema slots and classes for improved organization and clarity
- Removed deprecated slots: appraisal_notes, branch_id, is_or_was_real.
- Introduced new slots: has_or_had_notes, has_or_had_provenance.
- Created Notes class to encapsulate note-related metadata.
- Archived removed slots and classes in accordance with the new archive folder convention.
- Updated slot_fixes.yaml to reflect migration status and details.
- Enhanced documentation for new slots and classes, ensuring compliance with ontology alignment.
- Added new slots for note content, date, and type to support the Notes class.
2026-01-14 12:14:07 +01:00
kempersc
b30711fcfb update slots 2026-01-14 09:05:54 +01:00
kempersc
73b3b21017 docs: add Rule 52 prohibiting duplicate ontology mappings
- Create .opencode/rules/no-duplicate-ontology-mappings.md with detection script
- Add Rule 52 to AGENTS.md (after Rule 51)
- Fix 29 duplicate mappings: same URI in multiple mapping categories
  - 26 slot files: remove duplicates keeping most precise mapping
  - 3 class files: ExhibitionSpace, Custodian, DigitalPlatform
- Mapping precedence: exact > close > narrow/broad > related

Each ontology URI must appear in only ONE mapping category per schema
element, following SKOS semantics where mapping properties are mutually
exclusive.
2026-01-13 15:57:26 +01:00
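The mapping precedence described in commit 73b3b21017 (exact > close > narrow/broad > related, with each URI kept in only one category) might be sketched roughly as below. This is not the detection script from the commit; the function name `dedupe_mappings` and the sample element are hypothetical, and placing `narrow_mappings` before `broad_mappings` is an assumption, since the commit message treats the two as equal.

```python
# Hypothetical sketch of the Rule 52 precedence; not the repository's own script.
# Category keys follow the LinkML mapping metaslots. The relative order of
# narrow vs. broad is an assumption (the commit lists them as a tie).
PRECEDENCE = [
    "exact_mappings",
    "close_mappings",
    "narrow_mappings",
    "broad_mappings",
    "related_mappings",
]


def dedupe_mappings(element: dict) -> dict:
    """Return a copy of a schema element in which every URI appears only
    in the most precise mapping category that lists it."""
    seen = set()
    cleaned = dict(element)
    for category in PRECEDENCE:
        if category not in element:
            continue
        kept = []
        for uri in element[category]:
            if uri not in seen:  # first (most precise) occurrence wins
                seen.add(uri)
                kept.append(uri)
        if kept:
            cleaned[category] = kept
        else:
            cleaned.pop(category)  # drop categories left empty
    return cleaned


# Made-up example element with the same URIs in several categories.
example = {
    "description": "illustrative slot",
    "exact_mappings": ["schema:name"],
    "close_mappings": ["schema:name", "skos:prefLabel"],
    "related_mappings": ["skos:prefLabel"],
}
result = dedupe_mappings(example)
```

Here `schema:name` survives only under `exact_mappings` and `skos:prefLabel` only under `close_mappings`, while the emptied `related_mappings` key is dropped.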
kempersc
1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology
2026-01-13 13:51:02 +01:00
kempersc
c5fb9ec88e feat: add route for Entity Review page with lazy loading 2026-01-13 01:49:43 +01:00
kempersc
8d7aca0f98 Refactor code structure for improved readability and maintainability 2026-01-12 19:13:35 +01:00
kempersc
846a6cdcec Add new Record Set Types for various archival collections
- Introduced SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, SpecializedArchivesCzechiaRecordSetType, StateArchivesRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetType, VereinsarchivRecordSetType, VerlagsarchivRecordSetType, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, and WomensArchivesRecordSetType.
- Each new type includes appropriate metadata, slots, and relationships to existing classes.
- Implemented a script to detect and fix Type class violations in LinkML files.
2026-01-12 15:20:29 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
3e6c2367ad feat(linkml-viewer): UX improvements - entry counts, deep links, settings persistence
- Add entry count badge next to schema file name showing (xC, yE, zS) counts
- Add tooltip explaining LinkML file names vs class names
- Remove redundant section headers (Classes, Enums, Slots collapsible sections)
- Add URL params for enum (?enum=) and slot (?slot=) deep linking
- Persist category filters, dev tools visibility, and legend visibility to localStorage
- Set 'Main Schema' filter to OFF by default (confusing for users)
- Add Rule 48: Class files must not define inline slots
2026-01-11 21:42:35 +01:00
kempersc
dfb4744dc7 Evaluate data enrichments of persons 2026-01-11 12:15:27 +01:00
kempersc
556cc6c294 Add workspace configuration for Git and Gitea integration
- Set up GitHub integration to be disabled.
- Configure Git settings including path and autofetch options.
- Add Gitea instance URL and repository details.
- Enable YAML support for LinkML schemas with validation.
- Define file associations for YAML files.
- Recommend essential extensions for development and exclude unwanted ones.
2026-01-11 02:50:39 +01:00
kempersc
cce484c6b8 feat(archief-assistent): enhance semantic cache with ontology-driven vocabulary
- Integrate tier-2 embeddings from types-vocab.json
- Add segment-based caching for improved retrieval
- Update tests and documentation
2026-01-10 15:38:11 +01:00
kempersc
01b9d77566 feat(archief-assistent): add ontology-driven types vocabulary for cache segmentation
Add LinkML-derived vocabulary for semantic cache entity extraction (Rule 46):

- types-vocab.json: 10,142 lines of institution type vocabulary from LinkML
  - 19 GLAMORCUBESFIXPHDNT type codes with Dutch/English/German/French labels
  - Includes subtypes (kunstmuseum, rijksmuseum, streekarchief, etc.)
  - Extracted from CustodianType.yaml and CustodianTypes.yaml

- types-vocabulary.ts: TypeScript module for entity extraction
  - Exports INSTITUTION_TYPES with regex patterns per type code
  - Replaces hardcoded patterns with schema-derived vocabulary
  - Supports multilingual matching

- Rule 46 documentation (.opencode/rules/)
  - Specifies vocabulary extraction workflow
  - Defines cache key generation algorithm
  - Migration path from hardcoded patterns
2026-01-10 12:57:03 +01:00
kempersc
9e67d0f967 enrich profiles 2026-01-09 20:35:19 +01:00
kempersc
932ec5438c add person profiles with PPID 2026-01-09 18:26:58 +01:00
kempersc
c88fd3af70 Refactor code structure for improved readability and maintainability 2026-01-09 11:05:26 +01:00
kempersc
508b858e16 docs(Rule 40): Add empirical validation showing 33% Google Maps error rate for Type I
Audit of 188 Type I custodian files revealed:
- 62 false matches (33%) detected and corrected
- Categories: domain mismatch (39), name mismatch (8), wrong location (6),
  wrong org type (5), different entity (3), different event (3)
- Documents why Google Maps fails for intangible heritage:
  virtual orgs, person-based heritage, volunteer networks, event-based orgs

This validates KIEN as TIER_1_AUTHORITATIVE for Type I custodians.
2026-01-08 16:47:17 +01:00
kempersc
6608a207d4 update frontend 2026-01-08 15:56:28 +01:00
kempersc
98c42bf272 Fix LinkML URI conflicts and generate RDF outputs
- Fix scope_note → finding_aid_scope_note in FindingAid.yaml
- Remove duplicate wikidata_entity slot from CustodianType.yaml (import instead)
- Remove duplicate rico_record_set_type from class_metadata_slots.yaml
- Fix range types for equals_string compatibility (uriorcurie → string)
- Move class names from close_mappings to see_also in 10 RecordSetTypes files
- Generate all RDF formats: OWL, N-Triples, RDF/XML, N3, JSON-LD context
- Sync schemas to frontend/public/schemas/

Files: 1,151 changed (includes prior CustodianType migration)
2026-01-07 12:32:59 +01:00
kempersc
242bc8bb35 Add new slots for heritage custodian entities
- Created deliverables_slot for expected or achieved deliverable outputs.
- Introduced event_id_slot for persistent unique event identifiers.
- Added follow_up_date_slot for scheduled follow-up action dates.
- Implemented object_ref_slot for references to heritage objects.
- Established price_slot for price information across entities.
- Added price_currency_slot for currency codes in price information.
- Created protocol_slot for API protocol specifications.
- Introduced provenance_text_slot for full provenance entry text.
- Added record_type_slot for classification of record types.
- Implemented response_formats_slot for supported API response formats.
- Established status_slot for current status of entities or activities.
- Added FactualCountDisplay component for displaying count query results.
- Introduced ReplyTypeIndicator component for visualizing reply types.
- Created approval_date_slot for formal approval dates.
- Added authentication_required_slot for API authentication status.
- Implemented capacity_items_slot for maximum storage capacity.
- Established conservation_lab_slot for conservation laboratory information.
- Added cost_usd_slot for API operation costs in USD.
2026-01-05 00:49:05 +01:00
kempersc
2dca28d8c1 enrich CH entries with mission statements 2026-01-04 13:12:32 +01:00
kempersc
4f0cafe98a enrich HC profiles 2026-01-02 02:11:04 +01:00
kempersc
349f31ae6f enrich custodian profiles 2026-01-02 02:10:18 +01:00
kempersc
b42d6bf5d2 backup CZ and JP 2025-12-30 23:19:38 +01:00
kempersc
d64f857aa9 add sparql validator and RAG injector 2025-12-30 03:43:31 +01:00
kempersc
84904e344b Make AGENTS more succinct by referring to opencode rules & enrich custodians 2025-12-28 14:56:35 +01:00
kempersc
cdb633b0c9 enrich custodian entries with logo 2025-12-27 02:15:17 +01:00
kempersc
0c1d19e98b enrich entries 2025-12-23 13:27:35 +01:00
kempersc
aca68ea47f remove ambiguous web-claims 2025-12-21 00:01:54 +01:00
kempersc
99430c2a70 add new entries and semantic routing 2025-12-17 10:11:56 +01:00
kempersc
cb56aa7e40 enrich all custodian timespan 2025-12-15 22:31:41 +01:00
kempersc
c50c35fd3a enrich person custodian 2025-12-14 17:09:55 +01:00
kempersc
b1f93b6f22 enrich person profiles 2025-12-12 12:51:10 +01:00
kempersc
1b1cfbfca0 enrich custodians 2025-12-11 22:32:09 +01:00
kempersc
be3fbac601 enrich entries and persons 2025-12-10 18:04:25 +01:00
kempersc
41959f0766 correct HCID! 2025-12-10 13:01:13 +01:00