Commit graph

35 commits

Author SHA1 Message Date
kempersc
a590a8d94b Refactor and enhance descriptions across multiple YAML schemas for improved clarity and consistency.
- Updated descriptions in `WikidataOrganization`, `WikidataRecognition`, `WikidataResolvedEntities`, `WikidataSitelinks`, `WikidataSocialMedia`, `WikidataTemporal`, `WikidataTimeValue`, `WikidataWeb`, `WomensArchives`, `WomensArchivesRecordSetType`, `WomensArchivesRecordSetTypes`, `WordCount`, `WorkRevision`, `WorldCatIdentifier`, `WorldHeritageSite`, `WritingSystem`, `XPath`, `XPathScore`, `YoutubeChannel`, `YoutubeComment`, `YoutubeTranscript`, and `YoutubeVideo` to enhance readability and precision.
- Adjusted mappings and slot usage in various schemas to align with updated descriptions and improve data structure.
- Added new synonyms in multiple languages for better localization support.
2026-02-16 15:53:42 +01:00
kempersc
66adec257e Add scripts for normalizing LinkML schemas and validating schema integrity
- Implement `normalize_linkml_alt_descriptions.py` to convert structured alt_descriptions to the expected scalar form.
- Implement `normalize_linkml_structured_aliases.py` to flatten language-keyed structured_aliases into a standard list-of-objects format.
- Implement `validate_linkml_schema_integrity.py` to validate the integrity of LinkML schema bundles, checking for import resolution, YAML parsing, and reference existence.
2026-02-16 10:16:51 +01:00
kempersc
fcd1c21c63 Add aliases and enhance slot definitions across various modules
- Added new aliases for existing slots to improve clarity and usability, including:
  - has_deadline: has_embargo_end_date
  - has_extent: has_extent_text
  - has_fonds: has_fond
  - has_laboratory: conservation_lab
  - has_language: has_iso_code639_1, has_iso_code639_3
  - has_legal_basis: legal_basis
  - has_light_exposure: max_light_lux
  - has_measurement_unit: has_unit
  - has_note: has_custodian_observation
  - has_occupation: occupation
  - has_operating_hours: has_operating_hours
  - has_position: position
  - has_quantity: has_artwork_count, link_count
  - has_roadmap: review_date
  - has_skill: skill
  - has_speaker: speaker_label
  - has_specification: specification_url
  - has_statement: rights_statement_url, rights_statement
  - has_type: custodian_only
  - has_user_category: serves_visitors_only
  - hold_record_set: record_count
  - identified_by: has_index_number
  - in_period: has_period
  - in_place: has_place
  - in_series: has_series
  - measure: has_measurement
  - measured_on: measurement_date
  - organized_by: has_organizer
  - originate_from: has_origin
  - part_of: suborganization_of
  - published_on: has_publication_date
  - receive_investment: has_investment
  - related_to: connection_heritage_type
  - require: preservation_requirement
  - safeguarded_by: current_keeper, record_holder_note
  - state: states_or_stated
  - take_comission: takes_or_took_comission
  - take_place_at: takes_or_took_place_at
  - transmit_through: transmits_or_transmitted_through
  - warrant: warrants_or_warranted

- Introduced a new slot definition for evaluated_through to capture evaluation methodologies and review statuses.
2026-02-14 14:41:49 +01:00
kempersc
820d3969bb Refactor code structure for improved readability and maintainability 2026-02-11 12:11:59 +01:00
kempersc
d3a65a496c Refactor slot names and descriptions across multiple YAML files for consistency and clarity
- Updated slot names to improve semantic clarity:
  - `has_type` changed to `categorized_as`
  - `has_location` changed to `located_at`
  - `coordinates` changed to `has_coordinates`
  - `country` changed to `in_country`
  - `like_count` changed to `has_quantity`

- Adjusted descriptions and annotations for slots to enhance understanding and alignment with ontology standards.

- Modified imports in `WomensArchives.yaml` and `WomensArchivesRecordSetTypes.yaml` to reflect new slot names.

- Enhanced multilingual support in `has_record_set` slot definition with additional translations and structured aliases.

- General cleanup and standardization of slot definitions across various classes including `Wikidata`, `Youtube`, and `WorkExperience`.
2026-02-11 11:54:34 +01:00
kempersc
8f77f62585 update slot imports in classes 2026-02-08 19:22:13 +01:00
kempersc
90842851c2 Add slot definitions for 'updated_at' and 'written_in' with multilingual support and ontology alignment
- Created 'updated_at.yaml' to record the last modified date and time of entities, including multilingual descriptions and structured aliases.
- Created 'written_in.yaml' to specify the language in which content is composed, covering both natural and programming languages, with detailed comments and close ontology mappings.
2026-02-07 11:22:05 +01:00
kempersc
6435786556 edit slots 2026-02-04 00:24:46 +01:00
kempersc
a83f04d9c4 Refactor code structure for improved readability and maintainability 2026-02-02 15:57:17 +01:00
kempersc
fc405445c6 Refactor and update schema definitions
- Removed obsolete slots: `has_or_had_custodian_observation`, `provider`, and `specificity_annotation`.
- Updated `has_or_had_score` slot to use `SpecificityScore` class and modified its description and examples.
- Added new slots: `end_seconds`, `end_time`, `has_archive_path`, `has_or_had_custodian_name`, `protocol_name`, and `protocol_version`.
- Introduced a script `check_annotation_types.py` to validate the presence and structure of `custodian_types` in YAML files.
- Added a script `update_specificity.py` to automate updates related to `SpecificityAnnotation` to `SpecificityScore`.
2026-02-01 19:55:38 +01:00
kempersc
7e0622c755 chore: clean up code structure and remove redundant changes 2026-01-31 21:24:53 +01:00
kempersc
787ff6a8d8 Update manifest.json timestamp and refactor YAML schemas for improved clarity and consistency 2026-01-31 18:14:51 +01:00
kempersc
ca4a54181e Refactor schema files to improve clarity and maintainability
- Updated WorldCatIdentifier.yaml to remove unnecessary description and ensure consistent formatting.
- Enhanced WorldHeritageSite.yaml by breaking long description into multiple lines for better readability and removed unused attributes.
- Simplified WritingSystem.yaml by removing redundant attributes and ensuring consistent formatting.
- Cleaned up XPathScore.yaml by removing unnecessary attributes and ensuring consistent formatting.
- Improved YoutubeChannel.yaml by breaking long description into multiple lines for better readability.
- Enhanced YoutubeEnrichment.yaml by breaking long description into multiple lines for better readability.
- Updated YoutubeVideo.yaml to break long description into multiple lines and removed legacy field name.
- Refined has_or_had_affiliation.yaml by removing unnecessary comments and ensuring clarity.
- Cleaned up is_or_was_retrieved_at.yaml by removing unnecessary comments and ensuring clarity.
- Added rules for generic slots and avoiding rough edits in schema files to maintain structural integrity.
- Introduced changes_or_changed_through.yaml to define a new slot for linking entities to change events.
2026-01-31 00:46:23 +01:00
kempersc
4034c2a00a Refactor schema slots across multiple classes to improve consistency and clarity
- Removed unused slots from TaxonomicAuthority, TechnicalFeature, TelevisionArchive, TentativeWorldHeritageSite, Threat, TimeSpan, Title, TradeRegister, TradeUnionArchive, TradeUnionArchiveRecordSetType, TransferEvent, UNESCODomain, UnitIdentifier, UniversityArchive, UnspecifiedType, UserCommunity, Venue, Vereinsarchiv, Verlagsarchiv, VerlagsarchivRecordSetType, Version, Verwaltungsarchiv, VideoAnnotationTypes, VideoAudioAnnotation, VideoFrame, VideoPost, VideoSubtitle, VideoTextContent, Warehouse, WebArchive, WebClaim, WebClaimsBlock, WebLink, WebPortal, WebPortalTypes, WomensArchives, WordCount, WorldHeritageSite, WritingSystem, and XPathScore.
- Introduced new slot is_or_was_retrieved_at for tracking data retrieval timestamps.
2026-01-31 00:28:09 +01:00
kempersc
6203d19875 Refactor YAML schemas for clarity and consistency
- Removed unnecessary line breaks and whitespace in descriptions across multiple classes including Taxon, TaxonomicAuthority, TechnicalFeature, TradeRegister, TransferEvent, UNESCODomain, UnspecifiedType, UserCommunity, Version, VideoAnnotationTypes, VideoFrame, VideoTextContent, WebArchive, WebClaimsBlock, WebLink, WebPortal, and WordCount.
- Updated descriptions to enhance readability and maintain a uniform style.
- Migrated attributes and slots as per the latest schema rules, ensuring alignment with the defined standards.
- Improved documentation for better understanding of class purposes and usage scenarios.
2026-01-31 00:21:50 +01:00
kempersc
0c5211e40a removed convenience slots 2026-01-31 00:15:53 +01:00
kempersc
14375c583e added hidden slots 2026-01-30 23:56:19 +01:00
kempersc
1f8776bef4 Update schemas and slots with new mappings and descriptions
- Updated manifest.json with new generated timestamp.
- Added close mappings to APIRequest and Administration classes.
- Renamed slots in AccessPolicy to has_or_had_embargo_end_date and has_or_had_embargo_reason.
- Changed class_uri for Accumulation to rico:AccumulationRelation and updated description.
- Added exact mappings to Altitude, AppellationType, and ArchitecturalStyle classes.
- Removed deprecated slots from CollectionManagementSystem and updated has_or_had_type.
- Added new slots for has_or_had_embargo_end_date and has_or_had_embargo_reason.
- Updated slot definitions for has_or_had_assessment, has_or_had_sequence_index, and others with new URIs and mappings.
- Removed unused slots end_seconds and end_time.
- Added new slot definitions for has_or_had_exhibition_type, has_or_had_extent_text, and is_or_was_documented_by.
2026-01-29 13:33:23 +01:00
kempersc
c60b523f29 Implement feature X to enhance user experience and fix bug Y in module Z 2026-01-29 00:12:27 +01:00
kempersc
f800e198ff Refactor code structure for improved readability and maintainability 2026-01-28 01:11:55 +01:00
kempersc
73b2d21bb3 Refactor code structure for improved readability and maintainability 2026-01-26 23:48:27 +01:00
kempersc
3da90b940e feat(schema): complete multiple slot_fixes.yaml migrations
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 2m4s
Session 2026-01-19: Completed remaining migrations per Rules 53/56/60.

Major migrations:
1. claim_type → has_or_had_type + ClaimType/ClaimTypes (60+ concrete types in 11 categories)
2. circumstances_of_death → is_deceased + DeceasedStatus + CauseOfDeath
3. claims_count → has_or_had_quantity + Quantity (with based_on_claim for provenance)
4. classification_status → has_or_had_type + ClassificationStatusType

Created files:
- ClaimType.yaml, ClaimTypes.yaml (abstract base + 60+ concrete subclasses)
- DeceasedStatus.yaml, CauseOfDeath.yaml, CauseOfDeathTypeEnum.yaml
- ClassificationStatus.yaml, ClassificationStatusType.yaml, ClassificationStatusTypes.yaml
- CITESAppendix.yaml, City.yaml, CertaintyLevel.yaml
- is_deceased.yaml, is_or_was_caused_by.yaml, based_on_claim.yaml

Archived slots:
- claim_type, circumstances_of_death, claims_count, classification_status

Added Rule 60 to AGENTS.md: No Migration Deferral - agents MUST execute all migrations.

All 527 slot_fixes.yaml entries now complete (100%).
2026-01-19 13:05:53 +01:00
kempersc
4319f38c05 Add archived slots for audience size, audience type, and capacity metrics
- Created new YAML files for audience size and audience type slots, defining their properties and annotations.
- Added archived capacity slots including cubic meters, linear meters, item count, and descriptions, with appropriate URIs and ranges.
- Introduced a template specificity slot for context-aware RAG filtering.
- Consolidated capacity-related slots into a unified structure, including has_or_had_capacity, capacity_type, and capacity_value, with detailed descriptions and examples.
2026-01-17 18:53:23 +01:00
kempersc
f9f3cc8e74 fix: resolve YAML import indentation and add missing slot descriptions
Schema Improvements:
- Fix YAML import indentation across 800+ class files (sed: '^- ../' → '  - ../')
- Add descriptions to 26 inline slots missing them (lint warnings)
- Fix malformed imports in BirthPlace.yaml and CustodianObservation.yaml

Validation Results:
- linkml-lint: 4 warnings (intentional SCREAMING_CASE tier names)
- gen-owl: SUCCESS (164,069 lines generated)
- gen-json-schema: SUCCESS (9.4MB generated)

Files affected: 1,034 files, +23,908 -15,200 lines
2026-01-16 00:09:28 +01:00
kempersc
6c3fa6b5a3 Remove deprecated slots and add new slot definitions for enhanced data modeling
- Deleted obsolete slot definitions for work_location and workshop_space.
- Introduced new TaxonName class to represent scientific taxonomic names with detailed attributes.
- Archived existing slots related to surname_prefix, target_name, taxon_name, terminal_count, text_region_count, title, title_proper, total_chapter, total_characters_extracted, total_connections_extracted, track_name, transcript_format, traveling_venue, type_label, type_status, typical_responsibility, unesco_domain, unesco_inscription_year, unesco_list_status, uniform_title, unit_name, used_by_custodian, uv_filtered_required, valid_from_geo, valid_to_geo, validation_status, variant_of_name, verification_date, viability_status, within_auxiliary_place, and within_place.
- Updated slot descriptions and structures to improve clarity and compliance with standards.
2026-01-15 11:42:35 +01:00
kempersc
d5d970b513 Remove deprecated slot definitions and add archived versions for future reference
- Deleted the following slot definitions:
  - wikidata_class_slot
  - wikidata_entity_label_slot
  - wikidata_mapping_rationale_slot
  - word_count_slot

- Added archived versions of the deleted slots to preserve historical data:
  - wikidata_class_archived_20260114.yaml
  - wikidata_entity_label_archived_20260114.yaml
  - wikidata_mapping_rationale_archived_20260114.yaml
  - word_count_archived_20260114.yaml

- Introduced a new hook `usePersonSearch` for enhanced semantic search functionality in the frontend, supporting debounced queries and caching.
2026-01-14 22:57:09 +01:00
kempersc
ad5fbe82cf Migrate valid_from and valid_to slots to temporal_extent across multiple classes
- Consolidated valid_from and valid_to slots into a single temporal_extent slot in FundingRequirement, GiftShop, OrganizationBranch, OrganizationalChangeEvent, OrganizationalStructure, SocialMediaProfile, Storage, StorageUnit classes.
- Updated slot definitions to use TimeSpan for temporal_extent, providing structured validity periods.
- Removed deprecated slots: valid_from, valid_to, verified_by, wikidata_entity_id, and worldcat_id, archiving their definitions for reference.
- Adjusted related documentation and examples to reflect the new temporal_extent structure.
2026-01-14 22:33:36 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
0d5d48568d refactor(schema): centralize slot definitions per Rule 38
- Remove slot_uri, description, mappings from slot_usage sections
- Move these properties to centralized slot files in modules/slots/
- Keep only class-specific overrides in slot_usage (required, inlined, examples)
- Update 1,499 centralized slot files with enriched definitions
- Clean 188 class files

Violations fixed:
- slot_uri in slot_usage: 1,676 → 0
- description in slot_usage: 2,287 → 0 (moved to centralized)

Schema still validates: 816 classes, 2028 slots, 127 enums
2026-01-11 23:27:17 +01:00
kempersc
5d3d8530b0 chore: trigger DSPy eval workflow
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 4m13s
2026-01-11 22:40:23 +01:00
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
kempersc
0393b321c9 refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43)
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml

Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
2026-01-09 10:55:21 +01:00
kempersc
b34992b1d3 Migrate all 293 class files to ontology-aligned slots
Extends migration to all class types (museums, libraries, galleries, etc.)

New slots added to class_metadata_slots.yaml:
- RiC-O: rico_record_set_type, rico_organizational_principle,
  rico_has_or_had_holder, rico_note
- Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt
- Scope: scope_includes, scope_excludes, custodian_only,
  organizational_level, geographic_restriction
- Notes: privacy_note, preservation_note, legal_note

Migration script now handles 30+ annotation types.
All migrated schemas pass linkml-validate.

Total: 387 class files now use proper slots instead of annotations.
2026-01-06 12:24:54 +01:00
kempersc
11983014bb Enhance specificity scoring system integration with existing infrastructure
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
2026-01-05 17:37:49 +01:00
kempersc
51554947a0 feat(schema): Add video content schema with comprehensive examples
Video Schema Classes (9 files):
- VideoPost, VideoComment: Social media video modeling
- VideoTextContent: Base class for text content extraction
- VideoTranscript, VideoSubtitle: Text with timing and formatting
- VideoTimeSegment: Time code handling with ISO 8601 duration
- VideoAnnotation: Base annotation with W3C Web Annotation alignment
- VideoAnnotationTypes: Scene, Object, OCR detection annotations
- VideoChapter, VideoChapterList: Navigation and chapter structure
- VideoAudioAnnotation: Speaker diarization, music, sound events

Enumerations (12 enums):
- VideoDefinitionEnum, LiveBroadcastStatusEnum
- TranscriptFormatEnum, SubtitleFormatEnum, SubtitlePositionEnum
- AnnotationTypeEnum, AnnotationMotivationEnum
- DetectionLevelEnum, SceneTypeEnum, TransitionTypeEnum, TextTypeEnum
- ChapterSourceEnum, AudioEventTypeEnum, SoundEventTypeEnum, MusicTypeEnum

Examples (904 lines, 10 comprehensive heritage-themed examples):
- Rijksmuseum virtual tour chapters (5 chapters with heritage entity refs)
- Operation Night Watch documentary chapters (5 chapters)
- VideoAudioAnnotation: curator interview, exhibition promo, museum lecture

All examples reference real heritage entities with Wikidata IDs:
Q5598 (Rembrandt), Q41264 (Vermeer), Q219831 (The Night Watch)
2025-12-16 20:03:17 +01:00