Commit graph

32 commits

Author SHA1 Message Date
kempersc
a590a8d94b Refactor and enhance descriptions across multiple YAML schemas for improved clarity and consistency.
- Updated descriptions in `WikidataOrganization`, `WikidataRecognition`, `WikidataResolvedEntities`, `WikidataSitelinks`, `WikidataSocialMedia`, `WikidataTemporal`, `WikidataTimeValue`, `WikidataWeb`, `WomensArchives`, `WomensArchivesRecordSetType`, `WomensArchivesRecordSetTypes`, `WordCount`, `WorkRevision`, `WorldCatIdentifier`, `WorldHeritageSite`, `WritingSystem`, `XPath`, `XPathScore`, `YoutubeChannel`, `YoutubeComment`, `YoutubeTranscript`, and `YoutubeVideo` to enhance readability and precision.
- Adjusted mappings and slot usage in various schemas to align with updated descriptions and improve data structure.
- Added new synonyms in multiple languages for better localization support.
2026-02-16 15:53:42 +01:00
kempersc
66adec257e Add scripts for normalizing LinkML schemas and validating schema integrity
- Implement `normalize_linkml_alt_descriptions.py` to convert structured alt_descriptions to the expected scalar form.
- Implement `normalize_linkml_structured_aliases.py` to flatten language-keyed structured_aliases into a standard list-of-objects format.
- Implement `validate_linkml_schema_integrity.py` to validate the integrity of LinkML schema bundles, checking for import resolution, YAML parsing, and reference existence.
2026-02-16 10:16:51 +01:00
kempersc
bd8368dfff Enhance schema definitions for cultural heritage annotations
- Updated WorldCatIdentifier class with improved descriptions and multilingual support.
- Refined WorldHeritageSite class to clarify its purpose and added structured aliases.
- Enhanced WritingSystem class with detailed descriptions and examples for script identification.
- Introduced BirthPlace class to represent birth locations with historical context and geographic identifiers.
- Added AnnotatorAnnotationMetadata class for quality metrics and verification of cultural heritage annotations.
- Created AnnotatorAnnotationProvenance class to track provenance of annotation activities.
- Developed AnnotatorBlock class to aggregate entity claims and metadata for annotations.
- Established AnnotatorEntityClaim class for individual assertions about cultural heritage entities.
- Introduced AnnotatorEntityClassification class for taxonomic categorization of entities.
- Added AnnotatorIntegrationNote class to document file creation and integration processes.
- Developed AnnotatorModel class for machine learning models used in entity extraction.
- Created AnnotatorProvenance class to track extraction provenance and source data access.
2026-02-15 12:41:51 +01:00
kempersc
fcd1c21c63 Add aliases and enhance slot definitions across various modules
- Added new aliases for existing slots to improve clarity and usability, including:
  - has_deadline: has_embargo_end_date
  - has_extent: has_extent_text
  - has_fonds: has_fond
  - has_laboratory: conservation_lab
  - has_language: has_iso_code639_1, has_iso_code639_3
  - has_legal_basis: legal_basis
  - has_light_exposure: max_light_lux
  - has_measurement_unit: has_unit
  - has_note: has_custodian_observation
  - has_occupation: occupation
  - has_operating_hours: has_operating_hours
  - has_position: position
  - has_quantity: has_artwork_count, link_count
  - has_roadmap: review_date
  - has_skill: skill
  - has_speaker: speaker_label
  - has_specification: specification_url
  - has_statement: rights_statement_url, rights_statement
  - has_type: custodian_only
  - has_user_category: serves_visitors_only
  - hold_record_set: record_count
  - identified_by: has_index_number
  - in_period: has_period
  - in_place: has_place
  - in_series: has_series
  - measure: has_measurement
  - measured_on: measurement_date
  - organized_by: has_organizer
  - originate_from: has_origin
  - part_of: suborganization_of
  - published_on: has_publication_date
  - receive_investment: has_investment
  - related_to: connection_heritage_type
  - require: preservation_requirement
  - safeguarded_by: current_keeper, record_holder_note
  - state: states_or_stated
  - take_comission: takes_or_took_comission
  - take_place_at: takes_or_took_place_at
  - transmit_through: transmits_or_transmitted_through
  - warrant: warrants_or_warranted

- Introduced a new slot definition for evaluated_through to capture evaluation methodologies and review statuses.
2026-02-14 14:41:49 +01:00
kempersc
820d3969bb Refactor code structure for improved readability and maintainability 2026-02-11 12:11:59 +01:00
kempersc
69a22e2b5a Refactor and expand LinkML slot definitions
- Deleted the `rights_statement_url` slot definition as it is no longer needed.
- Added multiple new slots including `has_legal_basis`, `has_statement`, `impose`, `pose_condition`, and `reviewed_through` with detailed descriptions and ontology alignments.
- Updated existing slots to improve clarity and consistency, including renaming `close_mappings` to `related_mappings` in several definitions.
- Enhanced the `require` slot with additional aliases for better usability.
- Improved documentation and comments across all slot definitions to clarify their purpose and usage.
2026-02-08 23:37:44 +01:00
kempersc
8f77f62585 update slot imports in classes 2026-02-08 19:22:13 +01:00
kempersc
90842851c2 Add slot definitions for 'updated_at' and 'written_in' with multilingual support and ontology alignment
- Created 'updated_at.yaml' to record the last modified date and time of entities, including multilingual descriptions and structured aliases.
- Created 'written_in.yaml' to specify the language in which content is composed, covering both natural and programming languages, with detailed comments and close ontology mappings.
2026-02-07 11:22:05 +01:00
kempersc
6435786556 edit slots 2026-02-04 00:24:46 +01:00
kempersc
a83f04d9c4 Refactor code structure for improved readability and maintainability 2026-02-02 15:57:17 +01:00
kempersc
fc405445c6 Refactor and update schema definitions
- Removed obsolete slots: `has_or_had_custodian_observation`, `provider`, and `specificity_annotation`.
- Updated `has_or_had_score` slot to use `SpecificityScore` class and modified its description and examples.
- Added new slots: `end_seconds`, `end_time`, `has_archive_path`, `has_or_had_custodian_name`, `protocol_name`, and `protocol_version`.
- Introduced a script `check_annotation_types.py` to validate the presence and structure of `custodian_types` in YAML files.
- Added a script `update_specificity.py` to automate updates related to `SpecificityAnnotation` to `SpecificityScore`.
2026-02-01 19:55:38 +01:00
kempersc
ca4a54181e Refactor schema files to improve clarity and maintainability
- Updated WorldCatIdentifier.yaml to remove unnecessary description and ensure consistent formatting.
- Enhanced WorldHeritageSite.yaml by breaking long description into multiple lines for better readability and removed unused attributes.
- Simplified WritingSystem.yaml by removing redundant attributes and ensuring consistent formatting.
- Cleaned up XPathScore.yaml by removing unnecessary attributes and ensuring consistent formatting.
- Improved YoutubeChannel.yaml by breaking long description into multiple lines for better readability.
- Enhanced YoutubeEnrichment.yaml by breaking long description into multiple lines for better readability.
- Updated YoutubeVideo.yaml to break long description into multiple lines and removed legacy field name.
- Refined has_or_had_affiliation.yaml by removing unnecessary comments and ensuring clarity.
- Cleaned up is_or_was_retrieved_at.yaml by removing unnecessary comments and ensuring clarity.
- Added rules for generic slots and avoiding rough edits in schema files to maintain structural integrity.
- Introduced changes_or_changed_through.yaml to define a new slot for linking entities to change events.
2026-01-31 00:46:23 +01:00
kempersc
0c5211e40a removed convenience slots 2026-01-31 00:15:53 +01:00
kempersc
14375c583e added hidden slots 2026-01-30 23:56:19 +01:00
kempersc
7cf10084b4 Implement scripts for schema modifications and ontology verification
- Added `fix_dual_class_link.py` to remove dual class link references from specified YAML files.
- Created `fix_specific_ghosts.py` to apply specific replacements in YAML files based on defined mappings.
- Introduced `migrate_staff_count.py` to migrate staff count references to a new structure in specified YAML files.
- Developed `migrate_type_slots.py` to replace type-related slots with new identifiers across YAML files.
- Implemented `scan_ghost_references.py` to identify and report ghost references to archived slots and classes in YAML files.
- Added `verify_ontology_terms.py` to verify the presence of ontology terms in specified ontology files against schema definitions.
2026-01-29 17:10:25 +01:00
kempersc
c60b523f29 Implement feature X to enhance user experience and fix bug Y in module Z 2026-01-29 00:12:27 +01:00
kempersc
7ea7e3d0d7 feat: Add new ontology and schema classes for Heritage and related concepts
- Introduced new classes: Heritage, HeritagePractice, HeritageRelevanceAssessment, HeritageRelevanceScore, HolySiteType, Mandate.
- Added slots for heritage-related attributes including has_or_had_confidence_measure, has_or_had_related_heritage_form, heritage_education, heritage_employer, heritage_mandate, heritage_practice, and more.
- Migrated existing attributes and ensured compliance with RiC-O naming conventions.
- Enhanced documentation and descriptions for clarity and usability.
- Archived previous versions of slots and classes to maintain schema integrity.
2026-01-28 08:06:56 +01:00
kempersc
f800e198ff Refactor code structure for improved readability and maintainability 2026-01-28 01:11:55 +01:00
kempersc
73b2d21bb3 Refactor code structure for improved readability and maintainability 2026-01-26 23:48:27 +01:00
kempersc
9342919c79 Add archived slot definitions for various attributes
- Introduced dual_class_role, emic_name, employer_linkedin_url, employer_name, employment_dates_raw, employment_end_date, employment_start_date, end_date, end_seconds, end_time, ended_at_time, endowment_draw, engagement_rate, enriched_date, enrichment_metadata_whatsapp, enrichment_method_whatsapp, exhibition_timespan, has_timespan, policy_effective_from, policy_effective_to, start_date, can_or_could_be_retrieved_from, documents_or_documented, has_or_had_contributor, has_or_had_drawer, has_or_had_email, has_or_had_endowment_draw, has_or_had_engagement_metric, has_or_had_metadata, has_or_had_summary, is_or_was_employed_by, and is_or_was_expired_at slots.
- Each slot includes detailed descriptions, ranges, and mappings to ensure compliance with ontology standards.
2026-01-26 17:32:24 +01:00
kempersc
4319f38c05 Add archived slots for audience size, audience type, and capacity metrics
- Created new YAML files for audience size and audience type slots, defining their properties and annotations.
- Added archived capacity slots including cubic meters, linear meters, item count, and descriptions, with appropriate URIs and ranges.
- Introduced a template specificity slot for context-aware RAG filtering.
- Consolidated capacity-related slots into a unified structure, including has_or_had_capacity, capacity_type, and capacity_value, with detailed descriptions and examples.
2026-01-17 18:53:23 +01:00
kempersc
f9f3cc8e74 fix: resolve YAML import indentation and add missing slot descriptions
Schema Improvements:
- Fix YAML import indentation across 800+ class files (sed: '^- ../' → '  - ../')
- Add descriptions to 26 inline slots missing them (lint warnings)
- Fix malformed imports in BirthPlace.yaml and CustodianObservation.yaml

Validation Results:
- linkml-lint: 4 warnings (intentional SCREAMING_CASE tier names)
- gen-owl: SUCCESS (164,069 lines generated)
- gen-json-schema: SUCCESS (9.4MB generated)

Files affected: 1,034 files, +23,908 -15,200 lines
2026-01-16 00:09:28 +01:00
kempersc
6c3fa6b5a3 Remove deprecated slots and add new slot definitions for enhanced data modeling
- Deleted obsolete slot definitions for work_location and workshop_space.
- Introduced new TaxonName class to represent scientific taxonomic names with detailed attributes.
- Archived existing slots related to surname_prefix, target_name, taxon_name, terminal_count, text_region_count, title, title_proper, total_chapter, total_characters_extracted, total_connections_extracted, track_name, transcript_format, traveling_venue, type_label, type_status, typical_responsibility, unesco_domain, unesco_inscription_year, unesco_list_status, uniform_title, unit_name, used_by_custodian, uv_filtered_required, valid_from_geo, valid_to_geo, validation_status, variant_of_name, verification_date, viability_status, within_auxiliary_place, and within_place.
- Updated slot descriptions and structures to improve clarity and compliance with standards.
2026-01-15 11:42:35 +01:00
kempersc
355d8be51d centralise slots 2026-01-12 14:33:56 +01:00
kempersc
0d5d48568d refactor(schema): centralize slot definitions per Rule 38
- Remove slot_uri, description, mappings from slot_usage sections
- Move these properties to centralized slot files in modules/slots/
- Keep only class-specific overrides in slot_usage (required, inlined, examples)
- Update 1,499 centralized slot files with enriched definitions
- Clean 188 class files

Violations fixed:
- slot_uri in slot_usage: 1,676 → 0
- description in slot_usage: 2,287 → 0 (moved to centralized)

Schema still validates: 816 classes, 2028 slots, 127 enums
2026-01-11 23:27:17 +01:00
kempersc
174a420c08 refactor(schema): centralize 1515 inline slot definitions per Rule 48
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 3m57s
- Remove inline slot definitions from 144 class files
- Create 7 new centralized slot files in modules/slots/:
  - custodian_type_broader.yaml
  - custodian_type_narrower.yaml
  - custodian_type_related.yaml
  - definition.yaml
  - finding_aid_access_restriction.yaml
  - finding_aid_description.yaml
  - finding_aid_temporal_coverage.yaml
- Add centralize_inline_slots.py automation script
- Update manifest with new timestamp

Rule 48: Class files must NOT define inline slots - all slots
must be imported from modules/slots/ directory.

Note: Pre-existing IdentifierFormat duplicate class definition
(in Standard.yaml and IdentifierFormat.yaml) not addressed in
this commit - requires separate schema refactor.
2026-01-11 22:02:14 +01:00
kempersc
626bd3a095 refactor(schemas): apply naming conventions to 261 class files
- Apply Rule 39: RiC-O style hasOrHad*/isOrWas* for temporal slots
- Apply Rule 43: Singular noun convention (keywords → keyword)
- Update slot references to match renamed slot files
- Maintain schema integrity across all class definitions
2026-01-10 15:36:33 +01:00
kempersc
0393b321c9 refactor(schema): unify custodian_type slots into has_or_had_custodian_type (Rule 39, 43)
- Migrate 236+ class files from custodian_types to has_or_had_custodian_type
- Archive deprecated slots: custodian_type, custodian_types, custodian_type_broader/narrower/related
- Update main schema and manifest imports
- Fix Custodian.yaml class to use new slot
- Fix annotation format (list→scalar) in has_or_had_custodian_type.yaml

Rules applied:
- Rule 39: RiC-O naming convention (hasOrHad pattern)
- Rule 43: Slot nouns must be singular (multivalued:true for cardinality)
- Rule 38: Slot centralization with semantic URI
2026-01-09 10:55:21 +01:00
kempersc
dfa667c90f Fix LinkML schema for valid RDF generation with proper slot_uri
Summary:
- Create 46 missing slot definition files with proper slot_uri values
- Add slot imports to main schema (01_custodian_name_modular.yaml)
- Fix YAML examples sections in 116+ class and slot files
- Fix PersonObservation.yaml examples section (nested objects → string literals)

Technical changes:
- All slots now have explicit slot_uri mapping to base ontologies (RiC-O, Schema.org, SKOS)
- Eliminates malformed URIs like 'custodian/:slot_name' in generated RDF
- gen-owl now produces valid Turtle with 153,166 triples

New slot files (46):
- RiC-O slots: rico_note, rico_organizational_principle, rico_has_or_had_holder, etc.
- Scope slots: scope_includes, scope_excludes, archive_scope
- Organization slots: organization_type, governance_authority, area_served
- Platform slots: platform_type_category, portal_type_category
- Social media slots: social_media_platform_category, post_type_*
- Type hierarchy slots: broader_type, narrower_types, custodian_type_broader
- Wikidata slots: wikidata_equivalent, wikidata_mapping

Generated output:
- schemas/20251121/rdf/01_custodian_name_modular_20260107_134534_clean.owl.ttl (6.9MB)
- Validated with rdflib: 153,166 triples, no malformed URIs
2026-01-07 13:48:03 +01:00
kempersc
b34992b1d3 Migrate all 293 class files to ontology-aligned slots
Extends migration to all class types (museums, libraries, galleries, etc.)

New slots added to class_metadata_slots.yaml:
- RiC-O: rico_record_set_type, rico_organizational_principle,
  rico_has_or_had_holder, rico_note
- Multilingual: label_de, label_es, label_fr, label_nl, label_it, label_pt
- Scope: scope_includes, scope_excludes, custodian_only,
  organizational_level, geographic_restriction
- Notes: privacy_note, preservation_note, legal_note

Migration script now handles 30+ annotation types.
All migrated schemas pass linkml-validate.

Total: 387 class files now use proper slots instead of annotations.
2026-01-06 12:24:54 +01:00
kempersc
11983014bb Enhance specificity scoring system integration with existing infrastructure
- Updated documentation to clarify integration points with existing components in the RAG pipeline and DSPy framework.
- Added detailed mapping of SPARQL templates to context templates for improved specificity filtering.
- Implemented wrapper patterns around existing classifiers to extend functionality without duplication.
- Introduced new tests for the SpecificityAwareClassifier and SPARQLToContextMapper to ensure proper integration and functionality.
- Enhanced the CustodianRDFConverter to include ISO country and subregion codes from GHCID for better geospatial data handling.
2026-01-05 17:37:49 +01:00
kempersc
767fb8ca80 feat(schema): Add LinkedIn profile and person modeling schema
Person Identity Classes:
- PersonName: Full name modeling with components (given_name, surname_prefix,
  base_surname, patronym, initials) following Dutch naming conventions
- PersonConnection: Professional network connections with heritage relevance scoring
- ConnectionNetwork: Network-level analysis and statistics

LinkedIn Profile Schema:
- LinkedInProfile: Complete professional profile structure
- WorkExperience: Employment history with heritage institution detection
- EducationCredential: Academic background and qualifications
- LanguageProficiency: Language skills with ISO 639-1 codes

Supporting Classes:
- ExtractionMetadata: Provenance tracking for extracted profile data
- HeritageRelevance: GLAMORCUBESFIXPHDNT type scoring and classification

Slots (17 person-related slots):
- Name components: given_name, base_surname, surname_prefix, patronym, initials
- Identity: age, birth_date, birth_place, death_place, gender_identity, pronouns
- Professional: occupation, religion
- References: literal_name, name_specification, has_person_name, extraction_metadata

Enums:
- HeritageTypeEnum: GLAMORCUBESFIXPHDNT type codes for heritage relevance
2025-12-16 20:04:59 +01:00