Compare commits

...

55 commits

Author SHA1 Message Date
kempersc
fcb704c97e Update generated timestamp in manifest.json
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 2m18s
DSPy RAG Evaluation / Layer 1 - Unit Tests (push) Successful in 5m16s
DSPy RAG Evaluation / Layer 2 - DSPy Module Tests (push) Successful in 7m6s
DSPy RAG Evaluation / Layer 3 - Integration Tests (push) Successful in 5m21s
DSPy RAG Evaluation / Layer 4 - Comprehensive Evaluation (push) Successful in 6m28s
DSPy RAG Evaluation / Quality Gate (push) Successful in 1s
2026-01-29 17:55:50 +01:00
kempersc
1516d509cf Add metadata to LinkML class definitions and update prefixes
- Added `id`, `name`, `title`, and `description` fields to multiple LinkML class YAML files.
- Standardized prefixes across all class definitions.
- Introduced a new script `fix_linkml_metadata.py` to automate the addition of metadata to class files.
- Updated existing class files to ensure compliance with the new metadata structure.
2026-01-29 17:40:47 +01:00
kempersc
7cf10084b4 Implement scripts for schema modifications and ontology verification
- Added `fix_dual_class_link.py` to remove dual class link references from specified YAML files.
- Created `fix_specific_ghosts.py` to apply specific replacements in YAML files based on defined mappings.
- Introduced `migrate_staff_count.py` to migrate staff count references to a new structure in specified YAML files.
- Developed `migrate_type_slots.py` to replace type-related slots with new identifiers across YAML files.
- Implemented `scan_ghost_references.py` to identify and report ghost references to archived slots and classes in YAML files.
- Added `verify_ontology_terms.py` to verify the presence of ontology terms in specified ontology files against schema definitions.
2026-01-29 17:10:25 +01:00
kempersc
2a349b11bb Update generated timestamp in manifest.json 2026-01-29 13:33:56 +01:00
kempersc
1f8776bef4 Update schemas and slots with new mappings and descriptions
- Updated manifest.json with new generated timestamp.
- Added close mappings to APIRequest and Administration classes.
- Renamed slots in AccessPolicy to has_or_had_embargo_end_date and has_or_had_embargo_reason.
- Changed class_uri for Accumulation to rico:AccumulationRelation and updated description.
- Added exact mappings to Altitude, AppellationType, and ArchitecturalStyle classes.
- Removed deprecated slots from CollectionManagementSystem and updated has_or_had_type.
- Added new slots for has_or_had_embargo_end_date and has_or_had_embargo_reason.
- Updated slot definitions for has_or_had_assessment, has_or_had_sequence_index, and others with new URIs and mappings.
- Removed unused slots end_seconds and end_time.
- Added new slot definitions for has_or_had_exhibition_type, has_or_had_extent_text, and is_or_was_documented_by.
2026-01-29 13:33:23 +01:00
kempersc
c60b523f29 Implement feature X to enhance user experience and fix bug Y in module Z 2026-01-29 00:12:27 +01:00
kempersc
68c2274e5f Refactor IdentificationEvent and StorageConditionPolicy schemas; remove attributes from IdentificationEvent and update humidity settings in StorageConditionPolicy. Add comprehensive ghost slots list. 2026-01-28 15:58:44 +01:00
kempersc
7b4e113a5a Refactor Access schema and related components
- Updated Access.yaml to replace properties with new slots: has_or_had_type, has_or_had_user_category, condition_of_access, and has_or_had_description.
- Introduced AccessTypeEnum for standardized access types.
- Migrated eligible_users to has_or_had_user_category slot.
- Adjusted examples in Access.yaml to reflect new slot structure.
- Modified AuxiliaryPlace, CustodianPlace, GeoSpatialPlace, ServiceArea schemas to use has_or_had_format for geometry representation.
- Added new slots and enums for better categorization and access control.
2026-01-28 15:11:20 +01:00
kempersc
4a518f587c Update generated timestamp in manifest.json 2026-01-28 15:04:44 +01:00
kempersc
c1946e93f9 Refactor VideoPost and WebObservation schemas; remove deprecated slots and migrate to new structures
- Updated VideoPost.yaml to include new slots and remove deprecated ones, enhancing video-specific properties.
- Removed extraction_confidence from WebObservation.yaml, streamlining the schema.
- Deleted obsolete slot files: characteristics.yaml, class_definition.yaml, confidence.yaml, confidence_method.yaml, confidence_score.yaml, confidence_value.yaml, count.yaml, and hosts_branch.yaml.
- Introduced ghost_slots.txt to track unused slots.
- Archived previous versions of characteristics, class_definition, confidence, confidence_method, confidence_score, confidence_value, count, and hosts_branch slots for historical reference.
- Added new slots: has_or_had_citation, has_or_had_city_code, and is_or_was_location_of with appropriate descriptions and mappings.
2026-01-28 15:04:11 +01:00
kempersc
c51b3e1cbf Refactor chapter slots and migrate to generic slots
- Deleted obsolete chapter slots: chapter_description, chapter_end_seconds, chapter_end_time, chapter_id, chapter_index, chapter_source, chapter_start_seconds, chapter_start_time, humidity_tolerance, parent_chapter_id.
- Archived previous versions of deleted slots for reference.
- Introduced new generic slots: end_seconds, end_time, has_or_had_parent, has_or_had_sequence_index to streamline schema and improve consistency.
- Updated descriptions and mappings for new slots to ensure clarity and maintain functionality.
2026-01-28 12:16:48 +01:00
kempersc
d69227897b Refactor StorageConditionPolicy and manage catalogues_or_cataloged slot
- Updated StorageConditionPolicy.yaml to include additional slots related to storage conditions, such as particulate_max and pest_management_required, enhancing the policy's comprehensiveness.
- Removed the obsolete catalogues_or_cataloged.yaml file to streamline the schema.
- Introduced a new archived version of catalogues_or_cataloged in catalogues_or_cataloged_archived_20260128.yaml, preserving the original structure and annotations for future reference.
2026-01-28 12:06:05 +01:00
kempersc
f3c0586d09 Refactor schema and slots for improved clarity and organization
- Updated `has_or_had_url` slot to allow a broader range of values by changing its range from `uriorcurie` to `Any`.
- Removed obsolete slots: `house_number`, `html_file`, `html_snapshot_path`, and `http_status_code`.
- Introduced new classes: `CeasingEvent`, `FileLocation`, `FilePath`, `HTMLFile`, `HTTPStatusCode`, `HouseNumber`, `MaximumHumidity`, `MinimumHumidity`, `TargetHumidity`, and `WKT` to better represent various concepts.
- Migrated existing slots to new structures, ensuring alignment with RiC-O naming conventions.
- Added new slots: `ceases_or_ceased_through`, `has_or_had_file_location`, `has_or_had_file_path`, `has_or_had_http_status`, and `is_or_was_observed_by` to capture additional metadata.
- Enhanced descriptions and annotations for clarity and context.
2026-01-28 10:49:49 +01:00
kempersc
fa5779bfd4 Refactor schema and slot definitions for heritage-related entities
- Added new slot `is_or_was_used_in` to `DataServiceEndpointType` for tracking usage in the heritage sector.
- Replaced `heritage_society_subtype` with `has_or_had_hyponym` in `HeritageSocietyType`.
- Updated `HistoricBuilding` to use `has_or_had_status` instead of `heritage_status`, linking to the new `HeritageStatus` class.
- Removed deprecated slots: `heritage_relevant_count`, `heritage_relevant_percentage`, `heritage_sector_usage`, `heritage_society_subtype`, and `heritage_status`.
- Introduced new classes `Connection`, `HeritageSector`, and `HeritageStatus` to better structure heritage-related data.
- Migrated relevant descriptions and annotations to align with new schema standards.
- Updated slot definitions to improve clarity and consistency across the schema.
2026-01-28 09:43:41 +01:00
kempersc
7ea7e3d0d7 feat: Add new ontology and schema classes for Heritage and related concepts
- Introduced new classes: Heritage, HeritagePractice, HeritageRelevanceAssessment, HeritageRelevanceScore, HolySiteType, Mandate.
- Added slots for heritage-related attributes including has_or_had_confidence_measure, has_or_had_related_heritage_form, heritage_education, heritage_employer, heritage_mandate, heritage_practice, and more.
- Migrated existing attributes and ensured compliance with RiC-O naming conventions.
- Enhanced documentation and descriptions for clarity and usability.
- Archived previous versions of slots and classes to maintain schema integrity.
2026-01-28 08:06:56 +01:00
kempersc
7992e8abaa Remove deprecated slot definitions and introduce new slots for height, width, x and y coordinates with temporal predicates. Add new classes for assessment and audit status types, along with a dataset class. Archive previous slot definitions and ensure proper migration documentation for the new slots. Update links to datasets and their registration status. 2026-01-28 01:27:24 +01:00
kempersc
f800e198ff Refactor code structure for improved readability and maintainability 2026-01-28 01:11:55 +01:00
kempersc
8c42292235 Add new classes and slots to the ontology
- Introduced GeospatialLocation class for specific geospatial locations.
- Added HandsOnFacility class representing facilities for hands-on experiences.
- Created Hyponym class for narrower terms or instances.
- Added ImagingEquipment class for imaging-related equipment.
- Introduced LoadingDock class for loading dock facilities.
- Created LocalCollection class for locally held collections.
- Added Locker class for storage lockers available to visitors/staff.
- Introduced MichelinStarRating class for Michelin star ratings.
- Created MicrofilmReader class for equipment used to read microfilms.
- Added OperationalArchive class for archives containing operational records.
- Introduced OperationalUnit class for operational units within organizations.
- Added has_or_had_archive slot for associating archives with entities.
- Created has_or_had_rating slot for ratings assigned to entities.
- Introduced has_or_had_section slot for sections or units within organizations.
- Added has_geospatial_location slot linking nominal places to precise geospatial coordinates.
2026-01-27 22:17:11 +01:00
kempersc
09674f7da2 Refactor schema slots and classes for improved consistency and clarity
- Renamed `has_or_had_auxiliary_entities` to `is_or_was_associated_with` in DigitalPlatform.yaml to align with naming conventions.
- Updated examples in DigitalPlatform.yaml to reflect new slot names and types.
- Migrated `has_av_equipment` to `has_or_had_equipment` in EducationCenter.yaml, including detailed descriptions and examples.
- Consolidated archival references by migrating `archival_reference` to `has_or_had_identifier` in InformationCarrier.yaml.
- Removed deprecated slots: `has_authority_file_name`, `has_authority_file_url`, `has_auxiliary_place`, `has_auxiliary_place_type`, `has_auxiliary_platform`, `has_auxiliary_platform_type`, and `has_av_equipment`, archiving their definitions.
- Updated slot fixes to reflect the migration of various slots to more generic or appropriate counterparts, ensuring all changes are documented with processing notes.
2026-01-27 11:39:06 +01:00
kempersc
7f57b3e4b8 feat: Update manifest generation timestamp and add new classes for AVEquipment, PlaceType, Platform, and PlatformType with associated slots 2026-01-27 10:46:16 +01:00
kempersc
b2840d5db4 Refactor manifest update script to dynamically scan and add YAML files for classes, slots, and enums; archive obsolete slots; add new slot definitions for various attributes including auction details, assessment categories, and authentication requirements; introduce new classes for aspect ratios, audits, and authority data; enhance slot descriptions and mappings for clarity and consistency. 2026-01-27 10:29:15 +01:00
kempersc
80eb3d969c Add new slots for heritage custodian ontology
- Introduced `has_api_version`, `has_appellation_language`, `has_appellation_type`, `has_appellation_value`, `has_applicable_country`, `has_application_deadline`, `has_application_opening_date`, `has_appraisal_note`, `has_approval_date`, `has_archdiocese_name`, `has_architectural_style`, `has_archival_reference`, `has_archive_description`, `has_archive_memento_uri`, `has_archive_name`, `has_archive_path`, `has_archive_search_score`, `has_arrangement`, `has_arrangement_level`, `has_arrangement_note`, `has_articles_archival_stage`, `has_articles_document_format`, `has_articles_document_url`, `has_articles_of_association`, `has_or_had_altitude`, `has_or_had_annotation`, `has_or_had_arrangement`, `has_or_had_document`, `has_or_had_reason`, `has_or_had_style`, `is_or_was_amended_through`, `is_or_was_approved_on`, `is_or_was_archived_as`, `is_or_was_due_on`, `is_or_was_opened_on`, and `is_or_was_used_in` slots.
- Each slot includes detailed descriptions, range specifications, and appropriate mappings to existing ontologies.
2026-01-27 10:07:16 +01:00
kempersc
140ef25b96 feat: Update manifest generation timestamp and migrate annotation-related classes 2026-01-27 09:04:51 +01:00
kempersc
b4d1a7677f feat: Migrate has_air_changes_per_hour to specifies_or_specified and create AirChanges and Ventilation classes 2026-01-27 09:03:22 +01:00
kempersc
3d7c52c1de feat: Migrate has_agreement_signed_date to is_or_was_based_on and add Agreement class and is_or_was_signed_on slot 2026-01-27 00:50:10 +01:00
kempersc
bdba9de593 feat: Add archived governance slots and update manifest generation timestamp 2026-01-27 00:49:30 +01:00
kempersc
73b2d21bb3 Refactor code structure for improved readability and maintainability 2026-01-26 23:48:27 +01:00
kempersc
9342919c79 Add archived slot definitions for various attributes
- Introduced dual_class_role, emic_name, employer_linkedin_url, employer_name, employment_dates_raw, employment_end_date, employment_start_date, end_date, end_seconds, end_time, ended_at_time, endowment_draw, engagement_rate, enriched_date, enrichment_metadata_whatsapp, enrichment_method_whatsapp, exhibition_timespan, has_timespan, policy_effective_from, policy_effective_to, start_date, can_or_could_be_retrieved_from, documents_or_documented, has_or_had_contributor, has_or_had_drawer, has_or_had_email, has_or_had_endowment_draw, has_or_had_engagement_metric, has_or_had_metadata, has_or_had_summary, is_or_was_employed_by, and is_or_was_expired_at slots.
- Each slot includes detailed descriptions, ranges, and mappings to ensure compliance with ontology standards.
2026-01-26 17:32:24 +01:00
kempersc
4fa0fd572f feat: Migrate document_type to structured classes and update related slots 2026-01-26 09:03:23 +01:00
kempersc
ec113e8811 Add new classes and slots for archival and educational metadata
- Introduced EADIdentifier, EBook, EcclesiasticalProvince, Edition, Editor, Education, EmailAddress, and Size classes to enhance archival description capabilities.
- Added slots for digital presence types, digital surrogates, digitization status, and dimensions to support comprehensive metadata management.
- Migrated existing slots such as ead_id, edition_number, and dimension to new structured formats.
- Established relationships between works and their editions, sizes, and editors to improve data interconnectivity.
- Enhanced ontology alignment with Schema.org and BIBFRAME standards for better interoperability.
2026-01-26 09:00:29 +01:00
kempersc
fba1ab9353 feat: Migrate multiple slots to structured classes and update processing notes 2026-01-26 01:41:04 +01:00
kempersc
48d89206f9 feat: Update generated timestamp in manifest.json 2026-01-25 12:48:21 +01:00
kempersc
776462de90 Migrate multiple slots to enhance semantic clarity and align with best practices
- Migrated catering_type to CateringType with subclasses for better classification.
- Updated certainty_level to has_or_had_level for improved metadata consistency.
- Addressed cessation_observed_in by confirming existing temporal data structure.
- Created NetAsset class and updated financial statements for richer financial modeling.
- Completed migrations for default_access_policy, default_audio_language, and default_language to structured classes.
- Migrated default_position to structured Alignment class for better representation.
- Updated defined_by_standard to broaden range for identifier standards.
- Migrated definition to structured Resolution class for video resolution modeling.
- Completed migrations for degree_name, deliverable, and departement_code to structured classes.
- Migrated deployment_date to structured DeploymentEvent with temporal extent.
- Migrated derived_from_entity and derived_from_observation to new reference structures.
- Completed description and description_text migrations to enhance content modeling.
- Migrated detection_count, detection_level, and detection_threshold to structured slots with classes.
- Migrated device-related slots to structured classes for better identification and classification.
- Added new slots and classes for historic building and web address modeling.
2026-01-25 12:47:38 +01:00
kempersc
511fc99847 feat: Add PriceRange, Publication, and TaxDeductibility classes
- Introduced PriceRange class to categorize price levels for hospitality services, including structured metadata for various price categories.
- Added Publication class to represent publication events, capturing details like publisher, publication place, and edition.
- Created TaxDeductibilityType as an abstract class for tax deductibility status, promoting previous enum values to a class hierarchy for richer metadata.
- Implemented TaxDeductibilityTypes with concrete subclasses detailing various tax deductibility statuses.
- Archived previous DeductibilityStatusEnum and related slots, transitioning to a more structured approach for tax deductibility classification.
- Updated multiple slot definitions to align with new class structures and naming conventions, including has_or_had_measurement and has_or_had_price.
- Enhanced documentation and examples across new and existing slots for clarity and compliance with naming conventions.
2026-01-24 17:41:06 +01:00
kempersc
9a0e56e23a feat: Update manifest generated timestamp and add new slot revisions in slot_fixes.yaml 2026-01-24 13:14:38 +01:00
kempersc
b61572f08a feat: Add CreationEvent, DatePrecision, IdentificationEvent, and Image classes with structured slots
- Introduced CreationEvent class to represent the creation of objects, including temporal extent, creator, and place of creation.
- Added DatePrecision class to indicate the precision level of date values, supporting various formats from day to century.
- Implemented IdentificationEvent class for taxonomic identification, capturing identification date, method, and confidence level.
- Created Image class for visual content representation, including URL and metadata for images used in collections.
- Archived previous slots related to card images and titles, replacing them with structured slots for better data representation.
- Enhanced slots for decommission dates, degree of certainty, and identification events to improve temporal data handling.
2026-01-23 23:15:43 +01:00
kempersc
4cdf9588b2 Refactor schema slots and introduce new classes for data sources and data tiers
- Added `range: string` to `connections_by_heritage_type` slot for better data representation.
- Removed obsolete `data_source_whatsapp`, `data_tier`, `date_retrieved`, and `de` slots from the schema.
- Updated `derived_from_observation` slot to support multiple values and changed range to `uriorcurie`.
- Introduced new `DataSource` class to represent various data sources with detailed descriptions and examples.
- Created `DataTierLevel` class to classify data quality tiers with standard codes and descriptions.
- Archived removed slots and updated the manifest to reflect these changes.
- Added new `was_retrieved_at` slot to track data retrieval timestamps, following RiC-O conventions.
2026-01-23 13:15:14 +01:00
kempersc
6bb8ac20ba feat: Add MainPart and OutputData classes with detailed specifications
- Introduced MainPart class to represent principal portions with quantified values, including attributes for part type and currency code.
- Added OutputData class to specify output characteristics from devices/services, including format, description, and destination URL.
- Created canonical_value, capacity, capacity_type, and capacity_value slots for enhanced data representation.
- Archived and migrated various slots related to data sensitivity, dataset descriptions, and titles to align with new structures.
- Implemented has_or_had_caption and has_or_had_main_part slots to support media accessibility and primary portion representation.
- Enhanced data license policy slot to define custodian data licensing and openness policies.
2026-01-23 11:04:15 +01:00
kempersc
479ceae715 feat: Migrate data_license_policy to has_or_had_policy; archive previous slot and update related schemas 2026-01-22 22:35:10 +01:00
kempersc
46cb4d40fa feat: Update manifest timestamps and migrate data_license_policy to has_or_had_policy; archive previous data_format slot 2026-01-22 22:19:52 +01:00
kempersc
849e5354cc feat: Migrate data_format to has_or_had_output; archive previous data_format slot and update related schemas 2026-01-22 22:17:35 +01:00
kempersc
4efaef60e4 feat: Migrate capacity_value and cut_count to structured has_or_had_quantity; archive previous slots and update related schemas 2026-01-22 22:16:35 +01:00
kempersc
3c9926956e feat: Update canonical value handling by migrating to structured CanonicalForm and archiving previous slot 2026-01-22 22:08:04 +01:00
kempersc
2a75ddf7cc feat: Add ConflictType and ConflictTypes schemas for heritage conflict taxonomy
- Introduced abstract class ConflictType to define a taxonomy for various conflict types affecting heritage institutions.
- Added concrete subclasses in ConflictTypes.yaml, detailing specific conflict types such as ArmedConflict, NaturalDisaster, CivilUnrest, Terrorism, Looting, Neglect, Occupation, and Sanctions.
- Implemented Permission and PermissionType schemas to represent authorization requirements for accessing heritage materials, including subclasses like BishopsPermission and InstitutionalAffiliation.
- Created SocialNetworkMember class for representing members in social/professional networks, facilitating heritage sector network analysis.
- Established slots for canonical access rules, conflict status, and connection metadata, enhancing the data model for heritage custodians.
- Developed ConnectionDegree and ConnectionDegreeType classes to represent degrees of connection in social networks, with subclasses for first, second, and third-plus degrees.
- Added slots for birth dates in EDTF and ISO formats, improving the representation of heritage custodian entities.
2026-01-22 20:41:06 +01:00
kempersc
be18d6761c feat: Update manifest generated timestamp and mark slot fixes as completed with detailed response 2026-01-22 20:17:33 +01:00
kempersc
821d040b9d feat: Update manifest generated timestamp and add new slot revisions for arrangement and document handling 2026-01-22 17:05:23 +01:00
kempersc
615910055a Add new slots and classes for heritage custodian ontology
- Introduced LastName class to represent Dutch surnames with sorting behavior based on base forms.
- Added address_formatted, amount, area_value, and base_surname slots for heritage custodian entities.
- Created benefits_provided, compliance_status, and component_type slots to enhance entity descriptions.
- Implemented condition, contains_or_contained, final_of_the_final, has_or_had_base, and has_or_had_component slots for better relationship modeling.
- Established initial_of_the_initial and poses_or_posed_condition slots for capturing temporal states and access conditions.
2026-01-22 16:51:41 +01:00
kempersc
2d09776856 Refactor StorageCondition schema: Migrate compliance_status to has_or_had_status with ComplianceStatus class
- Removed compliance_status slot and replaced it with has_or_had_status.
- Updated has_or_had_status to use ComplianceStatus for structured representation.
- Adjusted examples to reflect new structure for compliance status.
- Updated documentation to indicate migration and provide details on the ComplianceStatus class.
2026-01-22 16:22:16 +01:00
kempersc
1cd3704762 feat: Update generated timestamp in manifest and add ComplianceStatus class for structured compliance representation 2026-01-22 15:52:55 +01:00
kempersc
4c3978ab2f feat: Migrate community_significance and frame_sample_rate slots to new structures
- Removed community_significance slot and migrated its functionality to has_or_had_significance, utilizing the Significance class for structured representation.
- Introduced has_or_had_significance slot with detailed examples and descriptions.
- Archived community_significance slot and its YAML file.
- Removed frame_sample_rate slot, migrating its functionality to the analyzes_or_analyzed slot, now supporting the VideoFrame class for frame analysis.
- Created VideoFrame class to encapsulate frame analysis parameters, including sample rate and total frames processed.
- Updated relevant schemas and examples to reflect these changes, ensuring compliance with migration rules.
- Regenerated manifest to include new structures and updated counts.
2026-01-22 15:51:02 +01:00
kempersc
ba2c766dd0 Add new slots and update existing ones following RiC-O temporal naming conventions
- Introduced `founding_date`, `founding_date_diocese`, and `fr` slots for capturing founding dates and French language text.
- Created `collects_or_collected`, `has_or_had_objective`, `has_or_had_percentage`, `has_or_had_place`, `has_or_had_reply`, `has_or_had_web_page`, `is_or_was_acquired_by`, `is_or_was_appreciated`, `is_or_was_founded_through`, `is_or_was_part_of`, `is_or_was_part_of_total`, `start_of_the_start`, `takes_or_took_comission`, and `was_fetched_at` slots to enhance data modeling capabilities.
- Each slot includes detailed descriptions, examples, and ontology alignments to ensure clarity and usability.
- Migration notes added for slots transitioned from previous definitions to maintain historical context and facilitate understanding of changes.
2026-01-22 15:15:56 +01:00
kempersc
367aaffc27 feat(schema): update generated timestamp in manifest and add Locality class with structured locality descriptions 2026-01-22 13:00:07 +01:00
kempersc
f8205cbc75 Refactor code structure for improved readability and maintainability 2026-01-22 12:52:29 +01:00
kempersc
8d9817c99a feat(schema): update generated timestamp in manifest and add slot revisions in slot_fixes.yaml 2026-01-20 12:48:38 +01:00
kempersc
b32efc208e feat(schema): migrate collection_focus to has_or_had_category; archive collection_focus slot 2026-01-19 16:56:34 +01:00
7354 changed files with 395029 additions and 168409 deletions


@ -0,0 +1,46 @@
# Rule: Do Not Delete From slot_fixes.yaml
**Identifier**: `no-deletion-from-slot-fixes`
**Severity**: **CRITICAL**
## Core Directive
**NEVER delete entries from `slot_fixes.yaml`.**
The `slot_fixes.yaml` file serves as the historical record and audit trail for all schema migrations. Removing entries destroys this history and violates the project's data integrity principles.
## Workflow
When processing a migration:
1. **Do NOT Remove**: Never delete the entry for the slot you are working on.
2. **Update `processed`**: Instead, update the `processed` block:
   * Set `status: true`.
   * Set `date` to the current date (YYYY-MM-DD).
   * Add a detailed `notes` string explaining what was done (e.g., "Fully migrated to [new_slot] + [Class] (Rule 53). [File].yaml updated. Slot archived.").
3. **Preserve History**: The entry must remain in the file permanently as a record of the migration.
## Rationale
* **Audit Trail**: We need to know what was migrated, when, and how.
* **Reversibility**: If a migration introduces a bug, the record helps us understand the original state.
* **Completeness**: The file tracks the total progress of the schema refactoring project.
## Example
**WRONG (Deletion)**:
```yaml
# DELETED from file
# - original_slot_id: ...
```
**CORRECT (Update)**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/has_some_slot
  processed:
    status: true
    date: '2026-01-27'
    notes: Fully migrated to has_or_had_new_slot + NewClass (Rule 53).
  revision:
    ...
```
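
The update-instead-of-delete workflow could be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual tooling: it assumes the entries of `slot_fixes.yaml` have already been parsed into a list of dicts, and `mark_processed` is a hypothetical helper name.

```python
from datetime import date


def mark_processed(entries, slot_id, notes, today=None):
    """Return a copy of `entries` with the matching entry marked as processed.

    Hypothetical helper: no entry is ever dropped, only annotated.
    """
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    updated = []
    found = False
    for entry in entries:
        if entry.get("original_slot_id") == slot_id:
            # Copy the entry and attach (or overwrite) its `processed` block.
            entry = {
                **entry,
                "processed": {"status": True, "date": stamp, "notes": notes},
            }
            found = True
        updated.append(entry)
    if not found:
        raise KeyError(f"no entry for {slot_id}")
    return updated
```

The function returns a new list of the same length as its input, so the audit trail is preserved by construction; writing the result back to YAML is left out of the sketch.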


@ -0,0 +1,32 @@
# Rule: Preserve Bespoke Slots Until Refactoring
**Identifier**: `preserve-bespoke-slots-until-refactoring`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT remove or migrate "additional" bespoke slots during generic migration passes unless they are the specific target of the current task.**
## Context
When migrating a specific slot (e.g., `has_approval_date`), you may encounter other bespoke or legacy slots in the same class file (e.g., `innovation_budget`, `operating_budget`).
**YOU MUST**:
* ✅ Migrate ONLY the specific slot you were instructed to work on.
* ✅ Leave other bespoke slots exactly as they are.
* ✅ Focus strictly on the current migration target.
**YOU MUST NOT**:
* ❌ Proactively migrate "nearby" slots just because they look like they need refactoring.
* ❌ Remove slots that seem unused or redundant without specific instruction.
* ❌ "Clean up" the class file by removing legacy attributes.
## Rationale
Refactoring is a separate, planned phase. Mixing opportunistic refactoring with systematic slot migration increases the risk of regression and makes changes harder to review. The working principle is: "We will refactor those later."
## Workflow
1. **Identify Target**: Identify the specific slot(s) assigned for migration (from `slot_fixes.yaml` or user prompt).
2. **Execute Migration**: Apply changes ONLY for those slots.
3. **Ignore Others**: Do not touch other slots in the file, even if they violate other rules (like Rule 39 or Rule 53). Those will be handled in their own dedicated tasks.
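
As a rough illustration of this workflow, the sketch below renames only the slots explicitly listed as targets and passes every other slot through unchanged. The function name `migrate_target_slots` and the rename mapping are hypothetical and used only for illustration; the slot names come from the examples in this rule and the commit messages.

```python
def migrate_target_slots(class_slots, renames, targets):
    """Rename only the slots in `targets`; leave all other slots as they are.

    `renames` may know about more slots than the current task covers;
    anything outside `targets` is deliberately ignored.
    """
    result = []
    for slot in class_slots:
        if slot in targets and slot in renames:
            result.append(renames[slot])
        else:
            result.append(slot)  # bespoke or legacy slot: do not touch
    return result


# Assumed example data: only has_approval_date is the current task.
slots = ["has_approval_date", "innovation_budget", "operating_budget"]
renames = {
    "has_approval_date": "is_or_was_approved_on",  # hypothetical mapping
    "innovation_budget": "has_or_had_budget",      # hypothetical, out of scope
}
migrated = migrate_target_slots(slots, renames, targets={"has_approval_date"})
```

Keeping the target set separate from the rename mapping means a known-but-unscheduled rename cannot be applied by accident.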


@ -0,0 +1,29 @@
# Rule: Slot Fixes File is Authoritative
**Scope:** Schema Migration / Slot Fixes
**Description:**
The file `/Users/kempersc/apps/glam/data/fixes/slot_fixes.yaml` is the **single authoritative source** for tracking slot migrations and fixes.
**Directives:**
1. **Authoritative Source:** Always read and update `/Users/kempersc/apps/glam/data/fixes/slot_fixes.yaml`. Do NOT use `schemas/.../slot_fixes.yaml` as the master list. You may need to sync the two files if they diverge, but the `data/fixes` version takes precedence.
2. **Processed Status:** When a slot migration is completed (schema updated, data migrated), you MUST update the entry in `slot_fixes.yaml` with a `processed` block containing:
   * `status: true`
   * `date: 'YYYY-MM-DD'`
   * `notes`: Brief description of what was done.
3. **NEVER DELETE:** You MUST NOT delete entries from `slot_fixes.yaml`. Even if a slot is removed from the schema, the record of its fix MUST remain in this file with `status: true`.
4. **Format Compliance:** New slots added during migration must follow proper LinkML format conventions and use `slot_uri` and mappings (`exact_mappings`, `close_mappings`) that reference **legitimate predicates and classes found in `/Users/kempersc/apps/glam/data/ontology/`**.
**Example of Processed Entry:**
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/has_old_slot
  revision:
    - label: has_new_slot
      type: slot
    - label: NewClass
      type: class
  processed:
    status: true
    date: '2026-01-27'
    notes: Migrated to has_new_slot + NewClass. Old slot archived.
```
```
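
Directives 2 and 3 lend themselves to an automated check. The following sketch compares two parsed versions of the file and reports deleted entries and incomplete `processed` blocks. It is an assumption-laden illustration: `find_violations` is a hypothetical name, and both versions are assumed to be already loaded as lists of dicts.

```python
REQUIRED_KEYS = ("status", "date", "notes")


def find_violations(before, after):
    """List rule violations between two parsed versions of slot_fixes.yaml."""
    problems = []
    after_ids = {entry.get("original_slot_id") for entry in after}

    # Directive 3: every entry present before must still be present.
    for entry in before:
        slot_id = entry.get("original_slot_id")
        if slot_id not in after_ids:
            problems.append(f"deleted entry: {slot_id}")

    # Directive 2: a `processed` block must carry all three fields.
    for entry in after:
        processed = entry.get("processed")
        if processed is None:
            continue  # not migrated yet, nothing to check
        missing = [key for key in REQUIRED_KEYS if key not in processed]
        if missing:
            slot_id = entry.get("original_slot_id")
            problems.append(
                f"{slot_id}: processed block missing {', '.join(missing)}"
            )
    return problems
```

An empty return value means neither directive was violated; a check like this could run before committing changes to the file.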


@ -0,0 +1,68 @@
# Rule 62: Verified Ontology Terms Reference
🚨 **CRITICAL**: All `class_uri`, `slot_uri`, and mapping properties (`exact_mappings`, `close_mappings`, etc.) MUST use verified classes and predicates that exist in the local ontology files at `data/ontology/`.
## 1. Verified Ontology Files
The following ontologies are locally available in `data/ontology/`. Always verify terms against these specific files. **NO HALLUCINATIONS ALLOWED.**
**Mandatory Verification Step**: Before using any `class_uri`, `slot_uri`, or mapping URI, you MUST `grep` the term in the local ontology file to confirm it exists.
| Prefix | Namespace | Local File | Key Classes/Predicates (Verified) |
|--------|-----------|------------|-----------------------------------|
| `cpov:` | `http://data.europa.eu/m8g/` | `core-public-organisation-ap.ttl` | `PublicOrganisation`, `contactPage`, `email` |
| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | `CIDOC_CRM_v7.1.3.rdf` | `E1_CRM_Entity`, `E5_Event`, `P2_has_type` |
| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | `RiC-O_1-1.rdf` | `Record`, `Agent`, `hasOrHadHolder` (Note: Use v1.1 file) |
| `pico:` | `https://personsincontext.org/model#` | `pico.ttl` | `PersonObservation`, `role` |
| `prov:` | `http://www.w3.org/ns/prov#` | `prov.ttl` | `Activity`, `Agent`, `wasGeneratedBy` |
| `skos:` | `http://www.w3.org/2004/02/skos/core#` | `skos.rdf` | `Concept`, `prefLabel`, `broader` |
| `schema:` | `https://schema.org/` | `frontend/public/ontology/schemaorg.owl` | `Organization`, `Place`, `name`, `url` |
| `dcterms:` | `http://purl.org/dc/terms/` | `dublin_core_elements.rdf` | `identifier`, `title`, `description` |
| `org:` | `http://www.w3.org/ns/org#` | `org.rdf` | `Organization`, `hasMember` |
| `tooi:` | `https://identifier.overheid.nl/tooi/def/ont/` | `tooiont.ttl` | `Overheidsorganisatie` |
| `dcat:` | `http://www.w3.org/ns/dcat#` | `dcat3.ttl` | `Dataset`, `Catalog`, `dataset` |
| `gn:` | `https://www.geonames.org/ontology#` | `geonames_ontology.rdf` | `Feature` |
| `dqv:` | `http://www.w3.org/ns/dqv#` | `dqv.ttl` | `QualityMeasurement`, `hasQualityAnnotation` |
| `premis:` | `http://www.loc.gov/premis/rdf/v3/` | `premis3.owl` | `fixity`, `storedAt`, `Event` |
## 2. Verification Procedure (MANDATORY)
**You MUST verify every term.** Do not assume a term exists just because it sounds standard.
```bash
# 1. Identify the source ontology file
ls data/ontology/
# 2. Grep for the specific term (e.g., 'hasFixity')
grep "hasFixity" data/ontology/premis3.owl
# Result: EMPTY -> Term does not exist! DO NOT USE.
# 3. Grep for the correct term (e.g., 'fixity')
grep "fixity" data/ontology/premis3.owl
# Result: <owl:ObjectProperty rdf:about=".../fixity"> -> Term exists. USE THIS.
```
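The same check can be scripted when many terms need verifying. The sketch below is an assumption-laden stand-in for the `grep` step: `term_exists` is a hypothetical helper, and the sample file content is made up for the demonstration rather than copied from the real `premis3.owl`. It matches whole tokens, which is slightly stricter than a plain `grep`.

```python
import re
import tempfile
from pathlib import Path


def term_exists(ontology_file, local_name):
    """Return True if `local_name` occurs as a whole token in the ontology file."""
    text = Path(ontology_file).read_text(encoding="utf-8", errors="replace")
    # Reject matches that are embedded in a longer identifier.
    pattern = rf"(?<![A-Za-z0-9_]){re.escape(local_name)}(?![A-Za-z0-9_])"
    return re.search(pattern, text) is not None


# Made-up one-line ontology file, used only to exercise the helper.
sample = '<owl:ObjectProperty rdf:about="http://www.loc.gov/premis/rdf/v3/fixity"/>\n'
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "premis3.owl"
    path.write_text(sample, encoding="utf-8")
    found = term_exists(path, "fixity")
    missing = term_exists(path, "hasFixity")
```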
## 3. LinkML Mapping Requirements
Mappings must be precise and verified.
* `exact_mappings` = `skos:exactMatch` (Semantic equivalence)
* `close_mappings` = `skos:closeMatch` (Near equivalence)
* `related_mappings` = `skos:relatedMatch` (Association)
* `broad_mappings` = `skos:broadMatch` (Broader concept)
* `narrow_mappings` = `skos:narrowMatch` (Narrower concept)
## 4. Prohibited/Invalid Terms (Hallucinations)
Do NOT use these commonly hallucinated or incorrect terms. They have been verified as **non-existent** in our local ontologies:
* ❌ `dqv:ConfidenceScore` (Use `dqv:QualityMeasurement`)
* ❌ `premis:hasFixity` (Use `premis:fixity`)
* ❌ `premis:hasFrameRate` (Verify specific PREMIS properties first)
* ❌ `schema:HeritageBuilding` (Use `schema:LandmarksOrHistoricalBuildings`)
* ❌ `rico:has_provenance` (Use `rico:history`)
* ❌ `rico:hasProvenance` (Use `rico:history`)
* ❌ `schema:archive` (Use `schema:archiveHeld` or `schema:archivedAt`)
**Always verify against the local file content.**
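A lookup table is one way to catch the prohibited terms listed above before they reach a schema file. The sketch below is illustrative only: `find_prohibited` and the dict layout are assumptions, and only the entries with a single stated replacement are included (`premis:hasFrameRate` and `schema:archive` are left out because the list gives no single substitute for them).

```python
# Replacements copied from the prohibited-terms list above.
PROHIBITED_TERMS = {
    "dqv:ConfidenceScore": "dqv:QualityMeasurement",
    "premis:hasFixity": "premis:fixity",
    "schema:HeritageBuilding": "schema:LandmarksOrHistoricalBuildings",
    "rico:has_provenance": "rico:history",
    "rico:hasProvenance": "rico:history",
}

MAPPING_KEYS = (
    "exact_mappings",
    "close_mappings",
    "related_mappings",
    "broad_mappings",
    "narrow_mappings",
)


def find_prohibited(definition):
    """Return (key, term, suggestion) for each prohibited CURIE in a definition dict."""
    candidates = []
    for key in ("class_uri", "slot_uri"):
        if key in definition:
            candidates.append((key, definition[key]))
    for key in MAPPING_KEYS:
        for term in definition.get(key, []):
            candidates.append((key, term))
    return [(key, term, PROHIBITED_TERMS[term]) for key, term in candidates if term in PROHIBITED_TERMS]


# Hypothetical slot definition, already parsed into a dict.
slot = {
    "slot_uri": "premis:hasFixity",
    "exact_mappings": ["rico:history"],
    "close_mappings": ["rico:hasProvenance"],
}
problems = find_prohibited(slot)
```

A table like this complements, but does not replace, the check against the local ontology files.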


@ -4806,3 +4806,16 @@ def test_historical_addition():
**Schema Version**: v0.2.1 (modular)
**Last Updated**: 2025-12-08
**Maintained By**: GLAM Data Extraction Project
### Rule 61: Slot Fixes Authoritative File
🚨 **CRITICAL**: The file `/Users/kempersc/apps/glam/data/fixes/slot_fixes.yaml` is the AUTHORITATIVE source for slot migrations. NEVER delete entries from this file. Always mark completed migrations with `processed: {status: true}`.
**See**: `.opencode/rules/slot-fixes-authoritative-rule.md` for complete documentation
### Rule 62: Verified Ontology Terms Reference
🚨 **CRITICAL**: All `class_uri`, `slot_uri`, and mappings MUST use verified classes and predicates from local ontology files in `data/ontology/`.
**See**: `.opencode/rules/verified-ontology-terms.md` for the list of verified ontologies and verification procedures.

5
archived_classes.txt Normal file

@ -0,0 +1,5 @@
DualClassLink_archived_20260126.yaml
EducationCredential_archived_20260125.yaml
EducationEntry_archived_20260125.yaml
RealnessStatus_archived_20260114.yaml
TemplateSpecificityScores_archived_20260117.yaml

1146
archived_slots.txt Normal file

File diff suppressed because it is too large

1146
archived_slots_refresh.txt Normal file

File diff suppressed because it is too large


@ -42,6 +42,8 @@ RUN useradd -m -u 1000 -s /bin/bash glam
# Install Python dependencies first (better layer caching)
COPY requirements.txt .
# Install CPU-only PyTorch first to avoid massive CUDA download and runtime issues
RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code

File diff suppressed because it is too large


@ -407,7 +407,7 @@ class Settings:
# RAG uses only Qdrant (vectors) and Oxigraph (SPARQL) for retrieval
# LLM Configuration
-anthropic_api_key: str = os.getenv("ANTHROPIC_API_KEY", "")
+anthropic_api_key: str = os.getenv("ANTHROPIC_API_KEY", "") or os.getenv("CLAUDE_API_KEY", "")
openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
huggingface_api_key: str = os.getenv("HUGGINGFACE_API_KEY", "")
groq_api_key: str = os.getenv("GROQ_API_KEY", "")
@ -1660,6 +1660,7 @@ class MultiSourceRetriever:
only_heritage_relevant: bool = False,
only_wcms: bool = False,
using: str | None = None,
extra_filters: dict[str, Any] | None = None,
) -> list[Any]:
"""Search for persons/staff in the heritage_persons collection.
@ -1672,20 +1673,29 @@ class MultiSourceRetriever:
only_heritage_relevant: Only return heritage-relevant staff
only_wcms: Only return WCMS-registered profiles
using: Optional embedding model to use (e.g., 'minilm_384', 'openai_1536')
extra_filters: Optional extra filters for Qdrant
Returns:
List of RetrievedPerson objects
"""
if self.qdrant:
try:
-return self.qdrant.search_persons(  # type: ignore[no-any-return]
-    query=query,
-    k=k,
-    filter_custodian=filter_custodian,
-    only_heritage_relevant=only_heritage_relevant,
-    only_wcms=only_wcms,
-    using=using,
-)
+# Dynamically check if qdrant.search_persons supports extra_filters
+# This handles case where HybridRetriever signature varies
+import inspect
+sig = inspect.signature(self.qdrant.search_persons)
+kwargs = {
+    "query": query,
+    "k": k,
+    "filter_custodian": filter_custodian,
+    "only_heritage_relevant": only_heritage_relevant,
+    "only_wcms": only_wcms,
+    "using": using,
+}
+if "extra_filters" in sig.parameters:
+    kwargs["extra_filters"] = extra_filters
+return self.qdrant.search_persons(**kwargs)  # type: ignore[no-any-return]
except Exception as e:
logger.error(f"Person search failed: {e}")
return []
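The signature check in this hunk can be isolated as a small helper. The sketch below generalizes it to drop any unsupported keyword, whereas the diff only guards `extra_filters`; all names here (`call_with_supported_kwargs`, `legacy_search`, `filtered_search`) are hypothetical stand-ins, not code from the repository.

```python
import inspect


def call_with_supported_kwargs(func, **kwargs):
    """Call `func`, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if not accepts_var_kw:
        kwargs = {name: value for name, value in kwargs.items() if name in params}
    return func(**kwargs)


def legacy_search(query, k=10):
    # Hypothetical backend whose signature has no extra_filters parameter.
    return ("legacy", query, k)


def filtered_search(query, k=10, extra_filters=None):
    # Hypothetical backend that does accept extra_filters.
    return ("filtered", query, k, extra_filters)


old = call_with_supported_kwargs(legacy_search, query="nos", k=5, extra_filters={"email": "x"})
new = call_with_supported_kwargs(filtered_search, query="nos", k=5, extra_filters={"email": "x"})
```

Silently dropping a filter keeps older backends working, at the cost that the caller cannot tell whether the filter was applied.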
@ -2752,16 +2762,68 @@ async def person_search(request: PersonSearchRequest) -> PersonSearchResponse:
)
try:
# Augment query for better recall on domain names if it looks like a domain search
# "nos" -> "nos email domain nos" to guide vector search towards email addresses
search_query = request.query
extra_filters = None
# 1. Email/Domain Detection Logic
is_email_like = "@" in search_query
# Check for common TLDs at the end of any word in the query
common_tlds = ('.nl', '.com', '.org', '.net', '.eu', '.be', '.de', '.edu', '.gov', '.uk', '.fr', '.it', '.es')
is_domain_like = any(word.endswith(common_tlds) for word in search_query.lower().split())
# 2. Construct Filters
if is_email_like or is_domain_like:
logger.info(f"[PersonSearch] Email/Domain pattern detected: '{search_query}'")
# If explicit @ is present, we might want to strip leading @ for better matching
# e.g. "@nos.nl" -> "nos.nl"
clean_term = search_query.strip()
if clean_term.startswith("@"):
clean_term = clean_term[1:]
# Apply MatchText filter on email field
# This prioritizes email matches by strictly filtering for them first
extra_filters = {"email": {"match": {"text": clean_term}}}
elif len(search_query.split()) == 1 and len(search_query) > 2:
# Heuristic: single word queries might be domain searches (e.g. "nos", "leiden")
# We use MatchText filtering on email field to find substring matches
# Qdrant "match": {"text": "nos"} performs token-based matching
extra_filters = {"email": {"match": {"text": search_query}}}
logger.info(f"[PersonSearch] Potential domain search detected for '{search_query}'. Applying strict email filter: {extra_filters}")
logger.info(f"[PersonSearch] Executing search for '{search_query}' (extra_filters={extra_filters})")
# Use the hybrid retriever's person search
results = retriever.search_persons(
-query=request.query,
+query=search_query,
k=request.k,
filter_custodian=request.filter_custodian,
only_heritage_relevant=request.only_heritage_relevant,
only_wcms=request.only_wcms,
using=request.embedding_model, # Pass embedding model
extra_filters=extra_filters,
)
# FALLBACK: If strict domain filter yielded no results, try standard vector search
# This fixes the issue where searching for names like "willem" (which look like domains)
# would fail because they don't appear in emails.
if extra_filters and not results:
logger.info(f"[PersonSearch] No results with email filter for '{search_query}'. Falling back to standard vector search.")
results = retriever.search_persons(
query=search_query,
k=request.k,
filter_custodian=request.filter_custodian,
only_heritage_relevant=request.only_heritage_relevant,
only_wcms=request.only_wcms,
using=request.embedding_model,
extra_filters=None, # Disable filter for fallback
)
logger.info(f"[PersonSearch] Fallback search returned {len(results)} results")
logger.info(f"[PersonSearch] Final result count: {len(results)}")
# Determine which embedding model was actually used
embedding_model_used = None
qdrant = retriever.qdrant
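The filter construction and fallback in this hunk can be read as two small pure functions. The sketch below is a simplified illustration, not the endpoint code: `build_email_filter`, `search_with_fallback`, and `fake_search` are hypothetical names, and the TLD test follows the comment's stated intent (a TLD at the end of a word).

```python
COMMON_TLDS = (".nl", ".com", ".org", ".net", ".eu", ".be", ".de", ".edu", ".gov", ".uk", ".fr", ".it", ".es")


def build_email_filter(query):
    """Return a filter dict for email/domain-like queries, otherwise None."""
    term = query.strip()
    words = term.lower().split()
    is_email_like = "@" in term
    is_domain_like = any(word.endswith(COMMON_TLDS) for word in words)
    if is_email_like or is_domain_like:
        # "@nos.nl" -> "nos.nl"
        return {"email": {"match": {"text": term.removeprefix("@")}}}
    if len(words) == 1 and len(term) > 2:
        # Single-word queries might be domain searches (e.g. "nos").
        return {"email": {"match": {"text": term}}}
    return None


def search_with_fallback(search, query):
    """Run the filtered search first; retry unfiltered if it returns nothing."""
    extra_filters = build_email_filter(query)
    results = search(query, extra_filters)
    if extra_filters and not results:
        results = search(query, None)
    return results


def fake_search(query, extra_filters):
    # Hypothetical stand-in for the retriever: the filtered call finds nothing.
    return [] if extra_filters else [f"vector hit for {query}"]


hits = search_with_fallback(fake_search, "willem")
```

Because every single-word query longer than two characters gets the email filter, a plain name such as "willem" costs two searches; the fallback is what keeps such queries from returning empty.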
@ -3501,6 +3563,21 @@ async def dspy_query(request: DSPyQueryRequest) -> DSPyQueryResponse:
logger.info(f"LLM provider requested: {requested_provider} (request.llm_provider={request.llm_provider}, server default={settings.llm_provider})")
# Check if requested provider has API key configured - fail early if not
provider_api_keys = {
"zai": settings.zai_api_token,
"groq": settings.groq_api_key,
"anthropic": settings.anthropic_api_key,
"openai": settings.openai_api_key,
"huggingface": settings.huggingface_api_key,
}
if requested_provider in provider_api_keys and not provider_api_keys[requested_provider]:
raise ValueError(
f"LLM provider '{requested_provider}' was requested but its API key is not configured. "
f"Please set the appropriate environment variable (e.g., ANTHROPIC_API_KEY or CLAUDE_API_KEY for anthropic)."
)
# Provider configuration priority: requested provider first, then fallback chain
providers_to_try = [requested_provider]
# Add fallback chain (but not duplicates)
@ -4201,6 +4278,22 @@ async def stream_dspy_query_response(
llm_model_used: str | None = None
lm = None
# Check if requested provider has API key configured - fail early if not
provider_api_keys = {
"zai": settings.zai_api_token,
"groq": settings.groq_api_key,
"anthropic": settings.anthropic_api_key,
"openai": settings.openai_api_key,
"huggingface": settings.huggingface_api_key,
}
if requested_provider in provider_api_keys and not provider_api_keys[requested_provider]:
yield emit_error(
f"LLM provider '{requested_provider}' was requested but its API key is not configured. "
f"Please set the appropriate environment variable (e.g., ANTHROPIC_API_KEY or CLAUDE_API_KEY for anthropic)."
)
return
providers_to_try = [requested_provider]
for fallback in ["zai", "groq", "anthropic", "openai"]:
if fallback not in providers_to_try:
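The fail-early check and the fallback chain from the two hunks above can be sketched as standalone functions. The names `check_provider_key` and `build_provider_chain` are hypothetical. Appending each fallback to the chain is an assumption, since the hunk ends before the loop body; the comment "Add fallback chain (but not duplicates)" suggests it.

```python
def check_provider_key(requested_provider, provider_api_keys):
    """Return an error message if the provider is known but has no key, else None."""
    if requested_provider in provider_api_keys and not provider_api_keys[requested_provider]:
        return (
            f"LLM provider '{requested_provider}' was requested "
            "but its API key is not configured."
        )
    return None


def build_provider_chain(requested_provider, fallbacks=("zai", "groq", "anthropic", "openai")):
    """Requested provider first, then the fallbacks, without duplicates."""
    chain = [requested_provider]
    for fallback in fallbacks:
        if fallback not in chain:
            chain.append(fallback)  # assumed loop body, not shown in the diff
    return chain


# Illustrative key table; the values are placeholders, not real credentials.
keys = {"anthropic": "", "groq": "placeholder-key"}
error = check_provider_key("anthropic", keys)
chain = build_provider_chain("anthropic")
```

Providers missing from the key table pass the check unchanged, which matches the `in provider_api_keys` guard in the diff.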


@ -0,0 +1,846 @@
"""
Multi-Embedding Retriever for Heritage Data
Supports multiple embedding models using Qdrant's named vectors feature.
This enables:
- A/B testing different embedding models
- Cost optimization (cheap local embeddings vs paid API embeddings)
- Gradual migration between embedding models
- Fallback when one model is unavailable
Supported Embedding Models:
- openai_1536: text-embedding-3-small (1536-dim, $0.02/1M tokens)
- minilm_384: all-MiniLM-L6-v2 (384-dim, free/local)
- bge_768: bge-base-en-v1.5 (768-dim, free/local, high quality)
Collection Architecture:
Each collection has named vectors for each embedding model:
heritage_custodians:
vectors:
"openai_1536": VectorParams(size=1536)
"minilm_384": VectorParams(size=384)
payload: {name, ghcid, institution_type, ...}
heritage_persons:
vectors:
"openai_1536": VectorParams(size=1536)
"minilm_384": VectorParams(size=384)
payload: {name, headline, custodian_name, ...}
Usage:
retriever = MultiEmbeddingRetriever()
# Search with default model (auto-select based on availability)
results = retriever.search("museums in Amsterdam")
# Search with specific model
results = retriever.search("museums in Amsterdam", using="minilm_384")
# A/B test comparison
comparison = retriever.compare_models("museums in Amsterdam")
"""
import hashlib
import logging
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Literal
logger = logging.getLogger(__name__)
class EmbeddingModel(str, Enum):
    """Supported embedding models with their configurations."""

    OPENAI_1536 = "openai_1536"
    MINILM_384 = "minilm_384"
    BGE_768 = "bge_768"

    @property
    def dimension(self) -> int:
        """Get the vector dimension for this model."""
        dims = {
            "openai_1536": 1536,
            "minilm_384": 384,
            "bge_768": 768,
        }
        return dims[self.value]

    @property
    def model_name(self) -> str:
        """Get the actual model name for loading."""
        names = {
            "openai_1536": "text-embedding-3-small",
            "minilm_384": "all-MiniLM-L6-v2",
            "bge_768": "BAAI/bge-base-en-v1.5",
        }
        return names[self.value]

    @property
    def is_local(self) -> bool:
        """Check if this model runs locally (no API calls)."""
        return self.value in ("minilm_384", "bge_768")

    @property
    def cost_per_1m_tokens(self) -> float:
        """Approximate cost per 1M tokens (0 for local models)."""
        costs = {
            "openai_1536": 0.02,
            "minilm_384": 0.0,
            "bge_768": 0.0,
        }
        return costs[self.value]
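Because the enum's values double as Qdrant vector names, a vector name can be turned into a model by calling the enum, and an unknown name raises `ValueError`. The sketch below shows that pattern with a trimmed stand-in class; `Model` and `parse_vector_names` are hypothetical, and the stand-in parses the dimension from the value suffix instead of using the lookup table above.

```python
from enum import Enum


class Model(str, Enum):
    """Trimmed stand-in for EmbeddingModel, for illustration only."""

    OPENAI_1536 = "openai_1536"
    MINILM_384 = "minilm_384"
    BGE_768 = "bge_768"

    @property
    def dimension(self) -> int:
        # Assumption of this sketch: the suffix after the last "_" is the dimension.
        return int(self.value.rsplit("_", 1)[1])


def parse_vector_names(names):
    """Split vector names into recognised models and unknown names."""
    known, unknown = [], []
    for name in names:
        try:
            known.append(Model(name))
        except ValueError:
            unknown.append(name)
    return known, unknown


known, unknown = parse_vector_names(["minilm_384", "custom_512"])
```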
@dataclass
class MultiEmbeddingConfig:
    """Configuration for multi-embedding retriever."""

    # Qdrant connection
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333
    qdrant_https: bool = False
    qdrant_prefix: str | None = None

    # API keys
    openai_api_key: str | None = None

    # Default embedding model preference order
    # First available model is used if no explicit model is specified
    model_preference: list[EmbeddingModel] = field(default_factory=lambda: [
        EmbeddingModel.MINILM_384,   # Free, fast, good quality
        EmbeddingModel.OPENAI_1536,  # Higher quality, paid
        EmbeddingModel.BGE_768,      # Free, high quality, slower
    ])

    # Collection names
    institutions_collection: str = "heritage_custodians"
    persons_collection: str = "heritage_persons"

    # Search defaults
    default_k: int = 10
class MultiEmbeddingRetriever:
"""Retriever supporting multiple embedding models via Qdrant named vectors.
This class manages multiple embedding models and allows searching with
any available model. It handles:
- Model lazy-loading
- Automatic model selection based on availability
- Named vector creation and search
- A/B testing between models
"""
def __init__(self, config: MultiEmbeddingConfig | None = None):
"""Initialize multi-embedding retriever.
Args:
config: Configuration options. If None, uses environment variables.
"""
self.config = config or self._config_from_env()
# Lazy-loaded clients
self._qdrant_client = None
self._openai_client = None
self._st_models: dict[str, Any] = {} # Sentence transformer models
# Track available models per collection
self._available_models: dict[str, set[EmbeddingModel]] = {}
# Track whether each collection uses named vectors (vs single unnamed vector)
self._uses_named_vectors: dict[str, bool] = {}
logger.info(f"MultiEmbeddingRetriever initialized with preference: {[m.value for m in self.config.model_preference]}")
@staticmethod
def _config_from_env() -> MultiEmbeddingConfig:
"""Create configuration from environment variables."""
use_production = os.getenv("QDRANT_USE_PRODUCTION", "false").lower() == "true"
if use_production:
return MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_PROD_HOST", "bronhouder.nl"),
qdrant_port=443,
qdrant_https=True,
qdrant_prefix=os.getenv("QDRANT_PROD_PREFIX", "qdrant"),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
else:
return MultiEmbeddingConfig(
qdrant_host=os.getenv("QDRANT_HOST", "localhost"),
qdrant_port=int(os.getenv("QDRANT_PORT", "6333")),
openai_api_key=os.getenv("OPENAI_API_KEY"),
)
@property
def qdrant_client(self):
"""Lazy-load Qdrant client."""
if self._qdrant_client is None:
from qdrant_client import QdrantClient
if self.config.qdrant_https:
self._qdrant_client = QdrantClient(
host=self.config.qdrant_host,
port=self.config.qdrant_port,
https=True,
prefix=self.config.qdrant_prefix,
prefer_grpc=False,
timeout=30,
)
logger.info(f"Connected to Qdrant: https://{self.config.qdrant_host}/{self.config.qdrant_prefix or ''}")
else:
self._qdrant_client = QdrantClient(
host=self.config.qdrant_host,
port=self.config.qdrant_port,
)
logger.info(f"Connected to Qdrant: {self.config.qdrant_host}:{self.config.qdrant_port}")
return self._qdrant_client
@property
def openai_client(self):
"""Lazy-load OpenAI client."""
if self._openai_client is None:
if not self.config.openai_api_key:
raise RuntimeError("OpenAI API key not configured")
import openai
self._openai_client = openai.OpenAI(api_key=self.config.openai_api_key)
return self._openai_client
def _load_sentence_transformer(self, model: EmbeddingModel) -> Any:
"""Lazy-load a sentence-transformers model.
Args:
model: The embedding model to load
Returns:
Loaded SentenceTransformer model
"""
if model.value not in self._st_models:
try:
from sentence_transformers import SentenceTransformer
self._st_models[model.value] = SentenceTransformer(model.model_name)
logger.info(f"Loaded sentence-transformers model: {model.model_name}")
except ImportError:
raise RuntimeError(
"sentence-transformers not installed. Run: pip install sentence-transformers"
)
return self._st_models[model.value]
def get_embedding(self, text: str, model: EmbeddingModel) -> list[float]:
"""Get embedding vector for text using specified model.
Args:
text: Text to embed
model: Embedding model to use
Returns:
Embedding vector as list of floats
"""
if model == EmbeddingModel.OPENAI_1536:
response = self.openai_client.embeddings.create(
input=text,
model=model.model_name,
)
return response.data[0].embedding
elif model in (EmbeddingModel.MINILM_384, EmbeddingModel.BGE_768):
st_model = self._load_sentence_transformer(model)
embedding = st_model.encode(text)
return embedding.tolist()
else:
raise ValueError(f"Unknown embedding model: {model}")
def get_embeddings_batch(
self,
texts: list[str],
model: EmbeddingModel,
batch_size: int = 32,
) -> list[list[float]]:
"""Get embedding vectors for multiple texts.
Args:
texts: List of texts to embed
model: Embedding model to use
batch_size: Batch size for processing
Returns:
List of embedding vectors
"""
if not texts:
return []
if model == EmbeddingModel.OPENAI_1536:
# OpenAI batch API (max 2048 per request)
all_embeddings = []
for i in range(0, len(texts), 2048):
batch = texts[i:i + 2048]
response = self.openai_client.embeddings.create(
input=batch,
model=model.model_name,
)
batch_embeddings = [item.embedding for item in sorted(response.data, key=lambda x: x.index)]
all_embeddings.extend(batch_embeddings)
return all_embeddings
elif model in (EmbeddingModel.MINILM_384, EmbeddingModel.BGE_768):
st_model = self._load_sentence_transformer(model)
embeddings = st_model.encode(texts, batch_size=batch_size, show_progress_bar=len(texts) > 100)
return embeddings.tolist()
else:
raise ValueError(f"Unknown embedding model: {model}")
def get_available_models(self, collection_name: str) -> set[EmbeddingModel]:
"""Get the embedding models available for a collection.
Checks which named vectors exist in the collection.
For single-vector collections, returns models matching the dimension.
Args:
collection_name: Name of the Qdrant collection
Returns:
Set of available EmbeddingModel values
"""
if collection_name in self._available_models:
return self._available_models[collection_name]
try:
info = self.qdrant_client.get_collection(collection_name)
vectors_config = info.config.params.vectors
available = set()
uses_named_vectors = False
# Check for named vectors (dict of vector configs)
if isinstance(vectors_config, dict):
# Named vectors - each key is a vector name
uses_named_vectors = True
for vector_name in vectors_config.keys():
try:
model = EmbeddingModel(vector_name)
available.add(model)
except ValueError:
logger.warning(f"Unknown vector name in collection: {vector_name}")
else:
# Single unnamed vector - check dimension to find compatible model
# Note: This doesn't mean we can use `using=model.value` in queries
uses_named_vectors = False
if hasattr(vectors_config, 'size'):
dim = vectors_config.size
for model in EmbeddingModel:
if model.dimension == dim:
available.add(model)
# Store both available models and whether named vectors are used
self._available_models[collection_name] = available
self._uses_named_vectors[collection_name] = uses_named_vectors
if uses_named_vectors:
logger.info(f"Collection '{collection_name}' uses named vectors: {[m.value for m in available]}")
else:
logger.info(f"Collection '{collection_name}' uses single vector (compatible with: {[m.value for m in available]})")
return available
except Exception as e:
logger.warning(f"Could not get available models for {collection_name}: {e}")
return set()
def uses_named_vectors(self, collection_name: str) -> bool:
"""Check if a collection uses named vectors (vs single unnamed vector).
Args:
collection_name: Name of the Qdrant collection
Returns:
True if collection has named vectors, False for single-vector collections
"""
# Ensure models are loaded (populates _uses_named_vectors)
self.get_available_models(collection_name)
return self._uses_named_vectors.get(collection_name, False)
    def select_model(
        self,
        collection_name: str,
        preferred: EmbeddingModel | None = None,
    ) -> EmbeddingModel | None:
        """Select the best available embedding model for a collection.

        Args:
            collection_name: Name of the collection
            preferred: Preferred model (used if available)

        Returns:
            Selected EmbeddingModel or None if none available
        """
        available = self.get_available_models(collection_name)

        if not available:
            # No named vectors - check if we can use any model
            # This happens for legacy single-vector collections
            try:
                info = self.qdrant_client.get_collection(collection_name)
                vectors_config = info.config.params.vectors

                # Get vector dimension
                dim = None
                if hasattr(vectors_config, 'size'):
                    dim = vectors_config.size
                elif isinstance(vectors_config, dict):
                    # Get first vector config
                    first_config = next(iter(vectors_config.values()), None)
                    if first_config and hasattr(first_config, 'size'):
                        dim = first_config.size

                if dim:
                    for model in self.config.model_preference:
                        if model.dimension == dim:
                            return model
            except Exception:
                pass
            return None

        # If preferred model is available, use it
        if preferred and preferred in available:
            return preferred

        # Otherwise, follow preference order
        for model in self.config.model_preference:
            if model in available:
                # Check if model is usable (has API key if needed)
                if model == EmbeddingModel.OPENAI_1536 and not self.config.openai_api_key:
                    continue
                return model

        return None
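The selection order in `select_model` reduces to a short pure function once the Qdrant lookups are taken out. The sketch below is a simplified illustration using plain strings; `pick_model` is a hypothetical name, and the dimension-based fallback for legacy single-vector collections is deliberately omitted.

```python
def pick_model(available, preference, preferred=None, openai_key=None):
    """Pick a model name: the preferred one if available, else the first usable preference."""
    if not available:
        return None
    if preferred in available:
        return preferred
    for model in preference:
        if model not in available:
            continue
        # The OpenAI model is only usable when an API key is configured.
        if model == "openai_1536" and not openai_key:
            continue
        return model
    return None


preference = ["minilm_384", "openai_1536", "bge_768"]
available = {"openai_1536", "bge_768"}

without_key = pick_model(available, preference)
with_key = pick_model(available, preference, openai_key="placeholder-key")
explicit = pick_model(available, preference, preferred="bge_768", openai_key="placeholder-key")
```

As in the method above, an explicitly preferred model is returned without the API-key check, so a caller asking for `openai_1536` without a key will only see the failure at embedding time.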
def search(
self,
query: str,
collection_name: str | None = None,
k: int | None = None,
using: EmbeddingModel | str | None = None,
filter_conditions: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Search for similar documents using specified or auto-selected model.
Args:
query: Search query text
collection_name: Collection to search (default: institutions)
k: Number of results
using: Embedding model to use (auto-selected if None)
filter_conditions: Optional Qdrant filter conditions
Returns:
List of results with scores and payloads
"""
collection_name = collection_name or self.config.institutions_collection
k = k or self.config.default_k
# Resolve model
if using is not None:
if isinstance(using, str):
model = EmbeddingModel(using)
else:
model = using
else:
model = self.select_model(collection_name)
if model is None:
raise RuntimeError(f"No compatible embedding model for collection '{collection_name}'")
logger.info(f"Searching '{collection_name}' with {model.value}: {query[:50]}...")
# Get query embedding
query_vector = self.get_embedding(query, model)
# Build filter
from qdrant_client.http import models
query_filter = None
if filter_conditions:
query_filter = models.Filter(
must=[
models.FieldCondition(
key=key,
match=models.MatchValue(value=value),
)
for key, value in filter_conditions.items()
]
)
# Check if collection uses named vectors (not just single unnamed vector)
# Only pass `using=model.value` if collection has actual named vectors
use_named_vector = self.uses_named_vectors(collection_name)
# Search
if use_named_vector:
results = self.qdrant_client.query_points(
collection_name=collection_name,
query=query_vector,
using=model.value,
limit=k,
with_payload=True,
query_filter=query_filter,
)
else:
# Legacy single-vector search
results = self.qdrant_client.query_points(
collection_name=collection_name,
query=query_vector,
limit=k,
with_payload=True,
query_filter=query_filter,
)
return [
{
"id": str(point.id),
"score": point.score,
"model": model.value,
"payload": point.payload or {},
}
for point in results.points
]
def search_persons(
self,
query: str,
k: int | None = None,
using: EmbeddingModel | str | None = None,
filter_custodian: str | None = None,
only_heritage_relevant: bool = False,
only_wcms: bool = False,
) -> list[dict[str, Any]]:
"""Search for persons/staff in the heritage_persons collection.
Args:
query: Search query text
k: Number of results
using: Embedding model to use
filter_custodian: Optional custodian slug to filter by
only_heritage_relevant: Only return heritage-relevant staff
only_wcms: Only return WCMS-registered profiles (heritage sector users)
Returns:
List of person results with scores
"""
k = k or self.config.default_k
# Build filters
filters = {}
if filter_custodian:
filters["custodian_slug"] = filter_custodian
if only_wcms:
filters["has_wcms"] = True
# Search with over-fetch for post-filtering
results = self.search(
query=query,
collection_name=self.config.persons_collection,
k=k * 2,
using=using,
filter_conditions=filters if filters else None,
)
# Post-filter for heritage_relevant if needed
if only_heritage_relevant:
results = [r for r in results if r.get("payload", {}).get("heritage_relevant", False)]
# Format results
formatted = []
for r in results[:k]:
payload = r.get("payload", {})
formatted.append({
"person_id": payload.get("staff_id", "") or hashlib.md5(
f"{payload.get('custodian_slug', '')}:{payload.get('name', '')}".encode()
).hexdigest()[:16],
"name": payload.get("name", ""),
"headline": payload.get("headline"),
"custodian_name": payload.get("custodian_name"),
"custodian_slug": payload.get("custodian_slug"),
"location": payload.get("location"),
"heritage_relevant": payload.get("heritage_relevant", False),
"heritage_type": payload.get("heritage_type"),
"linkedin_url": payload.get("linkedin_url"),
"score": r["score"],
"model": r["model"],
})
return formatted
def compare_models(
self,
query: str,
collection_name: str | None = None,
k: int = 10,
models: list[EmbeddingModel] | None = None,
) -> dict[str, Any]:
"""A/B test comparison of multiple embedding models.
Args:
query: Search query
collection_name: Collection to search
k: Number of results per model
models: Models to compare (default: all available)
Returns:
Dict with results per model and overlap analysis
"""
collection_name = collection_name or self.config.institutions_collection
# Determine which models to compare
available = self.get_available_models(collection_name)
if models:
models_to_test = [m for m in models if m in available]
else:
models_to_test = list(available)
if not models_to_test:
return {"error": "No models available for comparison"}
results = {}
all_ids = {}
for model in models_to_test:
try:
model_results = self.search(
query=query,
collection_name=collection_name,
k=k,
using=model,
)
results[model.value] = model_results
all_ids[model.value] = {r["id"] for r in model_results}
except Exception as e:
results[model.value] = {"error": str(e)}
all_ids[model.value] = set()
# Calculate overlap between models
overlap = {}
model_values = list(all_ids.keys())
for i, m1 in enumerate(model_values):
for m2 in model_values[i + 1:]:
if all_ids[m1] and all_ids[m2]:
intersection = all_ids[m1] & all_ids[m2]
union = all_ids[m1] | all_ids[m2]
jaccard = len(intersection) / len(union) if union else 0
overlap[f"{m1}_vs_{m2}"] = {
"jaccard_similarity": round(jaccard, 3),
"common_results": len(intersection),
"total_unique": len(union),
}
return {
"query": query,
"collection": collection_name,
"k": k,
"results": results,
"overlap_analysis": overlap,
}
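The overlap analysis in `compare_models` is a pairwise Jaccard similarity over result IDs, J(A, B) = |A ∩ B| / |A ∪ B|. The sketch below isolates that calculation; `overlap_analysis` is a hypothetical helper and the IDs are made up.

```python
from itertools import combinations


def overlap_analysis(ids_by_model):
    """Pairwise Jaccard similarity between the result-ID sets of each model."""
    overlap = {}
    for m1, m2 in combinations(ids_by_model, 2):
        a, b = ids_by_model[m1], ids_by_model[m2]
        if not a or not b:
            continue  # skip pairs where a model returned nothing (or failed)
        common, union = a & b, a | b
        overlap[f"{m1}_vs_{m2}"] = {
            "jaccard_similarity": round(len(common) / len(union), 3),
            "common_results": len(common),
            "total_unique": len(union),
        }
    return overlap


# Made-up result IDs for two models.
ids = {
    "minilm_384": {"a", "b", "c"},
    "openai_1536": {"b", "c", "d"},
}
analysis = overlap_analysis(ids)
```

Here two of four distinct IDs are shared, so the similarity is 0.5. A low value means the models surface different documents for the same query, which is the signal an A/B comparison is looking for.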
    def create_multi_embedding_collection(
        self,
        collection_name: str,
        models: list[EmbeddingModel] | None = None,
    ) -> bool:
        """Create a new collection with named vectors for multiple embedding models.

        Args:
            collection_name: Name for the new collection
            models: Embedding models to support (default: all)

        Returns:
            True if created successfully
        """
        from qdrant_client.http.models import Distance, VectorParams

        models = models or list(EmbeddingModel)
        vectors_config = {
            model.value: VectorParams(
                size=model.dimension,
                distance=Distance.COSINE,
            )
            for model in models
        }

        try:
            self.qdrant_client.create_collection(
                collection_name=collection_name,
                vectors_config=vectors_config,
            )
            logger.info(f"Created multi-embedding collection '{collection_name}' with {[m.value for m in models]}")

            # Clear both per-collection caches so the next lookup re-reads the new config
            self._available_models.pop(collection_name, None)
            self._uses_named_vectors.pop(collection_name, None)
            return True
        except Exception as e:
            logger.error(f"Failed to create collection: {e}")
            return False
def add_documents_multi_embedding(
self,
documents: list[dict[str, Any]],
collection_name: str,
models: list[EmbeddingModel] | None = None,
batch_size: int = 100,
) -> int:
"""Add documents with embeddings from multiple models.
Args:
documents: List of documents with 'text' and optional 'metadata' fields
collection_name: Target collection
models: Models to generate embeddings for (default: all available)
batch_size: Batch size for processing
Returns:
Number of documents added
"""
from qdrant_client.http import models as qmodels
# Determine which models to use
available = self.get_available_models(collection_name)
if models:
models_to_use = [m for m in models if m in available]
else:
models_to_use = list(available)
if not models_to_use:
raise RuntimeError(f"No embedding models available for collection '{collection_name}'")
# Filter valid documents
valid_docs = [d for d in documents if d.get("text")]
total_indexed = 0
for i in range(0, len(valid_docs), batch_size):
batch = valid_docs[i:i + batch_size]
texts = [d["text"] for d in batch]
# Generate embeddings for each model
embeddings_by_model = {}
for model in models_to_use:
try:
embeddings_by_model[model] = self.get_embeddings_batch(texts, model)
except Exception as e:
logger.warning(f"Failed to get {model.value} embeddings: {e}")
if not embeddings_by_model:
continue
# Create points with named vectors
points = []
for j, doc in enumerate(batch):
text = doc["text"]
metadata = doc.get("metadata", {})
point_id = doc.get("id") or hashlib.md5(text.encode()).hexdigest()
# Build named vectors dict
vectors = {}
for model, model_embeddings in embeddings_by_model.items():
vectors[model.value] = model_embeddings[j]
points.append(qmodels.PointStruct(
id=point_id,
vector=vectors,
payload={
"text": text,
**metadata,
}
))
# Upsert batch
self.qdrant_client.upsert(
collection_name=collection_name,
points=points,
)
total_indexed += len(points)
logger.info(f"Indexed {total_indexed}/{len(valid_docs)} documents with {len(models_to_use)} models")
return total_indexed
    def get_stats(self) -> dict[str, Any]:
        """Get statistics about collections and available models.
        Returns:
            Dict with collection stats and model availability
        """
        stats = {
            "config": {
                "qdrant_host": self.config.qdrant_host,
                "qdrant_port": self.config.qdrant_port,
                "model_preference": [m.value for m in self.config.model_preference],
                "openai_available": bool(self.config.openai_api_key),
            },
            "collections": {},
        }
        for collection_name in [self.config.institutions_collection, self.config.persons_collection]:
            try:
                info = self.qdrant_client.get_collection(collection_name)
                available_models = self.get_available_models(collection_name)
                selected_model = self.select_model(collection_name)
                stats["collections"][collection_name] = {
                    "vectors_count": info.vectors_count,
                    "points_count": info.points_count,
                    "status": info.status.value if info.status else "unknown",
                    "available_models": [m.value for m in available_models],
                    "selected_model": selected_model.value if selected_model else None,
                }
            except Exception as e:
                stats["collections"][collection_name] = {"error": str(e)}
        return stats
    def close(self):
        """Close all connections."""
        if self._qdrant_client:
            self._qdrant_client.close()
            self._qdrant_client = None
        self._st_models.clear()
        self._available_models.clear()
        self._uses_named_vectors.clear()
def create_multi_embedding_retriever(use_production: bool | None = None) -> MultiEmbeddingRetriever:
    """Factory function to create a MultiEmbeddingRetriever.
    Args:
        use_production: If True, connect to production Qdrant.
            Defaults to QDRANT_USE_PRODUCTION env var.
    Returns:
        Configured MultiEmbeddingRetriever instance
    """
    if use_production is None:
        use_production = os.getenv("QDRANT_USE_PRODUCTION", "").lower() in ("true", "1", "yes")
    if use_production:
        config = MultiEmbeddingConfig(
            qdrant_host=os.getenv("QDRANT_PROD_HOST", "bronhouder.nl"),
            qdrant_port=443,
            qdrant_https=True,
            qdrant_prefix=os.getenv("QDRANT_PROD_PREFIX", "qdrant"),
            openai_api_key=os.getenv("OPENAI_API_KEY"),
        )
    else:
        config = MultiEmbeddingConfig(
            qdrant_host=os.getenv("QDRANT_HOST", "localhost"),
            qdrant_port=int(os.getenv("QDRANT_PORT", "6333")),
            openai_api_key=os.getenv("OPENAI_API_KEY"),
        )
    return MultiEmbeddingRetriever(config)
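
The indexing code above falls back to `hashlib.md5(text.encode()).hexdigest()` when a document has no `id`. The sketch below isolates that rule to show two consequences: documents with identical text resolve to the same point ID, so a later upsert overwrites the earlier point, and a falsy explicit ID such as `0` is silently replaced by the hash because of the `or`. The helper names `content_point_id` and `resolve_point_id` and the sample documents are hypothetical, introduced only for illustration. The 32-character digest parses as a UUID, which, as far as I know, is one of the string forms Qdrant accepts for point IDs.

```python
from __future__ import annotations

import hashlib
import uuid


def content_point_id(text: str) -> str:
    """Return the MD5 hex digest of the text (same expression as the indexing code)."""
    return hashlib.md5(text.encode()).hexdigest()


def resolve_point_id(doc: dict) -> str | int:
    """Prefer an explicit 'id'; otherwise fall back to the content hash.

    Mirrors `doc.get("id") or <hash>`, so a falsy id (0, "") also falls back.
    """
    return doc.get("id") or content_point_id(doc["text"])


# Hypothetical sample documents, not taken from the repository.
docs = [
    {"text": "Rijksmuseum Amsterdam"},
    {"text": "Rijksmuseum Amsterdam", "metadata": {"source": "duplicate"}},
    {"id": 42, "text": "Rijksmuseum Amsterdam"},
]
ids = [resolve_point_id(d) for d in docs]

# The 32 hex characters of an MD5 digest are a valid UUID in "simple" form.
as_uuid = uuid.UUID(hex=ids[0])

print(ids[0] == ids[1])  # identical text, identical ID
print(as_uuid)
```

If duplicate texts with different metadata need to coexist, hashing a stable source identifier together with the text would be one way to separate them; that is a design suggestion, not something the code above does.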

19045
data/fixes/bu/slot_fixes.yaml Normal file

File diff suppressed because it is too large

18357
data/fixes/slot_fixes.yaml Normal file

File diff suppressed because it is too large

@ -0,0 +1,44 @@
fixes:
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/archive_branches.yaml
  revision:
  - label: has_or_had_branch
    type: slot
  - label: Branch
    type: class
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/archive_path.yaml
  revision:
  - label: has_or_had_provenance_path
    type: slot
  - label: ProvenancePath
    type: class
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/archive_series.yaml
  revision:
  - label: is_or_was_part_of_series
    type: slot
  - label: Series
    type: class
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/condition_of_access.yaml
  revision:
  - label: has_or_had_condition_of_access
    type: slot
  - label: ConditionofAccess
    type: class
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/connection_heritage_relevant.yaml
  revision:
  - label: is_or_was_related_to
    type: slot
  - label: Entity
    type: class
- original_slot_id: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/slots/connection_heritage_type.yaml
  revision:
  - label: has_or_had_heritage_type
    type: slot
  - label: HeritageType
    type: class
- original_slot_id: https://nde.nl/ontology/hc/slot/was_retrieved_at
  revision:
  - label: is_or_was_retrieved_at
    type: slot
  - label: TimeSpan
    type: class
-
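
A consumer of this fixes file will probably want each entry reduced to one source slot plus its replacement slot and class labels. The sketch below shows one way to do that on already-parsed data, assuming the structure shown above (`original_slot_id` plus a `revision` list of `label`/`type` pairs). The function name `normalize_fix` and the output keys are hypothetical. The input is a hand-written dict standing in for the YAML loader's result, and the `None` item models what an empty `-` list item, like the one that ends the file, parses to.

```python
from typing import Any, Optional


def normalize_fix(entry: Any) -> Optional[dict]:
    """Return {'source', 'slot', 'class'} for one fixes entry, or None to skip it."""
    # Skip null items and mappings that lack the expected key.
    if not isinstance(entry, dict) or "original_slot_id" not in entry:
        return None
    labels = {"slot": None, "class": None}
    for item in entry.get("revision") or []:
        # Only 'slot' and 'class' revision types are recognised in this sketch.
        if item.get("type") in labels:
            labels[item["type"]] = item.get("label")
    return {"source": entry["original_slot_id"], **labels}


# Hand-written stand-in for the parsed YAML document.
parsed = {
    "fixes": [
        {
            "original_slot_id": "https://nde.nl/ontology/hc/slot/was_retrieved_at",
            "revision": [
                {"label": "is_or_was_retrieved_at", "type": "slot"},
                {"label": "TimeSpan", "type": "class"},
            ],
        },
        None,  # an empty "-" list item parses to null
    ]
}

normalized = [n for n in map(normalize_fix, parsed["fixes"]) if n is not None]
print(normalized)
```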

@ -73,6 +73,9 @@ This document catalogs all ontologies used in the GLAM Heritage Custodian projec
|------|----------|---------|--------|-----------|
| `skos.rdf` | SKOS (Simple Knowledge Org System) | 2009 | https://www.w3.org/TR/skos-reference/ | `skos:` |
| `dublin_core_elements.rdf` | Dublin Core Elements | 1.1 | https://www.dublincore.org/specifications/dublin-core/ | `dc:` |
| `dcterms.rdf` | DCMI Metadata Terms (RDF) | 2020 | https://www.dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms.rdf | `dcterms:` |
| `dctype.rdf` | DCMI Type Vocabulary | 2012 | https://www.dublincore.org/specifications/dublin-core/dcmi-type-vocabulary/ | `dcmitype:` |
| `oa.ttl` | Web Annotation Vocabulary (successor to the Open Annotation Data Model) | 2017 | https://www.w3.org/TR/annotation-vocab/ | `oa:` |
| `dcat3.ttl` | DCAT (Data Catalog Vocabulary) | 3.0 | https://www.w3.org/TR/vocab-dcat-3/ | `dcat:` |
| `schemaorg.owl` | Schema.org | 2024 | https://schema.org/ | `schema:` |
| `vcard.rdf` | vCard Ontology | 4.0 | https://www.w3.org/TR/vcard-rdf/ | `vcard:` |
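
The prefix column only helps if every tool resolves a prefix to the same namespace IRI. As a minimal sketch, the snippet below expands compact names such as `dcmitype:StillImage` using the namespace IRIs declared in the vendored `dctype.rdf` and `oa.ttl` files in this diff. The `PREFIXES` dict and the `expand_curie` function are hypothetical names, not part of the repository. Note that `oa.ttl` itself binds the DCMI Type namespace to `dctypes:`, while the table uses `dcmitype:`. Both point at the same IRI, so the choice of prefix is purely local.

```python
# Namespace IRIs as declared in the ontology files shown in this diff.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "oa": "http://www.w3.org/ns/oa#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:local' into a full IRI; raise KeyError for unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("dcmitype:StillImage"))
print(expand_curie("oa:hasBody"))
```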

@ -0,0 +1 @@
404: Not Found

30839
data/ontology/RiC-O_1-1.rdf Normal file

File diff suppressed because it is too large

2103
data/ontology/dcterms.rdf Normal file

File diff suppressed because it is too large

21762
data/ontology/dcterms.ttl Normal file

File diff suppressed because it is too large

152
data/ontology/dctype.rdf Normal file

@ -0,0 +1,152 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdfns 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY rdfsns 'http://www.w3.org/2000/01/rdf-schema#'>
<!ENTITY dcns 'http://purl.org/dc/elements/1.1/'>
<!ENTITY dctermsns 'http://purl.org/dc/terms/'>
<!ENTITY dctypens 'http://purl.org/dc/dcmitype/'>
<!ENTITY dcamns 'http://purl.org/dc/dcam/'>
<!ENTITY skosns 'http://www.w3.org/2004/02/skos/core#'>
<!ENTITY owlns 'http://www.w3.org/2002/07/owl#'>
]>
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dcam="http://purl.org/dc/dcam/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/">
<dcterms:title xml:lang="en">DCMI Type Vocabulary</dcterms:title>
<dcterms:publisher rdf:resource="http://purl.org/dc/aboutdcmi#DCMI"/>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2012-06-14</dcterms:modified>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Collection">
<rdfs:label xml:lang="en">Collection</rdfs:label>
<rdfs:comment xml:lang="en">An aggregation of resources.</rdfs:comment>
<dcterms:description xml:lang="en">A collection is described as a group; its parts may also be separately described.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Collection-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Dataset">
<rdfs:label xml:lang="en">Dataset</rdfs:label>
<rdfs:comment xml:lang="en">Data encoded in a defined structure.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include lists, tables, and databases. A dataset may be useful for direct machine processing.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Dataset-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Event">
<rdfs:label xml:lang="en">Event</rdfs:label>
<rdfs:comment xml:lang="en">A non-persistent, time-based occurrence.</rdfs:comment>
<dcterms:description xml:lang="en">Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Event-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Image">
<rdfs:label xml:lang="en">Image</rdfs:label>
<rdfs:comment xml:lang="en">A visual representation other than text.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that Image may include both electronic and physical representations.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Image-004"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/InteractiveResource">
<rdfs:label xml:lang="en">Interactive Resource</rdfs:label>
<rdfs:comment xml:lang="en">A resource requiring interaction from the user to be understood, executed, or experienced.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#InteractiveResource-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Service">
<rdfs:label xml:lang="en">Service</rdfs:label>
<rdfs:comment xml:lang="en">A system that provides one or more functions.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include a photocopying service, a banking service, an authentication service, interlibrary loans, a Z39.50 or Web server.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Service-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Software">
<rdfs:label xml:lang="en">Software</rdfs:label>
<rdfs:comment xml:lang="en">A computer program in source or compiled form.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include a C source file, MS-Windows .exe executable, or Perl script.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Software-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Sound">
<rdfs:label xml:lang="en">Sound</rdfs:label>
<rdfs:comment xml:lang="en">A resource primarily intended to be heard.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include a music playback file format, an audio compact disc, and recorded speech or sounds.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Sound-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/Text">
<rdfs:label xml:lang="en">Text</rdfs:label>
<rdfs:comment xml:lang="en">A resource consisting primarily of words for reading.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2000-07-11</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#Text-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/PhysicalObject">
<rdfs:label xml:lang="en">Physical Object</rdfs:label>
<rdfs:comment xml:lang="en">An inanimate, three-dimensional object or substance.</rdfs:comment>
<dcterms:description xml:lang="en">Note that digital representations of, or surrogates for, these objects should use Image, Text or one of the other types.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2002-07-13</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#PhysicalObject-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/StillImage">
<rdfs:label xml:lang="en">Still Image</rdfs:label>
<rdfs:comment xml:lang="en">A static visual representation.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include paintings, drawings, graphic designs, plans and maps. Recommended best practice is to assign the type Text to images of textual materials. Instances of the type Still Image must also be describable as instances of the broader type Image.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2003-11-18</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#StillImage-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>
</rdf:Description>
<rdf:Description rdf:about="http://purl.org/dc/dcmitype/MovingImage">
<rdfs:label xml:lang="en">Moving Image</rdfs:label>
<rdfs:comment xml:lang="en">A series of visual representations imparting an impression of motion when shown in succession.</rdfs:comment>
<dcterms:description xml:lang="en">Examples include animations, movies, television programs, videos, zoetropes, or visual output from a simulation. Instances of the type Moving Image must also be describable as instances of the broader type Image.</dcterms:description>
<rdfs:isDefinedBy rdf:resource="http://purl.org/dc/dcmitype/"/>
<dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2003-11-18</dcterms:issued>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-01-14</dcterms:modified>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<dcterms:hasVersion rdf:resource="http://dublincore.org/usage/terms/history/#MovingImage-003"/>
<dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
<rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>
</rdf:Description>
</rdf:RDF>
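
Because `dctype.rdf` uses only flat `rdf:Description` blocks, its class list and the two `rdfs:subClassOf` links can be read with the standard library alone. The sketch below does this with `xml.etree.ElementTree`. The function name `read_classes` and the shortened `SAMPLE` document are hypothetical. It handles only this flat serialization style, not RDF/XML in general, for which a real RDF parser would be the safer choice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Shortened stand-in for dctype.rdf with two of its class descriptions.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://purl.org/dc/dcmitype/Image">
    <rdfs:label xml:lang="en">Image</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://purl.org/dc/dcmitype/StillImage">
    <rdfs:label xml:lang="en">Still Image</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.org/dc/dcmitype/Image"/>
  </rdf:Description>
</rdf:RDF>"""


def read_classes(rdf_xml: str) -> dict:
    """Map each labelled rdf:Description IRI to its label and declared superclasses."""
    root = ET.fromstring(rdf_xml)
    classes = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        label = desc.find(f"{{{RDFS}}}label")
        if label is None:
            continue  # e.g. a vocabulary-level description without rdfs:label
        parents = [
            parent.get(f"{{{RDF}}}resource")
            for parent in desc.findall(f"{{{RDFS}}}subClassOf")
        ]
        classes[desc.get(f"{{{RDF}}}about")] = {"label": label.text, "parents": parents}
    return classes


classes = read_classes(SAMPLE)
for iri, info in classes.items():
    print(iri, info["label"], info["parents"])
```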

429
data/ontology/oa.ttl Normal file

@ -0,0 +1,429 @@
@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix as: <http://www.w3.org/ns/activitystreams#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix exif: <http://www.w3.org/2003/12/exif/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix iiif: <http://iiif.io/api/image/2#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix pcdm: <http://pcdm.org/models#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sc: <http://iiif.io/api/presentation/2#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix svcs: <http://rdfs.org/sioc/services#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix trig: <http://www.w3.org/2004/03/trix/rdfg-1/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
oa:Annotation a rdfs:Class ;
rdfs:label "Annotation" ;
rdfs:comment "The class for Web Annotations." ;
rdfs:isDefinedBy oa: .
oa:Choice a rdfs:Class ;
rdfs:label "Choice" ;
rdfs:comment "A subClass of as:OrderedCollection that conveys to a consuming application that it should select one of the resources in the as:items list to use, rather than all of them. This is typically used to provide a choice of resources to render to the user, based on further supplied properties. If the consuming application cannot determine the user's preference, then it should use the first in the list." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf as:OrderedCollection .
oa:CssSelector a rdfs:Class ;
rdfs:label "CssSelector" ;
rdfs:comment "A CssSelector describes a Segment of interest in a representation that conforms to the Document Object Model through the use of the CSS selector specification." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:CssStyle a rdfs:Class ;
rdfs:label "CssStyle" ;
rdfs:comment "A resource which describes styles for resources participating in the Annotation using CSS." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Style .
oa:DataPositionSelector a rdfs:Class ;
rdfs:label "DataPositionSelector" ;
rdfs:comment "DataPositionSelector describes a range of data by recording the start and end positions of the selection in the stream. Position 0 would be immediately before the first byte, position 1 would be immediately before the second byte, and so on. The start byte is thus included in the list, but the end byte is not." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:Direction a rdfs:Class ;
rdfs:label "Direction" ;
rdfs:comment "A class to encapsulate the different text directions that a textual resource might take. It is not used directly in the Annotation Model, only its three instances." ;
rdfs:isDefinedBy oa: .
oa:FragmentSelector a rdfs:Class ;
rdfs:label "FragmentSelector" ;
rdfs:comment "The FragmentSelector class is used to record the segment of a representation using the IRI fragment specification defined by the representation's media type." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:HttpRequestState a rdfs:Class ;
rdfs:label "HttpRequestState" ;
rdfs:comment "The HttpRequestState class is used to record the HTTP request headers that a client SHOULD use to request the correct representation from the resource. " ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:State .
oa:Motivation a rdfs:Class ;
rdfs:label "Motivation" ;
rdfs:comment "The Motivation class is used to record the user's intent or motivation for the creation of the Annotation, or the inclusion of the body or target, that it is associated with." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf skos:Concept .
oa:RangeSelector a rdfs:Class ;
rdfs:label "RangeSelector" ;
rdfs:comment "A Range Selector can be used to identify the beginning and the end of the selection by using other Selectors. The selection consists of everything from the beginning of the starting selector through to the beginning of the ending selector, but not including it." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:ResourceSelection a rdfs:Class ;
rdfs:label "ResourceSelection" ;
rdfs:comment "Instances of the ResourceSelection class identify part (described by an oa:Selector) of another resource (referenced with oa:hasSource), possibly from a particular representation of a resource (described by an oa:State). Please note that ResourceSelection is not used directly in the Web Annotation model, but is provided as a separate class for further application profiles to use, separate from oa:SpecificResource which has many Annotation specific features." ;
rdfs:isDefinedBy oa: .
oa:Selector a rdfs:Class ;
rdfs:label "Selector" ;
rdfs:comment "A resource which describes the segment of interest in a representation of a Source resource, indicated with oa:hasSelector from the Specific Resource. This class is not used directly in the Annotation model, only its subclasses." ;
rdfs:isDefinedBy oa: .
oa:SpecificResource a rdfs:Class ;
rdfs:label "SpecificResource" ;
rdfs:comment "Instances of the SpecificResource class identify part of another resource (referenced with oa:hasSource), a particular representation of a resource, a resource with styling hints for renders, or any combination of these, as used within an Annotation." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:ResourceSelection .
oa:State a rdfs:Class ;
rdfs:label "State" ;
rdfs:comment "A State describes the intended state of a resource as applied to the particular Annotation, and thus provides the information needed to retrieve the correct representation of that resource." ;
rdfs:isDefinedBy oa: .
oa:Style a rdfs:Class ;
rdfs:label "Style" ;
rdfs:comment "A Style describes the intended styling of a resource as applied to the particular Annotation, and thus provides the information to ensure that rendering is consistent across implementations." ;
rdfs:isDefinedBy oa: .
oa:SvgSelector a rdfs:Class ;
rdfs:label "SvgSelector" ;
rdfs:comment "An SvgSelector defines an area through the use of the Scalable Vector Graphics [SVG] standard. This allows the user to select a non-rectangular area of the content, such as a circle or polygon by describing the region using SVG. The SVG may be either embedded within the Annotation or referenced as an External Resource." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:TextPositionSelector a rdfs:Class ;
rdfs:label "TextPositionSelector" ;
rdfs:comment "The TextPositionSelector describes a range of text by recording the start and end positions of the selection in the stream. Position 0 would be immediately before the first character, position 1 would be immediately before the second character, and so on." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:TextQuoteSelector a rdfs:Class ;
rdfs:label "TextQuoteSelector" ;
rdfs:comment "The TextQuoteSelector describes a range of text by copying it, and including some of the text immediately before (a prefix) and after (a suffix) it to distinguish between multiple copies of the same sequence of characters." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:TextualBody a rdfs:Class ;
rdfs:label "TextualBody" ;
rdfs:comment "" ;
rdfs:isDefinedBy oa: .
oa:TimeState a rdfs:Class ;
rdfs:label "TimeState" ;
rdfs:comment "A TimeState records the time at which the resource's state is appropriate for the Annotation, typically the time that the Annotation was created and/or a link to a persistent copy of the current version." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:State .
oa:XPathSelector a rdfs:Class ;
rdfs:label "XPathSelector" ;
rdfs:comment " An XPathSelector is used to select elements and content within a resource that supports the Document Object Model via a specified XPath value." ;
rdfs:isDefinedBy oa: ;
rdfs:subClassOf oa:Selector .
oa:PreferContainedDescriptions a rdfs:Resource ;
rdfs:label "PreferContainedDescriptions" ;
rdfs:comment "An IRI to signal the client prefers to receive full descriptions of the Annotations from a container, not just their IRIs." ;
rdfs:isDefinedBy oa: .
oa:PreferContainedIRIs a rdfs:Resource ;
rdfs:label "PreferContainedIRIs" ;
rdfs:comment "An IRI to signal that the client prefers to receive only the IRIs of the Annotations from a container, not their full descriptions." ;
rdfs:isDefinedBy oa: .
oa:annotationService a rdf:Property ;
rdfs:label "annotationService" ;
rdfs:comment """The object of the relationship is the end point of a service that conforms to the annotation-protocol, and it may be associated with any resource. The expectation of asserting the relationship is that the object is the preferred service for maintaining annotations about the subject resource, according to the publisher of the relationship.
This relationship is intended to be used both within Linked Data descriptions and as the rel type of a Link, via HTTP Link Headers rfc5988 for binary resources and in HTML <link> elements. For more information about these, please see the Annotation Protocol specification annotation-protocol.
""" ;
rdfs:isDefinedBy oa: .
oa:assessing a oa:Motivation ;
rdfs:label "assessing" ;
rdfs:comment "The motivation for when the user intends to provide an assessment about the Target resource." ;
rdfs:isDefinedBy oa: .
oa:bodyValue a rdf:Property ;
rdfs:label "bodyValue" ;
rdfs:comment """The object of the predicate is a plain text string to be used as the content of the body of the Annotation. The value MUST be an xsd:string and that data type MUST NOT be expressed in the serialization. Note that language MUST NOT be associated with the value either as a language tag, as that is only available for rdf:langString .
""" ;
rdfs:domain oa:Annotation ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:bookmarking a oa:Motivation ;
rdfs:label "bookmarking" ;
rdfs:comment "The motivation for when the user intends to create a bookmark to the Target or part thereof." ;
rdfs:isDefinedBy oa: .
oa:cachedSource a rdf:Property ;
rdfs:label "cachedSource" ;
rdfs:comment "The object of the relationship is a copy of the Source resource's representation, appropriate for the Annotation." ;
rdfs:domain oa:TimeState ;
rdfs:isDefinedBy oa: .
oa:canonical a rdf:Property ;
rdfs:label "canonical" ;
rdfs:comment "The object of the relationship is the canonical IRI that can always be used to deduplicate the Annotation, regardless of the current IRI used to access the representation." ;
rdfs:isDefinedBy oa: .
oa:classifying a oa:Motivation ;
rdfs:label "classifying" ;
rdfs:comment "The motivation for when the user intends to classify the Target as something." ;
rdfs:isDefinedBy oa: .
oa:commenting a oa:Motivation ;
rdfs:label "commenting" ;
rdfs:comment "The motivation for when the user intends to comment about the Target." ;
rdfs:isDefinedBy oa: .
oa:describing a oa:Motivation ;
rdfs:label "describing" ;
rdfs:comment "The motivation for when the user intends to describe the Target, as opposed to a comment about them." ;
rdfs:isDefinedBy oa: .
oa:editing a oa:Motivation ;
rdfs:label "editing" ;
rdfs:comment "The motivation for when the user intends to request a change or edit to the Target resource." ;
rdfs:isDefinedBy oa: .
oa:end a rdf:Property ;
rdfs:label "end" ;
rdfs:comment "The end property is used to convey the 0-based index of the end position of a range of content." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:nonNegativeInteger .
oa:exact a rdf:Property ;
rdfs:label "exact" ;
rdfs:comment "The object of the predicate is a copy of the text which is being selected, after normalization." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:hasBody a rdf:Property ;
rdfs:label "hasBody" ;
rdfs:comment "The object of the relationship is a resource that is a body of the Annotation." ;
rdfs:domain oa:Annotation ;
rdfs:isDefinedBy oa: .
oa:hasEndSelector a rdf:Property ;
rdfs:label "hasEndSelector" ;
rdfs:comment "The relationship between a RangeSelector and the Selector that describes the end position of the range. " ;
rdfs:domain oa:RangeSelector ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Selector .
oa:hasPurpose a rdf:Property ;
rdfs:label "hasPurpose" ;
rdfs:comment "The purpose served by the resource in the Annotation." ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Motivation .
oa:hasScope a rdf:Property ;
rdfs:label "hasScope" ;
rdfs:comment "The scope or context in which the resource is used within the Annotation." ;
rdfs:domain oa:SpecificResource ;
rdfs:isDefinedBy oa: .
oa:hasSelector a rdf:Property ;
rdfs:label "hasSelector" ;
rdfs:comment "The object of the relationship is a Selector that describes the segment or region of interest within the source resource. Please note that the domain ( oa:ResourceSelection ) is not used directly in the Web Annotation model." ;
rdfs:domain oa:ResourceSelection ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Selector .
oa:hasSource a rdf:Property ;
rdfs:label "hasSource" ;
rdfs:comment "The resource that the ResourceSelection, or its subclass SpecificResource, is refined from, or more specific than. Please note that the domain ( oa:ResourceSelection ) is not used directly in the Web Annotation model." ;
rdfs:domain oa:ResourceSelection ;
rdfs:isDefinedBy oa: .
oa:hasStartSelector a rdf:Property ;
rdfs:label "hasStartSelector" ;
rdfs:comment "The relationship between a RangeSelector and the Selector that describes the start position of the range. " ;
rdfs:domain oa:RangeSelector ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Selector .
oa:hasState a rdf:Property ;
rdfs:label "hasState" ;
rdfs:comment "The relationship between the ResourceSelection, or its subclass SpecificResource, and a State resource. Please note that the domain ( oa:ResourceSelection ) is not used directly in the Web Annotation model." ;
rdfs:domain oa:ResourceSelection ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:State .
oa:hasTarget a rdf:Property ;
rdfs:label "hasTarget" ;
rdfs:comment "The relationship between an Annotation and its Target." ;
rdfs:domain oa:Annotation ;
rdfs:isDefinedBy oa: .
oa:highlighting a oa:Motivation ;
rdfs:label "highlighting" ;
rdfs:comment "The motivation for when the user intends to highlight the Target resource or segment of it." ;
rdfs:isDefinedBy oa: .
oa:identifying a oa:Motivation ;
rdfs:label "identifying" ;
rdfs:comment "The motivation for when the user intends to assign an identity to the Target or identify what is being depicted or described in the Target." ;
rdfs:isDefinedBy oa: .
oa:linking a oa:Motivation ;
rdfs:label "linking" ;
rdfs:comment "The motivation for when the user intends to link to a resource related to the Target." ;
rdfs:isDefinedBy oa: .
oa:ltrDirection a oa:Direction ;
rdfs:label "ltrDirection" ;
rdfs:comment "The direction of text that is read from left to right." ;
rdfs:isDefinedBy oa: .
oa:moderating a oa:Motivation ;
rdfs:label "moderating" ;
rdfs:comment "The motivation for when the user intends to assign some value or quality to the Target." ;
rdfs:isDefinedBy oa: .
oa:motivatedBy a rdf:Property ;
rdfs:label "motivatedBy" ;
rdfs:comment "The relationship between an Annotation and a Motivation that describes the reason for the Annotation's creation." ;
rdfs:domain oa:Annotation ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Motivation .
oa:prefix a rdf:Property ;
rdfs:label "prefix" ;
rdfs:comment "The object of the property is a snippet of content that occurs immediately before the content which is being selected by the Selector." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:processingLanguage a rdf:Property ;
rdfs:label "processingLanguage" ;
rdfs:comment "The object of the property is the language that should be used for textual processing algorithms when dealing with the content of the resource, including hyphenation, line breaking, which font to use for rendering and so forth. The value must follow the recommendations of BCP47." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:questioning a oa:Motivation ;
rdfs:label "questioning" ;
rdfs:comment "The motivation for when the user intends to ask a question about the Target." ;
rdfs:isDefinedBy oa: .
oa:refinedBy a rdf:Property ;
rdfs:label "refinedBy" ;
rdfs:comment "The relationship between a Selector and another Selector or a State and a Selector or State that should be applied to the results of the first to refine the processing of the source resource. " ;
rdfs:isDefinedBy oa: .
oa:renderedVia a rdf:Property ;
rdfs:label "renderedVia" ;
rdfs:comment "A system that was used by the application that created the Annotation to render the resource." ;
rdfs:domain oa:SpecificResource ;
rdfs:isDefinedBy oa: .
oa:replying a oa:Motivation ;
rdfs:label "replying" ;
rdfs:comment "The motivation for when the user intends to reply to a previous statement, either an Annotation or another resource." ;
rdfs:isDefinedBy oa: .
oa:rtlDirection a oa:Direction ;
rdfs:label "rtlDirection" ;
rdfs:comment "The direction of text that is read from right to left." ;
rdfs:isDefinedBy oa: .
oa:sourceDate a rdf:Property ;
rdfs:label "sourceDate" ;
rdfs:comment "The timestamp at which the Source resource should be interpreted as being applicable to the Annotation." ;
rdfs:domain oa:TimeState ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:dateTime .
oa:sourceDateEnd a rdf:Property ;
rdfs:label "sourceDateEnd" ;
rdfs:comment "The end timestamp of the interval over which the Source resource should be interpreted as being applicable to the Annotation." ;
rdfs:domain oa:TimeState ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:dateTime .
oa:sourceDateStart a rdf:Property ;
rdfs:label "sourceDateStart" ;
rdfs:comment "The start timestamp of the interval over which the Source resource should be interpreted as being applicable to the Annotation." ;
rdfs:domain oa:TimeState ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:dateTime .
oa:start a rdf:Property ;
rdfs:label "start" ;
rdfs:comment "The start position in a 0-based index at which a range of content is selected from the data in the source resource." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:nonNegativeInteger .
oa:styleClass a rdf:Property ;
rdfs:label "styleClass" ;
rdfs:comment "The name of the class used in the CSS description referenced from the Annotation that should be applied to the Specific Resource." ;
rdfs:domain oa:SpecificResource ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:styledBy a rdf:Property ;
rdfs:label "styledBy" ;
rdfs:comment "A reference to a Stylesheet that should be used to apply styles to the Annotation rendering." ;
rdfs:domain oa:Annotation ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Style .
oa:suffix a rdf:Property ;
rdfs:label "suffix" ;
rdfs:comment "The snippet of text that occurs immediately after the text which is being selected." ;
rdfs:isDefinedBy oa: ;
rdfs:range xsd:string .
oa:tagging a oa:Motivation ;
rdfs:label "tagging" ;
rdfs:comment "The motivation for when the user intends to associate a tag with the Target." ;
rdfs:isDefinedBy oa: .
oa:textDirection a rdf:Property ;
rdfs:label "textDirection" ;
rdfs:comment "The direction of the text of the subject resource. There MUST only be one text direction associated with any given resource." ;
rdfs:isDefinedBy oa: ;
rdfs:range oa:Direction .
oa:via a rdf:Property ;
rdfs:label "via" ;
rdfs:comment "The object of the relationship is a resource from which the source resource was retrieved by the providing system." ;
rdfs:isDefinedBy oa: .
oa: a owl:Ontology ;
dc:title "Web Annotation Ontology" ;
dcterms:creator [a foaf:Person; foaf:name "Benjamin Young"],
[a foaf:Person; foaf:name "Paolo Ciccarese"],
[a foaf:Person; foaf:name "Robert Sanderson"] ;
dcterms:modified "2016-11-12T21:28:11Z" ;
rdfs:comment "The Web Annotation ontology defines the terms of the Web Annotation vocabulary. Any changes to this document MUST be from a Working Group in the W3C that has established expertise in the area." ;
rdfs:seeAlso <http://www.w3.org/TR/annotation-vocab/> ;
prov:wasRevisionOf <http://www.openannotation.org/spec/core/20130208/oa.owl> ;
owl:versionInfo "2016-11-12T21:28:11Z" .
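
Review note: `oa:prefix` and `oa:suffix` above describe the snippets immediately before and after a selected span. The following is a minimal sketch of how a consumer might use them to disambiguate a repeated quote. The helper name `find_quote` is hypothetical, and the selected text is passed as `exact` (the `oa:exact` term belongs to the Web Annotation vocabulary but falls outside this excerpt).

```python
def find_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> int:
    """Return the 0-based offset of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, or -1."""
    pos = text.find(exact)
    while pos != -1:
        end = pos + len(exact)
        # Both context snippets must touch the candidate span directly.
        if text[:pos].endswith(prefix) and text[end:].startswith(suffix):
            return pos
        pos = text.find(exact, pos + 1)
    return -1


doc = "the cat sat; the cat ran"
print(find_quote(doc, "cat"))                                # first match
print(find_quote(doc, "cat", prefix="the ", suffix=" ran"))  # second match
```

The returned offset is 0-based, matching the indexing described for `oa:start`.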

41011
data/ontology/schemaorg.owl Normal file

File diff suppressed because it is too large

2445
defined_schema_terms.txt Normal file

File diff suppressed because it is too large


@@ -4,7 +4,7 @@
"version": "0.0.0",
"type": "module",
"scripts": {
-"sync-schemas": "rsync -av --delete ../schemas/20251121/linkml/ public/schemas/20251121/linkml/",
+"sync-schemas": "rsync -av --delete --exclude=\"archive/\" ../schemas/20251121/linkml/ public/schemas/20251121/linkml/",
"generate-manifest": "node scripts/generate-schema-manifest.cjs",
"dev": "pnpm run sync-schemas && pnpm run generate-manifest && vite",
"build": "pnpm run sync-schemas && pnpm run generate-manifest && tsc -b && vite build",

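Review note: the `sync-schemas` change keeps archived schema files out of the published copy. A rough Python equivalent is sketched below for readers who want to see the effect without `rsync`. The function name `sync_without_archive` and the directory layout in `demo` are assumptions for illustration, and deleting the destination first is only a crude stand-in for `--delete`.

```python
import shutil
import tempfile
from pathlib import Path


def sync_without_archive(src: Path, dst: Path) -> None:
    """Copy src to dst, skipping any entry named 'archive' at any depth."""
    if dst.exists():
        shutil.rmtree(dst)  # crude stand-in for rsync --delete
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("archive"))


def demo() -> list[str]:
    """Build a throwaway tree (hypothetical layout) and list what gets copied."""
    with tempfile.TemporaryDirectory() as tmp:
        slots = Path(tmp) / "linkml" / "modules" / "slots"
        (slots / "archive").mkdir(parents=True)
        (slots / "has_or_had_label.yaml").write_text("id: x\n")
        (slots / "archive" / "collection_name.yaml").write_text("id: y\n")
        dst = Path(tmp) / "public"
        sync_without_archive(Path(tmp) / "linkml", dst)
        return sorted(
            p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file()
        )


print(demo())
```

Note that `ignore_patterns("archive")` also skips a plain file named `archive`, whereas the trailing slash in `--exclude="archive/"` restricts the rsync rule to directories.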

@@ -43,12 +43,15 @@ imports:
- modules/slots/has_appellation_type
- modules/slots/has_appellation_value
- modules/slots/has_or_had_arrangement_system
- modules/slots/collection_description
- modules/slots/collection_name
- modules/slots/has_or_had_description
- modules/slots/has_or_had_label
# collection_description ARCHIVED (2026-01-18) - migrated to has_or_had_description (Rule 53)
# collection_name ARCHIVED (2026-01-18) - migrated to has_or_had_label (Rule 53)
# collection_scope ARCHIVED (2026-01-18) - migrated to has_or_had_scope + CollectionScope (Rule 53)
- modules/slots/has_or_had_scope
- modules/slots/collection_type
- modules/slots/collections_under_responsibility
# collections_under_responsibility ARCHIVED (2026-01-19) - migrated to is_or_was_responsible_for (Rule 53)
- modules/slots/is_or_was_responsible_for
- modules/slots/confidence_method
- modules/slots/confidence_score
- modules/slots/confidence_value
@@ -599,7 +602,7 @@ imports:
- modules/slots/has_or_had_area_served
- modules/slots/has_or_had_member_custodian
- modules/slots/membership_criteria
- modules/slots/community_engagement
# community_engagement ARCHIVED 2026-01-19 - migrated to has_or_had_activity (imported above)
- modules/slots/service_offering
- modules/slots/record_type
- modules/slots/society_focus
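
Review note: the hunks above replace imports of archived slots with `ARCHIVED` comments. A check for leftover imports could look roughly like the sketch below. It is not the repository's `scan_ghost_references.py`; the function name, the regular expressions, and the sample lines are assumptions modelled on the comment style visible in this diff.

```python
import re

# "# <slot_name> ARCHIVED ..." comment lines, as used in the imports list.
ARCHIVED_RE = re.compile(r"^#\s*(\w+)\s+ARCHIVED\b")
# "- modules/slots/<slot_name>" import entries.
IMPORT_RE = re.compile(r"^-\s*modules/slots/(\w+)\s*$")


def find_ghost_imports(lines):
    """Return imported slot names that are also marked ARCHIVED in a comment."""
    archived, imported = set(), []
    for raw in lines:
        line = raw.strip()
        if m := ARCHIVED_RE.match(line):
            archived.add(m.group(1))
        elif m := IMPORT_RE.match(line):
            imported.append(m.group(1))
    return [name for name in imported if name in archived]


sample = [
    "- modules/slots/has_or_had_label",
    "# collection_name ARCHIVED (2026-01-18) - migrated to has_or_had_label (Rule 53)",
    "- modules/slots/collection_name",  # hypothetical stale import
    "- modules/slots/collection_type",
]
print(find_ghost_imports(sample))
```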


@@ -6,8 +6,14 @@ prefixes:
linkml: https://w3id.org/linkml/
org: http://www.w3.org/ns/org#
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- linkml:types
default_prefix: hc
slots:
has_or_had_admin_staff_count:
@@ -27,3 +33,5 @@ slots:
custodian_types_primary: M
specificity_score: 0.5
specificity_rationale: Moderately specific slot.
exact_mappings:
- hc:hasOrHadAdminStaffCount
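
Review note: this hunk and the following slot files all gain the same set of prefixes. The sketch below shows one way such a merge could be done without overwriting what a file already declares. It is an illustration only, not the repository's `fix_linkml_metadata.py`; `STANDARD_PREFIXES` and `with_standard_prefixes` are hypothetical names, and the URIs are copied from the prefix blocks in this diff.

```python
# Prefix URIs as they appear in the slot files in this diff.
STANDARD_PREFIXES = {
    "linkml": "https://w3id.org/linkml/",
    "hc": "https://nde.nl/ontology/hc/",
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "org": "http://www.w3.org/ns/org#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def with_standard_prefixes(prefixes: dict) -> dict:
    """Return a copy of `prefixes` with any missing standard prefix added.

    Existing entries are kept as-is, including non-standard ones.
    """
    merged = dict(prefixes)
    for name, uri in STANDARD_PREFIXES.items():
        merged.setdefault(name, uri)
    return merged


existing = {"gr": "http://purl.org/goodrelations/v1#", "schema": "http://schema.org/"}
merged = with_standard_prefixes(existing)
print(sorted(merged))
```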


@@ -0,0 +1,37 @@
id: https://nde.nl/ontology/hc/slot/has_or_had_admission_fee
name: has_or_had_admission_fee_slot
title: Has Or Had Admission Fee Slot
prefixes:
gr: http://purl.org/goodrelations/v1#
hc: https://nde.nl/ontology/hc/
linkml: https://w3id.org/linkml/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
default_prefix: hc
slots:
has_or_had_admission_fee:
description: "Admission fee charged by the institution. Temporal as fees change. A string describing the fee amount or structure (free, \u20AC10, \u20AC5-15, etc.)."
range: string
slot_uri: hc:hasOrHadAdmissionFee
close_mappings:
- schema:price
- schema:priceRange
related_mappings:
- schema:offers
- gr:hasPriceSpecification
comments:
- schema:offers links to Offer objects, not fee amounts directly. An admission fee is a specific price value, not an offer.
annotations:
custodian_types: '["*"]'
custodian_types_rationale: Applicable to all heritage custodian types.
custodian_types_primary: M
specificity_score: 0.5
specificity_rationale: Moderately specific slot.


@@ -6,8 +6,14 @@ prefixes:
hc: https://nde.nl/ontology/hc/
linkml: https://w3id.org/linkml/
prov: http://www.w3.org/ns/prov#
schema: http://schema.org/
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- linkml:types
default_prefix: hc
slots:
has_or_had_assigned_processor:


@@ -0,0 +1,33 @@
id: https://nde.nl/ontology/hc/slot/has_or_had_classification
name: has_or_had_classification_slot
title: has_or_had_classification slot
description: "Generic temporal classification slot following RiC-O naming pattern. Used for various classification schemes (biological, organizational, etc.).\nReplaces bespoke classification slots per Rule 53/56:\n- bio_type_classification \u2192 has_or_had_classification (in OutdoorSite)"
version: 1.0.0
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
default_prefix: hc
slots:
has_or_had_classification:
slot_uri: schema:additionalType
description: "Classification or categorization scheme value. Uses schema:additionalType for type classification compatibility.\nClasses narrow this slot's range via slot_usage to specific enum types:\n- OutdoorSite \u2192 BioCustodianTypeEnum (biological/botanical classification)"
range: uriorcurie
multivalued: true
exact_mappings:
- schema:additionalType
close_mappings:
- skos:Concept
annotations:
custodian_types:
- '*'
custodian_types_rationale: Universal utility concept


@@ -0,0 +1,45 @@
id: https://nde.nl/ontology/hc/slot/has_or_had_comprehensive_overview
name: has_or_had_comprehensive_overview_slot
title: Has Or Had Comprehensive Overview Slot
description: 'Generic slot for linking to comprehensive overview collections.
Follows RiC-O temporal naming convention to indicate the relationship may be current or historical.'
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
dcterms: http://purl.org/dc/terms/
schema: http://schema.org/
rico: https://www.ica.org/standards/RiC/ontology#
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../classes/Overview
default_prefix: hc
slots:
has_or_had_comprehensive_overview:
description: "Links an entity to a comprehensive overview collection of resources.\nFollows RiC-O temporal naming convention (`hasOrHad*`) to indicate the relationship may be current or historical.\n**USAGE**:\n```yaml\nfinding_aid:\n  has_or_had_comprehensive_overview:\n    id: hc:overview/findingaid-links\n    title: \"All Links\"\n    includes_or_included:\n      - url: https://example.org/link1\n        link_text: \"Related Resource\"\n```\n**DESIGN RATIONALE**:\nThis is a GENERIC slot for linking to comprehensive collections of resources. Replaces domain-specific slots like `all_links` with a typed relationship to an `Overview` class.\n**MIGRATION NOTE** (2026-01-14):\nCreated as replacement for `all_links` slot. The new pattern:\n- Uses typed `Overview` class instead of untyped string list\n- Uses `includes_or_included` for WebLink composition\n- Enables richer metadata about link collections\n**ONTOLOGY ALIGNMENT**:\n- `dcterms:hasPart` - Dublin Core part-whole relationship\n- `schema:hasPart` - Schema.org containment\n- `rico:hasOrHadPart` - RiC-O temporal containment"
range: Overview
multivalued: false
inlined: true
slot_uri: dcterms:hasPart
exact_mappings:
- dcterms:hasPart
close_mappings:
- schema:hasPart
- rico:hasOrHadPart
annotations:
custodian_types: '["*"]'
custodian_types_rationale: Comprehensive overviews applicable to all heritage custodian types.
custodian_types_primary: A
specificity_score: 0.35
specificity_rationale: Low-moderate specificity - applicable across many contexts where comprehensive resource collections are needed.
comments:
- Replaces all_links slot
- Uses Overview class for typed collection
- Created from slot_fixes.yaml migration (2026-01-14)


@@ -0,0 +1,115 @@
id: https://nde.nl/ontology/hc/slot/has_or_had_custodian_type
name: has_or_had_custodian_type_slot
title: Has Or Had Custodian Type Slot
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
org: http://www.w3.org/ns/org#
rov: http://www.w3.org/ns/regorg#
skos: http://www.w3.org/2004/02/skos/core#
crm: http://www.cidoc-crm.org/cidoc-crm/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: hc
imports:
- linkml:types
- ../classes/CustodianType
slots:
has_or_had_custodian_type:
slot_uri: org:classification
description: "The organizational type classification(s) of a heritage custodian within\nthe GLAMORCUBESFIXPHDNT taxonomy.\n\n**Predicate Semantics**:\nThis slot uses org:classification as its primary predicate, which links\nan organization to its type classification(s) using SKOS concepts.\n\n**Temporal Semantics** (RiC-O Pattern):\nThe \"hasOrHad\" naming follows RiC-O convention indicating this relationship\nmay be historical - an institution may have changed type over time\n(e.g., a library becoming a museum, or a mixed institution).\n\n**Ontological Alignment**:\n- **Primary** (`slot_uri`): `org:classification` - W3C Organization Ontology\n predicate for organizational classification (range: skos:Concept)\n- **Close**: `rov:orgType` - Registered Organization Vocabulary predicate\n (subPropertyOf org:classification, for legal entity types like GmbH, Ltd)\n- **Related**: `crm:P2_has_type` - CIDOC-CRM predicate for typing entities\n (domain: E1_CRM_Entity, range: E55_Type)\n- **Related**:\
\ `schema:additionalType` - Schema.org predicate for additional\n type classification beyond the primary @type\n- **Broad**: `dcterms:type` - Dublin Core predicate for resource type\n\n**Range**:\nValues are instances of `CustodianType` or its 19 subclasses:\n\n| Code | Subclass | Wikidata | Description |\n|------|--------------------------------|-----------|--------------------------------|\n| A | ArchiveOrganizationType | Q166118 | Archives |\n| B | BioCustodianType | Q167346 | Botanical gardens, zoos |\n| C | CommercialOrganizationType | Q6881511 | Corporate archives |\n| D | DigitalPlatformType | Q3565794 | Digital platforms |\n| E | EducationProviderType | Q3152824 | Educational institutions |\n| F | FeatureCustodianType | Q4989906 | Monuments, memorials |\n| G | GalleryType \
\ | Q1007870 | Art galleries |\n| H | HolySacredSiteType | Q1370598 | Religious heritage sites |\n| I | IntangibleHeritageGroupType | Q59544 | Intangible heritage orgs |\n| L | LibraryType | Q7075 | Libraries |\n| M | MuseumType | Q33506 | Museums |\n| N | NonProfitType | Q163740 | NGOs, advocacy groups |\n| O | OfficialInstitutionType | Q2659904 | Government agencies |\n| P | PersonalCollectionType | Q2668072 | Private collections |\n| R | ResearchOrganizationType | Q31855 | Research institutes |\n| S | HeritageSocietyType | Q476068 | Historical societies |\n| T | TasteScentHeritageType | Q5765838 | Culinary/olfactory heritage |\n| U | UnspecifiedType | Q35120 | Unknown\
\ type |\n| X | MixedCustodianType | Q35120 | Multiple types combined |\n\nEach CustodianType subclass provides:\n- Wikidata Q-number alignment (via schema:additionalType)\n- Multilingual labels (skos:prefLabel, skos:altLabel)\n- Hierarchical relationships (skos:broader, skos:narrower)\n- GHCID single-letter code derivation\n\n**Cardinality**:\nMultivalued - institutions may have multiple types (e.g., museum + archive).\nUse MixedCustodianType (X) for institutions with complex multi-type identity.\n"
range: CustodianType
required: false
multivalued: true
inlined_as_list: true
exact_mappings:
- org:classification
close_mappings:
- rov:orgType
related_mappings:
- crm:P2_has_type
- schema:additionalType
broad_mappings:
- dcterms:type
annotations:
rico_naming_convention: 'Follows RiC-O "hasOrHad" pattern for temporal predicates.
See Rule 39: Slot Naming Convention (RiC-O Style)
'
replaces_slots: custodian_type, custodian_types
migration_date: '2026-01-09'
predicate_clarification: 'slot_uri and mappings reference PREDICATES (properties), not classes.
- org:classification is a PREDICATE (links Organization to Concept)
- CustodianType is a CLASS (the range of valid values)
'
range_note: 'Range is CustodianType (abstract class). Valid values are the 19
CustodianType subclasses defined in modules/classes/:
- ArchiveOrganizationType.yaml
- BioCustodianType.yaml
- CommercialOrganizationType.yaml
- DigitalPlatformType.yaml
- EducationProviderType.yaml
- FeatureCustodianType.yaml
- GalleryType.yaml
- HolySacredSiteType.yaml
- IntangibleHeritageGroupType.yaml
- LibraryType.yaml
- MuseumType.yaml
- NonProfitType.yaml (N)
- OfficialInstitutionType.yaml
- PersonalCollectionType.yaml
- ResearchOrganizationType.yaml
- HeritageSocietyType.yaml
- TasteScentHeritageType.yaml
- UnspecifiedType.yaml
- MixedCustodianType.yaml
'
custodian_types:
- '*'
custodian_types_rationale: Universal utility concept
comments:
- Unified slot replacing custodian_type (singular) and custodian_types (plural)
- slot_uri=org:classification is a PREDICATE, not a class
- range=CustodianType is an ABSTRACT CLASS - valid values are its 19 subclasses
- 'RiC-O naming: hasOrHad indicates potentially historical relationship'
- 'Multivalued: institutions may have multiple type classifications'
examples:
- value: hc:MuseumType
description: Art museum classification (M code)
- value: hc:ArchiveOrganizationType
description: Archive classification (A code)
- value: '[hc:MuseumType, hc:ArchiveOrganizationType]'
description: Mixed institution with both museum and archive functions
- value: hc:MixedCustodianType
description: Explicit mixed type when institution defies single categorization (X code)
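
Review note: the slot description says each `CustodianType` subclass maps to a single-letter GHCID code. The sketch below transcribes the table from that description into a lookup. The names `CUSTODIAN_TYPE_CODES`, `code_for`, and `codes_for` are hypothetical, and how the project actually combines codes for multi-type institutions is not shown in this diff, so the sorted-unique behaviour here is an assumption.

```python
# Subclass name -> single-letter code, copied from the table in the
# has_or_had_custodian_type description.
CUSTODIAN_TYPE_CODES = {
    "ArchiveOrganizationType": "A",
    "BioCustodianType": "B",
    "CommercialOrganizationType": "C",
    "DigitalPlatformType": "D",
    "EducationProviderType": "E",
    "FeatureCustodianType": "F",
    "GalleryType": "G",
    "HolySacredSiteType": "H",
    "IntangibleHeritageGroupType": "I",
    "LibraryType": "L",
    "MuseumType": "M",
    "NonProfitType": "N",
    "OfficialInstitutionType": "O",
    "PersonalCollectionType": "P",
    "ResearchOrganizationType": "R",
    "HeritageSocietyType": "S",
    "TasteScentHeritageType": "T",
    "UnspecifiedType": "U",
    "MixedCustodianType": "X",
}


def code_for(type_ref: str) -> str:
    """Map a class name or CURIE such as 'hc:MuseumType' to its letter code."""
    name = type_ref.split(":", 1)[-1]  # drop an optional 'hc:' prefix
    return CUSTODIAN_TYPE_CODES[name]


def codes_for(type_refs) -> list:
    """Sorted, de-duplicated codes for a multivalued slot value (assumed policy)."""
    return sorted({code_for(ref) for ref in type_refs})


print(code_for("hc:MuseumType"))
print(codes_for(["hc:MuseumType", "hc:ArchiveOrganizationType"]))
```

An unknown class name raises `KeyError`, which seems preferable to silently falling back to `U`.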

Some files were not shown because too many files have changed in this diff