Compare commits

41 commits

Author SHA1 Message Date
kempersc
8d7a8e5362 Update generated timestamp in manifest.json for accuracy
All checks were successful
Deploy Frontend / build-and-deploy (push) Successful in 2m1s
2026-02-17 11:37:18 +01:00
kempersc
a41e7c8991 Enhance social media post type definitions with multilingual descriptions, structured aliases, and keywords
- Updated VideoPostType, ShortVideoPostType, ImagePostType, TextPostType, StoryPostType, LiveStreamPostType, AudioPostType, ArticlePostType, ThreadPostType, CarouselPostType, OtherPostType with detailed descriptions, alternative descriptions in multiple languages, structured aliases, and relevant keywords.
- Added structured aliases and keywords to SoundArchiveRecordSetType, SpecialCollectionRecordSetType, SpecializedArchiveRecordSetType, StateArchivesSectionRecordSetType, StateDistrictArchiveRecordSetType, StateRegionalArchiveCzechiaRecordSetType, TelevisionArchiveRecordSetType, TradeUnionArchiveRecordSetType, UniversityArchiveRecordSetTypes, VerwaltungsarchivRecordSetType, WebArchiveRecordSetType, WebArchiveRecordSetTypes, and WomensArchivesRecordSetTypes.
- Improved overall schema documentation and usability for better content classification and retrieval.
2026-02-17 11:23:51 +01:00
kempersc
c873fd11a1 Update generated timestamp in manifest.json for accuracy 2026-02-17 00:08:28 +01:00
kempersc
b3af29a278 Enhance schema definitions for Expense, Geometry, Index, and TasteScent types
- Updated ExpenseTypes.yaml to include structured aliases, keywords, and broad mappings for various expense categories.
- Improved GeometryTypes.yaml with structured aliases, keywords, and broad mappings for geometric shapes.
- Removed obsolete index types from IndexTypes.yaml to streamline the schema.
- Enhanced TasteScentSubTypes.yaml with additional language support and structured aliases for food establishments.
2026-02-16 23:27:29 +01:00
kempersc
30576d541d Refactor code structure for improved readability and maintainability 2026-02-16 23:25:16 +01:00
kempersc
a590a8d94b Refactor and enhance descriptions across multiple YAML schemas for improved clarity and consistency.
- Updated descriptions in `WikidataOrganization`, `WikidataRecognition`, `WikidataResolvedEntities`, `WikidataSitelinks`, `WikidataSocialMedia`, `WikidataTemporal`, `WikidataTimeValue`, `WikidataWeb`, `WomensArchives`, `WomensArchivesRecordSetType`, `WomensArchivesRecordSetTypes`, `WordCount`, `WorkRevision`, `WorldCatIdentifier`, `WorldHeritageSite`, `WritingSystem`, `XPath`, `XPathScore`, `YoutubeChannel`, `YoutubeComment`, `YoutubeTranscript`, and `YoutubeVideo` to enhance readability and precision.
- Adjusted mappings and slot usage in various schemas to align with updated descriptions and improve data structure.
- Added new synonyms in multiple languages for better localization support.
2026-02-16 15:53:42 +01:00
kempersc
990b44bd43 Update generated timestamp in manifest.json for accuracy 2026-02-16 15:45:38 +01:00
kempersc
92c79067cd Refactor time-related classes and descriptions for clarity and consistency
- Updated titles and descriptions in TimeSlot, TimeSpan, TimeSpanType, and TimespanBlock for improved readability and understanding.
- Enhanced multilingual support with refined alt_descriptions and structured_aliases across various classes.
- Changed mapping types from broad_mappings to exact_mappings in WebClaimsBlock, WebCollection, WebPage, WebPlatform, WebSource, WorkExperience, and various YouTube-related classes for better alignment with schema definitions.
- Improved comments and modeling notes in VariantTypes to clarify usage and examples.
- General cleanup of unnecessary comments and formatting adjustments for consistency across YAML files.
2026-02-16 13:49:40 +01:00
kempersc
72612ae5df Update schema descriptions and generated timestamp for improved clarity and multilingual support 2026-02-16 11:19:00 +01:00
kempersc
d37a120ef2 Refactor schema definitions across multiple classes to improve clarity and consistency
- Removed unnecessary aliases and adjusted slot definitions in Timestamp, Topic, TopicType, TransferEvent, TransferPolicy, and others.
- Enhanced descriptions and added alternative language descriptions for TradeUnionArchiveRecordSetType and UnescoIchElement.
- Updated slot usage for various archive-related classes to use `equals_string` instead of `equals_expression`.
- Streamlined VideoChapter class by refining descriptions and restructuring slot usage for better navigation and organization.
- General cleanup of comments and annotations to ensure clarity and maintainability.
2026-02-16 11:17:33 +01:00
kempersc
8d5164bc02 Update manifest.json generation timestamp and enhance Storage and UnspecifiedType schemas with multilingual support and improved descriptions 2026-02-16 10:19:41 +01:00
kempersc
66adec257e Add scripts for normalizing LinkML schemas and validating schema integrity
- Implement `normalize_linkml_alt_descriptions.py` to convert structured alt_descriptions to the expected scalar form.
- Implement `normalize_linkml_structured_aliases.py` to flatten language-keyed structured_aliases into a standard list-of-objects format.
- Implement `validate_linkml_schema_integrity.py` to validate the integrity of LinkML schema bundles, checking for import resolution, YAML parsing, and reference existence.
2026-02-16 10:16:51 +01:00
kempersc
5c0ad546d4 Refactor social media schema classes for improved clarity and structure
- Updated SocialMediaPostType.yaml:
  - Renamed class and title for consistency.
  - Simplified description to focus on controlled vocabulary.
  - Adjusted slot definitions and removed duplicates.
  - Enhanced comments for better understanding of class purpose.

- Modified SocialMediaProfile.yaml:
  - Added a reference to Twitter in the see_also section.
  - Preserved prior description in notes for context.

- Revised VideoAudioAnnotation.yaml:
  - Updated description to clarify the purpose of audio annotations.
  - Added multilingual alt_descriptions and structured_aliases.
  - Streamlined slot definitions and removed duplicates.

- Enhanced VideoPost.yaml:
  - Added multilingual alt_descriptions and structured_aliases.
  - Clarified description to highlight video-specific properties.
  - Updated slot definitions for better clarity and consistency.

- Updated VideoSubtitle.yaml:
  - Preserved prior description in notes for context.

- Revised VideoTranscript.yaml:
  - Preserved prior description in notes for context.
2026-02-16 01:15:37 +01:00
kempersc
37852a46b0 Refactor significance and social media classes for improved clarity and multilingual support
- Updated SignificanceType.yaml to enhance descriptions, add alternative language descriptions, and clarify comments.
- Refined CommunitySignificance, EconomicSignificance, HistoricalSignificance, ScientificSignificance, AestheticSignificance, TerroirSignificance, and DiplomaticSignificance classes with structured aliases and comments.
- Enhanced SilenceSegment.yaml with multilingual descriptions and structured aliases.
- Improved Size.yaml with clearer descriptions and added multilingual support.
- Updated SocialMediaPlatformType.yaml and SocialMediaProfile.yaml to include alternative descriptions in multiple languages and refined modeling notes.
- Added has_url slot to SocialMediaProfile for better URL management.
- Enhanced Warehouse and WarehouseType classes with preserved modeling notes for clarity on definitions and distinctions.
- Updated WebClaim and WebPortalType classes with preserved descriptions for better understanding of their roles and structures.
2026-02-15 23:26:52 +01:00
kempersc
6251b84d11 Refactor schema definitions and enhance multilingual support
- Updated descriptions and comments across multiple classes to improve clarity and provide additional context.
- Added alternative descriptions and structured aliases for multilingual support in classes such as Restriction, RetrievalAgent, RetrievalEvent, and others.
- Improved the organization of comments and examples for better understanding of class usage and relationships.
- Introduced new enum for OAI-PMH verbs and a corresponding slot to indicate supported verbs by repository endpoints.
- Enhanced the RoomUnit class to clarify its purpose and usage patterns, including migration notes.
- General cleanup and standardization of annotations and slot usages across various classes.
2026-02-15 22:43:52 +01:00
kempersc
c2cdbed614 Refactor Price, PriceRange, Primary, and PrimaryDigitalPresenceAssertion classes for improved clarity and multilingual support
- Updated descriptions and titles for Price and PriceRange classes to enhance understanding.
- Added multilingual alt_descriptions for Price, PriceRange, Primary, and PrimaryDigitalPresenceAssertion classes.
- Enhanced examples in Price and PriceRange classes to provide clearer context.
- Improved annotations and comments for better documentation and understanding of class purposes.
- Introduced new slots for official names (including and excluding type) to standardize naming conventions.
- Added sorting variant for official names to facilitate alphabetical ordering.
- Ensured all changes align with the latest schema requirements and best practices.
2026-02-15 21:54:42 +01:00
kempersc
6e63465196 Add ImageTilingServiceEndpoint class and archive ID class
- Introduced the ImageTilingServiceEndpoint class for tiled high-resolution image delivery, including deep-zoom and transformation capabilities, with multilingual descriptions and structured aliases.
- Archived the ID class as a backwards-compatible alias for Identifier, marking it as deprecated to enforce the use of the canonical Identifier model.
2026-02-15 21:40:13 +01:00
kempersc
4b2d8e3fd9 Update generated timestamp in manifest.json 2026-02-15 19:28:18 +01:00
kempersc
554fe520ea Add comprehensive rules for LinkML schema management and ontology mapping
- Introduced Rule 42: No Ontology Prefixes in Slot Names to enforce clean naming conventions.
- Established Rule: No Rough Edits in Schema Files to ensure structural integrity during modifications.
- Implemented Rule: No Version Indicators in Names to maintain stable semantic naming.
- Created Rule: Ontology Detection vs Heuristics to emphasize the importance of verifying ontology definitions.
- Defined Rule 50: Ontology-to-LinkML Mapping Convention to standardize mapping practices.
- Added Rule: Polished Slot Storage Location to specify directory structure for polished slot files.
- Enforced Rule: Preserve Bespoke Slots Until Refactoring to prevent unintended migrations during slot updates.
- Instituted Rule 56: Semantic Consistency Over Simplicity to mandate execution of revisions in slot_fixes.yaml.
- Added new Genealogy Archives Registry Enrichment class with multilingual support and structured aliases.
2026-02-15 19:20:09 +01:00
kempersc
ee5e8e5a7c Refactor schema classes to enhance descriptions and mappings
- Updated descriptions for EADDownload, EBook, EcclesiasticalProvince, EconomicArchive, EconomicArchiveRecordSetType, Edition, Editor, Education, EducationFacilityType, EducationLevel, and EducationProviderSubtype to improve clarity and multilingual support.
- Introduced alternative descriptions and structured aliases for better localization.
- Adjusted mappings to align with broader and close ontology relationships.
- Added new ConflictTypes schema with detailed classifications for various crisis categories affecting heritage.
2026-02-15 18:46:11 +01:00
kempersc
d52149a5e8 Refactor schema definitions for archival entities and documentation types
- Updated descriptions and classifications for Diocesan and District Archive record types to enhance clarity and alignment with archival standards.
- Introduced structured aliases and alternative descriptions in multiple languages for key classes, improving accessibility and usability.
- Enhanced the Documentation and Documentation Centre classes with clearer definitions and broader mappings to relevant ontologies.
- Refined the DispositionService and DispositionServiceType schemas to better reflect operational contexts and service classifications.
- General cleanup of unused prefixes and redundant annotations across various schema files to streamline the overall structure.
2026-02-15 16:25:32 +01:00
kempersc
363fd206b9 Update slot naming convention references to current style and add new slot naming convention rule 2026-02-15 16:02:31 +01:00
kempersc
89b8b7e198 Add new Digital Platform schema classes for comprehensive data representation 2026-02-15 15:58:54 +01:00
kempersc
400f4d5bd0 Update manifest.json generated timestamp and add deprecation notice for slot naming convention 2026-02-15 15:58:22 +01:00
kempersc
86b9dcebff Refactor Digital Platform Classes and Introduce New Classes
- Renamed DigitalPlatformV2DataSource to DigitalPlatformDataSource with updated description and mappings.
- Updated DigitalPlatformV2KeyContact to DigitalPlatformKeyContact, enhancing description and mappings.
- Refined DigitalPlatformV2OrganizationProfile to DigitalPlatformOrganizationProfile, improving description and slot usage.
- Revised DigitalPlatformV2OrganizationStatus to DigitalPlatformOrganizationStatus, clarifying description and mappings.
- Changed DigitalPlatformV2PrimaryPlatform to DigitalPlatformPrimaryPlatform, enhancing description and slot definitions.
- Updated DigitalPlatformV2Provenance to DigitalPlatformProvenance, refining description and mappings.
- Revised DigitalPlatformV2ServiceDetails to DigitalPlatformServiceDetails, improving description and mappings.
- Changed DigitalPlatformV2TransformationMetadata to DigitalPlatformTransformationMetadata, enhancing description and mappings.
- Added new classes: DetectionThreshold, DeviceType, DeviceTypes, DiarizationStatus, DigitalArchive, DigitalArchiveRecordSetType, DigitalArchiveRecordSetTypes, and DigitalConfidence with appropriate descriptions and mappings.
- Established rules for class descriptions, multilingual support, and slot definitions to ensure consistency and clarity across the schema.
2026-02-15 15:54:26 +01:00
kempersc
5e94e52bcb Refactor CulturalInstitution, CurationActivity, Currency, CurrentArchive, CurrentArchiveRecordSetType, CurrentArchiveRecordSetTypes, CurrentPosition, and UniversityArchiveRecordSetTypes schemas for improved clarity and alignment with ontology standards. Enhance descriptions, mappings, and slot definitions while consolidating multilingual support and annotations. Update class structures to better reflect archival lifecycle phases and organizational principles. 2026-02-15 15:49:36 +01:00
kempersc
a7fb73b9ba Enhance schema definitions for archival rules and classes, including Canonical Slot Protection, Archive Folder Convention, and various content types. 2026-02-15 14:45:47 +01:00
kempersc
2e1e6e5fbc Refactor and enhance schema definitions for WebPortalTypes, Wifi, Wikidata classes
- Updated descriptions and added multilingual support for DatasetRegister and LegacyPortal classes in WebPortalTypes.yaml.
- Improved the Wifi class with detailed descriptions and examples, including structured aliases.
- Enhanced WikidataRecognition, WikidataResolvedEntities, WikidataSitelinks, WikidataSocialMedia, WikidataTemporal, WikidataTimeValue, and WikidataWeb classes with clearer descriptions, multilingual alt_descriptions, and structured aliases.
- Introduced a new CollectionType class to classify aggregations based on formation logic and institutional practices, including comprehensive multilingual descriptions and mappings.
2026-02-15 14:36:20 +01:00
kempersc
2c9d3598dc Refactor Wikidata schema classes for improved clarity and multilingual support
- Updated titles for clarity in WikidataClassification, WikidataCollectionInfo, WikidataContact, WikidataCoordinates, WikidataEnrichment, WikidataEntity, WikidataIdentifiers, WikidataLocation, WikidataMedia, and WikidataOrganization classes.
- Enhanced descriptions with multilingual support, providing translations in Dutch, German, French, Spanish, Arabic, Indonesian, and Chinese.
- Added structured aliases for better synonym mapping in multiple languages.
- Improved comments and keywords for better understanding and searchability.
- Ensured consistent use of slots and mappings across classes to align with ontology standards.
2026-02-15 14:08:11 +01:00
kempersc
bd8368dfff Enhance schema definitions for cultural heritage annotations
- Updated WorldCatIdentifier class with improved descriptions and multilingual support.
- Refined WorldHeritageSite class to clarify its purpose and added structured aliases.
- Enhanced WritingSystem class with detailed descriptions and examples for script identification.
- Introduced BirthPlace class to represent birth locations with historical context and geographic identifiers.
- Added AnnotatorAnnotationMetadata class for quality metrics and verification of cultural heritage annotations.
- Created AnnotatorAnnotationProvenance class to track provenance of annotation activities.
- Developed AnnotatorBlock class to aggregate entity claims and metadata for annotations.
- Established AnnotatorEntityClaim class for individual assertions about cultural heritage entities.
- Introduced AnnotatorEntityClassification class for taxonomic categorization of entities.
- Added AnnotatorIntegrationNote class to document file creation and integration processes.
- Developed AnnotatorModel class for machine learning models used in entity extraction.
- Created AnnotatorProvenance class to track extraction provenance and source data access.
2026-02-15 12:41:51 +01:00
kempersc
5148089171 Enhance Cantonal Archive Record Set Types and Capacity Classes
- Updated descriptions and titles for CantonalArchiveRecordSetType and CantonalArchiveRecordSetTypes to improve clarity and consistency.
- Added multilingual alt_descriptions and structured_aliases for better accessibility and understanding across languages.
- Refined slot usage and annotations for CantonalGovernmentFonds and CantonalLegislationCollection to align with RiC-O principles.
- Enhanced Capacity class with detailed descriptions, alt_descriptions, and structured_aliases for various capacity types, including Volume, Shelf Length, Floor Area, Item Count, Weight, and Seating capacities.
- Introduced a new rule for describing archive organization types to emphasize their institutional context rather than just record types.
2026-02-15 01:49:35 +01:00
kempersc
1fe1d1ad0b Enhance Audiovisual Archive and Audit Schema Definitions
- Updated AudiovisualArchive class with detailed descriptions and multilingual support for various languages.
- Added examples, mappings, and structured aliases for better integration with external vocabularies.
- Introduced new slots for has_label and has_description in AudiovisualArchive and AudiovisualArchiveRecordSetType.
- Expanded AudiovisualArchiveRecordSetType with comprehensive descriptions and examples for subclasses.
- Enhanced Audit class with detailed descriptions, examples, and multilingual support.
- Introduced AuditOpinion class with standardized opinion types and descriptions.
- Updated Auditor class to include detailed descriptions, examples, and keywords for better clarity on roles.
2026-02-15 00:57:44 +01:00
kempersc
d356aa77b7 Enhance LinkML class definitions for GLAM ontology
- Updated AppraisalPolicy.yaml with improved descriptions, multilingual support, structured aliases, and refined mappings.
- Revised AppreciationEvent.yaml to include detailed descriptions, alt_descriptions in multiple languages, and structured data for engagement metrics.
- Enhanced ApprovalTimeType.yaml and ApprovalTimeTypes.yaml with comprehensive descriptions, multilingual support, and structured aliases for approval durations.
- Improved Approver.yaml by refining the description, adding multilingual support, and clarifying mappings for approval agents.
- Created check_class_prompt-20260214.md to outline goals and rules for improving class file quality, including description standards, multilingual support, and mapping verification.
2026-02-14 23:59:33 +01:00
kempersc
4516e9ae23 Enhance accreditation and acquisition schemas with detailed descriptions, examples, and multilingual support
- Updated Accreditation class with comprehensive descriptions, alt_descriptions in multiple languages, and examples of accreditation types.
- Revised AccreditationBody class to clarify its role and added multilingual alt_descriptions.
- Improved AccreditationEvent class to detail the processes involved in granting accreditation, including temporal aspects and examples.
- Expanded Accumulation class to define the period of record gathering with examples and multilingual support.
- Enhanced AccuracyLevel class to provide a clearer definition of accuracy assessments with examples and multilingual descriptions.
- Refined Acquisition class to capture the event of obtaining objects for collections, including methods and examples.
- Updated AcquisitionEvent class to document the transfer of materials, including origin and method, with examples and multilingual support.
- Improved AcquisitionMethod class to categorize acquisition methods with detailed descriptions and examples.
- Added a new rule for verifying Wikidata mappings to ensure semantic accuracy and relevance.
2026-02-14 23:09:30 +01:00
kempersc
ae09ff81d2 Update manifest.json timestamp, remove deprecated mappings from AcademicArchive classes, enhance AcademicInstitution and AcademicProgram with multilingual support, and add Wikidata mapping verification rule documentation 2026-02-14 22:37:12 +01:00
kempersc
81ae50ef76 Update manifest.json timestamp, enhance AcademicArchive and related classes with multilingual support, and add comprehensive class multilingual support rule documentation 2026-02-14 22:22:57 +01:00
kempersc
684d79935a Enhance descriptions and update slot references across multiple YAML files for improved clarity and consistency 2026-02-14 20:11:46 +01:00
kempersc
2091af3afa Update generated timestamp in manifest.json 2026-02-14 14:42:09 +01:00
kempersc
fcd1c21c63 Add aliases and enhance slot definitions across various modules
- Added new aliases for existing slots to improve clarity and usability, including:
  - has_deadline: has_embargo_end_date
  - has_extent: has_extent_text
  - has_fonds: has_fond
  - has_laboratory: conservation_lab
  - has_language: has_iso_code639_1, has_iso_code639_3
  - has_legal_basis: legal_basis
  - has_light_exposure: max_light_lux
  - has_measurement_unit: has_unit
  - has_note: has_custodian_observation
  - has_occupation: occupation
  - has_operating_hours: has_operating_hours
  - has_position: position
  - has_quantity: has_artwork_count, link_count
  - has_roadmap: review_date
  - has_skill: skill
  - has_speaker: speaker_label
  - has_specification: specification_url
  - has_statement: rights_statement_url, rights_statement
  - has_type: custodian_only
  - has_user_category: serves_visitors_only
  - hold_record_set: record_count
  - identified_by: has_index_number
  - in_period: has_period
  - in_place: has_place
  - in_series: has_series
  - measure: has_measurement
  - measured_on: measurement_date
  - organized_by: has_organizer
  - originate_from: has_origin
  - part_of: suborganization_of
  - published_on: has_publication_date
  - receive_investment: has_investment
  - related_to: connection_heritage_type
  - require: preservation_requirement
  - safeguarded_by: current_keeper, record_holder_note
  - state: states_or_stated
  - take_comission: takes_or_took_comission
  - take_place_at: takes_or_took_place_at
  - transmit_through: transmits_or_transmitted_through
  - warrant: warrants_or_warranted

- Introduced a new slot definition for evaluated_through to capture evaluation methodologies and review statuses.
2026-02-14 14:41:49 +01:00
kempersc
4a458ac71e Update manifest.json timestamp, remove deprecated slot YAML files, and add new slots to existing YAML files 2026-02-13 16:27:38 +01:00
kempersc
8fa09858e5 Refactor code structure for improved readability and maintainability 2026-02-12 17:43:14 +01:00
3742 changed files with 230657 additions and 147968 deletions

@@ -0,0 +1,74 @@
# Archive Organization Type Description Rule
## Rule
When describing archive classes that do NOT have `recordType` or `hold_record_set` as a primary distinguishing feature, emphasize that they represent the **archive as an organization/institution**, not just a collection of records.
## Rationale
Many archive type classes (e.g., `BankArchive`, `ChurchArchive`, `MunicipalArchive`) classify the **type of organization** that maintains the records, rather than the type of records themselves. This is an important semantic distinction:
- **Archive Organization Types** (no recordType focus): Classify the institution by its domain/sector
  - Examples: `BankArchive`, `ChurchArchive`, `MunicipalArchive`, `UniversityArchive`
  - Emphasis: The organization's mission, governance, and institutional context
- **Record Set Types** (have recordType): Classify the collections by record type
  - Examples: `AudiovisualArchiveRecordSetType`, `PhotographicArchiveRecordSetType`
  - Emphasis: The nature and format of the records
## Description Pattern
### For Archive Organization Types (WITHOUT recordType):
```yaml
description: >-
  Type of heritage institution that [primary function], specializing in
  [domain/subject area], with organizational characteristics including
  [governance, funding, legal status, or other institutional features].
```
**Key elements to include:**
1. "Type of heritage institution" or "Type of archive organization"
2. The institution's primary domain or sector
3. Organizational characteristics (governance, funding, legal status)
4. Institutional context (parent organization, regulatory framework)
5. Typical services and public-facing functions
### For Record Set Types (WITH recordType):
```yaml
description: >-
  Classification of archival records documenting [subject/domain],
  typically including [record formats, content types, provenance patterns].
```
## Examples
### ✅ Correct - Archive Organization Type (BankArchive):
```yaml
description: >-
  Type of heritage institution operating within the banking sector, preserving
  records of financial institutions and documenting banking history. Characterized
  by corporate governance structures, extended closure periods for personal data,
  and institutional relationships with parent banking organizations.
```
### ✅ Correct - Record Set Type (has recordType):
```yaml
description: >-
  Classification of archival records documenting banking activities, including
  ledgers, correspondence, customer accounts, and financial instruments.
```
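The description pattern above can be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `check_description_opener` and the `has_record_type` flag are hypothetical, and the opening phrases are taken from the pattern and key elements listed in this rule.

```python
# Opening phrases taken from the description patterns in this rule.
ORG_OPENERS = ("Type of heritage institution", "Type of archive organization")
RECORD_SET_OPENER = "Classification of archival records"


def check_description_opener(description: str, has_record_type: bool) -> bool:
    """Return True when the description opens with the phrase the pattern expects."""
    # Collapse line breaks and repeated spaces left over from folded YAML scalars.
    text = " ".join(description.split())
    if has_record_type:
        return text.startswith(RECORD_SET_OPENER)
    # str.startswith accepts a tuple of alternatives.
    return text.startswith(ORG_OPENERS)
```

A check like this only tests the opening phrase; whether the rest of the description covers governance, funding, and institutional context still needs human review.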
## Files Affected
All classes in the `*Archive` family that:
- Do NOT have `hold_record_set` or `recordType` as a primary slot
- Are subclassed from `ArchiveOrganizationType` (not `ArchiveRecordSetType`)
## Related Rules
- `mapping-specificity-hypernym-rule.md` - For correct ontology mappings
- `class-description-quality-rule.md` - For general description quality

@@ -174,6 +174,6 @@ This approach:
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 39: Slot Naming Convention (RiC-O Style)
- Rule: Slot Naming Convention (Current Style)
- Rule 49: Slot Usage Minimization
- LinkML Documentation: [slot_usage](https://linkml.io/linkml-model/latest/docs/slot_usage/)

@@ -6,7 +6,7 @@ When resolving slot aliases to canonical names, a slot name that has its own `.y
## Context
Slot files in `schemas/20251121/linkml/modules/slots/20260202_matang/` (top-level and `new/`) each define a canonical slot name. Some slot files also list aliases that overlap with canonical names from other slot files. These cross-references are intentional (e.g., indicating semantic relatedness) but do NOT mean the referenced slot should be renamed.
Slot files in `schemas/20251121/linkml/modules/slots/` (top-level and `new/`) each define a canonical slot name. Some slot files also list aliases that overlap with canonical names from other slot files. These cross-references are accidental (e.g., indicating semantic relatedness) and should be corrected by removing the canonical names from the alias lists in which they occur. The occurrence of canonical names in alias lists does NOT mean the referenced slot should be renamed.
## Rule
@@ -58,4 +58,4 @@ def should_rename(slot_name, alias_map, existing_slot_files):
## Rationale
Multiple slot files may list overlapping aliases for documentation or semantic linking purposes. A canonical slot name appearing as an alias in another file does not invalidate the original slot definition. Treating it as an alias would incorrectly redirect class files away from the slot's own definition, breaking the schema's intended structure.
Multiple slot files may list overlapping aliases by accident or for documentation or semantic linking purposes. A canonical slot name appearing as an alias in another file does not invalidate the original slot definition. Treating it as an alias would incorrectly redirect class files away from the slot's own definition, breaking the schema's intended structure.
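To locate the overlaps described above, one could scan the alias lists for names that are canonical elsewhere. This is a minimal sketch under stated assumptions: the function `canonical_names_in_aliases` and its input shape (a mapping from canonical slot name to its alias list) are hypothetical, and it is not the `should_rename` function from the rule file.

```python
def canonical_names_in_aliases(slots: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each slot to those of its aliases that are canonical names of other slots.

    `slots` is assumed to map a canonical slot name to its list of aliases.
    """
    canonical = set(slots)
    conflicts: dict[str, list[str]] = {}
    for name, aliases in slots.items():
        # An alias clashes when another slot file already owns that name.
        clashing = sorted(a for a in aliases if a in canonical and a != name)
        if clashing:
            conflicts[name] = clashing
    return conflicts
```

The result only reports candidates for removal from alias lists; it does not rename or redirect any slot.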

@@ -0,0 +1,48 @@
# Rule: Capitalization Consistency for LinkML Names
## Purpose
Ensure naming is consistent across LinkML classes, slots, enums, and their files,
with special care for acronyms (for example: `GLAM`, `GHC`, `GHCID`, `GLEIF`).
## Mandatory Requirements
1. **Class names**
   - Use `PascalCase`.
   - Preserve canonical acronym casing.
   - Example: `GHCIdentifier`, not `GhcidIdentifier`.
2. **Slot names**
   - Use the project slot naming convention consistently.
   - If an acronym appears in a slot name, keep its canonical uppercase form.
   - Example: `has_GHCID_history` (if a slot containing the acronym is required), not `has_ghcid_history`.
3. **Enum names**
   - Use `PascalCase` with `Enum` suffix where applicable.
   - Preserve acronym casing in enum identifiers and permissible values.
   - Example: `GLAMTypeEnum`.
4. **File names must match primary term exactly**
   - Class file name must match class name (case-sensitive) plus `.yaml`.
   - Enum file name must match enum name (case-sensitive) plus `.yaml`.
   - Slot file name must match slot name (case-sensitive) plus `.yaml`.
5. **No mixed acronym variants in the same schema branch**
   - Do not mix forms like `Ghcid`, `GHCID`, and `ghcid` for the same concept.
   - Pick the canonical form once and use it everywhere.
## Refactoring Rule
When normalizing capitalization:
- Update term declaration (`name`, class/slot/enum key).
- Update file name to match.
- Update all imports and references transitively.
- Do not leave aliases as operational identifiers; keep aliases only for lexical metadata.
## Validation Checklist
- [ ] Class, slot, enum declarations use canonical casing.
- [ ] File names exactly match declaration names.
- [ ] Acronyms are consistent across declarations and references.
- [ ] Imports and references resolve after renaming.
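
Two items on the checklist above lend themselves to automation. The following is a minimal sketch, assuming term names and file paths are already collected as plain strings; both helper names (`file_name_matches`, `acronym_variants`) are hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def file_name_matches(term_name: str, file_path: str) -> bool:
    """Check that the file name equals the term name (case-sensitive) plus `.yaml`."""
    return Path(file_path).name == f"{term_name}.yaml"


def acronym_variants(names: list[str], acronym: str) -> set[str]:
    """Collect the distinct casings of `acronym` that occur in the given names."""
    pattern = re.compile(re.escape(acronym), re.IGNORECASE)
    found: set[str] = set()
    for name in names:
        found.update(pattern.findall(name))
    return found
```

More than one element in the set returned by `acronym_variants` suggests mixed casing. The substring match is deliberately naive: a name such as `GHCIdentifier` also matches `GHCID` case-insensitively, so results need a manual look before renaming anything.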

@@ -0,0 +1,228 @@
# Class Description Quality Rule
## Rule: Write Dictionary-Style Definitions Without Repeating the Class Name
When writing class descriptions, follow these principles.
### 1. No Repetition of Class Name Components
**WRONG:**
```yaml
AcademicArchiveRecordSetType:
  description: >-
    A classification type for archival record sets created by academic
    institutions. This class represents the record set type...
```
**CORRECT:**
```yaml
AcademicArchiveRecordSetType:
  description: >-
    Category for grouping documentary materials accumulated by tertiary
    educational institutions during their administrative, academic, and
    operational activities.
```
The description should define the concept using synonyms and related terms, not repeat words from the class name.
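A rough way to spot repeated class-name components is to split the `PascalCase` name into words and intersect them with the words of the description. This is a minimal sketch, not an existing project script; the function name `repeated_name_words` is hypothetical.

```python
import re


def repeated_name_words(class_name: str, description: str) -> set[str]:
    """Return the class-name words that reappear verbatim in the description."""
    # Split PascalCase, keeping acronym runs such as "GLAM" together.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", class_name)
    name_words = {part.lower() for part in parts}
    description_words = set(re.findall(r"[a-z]+", description.lower()))
    return name_words & description_words
```

The comparison is exact-word and case-insensitive, so it is only a heuristic: inflected forms such as "archival" for `Archive` are not caught, and any flagged word still needs an editor's judgment.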
### 2. MIGRATE Structured Data Before Removing from Descriptions
**CRITICAL**: When a description contains structured data (examples, typical contents, alignment notes, etc.), you MUST:
1. **First check** if the structured data already exists in proper LinkML fields
2. **If NOT present**: ADD it to the appropriate structured field
3. **ONLY THEN**: Remove it from the description
**Never simply delete structured content from descriptions without preserving it elsewhere.**
**MIGRATION CHECKLIST:**
| Content Type | Target Field | Example |
|--------------|--------------|---------|
| Example instances | `examples:` | `- value: {...} description: "..."` |
| Typical contents | `keywords:` or `comments:` | List of typical materials |
| Alignment explanations | `broad_mappings`, `related_mappings` | Ontology references |
| Usage notes | `comments:` | Operational guidance |
| Provenance notes | `comments:` or `annotations:` | Historical context |
| Privacy/legal notes | `comments:` | Access restrictions |
| Definition details | Keep in description | Core semantic definition |
**WRONG - Deleting without migration:**
```yaml
# BEFORE (has rich content)
description: |
  Records documenting student academic careers.
  **Typical Contents**:
  - Enrollment records
  - Academic transcripts
  - Graduation records
  Subject to privacy regulations (FERPA, GDPR).
# AFTER (lost information!) - DON'T DO THIS
description: >-
  Records documenting student academic careers.
```
**CORRECT - Migrate first, then clean:**
```yaml
# Step 1: Add to structured fields
description: >-
  Records documenting student academic careers.
keywords:
  - enrollment records
  - academic transcripts
  - graduation records
comments:
  - Subject to privacy regulations (FERPA, GDPR, AVG)
  - Access restrictions typically apply for records less than 75 years old
# Step 2: Now description is clean but no information lost
```
### 3. No Structured Data or Meta-Discussion in Descriptions
After migration, descriptions should contain only the definition. Do not include:
- Alignment explanations (use `broad_mappings`, `close_mappings`, `exact_mappings`)
- Pattern explanations (use `see_also`, `comments`)
- Usage examples (use the `examples:` slot)
- Rationale for mappings (use `comments:` or `annotations:`)
- Typical contents lists (use `keywords:` or `comments:`)
**WRONG:**
```yaml
description: >-
  A type for X.
  **RiC-O Alignment**: Maps to rico:RecordSetType because...
  **Pattern**: This is part of a dual-class pattern with Y.
  **Examples**: Administrative fonds, student records...
```
**CORRECT:**
```yaml
description: >-
  Category for grouping documentary materials accumulated by tertiary
  educational institutions.
broad_mappings:
  - rico:RecordSetType
see_also:
  - AcademicArchive
keywords:
  - administrative fonds
  - student records
examples:
  - value: {...}
    description: Administrative fonds containing governance records
```
### 4. Use Folded Block Scalar (`>-`) for Descriptions
Use `>-` (folded, strip) instead of `|` (literal) to ensure clean paragraph formatting in generated documentation.
**WRONG:**
```yaml
description: |
  A type for X.
  This spans multiple lines.
```
**CORRECT:**
```yaml
description: >-
  A type for X. This will be formatted as a single clean paragraph
  in the generated documentation.
```
### 5. Use the LinkML `examples:` Slot for Examples
Structure each example with `value:` and `description:` keys.
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: University Administrative Records
    description: Administrative fonds containing governance records
```
### 6. Keywords vs Examples - Know the Difference
**CRITICAL**: Do not confuse `keywords:` with `examples:`. They serve different purposes:
| Field | Purpose | Content Type |
|-------|---------|--------------|
| `keywords:` | Search terms, topics, categories | List of strings (topics/materials) |
| `examples:` | Valid instance data demonstrations | Structured objects with `value` and `description` |
**Keywords** = Topics, material types, categories that describe what the class is about:
```yaml
keywords:
- enrollment records # type of material
- academic transcripts # type of material
- graduation records # type of material
```
**Examples** = Actual instances of the class with populated slots:
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Registrar Student Records
      has_note: Enrollment, transcripts, graduation records
    description: Student records series from the registrar's office
```
**WRONG - Using keywords as examples:**
```yaml
# DON'T: "enrollment records" is not an instance of AcademicStudentRecordSeries
examples:
  - value: enrollment records
    description: Type of student record
```
**CORRECT - Keywords for topics, examples for instances:**
```yaml
keywords:
  - enrollment records
  - academic transcripts
  - graduation records
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Historical Student Records
      has_note: Pre-1950 student records with fewer access restrictions
    description: Historical student records open for research access
```
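The keyword-versus-example distinction can be checked mechanically once the YAML has been parsed. The sketch below assumes each `examples:` entry arrives as a plain object; `SchemaExample`, `isInstanceExample`, and `findKeywordLikeExamples` are hypothetical names.

```typescript
// Hypothetical shape of one parsed `examples:` entry.
interface SchemaExample {
  value: unknown;
  description?: string;
}

/** True when the example value is a populated object (slot name to slot value). */
function isInstanceExample(example: SchemaExample): boolean {
  const value = example.value;
  return (
    typeof value === 'object' &&
    value !== null &&
    !Array.isArray(value) &&
    Object.keys(value).length > 0
  );
}

/** Returns examples whose value is a bare string or otherwise not an instance. */
function findKeywordLikeExamples(examples: SchemaExample[]): SchemaExample[] {
  return examples.filter((example) => !isInstanceExample(example));
}
```

Entries returned by `findKeywordLikeExamples` are candidates for moving to `keywords:`.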
### 7. Multiple Examples for Different Use Cases
Provide multiple examples to show different contexts or configurations:
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Recent Student Records
    description: Current records subject to privacy restrictions
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Historical Student Records
    description: Records 75+ years old with fewer access restrictions
```
## Summary
| Element | Placement |
|---------|-----------|
| Definition | `description:` (concise, no repetition) |
| Ontology mappings | `exact_mappings`, `broad_mappings`, etc. |
| Related concepts | `see_also:` |
| Usage notes | `comments:` |
| Metadata | `annotations:` |
| Examples | `examples:` with `value` and `description` |
| Typical contents | `keywords:` or `comments:` |


@ -0,0 +1,54 @@
# Rule: Class File Name Must Match Class Label/Name
## 🚨 Critical
When a class label/name is changed, the class file name must be renamed to match.
This keeps class modules discoverable, prevents stale imports, and avoids long-term naming drift.
## The Rule
1. If the primary class identifier changes, rename the file in the same edit set.
   - Change triggers include updates to:
     - top-level `name:`
     - class key under `classes:`
     - canonical class label used for module naming
2. File naming must reflect the canonical class name.
   - ✅ `DigitalPlatformProfile.yaml` for class `DigitalPlatformProfile`
   - ❌ `DigitalPlatformV2.yaml` for class `DigitalPlatformProfile`
3. After renaming a file, update all references.
   - `imports:` in other class/slot/type files
   - manifests/indexes/build inputs
   - any generated or curated mapping lists that include file paths
4. Keep semantic names versionless.
   - Do not preserve old versioned file names when class names are de-versioned.
   - Coordinate with `no-version-indicators-in-names-rule.md`.
## Required Checklist
- [ ] File name matches canonical class name
- [ ] `id:` and `name:` are internally consistent
- [ ] All import paths updated
- [ ] Search confirms no stale old file-name references remain
- [ ] YAML parses after rename
## Example
Before:
```yaml
# file: DigitalPlatformV2.yaml
name: DigitalPlatformProfile
classes:
  DigitalPlatformProfile:
```
After:
```yaml
# file: DigitalPlatformProfile.yaml
name: DigitalPlatformProfile
classes:
  DigitalPlatformProfile:
```


@ -135,6 +135,6 @@ The following class files have been identified as defining their own slots and r
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 39: Slot Naming Convention (RiC-O Style)
- Rule: Slot Naming Convention (Current Style)
- Rule 42: No Ontology Prefixes in Slot Names
- Rule 43: Slot Nouns Must Be Singular


@ -0,0 +1,158 @@
# Class Multilingual Support Rule
## Rule: All Class Files Must Include Multilingual Descriptions and Aliases
Every class file must provide `alt_descriptions` and `structured_aliases` in all supported languages to ensure internationalization and interoperability with multilingual heritage systems.
### Required Languages
| Code | Language |
|------|----------|
| `nl` | Dutch |
| `de` | German |
| `fr` | French |
| `es` | Spanish |
| `ar` | Arabic |
| `id` | Indonesian |
| `zh` | Chinese |
### Structure
#### alt_descriptions
Provide translated descriptions for each supported language:
```yaml
classes:
  AcademicArchiveRecordSetType:
    description: >-
      Category for grouping documentary materials accumulated by tertiary
      educational institutions during their administrative, academic, and
      operational activities.
    alt_descriptions:
      nl: >-
        Categorie voor het groeperen van documentair materiaal dat door
        hogeronderwijsinstellingen is verzameld tijdens hun administratieve,
        academische en operationele activiteiten.
      de: >-
        Kategorie zur Gruppierung von Dokumentenmaterial, das von Hochschulen
        während ihrer administrativen, akademischen und betrieblichen Aktivitäten
        angesammelt wurde.
      fr: >-
        Catégorie de regroupement des documents accumulés par les établissements
        d'enseignement supérieur au cours de leurs activités administratives,
        académiques et opérationnelles.
      es: >-
        Categoría para agrupar materiales documentales acumulados por instituciones
        de educación superior durante sus actividades administrativas, académicas
        y operativas.
      ar: >-
        فئة لتجميع المواد الوثائقية التي جمعتها مؤسسات التعليم العالي
        خلال أنشطتها الإدارية والأكاديمية والتشغيلية.
      id: >-
        Kategori untuk mengelompokkan materi dokumenter yang dikumpulkan oleh
        institusi pendidikan tinggi selama aktivitas administratif, akademik,
        dan operasional mereka.
      zh: >-
        高等教育机构在行政、学术和运营活动中积累的文献材料的分类类别。
```
#### structured_aliases
Provide language-specific aliases/alternative names:
```yaml
classes:
  AcademicArchiveRecordSetType:
    structured_aliases:
      - literal_form: academisch archiefbestand
        in_language: nl
      - literal_form: Hochschularchivbestand
        in_language: de
      - literal_form: fonds d'archives académiques
        in_language: fr
      - literal_form: fondo de archivo académico
        in_language: es
      - literal_form: أرشيف أكاديمي
        in_language: ar
      - literal_form: koleksi arsip akademik
        in_language: id
      - literal_form: 学术档案集
        in_language: zh
```
### Complete Example
```yaml
id: https://nde.nl/ontology/hc/class/AcademicArchiveRecordSetType
name: AcademicArchiveRecordSetType
title: Academic Archive Record Set Type
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
  - linkml:types
  - ../classes/CollectionType
classes:
  AcademicArchiveRecordSetType:
    description: >-
      Category for grouping documentary materials accumulated by tertiary
      educational institutions during their administrative, academic, and
      operational activities.
    alt_descriptions:
      nl: >-
        Categorie voor het groeperen van documentair materiaal dat door
        hogeronderwijsinstellingen is verzameld.
      de: >-
        Kategorie zur Gruppierung von Dokumentenmaterial, das von Hochschulen
        angesammelt wurde.
      fr: >-
        Catégorie de regroupement des documents accumulés par les établissements
        d'enseignement supérieur.
      es: >-
        Categoría para agrupar materiales documentales acumulados por instituciones
        de educación superior.
      ar: >-
        فئة لتجميع المواد الوثائقية التي جمعتها مؤسسات التعليم العالي.
      id: >-
        Kategori untuk mengelompokkan materi dokumenter yang dikumpulkan oleh
        institusi pendidikan tinggi.
      zh: >-
        高等教育机构积累的文献材料的分类类别。
    structured_aliases:
      - literal_form: academisch archiefbestand
        in_language: nl
      - literal_form: Hochschularchivbestand
        in_language: de
      - literal_form: fonds d'archives académiques
        in_language: fr
      - literal_form: fondo de archivo académico
        in_language: es
      - literal_form: أرشيف أكاديمي
        in_language: ar
      - literal_form: koleksi arsip akademik
        in_language: id
      - literal_form: 学术档案集
        in_language: zh
    is_a: CollectionType
    # ... rest of class definition
```
### Translation Guidelines
1. **Accuracy over literal translation**: Translate the concept, not word-by-word
2. **Use domain-appropriate terminology**: Use archival/library/museum terminology standard in each language
3. **Consult existing vocabularies**: Reference RiC-O, ISAD(G), AAT translations when available
4. **Maintain consistency**: Same term should be translated consistently across all class files
### Checklist
For each class file, verify:
- [ ] `alt_descriptions` present with all 7 languages
- [ ] `structured_aliases` present with all 7 languages
- [ ] Translations are accurate and domain-appropriate
- [ ] Arabic text is valid UTF-8 and displays right-to-left (RTL)
- [ ] Chinese uses simplified characters (`zh`) unless traditional is explicitly specified (`zh-hant`)
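The first two checklist items lend themselves to an automated check. The sketch below assumes the class definition has already been parsed from YAML into a plain object; `ClassDefinition`, `MissingLanguages`, and `findMissingLanguages` are hypothetical names. Translation quality still needs human review.

```typescript
// The seven language codes listed under "Required Languages".
const REQUIRED_LANGUAGES = ['nl', 'de', 'fr', 'es', 'ar', 'id', 'zh'] as const;

interface StructuredAlias {
  literal_form: string;
  in_language: string;
}

// Hypothetical shape of a parsed class definition (only the fields checked here).
interface ClassDefinition {
  alt_descriptions?: Record<string, string>;
  structured_aliases?: StructuredAlias[];
}

interface MissingLanguages {
  altDescriptions: string[];
  structuredAliases: string[];
}

/** Lists the required languages absent from each multilingual field. */
function findMissingLanguages(cls: ClassDefinition): MissingLanguages {
  const described = Object.keys(cls.alt_descriptions ?? {});
  const aliased = (cls.structured_aliases ?? []).map((alias) => alias.in_language);
  return {
    altDescriptions: REQUIRED_LANGUAGES.filter((language) => !described.includes(language)),
    structuredAliases: REQUIRED_LANGUAGES.filter((language) => !aliased.includes(language)),
  };
}
```

A class file passes when both arrays in the result are empty.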


@ -0,0 +1,583 @@
# Rule 46: Ontology-Driven Cache Segmentation
🚨 **CRITICAL**: The semantic cache MUST use vocabulary derived from LinkML `*Type.yaml` and `*Types.yaml` schema files to extract entities for cache key generation. Hardcoded regex patterns are deprecated.
**Status**: Implemented (Evolved v2.0)
**Version**: 2.0 (Epistemological Evolution)
**Updated**: 2026-01-10
## Evolution Overview
Rule 46 v2.0 incorporates insights from Volodymyr Pavlyshyn's work on agentic memory systems:
1. **Epistemic Provenance** (Phase 1) - Track WHERE, WHEN, HOW data originated
2. **Topological Distance** (Phase 2) - Use ontology structure, not just embeddings
3. **Holarchic Cache** (Phase 3) - Entries as holons with up/down links
4. **Message Passing** (Phase 4, planned) - Smalltalk-style introspectable cache
5. **Clarity Trading** (Phase 5, planned) - Block ambiguous queries from cache
## Epistemic Provenance
Every cached response carries epistemological metadata:
```typescript
interface EpistemicProvenance {
  dataSource: 'ISIL_REGISTRY' | 'WIKIDATA' | 'CUSTODIAN_YAML' | 'LLM_INFERENCE' | ...;
  dataTier: 1 | 2 | 3 | 4; // TIER_1_AUTHORITATIVE → TIER_4_INFERRED
  sourceTimestamp: string;
  derivationChain: string[]; // ["SPARQL:Qdrant", "RAG:retrieve", "LLM:generate"]
  revalidationPolicy: 'static' | 'daily' | 'weekly' | 'on_access';
}
```
**Benefit**: Users see "This answer is from TIER_1 ISIL registry data, captured 2025-01-08".
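The sketch below shows how such a message and a staleness check might be derived from the metadata. `ProvenanceSummary` is a trimmed, hypothetical subset of `EpistemicProvenance`, and the maximum ages attached to each `revalidationPolicy` value are assumptions, since the rule does not define them.

```typescript
type RevalidationPolicy = 'static' | 'daily' | 'weekly' | 'on_access';

// Hypothetical subset of EpistemicProvenance used by this sketch.
interface ProvenanceSummary {
  dataSource: string;
  dataTier: 1 | 2 | 3 | 4;
  sourceTimestamp: string; // ISO 8601, e.g. "2025-01-08T10:00:00Z"
  revalidationPolicy: RevalidationPolicy;
}

/** Builds the user-facing provenance sentence from the metadata. */
function describeProvenance(p: ProvenanceSummary): string {
  const date = p.sourceTimestamp.slice(0, 10); // keep only YYYY-MM-DD
  return `This answer is from TIER_${p.dataTier} ${p.dataSource} data, captured ${date}`;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// ASSUMPTION: maximum age per policy; these thresholds are illustrative only.
const MAX_AGE_MS: Record<RevalidationPolicy, number> = {
  static: Infinity, // never revalidated
  on_access: 0,     // revalidated on every access
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
};

/** True when the entry is at least as old as its policy allows. */
function needsRevalidation(p: ProvenanceSummary, lastValidated: Date, now: Date): boolean {
  const ageMs = now.getTime() - lastValidated.getTime();
  return ageMs >= MAX_AGE_MS[p.revalidationPolicy];
}
```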
## Topological Distance
Beyond embedding similarity, cache matching considers **structural distance** in the type hierarchy:
```
                  HeritageCustodian (*)
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
  MuseumType (M)     ArchiveType (A)    LibraryType (L)
         │                  │                  │
    ┌────┴────┐        ┌────┴────┐        ┌────┴────┐
    ▼         ▼        ▼         ▼        ▼         ▼
ArtMuseum  History Municipal   State   Public   Academic
```
**Combined Similarity Formula**:
```typescript
finalScore = 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance)
```
**Benefit**: "Art museum" won't match "natural history museum" even with 95% embedding similarity.
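The rule does not define how `topologicalDistance` is computed. The sketch below assumes a normalized tree distance in the Wu-Palmer style, `1 - 2 * depth(LCA) / (depth(a) + depth(b))`, over the hierarchy in the diagram, with node names taken from the diagram as printed. `PARENT`, `pathToRoot`, and `combinedScore` are hypothetical names.

```typescript
// Parent links for the hierarchy in the diagram (null marks the root).
const PARENT = new Map<string, string | null>([
  ['HeritageCustodian', null],
  ['MuseumType', 'HeritageCustodian'],
  ['ArchiveType', 'HeritageCustodian'],
  ['LibraryType', 'HeritageCustodian'],
  ['ArtMuseum', 'MuseumType'],
  ['History', 'MuseumType'],
  ['Municipal', 'ArchiveType'],
  ['State', 'ArchiveType'],
  ['Public', 'LibraryType'],
  ['Academic', 'LibraryType'],
]);

/** Returns the chain from a node up to the root: node first, root last. */
function pathToRoot(node: string): string[] {
  const path: string[] = [];
  let current: string | null = node;
  while (current !== null) {
    if (!PARENT.has(current)) throw new Error(`Unknown type: ${current}`);
    path.push(current);
    current = PARENT.get(current) ?? null;
  }
  return path;
}

/** ASSUMPTION: distance in [0, 1] based on the depth of the lowest common ancestor. */
function topologicalDistance(a: string, b: string): number {
  const pathA = pathToRoot(a);
  const pathB = pathToRoot(b);
  const depthA = pathA.length - 1;
  const depthB = pathB.length - 1;
  if (depthA + depthB === 0) return 0; // both nodes are the root
  let depthLca = 0;
  for (let i = 0; i < pathA.length; i++) {
    if (pathB.includes(pathA[i])) {
      depthLca = depthA - i; // pathA[i] sits i steps above a
      break;
    }
  }
  return 1 - (2 * depthLca) / (depthA + depthB);
}

/** Applies the combined similarity formula with its 0.7 / 0.3 weights. */
function combinedScore(embeddingSimilarity: number, queryType: string, cachedType: string): number {
  return 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance(queryType, cachedType));
}
```

Under these assumptions and an embedding similarity of 0.95, identical types score about 0.965, sibling types such as `ArtMuseum` and `History` about 0.815, and types in different branches about 0.665. Whether a given score counts as a hit depends on the cache's acceptance threshold, which is not stated here.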
## Holarchic Cache Structure
Cache entries are **holons** - simultaneously complete AND parts of aggregates:
| Level | Example | Aggregates |
|-------|---------|------------|
| Micro | "Rijksmuseum details" | None |
| Meso | "Museums in Amsterdam" | List of micro holons |
| Macro | "Heritage in Noord-Holland" | All meso holons in region |
```typescript
interface CachedQuery {
  // ... existing fields ...
  holonLevel?: 'micro' | 'meso' | 'macro';
  participatesIn?: string[]; // Higher-level cache keys
  aggregates?: string[]; // Lower-level entries
}
```
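One possible use of the up-links is propagating staleness: when a micro entry changes, every aggregate it participates in is affected too. That use is an assumption, not something the rule prescribes. In the sketch below, `HolonEntry` and `collectAffectedKeys` are hypothetical names and the cache is modeled as a plain `Map`.

```typescript
// Hypothetical, simplified cache entry carrying only the holarchy links.
interface HolonEntry {
  key: string;
  holonLevel: 'micro' | 'meso' | 'macro';
  participatesIn: string[]; // higher-level cache keys
  aggregates: string[];     // lower-level cache keys
}

/** Walks participatesIn links upward and returns every key affected by a change. */
function collectAffectedKeys(cache: Map<string, HolonEntry>, changedKey: string): string[] {
  const affected: string[] = [];
  const queue: string[] = [changedKey];
  while (queue.length > 0) {
    const key = queue.shift() as string;
    if (affected.includes(key)) continue; // already visited
    affected.push(key);
    const entry = cache.get(key);
    if (entry) queue.push(...entry.participatesIn);
  }
  return affected;
}
```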
## Problem Statement
The ArchiefAssistent semantic cache prevents geographic false positives using entity extraction:
```
Query: "Hoeveel musea in Amsterdam?"
Cached: "Hoeveel musea in Noord-Holland?"
Result: BLOCKED (location mismatch) ✅
```
However, the current implementation uses **hardcoded regex patterns**:
```typescript
// DEPRECATED: Hardcoded patterns in semantic-cache.ts
const INSTITUTION_PATTERNS: Record<InstitutionTypeCode, RegExp> = {
  M: /\b(muse(um|a|ums?)|musea)/i,
  A: /\b(archie[fv]en?|archives?|archief)/i,
  // ... 19 patterns to maintain manually
};
```
**Problems with hardcoded patterns**:
1. **Maintenance burden** - Every new institution type requires code changes
2. **Missing subtypes** - "kunstmuseum" vs "museum" should cache separately
3. **No multilingual support** - Only Dutch/English, misses German/French labels
4. **Duplication** - Same vocabulary exists in LinkML schemas
5. **No record type awareness** - "burgerlijke stand" queries mixed with general archive queries
## Solution: Schema-Derived Vocabulary
The LinkML schema already contains rich vocabulary:
| Schema File | Content | Cache Utility |
|-------------|---------|---------------|
| `CustodianType.yaml` | 19 top-level types | Primary segmentation (M/A/L/G...) |
| `MuseumType.yaml` | 187+ museum subtypes | Subtype segmentation |
| `ArchiveOrganizationType.yaml` | 144+ archive subtypes | Subtype segmentation |
| `*RecordSetTypes.yaml` | Record type taxonomies | Finding aids specificity |
### Vocabulary Sources in Schema
1. **`type_label`** - Multilingual labels via `skos:prefLabel`
2. **`structured_aliases`** - Language-tagged alternative names
3. **`keywords`** - Search terms for entity recognition
4. **`wikidata_entity`** - Linked Data identifiers
## Architecture
### Overview: Two-Tier Embedding Hierarchy
The system uses a **hierarchical embedding approach** for fast semantic routing:
1. **Tier 1: Types File Embeddings** - Which category? (Museum vs Archive vs Library)
2. **Tier 2: Individual Type Embeddings** - Which specific type? (ArtMuseum vs NaturalHistoryMuseum)
```
┌─────────────────────────────────────────────────────────────────────────┐
│ BUILD TIME: Extract vocabulary + generate embeddings │
│ │
│ schemas/20251121/linkml/modules/classes/*Type.yaml │
│ schemas/20251121/linkml/modules/classes/*Types.yaml │
│ ↓ │
│ scripts/extract-types-vocab.ts │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ types-vocab.json │ │
│ │ ├── tier1Embeddings: { MuseumType: [...], ArchiveType: [...] } │ │
│ │ ├── tier2Embeddings: { ArtMuseum: [...], MunicipalArchive: [...]}│ │
│ │ └── termLog: { "kunstmuseum": { type: "M", subtype: "ART_MUSEUM"}│ │
│ └───────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
▼ (loaded at runtime)
┌─────────────────────────────────────────────────────────────────────────┐
│ RUNTIME: Two-Tier Semantic Routing │
│ │
│ Query: "Hoeveel gemeentearchieven in Amsterdam?" │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TIER 1: Types File Selection │ │
│ │ Query embedding vs Tier1 embeddings (19 categories) │ │
│ │ Result: ArchiveOrganizationType (similarity: 0.89) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TIER 2: Specific Type Selection │ │
│ │ Query embedding vs Tier2 embeddings (144 archive subtypes) │ │
│ │ Result: MunicipalArchive (similarity: 0.94) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ Structured cache key: "count:A.MUNICIPAL_ARCHIVE:amsterdam" │
└─────────────────────────────────────────────────────────────────────────┘
```
### Tier 1: Types File Embeddings
Each Types file (e.g., `MuseumType.yaml`, `ArchiveOrganizationType.yaml`) gets ONE embedding
representing the **accumulated vocabulary** of all types within that file.
**Embedding Text Construction**:
```
MuseumType: museum musea kunstmuseum art museum natural history museum
science museum open-air museum ecomuseum virtual museum
heritage farm national museum regional museum university museum
[... all keywords from all 187 subtypes ...]
```
**Purpose**: Fast first-pass filter to identify which GLAMORCUBESFIXPHDNT category the query relates to.
| Types File | Code | Accumulated Terms Count |
|------------|------|------------------------|
| MuseumType | M | ~500+ terms from 187 subtypes |
| ArchiveOrganizationType | A | ~400+ terms from 144 subtypes |
| LibraryType | L | ~200+ terms from subtypes |
| GalleryType | G | ~100+ terms from subtypes |
| ... | ... | ... |
### Tier 2: Individual Type Embeddings
Each **specific type** within a Types file gets its own embedding from its accumulated terms.
**Embedding Text Construction**:
```
MunicipalArchive: gemeentearchief stadsarchief city archive municipal archive
town archive local government records burgerlijke stand
bevolkingsregister council minutes building permits
[... all keywords + structured_aliases + labels ...]
```
**Purpose**: Precise subtype identification after Tier 1 narrows the category.
### Term Log Structure
A lookup table mapping every extracted term to its type/subtype:
```json
{
  "termLog": {
    "kunstmuseum": {
      "typeCode": "M",
      "typeName": "MuseumType",
      "subtypeName": "ART_MUSEUM",
      "wikidata": "Q207694",
      "language": "nl"
    },
    "art museum": {
      "typeCode": "M",
      "typeName": "MuseumType",
      "subtypeName": "ART_MUSEUM",
      "wikidata": "Q207694",
      "language": "en"
    },
    "gemeentearchief": {
      "typeCode": "A",
      "typeName": "ArchiveOrganizationType",
      "subtypeName": "MUNICIPAL_ARCHIVE",
      "wikidata": "Q8362876",
      "language": "nl"
    }
  }
}
```
**Purpose**:
1. Fast O(1) keyword lookup (no embedding needed for exact matches)
2. Audit trail of which terms map to which types
3. Debugging which queries match which types
### Runtime Lookup Strategy
```typescript
async function extractEntitiesWithEmbeddings(query: string): Promise<ExtractedEntities> {
  const vocab = await loadTypesVocabulary();
  const normalized = query.toLowerCase();
  // FAST PATH: Check termLog for exact keyword matches.
  // Prefer the longest matching term so "kunstmuseum" is not shadowed by "museum".
  const matchedTerm = Object.keys(vocab.termLog)
    .filter((term) => normalized.includes(term))
    .sort((a, b) => b.length - a.length)[0];
  if (matchedTerm) {
    const mapping = vocab.termLog[matchedTerm];
    return {
      institutionType: mapping.typeCode,
      institutionSubtype: mapping.subtypeName,
      subtypeWikidata: mapping.wikidata,
      // ... location and intent extraction
    };
  }
  // SLOW PATH: Embedding-based semantic matching
  const queryEmbedding = await generateEmbedding(query);
  // Tier 1: Find best matching Types file
  let bestType: string | null = null;
  let bestTypeSimilarity = 0;
  for (const [typeName, typeEmbedding] of Object.entries(vocab.tier1Embeddings)) {
    const similarity = cosineSimilarity(queryEmbedding, typeEmbedding);
    if (similarity > bestTypeSimilarity && similarity > 0.7) {
      bestTypeSimilarity = similarity;
      bestType = typeName;
    }
  }
  if (!bestType) return {}; // No type matched
  // Tier 2: Find best matching subtype within the Types file.
  // institutionTypes is keyed by type code ("M", "A"), so resolve the code via className.
  const typeInfo = Object.values(vocab.institutionTypes).find((t) => t.className === bestType);
  if (!typeInfo) return {}; // Tier 1 name has no institutionTypes entry
  const typeCode = typeInfo.code;
  let bestSubtype: string | null = null;
  let bestSubtypeSimilarity = 0;
  for (const [subtypeName, subtypeEmbedding] of Object.entries(vocab.tier2Embeddings[typeCode] || {})) {
    const similarity = cosineSimilarity(queryEmbedding, subtypeEmbedding);
    if (similarity > bestSubtypeSimilarity && similarity > 0.75) {
      bestSubtypeSimilarity = similarity;
      bestSubtype = subtypeName;
    }
  }
  return {
    institutionType: typeCode,
    institutionSubtype: bestSubtype,
    // ... location and intent extraction
  };
}
```
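The lookup code calls `cosineSimilarity` without showing it. A minimal sketch of the standard definition follows, assuming embeddings are plain `number[]` arrays of equal length; the project's actual helper may differ.

```typescript
/** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check matters because embeddings from different models (384 versus 1024 dimensions, for example) must never be compared with each other.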
### Embedding Model Choice
For build-time embedding generation, use the same model as the semantic cache:
| Option | Model | Dimensions | Quality |
|--------|-------|------------|---------|
| **Primary** | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 384 | Good multilingual |
| Fallback | `all-MiniLM-L6-v2` | 384 | English-focused |
| High Quality | `multilingual-e5-large` | 1024 | Best multilingual |
**Build-time generation**: Embeddings are generated ONCE at build time and stored in JSON.
This avoids runtime embedding API calls for type classification.
## TypesVocabulary JSON Structure
Generated at build time with **pre-computed embeddings**:
```json
{
"version": "2026-01-10T12:00:00Z",
"schemaVersion": "20251121",
"embeddingModel": "paraphrase-multilingual-MiniLM-L12-v2",
"embeddingDimensions": 384,
"tier1Embeddings": {
"MuseumType": [0.023, -0.045, 0.087, ...],
"ArchiveOrganizationType": [0.012, 0.056, -0.034, ...],
"LibraryType": [-0.034, 0.089, 0.012, ...],
"GalleryType": [0.045, -0.023, 0.067, ...]
},
"tier2Embeddings": {
"M": {
"ART_MUSEUM": [0.034, -0.056, 0.078, ...],
"NATURAL_HISTORY_MUSEUM": [0.045, 0.023, -0.089, ...],
"SCIENCE_MUSEUM": [0.067, -0.012, 0.045, ...]
},
"A": {
"MUNICIPAL_ARCHIVE": [0.089, 0.034, -0.056, ...],
"NATIONAL_ARCHIVE": [0.012, -0.078, 0.045, ...],
"CHURCH_ARCHIVE": [-0.023, 0.067, 0.034, ...]
}
},
"termLog": {
"kunstmuseum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "nl"},
"art museum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "en"},
"gemeentearchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"},
"stadsarchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"},
"city archive": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "en"},
"burgerlijke stand": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"},
"geboorteakte": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"}
},
"institutionTypes": {
"M": {
"code": "M",
"className": "MuseumType",
"baseWikidata": "Q33506",
"accumulatedTerms": "museum musea kunstmuseum art museum natural history museum science museum open-air museum ecomuseum virtual museum heritage farm national museum regional museum university museum...",
"keywords": {
"nl": ["museum", "musea"],
"en": ["museum", "museums"],
"de": ["Museum", "Museen"]
},
"subtypes": {
"ART_MUSEUM": {
"className": "ArtMuseum",
"wikidata": "Q207694",
"accumulatedTerms": "kunstmuseum art museum kunstmusea art museums fine art museum visual arts museum painting gallery sculpture museum",
"keywords": {
"nl": ["kunstmuseum", "kunstmusea"],
"en": ["art museum", "art museums"]
}
},
"NATURAL_HISTORY_MUSEUM": {
"className": "NaturalHistoryMuseum",
"wikidata": "Q559049",
"accumulatedTerms": "natuurhistorisch museum natuurmuseum natural history museum science museum fossils taxidermy specimens geology biology",
"keywords": {
"nl": ["natuurhistorisch museum", "natuurmuseum"],
"en": ["natural history museum"]
}
}
}
},
"A": {
"code": "A",
"className": "ArchiveOrganizationType",
"baseWikidata": "Q166118",
"accumulatedTerms": "archief archieven archive archives gemeentearchief stadsarchief nationaal archief rijksarchief church archive company archive film archive...",
"keywords": {
"nl": ["archief", "archieven"],
"en": ["archive", "archives"]
},
"subtypes": {
"MUNICIPAL_ARCHIVE": {
"className": "MunicipalArchive",
"wikidata": "Q8362876",
"accumulatedTerms": "gemeentearchief stadsarchief municipal archive city archive town archive local government records civil registry population register building permits council minutes",
"keywords": {
"nl": ["gemeentearchief", "stadsarchief", "gemeentelijke archiefdienst"],
"en": ["municipal archive", "city archive", "town archive"]
}
},
"NATIONAL_ARCHIVE": {
"className": "NationalArchive",
"wikidata": "Q1188452",
"accumulatedTerms": "nationaal archief rijksarchief national archive state archive government records national records federal archive",
"keywords": {
"nl": ["nationaal archief", "rijksarchief"],
"en": ["national archive", "state archive"]
}
}
}
}
},
"recordSetTypes": {
"CIVIL_REGISTRY": {
"className": "CivilRegistrySeries",
"accumulatedTerms": "burgerlijke stand geboorteakte huwelijksakte overlijdensakte bevolkingsregister civil registry birth records marriage records death records population register vital records genealogy",
"keywords": {
"nl": ["burgerlijke stand", "geboorteakte", "huwelijksakte", "overlijdensakte", "bevolkingsregister"],
"en": ["civil registry", "birth records", "marriage records", "death records"]
}
},
"COUNCIL_GOVERNANCE": {
"className": "CouncilGovernanceFonds",
"accumulatedTerms": "gemeenteraad raadsnotulen raadsbesluit verordening council minutes ordinances resolutions bylaws municipal council town council city council",
"keywords": {
"nl": ["gemeenteraad", "raadsnotulen", "raadsbesluit", "verordening"],
"en": ["council minutes", "ordinances", "resolutions"]
}
}
}
}
```
### Key Additions for Embedding Support
| Field | Purpose |
|-------|---------|
| `tier1Embeddings` | Pre-computed embeddings for each Types file (19 categories) |
| `tier2Embeddings` | Pre-computed embeddings for each subtype (500+ types) |
| `termLog` | Fast O(1) lookup table for exact keyword matches |
| `accumulatedTerms` | Raw text used to generate embeddings (for debugging/regeneration) |
| `embeddingModel` | Model used to generate embeddings (for reproducibility) |
## Enhanced ExtractedEntities Interface
```typescript
export interface ExtractedEntities {
  // Existing fields
  institutionType?: InstitutionTypeCode | null;
  location?: string | null;
  locationType?: 'city' | 'province' | null;
  intent?: 'count' | 'list' | 'info' | null;
  // NEW: Ontology-derived fields
  institutionSubtype?: string | null; // e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM'
  recordSetType?: string | null; // e.g., 'CIVIL_REGISTRY', 'COUNCIL_GOVERNANCE'
  subtypeWikidata?: string | null; // e.g., 'Q8362876' for LOD integration
}
```
## Enhanced Cache Key Format
```
{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location}
Examples:
- "count:m:amsterdam"                    # Basic museum count
- "count:m.art_museum:amsterdam"         # Art museum count (subtype)
- "list:a.municipal_archive:nh"          # Municipal archives in Noord-Holland
- "query:a:civil_registry:utrecht"       # Civil registry in Utrecht
- "info:a.national_archive::nl"          # National archive info (empty record set segment, location "nl")
```
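A minimal sketch of a key builder for this format follows. `CacheKeyParts` and `buildCacheKey` are hypothetical names. The sketch assumes the record set segment is omitted entirely when absent, as in the first three examples; the last example's `::` form would instead need the segment to be always present.

```typescript
// Hypothetical input shape; field names mirror ExtractedEntities.
interface CacheKeyParts {
  intent: string;
  institutionType: string;
  institutionSubtype?: string | null;
  recordSetType?: string | null;
  location: string;
}

/** Builds "{intent}:{type}[.{subtype}][:{recordSetType}]:{location}" in lowercase. */
function buildCacheKey(parts: CacheKeyParts): string {
  const type = parts.institutionSubtype
    ? `${parts.institutionType}.${parts.institutionSubtype}`
    : parts.institutionType;
  const segments = [parts.intent, type];
  // ASSUMPTION: the optional segment is dropped rather than left empty.
  if (parts.recordSetType) segments.push(parts.recordSetType);
  segments.push(parts.location);
  return segments.join(':').toLowerCase();
}
```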
## Implementation Files
| File | Purpose |
|------|---------|
| `scripts/extract-types-vocab.ts` | Build-time vocabulary extraction from LinkML |
| `apps/archief-assistent/public/types-vocab.json` | Generated vocabulary file |
| `apps/archief-assistent/src/lib/types-vocabulary.ts` | Runtime vocabulary loader |
| `apps/archief-assistent/src/lib/semantic-cache.ts` | Updated entity extraction |
## Build Integration
Add to `apps/archief-assistent/package.json`:
```json
{
  "scripts": {
    "prebuild": "tsx ../../scripts/extract-types-vocab.ts",
    "build": "vite build"
  }
}
```
## Keyword Extraction Priority
When extracting keywords from schema files:
1. **`keywords`** array (highest priority) - Explicit search terms
2. **`structured_aliases.literal_form`** - Multilingual alternative names
3. **`type_label`** - Preferred labels per language
4. **Class name conversion** - `MunicipalArchive` → "municipal archive"
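The priority order can be sketched as an ordered merge with de-duplication. This reads "priority" as "earlier sources win when the same term appears twice", which is an assumption. `TypeVocabularySource`, `classNameToTerm`, and `collectTerms` are hypothetical names, and the real `extract-types-vocab.ts` may work differently.

```typescript
// Hypothetical, already-parsed view of one schema class.
interface TypeVocabularySource {
  className: string;
  keywords?: string[];
  aliasLiteralForms?: string[]; // structured_aliases.literal_form values
  typeLabels?: string[];        // type_label values
}

/** Converts a CamelCase class name to a search term: MunicipalArchive -> "municipal archive". */
function classNameToTerm(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, '$1 $2').toLowerCase();
}

/** Merges term sources in priority order, lowercased and without duplicates. */
function collectTerms(source: TypeVocabularySource): string[] {
  const ordered = [
    ...(source.keywords ?? []),
    ...(source.aliasLiteralForms ?? []),
    ...(source.typeLabels ?? []),
    classNameToTerm(source.className),
  ];
  const seen = new Set<string>();
  const terms: string[] = [];
  for (const raw of ordered) {
    const term = raw.trim().toLowerCase();
    if (term !== '' && !seen.has(term)) {
      seen.add(term);
      terms.push(term);
    }
  }
  return terms;
}
```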
## Cache Segmentation Rules
### Rule 1: Subtype Specificity
Queries with **specific subtypes** should NOT match **generic type** cache entries:
```
Query: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Cached: "musea in Amsterdam" → key: "count:m:amsterdam"
Result: MISS (subtype mismatch) ✅
```
### Rule 2: Record Set Type Isolation
Queries about **specific record types** should cache separately:
```
Query: "burgerlijke stand Utrecht" → key: "query:a:civil_registry:utrecht"
Cached: "archieven in Utrecht" → key: "list:a:utrecht"
Result: MISS (record set type mismatch) ✅
```
### Rule 3: No Subtype-to-Type Fallback
Generic queries must NOT be answered from subtype cache entries, because a subtype result covers only a subset of what the generic query asks for:
```
Query: "musea in Amsterdam" → key: "count:m:amsterdam"
Cached: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Result: MISS (don't return subset for superset query)
```
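All three examples reduce to the same check: a cached entry is reusable only when every segmentation field matches. The sketch below reports the first field that differs, which yields the mismatch reason shown in the `Result:` lines. `SegmentationKey` and `firstMismatch` are hypothetical names.

```typescript
// Fields compared in order; the first differing one is reported as the reason.
const SEGMENT_FIELDS = [
  'intent',
  'institutionType',
  'institutionSubtype',
  'recordSetType',
  'location',
] as const;

type SegmentField = (typeof SEGMENT_FIELDS)[number];
type SegmentationKey = Record<SegmentField, string | null>;

/** Returns the first differing field, or null when the cached entry may be reused. */
function firstMismatch(query: SegmentationKey, cached: SegmentationKey): SegmentField | null {
  for (const field of SEGMENT_FIELDS) {
    if (query[field] !== cached[field]) return field;
  }
  return null;
}
```

Because the comparison is strict equality in both directions, a generic entry never serves a subtype query and a subtype entry never serves a generic query.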
## Migration Notes
1. **Backwards Compatible**: Existing cache entries without `institutionSubtype` continue to work
2. **Gradual Rollout**: New cache entries get subtype, old entries remain valid
3. **Cache Clear**: Consider clearing cache after deployment to ensure consistency
## Validation
Run E2E tests to verify:
```bash
cd apps/archief-assistent
npm run test:e2e
```
Key test cases:
- Geographic isolation (Amsterdam ≠ Rotterdam ≠ Noord-Holland)
- Subtype isolation (kunstmuseum ≠ museum)
- Record set isolation (burgerlijke stand ≠ archive)
- Intent isolation (count ≠ list ≠ info)
## References
- **Rule 41**: Types classes define SPARQL template variables
- **Rule 0b**: Type/Types file naming convention
- **CustodianType.yaml**: Base taxonomy definition
- **AGENTS.md**: GLAMORCUBESFIXPHDNT taxonomy documentation
---
**Created**: 2026-01-10
**Author**: OpenCode Agent
**Status**: Implemented (v2.0)
## External References
- Pavlyshyn, V. "Context Graphs and Data Traces: Building Epistemology Layers for Agentic Memory"
- Pavlyshyn, V. "The Shape of Knowledge: Topology Theory for Knowledge Graphs"
- Pavlyshyn, V. "Beyond Hierarchy: Why Agentic AI Systems Need Holarchies"
- Pavlyshyn, V. "Smalltalk: The Language That Changed Everything"
- Pavlyshyn, V. "Clarity Traders: Beyond Vibe Coding"


@ -0,0 +1,65 @@
# Rule: Engineering Parsimony and Domain Modeling
## Critical Convention
Our ontology follows an engineering-oriented approach: practical domain utility and
stable interoperability take priority over minimal, tool-specific class catalogs.
## Rule
1. Model domain concepts, not implementation tools.
- Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`.
2. Prefer generic, reusable activity/entity classes for operational provenance.
- Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`.
3. Capture tool/vendor details in slot values, not class names.
- Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`.
4. Digital platforms acting as custodians are valid domain classes.
- Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed.
- Data processing/search tools are not ontology class candidates.
5. Avoid ontology growth driven by transient engineering stack choices.
- New class proposals must be justified by cross-tool, domain-stable semantics.
## Rationale
- Tool names are volatile implementation details and age quickly.
- Domain-level abstractions maximize reuse, query consistency, and mapping stability.
- This aligns with an engineering ontology practice where strict theoretical
parsimony in candidate theories is not the only optimization criterion; practical
semantic interoperability and maintainability are primary.
## Examples
### Wrong
```yaml
classes:
  ExaSearchMetadata:
    class_uri: prov:Activity
```
### Correct
```yaml
classes:
  ExternalSearchMetadata:
    class_uri: prov:Activity
    slots:
      - has_tool
      - has_method
      - has_agent
```
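A naming check like this could be applied during schema review. The sketch below is only illustrative: the `TOOL_PREFIXES` tuple is an assumption built from the rejected class names listed in this rule, and `is_tool_specific` is a hypothetical helper, not part of any existing tooling.

```python
# Vendor/tool prefixes taken from the rejected examples in this rule (illustrative only).
TOOL_PREFIXES = ("Exa", "OpenAI", "Elasticsearch")


def is_tool_specific(class_name: str) -> bool:
    """Return True if a CamelCase class name starts with a known tool name."""
    for prefix in TOOL_PREFIXES:
        if class_name.startswith(prefix):
            rest = class_name[len(prefix):]
            # Require a word boundary so that e.g. 'ExampleThing' is not flagged by 'Exa'.
            if rest == "" or rest[0].isupper():
                return True
    return False


flagged = [name for name in ("ExaSearchMetadata", "ExternalSearchMetadata")
           if is_tool_specific(name)]
```

A prefix list of this kind only catches known vendors; the underlying test remains whether the class has cross-tool, domain-stable semantics.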
## References
1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`.
URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764
2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`.
URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA


@ -18,7 +18,7 @@
## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫
**DO NOT USE** automated scripts to enrich person profiles with web search data. The `enrich_person_comprehensive.py` script has been deprecated.
**DO NOT USE** automated scripts to enrich person profiles with web search data.
**Why automated enrichment failed**:
- Web searches return data about DIFFERENT people with similar names
@ -184,95 +184,12 @@ Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.*
→ Exception: If source explicitly links to living person with verifiable connection
```
## Implementation in Enrichment Scripts
```python
def validate_entity_match(profile: dict, search_result: dict) -> tuple[bool, str]:
    """
    Validate that a search result refers to the same person as the profile.

    REQUIRES: At least 3 of 5 identity attributes must match.
    Name match alone is INSUFFICIENT and automatically rejected.

    Returns (is_valid, reason)
    """
    # `or [{}]` also guards against an empty affiliations list (IndexError otherwise)
    affiliations = profile.get('affiliations') or [{}]
    profile_employer = affiliations[0].get('custodian_name', '').lower()
    profile_location = profile.get('profile_data', {}).get('location', '').lower()
    profile_role = profile.get('profile_data', {}).get('headline', '').lower()
    source_text = search_result.get('answer', '').lower()
    source_url = search_result.get('source_url', '').lower()

    # AUTOMATIC REJECTION: Genealogy sources
    genealogy_domains = ['geni.com', 'ancestry.', 'familysearch.', 'findagrave.', 'myheritage.']
    if any(domain in source_url for domain in genealogy_domains):
        return False, "genealogy_source_rejected"

    # AUTOMATIC REJECTION: Profession conflicts
    heritage_roles = ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection', 'heritage']
    entertainment_roles = ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete']
    profile_is_heritage = any(role in profile_role for role in heritage_roles)
    source_is_entertainment = any(role in source_text for role in entertainment_roles)
    if profile_is_heritage and source_is_entertainment:
        return False, "conflicting_profession"

    # AUTOMATIC REJECTION: Location conflicts
    if profile_location:
        location_conflicts = [
            ('venezuela', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'),
            ('caracas', 'london'), ('mexico city', 'amsterdam')
        ]
        for source_loc, profile_loc in location_conflicts:
            if source_loc in source_text and profile_loc in profile_location:
                return False, "conflicting_location"

    # Count positive identity attribute matches (need 3 of 5)
    matches = 0
    match_details = []

    # 1. Employer match
    if profile_employer and profile_employer in source_text:
        matches += 1
        match_details.append(f"employer:{profile_employer}")

    # 2. Location match
    if profile_location and profile_location in source_text:
        matches += 1
        match_details.append(f"location:{profile_location}")

    # 3. Role/profession match (only words longer than 4 characters are compared)
    if profile_role:
        role_words = [w for w in profile_role.split() if len(w) > 4]
        if any(word in source_text for word in role_words):
            matches += 1
            match_details.append("role_match")

    # 4. Education/institution match (if available)
    profile_education = profile.get('profile_data', {}).get('education', [])
    if profile_education:
        edu_names = [e.get('school', '').lower() for e in profile_education if e.get('school')]
        if any(edu in source_text for edu in edu_names):
            matches += 1
            match_details.append("education_match")

    # 5. Time period match (career dates)
    # (implementation depends on available data)

    # REQUIRE 3 OF 5 MATCHES
    if matches < 3:
        return False, f"insufficient_identity_verification (only {matches}/5 attributes matched)"
    return True, f"verified ({matches}/5 matches: {', '.join(match_details)})"
```
## Claim Rejection Patterns
The following patterns should trigger automatic claim rejection:
The following inconsistent patterns should trigger automatic claim rejection:
```python
# Genealogy sources - ALWAYS REJECT
# Genealogy sources conflict - ALWAYS REJECT
GENEALOGY_DOMAINS = [
'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org',
'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org'
@ -293,7 +210,7 @@ LOCATION_PAIRS = [
('caracas', 'london'), ('caracas', 'amsterdam'),
]
# Age impossibility - if birth year makes current career implausible, REJECT
# Age impossibility - if birth year makes current career implausible, REJECT. For instance, for a Junior role:
MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify
MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles
```


@ -0,0 +1,248 @@
# Rule 47: Disambiguation Entity Profiles - Prevent Repeated Entity Resolution Errors
## Status: CRITICAL
## Summary
When entity resolution determines that a web source describes a **different person** with a similar name, **create a PPID profile for that person** in `data/person/`. The PPID system is universal - ANY person who ever lived can have a profile, regardless of heritage relevance.
---
## The Universal PPID Principle
**In principle, all persons on Earth should be assigned PPIDs** - whether or not they are active in the heritage field. This includes:
- Heritage workers (curators, archivists, librarians, etc.)
- Non-heritage professionals (actors, doctors, athletes, etc.)
- Historical persons (deceased individuals from any era)
- Public figures and private individuals
The `heritage_relevance` field indicates whether someone works in the heritage sector, but does NOT determine whether they can have a profile. **Anyone can have a PPID.**
---
## The Problem
During entity resolution, we often discover that web search results describe a **different person** with a similar name:
| Heritage Profile | Namesake Discovered | Why Different |
|------------------|---------------------|---------------|
| Carmen Juliá (UK curator) | Carmen Julia Álvarez (Venezuelan actress) | Different profession, location, timeline |
| Jan de Vries (Rijksmuseum curator) | Jan de Vries (footballer) | Different profession |
| Robert Ritter (heritage worker) | Robert Ritter (Nazi doctor, 1901-1951) | Different era, profession |
Without creating a profile for the namesake, future enrichment attempts may:
1. Re-discover the same namesake
2. Waste time re-investigating
3. Risk attributing false claims again
---
## The Solution: Create PPID Profiles for Namesakes
When entity resolution proves two entities are different, **create a regular PPID profile for the namesake**:
1. Use standard PPID naming convention (no special prefix)
2. Set `heritage_relevance.is_heritage_relevant: false`
3. Document the disambiguation in BOTH profiles
---
## Example: Venezuelan Actress Profile
```json
{
  "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
  "profile_data": {
    "full_name": "Carmen Julia Álvarez",
    "profession": "actress",
    "nationality": "Venezuelan",
    "birth_year": 1952,
    "birth_location": "Caracas, Venezuela",
    "active_period": "1970s-2000s"
  },
  "heritage_relevance": {
    "is_heritage_relevant": false,
    "relevance_score": 0.0,
    "reason": "Entertainment industry professional - actress in film and television"
  },
  "disambiguation_notes": {
    "commonly_confused_with": [
      {
        "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
        "name": "Carmen Juliá",
        "profession": "curator",
        "employer": "New Contemporaries",
        "location": "UK",
        "why_different": "Different profession (actress vs curator), different location (Venezuela vs UK), overlapping active periods in incompatible roles"
      }
    ],
    "disambiguation_note": "This is the Venezuelan actress, NOT the UK-based art curator."
  },
  "web_claims": [
    {
      "claim_type": "birth_year",
      "claim_value": 1952,
      "provenance": {
        "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
        "retrieved_on": "2026-01-11T14:30:00Z",
        "retrieval_agent": "manual-human-curator"
      }
    },
    {
      "claim_type": "profession",
      "claim_value": "actress",
      "provenance": {
        "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
        "retrieved_on": "2026-01-11T14:30:00Z",
        "retrieval_agent": "manual-human-curator"
      }
    }
  ],
  "extraction_metadata": {
    "created_at": "2026-01-11T15:00:00Z",
    "created_by": "manual-human-curator",
    "creation_reason": "Created during entity resolution to distinguish from heritage worker Carmen Juliá"
  }
}
```
---
## Update the Heritage Profile Too
The heritage profile should also reference the disambiguation:
```json
{
  "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
  "profile_data": {
    "full_name": "Carmen Juliá",
    "headline": "Curator at New Contemporaries"
  },
  "heritage_relevance": {
    "is_heritage_relevant": true,
    "relevance_score": 0.85
  },
  "disambiguation_notes": {
    "known_namesakes": [
      {
        "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
        "name": "Carmen Julia Álvarez",
        "profession": "actress",
        "location": "Venezuela",
        "why_not_same_person": "Different profession, location, timeline"
      }
    ],
    "disambiguation_warning": "Web searches for 'Carmen Julia' return data about Venezuelan actress Carmen Julia Álvarez (born 1952). This is a DIFFERENT person."
  }
}
```
---
## When to Create Namesake Profiles
Create a PPID profile for a namesake when:
1. **Entity resolution proves they are a different person**
2. **They are notable enough** to appear in search results repeatedly (Wikipedia, IMDB, news)
3. **The confusion risk is high** (similar name, some overlapping attributes)
**Do NOT create profiles for**:
- Random social media accounts with no notable presence
- Obvious mismatches unlikely to recur in searches
---
## Benefits
1. **Universal person database**: Any person can have a PPID
2. **Prevents repeated mistakes**: Future enrichment can check for known namesakes
3. **Bidirectional linking**: Both profiles reference each other
4. **Consistent data model**: No special file naming or profile types needed
5. **Audit trail**: Documents why profiles were created
---
## Workflow
### Step 1: During Entity Resolution
When you reject a claim due to identity mismatch with a notable namesake:
```
1. Document WHY the source describes a different person
2. Check if the namesake is notable (Wikipedia, IMDB, frequent search results)
3. If notable → Create PPID profile for the namesake
4. Link both profiles via disambiguation_notes
```
### Step 2: Create Namesake Profile
Use standard PPID naming:
```
ID_{birth-location}_{birth-decade}_{current-location}_{death-decade}_{NAME}.json
```
Example: `ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ.json`
### Step 3: Update Both Profiles
- Namesake profile: Add `commonly_confused_with` pointing to heritage profile
- Heritage profile: Add `known_namesakes` pointing to namesake profile
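
Step 3 can be sketched as a small helper that writes both sides of the link in one call, so the two profiles cannot drift apart. This is a minimal sketch, assuming profiles are already loaded as dicts shaped like the JSON examples in this rule; `link_namesakes` is a hypothetical name, not an existing function.

```python
def link_namesakes(heritage: dict, namesake: dict, why_different: str) -> None:
    """Add reciprocal disambiguation notes to both profiles, in place."""
    # Namesake profile points back at the heritage profile.
    namesake.setdefault("disambiguation_notes", {}).setdefault(
        "commonly_confused_with", []
    ).append({
        "ppid": heritage["ppid"],
        "name": heritage["profile_data"]["full_name"],
        "why_different": why_different,
    })
    # Heritage profile records the namesake as a known different person.
    heritage.setdefault("disambiguation_notes", {}).setdefault(
        "known_namesakes", []
    ).append({
        "ppid": namesake["ppid"],
        "name": namesake["profile_data"]["full_name"],
        "why_not_same_person": why_different,
    })


heritage = {
    "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
    "profile_data": {"full_name": "Carmen Juliá"},
}
namesake = {
    "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
    "profile_data": {"full_name": "Carmen Julia Álvarez"},
}
link_namesakes(heritage, namesake, "Different profession, location, timeline")
```

The helper only edits the in-memory dicts; reading and writing the files under `data/person/` is left out of the sketch.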
---
## Historical Persons
Historical persons (deceased) can also have PPID profiles:
```json
{
  "ppid": "ID_DE-XX-XXX_1901_DE-XX-XXX_1951_ROBERT-RITTER",
  "profile_data": {
    "full_name": "Robert Ritter",
    "profession": "physician",
    "birth_year": 1901,
    "death_year": 1951,
    "nationality": "German",
    "historical_note": "Nazi-era physician involved in racial hygiene programs"
  },
  "heritage_relevance": {
    "is_heritage_relevant": false,
    "relevance_score": 0.0
  },
  "disambiguation_notes": {
    "commonly_confused_with": [
      {
        "ppid": "ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_ROBERT-RITTER",
        "name": "Robert Ritter",
        "profession": "heritage worker",
        "why_different": "Different era - historical figure (1901-1951) vs living heritage professional"
      }
    ]
  }
}
```
---
## Related Rules
- **Rule 46**: Entity Resolution - Names Are NEVER Sufficient
- **Rule 21**: Data Fabrication is Strictly Prohibited
- **Rule 26**: Person Data Provenance - Web Claims for Staff Information
---
## Summary
**The PPID system is universal.** When you discover during entity resolution that a web source describes a different person:
1. **Create a regular PPID profile** for the namesake (actress, historical figure, etc.)
2. **Set `heritage_relevance.is_heritage_relevant: false`** (unless they happen to also work in heritage)
3. **Link both profiles** via `disambiguation_notes`
4. **Use standard PPID naming** - no special prefixes needed
This builds a comprehensive person database while preventing entity resolution errors.


@ -0,0 +1,307 @@
# Rule 46: Entity Resolution - Names Are NEVER Sufficient
## Status: CRITICAL
## 🚨 DATA QUALITY IS OF UTMOST IMPORTANCE 🚨
**Wrong data is worse than no data.** Attributing a birth year, spouse, or social media profile to the wrong person is a **critical data quality failure** that undermines the entire dataset's trustworthiness.
**ALL enrichments MUST be done MANUALLY and double-checked.** Automated web search enrichment has been DISABLED due to catastrophic entity resolution failures (540+ false claims removed in Jan 2026).
**The cost of false data**:
- Corrupts downstream analysis and reporting
- Creates legal/privacy risks (attributing data to wrong person)
- Destroys user trust in the dataset
- Requires expensive manual cleanup
---
## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫
**DO NOT USE** automated scripts to enrich person profiles with web search data.
**Why automated enrichment failed**:
- Web searches return data about DIFFERENT people with similar names
- Regex pattern matching cannot distinguish between namesakes
- Wikipedia, IMDB, ResearchGate, Instagram all returned data from wrong people
- Example: "Carmen Juliá" search returned Venezuelan actress, Mexican hydrogeologist, Spanish medievalist - NONE were the UK art curator
**ONLY ALLOWED enrichment methods**:
1. **Manual research** - Human curator verifies source refers to the correct person
2. **Institutional sources** - Data from the person's employer website (verified)
3. **LinkedIn profile data** - Already verified via direct profile access
4. **ORCID/Wikidata** - If the person has a verified identifier
---
## The Core Principle
🚨 **SIMILAR OR IDENTICAL NAMES ARE NEVER SUFFICIENT FOR ENTITY RESOLUTION.**
A web search result mentioning "Carmen Juliá born 1952" is **NOT** evidence that the Carmen Juliá in our person profile was born in 1952. Names are not unique identifiers - there are thousands of people with the same name worldwide.
**Entity resolution requires verification of MULTIPLE independent identity attributes:**
| Attribute | Purpose | Example |
|-----------|---------|---------|
| **Age/Birth Year** | Temporal consistency | Both sources describe someone in their 40s |
| **Career Path** | Professional identity | Both are art curators, not one curator and one actress |
| **Location** | Geographic consistency | Both are based in UK, not one UK and one Venezuela |
| **Employer** | Institutional affiliation | Both work at New Contemporaries |
| **Education** | Academic background | Same university or field |
**Minimum Requirement**: At least **3 of 5** attributes must match before attributing ANY claim from a web source. Name match alone = **AUTOMATIC REJECTION**.
## Problem Statement
When enriching person profiles via web search (Linkup, Exa, etc.), search results often return data about **different people with similar or identical names**. Without proper entity resolution, the enrichment process can attribute false claims to the wrong person.
**Example Failure** (Carmen Juliá - UK Art Curator):
- Source profile: Carmen Juliá, Curator at New Contemporaries (UK)
- Birth year extracted: 1952 from Carmen Julia **Álvarez** (Venezuelan actress)
- Spouse extracted: "actors Eduardo Serrano" from the Venezuelan actress
- ResearchGate: Carmen Julia **Navarro** (Mexican hydrogeologist)
- Academia.edu: Carmen Julia **Gutiérrez** (Spanish medieval studies)
All data is from **different people** - none is the actual Carmen Juliá who is a UK-based art curator.
**Why This Happened**: The enrichment script used regex pattern matching to extract "born 1952" without verifying that the Wikipedia article described the SAME person.
## The Rule
### DO NOT use name matching as the basis for entity resolution. EVER.
For person enrichment via web search:
**FORBIDDEN** (Name-based extraction):
- ❌ Extracting birth years from any search result mentioning "Carmen Julia born..."
- ❌ Attributing social media profiles just because the name appears
- ❌ Claiming relationships (spouse, parent, child) from web text pattern matching
- ❌ Assigning academic profiles (ResearchGate, Academia.edu, Google Scholar) based on name matching alone
- ❌ Using Wikipedia articles without verifying ALL identity attributes
- ❌ Trusting genealogy sites (Geni, Ancestry, MyHeritage) which describe historical namesakes
- ❌ Using IMDB for birth years (actors with same names)
**REQUIRED** (Multi-Attribute Entity Resolution):
1. **Verify identity via MULTIPLE attributes** - name alone is INSUFFICIENT
2. **Cross-reference with known facts** (employer, location, job title from LinkedIn)
3. **Detect conflicting signals** - actress vs curator, Venezuela vs UK, 1950s birth vs active 2020s career
4. **Reject ambiguous matches** - if source doesn't clearly identify the same person, reject the claim
5. **Document rejection rationale** - log why claim was rejected for audit trail
## Entity Resolution Verification Checklist
Before attributing a web claim to a person profile, verify MULTIPLE identity attributes:
| # | Attribute | What to Check | Example Match | Example Conflict |
|---|-----------|---------------|---------------|------------------|
| 1 | **Career/Profession** | Same field/industry | Both are curators | Source says "actress", profile is curator |
| 2 | **Employer** | Same institution | Both at Rijksmuseum | Source says "film studio", profile is museum |
| 3 | **Location** | Same city/country | Both UK-based | Source says Venezuela, profile is UK |
| 4 | **Age Range** | Plausible for career | Birth 1980s, active 2020s | Birth 1952, still active in 2025 as junior |
| 5 | **Education** | Same university/field | Both art history | Source says "medical school" |
**Minimum requirement**: At least **3 of 5** attributes must match. Name match alone = **AUTOMATIC REJECTION**.
**Any conflicting signal = AUTOMATIC REJECTION** (e.g., source says "actress" when profile is "curator").
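
The checklist decision can be written down as a small function. Since automated enrichment is prohibited, the sketch below assumes a human curator has already judged each attribute and only records the outcome; `resolve` and the `'match'` / `'conflict'` / `'unknown'` labels are hypothetical names chosen for illustration.

```python
from typing import Dict, Tuple

# The five checklist attributes from the table above.
ATTRIBUTES = ("profession", "employer", "location", "age_range", "education")


def resolve(checks: Dict[str, str], required: int = 3) -> Tuple[bool, str]:
    """Apply the checklist to a curator's per-attribute judgments.

    `checks` maps an attribute name to 'match', 'conflict' or 'unknown';
    attributes that are missing count as 'unknown'.
    """
    conflicts = [a for a in ATTRIBUTES if checks.get(a) == "conflict"]
    if conflicts:
        # Any conflicting signal rejects the claim, regardless of matches.
        return False, "conflict:" + ",".join(conflicts)
    matches = sum(1 for a in ATTRIBUTES if checks.get(a) == "match")
    if matches < required:
        return False, f"insufficient_identity_verification ({matches}/5)"
    return True, f"verified ({matches}/5)"


# Carmen Juliá example: the source describes an actress based in Venezuela.
decision = resolve({"profession": "conflict", "location": "conflict"})
```

The name itself is deliberately absent from `ATTRIBUTES`, so a name match can never contribute to the count.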
## Sources with High Entity Resolution Risk
These sources are NOT forbidden, but require **stricter verification thresholds** due to high false-positive rates:
| Source Type | Risk Level | Why | Required Matches |
|-------------|------------|-----|------------------|
| Genealogy sites | CRITICAL | Historical persons with same name | 5/5 attributes (or explicit link to living person) |
| IMDB | CRITICAL | Actors with common names | 5/5 attributes (unless person works in film/TV) |
| Wikipedia | HIGH | Many people with same name have pages | 4/5 attributes match |
| Academic profiles | HIGH | Multiple researchers with same name | 4/5 attributes + institution match |
| Social media | HIGH | Many accounts with similar handles | 4/5 attributes + verify employer/location in bio |
| News articles | MEDIUM | May mention multiple people | 3/5 attributes + read full context |
| Institutional websites | LOW | Usually about their own staff | 2/5 attributes (good source if person works there) |
**Key point**: High-risk sources CAN be used if you verify enough identity attributes. The risk level determines the verification threshold, not whether the source is allowed.
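
The table above maps directly to a lookup from source type to required match count. A minimal sketch follows; the dictionary keys are hypothetical labels for the table rows, and defaulting unknown source types to the strictest threshold is an assumption, not something this rule states.

```python
# Required attribute matches per source type, copied from the table above.
REQUIRED_MATCHES = {
    "genealogy": 5,
    "imdb": 5,
    "wikipedia": 4,
    "academic_profile": 4,
    "social_media": 4,
    "news": 3,
    "institutional": 2,
}


def required_matches(source_type: str) -> int:
    """Return the verification threshold for a source type.

    Unknown source types fall back to 5/5 (assumption: be strict by default).
    """
    return REQUIRED_MATCHES.get(source_type, 5)


threshold = required_matches("wikipedia")
```

The table's extra conditions (for example an institution match for academic profiles, or reading the full context of a news article) still have to be checked by the curator and are not encoded here.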
## Red Flags Requiring Investigation
The following are **red flags** that require careful investigation - NOT automatic rejection. People change careers and relocate.
### Profession Differences
If source profession differs from profile profession, **investigate**:
```
Source: "actress", "actor", "singer"
Profile: "curator", "archivist", "librarian"
ASK: Did this person change careers?
- Check timeline: Did acting career END before heritage career BEGAN?
- Check for transition evidence: "former actress turned curator"
- If careers overlap in time → likely different people → REJECT
- If sequential careers with clear transition → may be same person → ACCEPT with documentation
```
### Location Differences
If source location differs from profile location, **investigate**:
```
Source: "Venezuela", "Mexico", "Brazil"
Profile: "UK", "Netherlands", "France"
ASK: Did this person relocate?
- Check timeline: When were they in each location?
- Check for migration evidence: education abroad, international career moves
- If locations overlap in time → likely different people → REJECT
- If sequential locations with clear move → may be same person → ACCEPT with documentation
```
### When to Actually REJECT
Reject when investigation shows **no plausible connection**:
```
Example: Carmen Julia Álvarez (Venezuelan actress, active 1970s-2000s)
vs Carmen Juliá (UK curator, active 2015-present)
- Overlapping active periods in DIFFERENT professions on DIFFERENT continents
- No evidence of career change or relocation
- Birth year 1952 makes current junior curator role implausible
→ REJECT: These are clearly different people
```
### Age Conflicts (Still Automatic Rejection)
If source age is **physically implausible** for profile career stage, REJECT:
```
Source: Born 1922, 1915, 1939
Profile: Currently active professional in 2025
→ REJECT (person would be 86-110 years old)
Source: Born 2007, 2004
Profile: Senior curator
→ REJECT (person would be 18-21, too young)
```
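The age arithmetic behind these examples can be sketched as follows. The default bounds of 23 and 80 are an assumption derived from the `MAX_PLAUSIBLE_BIRTH_YEAR` and `MIN_PLAUSIBLE_BIRTH_YEAR` constants listed under Claim Rejection Patterns, and `is_age_plausible` is a hypothetical helper name.

```python
def age_in(birth_year: int, reference_year: int = 2025) -> int:
    """Approximate age in the reference year (ignores month and day)."""
    return reference_year - birth_year


def is_age_plausible(birth_year: int, reference_year: int = 2025,
                     min_age: int = 23, max_age: int = 80) -> bool:
    """Check whether a birth year is plausible for a currently active professional."""
    return min_age <= age_in(birth_year, reference_year) <= max_age


# Birth years from the examples above, evaluated for 2025.
ages = {year: age_in(year) for year in (1915, 1922, 1939, 2004, 2007)}
```

A senior role would need a higher `min_age` than a junior one, so the bounds should be chosen per career stage rather than treated as fixed.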
### Genealogy Source
Genealogy sources require **5 of 5 attribute matches** due to high false-positive rates:
```
Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.*
→ REQUIRE 5/5 attribute matches (these often describe historical namesakes)
→ Exception: If source explicitly links to living person with verifiable connection
```
## Claim Rejection Patterns
The following inconsistent patterns should trigger automatic claim rejection:
```python
# Genealogy sources conflict - ALWAYS REJECT
GENEALOGY_DOMAINS = [
'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org',
'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org'
]
# Profession conflicts - if profile has one and source has another, REJECT
PROFESSION_CONFLICTS = {
'heritage': ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection manager'],
'entertainment': ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete'],
'medical': ['doctor', 'nurse', 'surgeon', 'physician'],
'tech': ['software engineer', 'developer', 'programmer'],
}
# Location conflicts - if source describes person in location X and profile is location Y, REJECT
LOCATION_PAIRS = [
('venezuela', 'uk'), ('venezuela', 'netherlands'), ('venezuela', 'germany'),
('mexico', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'),
('caracas', 'london'), ('caracas', 'amsterdam'),
]
# Age impossibility - if birth year makes current career implausible, REJECT. For instance, for a Junior role:
MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify
MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles
```
## Handling Rejected Claims
When a claim fails entity resolution:
```json
{
  "claim_type": "birth_year",
  "claim_value": 1952,
  "entity_resolution": {
    "status": "REJECTED",
    "reason": "conflicting_profession",
    "details": "Source describes Venezuelan actress, profile is UK curator",
    "source_identity": "Carmen Julia Álvarez (Venezuelan actress)",
    "profile_identity": "Carmen Juliá (UK art curator)",
    "rejected_at": "2026-01-11T15:00:00Z",
    "rejected_by": "entity_resolution_validator_v1"
  }
}
```
## Special Cases
### Common Names
For very common names (e.g., "John Smith", "Maria García", "Jan de Vries"), require **4 of 5** verification checks instead of 3. The more common the name, the higher the threshold.
| Name Commonality | Required Matches |
|------------------|------------------|
| Unique name (e.g., "Xander Vermeulen-Oosterhuis") | 2 of 5 |
| Moderately common (e.g., "Carmen Juliá") | 3 of 5 |
| Very common (e.g., "Jan de Vries") | 4 of 5 |
| Extremely common (e.g., "John Smith") | 5 of 5 or reject |
### Abbreviated Names
For profiles with abbreviated names (e.g., "J. Smith"), entity resolution is inherently uncertain:
- Set `entity_resolution_confidence: "very_low"`
- Require **human review** for all claims
- Do NOT attribute web claims automatically
### Historical Persons
When sources describe historical/deceased persons:
- Check if death date conflicts with profile activity (living person active in 2025)
- **ALWAYS REJECT** genealogy site data
- Reject any source describing events before 1950 unless profile is known to be historical
### Wikipedia Articles
Wikipedia is particularly dangerous because:
- Many people with the same name have articles
- Search engines return Wikipedia first
- The Wikipedia Carmen Julia Álvarez article describes a Venezuelan actress born 1952
- This is a DIFFERENT PERSON from Carmen Juliá the UK curator
**For Wikipedia sources**:
1. Read the FULL article, not just snippets
2. Verify the Wikipedia subject's profession matches the profile
3. Verify the Wikipedia subject's location matches the profile
4. If ANY conflict detected → REJECT
## Audit Trail
All entity resolution decisions must be logged:
```json
{
  "enrichment_history": [
    {
      "enrichment_timestamp": "2026-01-11T15:00:00Z",
      "enrichment_agent": "enrich_person_comprehensive.py v1.4.0",
      "entity_resolution_decisions": [
        {
          "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
          "decision": "REJECTED",
          "reason": "Different person - Venezuelan actress, not UK curator"
        }
      ],
      "claims_rejected_count": 5,
      "claims_accepted_count": 1
    }
  ]
}
```
## See Also
- Rule 21: Data Fabrication is Strictly Prohibited
- Rule 26: Person Data Provenance - Web Claims for Staff Information
- Rule 45: Inferred Data Must Be Explicit with Provenance


@ -0,0 +1,422 @@
# Rule 45: Inferred Data Must Be Explicit with Provenance
**Status**: Active
**Created**: 2025-01-09
**Applies to**: PPID enrichment, person entity profiles, any data inference
## Core Principle
**All inferred data MUST be stored in explicit `inferred_*` fields with full provenance statements. Inferred values MUST NEVER silently replace or merge with verified data.**
This ensures:
1. **Transparency**: Users can distinguish verified facts from heuristic estimates
2. **Auditability**: The inference method and source observations are traceable
3. **Reversibility**: Inferred data can be corrected when verified data becomes available
4. **Quality Signals**: Confidence levels and argument chains are preserved
## Required Structure for Inferred Data
Every inferred claim MUST include:
```yaml
inferred_[field_name]:
  value: "the inferred value"
  edtf: "196X"  # For dates: EDTF notation
  formatted: "NL-UT-UTR"  # For locations: CC-RR-PPP format
  confidence: "low|medium|high"
  inference_provenance:
    method: "heuristic_name"
    inference_chain:
      - step: 1
        observation: "University start year 1986"
        source_field: "profile_data.education[0].date_range"
        source_value: "1986 - 1990"
      - step: 2
        assumption: "University entry at age 18"
        rationale: "Standard Dutch university entry age"
      - step: 3
        calculation: "1986 - 18 = 1968"
        result: "Estimated birth year 1968"
      - step: 4
        generalization: "Round to decade → 196X"
        rationale: "EDTF decade notation for uncertain years"
    inferred_at: "2025-01-09T18:00:00Z"
    inferred_by: "enrich_ppids.py"
```
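The structure above can be produced by building the inference chain alongside the calculation, so every step is recorded as it happens. This is a minimal sketch and not the actual `enrich_ppids.py` code: `infer_birth_decade` and the `inferred_by` value are hypothetical, and the entry age of 18 is the same assumption used in the template.

```python
from datetime import datetime, timezone


def infer_birth_decade(start_year: int, entry_age: int = 18) -> dict:
    """Estimate a birth decade from a university start year, with provenance."""
    estimated = start_year - entry_age
    decade = f"{estimated // 10}X"  # e.g. 1968 -> "196X"
    return {
        "value": decade,
        "edtf": decade,
        "confidence": "low",
        "inference_provenance": {
            "method": "earliest_education_heuristic",
            "inference_chain": [
                {"step": 1, "observation": f"University start year {start_year}"},
                {"step": 2, "assumption": f"University entry at age {entry_age}"},
                {"step": 3,
                 "calculation": f"{start_year} - {entry_age} = {estimated}",
                 "result": f"Estimated birth year {estimated}"},
                {"step": 4, "generalization": f"Round to decade → {decade}"},
            ],
            "inferred_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "inferred_by": "example_sketch",  # hypothetical agent name
        },
    }


inferred = infer_birth_decade(1986)
```

The result would be stored under `inferred_birth_decade`, leaving the verified `birth_date` field untouched.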
## Explicit Inferred Fields
### For Person Profiles (PPID)
| Inferred Field | Source Observations | Heuristic |
|----------------|---------------------|-----------|
| `inferred_birth_year` | Earliest education/job dates | Entry age assumptions |
| `inferred_birth_decade` | Birth year estimate | EDTF decade notation |
| `inferred_birth_settlement` | School/university location | Residential proximity |
| `inferred_birth_region` | Settlement location | GeoNames admin1 |
| `inferred_birth_country` | Settlement location | GeoNames country |
| `inferred_current_settlement` | Profile location, current job | Direct extraction |
| `inferred_current_region` | Settlement location | GeoNames admin1 |
| `inferred_current_country` | Settlement location | GeoNames country |
### Example: Complete Inferred Birth Data
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN",
"birth_date": {
"edtf": "XXXX",
"precision": "unknown",
"note": "See inferred_birth_decade for heuristic estimate"
},
"inferred_birth_decade": {
"value": "196X",
"edtf": "196X",
"precision": "decade",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_heuristic",
"inference_chain": [
{
"step": 1,
"observation": "University education record found",
"source_field": "profile_data.education[0]",
"source_value": {
"institution": "Universiteit Utrecht",
"degree": "Social & Organisational psychology, doctoraal",
"date_range": "1986 - 1990"
}
},
{
"step": 2,
"extraction": "Start year extracted from date_range",
"extracted_value": 1986
},
{
"step": 3,
"assumption": "University entry age",
"assumed_value": 18,
"rationale": "Standard Dutch university entry age (post-VWO)",
"confidence_impact": "Assumption reduces confidence; actual age 17-20 possible"
},
{
"step": 4,
"calculation": "1986 - 18 = 1968",
"result": "Estimated birth year: 1968"
},
{
"step": 5,
"generalization": "Convert to EDTF decade",
"input": 1968,
"output": "196X",
"rationale": "Decade precision appropriate for heuristic estimate"
}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
},
"inferred_birth_settlement": {
"value": "Utrecht",
"formatted": "NL-UT-UTR",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_location",
"inference_chain": [
{
"step": 1,
"observation": "Earliest education institution identified",
"source_field": "profile_data.education[0].institution",
"source_value": "Universiteit Utrecht"
},
{
"step": 2,
"lookup": "Institution location mapping",
"mapping_key": "Universiteit Utrecht",
"mapping_value": "Utrecht, Netherlands"
},
{
"step": 3,
"geocoding": "GeoNames resolution",
"query": "Utrecht",
"country_code": "NL",
"result": {
"geonames_id": 2745912,
"name": "Utrecht",
"admin1_code": "09",
"admin1_name": "Utrecht"
}
},
{
"step": 4,
"formatting": "CC-RR-PPP generation",
"country_code": "NL",
"region_code": "UT",
"settlement_code": "UTR",
"result": "NL-UT-UTR"
}
],
"assumption_note": "University location used as proxy for birth location; student may have relocated for education",
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
## List-Valued Inferred Data (EDTF Set Notation)
When inference yields multiple plausible values (e.g., an estimated birth year of 1968 with a ±3-year margin could fall in either the 1960s or the 1970s), store them as a **list** with EDTF set notation.
### EDTF Set Notation Standards
| Notation | Meaning | Use Case |
|----------|---------|----------|
| `[196X,197X]` | One of these values | Person born in late 1960s (uncertainty spans decades) |
| `{196X,197X}` | All of these values | NOT for birth decade (use `[...]`) |
| `[1965..1970]` | Range within set | Birth year between 1965-1970 |
### When to Use List Values
1. **Decade Boundary Cases**: Estimated birth year is within 3 years of a decade boundary
- Estimated 1968 → `[196X,197X]` (could be late 60s or early 70s due to age assumption variance)
- Estimated 1972 → `[196X,197X]` (same logic)
- Estimated 1975 → `197X` (confidently mid-decade)
2. **Multiple Plausible Locations**: Student attended schools in different cities
- `["NL-UT-UTR", "NL-NH-AMS"]` with provenance explaining each candidate
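The decade-boundary rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and it assumes "within 3 years of a decade boundary" means that the span `estimated_year ± variance` touches more than one decade.

```python
def decade_candidates(estimated_year: int, variance: int = 3) -> list[str]:
    """Return every EDTF decade touched by estimated_year ± variance.

    Assumption: a decade counts as plausible when any year in the
    span [estimated_year - variance, estimated_year + variance] lies in it.
    """
    first = (estimated_year - variance) // 10
    last = (estimated_year + variance) // 10
    return [f"{decade}X" for decade in range(first, last + 1)]


def to_edtf(decades: list[str]) -> str:
    """A single decade stays bare; several become a one-of set `[a,b]`."""
    if len(decades) == 1:
        return decades[0]
    return "[" + ",".join(decades) + "]"
```

With these definitions, an estimate of 1968 or 1972 yields the set `[196X,197X]`, while 1975 stays `197X`, matching the three cases listed above.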
### Example: List-Valued Birth Decade
```json
{
"inferred_birth_decade": {
"values": ["196X", "197X"],
"edtf": "[196X,197X]",
"edtf_meaning": "one of: 1960s or 1970s",
"precision": "decade_set",
"confidence": "very_low",
"primary_value": "196X",
"primary_rationale": "1968 is closer to 1960s center than 1970s",
"inference_provenance": {
"method": "earliest_observation_heuristic",
"inference_chain": [
{
"step": 1,
"observation": "University start 1986",
"source_field": "profile_data.education[0].date_range"
},
{
"step": 2,
"assumption": "University entry at age 18 (±3 years)",
"rationale": "Dutch university entry is typically at 17-21; ±3 years around 18 gives a wider 15-21 bound"
},
{
"step": 3,
"calculation": "1986 - 18 = 1968 (range: 1965-1971)",
"result": "Birth year estimate: 1968 with variance 1965-1971"
},
{
"step": 4,
"generalization": "Birth year range spans decade boundary",
"input_range": [1965, 1971],
"output": ["196X", "197X"],
"rationale": "Cannot determine which decade without additional evidence"
}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
### PPID Generation with List Values
When `inferred_birth_decade` is a list, use `primary_value` for PPID:
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN",
"ppid_components": {
"first_date": "196X",
"first_date_source": "inferred_birth_decade.primary_value",
"first_date_alternatives": ["197X"]
}
}
```
### Example: List-Valued Location
```json
{
"inferred_birth_settlement": {
"values": [
{"settlement": "Utrecht", "formatted": "NL-UT-UTR"},
{"settlement": "Amsterdam", "formatted": "NL-NH-AMS"}
],
"primary_value": "NL-UT-UTR",
"primary_rationale": "Earlier education (1986) in Utrecht; Amsterdam education later (1990)",
"confidence": "very_low",
"inference_provenance": {
"method": "education_locations",
"inference_chain": [
{
"step": 1,
"observation": "Multiple education institutions found",
"source_field": "profile_data.education",
"candidates": ["Universiteit Utrecht (1986)", "UvA (1990)"]
},
{
"step": 2,
"assumption": "Earlier education more likely near birth location",
"rationale": "Students often attend local university first"
}
]
}
}
}
```
## Confidence Levels
| Level | Criteria | Example |
|-------|----------|---------|
| **high** | Direct extraction from authoritative source | Profile states "Born in Amsterdam" |
| **medium** | Single-step inference with reliable source | Current job location from employment record |
| **low** | Multi-step heuristic with assumptions | Birth year from university start date |
| **very_low** | Speculative, multiple assumptions, or list-valued | Birth location from first observed location, or decade spanning boundary |
## Anti-Patterns (FORBIDDEN)
### ❌ Silent Replacement
```json
{
"birth_date": {
"edtf": "196X",
"precision": "decade"
}
}
```
**Problem**: No indication this is inferred, no provenance, no confidence level.
### ❌ Hidden in Metadata
```json
{
"birth_date": {
"edtf": "196X"
},
"enrichment_metadata": {
"birth_date_inferred": true
}
}
```
**Problem**: Inference metadata separated from the value; easy to miss.
### ❌ Missing Inference Chain
```json
{
"inferred_birth_decade": {
"value": "196X",
"method": "heuristic"
}
}
```
**Problem**: No explanation of HOW the value was derived; not auditable.
## Correct Pattern ✅
```json
{
"birth_date": {
"edtf": "XXXX",
"precision": "unknown",
"note": "See inferred_birth_decade"
},
"inferred_birth_decade": {
"value": "196X",
"edtf": "196X",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_heuristic",
"inference_chain": [
{"step": 1, "observation": "...", "source_field": "...", "source_value": "..."},
{"step": 2, "assumption": "...", "rationale": "..."},
{"step": 3, "calculation": "...", "result": "..."}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
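A check along the following lines could reject the three anti-patterns automatically. It is a sketch under assumptions: the function name and the exact set of required keys are inferred from the correct pattern above, not taken from an existing validator.

```python
CONFIDENCE_LEVELS = {"high", "medium", "low", "very_low"}
# Keys the correct pattern shows inside inference_provenance (assumed mandatory).
PROVENANCE_KEYS = ("method", "inference_chain", "inferred_at", "inferred_by")


def check_inferred_field(name: str, field: dict) -> list[str]:
    """Return human-readable problems; an empty list means the field passes."""
    problems = []
    if not name.startswith("inferred_"):
        problems.append(f"{name}: inferred data must live in an inferred_* field")
    if "value" not in field and "values" not in field:
        problems.append(f"{name}: missing value (or values for list-valued data)")
    if field.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append(f"{name}: missing or unknown confidence level")

    provenance = field.get("inference_provenance")
    if not isinstance(provenance, dict):
        problems.append(f"{name}: missing inference_provenance")
        return problems

    for key in PROVENANCE_KEYS:
        if key not in provenance:
            problems.append(f"{name}: inference_provenance lacks {key}")
    chain = provenance.get("inference_chain") or []
    if "inference_chain" in provenance and not chain:
        problems.append(f"{name}: inference_chain is empty")
    # Steps are expected to be numbered 1, 2, 3, ... in order.
    for position, step in enumerate(chain, start=1):
        if step.get("step") != position:
            problems.append(f"{name}: chain step {position} is misnumbered")
    return problems
```

Returning a list of problems rather than raising on the first one lets an enrichment script report every defect in a record in a single pass.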
## PPID Component Handling
When inferred values are used in PPID components:
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-NH-AMS_XXXX_AART-HARTEN",
"ppid_components": {
"type": "ID",
"first_location": "NL-UT-UTR",
"first_location_source": "inferred_birth_settlement",
"first_date": "196X",
"first_date_source": "inferred_birth_decade",
"last_location": "NL-NH-AMS",
"last_location_source": "inferred_current_settlement",
"last_date": "XXXX",
"name_tokens": ["AART", "HARTEN"]
}
}
```
The `*_source` fields document which inferred field was used for PPID generation.
## Upgrade Path: Inferred → Verified
When verified data becomes available:
1. **Keep inferred data** in `inferred_*` fields for audit trail
2. **Add verified data** to canonical fields
3. **Mark inferred as superseded**:
```json
{
"birth_date": {
"edtf": "1967-03-15",
"precision": "day",
"verified": true,
"source": "official_record"
},
"inferred_birth_decade": {
"value": "196X",
"superseded": true,
"superseded_by": "birth_date",
"superseded_at": "2025-01-15T10:00:00Z",
"accuracy_assessment": "Inferred decade was correct (1960s), actual year 1967"
}
}
```
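The three upgrade steps could be applied by a small helper like the one below. It is a hypothetical sketch: the function name and signature are assumptions, and it copies the record so the caller decides when to persist the change.

```python
import copy


def mark_superseded(record, inferred_field, canonical_field,
                    verified_value, superseded_at, assessment=None):
    """Add verified data and flag the inferred field, keeping it for audit."""
    updated = copy.deepcopy(record)              # never mutate the caller's record
    updated[canonical_field] = verified_value    # step 2: verified data in canonical field
    inferred = updated[inferred_field]           # step 1: inferred data is kept, not deleted
    inferred["superseded"] = True                # step 3: mark as superseded
    inferred["superseded_by"] = canonical_field
    inferred["superseded_at"] = superseded_at
    if assessment:
        inferred["accuracy_assessment"] = assessment
    return updated
```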
## Implementation Checklist
For any enrichment script:
- [ ] Create explicit `inferred_*` fields for ALL inferred data
- [ ] Include `inference_provenance` with complete `inference_chain`
- [ ] Record each step: observation → assumption → calculation → result
- [ ] Set appropriate `confidence` level
- [ ] Add `*_source` references in PPID components
- [ ] Preserve original unknown values (`XXXX`, `XX-XX-XXX`)
- [ ] Add `note` in canonical fields pointing to inferred alternatives
## Related Rules
- **Rule 44**: PPID Birth Date Enrichment and EDTF Unknown Date Notation
- **Rule 35**: Provenance Statements MUST Have Dual Timestamps
- **Rule 6**: WebObservation Claims MUST Have XPath Provenance

@ -0,0 +1,251 @@
# Rule 40: KIEN Registry is Authoritative for Intangible Heritage Custodians
## Summary
For Intangible Heritage Custodians (Type I), the KIEN registry at `https://www.immaterieelerfgoed.nl/` is the **TIER_1_AUTHORITATIVE** source for contact data and addresses. Google Maps enrichment is **TIER_3_CROWD_SOURCED** and should NEVER override KIEN data.
## Empirical Validation (January 2025)
A comprehensive audit of 188 Type I custodian files revealed:
| Category | Count | Percentage |
|----------|-------|------------|
| ✅ Google Maps matches OK | 101 | 53.7% |
| 🔧 **FALSE_MATCH detected** | **62** | **33.0%** |
| ⚠️ No official website (valid) | 20 | 10.6% |
| 📭 No Google Maps data | 5 | 2.7% |
**Key Finding: 33% of Google Maps enrichment data for Type I custodians was incorrect.**
### False Match Categories Identified
1. **Domain mismatches** (39 files): Google Maps website ≠ KIEN official website
2. **Name mismatches** (8 files): Completely different organizations (e.g., "Ria Bos" heritage practitioner → "Ria Money Transfer Agent")
3. **Wrong location** (6 files): Similar name but a different city or country (e.g., Amsterdam→Den Haag, Netherlands→Suriname)
4. **Wrong organization type** (5 files): Federation vs specific member, heritage org vs webshop
5. **Different entity type** (3 files): Organization vs location/street name
6. **Different event** (3 files): Horse racing vs festival, different village's event
### Why Google Maps Fails for Type I
Google Maps is optimized for commercial businesses with physical storefronts. Type I intangible heritage custodians are fundamentally different:
- **Virtual organizations** without commercial presence
- **Person-based heritage** (individual practitioners preserving traditional crafts)
- **Volunteer networks** meeting in private residences
- **Event-based organizations** that exist only during festivals
- **Federations** that coordinate member organizations without own premises
## Rationale
Google Maps frequently returns **false matches** for intangible heritage organizations because:
1. **Virtual Organizations**: Many intangible heritage custodians operate as networks/platforms without commercial storefronts
2. **Name Collisions**: Common words in organization names (e.g., "Platform") match unrelated businesses
3. **No Physical Presence**: Organizations focused on intangible heritage (handwriting, oral traditions, crafts) often have no Google Maps listing
4. **Volunteer-Run**: Contact addresses are often private residences, not businesses
KIEN (Kenniscentrum Immaterieel Erfgoed Nederland) is the official Dutch registry for intangible cultural heritage and maintains verified contact information directly from the organizations.
## Data Tier Hierarchy for Type I Custodians
| Priority | Source | Data Tier | Trust Level |
|----------|--------|-----------|-------------|
| 1st | KIEN Registry (`immaterieelerfgoed.nl`) | TIER_1_AUTHORITATIVE | Highest |
| 2nd | Organization's Official Website | TIER_2_VERIFIED | High |
| 3rd | Wikidata | TIER_3_CROWD_SOURCED | Medium |
| 4th | Google Maps | TIER_3_CROWD_SOURCED | Low (verify!) |
## Required Workflow for Type I Enrichment
### Step 1: Scrape KIEN Page First
For every intangible heritage custodian, the KIEN profile page MUST be scraped to extract:
```yaml
kien_enrichment:
  kien_name: "Platform Handschriftontwikkeling"
  kien_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  heritage_page_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
  heritage_forms:
    - "Ambachten, handwerk en techniek"
    - "Sociale praktijken"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
    country: "NL"
  registered_since: "2019-11"
  enrichment_timestamp: "2025-01-08T00:00:00Z"
  source: "https://www.immaterieelerfgoed.nl"
```
### Step 2: Validate Google Maps Match (If Any)
If Google Maps enrichment exists, compare against KIEN data:
```python
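from urllib.parse import urlparse

from rapidfuzz import fuzz  # assumed library; thefuzz exposes the same fuzz.ratio


def extract_domain(url):
    """Hypothetical helper (not shown in the original): host without 'www.'."""
    if not url:
        return None
    host = urlparse(url if "//" in url else f"//{url}").netloc.lower()
    return host.removeprefix("www.") or None


def validate_google_maps_match(kien_data, gmaps_data):
    """Check if Google Maps data matches KIEN authoritative source."""
    # Check website domain match (skipped when either side has no website)
    kien_domain = extract_domain(kien_data.get('website'))
    gmaps_domain = extract_domain(gmaps_data.get('website'))
    if kien_domain and gmaps_domain and kien_domain != gmaps_domain:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Website mismatch: KIEN={kien_domain}, GMaps={gmaps_domain}'
        }

    # Check name similarity (fuzz.ratio returns a score from 0 to 100)
    kien_name = kien_data.get('kien_name', '').lower()
    gmaps_name = gmaps_data.get('name', '').lower()
    if fuzz.ratio(kien_name, gmaps_name) < 70:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Name mismatch: KIEN="{kien_name}", GMaps="{gmaps_name}"'
        }

    return {'status': 'VERIFIED'}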
```
### Step 3: Mark False Matches
When Google Maps returns a different organization:
```yaml
google_maps_enrichment:
  status: FALSE_MATCH
  false_match_reason: >-
    Google Maps returned "Platform 9 BV" (a health/coaching business at
    Nieuwleusen) instead of "Platform Handschriftontwikkeling" (a virtual
    handwriting development platform). These are completely different
    organizations. KIEN registry is authoritative for this Type I custodian.
  original_false_match:
    place_id: ChIJNZ6o7H_fx0cR-TURAN3Bj54
    name: Platform 9 BV
    formatted_address: Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen
    website: http://www.platform9.nl/
  correction_timestamp: "2025-01-08T00:00:00Z"
  correction_agent: opencode-claude-sonnet-4
```
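Building that structure programmatically might look like the sketch below. The function name is hypothetical; the output keys mirror the YAML above, and the original Google Maps payload is preserved rather than deleted, in line with the rule that enriched data is never removed.

```python
def mark_false_match(gmaps_enrichment: dict, reason: str,
                     timestamp: str, agent: str) -> dict:
    """Wrap a wrong Google Maps result so it is kept but clearly flagged."""
    # Already flagged entries are returned untouched to avoid nesting wrappers.
    if gmaps_enrichment.get("status") == "FALSE_MATCH":
        return gmaps_enrichment
    return {
        "status": "FALSE_MATCH",
        "false_match_reason": reason,
        "original_false_match": dict(gmaps_enrichment),  # shallow copy of the bad match
        "correction_timestamp": timestamp,
        "correction_agent": agent,
    }
```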
## KIEN Contact Data Extraction
The KIEN heritage pages follow a consistent structure. Extract from the "Contact" section:
```
## Contact
[Organization Name](link-to-profile-page)
Street Address
Postal Code
City
Province
[Website](url)
Bijgeschreven in inventaris vanaf: [date]
```
### Example Extraction (from immaterieelerfgoed.nl/nl/handschrift):
```yaml
contact:
  organization: "Platform Handschriftontwikkeling"
  profile_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
  website: "http://www.handschriftontwikkeling.nl/"
  registered_since: "november 2019"
```
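A parser for that Contact layout could be sketched as follows. It assumes the page has already been converted to the Markdown-like text shown above, with the four address lines in exactly that order; the function name and the regular expression are illustrative, not part of an existing scraper.

```python
import re

# Matches a whole line of the form [label](url).
LINK = re.compile(r"\[(?P<label>[^\]]*)\]\((?P<url>[^)]*)\)")
REGISTERED = "Bijgeschreven in inventaris vanaf:"


def parse_kien_contact(text: str) -> dict:
    """Extract organization, address, website and registration date."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    start = lines.index("## Contact") + 1  # raises ValueError if the section is absent
    contact = {"address": {}}
    plain = []
    for line in lines[start:]:
        if line.startswith("#"):           # next heading ends the Contact section
            break
        link = LINK.fullmatch(line)
        if line.startswith(REGISTERED):
            contact["registered_since"] = line[len(REGISTERED):].strip()
        elif link and "organization" not in contact:
            contact["organization"] = link["label"]   # first link: profile page
            contact["profile_url"] = link["url"]
        elif link:
            contact["website"] = link["url"]          # later link: official website
        else:
            plain.append(line)
    # Assumed fixed order of the unlabelled lines.
    for key, value in zip(("street", "postal_code", "city", "province"), plain):
        contact["address"][key] = value
    return contact
```

The registration date is returned as written on the page ("november 2019"); normalizing it to `2019-11` would be a separate step.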
## Location Resolution for Type I
When KIEN provides an address:
1. **Use KIEN address** for `location.formatted_address`
2. **Geocode KIEN address** to get coordinates (NOT Google Maps coordinates)
3. **Update location_resolution** with method `KIEN_ADDRESS_GEOCODE`
```yaml
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  region_code: GE
  country: NL
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE
    source_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
    geocoding_service: nominatim
    geocoding_timestamp: "2025-01-08T00:00:00Z"
```
## Batch Re-Enrichment Script
To fix all Type I custodians with potentially incorrect Google Maps data:
```bash
# Find all Type I custodians
python scripts/rescrape_kien_contacts.py --type I --output data/custodian/
# This script should:
# 1. Read all NL-*-I-*.yaml files
# 2. Fetch KIEN page for each (from kien_enrichment.kien_url)
# 3. Extract contact/address from KIEN
# 4. Compare with google_maps_enrichment
# 5. Mark mismatches as FALSE_MATCH
# 6. Update location with KIEN address
```
## Anti-Patterns
### WRONG - Using Google Maps as primary source for Type I:
```yaml
# WRONG - Google Maps overriding KIEN data
location:
  formatted_address: "Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen"
  coordinate_provenance:
    source_type: GOOGLE_MAPS # WRONG for Type I!
```
### CORRECT - KIEN as primary source:
```yaml
# CORRECT - KIEN is authoritative
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE # Correct!
```
## Affected Files
This rule affects all Type I custodian files (188 at the time of the January 2025 audit):
- `data/custodian/NL-*-I-*.yaml`
All should be reviewed to ensure:
1. `kien_enrichment` contains address from KIEN page
2. `google_maps_enrichment` is validated against KIEN
3. `location` uses KIEN address (not Google Maps)
4. False matches are properly documented
## Related Rules
- **Rule 5**: NEVER Delete Enriched Data - Keep false match data in `original_false_match`
- **Rule 6**: WebObservation Claims - KIEN data should have provenance
- **Rule 22**: Custodian YAML Files Are Single Source of Truth
- **Rule 35**: Provenance Timestamps - Include KIEN fetch timestamps
## See Also
- KIEN Registry: https://www.immaterieelerfgoed.nl/
- UNESCO Intangible Cultural Heritage: https://ich.unesco.org/
- Dutch Intangible Heritage Network documentation

@ -0,0 +1,351 @@
# Rule 44: PPID Birth Date Enrichment and Unknown Date Notation
**Version**: 1.0.0
**Created**: 2025-01-09
**Status**: ACTIVE
**Related**: [PPID-GHCID Alignment](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md) | [EDTF Specification](https://www.loc.gov/standards/datetime/)
---
## 1. Summary
When birth/death dates are missing from person entity sources, agents MUST:
1. **Search for dates** using Exa Search and Linkup tools
2. **Record all enrichment data** as web claims with provenance
3. **If not found**, use **EDTF-compliant notation** for estimated/unknown dates
4. **Never fabricate** specific dates without source evidence
---
## 2. Enrichment Workflow
### 2.1 Required Search Before Using Unknown Notation
Before marking a date as unknown, agents MUST attempt enrichment:
```
Person Entity (missing birth_date)
1. Search Exa: "{full_name} born birth date"
2. Search Exa: "{full_name} {known_employer}"
3. Search Linkup: "{full_name} biography"
4. If found → Record as web_claim with provenance
5. If NOT found → Use EDTF unknown notation
6. Record enrichment_attempt in metadata
```
### 2.2 Enrichment Search Requirements
| Search Tool | Query Pattern | When to Use |
|-------------|---------------|-------------|
| `exa_web_search_exa` | `"{name}" born birthday birth date year` | Primary search |
| `exa_linkedin_search_exa` | `"{name}" at "{employer}"` | For work context |
| `linkup_linkup-search` | `"{name}" biography personal` | Deep research |
### 2.3 Recording Successful Enrichment
When birth date is found, record as web claim:
```yaml
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://example.org/person/bio"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
    confidence_score: 0.85
    notes: "Found in biography section"
```
### 2.4 Recording Failed Enrichment Attempts
Always record that enrichment was attempted:
```yaml
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    search_agent: "opencode-claude-sonnet-4"
    search_tools_used:
      - exa_web_search_exa
      - linkup_linkup-search
    queries_tried:
      - '"Jan van Berg" born birthday'
      - '"Jan van Berg" biography'
    result: "NOT_FOUND"
    notes: "No publicly available birth date found after comprehensive search"
```
---
## 3. EDTF-Compliant Unknown Date Notation
### 3.1 Standard: Extended Date/Time Format (EDTF)
This project follows the **Library of Congress EDTF Specification** (ISO 8601-2:2019) for representing uncertain, approximate, and unspecified dates.
**Key EDTF Characters**:
| Character | Meaning | EDTF Level | Example |
|-----------|---------|------------|---------|
| `X` | Unspecified digit | Level 1+ | `19XX` = some year 1900-1999 |
| `~` | Approximate (circa) | Level 1+ | `1985~` = circa 1985 |
| `?` | Uncertain | Level 1+ | `1985?` = possibly 1985 |
| `%` | Uncertain AND approximate | Level 1+ | `1985%` = possibly circa 1985 |
| `S` | Significant digits | Level 2 | `1950S2` = 1900-1999, estimated 1950 |
| `[..]` | One of set | Level 2 | `[1970,1980]` = either 1970 or 1980 |
| `{..}` | All of set | Level 2 | `{1970..1980}` = all years 1970-1980 |
### 3.2 Unspecified Date Components (X Notation)
Use `X` to replace unknown digits:
| Known Information | EDTF Format | Meaning |
|-------------------|-------------|---------|
| Only decade known (1970s) | `197X` | Some year 1970-1979 |
| Only century known (1900s) | `19XX` | Some year 1900-1999 |
| Year unknown entirely | `XXXX` | Year unknown |
| Year known, month unknown | `1985-XX` | Some month in 1985 |
| Year+month known, day unknown | `1985-04-XX` | Some day in April 1985 |
| Year known, month+day unknown | `1985-XX-XX` | Some day in 1985 |
| Decade known, month and day unknown | `197X-XX-XX` | Some day in 1970-1979 |
### 3.3 Multiple Possible Decades (Set Notation)
When the decade is uncertain but constrained to specific options:
| Scenario | EDTF Format | Meaning |
|----------|-------------|---------|
| Born in 1970s OR 1980s | `[197X,198X]` | One of: some year in 1970s or 1980s |
| Born in specific years | `[1975,1985]` | Either 1975 or 1985 |
| Born in 1970-1985 range | `[1970..1985]` | One of: some year from 1970 through 1985 (`1970/1985` would instead denote an interval spanning the whole period) |
### 3.4 Estimated Dates with Significant Digits
When you can estimate a year with confidence bounds:
```
1975S2 = Estimated 1975, significant to 2 digits (1900-1999)
1975S3 = Estimated 1975, significant to 3 digits (1970-1979)
```
This is useful when you can estimate based on career timeline (e.g., "started working 1998, likely born 1970s").
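To make the arithmetic behind the `S` suffix concrete, the sketch below expands a significant-digits year into its estimate and the range it implies. The function name is hypothetical and only four-digit years are handled.

```python
import re


def significant_digit_range(edtf: str) -> tuple[int, int, int]:
    """Return (estimate, lowest_year, highest_year) for a value like '1975S2'."""
    match = re.fullmatch(r"(\d{4})S([1-4])", edtf)
    if match is None:
        raise ValueError(f"not a four-digit significant-digits year: {edtf!r}")
    year, digits = int(match[1]), int(match[2])
    span = 10 ** (4 - digits)      # S2 -> 100 years, S3 -> 10 years
    low = year // span * span      # zero out the insignificant digits
    return year, low, low + span - 1
```

For `1975S2` this gives the range 1900-1999, and for `1975S3` the range 1970-1979, as stated above.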
### 3.5 Living Persons - Birth Date Estimation
For living persons in LinkedIn data, estimate birth decade from:
1. **Graduation year** (if available): Subtract ~22 years for bachelor's degree
2. **Career start** (first job): Subtract ~22-25 years
3. **Current role seniority**: "Senior" roles suggest 35+ years old
```yaml
# Example: Person graduated 2010
birth_date_estimate:
  edtf: "1988S2"  # Estimated 1988, significant to 2 digits (1900-1999)
  estimation_method: "graduation_year_inference"
  estimation_basis: "Graduated bachelor's 2010, estimated birth ~1988"
  confidence: 0.60
```
---
## 4. PPID Format with Unknown Dates
### 4.1 PPID Date Component Rules
The PPID format includes birth and death dates:
```
{TYPE}_{FL}_{FD}_{LL}_{LD}_{NT}
             │         │
             │         └── Last Date (death) - EDTF format
             └── First Date (birth) - EDTF format
```
### 4.2 Examples with Unknown Components
| Scenario | PPID Example |
|----------|--------------|
| All known | `PID_NL-NH-AMS_1985-03-15_NL-NH-HAA_2020-08-22_JAN-BERG` |
| Birth year only | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` |
| Birth decade only | `ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_JAN-BERG` |
| Nothing known | `ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_JAN-BERG` |
| Living person | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` |
### 4.3 Filename Safety
Not every EDTF character is **filename-safe**:
| Character | Filename Safe? | Notes |
|-----------|----------------|-------|
| `X` | YES | Uppercase letter |
| `~` | YES | Allowed on macOS/Linux/Windows |
| `?` | NO | Not allowed on Windows |
| `%` | CAUTION | URL encoding issues |
| `[` `]` | CAUTION | Shell escaping issues |
| `,` | YES | Allowed |
| `/` | NO | Directory separator |
| `\|` | CAUTION | Shell pipe, Windows disallowed |
**Recommendation**: For filenames, use only:
- `X` for unknown digits
- `~` for approximate (suffix only)
- Avoid `?`, `%`, `[]`, `/`, `|` in filenames
When set notation `[..]` is needed, store in metadata but use simplified form in filename:
- Filename: `ID_XX-XX-XXX_197X_...` (simplified)
- Metadata: `birth_date_edtf: "[1975,1985]"` (full EDTF)
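The recommendation can be enforced with a helper such as the one below. This is a sketch under assumptions: the names are hypothetical, and a set is reduced to its first member, which follows the "earlier estimate" choice used in example 6.3 rather than any generalization to a decade.

```python
# Characters the table above marks as NO or CAUTION for filenames.
UNSAFE_CHARS = set("?%[]/|")


def is_filename_safe(edtf: str) -> bool:
    """True when the EDTF string contains none of the discouraged characters."""
    return not (set(edtf) & UNSAFE_CHARS)


def filename_form(edtf: str) -> str:
    """Reduce one-of set notation to its first member for use in a filename."""
    if edtf.startswith("[") and edtf.endswith("]"):
        first_member = edtf[1:-1].split(",")[0].strip()
        edtf = first_member.split("..")[0]   # a range member keeps its lower bound
    if not is_filename_safe(edtf):
        raise ValueError(f"{edtf!r} still contains filename-unsafe characters")
    return edtf
```

The full EDTF string should still be stored in metadata, since the filename form discards the alternatives.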
---
## 5. Decision Tree
```
┌─────────────────────────────────────────┐
│ Person entity missing birth_date        │
└─────────────────┬───────────────────────┘
┌─────────────────────────────────────────┐
│ Search Exa + Linkup for birth date      │
└─────────────────┬───────────────────────┘
          ┌───────┴───────┐
          │ Date found?   │
          └───────┬───────┘
              YES │ NO
         ▼        │                ▼
┌─────────────────┐ ┌─────────────────────────────┐
│ Record as       │ │ Can estimate from career?   │
│ web_claim with  │ └───────────┬─────────────────┘
│ provenance      │         YES │ NO
└─────────────────┘         ▼   │             ▼
                    ┌───────────────┐ ┌───────────────┐
                    │ Use EDTF      │ │ Use XXXX      │
                    │ estimate:     │ │ (unknown)     │
                    │ 1988S2 or     │ │               │
                    │ 198X          │ │               │
                    └───────────────┘ └───────────────┘
```
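The same decision tree, written as a function. This is an illustrative sketch: the function name, its arguments, and the `record_as` labels are assumptions, and the search itself is assumed to have happened before the call.

```python
from typing import Optional


def resolve_birth_date(found_date: Optional[str],
                       estimated_year: Optional[int]) -> dict:
    """Pick the EDTF value to store, following the three leaves of the tree."""
    if found_date is not None:
        # Date found: it must be recorded as a web claim with provenance.
        return {"edtf": found_date, "record_as": "web_claim"}
    if estimated_year is not None:
        # Career-based estimate: generalize to decade precision.
        return {"edtf": f"{estimated_year // 10}X", "record_as": "estimate"}
    # Nothing found and nothing to estimate from.
    return {"edtf": "XXXX", "record_as": "unknown"}
```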
---
## 6. Examples
### 6.1 Fully Unknown (No Enrichment Found)
```yaml
# Person: Nora Ruijs (student, no public birth info)
ppid: ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_NORA-RUIJS
birth_date:
  edtf: "XXXX"
  precision: "unknown"
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    result: "NOT_FOUND"
```
### 6.2 Decade Estimated from Career
```yaml
# Person: Senior curator, started career 1995
ppid: ID_NL-NH-AMS_197X_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "197X"
  edtf_full: "1972S3" # Estimated 1972, significant to 3 digits
  precision: "decade"
  estimation_method: "career_start_inference"
  estimation_basis: "Career started 1995 as junior curator, estimated age 23"
```
### 6.3 Multiple Possible Decades
```yaml
# Person: Could be born 1970s or 1980s based on conflicting sources
ppid: ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_MARIA-SILVA # Simplified for filename
birth_date:
  edtf: "[197X,198X]" # Full EDTF with set notation
  edtf_filename: "197X" # Simplified for filename (earlier estimate)
  precision: "decade_uncertain"
  notes: "Sources conflict: LinkedIn suggests 1980s, university bio suggests 1970s"
```
### 6.4 Exact Date Found via Enrichment
```yaml
# Person: Birth date found on institutional bio page
ppid: ID_NL-NH-AMS_1985-03-15_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "1985-03-15"
  precision: "day"
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://museum.nl/team/jan-berg"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
```
---
## 7. Anti-Patterns
### 7.1 FORBIDDEN: Fabricating Dates
```yaml
# WRONG - No source, no search attempted
birth_date:
  edtf: "1985-03-15" # Where did this come from?!
```
### 7.2 FORBIDDEN: Using Non-EDTF Notation
```yaml
# WRONG - Not EDTF compliant
birth_date: "197~8~" # Invalid notation
birth_date: "1970s" # Use 197X instead
birth_date: "circa 1985" # Use 1985~ instead
birth_date: "unknown" # Use XXXX instead
```
### 7.3 FORBIDDEN: Skipping Enrichment Search
```yaml
# WRONG - No search attempted
birth_date:
  edtf: "XXXX"
# No enrichment_metadata showing search was attempted!
```
---
## 8. Validation Rules
1. **Search Required**: Cannot use `XXXX` without `enrichment_metadata.birth_date_search.attempted: true`
2. **EDTF Compliance**: All dates must parse as valid EDTF (use validator)
3. **Filename Safety**: PPID filenames must avoid `?`, `%`, `[]`, `/`, `|`
4. **Provenance Required**: All found dates must have `web_claims` with source
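Rules 1 and 4 lend themselves to an automated check. The sketch below is an assumption-laden illustration, not an existing validator: the function name is hypothetical, and a date counts as "found" when its EDTF string is a plain year, year-month, or full date with no `X`, `S`, or set notation.

```python
import re

# Plain ISO-style dates: 1985, 1985-03 or 1985-03-15 (assumed to mean "found").
EXACT_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def validate_birth_date(record: dict) -> list[str]:
    """Return violations of validation rules 1 and 4 for one person record."""
    errors = []
    edtf = record.get("birth_date", {}).get("edtf", "")
    search = record.get("enrichment_metadata", {}).get("birth_date_search", {})

    # Rule 1: XXXX requires a recorded search attempt.
    if edtf == "XXXX" and search.get("attempted") is not True:
        errors.append("XXXX used without a recorded enrichment search")

    # Rule 4: a found date needs a birth_date web claim with a source.
    if EXACT_DATE.fullmatch(edtf):
        sourced = any(
            claim.get("claim_type") == "birth_date" and claim.get("source_url")
            for claim in record.get("web_claims", [])
        )
        if not sourced:
            errors.append("exact date has no birth_date web claim with a source_url")
    return errors
```

Rule 2 (EDTF parsing) is left out on purpose, since it needs a real EDTF validator rather than a regular expression.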
---
## 9. References
- [EDTF Specification (Library of Congress)](https://www.loc.gov/standards/datetime/)
- [ISO 8601-2:2019](https://www.iso.org/standard/70908.html)
- [PPID-GHCID Alignment Document](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md)
- [Rule 21: Data Fabrication Prohibition](../DATA_FABRICATION_PROHIBITION.md)

@ -5,13 +5,18 @@
## The Rule
1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties).
* ❌ INVALID: Slot `analyzes_or_analyzed` maps to `schema:object` (a Class).
* ✅ VALID: Slot `analyzes_or_analyzed` maps to `crm:P129_is_about` (a Property).
* ❌ INVALID: Slot `analyze` maps to `schema:object` (a Class).
* ✅ VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property).
2. **Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities).
* ❌ INVALID: Class `Person` maps to `foaf:name` (a Property).
* ✅ VALID: Class `Person` maps to `foaf:Person` (a Class).
3. **When true equivalence exists and is verified, exact mapping is preferred.**
* ✅ VALID: Class `Acquisition` maps to `crm:E8_Acquisition`.
* ✅ VALID: Slot mapped to an actually equivalent ontology property.
* ❗ Do not avoid `exact_mappings` by default; avoid only when scope is broader/narrower/similar-but-not-equal.
## Rationale
Mapping a slot (which defines a relationship or attribute) to a class (which defines a type of entity) is a category error. `schema:object` represents the *class* of objects, not the *relationship* of "having an object" or "analyzing an object".
@ -20,9 +25,10 @@ Mapping a slot (which defines a relationship or attribute) to a class (which def
When adding or reviewing `exact_mappings`:
- [ ] Is the LinkML element a Class or a Slot?
- [ ] Does the target ontology term represent a Class (usually Capitalized) or a Property (usually lowercase)?
- [ ] Did you verify the target term type in the ontology definition files (do not rely on naming heuristics)?
- [ ] Do they match? (Class↔Class, Slot↔Property)
- [ ] If the target ontology uses opaque IDs (like CIDOC-CRM `E55_Type`), verify the type definition in the ontology file.
- [ ] If semantic scope is truly equivalent, use `exact_mappings` (not `close`/`broad` as a conservative fallback).
## Common Pitfalls to Fix

@ -368,6 +368,6 @@ Before marking a slot as processed:
- Rule 9: Enum-to-Class Promotion (single source of truth principle)
- Rule 0b: Type/Types File Naming Convention
- Rule 39: Slot Naming Convention (RiC-O Style)
- Rule: Slot Naming Convention (Current Style)
- `.opencode/ENUM_TO_CLASS_PRINCIPLE.md`
- `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` - **AUTHORITATIVE** master list of migrations

@ -126,4 +126,4 @@ If you encounter an overly specific slot:
## See Also
* Rule 55: Broaden Generic Predicate Ranges
* Rule 39: Slot Naming Convention (RiC-O Style)
* Rule: Slot Naming Convention (Current Style)

@ -0,0 +1,181 @@
# LinkML YAML Best Practices Rule
## Rule: Follow LinkML Conventions for Valid, Interoperable Schema Files
### 1. equals_expression Anti-Pattern
`equals_expression` is for dynamic formula evaluation (e.g., `"{age_in_years} * 12"`). Never use it for static value constraints.
**WRONG:**
```yaml
slot_usage:
  has_type:
    equals_expression: '["hc:ArchiveOrganizationType"]'
  hold_record_set:
    equals_expression: '["hc:Fonds", "hc:Series"]'
```
**CORRECT** (single value):
```yaml
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
**CORRECT** (multiple allowed values - if classes):
```yaml
slot_usage:
  hold_record_set:
    any_of:
      - range: UniversityAdministrativeFonds
      - range: StudentRecordSeries
      - range: FacultyPaperCollection
```
**CORRECT** (multiple allowed values - if literals):
```yaml
slot_usage:
  status:
    equals_string_in:
      - "active"
      - "inactive"
      - "pending"
```
### 2. Declare All Used Prefixes
Every CURIE prefix used in the file must be declared in the `prefixes:` block.
**WRONG:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  skos: http://www.w3.org/2004/02/skos/core#
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType" # hc: not declared!
```
**CORRECT:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
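A rough lint for this rule could walk an already-parsed schema and collect CURIE prefixes that are not declared. The sketch below is heuristic and hypothetical: it takes a plain `dict` (for example the result of loading the file with a YAML parser), inspects string values only, and treats any `prefix:rest` string without whitespace as a CURIE.

```python
import re

# prefix:local with no whitespace; "scheme://..." URLs are excluded.
CURIE = re.compile(r"^([A-Za-z][\w.-]*):(?!//)\S+$")


def undeclared_prefixes(schema: dict) -> set[str]:
    """Return CURIE prefixes used in values but missing from `prefixes:`."""
    declared = set(schema.get("prefixes", {}))
    used = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key != "prefixes":      # declarations themselves are not usages
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            match = CURIE.match(node)
            if match:
                used.add(match.group(1))

    walk(schema)
    return used - declared
```

Because the match is purely textual, free-text values that happen to look like `word:word` would be reported too, so results are best treated as candidates for review.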
### 3. Import Referenced Classes
When using external classes in `is_a`, `range`, or other references, import them.
**WRONG:**
```yaml
imports:
  - linkml:types
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType # Not imported!
    slot_usage:
      related_to:
        range: WikidataAlignment # Not imported!
```
**CORRECT:**
```yaml
imports:
  - linkml:types
  - ../classes/ArchiveOrganizationType
  - ../classes/WikidataAlignment
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType
    slot_usage:
      related_to:
        range: WikidataAlignment
```
### 4. Quote Regex Patterns and Annotation Values
**Regex patterns:**
```yaml
# WRONG
pattern: ^Q[0-9]+$
# CORRECT
pattern: "^Q[0-9]+$"
```
**Annotation values (must be strings):**
```yaml
# WRONG
annotations:
  specificity_score: 0.1
# CORRECT
annotations:
  specificity_score: "0.1"
```
### 5. Remove Unused Imports
Only import slots and classes that are actually used in the file.
**WRONG:**
```yaml
imports:
- ../slots/has_scope # Never used in slots: or slot_usage:
- ../slots/has_score
- ../slots/has_type
```
**CORRECT:**
```yaml
imports:
- ../slots/has_score
- ../slots/has_type
```
### 6. Slot Usage Requires Slot Presence
A slot referenced in `slot_usage:` must either be:
- Listed in the `slots:` array, OR
- Inherited from a parent class via `is_a`
**WRONG:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
    slot_usage:
      has_type: {...}
      identified_by: {...} # Not in slots: and not inherited!
```
**CORRECT:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
      - identified_by
    slot_usage:
      has_type: {...}
      identified_by: {...}
```
## Checklist for Class Files
- [ ] All prefixes used in CURIEs are declared
- [ ] `default_prefix` set if module belongs to that namespace
- [ ] All referenced classes are imported
- [ ] All used slots are imported
- [ ] No `equals_expression` with static JSON arrays
- [ ] Regex patterns are quoted
- [ ] Annotation values are quoted strings
- [ ] No unused imports
- [ ] `slot_usage` only references slots that exist (via slots: or inheritance)

@ -0,0 +1,98 @@
# Rule: Archive Folder Convention
**Rule ID**: archive-folder-convention
**Created**: 2026-01-14
**Status**: Active
## Summary
All archived files MUST be placed in an `/archive/` subfolder within their parent directory, NOT at the same level as active files.
## Rationale
1. **Clean separation**: Active files are clearly distinguished from deprecated/archived files
2. **Discoverability**: Developers can easily find current files without wading through archived versions
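3. **Git history**: The archive folder can be left out of a working copy (for example with a sparse checkout) if needed; note that `.gitignore` alone does not exclude files that are already tracked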
4. **Consistent pattern**: Same structure across all schema module types (slots, classes, enums)
## Directory Structure
```
modules/
├── slots/
│   ├── archive/                        # Archived slot files go HERE
│   │   ├── branch_id_archived_20260114.yaml
│   │   ├── all_data_real_archived_20260114.yaml
│   │   └── ...
│   ├── has_or_had_identifier.yaml      # Active slots at this level
│   └── ...
├── classes/
│   ├── archive/                        # Archived class files go HERE
│   │   └── ...
│   └── ...
└── enums/
    ├── archive/                        # Archived enum files go HERE
    │   └── ...
    └── ...
```
## Naming Convention for Archived Files
```
{original_filename}_archived_{YYYYMMDD}.yaml
```
**Examples**:
- `branch_id.yaml` → `archive/branch_id_archived_20260114.yaml`
- `RealnessStatus.yaml` → `archive/RealnessStatus_archived_20260114.yaml`
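
The naming pattern can be derived programmatically. The following is a minimal sketch using only the standard library; `archived_path` is a hypothetical helper name, and the archive date is passed in explicitly rather than read from any project metadata.

```python
from datetime import date
from pathlib import PurePosixPath


def archived_path(active_file: str, archived_on: date) -> PurePosixPath:
    """Build the archive location for an active module file."""
    src = PurePosixPath(active_file)
    stamp = archived_on.strftime("%Y%m%d")
    # {original_filename}_archived_{YYYYMMDD}.yaml inside an archive/ subfolder
    return src.parent / "archive" / f"{src.stem}_archived_{stamp}{src.suffix}"


print(archived_path("modules/slots/branch_id.yaml", date(2026, 1, 14)))
```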
## Migration Workflow
When archiving a file during slot migration:
```bash
# 1. Copy to archive folder with timestamp suffix
cp modules/slots/branch_id.yaml modules/slots/archive/branch_id_archived_20260114.yaml
# 2. Remove from active location
rm modules/slots/branch_id.yaml
# 3. Update manifest counts
# (Decrement slot count in manifest.json)
# 4. Update slot_fixes.yaml
# (Mark migration as processed: true)
```
## Anti-Patterns
**WRONG** - Archived files at same level as active:
```
modules/slots/
├── branch_id_archived_20260114.yaml # NO - clutters active directory
├── has_or_had_identifier.yaml
└── ...
```
**CORRECT** - Archived files in subdirectory:
```
modules/slots/
├── archive/
│   └── branch_id_archived_20260114.yaml # YES - clean separation
├── has_or_had_identifier.yaml
└── ...
```
## Validation
Before committing migrations, verify:
- [ ] No `*_archived_*.yaml` files at module root level
- [ ] All archived files are in `archive/` subdirectory
- [ ] Archive folder exists for each module type with archived files
- [ ] Manifest counts updated to exclude archived files
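
The first two checks can be automated before committing. The following is a minimal sketch, assuming it is pointed at the `modules/` directory; `misplaced_archived_files` is a hypothetical helper name.

```python
from pathlib import Path


def misplaced_archived_files(modules_root: Path) -> list[Path]:
    """Return archived files that are not inside an archive/ folder."""
    return sorted(
        path
        for path in modules_root.rglob("*_archived_*.yaml")
        if path.parent.name != "archive"
    )


if __name__ == "__main__":
    for path in misplaced_archived_files(Path("modules")):
        print(f"move into archive/: {path}")
```

An empty result means every `*_archived_*.yaml` file sits in an `archive/` subdirectory; the manifest counts still have to be checked separately.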
## See Also
- Rule 53: Full Slot Migration (`full-slot-migration-rule.md`)
- Rule 9: Enum-to-Class Promotion (`ENUM_TO_CLASS_PRINCIPLE.md`)
- slot_fixes.yaml for migration tracking


@ -0,0 +1,74 @@
# Archive Organization Type Description Rule
## Rule
When describing archive classes that do NOT have `recordType` or `hold_record_set` as a primary distinguishing feature, emphasize that they represent the **archive as an organization/institution**, not just a collection of records.
## Rationale
Many archive type classes (e.g., `BankArchive`, `ChurchArchive`, `MunicipalArchive`) classify the **type of organization** that maintains the records, rather than the type of records themselves. This is an important semantic distinction:
- **Archive Organization Types** (no recordType focus): Classify the institution by its domain/sector
  - Examples: `BankArchive`, `ChurchArchive`, `MunicipalArchive`, `UniversityArchive`
  - Emphasis: The organization's mission, governance, and institutional context
- **Record Set Types** (have recordType): Classify the collections by record type
  - Examples: `AudiovisualArchiveRecordSetType`, `PhotographicArchiveRecordSetType`
  - Emphasis: The nature and format of the records
## Description Pattern
### For Archive Organization Types (WITHOUT recordType):
```yaml
description: >-
  Type of heritage institution that [primary function], specializing in
  [domain/subject area], with organizational characteristics including
  [governance, funding, legal status, or other institutional features].
```
**Key elements to include:**
1. "Type of heritage institution" or "Type of archive organization"
2. The institution's primary domain or sector
3. Organizational characteristics (governance, funding, legal status)
4. Institutional context (parent organization, regulatory framework)
5. Typical services and public-facing functions
### For Record Set Types (WITH recordType):
```yaml
description: >-
  Classification of archival records documenting [subject/domain],
  typically including [record formats, content types, provenance patterns].
```
## Examples
### ✅ Correct - Archive Organization Type (BankArchive):
```yaml
description: >-
  Type of heritage institution operating within the banking sector, preserving
  records of financial institutions and documenting banking history. Characterized
  by corporate governance structures, extended closure periods for personal data,
  and institutional relationships with parent banking organizations.
```
### ✅ Correct - Record Set Type (has recordType):
```yaml
description: >-
  Classification of archival records documenting banking activities, including
  ledgers, correspondence, customer accounts, and financial instruments.
```
## Files Affected
All classes in the `*Archive` family that:
- Do NOT have `hold_record_set` or `recordType` as a primary slot
- Are subclassed from `ArchiveOrganizationType` (not `ArchiveRecordSetType`)
## Related Rules
- `mapping-specificity-hypernym-rule.md` - For correct ontology mappings
- `class-description-quality-rule.md` - For general description quality


@ -0,0 +1,179 @@
# Rule 54: Broaden Generic Predicate Ranges Instead of Creating Bespoke Predicates
🚨 **CRITICAL**: When fixing gen-owl "Ambiguous type" warnings, **broaden the range of generic predicates** rather than creating specialized bespoke predicates.
## The Problem
gen-owl "Ambiguous type" warnings occur when a slot is used as both:
- **DatatypeProperty** (base range: `string`, `integer`, `uri`, etc.)
- **ObjectProperty** (slot_usage override range: a class like `Description`, `SubtitleFormatEnum`)
This creates OWL ambiguity because OWL requires properties to be either DatatypeProperty OR ObjectProperty, not both.
## ❌ WRONG Approach: Create Bespoke Predicates
```yaml
# DON'T DO THIS - creates proliferation of rare-use predicates
slots:
  has_or_had_subtitle_format: # Only used by VideoSubtitle
    range: SubtitleFormatEnum
  has_or_had_transcript_format: # Only used by VideoTranscript
    range: TranscriptFormat
```
**Why This Is Wrong**:
- Creates **predicate proliferation** (schema bloat)
- Bespoke predicates are **rarely reused** across classes
- **Increases cognitive load** for schema users
- **Fragments the ontology** unnecessarily
- Violates the principle of schema parsimony
## ✅ CORRECT Approach: Broaden Generic Predicate Ranges
```yaml
# DO THIS - make the generic predicate flexible enough
slots:
  has_or_had_format:
    range: uriorcurie # Broadened from string
    description: |
      The format of a resource. Classes narrow this to specific
      enum types (SubtitleFormatEnum, TranscriptFormatEnum) via slot_usage.
```
Then in class files, use `slot_usage` to narrow the range:
```yaml
classes:
  VideoSubtitle:
    slots:
      - has_or_had_format
    slot_usage:
      has_or_had_format:
        range: SubtitleFormatEnum # Narrowed for this class
        required: true
```
## Range Broadening Options
| Original Range | Broadened Range | When to Use |
|----------------|-----------------|-------------|
| `string` | `uriorcurie` | When class overrides use URI-identified types or enums |
| `string` | `Any` | When truly polymorphic (strings AND class instances) |
| Specific class | Common base class | When multiple subclasses are used |
## Decision Tree
```
gen-owl warning: "Ambiguous type for: SLOTNAME"
Is base slot range a primitive (string, integer, uri)?
├─ YES → Broaden to uriorcurie or Any
│    - Edit modules/slots/SLOTNAME.yaml
│    - Change range: string → range: uriorcurie
│    - Document change with Rule 54 reference
│    - Keep class-level slot_usage overrides (they narrow the range)
└─ NO → Consider if base slot needs common ancestor class
     - Create abstract base class if needed
     - Or broaden to uriorcurie
```
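The decision tree and the broadening table can be expressed as a small function. This is a minimal sketch, not an official tool: `suggest_broadened_range`, `PRIMITIVE_RANGES`, and the `common_base_class` parameter are hypothetical names, and the primitive set lists only the three types named in the tree.

```python
# Assumption: only the primitives named in the decision tree are listed here.
PRIMITIVE_RANGES = frozenset({"string", "integer", "uri"})


def suggest_broadened_range(base_range, override_ranges, common_base_class=None):
    """Suggest a base-slot range that avoids the Datatype/Object clash."""
    overrides_use_classes = any(r not in PRIMITIVE_RANGES for r in override_ranges)
    if base_range in PRIMITIVE_RANGES:
        if not overrides_use_classes:
            return base_range  # no class overrides, so nothing is ambiguous
        overrides_use_primitives = any(r in PRIMITIVE_RANGES for r in override_ranges)
        # Truly polymorphic (strings AND class instances) -> Any, else uriorcurie.
        return "Any" if overrides_use_primitives else "uriorcurie"
    # Base range is already a class: prefer a shared ancestor if one is known.
    return common_base_class or "uriorcurie"


print(suggest_broadened_range("string", ["SubtitleFormatEnum"]))
```

The function only suggests a range; the class-level `slot_usage` overrides stay in place and continue to narrow it.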
## Implementation Workflow
1. **Identify warning**: `gen-owl ... 2>&1 | grep "Ambiguous type for:"`
2. **Check base slot range**:
```bash
cat modules/slots/SLOTNAME.yaml | grep -A5 "^slots:" | grep "range:"
```
3. **Find class overrides**:
```bash
for f in modules/classes/*.yaml; do
  grep -l "SLOTNAME" "$f" && grep -A3 "SLOTNAME:" "$f" | grep "range:"
done
```
4. **Broaden base range**:
- Edit `modules/slots/SLOTNAME.yaml`
- Change `range: string` → `range: uriorcurie`
- Add annotation documenting the change
5. **Verify fix**: Run gen-owl and confirm warning is gone
6. **Keep slot_usage overrides**: Class-level range narrowing is fine and expected
## Examples
### Example 1: has_or_had_format
**Before (caused warning)**:
```yaml
# Base slot
slots:
  has_or_had_format:
    range: string # DatatypeProperty
# Class override
classes:
  VideoSubtitle:
    slot_usage:
      has_or_had_format:
        range: SubtitleFormatEnum # ObjectProperty → CONFLICT!
```
**After (fixed)**:
```yaml
# Base slot - broadened
slots:
  has_or_had_format:
    range: uriorcurie # Now ObjectProperty-compatible
# Class override - unchanged, still narrows
classes:
  VideoSubtitle:
    slot_usage:
      has_or_had_format:
        range: SubtitleFormatEnum # Valid narrowing
```
### Example 2: has_or_had_hypernym
**Before**: `range: string` (DatatypeProperty)
**After**: `range: uriorcurie` (ObjectProperty-compatible)
Classes that override to class ranges now work without ambiguity.
## Validation
After broadening, run:
```bash
gen-owl 01_custodian_name_modular.yaml 2>&1 | grep "Ambiguous type for: SLOTNAME"
```
The warning should disappear without creating new predicates.
## Anti-Patterns to Avoid
| ❌ Anti-Pattern | ✅ Correct Pattern |
|----------------|-------------------|
| Create `has_or_had_subtitle_format` | Broaden `has_or_had_format` to `uriorcurie` |
| Create `has_or_had_entity_type` | Broaden `has_or_had_type` to `uriorcurie` |
| Create `has_or_had_X_label` | Broaden `has_or_had_label` to `uriorcurie` |
| Create `has_or_had_X_status` | Broaden `has_or_had_status` to `uriorcurie` |
## Rationale
This approach:
1. **Reduces schema complexity** - Fewer predicates to understand
2. **Promotes reuse** - Generic predicates work across domains
3. **Maintains OWL consistency** - Single property type per predicate
4. **Preserves type safety** - slot_usage still enforces class-specific ranges
5. **Follows semantic web best practices** - Broad predicates, narrow contexts
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule: Slot Naming Convention (Current Style)
- Rule 49: Slot Usage Minimization
- LinkML Documentation: [slot_usage](https://linkml.io/linkml-model/latest/docs/slot_usage/)


@ -0,0 +1,61 @@
# Rule: Canonical Slot Protection
## Summary
When resolving slot aliases to canonical names, a slot name that has its own `.yaml` file (i.e., is itself a canonical slot) MUST NOT be replaced with a different canonical name, even if it also appears as an alias in another slot file.
## Context
Slot files in `schemas/20251121/linkml/modules/slots/` (top-level and `new/`) each define a canonical slot name. Some slot files also list aliases that overlap with canonical names from other slot files. These cross-references are accidental (e.g., indicating semantic relatedness) and should be corrected by removing the canonical names from the alias lists in which they occur. The occurrence of a canonical name in an alias list does NOT mean the referenced slot should be renamed.
## Rule
1. **Before renaming any slot reference** (in `slots:`, `slot_usage:`, or `imports:` of class files), check whether the current name is itself a canonical slot name — i.e., whether a `.yaml` file exists for it in the slots directory.
2. **If the name IS canonical** (has its own `.yaml` file), do NOT rename it and do NOT redirect its import. The class file is correctly referencing that slot's own definition file.
3. **Only rename a slot reference** if the name does NOT have its own `.yaml` file and is ONLY found as an alias in another slot's file.
## Examples
### WRONG
```yaml
# categorized_as.yaml defines aliases: [..., "has_type", ...]
# has_type.yaml exists with canonical name "has_type"
# WRONG: Renaming has_type -> categorized_as in a class file
# This destroys the valid reference to has_type.yaml
slots:
- categorized_as # was: has_type -- INCORRECT REPLACEMENT
```
### CORRECT
```yaml
# has_type.yaml exists => "has_type" is canonical => leave it alone
slots:
- has_type # CORRECT: has_type is canonical, keep it
# "custodian_type" does NOT have its own .yaml file
# "custodian_type" is listed as an alias in has_type.yaml
# => rename custodian_type -> has_type
slots:
- has_type # was: custodian_type -- CORRECT REPLACEMENT
```
## Implementation Check
```python
# Alias resolution check
def should_rename(slot_name, alias_map, existing_slot_files):
    """Return True only if slot_name is an alias without its own .yaml file."""
    if slot_name in existing_slot_files:
        return False  # It's canonical: do not rename
    if slot_name in alias_map:
        return True  # It's only an alias: rename to canonical
    return False  # Unknown: leave alone
```
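The same check can be applied when computing the name a class file should end up with. The following is a self-contained sketch using the slot names from the examples above; `resolve_slot_name` is a hypothetical helper, and `alias_map` is assumed to map each alias to the canonical name of the slot file that lists it.

```python
def resolve_slot_name(slot_name, alias_map, canonical_names):
    """Return the slot name a class file should reference."""
    if slot_name in canonical_names:
        return slot_name  # has its own .yaml file: protected from renaming
    # Pure alias: map to its canonical slot; unknown names are left alone.
    return alias_map.get(slot_name, slot_name)


# categorized_as.yaml accidentally lists has_type as an alias.
alias_map = {"has_type": "categorized_as", "custodian_type": "has_type"}
canonical_names = {"has_type", "categorized_as"}

print(resolve_slot_name("has_type", alias_map, canonical_names))
print(resolve_slot_name("custodian_type", alias_map, canonical_names))
```

Checking canonical names before consulting the alias map is the design choice that implements the protection: the accidental `has_type` entry in the alias map is never reached.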
## Rationale
Multiple slot files may list overlapping aliases by accident or for documentation or semantic linking purposes. A canonical slot name appearing as an alias in another file does not invalidate the original slot definition. Treating it as an alias would incorrectly redirect class files away from the slot's own definition, breaking the schema's intended structure.


@ -0,0 +1,48 @@
# Rule: Capitalization Consistency for LinkML Names
## Purpose
Ensure naming is consistent across LinkML classes, slots, enums, and their files,
with special care for acronyms (for example: `GLAM`, `GHC`, `GHCID`, `GLEIF`).
## Mandatory Requirements
1. **Class names**
   - Use `PascalCase`.
   - Preserve canonical acronym casing.
   - Example: `GHCIdentifier`, not `GhcidIdentifier`.
2. **Slot names**
   - Use project slot naming convention consistently.
   - If acronym appears in a slot, keep its canonical uppercase form.
   - Example: `has_GHCID_history` (if acronymed slot is required), not `has_ghcid_history`.
3. **Enum names**
   - Use `PascalCase` with `Enum` suffix where applicable.
   - Preserve acronym casing in enum identifiers and permissible values.
   - Example: `GLAMTypeEnum`.
4. **File names must match primary term exactly**
   - Class file name must match class name (case-sensitive) plus `.yaml`.
   - Enum file name must match enum name (case-sensitive) plus `.yaml`.
   - Slot file name must match slot name (case-sensitive) plus `.yaml`.
5. **No mixed acronym variants in same schema branch**
   - Do not mix forms like `Ghcid`, `GHCID`, and `ghcid` for the same concept.
   - Pick canonical form once and use it everywhere.
## Refactoring Rule
When normalizing capitalization:
- Update term declaration (`name`, class/slot/enum key).
- Update file name to match.
- Update all imports and references transitively.
- Do not leave aliases as operational identifiers; keep aliases only for lexical metadata.
## Validation Checklist
- [ ] Class, slot, enum declarations use canonical casing.
- [ ] File names exactly match declaration names.
- [ ] Acronyms are consistent across declarations and references.
- [ ] Imports and references resolve after renaming.
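
The file-name check in this list lends itself to automation. The following is a minimal sketch, assuming each module has been parsed into a dict with top-level `classes`, `slots`, or `enums` keys; `names_not_matching_file` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def names_not_matching_file(file_path: str, schema: dict) -> list[str]:
    """Return declared names if none equals the file stem (case-sensitive)."""
    stem = PurePosixPath(file_path).stem
    declared = [
        name
        for section in ("classes", "slots", "enums")
        for name in (schema.get(section) or {})
    ]
    # One exact, case-sensitive match is enough for the primary term.
    return [] if stem in declared else declared


print(names_not_matching_file(
    "modules/slots/has_ghcid_history.yaml",
    {"slots": {"has_GHCID_history": {}}},
))
```

A non-empty result lists the declarations the file name should have matched, which also surfaces acronym casing drift such as `ghcid` versus `GHCID`.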


@ -0,0 +1,228 @@
# Class Description Quality Rule
## Rule: Write Dictionary-Style Definitions Without Repeating the Class Name
When writing class descriptions, follow these principles.
### 1. No Repetition of Class Name Components
**WRONG:**
```yaml
AcademicArchiveRecordSetType:
  description: >-
    A classification type for archival record sets created by academic
    institutions. This class represents the record set type...
```
**CORRECT:**
```yaml
AcademicArchiveRecordSetType:
  description: >-
    Category for grouping documentary materials accumulated by tertiary
    educational institutions during their administrative, academic, and
    operational activities.
```
The description should define the concept using synonyms and related terms, not repeat words from the class name.
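
Candidate repetitions can be flagged automatically for review. The following is a rough sketch, not a definitive linter: `repeated_name_words` is a hypothetical helper, it splits the class name on PascalCase boundaries, and it only detects exact word matches, so inflected forms such as "archival" for `Archive` are not caught.

```python
import re


def repeated_name_words(class_name: str, description: str) -> list[str]:
    """Return class-name components that reappear verbatim in the description."""
    # Split PascalCase, keeping acronym runs such as "GHC" together.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", class_name)
    description_words = set(re.findall(r"[a-z]+", description.lower()))
    return [part for part in parts if part.lower() in description_words]


wrong = (
    "A classification type for archival record sets created by academic "
    "institutions. This class represents the record set type..."
)
print(repeated_name_words("AcademicArchiveRecordSetType", wrong))
```

Hits are prompts for a human decision rather than hard failures, since a component word is sometimes the only natural term available.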
### 2. MIGRATE Structured Data Before Removing from Descriptions
**CRITICAL**: When a description contains structured data (examples, typical contents, alignment notes, etc.), you MUST:
1. **First check** if the structured data already exists in proper LinkML fields
2. **If NOT present**: ADD it to the appropriate structured field
3. **ONLY THEN**: Remove it from the description
**Never simply delete structured content from descriptions without preserving it elsewhere.**
**MIGRATION CHECKLIST:**
| Content Type | Target Field | Example |
|--------------|--------------|---------|
| Example instances | `examples:` | `- value: {...} description: "..."` |
| Typical contents | `keywords:` or `comments:` | List of typical materials |
| Alignment explanations | `broad_mappings`, `related_mappings` | Ontology references |
| Usage notes | `comments:` | Operational guidance |
| Provenance notes | `comments:` or `annotations:` | Historical context |
| Privacy/legal notes | `comments:` | Access restrictions |
| Definition details | Keep in description | Core semantic definition |
**WRONG - Deleting without migration:**
```yaml
# BEFORE (has rich content)
description: |
  Records documenting student academic careers.
  **Typical Contents**:
  - Enrollment records
  - Academic transcripts
  - Graduation records
  Subject to privacy regulations (FERPA, GDPR).
# AFTER (lost information!) - DON'T DO THIS
description: >-
  Records documenting student academic careers.
```
**CORRECT - Migrate first, then clean:**
```yaml
# Step 1: Add to structured fields
description: >-
  Records documenting student academic careers.
keywords:
  - enrollment records
  - academic transcripts
  - graduation records
comments:
  - Subject to privacy regulations (FERPA, GDPR, AVG)
  - Access restrictions typically apply for records less than 75 years old
# Step 2: Now description is clean but no information lost
```
### 3. No Structured Data or Meta-Discussion in Descriptions
After migration, descriptions should contain only the definition. Do not include:
- Alignment explanations (use `broad_mappings`, `close_mappings`, `exact_mappings`)
- Pattern explanations (use `see_also`, `comments`)
- Usage examples (use `examples:` annotation)
- Rationale for mappings (use `comments:` or `annotations:`)
- Typical contents lists (use `keywords:` or `comments:`)
**WRONG:**
```yaml
description: >-
  A type for X.
  **RiC-O Alignment**: Maps to rico:RecordSetType because...
  **Pattern**: This is part of a dual-class pattern with Y.
  **Examples**: Administrative fonds, student records...
```
**CORRECT:**
```yaml
description: >-
  Category for grouping documentary materials accumulated by tertiary
  educational institutions.
broad_mappings:
  - rico:RecordSetType
see_also:
  - AcademicArchive
keywords:
  - administrative fonds
  - student records
examples:
  - value: {...}
    description: Administrative fonds containing governance records
```
### 4. Use Folded Block Scalar (`>-`) for Descriptions
Use `>-` (folded, strip) instead of `|` (literal) to ensure clean paragraph formatting in generated documentation.
**WRONG:**
```yaml
description: |
  A type for X.
  This spans multiple lines.
```
**CORRECT:**
```yaml
description: >-
  A type for X. This will be formatted as a single clean paragraph
  in the generated documentation.
```
### 5. Use LinkML `examples:` Annotation for Examples
Structure examples properly with `value:` and `description:` keys.
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: University Administrative Records
    description: Administrative fonds containing governance records
```
### 6. Keywords vs Examples - Know the Difference
**CRITICAL**: Do not confuse `keywords:` with `examples:`. They serve different purposes:
| Field | Purpose | Content Type |
|-------|---------|--------------|
| `keywords:` | Search terms, topics, categories | List of strings (topics/materials) |
| `examples:` | Valid instance data demonstrations | Structured objects with `value` and `description` |
**Keywords** = Topics, material types, categories that describe what the class is about:
```yaml
keywords:
- enrollment records # type of material
- academic transcripts # type of material
- graduation records # type of material
```
**Examples** = Actual instances of the class with populated slots:
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Registrar Student Records
      has_note: Enrollment, transcripts, graduation records
    description: Student records series from the registrar's office
```
**WRONG - Using keywords as examples:**
```yaml
# DON'T: "enrollment records" is not an instance of AcademicStudentRecordSeries
examples:
  - value: enrollment records
    description: Type of student record
```
**CORRECT - Keywords for topics, examples for instances:**
```yaml
keywords:
  - enrollment records
  - academic transcripts
  - graduation records
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Historical Student Records
      has_note: Pre-1950 student records with fewer access restrictions
    description: Historical student records open for research access
```
### 7. Multiple Examples for Different Use Cases
Provide multiple examples to show different contexts or configurations:
```yaml
examples:
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Recent Student Records
    description: Current records subject to privacy restrictions
  - value:
      has_type: hc:ArchiveOrganizationType
      has_label: Historical Student Records
    description: Records 75+ years old with fewer access restrictions
```
## Summary
| Element | Placement |
|---------|-----------|
| Definition | `description:` (concise, no repetition) |
| Ontology mappings | `exact_mappings`, `broad_mappings`, etc. |
| Related concepts | `see_also:` |
| Usage notes | `comments:` |
| Metadata | `annotations:` |
| Examples | `examples:` with `value` and `description` |
| Typical contents | `keywords:` or `comments:` |


@ -0,0 +1,54 @@
# Rule: Class File Name Must Match Class Label/Name
## 🚨 Critical
When a class label/name is changed, the class file name must be renamed to match.
This keeps class modules discoverable, prevents stale imports, and avoids long-term naming drift.
## The Rule
1. If the primary class identifier changes, rename the file in the same edit set.
   - Change triggers include updates to:
     - top-level `name:`
     - class key under `classes:`
     - canonical class label used for module naming
2. File naming must reflect the canonical class name.
   - ✅ `DigitalPlatformProfile.yaml` for class `DigitalPlatformProfile`
   - ❌ `DigitalPlatformV2.yaml` for class `DigitalPlatformProfile`
3. After renaming a file, update all references.
   - `imports:` in other class/slot/type files
   - manifests/indexes/build inputs
   - any generated or curated mapping lists that include file paths
4. Keep semantic names versionless.
   - Do not preserve old versioned file names when class names are de-versioned.
   - Coordinate with `no-version-indicators-in-names-rule.md`.
## Required Checklist
- [ ] File name matches canonical class name
- [ ] `id:` and `name:` are internally consistent
- [ ] All import paths updated
- [ ] Search confirms no stale old file-name references remain
- [ ] YAML parses after rename
## Example
Before:
```yaml
# file: DigitalPlatformV2.yaml
name: DigitalPlatformProfile
classes:
  DigitalPlatformProfile:
```
After:
```yaml
# file: DigitalPlatformProfile.yaml
name: DigitalPlatformProfile
classes:
  DigitalPlatformProfile:
```


@ -0,0 +1,133 @@
# Rule 48: Class Files Must Not Define Inline Slots
🚨 **CRITICAL**: LinkML class files in `schemas/20251121/linkml/modules/classes/` MUST NOT define their own slots inline. All slots MUST be imported from the centralized `modules/slots/` directory.
## Problem Statement
When class files define their own slots (e.g., `AccessRestriction.yaml` defining its own slot properties), this creates:
1. **Duplication**: Same slot semantics defined in multiple places
2. **Inconsistency**: Slot definitions may diverge between files
3. **Frontend Issues**: LinkML viewer cannot properly render slot relationships
4. **Maintenance Burden**: Changes require updates in multiple locations
## Architecture Requirement
```
schemas/20251121/linkml/
├── modules/
│   ├── classes/   # Class definitions ONLY
│   │   └── *.yaml # NO inline slot definitions
│   ├── slots/     # ALL slot definitions go here
│   │   └── *.yaml # One file per slot or logical group
│   └── enums/     # Enumeration definitions
```
## Correct Pattern
**Class file** (`modules/classes/AccessRestriction.yaml`):
```yaml
id: https://nde.nl/ontology/hc/class/AccessRestriction
name: AccessRestriction
prefixes:
  hc: https://nde.nl/ontology/hc/
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
  - ../slots/restriction_type # Import slot from centralized location
  - ../slots/restriction_reason
  - ../slots/applies_from
  - ../slots/applies_until
default_range: string
classes:
  AccessRestriction:
    class_uri: hc:AccessRestriction
    description: >-
      Describes access restrictions on heritage collections or items.
    slots:
      - restriction_type # Reference slot by name
      - restriction_reason
      - applies_from
      - applies_until
```
**Slot file** (`modules/slots/restriction_type.yaml`):
```yaml
id: https://nde.nl/ontology/hc/slot/restriction_type
name: restriction_type
prefixes:
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
slots:
  restriction_type:
    slot_uri: hc:restrictionType
    description: The type of access restriction applied.
    range: string
    exact_mappings:
      - schema:accessMode
```
## Anti-Pattern (WRONG)
**DO NOT** define slots inline in class files:
```yaml
# WRONG - AccessRestriction.yaml with inline slots
classes:
  AccessRestriction:
    slots:
      - restriction_type
slots: # ❌ DO NOT define slots here
  restriction_type:
    description: Type of restriction
    range: string
```
## Identifying Violations
To find class files that incorrectly define slots:
```bash
# Find class files with inline slot definitions
grep -l "^slots:" schemas/20251121/linkml/modules/classes/*.yaml
```
Files that match need refactoring:
1. Extract slot definitions to `modules/slots/`
2. Add imports for the extracted slots
3. Remove inline `slots:` section from class file
## Migration Workflow
1. **Identify inline slots** in class file
2. **Check if slot exists** in `modules/slots/`
3. **If exists**: Remove inline definition, add import
4. **If not exists**: Create new slot file in `modules/slots/`, then add import
5. **Validate**: Run `linkml-validate` to ensure schema integrity
6. **Update manifest**: Regenerate `manifest.json` if needed
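
Steps 1 to 4 can be sketched as a planning function. This is an illustration under stated assumptions: the class module is already parsed into a dict, inline slot definitions live under a top-level `slots:` key, and `plan_slot_migration` and its `existing_slot_files` argument (the set of slot names that already have a file in `modules/slots/`) are hypothetical names.

```python
def plan_slot_migration(schema: dict, existing_slot_files: set) -> dict:
    """Map each inline slot definition to the action the workflow prescribes."""
    plan = {}
    for slot_name in sorted(schema.get("slots") or {}):
        if slot_name in existing_slot_files:
            plan[slot_name] = "remove inline definition, add import"
        else:
            plan[slot_name] = "create slot file, then add import"
    return plan


class_module = {
    "classes": {"AccessRestriction": {"slots": ["restriction_type", "applies_from"]}},
    "slots": {"restriction_type": {"range": "string"}, "applies_from": {"range": "string"}},
}
print(plan_slot_migration(class_module, {"restriction_type"}))
```

The plan only covers the decision; validation with `linkml-validate` and the manifest update remain manual steps.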
## Rationale
- **Single Source of Truth**: Each slot defined exactly once
- **Reusability**: Slots can be used across multiple classes
- **Frontend Compatibility**: LinkML viewer depends on centralized slots for proper edge rendering in UML diagrams
- **Semantic Consistency**: `slot_uri` and mappings defined once, applied everywhere
- **Maintenance**: Changes to slot semantics applied in one place
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule: Slot Naming Convention (Current Style)
- Rule 42: No Ontology Prefixes in Slot Names
- Rule 43: Slot Nouns Must Be Singular


@ -0,0 +1,158 @@
# Class Multilingual Support Rule
## Rule: All Class Files Must Include Multilingual Descriptions and Aliases
Every class file must provide `alt_descriptions` and `structured_aliases` in all supported languages to ensure internationalization and interoperability with multilingual heritage systems.
### Required Languages
| Code | Language |
|------|----------|
| `nl` | Dutch |
| `de` | German |
| `fr` | French |
| `es` | Spanish |
| `ar` | Arabic |
| `id` | Indonesian |
| `zh` | Chinese |
### Structure
#### alt_descriptions
Provide translated descriptions for each supported language:
```yaml
classes:
  AcademicArchiveRecordSetType:
    description: >-
      Category for grouping documentary materials accumulated by tertiary
      educational institutions during their administrative, academic, and
      operational activities.
    alt_descriptions:
      nl: >-
        Categorie voor het groeperen van documentair materiaal dat door
        hogeronderwijsinstellingen is verzameld tijdens hun administratieve,
        academische en operationele activiteiten.
      de: >-
        Kategorie zur Gruppierung von Dokumentenmaterial, das von Hochschulen
        während ihrer administrativen, akademischen und betrieblichen Aktivitäten
        angesammelt wurde.
      fr: >-
        Catégorie de regroupement des documents accumulés par les établissements
        d'enseignement supérieur au cours de leurs activités administratives,
        académiques et opérationnelles.
      es: >-
        Categoría para agrupar materiales documentales acumulados por instituciones
        de educación superior durante sus actividades administrativas, académicas
        y operativas.
      ar: >-
        فئة لتجميع المواد الوثائقية التي جمعتها مؤسسات التعليم العالي
        خلال أنشطتها الإدارية والأكاديمية والتشغيلية.
      id: >-
        Kategori untuk mengelompokkan materi dokumenter yang dikumpulkan oleh
        institusi pendidikan tinggi selama aktivitas administratif, akademik,
        dan operasional mereka.
      zh: >-
        高等教育机构在行政、学术和运营活动中积累的文献材料的分类类别。
```
#### structured_aliases
Provide language-specific aliases/alternative names:
```yaml
classes:
  AcademicArchiveRecordSetType:
    structured_aliases:
      - literal_form: academisch archiefbestand
        in_language: nl
      - literal_form: Hochschularchivbestand
        in_language: de
      - literal_form: fonds d'archives académiques
        in_language: fr
      - literal_form: fondo de archivo académico
        in_language: es
      - literal_form: أرشيف أكاديمي
        in_language: ar
      - literal_form: koleksi arsip akademik
        in_language: id
      - literal_form: 学术档案集
        in_language: zh
```
### Complete Example
```yaml
id: https://nde.nl/ontology/hc/class/AcademicArchiveRecordSetType
name: AcademicArchiveRecordSetType
title: Academic Archive Record Set Type
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
  - linkml:types
  - ../classes/CollectionType
classes:
  AcademicArchiveRecordSetType:
    description: >-
      Category for grouping documentary materials accumulated by tertiary
      educational institutions during their administrative, academic, and
      operational activities.
    alt_descriptions:
      nl: >-
        Categorie voor het groeperen van documentair materiaal dat door
        hogeronderwijsinstellingen is verzameld.
      de: >-
        Kategorie zur Gruppierung von Dokumentenmaterial, das von Hochschulen
        angesammelt wurde.
      fr: >-
        Catégorie de regroupement des documents accumulés par les établissements
        d'enseignement supérieur.
      es: >-
        Categoría para agrupar materiales documentales acumulados por instituciones
        de educación superior.
      ar: >-
        فئة لتجميع المواد الوثائقية التي جمعتها مؤسسات التعليم العالي.
      id: >-
        Kategori untuk mengelompokkan materi dokumenter yang dikumpulkan oleh
        institusi pendidikan tinggi.
      zh: >-
        高等教育机构积累的文献材料的分类类别。
    structured_aliases:
      - literal_form: academisch archiefbestand
        in_language: nl
      - literal_form: Hochschularchivbestand
        in_language: de
      - literal_form: fonds d'archives académiques
        in_language: fr
      - literal_form: fondo de archivo académico
        in_language: es
      - literal_form: أرشيف أكاديمي
        in_language: ar
      - literal_form: koleksi arsip akademik
        in_language: id
      - literal_form: 学术档案集
        in_language: zh
    is_a: CollectionType
    # ... rest of class definition
```
### Translation Guidelines
1. **Accuracy over literal translation**: Translate the concept, not word-by-word
2. **Use domain-appropriate terminology**: Use archival/library/museum terminology standard in each language
3. **Consult existing vocabularies**: Reference RiC-O, ISAD(G), AAT translations when available
4. **Maintain consistency**: Same term should be translated consistently across all class files
### Checklist
For each class file, verify:
- [ ] `alt_descriptions` present with all 7 languages
- [ ] `structured_aliases` present with all 7 languages
- [ ] Translations are accurate and domain-appropriate
- [ ] Arabic text is properly encoded (RTL)
- [ ] Chinese uses simplified characters (zh) unless traditional specified (zh-hant)
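
The first two checklist items can be verified per class. The following is a minimal sketch, assuming the class definition is already parsed into a dict shaped like the examples above (`alt_descriptions` keyed by language code, `structured_aliases` as a list of entries with `in_language`); `missing_languages` and `REQUIRED_LANGUAGES` are hypothetical names.

```python
REQUIRED_LANGUAGES = ("nl", "de", "fr", "es", "ar", "id", "zh")


def missing_languages(class_def: dict) -> dict[str, list[str]]:
    """List the required languages absent from each multilingual field."""
    described = set(class_def.get("alt_descriptions") or {})
    aliased = {
        alias.get("in_language")
        for alias in class_def.get("structured_aliases") or []
    }
    return {
        "alt_descriptions": [lang for lang in REQUIRED_LANGUAGES if lang not in described],
        "structured_aliases": [lang for lang in REQUIRED_LANGUAGES if lang not in aliased],
    }


partial = {
    "alt_descriptions": {"nl": "...", "de": "..."},
    "structured_aliases": [{"literal_form": "academisch archiefbestand", "in_language": "nl"}],
}
print(missing_languages(partial))
```

Translation accuracy and script correctness still need human review; the sketch only checks coverage.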


@ -0,0 +1,65 @@
# Rule: Engineering Parsimony and Domain Modeling
## Critical Convention
Our ontology follows an engineering-oriented approach: practical domain utility and
stable interoperability take priority over minimal, tool-specific class catalogs.
## Rule
1. Model domain concepts, not implementation tools.
   - Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`.
2. Prefer generic, reusable activity/entity classes for operational provenance.
   - Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`.
3. Capture tool/vendor details in slot values, not class names.
   - Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`.
4. Digital platforms acting as custodians are valid domain classes.
   - Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed.
   - Data processing/search tools are not ontology class candidates.
5. Avoid ontology growth driven by transient engineering stack choices.
   - New class proposals must be justified by cross-tool, domain-stable semantics.
## Rationale
- Tool names are volatile implementation details and age quickly.
- Domain-level abstractions maximize reuse, query consistency, and mapping stability.
- This follows engineering-ontology practice, in which strict theoretical
  parsimony in candidate theories is not the only optimization criterion;
  practical semantic interoperability and maintainability come first.
## Examples
### Wrong
```yaml
classes:
  ExaSearchMetadata:
    class_uri: prov:Activity
```
### Correct
```yaml
classes:
  ExternalSearchMetadata:
    class_uri: prov:Activity
    slots:
      - has_tool
      - has_method
      - has_agent
```
```
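
A rough screening step for rule 1 could look like the sketch below. It assumes a project-maintained denylist of tool/vendor tokens (`TOOL_TOKENS` here is hypothetical, seeded with the three names from the rejected examples) and CamelCase class names; it is a heuristic aid, not a substitute for review.

```python
import re

# Hypothetical denylist; a real project would maintain its own.
TOOL_TOKENS = ("Exa", "OpenAI", "Elasticsearch")


def tool_specific_classes(class_names, tokens=TOOL_TOKENS):
    """Return class names containing a tool token as a whole CamelCase word."""
    flagged = []
    for name in class_names:
        for token in tokens:
            # The token must not be followed by a lowercase letter, so that
            # "Exa" does not match inside "ExactMatch".
            if re.search(re.escape(token) + r"(?![a-z])", name):
                flagged.append(name)
                break
    return flagged


candidates = [
    "ExaSearchMetadata",
    "OpenAIFetchResult",
    "ElasticsearchHit",
    "ExternalSearchMetadata",
    "ExactMatch",
]
print(tool_specific_classes(candidates))
```

Flagged names are candidates for renaming to a generic class, with the tool recorded in a slot such as `has_tool`.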
## References
1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`.
URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764
2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`.
URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA


@ -0,0 +1,37 @@
# Exact Mapping Predicate/Class Distinction Rule
🚨 **CRITICAL**: The `exact_mappings` property implies semantic equivalence. Equivalence can only exist between elements of the same ontological category.
## The Rule
1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties).
   * ❌ INVALID: Slot `analyze` maps to `schema:Thing` (a Class).
   * ✅ VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property).
2. **Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities).
   * ❌ INVALID: Class `Person` maps to `foaf:name` (a Property).
   * ✅ VALID: Class `Person` maps to `foaf:Person` (a Class).
3. **When true equivalence exists and is verified, exact mapping is preferred.**
   * ✅ VALID: Class `Acquisition` maps to `crm:E8_Acquisition`.
   * ✅ VALID: Slot mapped to an actually equivalent ontology property.
   * ❗ Do not avoid `exact_mappings` by default; avoid them only when the scope is broader, narrower, or similar but not equal.
## Rationale
Mapping a slot (which defines a relationship or attribute) to a class (which defines a type of entity) is a category error. `schema:Thing` denotes a *type* of entity, not the *relationship* of "having an object" or "analyzing an object".
## Verification Checklist
When adding or reviewing `exact_mappings`:
- [ ] Is the LinkML element a Class or a Slot?
- [ ] Did you verify the target term type in the ontology definition files (do not rely on naming heuristics)?
- [ ] Do they match? (Class↔Class, Slot↔Property)
- [ ] If the target ontology uses opaque IDs (like CIDOC-CRM `E55_Type`), verify the type definition in the ontology file.
- [ ] If semantic scope is truly equivalent, use `exact_mappings` (not `close`/`broad` as a conservative fallback).
## Common Pitfalls to Fix
- Mapping slots to classes such as `schema:Thing`.
- Mapping slots to `skos:Concept`.
- Mapping classes to `schema:name` or `dc:title`.
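
The category check can be sketched as a lookup against term kinds. In the snippet below, `TERM_KINDS` is a hypothetical hand-filled table standing in for what would normally be read from the ontology definition files, and `check_exact_mapping` is an illustrative helper name.

```python
# Hypothetical lookup table; in practice, derive this from the ontology files.
TERM_KINDS = {
    "foaf:Person": "class",
    "foaf:name": "property",
    "crm:E8_Acquisition": "class",
    "crm:P129_is_about": "property",
}


def check_exact_mapping(element_kind, target, term_kinds=TERM_KINDS):
    """Classify an exact mapping as 'ok', 'category-error', or 'unverified'.

    element_kind is 'class' or 'slot' (the LinkML side of the mapping).
    """
    expected = "class" if element_kind == "class" else "property"
    actual = term_kinds.get(target)
    if actual is None:
        # Unknown targets are never guessed from their names.
        return "unverified"
    return "ok" if actual == expected else "category-error"


print(check_exact_mapping("class", "foaf:name"))
print(check_exact_mapping("slot", "crm:P129_is_about"))
```

Returning `unverified` for unknown terms keeps the check from relying on naming heuristics, in line with the checklist above.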


@ -0,0 +1,144 @@
# Rule 58: Feedback vs Revision Distinction in slot_fixes.yaml
## Summary
The `feedback` and `revision` fields in `slot_fixes.yaml` serve distinct purposes and MUST NOT be conflated or renamed.
## Field Definitions
### `revision` Field
- **Purpose**: Defines WHAT the migration target is
- **Content**: List of slots and classes to create
- **Authority**: IMMUTABLE (per Rule 57)
- **Format**: Structured YAML list with `label`, `type`, optional `link_branch`
### `feedback` Field
- **Purpose**: Contains user instructions on HOW the revision needs to be applied or corrected
- **Content**: Can be string or structured format
- **Authority**: User directives that override previous `notes`
- **Action Required**: Agent must interpret and act upon feedback
## Feedback Formats
### Format 1: Structured (with `done` field)
```yaml
feedback:
  - timestamp: '2026-01-17T00:01:57Z'
    user: Simon C. Kemper
    done: false # Becomes true after agent processes
    comment: |
      The migration should use X instead of Y.
    response: "" # Agent fills this after completing
```
### Format 2: String (direct instruction)
```yaml
feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
```
Or:
```yaml
feedback: I altered the revision based on this feedback. Conduct this new migration accordingly.
```
## Interpretation Rules
| Feedback Contains | Meaning | Action Required |
|-------------------|---------|-----------------|
| "I reject this" | Previous `notes` were WRONG | Follow `revision` field instead |
| "I altered the revision" | User updated `revision` | Execute migration per NEW revision |
| "Conduct the migration" | Migration not yet done | Execute migration now |
| "Please conduct accordingly" | Migration pending | Execute migration now |
| "ADDRESSED" or `done: true` | Already processed | No action needed |
## Decision Tree
```
Is feedback field present?
├─ NO → Check `processed.status`
│   ├─ true → Migration complete
│   └─ false → Execute revision
└─ YES → What format?
    ├─ Structured with `done: true` → No action needed
    ├─ Structured with `done: false` → Process feedback, then set done: true
    └─ String format → Parse for keywords:
        ├─ "reject" → Previous notes invalid, follow revision
        ├─ "altered/adjusted revision" → Execute NEW revision
        ├─ "conduct/please" → Migration pending, execute now
        └─ "ADDRESSED" → Already done, no action
```
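
The decision tree can be sketched as a small function. This is an illustration under stated assumptions: each `slot_fixes.yaml` entry is already loaded as a dict, the returned action labels are invented for this example, and checking `ADDRESSED` before the other keywords is a choice made here rather than something the rule prescribes.

```python
def feedback_action(entry):
    """Map one slot_fixes.yaml entry to the action named in the decision tree."""
    feedback = entry.get("feedback")
    if feedback is None:
        status = entry.get("processed", {}).get("status", False)
        return "migration_complete" if status else "execute_revision"
    if isinstance(feedback, list):
        # Structured format: any item still marked done: false needs processing.
        if any(not item.get("done", False) for item in feedback):
            return "process_feedback"
        return "no_action"
    text = str(feedback)
    if "ADDRESSED" in text:
        return "no_action"
    lowered = text.lower()
    if "reject" in lowered:
        return "follow_revision"
    if "altered" in lowered or "adjusted" in lowered:
        return "execute_new_revision"
    if "conduct" in lowered or "please" in lowered:
        return "execute_migration"
    # Anything the keywords do not cover is left to a human.
    return "needs_human_review"


print(feedback_action({"feedback": "I reject this! Use has_or_had_identifier"}))
```

Keyword matching of this kind is brittle, so unmatched strings fall through to `needs_human_review` instead of being treated as completed.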
## Anti-Patterns
### WRONG: Renaming feedback to revision
```yaml
# DO NOT DO THIS
# feedback contains instructions, not migration specs
revision: # Was: feedback
- I reject this! Use has_or_had_identifier
```
### WRONG: Ignoring string feedback
```yaml
feedback: Please conduct the migration accordingly.
notes: "NO MIGRATION NEEDED" # WRONG - feedback overrides notes
```
### WRONG: Treating all feedback as completed
```yaml
feedback: I altered the revision. Conduct this new migration.
processed:
  status: true # WRONG if migration not actually done
```
## Correct Workflow
1. **Read feedback** - Understand user instruction
2. **Check revision** - This defines the target migration
3. **Execute migration** - Create/update slots and classes per revision
4. **Update processed.status** - Set to `true`
5. **Add response** - Document what was done
   - For structured feedback: Set `done: true` and fill `response`
   - For string feedback: Add new structured feedback entry confirming completion
## Example: Processing String Feedback
Before:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
  feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
  revision:
    - label: has_or_had_identifier
      type: slot
    - label: Identifier
      type: class
  processed:
    status: false
    notes: "Previously marked as no migration needed"
```
After processing:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
  feedback:
    - timestamp: '2026-01-17T12:00:00Z'
      user: System
      done: true
      comment: "Original string feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier"
      response: "Migration completed. type_id.yaml archived, consuming classes updated to use has_or_had_identifier slot with Identifier range."
  revision:
    - label: has_or_had_identifier
      type: slot
    - label: Identifier
      type: class
  processed:
    status: true
    notes: "Migration completed per user feedback rejecting previous notes."
```
## See Also
- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE
- **Rule 39**: Slot Naming Convention (RiC-O Style)


@ -0,0 +1,373 @@
# Rule 53: Full Slot Migration - No Deprecation Notes
🚨 **CRITICAL**: When migrating slots from `slot_fixes.yaml`:
1. **Follow the `revision` section EXACTLY** - The `slot_fixes.yaml` file specifies the exact replacement slots and classes to use
2. **Perform FULL MIGRATION** - Completely remove the deprecated slot from the entity class
3. **Do NOT add deprecation notes** - Never keep both old and new slots with deprecation markers
---
## 🚨 slot_fixes.yaml is AUTHORITATIVE AND CURATED 🚨
**File Location**: `schemas/20251121/linkml/modules/slots/slot_fixes.yaml`
**THIS FILE IS THE SINGLE SOURCE OF TRUTH FOR ALL SLOT MIGRATIONS.**
The `slot_fixes.yaml` file has been **manually curated** to specify the exact replacement slots and classes for each deprecated slot. The revisions are based on:
1. **Ontology analysis** - Each replacement was chosen based on alignment with base ontologies (CIDOC-CRM, RiC-O, PROV-O, Schema.org, etc.)
2. **Semantic correctness** - Revisions reflect the intended meaning of the original slot
3. **Pattern consistency** - Follows established naming conventions (Rule 39: RiC-O style, Rule 43: singular nouns)
4. **Class hierarchy design** - Type/Types pattern (Rule 0b) applied where appropriate
**YOU MUST NOT**:
- ❌ Substitute different slots than those specified in `revision`
- ❌ Use your own judgment to pick "similar" slots
- ❌ Skip the revision and invent new mappings
- ❌ Partially apply the revision (e.g., use the slot but not the class)
**YOU MUST**:
- ✅ Follow the `revision` section TO THE LETTER
- ✅ Use EXACTLY the slots and classes specified
- ✅ Apply ALL components of the revision (both slots AND classes)
- ✅ Interpret `link_branch` fields correctly (see below)
- ✅ Update `processed.status: true` after completing migration
---
## Understanding `link_branch` in Revision Plans
🚨 **CRITICAL**: The `link_branch` field in revision plans indicates **nested class attributes**. Items with `link_branch: N` are slots/classes that belong TO the primary class, not standalone replacements.
### How to Interpret `link_branch`
| Revision Item | Meaning |
|---------------|---------|
| Items **WITHOUT** `link_branch` | **PRIMARY** slot and class to create |
| Items **WITH** `link_branch: 1` | First attribute branch that the primary class needs |
| Items **WITH** `link_branch: 2` | Second attribute branch that the primary class needs |
| Items **WITH** `link_branch: N` | Nth attribute branch for the primary class |
### Example: `visitor_count` Revision
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_count
  revision:
    - label: has_or_had_quantity # PRIMARY SLOT (no link_branch)
      type: slot
    - label: Quantity # PRIMARY CLASS (no link_branch)
      type: class
    - label: has_or_had_measurement_unit # Quantity needs this slot
      type: slot
      link_branch: 1 # ← Branch 1: unit attribute
    - label: MeasureUnit # Range of has_or_had_measurement_unit
      type: class
      value:
        - visitors
      link_branch: 1
    - label: temporal_extent # Quantity needs this slot too
      type: slot
      link_branch: 2 # ← Branch 2: time attribute
    - label: TimeSpan # Range of temporal_extent
      type: class
      link_branch: 2
```
**Interpretation**: This creates:
1. **Primary**: `has_or_had_quantity` slot → `Quantity` class
2. **Branch 1**: `Quantity.has_or_had_measurement_unit` → `MeasureUnit` (with value "visitors")
3. **Branch 2**: `Quantity.temporal_extent` → `TimeSpan`
### Resulting Class Structure
```yaml
# The Quantity class should have these slots:
Quantity:
  slots:
    - has_or_had_measurement_unit # From link_branch: 1
    - temporal_extent # From link_branch: 2
```
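
The interpretation above can be sketched in a few lines. The helper names `split_revision` and `primary_class_slots` are hypothetical, and the sketch assumes that the first slot listed in each branch is the one attached to the primary class (later slots in the same branch may belong to the branch class instead).

```python
def split_revision(revision):
    """Separate primary items from link_branch items, grouped by branch number."""
    primary, branches = [], {}
    for item in revision:
        if "link_branch" in item:
            branches.setdefault(item["link_branch"], []).append(item)
        else:
            primary.append(item)
    return primary, branches


def primary_class_slots(revision):
    """Return the slots the primary class needs: the first slot of each branch."""
    _, branches = split_revision(revision)
    slots = []
    for number in sorted(branches):
        branch_slots = [i["label"] for i in branches[number] if i["type"] == "slot"]
        if branch_slots:
            slots.append(branch_slots[0])
    return slots


# The visitor_count revision from the example above, as Python data.
visitor_count = [
    {"label": "has_or_had_quantity", "type": "slot"},
    {"label": "Quantity", "type": "class"},
    {"label": "has_or_had_measurement_unit", "type": "slot", "link_branch": 1},
    {"label": "MeasureUnit", "type": "class", "value": ["visitors"], "link_branch": 1},
    {"label": "temporal_extent", "type": "slot", "link_branch": 2},
    {"label": "TimeSpan", "type": "class", "link_branch": 2},
]
print(primary_class_slots(visitor_count))
```

For `visitor_count` this yields the two slots shown in the resulting `Quantity` class structure.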
### Complex Example: `visitor_conversion_rate`
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_conversion_rate
  revision:
    - label: has_or_had_conversion_rate # PRIMARY SLOT
      type: slot
    - label: ConversionRate # PRIMARY CLASS
      type: class
    - label: has_or_had_type # ConversionRate.has_or_had_type
      type: slot
      link_branch: 1
    - label: ConversionRateType # Abstract type class
      type: class
      link_branch: 1
    - label: includes_or_included # ConversionRateType hierarchy slot
      type: slot
      link_branch: 1
    - label: ConversionRateTypes # Concrete subclasses file
      type: class
      link_branch: 1
    - label: temporal_extent # ConversionRate.temporal_extent
      type: slot
      link_branch: 2
    - label: TimeSpan # Range of temporal_extent
      type: class
      link_branch: 2
```
**Interpretation**:
1. **Primary**: `has_or_had_conversion_rate` → `ConversionRate`
2. **Branch 1**: Type hierarchy with `ConversionRateType` (abstract) + `ConversionRateTypes` (concrete subclasses)
3. **Branch 2**: Temporal tracking via `temporal_extent` → `TimeSpan`
### Migration Checklist for `link_branch` Revisions
- [ ] Create/verify PRIMARY slot exists
- [ ] Create/verify PRIMARY class exists
- [ ] For EACH `link_branch: N`:
  - [ ] Add the branch slot to PRIMARY class's `slots:` list
  - [ ] Import the branch slot file
  - [ ] Import the branch class file (if creating new class)
  - [ ] Verify range of branch slot points to branch class
- [ ] Update consuming class to use PRIMARY slot (not deprecated slot)
- [ ] Update examples to show nested structure
---
## Mandatory: Follow slot_fixes.yaml Revisions Exactly
**The `revision` section in `slot_fixes.yaml` is AUTHORITATIVE.** Do not substitute different slots based on your own judgment.
**Example from slot_fixes.yaml**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  revision:
    - label: begin_of_the_begin # ← USE THIS SLOT
      type: slot
    - label: TimeSpan # ← USE THIS CLASS
      type: class
```
**CORRECT**: Use `begin_of_the_begin` slot (as specified)
**WRONG**: Substitute `has_actual_start_date` (not in revision)
## The Problem
Adding deprecation notes while keeping both old and new slots:
- Creates schema bloat with redundant properties
- Confuses data consumers about which slot to use
- Violates single-source-of-truth principle
- Complicates future data validation
## Anti-Pattern (WRONG)
```yaml
# WRONG - Keeping deprecated slot with deprecation note
classes:
  TemporaryLocation:
    slots:
      - actual_start # OLD - kept with deprecation note
      - actual_end # OLD - kept with deprecation note
      - has_actual_start_date # NEW
      - has_actual_end_date # NEW
    slot_usage:
      actual_start:
        deprecated: |
          DEPRECATED: Use has_actual_start_date instead.
        # ... more deprecation documentation
```
## Correct Pattern
```yaml
# CORRECT - Only new slots, old slots completely removed
classes:
  TemporaryLocation:
    slots:
      - has_actual_start_date # NEW - only new slots present
      - has_actual_end_date # NEW
    # NO slot_usage for deprecated slots - they don't exist in this class
```
## Migration Steps
When processing a slot from `slot_fixes.yaml`:
1. **Identify affected entity class(es)**
2. **Remove old slot from imports** (if dedicated import file exists)
3. **Remove old slot from slots list**
4. **Remove any slot_usage for old slot**
5. **Add new slot import** (if not already present)
6. **Add new slot to slots list**
7. **Add slot_usage for new slot** (if range override or customization needed)
8. **Update examples** to use new slot
9. **Validate with gen-owl**
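
Steps 2 through 6 can be sketched as a pure function over a loaded class file. This is a minimal illustration, assuming the class file is a dict with flat `imports`, `slots`, and `slot_usage` keys; `migrate_slot` and the `has_label` slot are hypothetical names, and updating examples and running `gen-owl` (steps 8 and 9) remain manual.

```python
def migrate_slot(class_file, old_slot, new_slots, slot_dir="../slots/"):
    """Return a copy of class_file with old_slot fully removed and new_slots added."""
    imports = [i for i in class_file.get("imports", []) if i != slot_dir + old_slot]
    slots = [s for s in class_file.get("slots", []) if s != old_slot]
    for new in new_slots:
        if slot_dir + new not in imports:
            imports.append(slot_dir + new)
        if new not in slots:
            slots.append(new)
    # No deprecation note is kept: slot_usage for the old slot is dropped entirely.
    slot_usage = {
        name: usage
        for name, usage in class_file.get("slot_usage", {}).items()
        if name != old_slot
    }
    return {**class_file, "imports": imports, "slots": slots, "slot_usage": slot_usage}


# Hypothetical class file; has_label is an invented unrelated slot.
before = {
    "imports": ["../slots/actual_start", "../slots/has_label"],
    "slots": ["actual_start", "has_label"],
    "slot_usage": {"actual_start": {"deprecated": "..."}},
}
after = migrate_slot(before, "actual_start", ["temporal_extent"])
print(after)
```

The function returns a new dict and leaves its input untouched, which makes it easy to diff the before and after states during review.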
## What Happens to Old Slot Files
The old slot files in `modules/slots/` (e.g., `actual_start.yaml`, `activities_societies.yaml`) are **NOT deleted** because:
- Other entity classes might still use them
- They serve as documentation of the old schema
- They can be archived when all usages are migrated
However, the old slots are **removed from the entity class** being migrated.
## Example: TemporaryLocation Migration
**Before** (with old slots):
```yaml
imports:
- ../slots/actual_end
- ../slots/actual_start
- ../slots/has_actual_start_date
- ../slots/has_actual_end_date
slots:
- actual_end
- actual_start
- has_actual_start_date
- has_actual_end_date
```
**After** (fully migrated):
```yaml
imports:
# actual_end and actual_start imports REMOVED
- ../slots/has_actual_start_date
- ../slots/has_actual_end_date
slots:
# actual_end and actual_start REMOVED from list
- has_actual_start_date
- has_actual_end_date
```
## Slot Usage for New Slots
Only add `slot_usage` for the new slot if you need to:
- Override the range for this specific class
- Add class-specific examples
- Add class-specific constraints
Do NOT add `slot_usage` just to document that it replaces an old slot.
## Recording in slot_fixes.yaml
When marking a slot as processed:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  processed:
    status: true
    timestamp: '2026-01-14T16:00:00Z'
    session: "session-2026-01-14-type-migration"
    notes: "FULLY MIGRATED: TemporaryLocation - actual_start REMOVED, using temporal_extent with TimeSpan.begin_of_the_begin (Rule 53)"
```
Note the "FULLY MIGRATED" prefix in notes to confirm this was a complete removal, not a deprecation-in-place.
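
A small helper can keep the `processed` record consistent. The sketch below is illustrative only: `processed_record` is a hypothetical name, and it assumes timestamps are written in UTC with a trailing `Z`, as in the example above.

```python
from datetime import datetime, timezone


def processed_record(session, summary, when=None):
    """Build the `processed` mapping for a fully migrated slot."""
    # Pass a timezone-aware datetime; the default is the current UTC time.
    when = when or datetime.now(timezone.utc)
    return {
        "status": True,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "session": session,
        "notes": f"FULLY MIGRATED: {summary}",
    }


record = processed_record(
    "session-2026-01-14-type-migration",
    "TemporaryLocation - actual_start REMOVED (Rule 53)",
    when=datetime(2026, 1, 14, 16, 0, 0, tzinfo=timezone.utc),
)
print(record)
```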
---
## ⚠️ Common Mistakes to Avoid ⚠️
### Mistake 1: Substituting Different Slots
**slot_fixes.yaml specifies**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  revision:
    - label: begin_of_the_begin # ← MUST USE THIS
      type: slot
    - label: TimeSpan # ← WITH THIS CLASS
      type: class
```
| Action | Status |
|--------|--------|
| Using `begin_of_the_begin` with `TimeSpan` | ✅ CORRECT |
| Using `has_actual_start_date` (invented) | ❌ WRONG |
| Using `start_date` (different slot) | ❌ WRONG |
| Using `begin_of_the_begin` WITHOUT `TimeSpan` | ❌ WRONG (incomplete) |
### Mistake 2: Partial Application
The revision often specifies MULTIPLE components that work together:
```yaml
revision:
  - label: has_or_had_type # ← Slot for linking
    type: slot
  - label: BackupType # ← Abstract base class
    type: class
  - label: includes_or_included # ← Slot for hierarchy
    type: slot
  - label: BackupTypes # ← Concrete subclasses
    type: class
```
**All four components** are part of the migration. Don't just use `has_or_had_type` and ignore the class structure.
### Mistake 3: Not Using the `temporal_extent` Slot Correctly
When `slot_fixes.yaml` specifies a TimeSpan-based revision:
```yaml
revision:
  - label: begin_of_the_begin
    type: slot
  - label: TimeSpan
    type: class
```
This means: **Use the `temporal_extent` slot** (which has `range: TimeSpan`) and access the temporal bounds via TimeSpan's slots:
```yaml
# CORRECT: Use temporal_extent with TimeSpan structure
temporal_extent:
  begin_of_the_begin: '2020-06-15'
  end_of_the_end: '2022-03-15'
# WRONG: Create new has_actual_start_date slot
has_actual_start_date: '2020-06-15' # ❌ Not in revision!
```
### Mistake 4: Not Updating Examples
When migrating slots, **update ALL examples** in the class file:
- Description examples (in class description)
- slot_usage examples
- Class-level examples (at bottom of file)
---
## Verification Checklist
Before marking a slot as processed:
- [ ] Read the `revision` section completely
- [ ] Identified ALL slots and classes in revision
- [ ] Removed old slot from imports
- [ ] Removed old slot from slots list
- [ ] Removed old slot from slot_usage
- [ ] Added new slot(s) per revision
- [ ] Added new class import(s) per revision
- [ ] Updated ALL examples to use new slots
- [ ] Validated with `linkml-lint` or `gen-owl`
- [ ] Updated `slot_fixes.yaml` with:
  - `status: true`
  - `timestamp` (ISO 8601)
  - `session` identifier
  - `notes` with "FULLY MIGRATED:" prefix
---
## See Also
- Rule 9: Enum-to-Class Promotion (single source of truth principle)
- Rule 0b: Type/Types File Naming Convention
- Rule: Slot Naming Convention (Current Style)
- `.opencode/ENUM_TO_CLASS_PRINCIPLE.md`
- `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` - **AUTHORITATIVE** master list of migrations


@ -0,0 +1,129 @@
# Rule: Generic Slots, Specific Classes
**Identifier**: `generic-slots-specific-classes`
**Severity**: **CRITICAL**
## Core Principle
**Slots MUST be generic predicates** that can be reused across multiple classes. **Classes MUST be specific** to provide context and constraints.
**DO NOT** create class-specific slots when a generic predicate can be used.
## Rationale
1. **Predicate Proliferation**: Creating bespoke slots for every class explodes the schema size (e.g., `has_museum_name`, `has_library_name`, `has_archive_name` instead of `has_name`).
2. **Interoperability**: Generic predicates (`has_name`, `has_identifier`, `has_part`) map cleanly to standard ontologies (Schema.org, Dublin Core, RiC-O).
3. **Querying**: It's easier to query "all entities with a name" than "all entities with museum_name OR library_name OR archive_name".
4. **Maintenance**: Updating one generic slot propagates to all classes.
## Examples
### ❌ Anti-Pattern: Class-Specific Slots
```yaml
# WRONG: Creating specific slots for each class
slots:
  has_museum_visitor_count:
    range: integer
  has_library_patron_count:
    range: integer
classes:
  Museum:
    slots:
      - has_museum_visitor_count
  Library:
    slots:
      - has_library_patron_count
```
### ✅ Correct Pattern: Generic Slot, Specific Class Usage
```yaml
# CORRECT: One generic slot reused
slots:
  has_or_had_quantity:
    slot_uri: rico:hasOrHadQuantity
    range: Quantity
    multivalued: true
classes:
  Museum:
    slots:
      - has_or_had_quantity
    slot_usage:
      has_or_had_quantity:
        description: The number of visitors to the museum.
  Library:
    slots:
      - has_or_had_quantity
    slot_usage:
      has_or_had_quantity:
        description: The number of registered patrons.
```
## Intermediate Class Pattern
Making slots generic often requires introducing **Intermediate Classes** to hold structured data, rather than flattening attributes onto the parent class.
### ❌ Anti-Pattern: Specific Flattened Slots
```yaml
# WRONG: Flattened specific attributes
classes:
  Museum:
    slots:
      - has_museum_budget_amount
      - has_museum_budget_currency
      - has_museum_budget_year
```
### ✅ Correct Pattern: Generic Slot + Intermediate Class
```yaml
# CORRECT: Generic slot pointing to structured class
slots:
  has_or_had_budget:
    range: Budget
    multivalued: true
classes:
  Museum:
    slots:
      - has_or_had_budget
  Budget:
    slots:
      - has_or_had_amount
      - has_or_had_currency
      - has_or_had_year
```
## Specificity Levels
| Level | Component | Example |
|-------|-----------|---------|
| **Generic** | **Slot (Predicate)** | `has_or_had_identifier` |
| **Specific** | **Class (Subject/Object)** | `ISILCode` |
| **Specific** | **Slot Usage (Context)** | "The ISIL code assigned to this library" |
## Migration Guide
If you encounter an overly specific slot:
1. **Identify the generic concept** (e.g., `has_museum_opening_hours` → `has_opening_hours`).
2. **Check if a generic slot exists** in `modules/slots/`.
3. **If yes**, use the generic slot and add `slot_usage` to the class.
4. **If no**, create the **generic** slot, not a specific one.
## Naming Indicators
**Reject slots containing:**
* Class names (e.g., `has_custodian_name` → `has_name`)
* Narrow types (e.g., `has_isbn_identifier` → `has_identifier`)
* Contextual specifics (e.g., `has_primary_email` → `has_email` + type/role)
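
The first naming indicator (class names embedded in slot names) is easy to screen for automatically. The following sketch assumes CamelCase class names and snake_case slot names; `to_snake` and `class_specific_slots` are illustrative helpers, and acronym-heavy class names are not handled specially.

```python
import re


def to_snake(class_name):
    """Convert a CamelCase class name to snake_case (e.g. HeritageCustodian)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name).lower()


def class_specific_slots(slot_names, class_names):
    """Map each slot whose name embeds a class name to the matching tokens."""
    tokens = [to_snake(name) for name in class_names]
    flagged = {}
    for slot in slot_names:
        # Match the token only as a whole underscore-delimited segment.
        hits = [
            t for t in tokens
            if re.search(rf"(?:^|_){re.escape(t)}(?:_|$)", slot)
        ]
        if hits:
            flagged[slot] = hits
    return flagged


slots = ["has_museum_visitor_count", "has_or_had_quantity", "has_custodian_name"]
print(class_specific_slots(slots, ["Museum", "Library", "Custodian"]))
```

Narrow types and contextual specifics need a curated word list or manual review; this check only covers embedded class names.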
## See Also
* Rule 55: Broaden Generic Predicate Ranges
* Rule: Slot Naming Convention (Current Style)


@ -0,0 +1,157 @@
# Rule 59: LinkML Union Types Require `range: Any`
🚨 **CRITICAL**: When using `any_of` for union types in LinkML, you MUST also specify `range: Any` at the attribute level. Without it, the union type validation does NOT work.
## The Problem
LinkML's `any_of` construct allows defining slots that accept multiple types (e.g., string OR integer). However, there's a critical implementation detail:
**Without `range: Any`, the `any_of` constraint is silently ignored during validation.**
This leads to validation failures where data that should be valid (e.g., integer value in a string/integer union field) is rejected.
## Correct Pattern
```yaml
slots:
  identifier_value:
    range: Any # ← REQUIRED for any_of to work
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)
```
## Incorrect Pattern (WILL FAIL)
```yaml
slots:
  identifier_value:
    # Missing range: Any - validation will fail!
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)
```
## Common Use Cases
This pattern is required for:
| Use Case | Types | Example Fields |
|----------|-------|----------------|
| Identifier values | string \| integer | `identifier_value`, `geonames_id`, `viaf_id` |
| Social media IDs | string \| array | `youtube_channel_id`, `facebook_id`, `twitter_username` |
| Flexible identifiers | object \| array | `identifiers` (dict or list format) |
| Numeric strings | string \| integer | `postal_code`, `kvk_number` |
## Real-World Examples from GLAM Schema
### Example 1: OriginalEntryIdentifier.yaml
```yaml
# Before (BROKEN):
attributes:
  identifier_value:
    any_of:
      - range: string
      - range: integer
# After (WORKING):
attributes:
  identifier_value:
    range: Any # Added
    any_of:
      - range: string
      - range: integer
```
### Example 2: WikidataSocialMedia.yaml
```yaml
# Social media fields that can be single value or array
attributes:
  youtube_channel_id:
    range: Any # Required for string|array union
    any_of:
      - range: string
      - range: string
        multivalued: true
    description: YouTube channel ID (single value or array)
  facebook_id:
    range: Any
    any_of:
      - range: string
      - range: string
        multivalued: true
```
### Example 3: OriginalEntry.yaml (object|array union)
```yaml
# identifiers field that accepts both dict and array formats
attributes:
  identifiers:
    range: Any # Required for flexible typing
    description: >-
      Identifiers from original source. Accepts both dict format
      (e.g., {isil: "XX-123"}) and array format
      (e.g., [{scheme: "isil", value: "XX-123"}])
```
### Example 4: OriginalEntryLocation.yaml
```yaml
attributes:
  geonames_id:
    range: Any # Required for string|integer
    any_of:
      - range: string
      - range: integer
    description: GeoNames ID (may be string or integer depending on source)
```
## Validation Behavior
| Schema Definition | Integer Data | String Data | Result |
|-------------------|--------------|-------------|--------|
| `range: string` | ❌ FAIL | ✅ PASS | Strict string only |
| `range: integer` | ✅ PASS | ❌ FAIL | Strict integer only |
| `any_of` without `range: Any` | ❌ FAIL | ❌ FAIL | Broken - nothing works |
| `any_of` with `range: Any` | ✅ PASS | ✅ PASS | Correct union behavior |
## Why This Happens
LinkML's validation engine processes `range` first to determine the basic type constraint. When `range` is not specified (or defaults to `string`), it applies that constraint before checking `any_of`. The `range: Any` tells the validator to defer type checking to the `any_of` constraints.
## Checklist for Union Types
When adding a field that accepts multiple types:
- [ ] Define the `any_of` block with all acceptable ranges
- [ ] Add `range: Any` at the same level as `any_of`
- [ ] Test with sample data of each type
- [ ] Document the accepted types in the description
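
The second checklist item can be enforced with a simple scan. This is a minimal sketch, assuming the `attributes` (or `slots`) block has been loaded into a dict; `unions_missing_any` is a hypothetical helper name, and the check only inspects the schema text, it does not run LinkML validation.

```python
def unions_missing_any(attributes):
    """Return names of attributes that use any_of without range: Any."""
    return sorted(
        name
        for name, spec in attributes.items()
        if "any_of" in spec and spec.get("range") != "Any"
    )


# Hypothetical attribute block mixing a broken and a working definition.
attributes = {
    "identifier_value": {
        "any_of": [{"range": "string"}, {"range": "integer"}],
    },
    "geonames_id": {
        "range": "Any",
        "any_of": [{"range": "string"}, {"range": "integer"}],
    },
    "label": {"range": "string"},
}
print(unions_missing_any(attributes))
```

Any name it reports still needs the sample-data test from the checklist once `range: Any` has been added.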
## See Also
- LinkML Documentation: [Union Types](https://linkml.io/linkml/schemas/advanced.html#union-types)
- GLAM Validation: `schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml`
- Validation command: `linkml-validate -s <schema>.yaml <data>.yaml`
## Migration Notes
**Affected Files (Fixed January 2026)**:
- `OriginalEntryIdentifier.yaml` - `identifier_value`
- `Identifier.yaml` - `identifier_value` slot_usage
- `WikidataSocialMedia.yaml` - `youtube_channel_id`, `facebook_id`, `instagram_username`, `linkedin_company_id`, `twitter_username`, `facebook_page_id`
- `YoutubeEnrichment.yaml` - `channel_id`
- `OriginalEntryLocation.yaml` - `geonames_id`
- `OriginalEntry.yaml` - `identifiers`
---
**Version**: 1.0
**Created**: 2026-01-18
**Author**: AI Agent (OpenCode Claude)


@ -0,0 +1,181 @@
# LinkML YAML Best Practices Rule
## Rule: Follow LinkML Conventions for Valid, Interoperable Schema Files
### 1. equals_expression Anti-Pattern
`equals_expression` is for dynamic formula evaluation (e.g., `"{age_in_years} * 12"`). Never use it for static value constraints.
**WRONG:**
```yaml
slot_usage:
  has_type:
    equals_expression: '["hc:ArchiveOrganizationType"]'
  hold_record_set:
    equals_expression: '["hc:Fonds", "hc:Series"]'
```
**CORRECT** (single value):
```yaml
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
**CORRECT** (multiple allowed values - if classes):
```yaml
slot_usage:
  hold_record_set:
    any_of:
      - range: UniversityAdministrativeFonds
      - range: StudentRecordSeries
      - range: FacultyPaperCollection
```
**CORRECT** (multiple allowed values - if literals):
```yaml
slot_usage:
  status:
    equals_string_in:
      - "active"
      - "inactive"
      - "pending"
```
### 2. Declare All Used Prefixes
Every CURIE prefix used in the file must be declared in the `prefixes:` block.
**WRONG:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  skos: http://www.w3.org/2004/02/skos/core#
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType" # hc: not declared!
```
**CORRECT:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
### 3. Import Referenced Classes
When using external classes in `is_a`, `range`, or other references, import them.
**WRONG:**
```yaml
imports:
  - linkml:types
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType # Not imported!
    slot_usage:
      related_to:
        range: WikidataAlignment # Not imported!
```
**CORRECT:**
```yaml
imports:
  - linkml:types
  - ../classes/ArchiveOrganizationType
  - ../classes/WikidataAlignment
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType
    slot_usage:
      related_to:
        range: WikidataAlignment
```
### 4. Quote Regex Patterns and Annotation Values
**Regex patterns:**
```yaml
# WRONG
pattern: ^Q[0-9]+$
# CORRECT
pattern: "^Q[0-9]+$"
```
**Annotation values (must be strings):**
```yaml
# WRONG
annotations:
  specificity_score: 0.1
# CORRECT
annotations:
  specificity_score: "0.1"
```
### 5. Remove Unused Imports
Only import slots and classes that are actually used in the file.
**WRONG:**
```yaml
imports:
- ../slots/has_scope # Never used in slots: or slot_usage:
- ../slots/has_score
- ../slots/has_type
```
**CORRECT:**
```yaml
imports:
- ../slots/has_score
- ../slots/has_type
```
### 6. Slot Usage Requires Slot Presence
A slot referenced in `slot_usage:` must either be:
- Listed in the `slots:` array, OR
- Inherited from a parent class via `is_a`
**WRONG:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
    slot_usage:
      has_type: {...}
      identified_by: {...} # Not in slots: and not inherited!
```
**CORRECT:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
      - identified_by
    slot_usage:
      has_type: {...}
      identified_by: {...}
```
## Checklist for Class Files
- [ ] All prefixes used in CURIEs are declared
- [ ] `default_prefix` set if module belongs to that namespace
- [ ] All referenced classes are imported
- [ ] All used slots are imported
- [ ] No `equals_expression` with static JSON arrays
- [ ] Regex patterns are quoted
- [ ] Annotation values are quoted strings
- [ ] No unused imports
- [ ] `slot_usage` only references slots that exist (via slots: or inheritance)
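
Two of these checks (undeclared prefixes and orphaned `slot_usage` entries) can be sketched over a schema loaded as a dict. The helper names below are hypothetical, the CURIE regular expression is a rough approximation that only scans string values, and inherited slots must be supplied by the caller because this sketch does not resolve `is_a` chains.

```python
import re

# Rough CURIE detector: "prefix:Local"; "http://" is skipped because the
# character after the colon must be a letter or underscore.
CURIE = re.compile(r"\b([A-Za-z][\w-]*):[A-Za-z_]")


def undeclared_prefixes(schema):
    """Return prefixes used in string values but missing from `prefixes:`."""
    declared = set(schema.get("prefixes", {}))
    used = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key != "prefixes":  # prefix URIs are not CURIEs
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            used.update(match.group(1) for match in CURIE.finditer(node))

    walk(schema)
    return sorted(used - declared)


def orphan_slot_usage(class_def, inherited=()):
    """Return slot_usage keys that are neither listed in slots nor inherited."""
    available = set(class_def.get("slots", [])) | set(inherited)
    return sorted(s for s in class_def.get("slot_usage", {}) if s not in available)


schema = {
    "prefixes": {
        "linkml": "https://w3id.org/linkml/",
        "skos": "http://www.w3.org/2004/02/skos/core#",
    },
    "imports": ["linkml:types"],
    "classes": {
        "AcademicArchive": {
            "slots": ["has_type"],
            "slot_usage": {
                "has_type": {"equals_string": "hc:ArchiveOrganizationType"},
                "identified_by": {},
            },
        }
    },
}
print(undeclared_prefixes(schema))
print(orphan_slot_usage(schema["classes"]["AcademicArchive"]))
```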


@ -0,0 +1,185 @@
# Mapping Specificity Rule: Broad vs Narrow vs Exact Mappings
## 🚨 CRITICAL: Mapping Semantics
When mapping LinkML classes to external ontologies, you MUST distinguish between **equivalence**, **hypernyms** (broader concepts), and **hyponyms** (narrower concepts).
### The Rule
1. **Exact Mappings (`skos:exactMatch`)**: Use ONLY when the external concept is **semantically equivalent** to your class.
   * *Example*: `hc:Person` `exact_mappings` `schema:Person`.
   * **CRITICAL**: Exact means the SAME semantic scope - neither broader nor narrower!
   * **DO NOT AVOID EXACT BY DEFAULT**: If equivalence is verified (including class/property category match and ontology definition review), `exact_mappings` SHOULD be used.
2. **Broad Mappings (`skos:broadMatch`)**: Use when the external concept is a **hypernym** (a broader, more general category) of your class.
   * *Example*: `hc:AcademicArchiveRecordSetType` `broad_mappings` `rico:RecordSetType`.
   * *Rationale*: An academic archive record set *is a* record set type, but `rico:RecordSetType` is broader.
   * *Common Hypernyms*: `skos:Concept`, `prov:Entity`, `prov:Activity`, `schema:Thing`, `schema:Organization`, `schema:Action`, `rico:RecordSetType`, `crm:E55_Type`.
3. **Narrow Mappings (`skos:narrowMatch`)**: Use when the external concept is a **hyponym** (a narrower, more specific category) of your class.
   * *Example*: `hc:Organization` `narrow_mappings` `hc:Library` (if mapping inversely).
4. **Close Mappings (`skos:closeMatch`)**: Use when the external concept is similar but not exactly equivalent.
   * *Example*: `hc:AccessPolicy` `close_mappings` `dcterms:accessRights` (related but different scope).
5. **Related Mappings (`skos:relatedMatch`)**: Use for non-hierarchical relationships.
   * *Example*: `hc:Collection` `related_mappings` `rico:RecordSet`.
### 🚨 Type Compatibility Rule
**Classes map to classes, properties map to properties.** Never mix types in mappings.
| Your Element | Valid Mapping Target |
|--------------|---------------------|
| Class | Class (owl:Class, rdfs:Class) |
| Slot | Property (owl:ObjectProperty, owl:DatatypeProperty, rdf:Property) |
**WRONG**:
```yaml
# AccessApplication is a CLASS, schema:Action is a CLASS - but Action is BROADER
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym, not equivalent
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### 🚨 No Self/Internal Exact Mappings
`exact_mappings` MUST NOT contain self-references or internal HC class references for the same concept.
**WRONG**:
```yaml
AcademicArchive:
  exact_mappings:
    - hc:AcademicArchive  # Self/internal reference; not an external equivalence mapping
```
**CORRECT**:
```yaml
AcademicArchive:
  exact_mappings:
    - wd:Q27032435  # External concept with equivalent semantic scope
```
Use `exact_mappings` only for equivalent terms in external ontologies or external controlled vocabularies, not for repeating the class itself.
### ✅ Positive Guidance: When Exact Mapping Is Correct
Use `exact_mappings` when all checks below pass:
- Semantic scope is equivalent (not parent/child, not merely similar)
- Ontological category matches (Class↔Class, Slot↔Property)
- Target term is verified in the ontology source files under `data/ontology/` or verified Wikidata entity metadata
- No self/internal duplication (no `hc:` self-reference for the same concept)
**CORRECT**:
```yaml
Person:
  exact_mappings:
    - schema:Person
Acquisition:
  exact_mappings:
    - crm:E8_Acquisition
```
Do not downgrade a truly equivalent mapping to `close_mappings` or `broad_mappings` just to be conservative.
### Common Hypernyms That Are NEVER Exact Mappings
These terms are always BROADER than your specific class - never use them as `exact_mappings`:
| Hypernym | What It Means | Use Instead |
|----------|---------------|-------------|
| `schema:Action` | Any action | `broad_mappings` |
| `schema:Organization` | Any organization | `broad_mappings` |
| `schema:Thing` | Anything at all | `broad_mappings` |
| `schema:PropertyValue` | Any property value | `broad_mappings` |
| `schema:Permit` | Any permit | `broad_mappings` |
| `prov:Activity` | Any activity | `broad_mappings` |
| `prov:Entity` | Any entity | `broad_mappings` |
| `skos:Concept` | Any concept | `broad_mappings` |
| `crm:E55_Type` | Any type classification | `broad_mappings` |
| `crm:E42_Identifier` | Any identifier | `broad_mappings` |
| `rico:Identifier` | Any identifier | `broad_mappings` |
| `dcat:DataService` | Any data service | `broad_mappings` |
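
The table above, together with the ban on internal `hc:` references in `exact_mappings`, lends itself to a simple lint. The sketch below assumes a parsed class definition (a dict); the set contents are copied from the table, while the names `GENERIC_HYPERNYMS` and `audit_exact_mappings` are hypothetical.

```python
GENERIC_HYPERNYMS = {
    "schema:Action", "schema:Organization", "schema:Thing",
    "schema:PropertyValue", "schema:Permit", "prov:Activity",
    "prov:Entity", "skos:Concept", "crm:E55_Type",
    "crm:E42_Identifier", "rico:Identifier", "dcat:DataService",
}


def audit_exact_mappings(definition, internal_prefix="hc:"):
    """Return (curie, reason) pairs for suspicious exact_mappings entries."""
    findings = []
    for curie in definition.get("exact_mappings") or []:
        if curie in GENERIC_HYPERNYMS:
            findings.append((curie, "generic hypernym: move to broad_mappings"))
        elif curie.startswith(internal_prefix):
            findings.append((curie, "internal reference: not an external equivalence"))
    return findings


definition = {"exact_mappings": ["schema:Action", "hc:AcademicArchive", "wd:Q27032435"]}
for curie, reason in audit_exact_mappings(definition):
    print(f"{curie}: {reason}")
```

A clean result does not prove equivalence; it only rules out the two mechanical violations. Semantic scope still has to be reviewed against the ontology definition.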
### Common Violations to Avoid
**WRONG**:
```yaml
AcademicArchiveRecordSetType:
  exact_mappings:
    - rico:RecordSetType  # WRONG: This implies AcademicArchiveRecordSetType == RecordSetType
```
**CORRECT**:
```yaml
AcademicArchiveRecordSetType:
  broad_mappings:
    - rico:RecordSetType  # CORRECT: RecordSetType is broader
```
**WRONG**:
```yaml
SocialMovement:
  exact_mappings:
    - schema:Organization  # WRONG: SocialMovement is a specific TYPE of Organization
```
**CORRECT**:
```yaml
SocialMovement:
  broad_mappings:
    - schema:Organization  # CORRECT
```
**WRONG**:
```yaml
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### How to Determine Mapping Type
Ask these questions:
1. **Is it the SAME thing?** → `exact_mappings`
   - "Could I swap these two terms in any context without changing meaning?"
   - If NO, it's not an exact mapping
2. **Is the external term a PARENT category?** → `broad_mappings`
   - "Is my class a TYPE OF the external term?"
   - Example: AccessApplication IS-A Action
3. **Is the external term a CHILD category?** → `narrow_mappings`
   - "Is the external term a TYPE OF my class?"
   - Example: Library IS-A Organization (so Organization has narrow_mapping to Library)
4. **Is it similar but not hierarchical?** → `close_mappings`
   - "Related but not equivalent or hierarchical"
5. **Is there some other relationship?** → `related_mappings`
   - "Connected in some way"
### Verification Checklist
- [ ] Does the `exact_mapping` represent the **exact same scope**?
- [ ] Is the external term a generic parent class (e.g., `Type`, `Concept`, `Entity`, `Action`, `Activity`, `Organization`)? → Move to `broad_mappings`
- [ ] Is the external term a specific instance or subclass? → Check `narrow_mappings`
- [ ] Is the external term the same type (class→class, property→property)?
- [ ] Would swapping the terms change the meaning? If yes, not an `exact_mapping`
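
The question sequence in "How to Determine Mapping Type" can be sketched as a small decision function. The boolean parameters are hypothetical stand-ins for the answers a reviewer gives after reading the ontology definitions; the function does not perform any semantic analysis itself.

```python
def choose_mapping_slot(*, equivalent=False, external_is_broader=False,
                        external_is_narrower=False, similar=False):
    """Pick the mapping slot by asking the questions in the documented order."""
    if equivalent:
        return "exact_mappings"
    if external_is_broader:   # my class is a TYPE OF the external term
        return "broad_mappings"
    if external_is_narrower:  # the external term is a TYPE OF my class
        return "narrow_mappings"
    if similar:
        return "close_mappings"
    return "related_mappings"


# AccessApplication IS-A schema:Action, so the external term is broader.
print(choose_mapping_slot(external_is_broader=True))
```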


@ -0,0 +1,177 @@
# Rule: Multilingual Support Requirements
## Overview
All LinkML slot files MUST include multilingual support with translations in the following languages:
| Code | Language | Required |
|------|----------|----------|
| `nl` | Dutch | ✅ Yes |
| `de` | German | ✅ Yes |
| `fr` | French | ✅ Yes |
| `ar` | Arabic | ✅ Yes |
| `id` | Indonesian | ✅ Yes |
| `zh` | Chinese (Simplified) | ✅ Yes |
| `es` | Spanish | ✅ Yes |
---
## Required Multilingual Fields
### 1. `alt_descriptions`
Provide faithful translations of the English `description` field:
```yaml
slots:
  my_slot:
    description: >-
      To possess a specific structural arrangement or encoding standard.
    alt_descriptions:
      nl: >-
        Het bezitten van een specifieke structurele rangschikking of coderingsstandaard.
      de: >-
        Das Besitzen einer spezifischen strukturellen Anordnung oder eines Kodierungsstandards.
      fr: >-
        Posséder un arrangement structurel spécifique ou une norme de codage.
      ar: >-
        امتلاك ترتيب هيكلي محدد أو معيار ترميز.
      id: >-
        Memiliki susunan struktural tertentu atau standar pengkodean.
      zh: >-
        拥有特定的结构安排或编码标准。
      es: >-
        Poseer una disposición estructural específica o un estándar de codificación.
```
### 2. `structured_aliases`
Provide translated slot names/labels for each language:
```yaml
slots:
  has_format:
    structured_aliases:
      - literal_form: heeft formaat
        predicate: EXACT_SYNONYM
        in_language: nl
      - literal_form: hat Format
        predicate: EXACT_SYNONYM
        in_language: de
      - literal_form: a un format
        predicate: EXACT_SYNONYM
        in_language: fr
      - literal_form: لديه تنسيق
        predicate: EXACT_SYNONYM
        in_language: ar
      - literal_form: memiliki format
        predicate: EXACT_SYNONYM
        in_language: id
      - literal_form: 具有格式
        predicate: EXACT_SYNONYM
        in_language: zh
      - literal_form: tiene formato
        predicate: EXACT_SYNONYM
        in_language: es
```
---
## Translation Guidelines
### DO:
- Translate the semantic meaning faithfully
- Preserve technical precision
- Use natural phrasing for each language
- Keep translations concise (similar length to English)
### DON'T:
- Paraphrase or expand beyond the original meaning
- Add information not present in the English description
- Use machine translation without review
- Skip any of the required languages
---
## Complete Example
```yaml
id: https://nde.nl/ontology/hc/slot/catalogue
name: catalogue
title: catalogue
slots:
  catalogue:
    slot_uri: crm:P70_documents
    description: >-
      To systematically record, classify, and organize items within a structured
      inventory or database for the purposes of documentation and retrieval.
    alt_descriptions:
      nl: >-
        Het systematisch vastleggen, classificeren en ordenen van items binnen een
        gestructureerde inventaris of database voor documentatie en terugvinding.
      de: >-
        Das systematische Erfassen, Klassifizieren und Ordnen von Objekten in einem
        strukturierten Inventar oder einer Datenbank für Dokumentation und Abruf.
      fr: >-
        Enregistrer, classer et organiser systématiquement des éléments dans un
        inventaire structuré ou une base de données à des fins de documentation et de récupération.
      ar: >-
        تسجيل وتصنيف وتنظيم العناصر بشكل منهجي ضمن جرد منظم أو قاعدة بيانات لأغراض التوثيق والاسترجاع.
      id: >-
        Mencatat, mengklasifikasikan, dan mengatur item secara sistematis dalam
        inventaris terstruktur atau database untuk tujuan dokumentasi dan pengambilan.
      zh: >-
        在结构化清单或数据库中系统地记录、分类和组织项目,以便于文档编制和检索。
      es: >-
        Registrar, clasificar y organizar sistemáticamente elementos dentro de un
        inventario estructurado o base de datos con fines de documentación y recuperación.
    structured_aliases:
      - literal_form: catalogiseren
        predicate: EXACT_SYNONYM
        in_language: nl
      - literal_form: katalogisieren
        predicate: EXACT_SYNONYM
        in_language: de
      - literal_form: cataloguer
        predicate: EXACT_SYNONYM
        in_language: fr
      - literal_form: فهرسة
        predicate: EXACT_SYNONYM
        in_language: ar
      - literal_form: mengkatalogkan
        predicate: EXACT_SYNONYM
        in_language: id
      - literal_form: 编目
        predicate: EXACT_SYNONYM
        in_language: zh
      - literal_form: catalogar
        predicate: EXACT_SYNONYM
        in_language: es
```
---
## Validation Checklist
Before completing a slot file, verify:
- [ ] `alt_descriptions` provided for all 7 languages (nl, de, fr, ar, id, zh, es)
- [ ] `structured_aliases` provided for all 7 languages
- [ ] Translations are faithful to the English original
- [ ] No language is skipped or left empty
- [ ] Arabic and Chinese characters render correctly
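
The coverage items in this checklist can be automated. The sketch below assumes a single parsed slot definition (a dict) in the shape used by the examples above; `REQUIRED_LANGUAGES` mirrors the language table, and `missing_translations` is a hypothetical helper. It checks presence only, not translation quality or rendering.

```python
REQUIRED_LANGUAGES = ("nl", "de", "fr", "ar", "id", "zh", "es")


def missing_translations(slot_definition):
    """Report which required languages lack a description or an alias."""
    alt = slot_definition.get("alt_descriptions") or {}
    described = {lang for lang, text in alt.items()
                 if text and str(text).strip()}

    aliases = slot_definition.get("structured_aliases") or []
    aliased = {alias.get("in_language") for alias in aliases
               if (alias.get("literal_form") or "").strip()}

    return {
        "alt_descriptions": [l for l in REQUIRED_LANGUAGES if l not in described],
        "structured_aliases": [l for l in REQUIRED_LANGUAGES if l not in aliased],
    }


slot = {
    "alt_descriptions": {"nl": "Het bezitten van ...", "de": ""},
    "structured_aliases": [
        {"literal_form": "heeft formaat", "predicate": "EXACT_SYNONYM", "in_language": "nl"},
    ],
}
print(missing_translations(slot))
```

Empty strings count as missing, which matches the checklist item that no language may be skipped or left empty.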
---
## See Also
- Rule 1: Preserve Original Descriptions (LINKML_EDITING_RULES.md)
- Rule 2: Translation Accuracy (LINKML_EDITING_RULES.md)
- Rule 3: Description Field Purity (LINKML_EDITING_RULES.md)
---
**Version**: 1.0.0
**Created**: 2026-02-03
**Author**: OpenCODE


@ -0,0 +1,24 @@
# Rule: No Autonomous Alias Assignment
**Status**: ACTIVE
**Created**: 2026-02-10
## Rule
The agent MUST NOT assign aliases to canonical slot files on its own. Only the user decides which `new/` slot files are absorbed as aliases into which canonical slots.
## Rationale
Alias assignment is a semantic decision that determines the conceptual scope of a canonical slot. Incorrect alias assignment conflates distinct concepts. For example, `membership_criteria` (eligibility rules for joining) is not an alias of `has_mission` (organizational purpose), even though both relate to organizational governance.
## What the agent MUST do
1. When creating or polishing a canonical slot file, leave the `aliases` field empty unless the user has explicitly specified which aliases to include.
2. When processing `new/` files, present candidates to the user and wait for their alias assignment decisions.
3. Do NOT delete `new/` files until the user confirms the alias mapping.
## What the agent MUST NOT do
- Autonomously decide that a `new/` file should become an alias of a canonical slot.
- Add alias entries without explicit user instruction.
- Delete `new/` files based on self-determined alias assignments.


@ -0,0 +1,46 @@
# Rule: Do Not Delete From slot_fixes.yaml
**Identifier**: `no-deletion-from-slot-fixes`
**Severity**: **CRITICAL**
## Core Directive
**NEVER delete entries from `slot_fixes.yaml`.**
The `slot_fixes.yaml` file serves as the historical record and audit trail for all schema migrations. Removing entries destroys this history and violates the project's data integrity principles.
## Workflow
When processing a migration:
1. **Do NOT Remove**: Never delete the entry for the slot you are working on.
2. **Update `processed`**: Instead, update the `processed` block:
* Set `status: true`.
* Set `date` to the current date (YYYY-MM-DD).
* Add a detailed `notes` string explaining what was done (e.g., "Fully migrated to [new_slot] + [Class] (Rule 53). [File].yaml updated. Slot archived.").
3. **Preserve History**: The entry must remain in the file permanently as a record of the migration.
## Rationale
* **Audit Trail**: We need to know what was migrated, when, and how.
* **Reversibility**: If a migration introduces a bug, the record helps us understand the original state.
* **Completeness**: The file tracks the total progress of the schema refactoring project.
## Example
**WRONG (Deletion)**:
```yaml
# DELETED from file
# - original_slot_id: ...
```
**CORRECT (Update)**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/has_some_slot
  processed:
    status: true
    date: '2026-01-27'
    notes: Fully migrated to has_or_had_new_slot + NewClass (Rule 53).
  revision:
    ...
```
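
The update-instead-of-delete workflow might be sketched as follows. It assumes the file has been loaded into a list of entry dictionaries shaped like the example above; `mark_processed` is a hypothetical helper, and writing the result back to `slot_fixes.yaml` is left out.

```python
import copy
from datetime import date


def mark_processed(entries, slot_id, notes, today=None):
    """Return a copy of `entries` with one entry marked processed.

    No entry is ever removed: the returned list always has the same
    length as the input, preserving the audit trail.
    """
    today = today or date.today().isoformat()  # YYYY-MM-DD
    updated = copy.deepcopy(entries)
    for entry in updated:
        if entry.get("original_slot_id") == slot_id:
            processed = entry.setdefault("processed", {})
            processed["status"] = True
            processed["date"] = today
            processed["notes"] = notes
            return updated
    raise KeyError(f"no entry for {slot_id}")


entries = [
    {"original_slot_id": "https://nde.nl/ontology/hc/slot/has_some_slot",
     "processed": {"status": False, "notes": ""}},
    {"original_slot_id": "https://nde.nl/ontology/hc/slot/other_slot",
     "processed": {"status": False, "notes": ""}},
]
result = mark_processed(
    entries,
    "https://nde.nl/ontology/hc/slot/has_some_slot",
    "Fully migrated to has_or_had_new_slot + NewClass (Rule 53).",
    today="2026-01-27",
)
print(len(result), result[0]["processed"])
```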


@ -0,0 +1,189 @@
# Rule 52: No Duplicate Ontology Mappings
## Summary
Each ontology URI MUST appear in only ONE mapping category per schema element. A URI cannot simultaneously have multiple semantic relationships to the same class or slot.
## The Problem
LinkML provides five mapping annotation types based on SKOS vocabulary alignment:
| Property | SKOS Predicate | Meaning |
|----------|---------------|---------|
| `exact_mappings` | `skos:exactMatch` | "This IS that" (equivalent) |
| `close_mappings` | `skos:closeMatch` | "This is very similar to that" |
| `related_mappings` | `skos:relatedMatch` | "This is conceptually related to that" |
| `narrow_mappings` | `skos:narrowMatch` | "That is MORE SPECIFIC than this" (the external term is narrower) |
| `broad_mappings` | `skos:broadMatch` | "That is MORE GENERAL than this" (the external term is broader) |
These relationships are **mutually exclusive**. A URI cannot simultaneously:
- BE the element (`exact_mappings`) AND be broader than it (`broad_mappings`)
- Be closely similar (`close_mappings`) AND be more general (`broad_mappings`)
## Anti-Pattern (WRONG)
```yaml
# WRONG - schema:url appears in TWO mapping types
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # Says "source_url IS schema:url"
    broad_mappings:
      - schema:url  # Says "schema:url is MORE GENERAL than source_url"
```
This is a **logical contradiction**: `source_url` cannot simultaneously BE `schema:url` AND be more specific than `schema:url`.
## Correct Pattern
```yaml
# CORRECT - each URI appears in only ONE mapping type
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # source_url IS schema:url
    close_mappings:
      - dcterms:source  # Similar but not identical
```
## Decision Guide: Which Mapping to Keep
When a URI appears in multiple categories, keep the **most precise** one:
### Precedence Order (keep the first match)
1. **exact_mappings** - Strongest claim: semantic equivalence
2. **close_mappings** - Strong claim: nearly equivalent
3. **narrow_mappings** / **broad_mappings** - Hierarchical relationship
4. **related_mappings** - Weakest claim: conceptual association
### Decision Matrix
| If URI appears in... | Keep | Remove |
|---------------------|------|--------|
| exact + broad | exact | broad |
| exact + close | exact | close |
| exact + related | exact | related |
| close + broad | close | broad |
| close + related | close | related |
| related + broad | broad | related |
| narrow + broad | whichever the ontology definition supports | the other (contradictory!) |
### Special Case: narrow + broad
If a URI appears in BOTH `narrow_mappings` AND `broad_mappings`, this is a **data error** - the same URI cannot be both more specific AND more general. Investigate which is correct based on the ontology definition.
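
The precedence order can be sketched as a resolver over one parsed class or slot definition. This is an illustration, not the project's tooling: it follows the numbered precedence list, assumes `narrow_mappings` and `broad_mappings` share one rank, and flags a narrow + broad tie for manual review as the special case requires. `RANK` and `resolve_duplicates` are hypothetical names.

```python
RANK = {
    "exact_mappings": 0,
    "close_mappings": 1,
    "narrow_mappings": 2,
    "broad_mappings": 2,
    "related_mappings": 3,
}


def resolve_duplicates(definition):
    """Return (keep, needs_review) for URIs listed in several mapping types.

    keep:         {uri: mapping_type_to_keep}
    needs_review: URIs whose strongest claims tie (narrow + broad)
    """
    uri_to_types = {}
    for mapping_type in RANK:
        for uri in definition.get(mapping_type) or []:
            types = uri_to_types.setdefault(uri, [])
            if mapping_type not in types:
                types.append(mapping_type)

    keep, needs_review = {}, []
    for uri, types in uri_to_types.items():
        if len(types) < 2:
            continue  # not a duplicate
        best = min(RANK[t] for t in types)
        winners = [t for t in types if RANK[t] == best]
        if len(winners) > 1:
            needs_review.append(uri)  # contradictory hierarchy claims
        else:
            keep[uri] = winners[0]
    return keep, needs_review


source_url = {"exact_mappings": ["schema:url"], "broad_mappings": ["schema:url"]}
geonames_id = {"narrow_mappings": ["dcterms:identifier"],
               "broad_mappings": ["dcterms:identifier"]}
print(resolve_duplicates(source_url))
print(resolve_duplicates(geonames_id))
```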
## Real Examples Fixed
### Example 1: source_url
```yaml
# BEFORE (wrong)
slots:
  source_url:
    exact_mappings:
      - schema:url
    broad_mappings:
      - schema:url  # Duplicate!
# AFTER (correct)
slots:
  source_url:
    exact_mappings:
      - schema:url  # Keep exact (strongest)
    # broad_mappings removed
```
### Example 2: Custodian class
```yaml
# BEFORE (wrong)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation
    narrow_mappings:
      - cpov:PublicOrganisation  # Duplicate!
# AFTER (correct)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation  # Keep close (Custodian ≈ PublicOrganisation)
    # narrow_mappings: use for URIs that are MORE SPECIFIC than Custodian
```
### Example 3: geonames_id (narrow + broad conflict)
```yaml
# BEFORE (wrong - logical contradiction!)
slots:
  geonames_id:
    narrow_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE SPECIFIC than geonames_id
    broad_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE GENERAL than geonames_id
# AFTER (correct)
slots:
  geonames_id:
    broad_mappings:
      - dcterms:identifier  # geonames_id IS a specific type of identifier, so the external term is broader
    # narrow_mappings removed (was contradictory)
```
## Detection Script
Run this to find duplicate mappings in the schema:
```python
import yaml
from pathlib import Path
from collections import defaultdict

mapping_types = ['exact_mappings', 'close_mappings', 'related_mappings',
                 'narrow_mappings', 'broad_mappings']
dirs = [
    Path('schemas/20251121/linkml/modules/slots'),
    Path('schemas/20251121/linkml/modules/classes'),
]

for d in dirs:
    for yaml_file in d.glob('*.yaml'):
        try:
            # Schema files contain Arabic and Chinese text, so read as UTF-8
            with open(yaml_file, encoding='utf-8') as f:
                content = yaml.safe_load(f)
        except Exception:
            continue  # Skip unreadable or malformed files
        if not isinstance(content, dict):
            continue
        for section in ['classes', 'slots']:
            items = content.get(section, {})
            if not isinstance(items, dict):
                continue
            for name, defn in items.items():
                if not isinstance(defn, dict):
                    continue
                # Collect every mapping type in which each URI appears
                uri_to_types = defaultdict(list)
                for mt in mapping_types:
                    for uri in defn.get(mt, []) or []:
                        uri_to_types[uri].append(mt)
                # Report URIs listed under more than one mapping type
                for uri, types in uri_to_types.items():
                    if len(types) > 1:
                        print(f"{yaml_file}: {name} - {uri} in {types}")
```
## Validation Rule
**Pre-commit check**: Before committing LinkML schema changes, run the detection script. If any duplicates are found, the commit should fail.
## References
- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [SKOS Mapping Properties](https://www.w3.org/TR/skos-reference/#mapping)
- Rule 50: Ontology-to-LinkML Mapping Convention (parent rule)
- Rule 51: No Hallucinated Ontology References


@ -0,0 +1,316 @@
# Rule 51: No Hallucinated Ontology References
**Priority**: CRITICAL
**Scope**: All LinkML schema files (`schemas/20251121/linkml/`)
**Created**: 2025-01-13
---
## Summary
All ontology references in LinkML schema files (`class_uri`, `slot_uri`, `*_mappings`) MUST be verifiable against actual ontology files in `/data/ontology/`. References to predicates or classes that do not exist in local ontology files are considered **hallucinated** and are prohibited.
---
## The Problem
AI agents may suggest ontology mappings based on training data without verifying that:
1. The ontology file exists in `/data/ontology/`
2. The specific predicate/class exists within that ontology file
3. The prefix is declared and resolvable
This leads to schema files containing references like `dqv:value` or `adms:status` that cannot be validated or serialized to RDF.
---
## Requirements
### 1. All Ontology Prefixes Must Have Local Files
Before using a prefix (e.g., `prov:`, `schema:`, `org:`), verify the ontology file exists:
```bash
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
```
**Available Ontologies** (as of 2025-01-13):
| Prefix | File | Verified |
|--------|------|----------|
| `prov:` | `prov-o.ttl`, `prov.ttl` | ✅ |
| `schema:` | `schemaorg.owl` | ✅ |
| `org:` | `org.rdf` | ✅ |
| `skos:` | `skos.rdf` | ✅ |
| `dcterms:` | `dublin_core_elements.rdf` | ✅ |
| `foaf:` | `foaf.ttl` | ✅ |
| `rico:` | `RiC-O_1-1.rdf` | ✅ |
| `crm:` | `CIDOC_CRM_v7.1.3.rdf` | ✅ |
| `geo:` | `geo.ttl` | ✅ |
| `sosa:` | `sosa.ttl` | ✅ |
| `bf:` | `bibframe.rdf` | ✅ |
| `edm:` | `edm.owl` | ✅ |
| `premis:` | `premis3.owl` | ✅ |
| `dcat:` | `dcat3.ttl` | ✅ |
| `ore:` | `ore.rdf` | ✅ |
| `pico:` | `pico.ttl` | ✅ |
| `gn:` | `geonames_ontology.rdf` | ✅ |
| `time:` | `time.ttl` | ✅ |
| `locn:` | `locn.ttl` | ✅ |
| `dqv:` | `dqv.ttl` | ✅ |
| `adms:` | `adms.ttl` | ✅ |
**NOT Available** (do not use without adding):
| Prefix | Status | Alternative |
|--------|--------|-------------|
| `qudt:` | Only referenced in era_ontology.ttl | Use `hc:` with close_mappings annotation |
### 2. Predicates Must Exist in Ontology Files
Before using a predicate, verify it exists:
```bash
# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl
# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
```
### 3. Use hc: Prefix for Domain-Specific Concepts
When no standard ontology predicate exists, use the Heritage Custodian namespace:
```yaml
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.
# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
```
### 4. Document External References in close_mappings
When a similar concept exists in an ontology we don't have locally, document it in `close_mappings` with a note:
```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically
        appropriate but ontology not included in project. See
        https://www.w3.org/TR/vocab-dqv/
```
---
## Verification Workflow
### Before Adding New Mappings
1. **Check if ontology file exists**:
```bash
ls data/ontology/ | grep -i "<ontology-name>"
```
2. **Search for predicate in ontology**:
```bash
grep -l "<predicate-name>" data/ontology/*
```
3. **Verify predicate definition**:
```bash
grep -B2 -A5 "<predicate-name>" data/ontology/<file>
```
4. **If not found**: Use `hc:` prefix with appropriate documentation
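
The grep-based steps above can also be scripted. The sketch below works on ontology text that has already been read into a string, so it makes no assumption about file names; `split_curie` and `term_occurs_in` are hypothetical helpers, and the sample Turtle snippet is made up for illustration. Like the grep commands, it only confirms that the local name occurs as a whole token, which is a prompt for manual review rather than proof that the term is declared.

```python
import re


def split_curie(curie):
    """Split 'prefix:LocalName' into ('prefix', 'LocalName')."""
    prefix, _, local_name = curie.partition(":")
    return prefix, local_name


def term_occurs_in(ontology_text, local_name):
    """True if `local_name` occurs as a whole token in the ontology text."""
    pattern = (r"(?<![A-Za-z0-9_])" + re.escape(local_name)
               + r"(?![A-Za-z0-9_])")
    return re.search(pattern, ontology_text) is not None


# Made-up sample text standing in for the contents of a local ontology file.
sample = """
premis:storedAt a owl:ObjectProperty .
premis:StorageLocation a owl:Class .
"""

for curie in ("premis:storedAt", "premis:hasFrameRate"):
    _, local_name = split_curie(curie)
    status = "found" if term_occurs_in(sample, local_name) else "NOT FOUND"
    print(f"{curie}: {status}")
```

The whole-token match avoids false positives such as `stored` matching inside `storedAt`.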
### When Reviewing Existing Mappings
Run validation script:
```bash
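# List the distinct ontology prefixes used in slot_uri values (excluding hc:)
grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
  grep -v "hc:" | \
  cut -d: -f3 | \
  sort -u
# Verify each prefix has a local file.
# File names do not always contain the prefix (e.g. dcterms -> dublin_core_elements.rdf,
# rico -> RiC-O_1-1.rdf), so cross-check any "NOT FOUND" against the Available Ontologies table.
for prefix in prov schema org skos dcterms foaf rico; do
  echo "Checking $prefix:"
  ls data/ontology/ | grep -i "$prefix" || echo "  NOT FOUND!"
done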
```
---
## Ontology Addition Process
If a new ontology is genuinely needed:
1. **Download the ontology**:
```bash
curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
```
2. **Update ONTOLOGY_CATALOG.md**:
```bash
# Add entry to data/ontology/ONTOLOGY_CATALOG.md
```
3. **Verify predicates exist**:
```bash
grep "<predicate>" data/ontology/<name>.ttl
```
4. **Update LinkML prefixes** in schema files
---
## Examples
### CORRECT: Verified Mapping
```yaml
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
```
### CORRECT: Domain-Specific with External Reference
```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.
```
### WRONG: Hallucinated Mapping
```yaml
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
```
### WRONG: Non-Existent Predicate
```yaml
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
```
---
## Consequences of Violation
1. **RDF serialization fails** - Invalid prefixes cause gen-owl errors
2. **Schema validation errors** - LinkML validates prefix declarations
3. **Broken interoperability** - External systems cannot resolve URIs
4. **Data quality issues** - Semantic web tooling cannot process data
---
## PREMIS Ontology Reference (premis3.owl)
**CRITICAL**: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified.
### Valid PREMIS Classes
```
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
```
### Valid PREMIS Properties
```
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version
```
### Known Hallucinated PREMIS Terms (DO NOT USE)
| Hallucinated Term | Correction |
|-------------------|------------|
| `premis:PreservationEvent` | Use `premis:Event` |
| `premis:RightsDeclaration` | Use `premis:RightsBasis` or `premis:RightsStatus` |
| `premis:hasRightsStatement` | Use `premis:rightsStatus` |
| `premis:hasRightsDeclaration` | Use `premis:rightsStatus` |
| `premis:hasRepresentation` | Use `premis:relationship` or `dcterms:hasFormat` |
| `premis:hasRelatedStatementInformation` | Use `premis:note` or `adms:status` |
| `premis:hasObjectCharacteristics` | Use `premis:characteristic` |
| `premis:rightsGranted` | Use `premis:RightsStatus` class with `premis:restriction` |
| `premis:rightsEndDate` | Use `premis:endDate` |
| `premis:linkingAgentIdentifier` | Use `premis:Agent` class |
| `premis:storageLocation` (lowercase) | Use `premis:storedAt` property or `premis:StorageLocation` class |
| `premis:hasFrameRate` | Does not exist - use `hc:frameRate` |
| `premis:environmentCharacteristic` (lowercase) | Use `premis:EnvironmentCharacteristic` (class) |
### PREMIS Verification Commands
```bash
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
sed 's/.*v3\///' | sed 's/".*//' | sort -u
# List all PREMIS properties
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
grep -oP 'v3/\K[^"]+' | sort -u
# Verify a specific term exists
grep -c "YourTermHere" data/ontology/premis3.owl
```
---
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- `/data/ontology/ONTOLOGY_CATALOG.md` - Available ontologies
- `.opencode/rules/slot-ontology-mapping-reference.md` - Mapping reference
---
## Version History
- **2025-01-13**: Added 7 more hallucinated PREMIS terms discovered during schema audit:
- `premis:hasRightsStatement`, `premis:hasRightsDeclaration`, `premis:hasRepresentation`
- `premis:hasRelatedStatementInformation`, `premis:rightsGranted`, `premis:rightsEndDate`
- `premis:linkingAgentIdentifier`
- **2025-01-13**: Initial creation after discovering dqv:, adms:, qudt: references without local files


@ -0,0 +1,164 @@
# Rule 60: No Migration Deferral - Execute All slot_fixes.yaml Revisions
## Status: MANDATORY
## Summary
**AI agents MUST execute ALL migrations specified in `slot_fixes.yaml`.** The pattern of marking migrations as "WELL_STRUCTURED_NO_MIGRATION_NEEDED" or similar deferral notes is **STRICTLY PROHIBITED** except in the rarest cases of genuine semantic conflict.
## The Problem
Previous AI agents incorrectly deferred >95% of slot_fixes.yaml migrations with notes like:
- "WELL_STRUCTURED_NO_MIGRATION_NEEDED"
- "Already has proper slot_uri"
- "Simple enum is appropriate"
- "Would add unnecessary indirection"
- "Current structure is adequate"
**These deferrals were WRONG.** The slot_fixes.yaml revisions were carefully curated based on:
- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment)
- Semantic correctness and consistency
- Pattern adherence (Rule 39: RiC-O naming, Rule 43: singular nouns)
- Type/Types class hierarchy design (Rule 0b)
## The Rule
### MANDATORY: Execute ALL Migrations
When encountering a slot_fixes.yaml entry with `status: false`:
1. **READ the `revision` field** - This specifies EXACTLY what to do
2. **EXECUTE the migration** - Create/update slots, classes, imports as specified
3. **UPDATE status to `true`** - Document what was done in `notes`
### PROHIBITED: Deferral Reasoning
The following justifications are **INVALID** and **NEVER acceptable**:
| Invalid Justification | Why It's Wrong |
|-----------------------|----------------|
| "Already has proper slot_uri" | slot_uri handles external mapping; internal structure is separate concern |
| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity |
| "Would add unnecessary indirection" | Indirection enables reuse and future extension |
| "Creating a class would over-engineer" | Ontology design favors class-based modeling |
| "Current structure is well-designed" | Individual optimization conflicts with schema-wide patterns |
| "No semantic benefit" | Agent lacks context to make this judgment |
### VALID: Reasons to Pause (Not Defer)
The ONLY valid reasons to pause a migration (NOT defer indefinitely) are:
1. **Semantic Conflict**: The proposed revision contradicts the slot's documented semantic intent
2. **Circular Dependency**: Migration would create import cycle
3. **External Consumer Impact**: Breaking change affecting downstream systems
4. **Class Already Exists**: Target class exists under different name (document and merge)
**When pausing**: Add a `feedback` entry explaining the specific conflict and request human review. Do NOT mark as "NO_MIGRATION_NEEDED".
## Decision Tree
```
Is there a slot_fixes.yaml entry with status: false?
├─ YES → Read the revision field
│ ├─ Does revision specify slots/classes to create?
│ │ └─ YES → EXECUTE THE MIGRATION (mandatory)
│ └─ Is there a genuine semantic conflict?
│ ├─ NO → EXECUTE THE MIGRATION (mandatory)
│ └─ YES → Document conflict in feedback, request human review
│ (Do NOT mark as "no migration needed")
└─ NO → Nothing to do
```
## Examples
### WRONG: Deferral Note
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot
  revision:
    - label: has_or_had_example
      type: slot
    - label: Example
      type: class
  processed:
    status: true  # WRONG - marked true without doing work
    notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED - slot already has proper
      slot_uri and the current structure is adequate"  # INVALID
```
### CORRECT: Execute Migration
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot
  revision:
    - label: has_or_had_example
      type: slot
    - label: Example
      type: class
  processed:
    status: true
    timestamp: '2026-01-19T12:00:00Z'
    notes: 'Migrated 2026-01-19 per Rule 53/56.
      - Created has_or_had_example.yaml slot file
      - Created Example.yaml class file
      - Updated ClassA.yaml, ClassB.yaml to use new slot
      - Archived: modules/slots/archive/example_slot_archived_20260119.yaml'
```
### CORRECT: Pause with Genuine Conflict
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/conflicting_slot
  revision:
    - label: has_or_had_foo
      type: slot
  processed:
    status: false  # Correctly left false
    notes: ''
  feedback:
    - timestamp: '2026-01-19T12:00:00Z'
      user: opencode-claude
      done: false
      comment: |
        PAUSED FOR HUMAN REVIEW - Genuine semantic conflict detected:
        - Revision specifies has_or_had_foo (temporal relationship)
        - But slot is used for immutable birth dates (should be has_*)
        - Request clarification on intended temporal semantics
```
## Statistics Context
The slot_fixes.yaml file contains 527 migration entries. Analysis of previous agent behavior:
- **Incorrectly deferred**: >95% of entries marked "NO_MIGRATION_NEEDED"
- **Actually needing deferral**: <5% (genuine semantic conflicts)
- **Required action**: Execute ALL migrations except those with documented semantic conflicts
## Related Rules
- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
- **Rule 56**: Semantic Consistency Over Simplicity - Always Execute Revisions
- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE
- **Rule 58**: Feedback vs Revision Distinction
## Anti-Patterns Checklist
Before marking ANY migration as complete, verify that the work was actually executed:
- [ ] Did I actually create the specified slots?
- [ ] Did I actually create the specified classes?
- [ ] Did I update all class files that use this slot?
- [ ] Did I archive the old slot file?
- [ ] Is my "notes" field documenting actual work done, not a deferral excuse?
If any answer is "no", the migration is NOT complete.
## Consequences
Agents that defer migrations without genuine semantic conflict:
1. Create technical debt requiring human cleanup
2. Delay schema consistency improvements
3. Waste curator time reviewing false "completions"
4. Undermine trust in AI-assisted schema work
**Execute the migrations. Do not defer.**


@ -0,0 +1,215 @@
# Rule 42: No Ontology Prefixes in Slot Names
**CRITICAL**: LinkML slot names MUST NOT include ontology namespace prefixes. Ontology references belong in mapping properties, NOT in element names.
---
## 1. The Problem
Slot names like `rico_has_or_had_holder` or `skos_broader` violate separation of concerns:
- **Slot names** should describe the semantic meaning in plain, readable terms
- **Ontology mappings** belong in `slot_uri`, `exact_mappings`, `close_mappings`, `related_mappings`, `narrow_mappings`, `broad_mappings`
Embedding ontology prefixes in names:
1. Creates coupling between naming and specific ontology versions
2. Reduces readability for non-ontology experts
3. Duplicates information already in mapping properties
4. Makes future ontology migrations harder
---
## 2. Correct Pattern
### Use Descriptive Names + Mapping Properties
```yaml
# CORRECT: Clean name with ontology reference in slot_uri
slots:
  record_holder:
    description: The custodian that holds or held this record set.
    slot_uri: rico:hasOrHadHolder
    exact_mappings:
      - rico:hasOrHadHolder
    close_mappings:
      - schema:holdingArchive
    range: Custodian
```
### WRONG: Ontology Prefix in Name
```yaml
# WRONG: Ontology prefix embedded in slot name
slots:
  rico_has_or_had_holder:  # BAD - "rico_" prefix
    description: The custodian that holds or held this record set.
    slot_uri: rico:hasOrHadHolder
    range: string
```
---
## 3. Prohibited Prefixes in Slot Names
The following prefixes MUST NOT appear at the start of slot names:
| Prefix | Ontology | Example Violation |
|--------|----------|-------------------|
| `rico_` | Records in Contexts | `rico_organizational_principle` |
| `skos_` | SKOS | `skos_broader`, `skos_narrower` |
| `schema_` | Schema.org | `schema_name` |
| `dcterms_` | Dublin Core | `dcterms_created` |
| `dct_` | Dublin Core | `dct_identifier` |
| `prov_` | PROV-O | `prov_generated_by` |
| `org_` | W3C Organization | `org_has_member` |
| `crm_` | CIDOC-CRM | `crm_carried_out_by` |
| `foaf_` | FOAF | `foaf_knows` |
| `owl_` | OWL | `owl_same_as` |
| `rdf_` | RDF | `rdf_type` |
| `rdfs_` | RDFS | `rdfs_label` |
| `cpov_` | CPOV | `cpov_public_organisation` |
| `tooi_` | TOOI | `tooi_overheidsorganisatie` |
| `bf_` | BIBFRAME | `bf_title` |
| `edm_` | Europeana | `edm_provided_cho` |
---
## 4. Migration Examples
### Example 1: RiC-O Slots
```yaml
# BEFORE (wrong)
rico_has_or_had_holder:
  slot_uri: rico:hasOrHadHolder
  range: string

# AFTER (correct)
record_holder:
  description: Reference to the custodian that holds or held this record set.
  slot_uri: rico:hasOrHadHolder
  exact_mappings:
    - rico:hasOrHadHolder
  range: Custodian
```
### Example 2: SKOS Slots
```yaml
# BEFORE (wrong)
skos_broader:
  slot_uri: skos:broader
  range: uriorcurie

# AFTER (correct)
broader_concept:
  description: A broader concept in the hierarchy.
  slot_uri: skos:broader
  exact_mappings:
    - skos:broader
  range: uriorcurie
```
### Example 3: RiC-O Organizational Principle
```yaml
# BEFORE (wrong)
rico_organizational_principle:
  slot_uri: rico:hasRecordSetType
  range: string

# AFTER (correct)
organizational_principle:
  description: The organizational principle (fonds, series, collection) for this record set.
  slot_uri: rico:hasRecordSetType
  exact_mappings:
    - rico:hasRecordSetType
  range: string
```
---
## 5. Exceptions
### 5.1 Identifier Slots
Slots that store **identifiers from external systems** may include system names (not ontology prefixes):
```yaml
# ALLOWED: External system identifier
wikidata_id:
  description: Wikidata entity identifier (Q-number).
  slot_uri: schema:identifier
  range: string
  pattern: "^Q[0-9]+$"

# ALLOWED: External system identifier
viaf_id:
  description: VIAF identifier for authority control.
  slot_uri: schema:identifier
  range: string
```
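As a rough illustration of what the `pattern` on `wikidata_id` accepts, the same expression can be checked in Python. The helper name `is_wikidata_id` is hypothetical and not part of the schema; this is only a sketch of the constraint, not how LinkML validates it.

```python
import re

# Same expression as the `pattern` on wikidata_id, without the anchors,
# because fullmatch already requires the whole string to match.
WIKIDATA_ID = re.compile(r"Q[0-9]+")


def is_wikidata_id(value: str) -> bool:
    """Return True when value is a Q-number such as "Q42"."""
    return WIKIDATA_ID.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` with `$` avoids accepting a value that ends in a trailing newline.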
### 5.2 Internal Namespace Force Slots
Technical slots for namespace generation are prefixed with `internal_`:
```yaml
# ALLOWED: Technical workaround slot
internal_wd_namespace_force:
  description: Internal slot to force WD namespace generation. Do not use.
  slot_uri: wd:Q35120
  range: string
```
---
## 6. Validation
Run this command to find violations:
```bash
cd schemas/20251121/linkml/modules/slots
ls -1 *.yaml | grep -E "^(rico_|skos_|schema_|dcterms_|dct_|prov_|org_|crm_|foaf_|owl_|rdf_|rdfs_|cpov_|tooi_|bf_|edm_)"
```
Expected output: No files (after migration)
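The same check can be applied to slot names rather than file names, for example when the names come from a parsed schema. The sketch below assumes the names are already available as a list of strings; `find_prefixed_slot_names` is a hypothetical helper, and the prefix tuple simply mirrors the table in section 3.

```python
# Prefixes from the "Prohibited Prefixes in Slot Names" table.
PROHIBITED_PREFIXES = (
    "rico_", "skos_", "schema_", "dcterms_", "dct_", "prov_", "org_", "crm_",
    "foaf_", "owl_", "rdf_", "rdfs_", "cpov_", "tooi_", "bf_", "edm_",
)


def find_prefixed_slot_names(slot_names):
    """Return the slot names that start with a prohibited ontology prefix."""
    # str.startswith accepts a tuple and matches any of its members.
    return [name for name in slot_names if name.startswith(PROHIBITED_PREFIXES)]
```

Because only the start of the name is tested, the exceptions in section 5 (`wikidata_id`, `internal_wd_namespace_force`) are not flagged.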
---
## 7. Rationale
### LinkML Best Practices
LinkML provides dedicated properties for ontology alignment:
| Property | Purpose | Example |
|----------|---------|---------|
| `slot_uri` | Primary ontology predicate | `slot_uri: rico:hasOrHadHolder` |
| `exact_mappings` | Semantically equivalent predicates | `exact_mappings: [schema:holdingArchive]` |
| `close_mappings` | Nearly equivalent predicates | `close_mappings: [dc:creator]` |
| `related_mappings` | Related but different predicates | `related_mappings: [prov:wasAttributedTo]` |
| `narrow_mappings` | More specific predicates | `narrow_mappings: [rico:hasInstantiation]` |
| `broad_mappings` | More general predicates | `broad_mappings: [schema:about]` |
See: https://linkml.io/linkml-model/latest/docs/mappings/
### Clean Separation of Concerns
- **Names**: Human-readable, domain-focused terminology
- **URIs**: Machine-readable, ontology-specific identifiers
- **Mappings**: Cross-ontology alignment documentation
This separation allows:
1. Renaming slots without changing ontology bindings
2. Adding new ontology mappings without renaming slots
3. Clear documentation of semantic relationships
4. Easier maintenance and evolution
---
## 8. See Also
- **Rule 38**: Slot Centralization and Semantic URI Requirements
- **Rule 39**: Slot Naming Convention (RiC-O Style) - for temporal naming patterns
- LinkML Mappings Documentation: https://linkml.io/linkml-model/latest/docs/mappings/


@ -0,0 +1,61 @@
# Rule: No Rough Edits in Schema Files
**Identifier**: `no-rough-edits-in-schema`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT** perform rough, imprecise, or bulk text substitutions (such as `sed -i` or regex-based Python scripts) on LinkML schema files (`schemas/*/linkml/`) without guaranteeing structural integrity.
**YOU MUST**:
* ✅ Use proper YAML parsers/dumpers if modifying structure programmatically.
* ✅ Manually verify edits if using text replacement.
* ✅ Ensure indentation and nesting are preserved exactly.
* ✅ Respect comments and ordering (which parsers often destroy, so careful text editing is sometimes necessary, but it must be PRECISE).
## Rationale
LinkML schemas are highly structured YAML files where indentation and nesting semantics are critical. Rough edits often cause:
* **Duplicate keys** (e.g., leaving a property behind after deleting its parent key).
* **Invalid indentation** (breaking the parent-child relationship).
* **Silent corruption** (valid YAML but wrong semantics).
## Examples
### ❌ Anti-Pattern: Rough Deletion
Deleting lines containing a string without checking context:
```python
# WRONG: Deleting lines blindly
for line in lines:
    if "some_slot" in line:
        continue  # Deletes the line, but might leave children orphaned!
    new_lines.append(line)
```
**Resulting Corruption**:
```yaml
# Original
slots:
  some_slot:
    range: string

# Corrupted (orphaned child)
slots:
  range: string  # INVALID!
```
### ✅ Correct Pattern: Structural Awareness
If removing a slot reference, ensure you remove the entire list item or key-value block.
```python
import re
# BETTER: skip only lines that are exactly the list item "- some_slot"
for line in lines:
    if re.match(r'^\s*-\s*some_slot\s*$', line):
        continue
    new_lines.append(line)
```
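For the key-value case in the corruption example, a structure-aware text edit has to drop the key together with its indented children. The following is a minimal sketch under stated assumptions: indentation uses spaces, the key appears as `key:` with nothing else on the line, and `remove_key_block` is a hypothetical helper. The result should still be validated with a YAML parser afterwards.

```python
def remove_key_block(lines, key):
    """Remove the line `key:` and every more deeply indented line after it."""
    result = []
    skip_indent = None  # indentation of the removed key while skipping its children
    for line in lines:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if skip_indent is not None:
            # Blank lines and deeper-indented lines belong to the removed block.
            if not stripped.strip() or indent > skip_indent:
                continue
            skip_indent = None  # reached a sibling or parent: stop skipping
        if stripped.rstrip() == f"{key}:":
            skip_indent = indent
            continue
        result.append(line)
    return result
```

Applied to the "Original" snippet above, this removes both `some_slot:` and its `range: string` child, so no orphaned property is left under `slots:`.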
## Application
This rule applies to ALL files in `schemas/20251121/linkml/` and future versions.


@ -0,0 +1,53 @@
# Rule: No Version Indicators in Names
## 🚨 Critical
Do not include version identifiers in **class names**, **slot names**, or **enum names**.
Version tags in semantic names create churn, break reuse, and force unnecessary migrations.
## The Rule
1. Use stable semantic names for LinkML elements.
   - ✅ `DigitalPlatform`
   - ❌ `DigitalPlatformV2`
2. If a model evolves, keep the name and update metadata/provenance.
   - Track revision in changelog, annotations, or transformation metadata.
   - Do not encode `v2`, `v3`, `_2026`, `beta`, `final` in the element name.
3. Apply this to all naming surfaces:
   - `classes:` keys
   - `slots:` keys
   - `enums:` keys
   - `name:` values in module files
## Allowed Versioning Locations
- File-level changelog/comments
- Dedicated metadata classes/slots (e.g., transformation metadata)
- External release tags (git tags, manifest versions)
## Migration Guidance
When you encounter versioned names:
1. Rename semantic elements to stable names.
2. Update references/imports/usages accordingly.
3. Preserve provenance of the migration in comments/annotations.
## Examples
✅ Correct:
```yaml
classes:
  DigitalPlatformTransformationMetadata:
    description: Metadata about record transformation steps.
```
❌ Wrong:
```yaml
classes:
  DigitalPlatformV2TransformationMetadata:
    description: Metadata about V2 transformation.
```
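A name lint for the indicators listed in this rule could look roughly like the sketch below. The regular expression is an assumption about what counts as a version marker (it is not taken from the project), and it is a heuristic: any hit still needs human review before renaming.

```python
import re

VERSION_MARKERS = re.compile(
    r"V\d+(?=[A-Z_]|$)"          # CamelCase: DigitalPlatformV2, ...V2TransformationMetadata
    r"|(?:^|_)v\d+(?=_|$)"       # snake_case token: platform_v2
    r"|_(?:19|20)\d{2}(?=_|$)"   # year token: platform_2026
    r"|_(?:beta|final)$"         # status word as a trailing snake_case token
)


def has_version_indicator(name: str) -> bool:
    """Return True when an element name appears to carry a version marker."""
    return VERSION_MARKERS.search(name) is not None
```

The status words are only matched as a trailing token so that a name such as `final_report` is not flagged.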


@ -0,0 +1,15 @@
# Rule: Ontology Detection vs Heuristics
## Summary
When detecting classes and predicates in `data/ontology/` or external ontology files, you must **read the actual ontology definitions** (e.g., RDF, OWL, TTL files) to determine if a term is a Class or a Property. Do not rely on naming heuristics (like "Capitalized means Class").
## Detail
* **Verification**: Always read the source ontology file or use a semantic lookup tool to verify the `rdf:type` of an entity.
    * If `rdf:type` is `owl:Class` or `rdfs:Class`, it is a **Class**.
    * If `rdf:type` is `rdf:Property`, `owl:ObjectProperty`, or `owl:DatatypeProperty`, it is a **Property**.
* **Avoid Heuristics**: Do not conclude that `skos:Concept` is a class just because it is capitalized (it is a class, but confirm that in the ontology), or that `schema:name` is a property just because it is lowercase. Naming conventions differ between ontologies and are not a reliable indicator of a term's type.
* **Strictness**: If the ontology file is not available locally, attempt to fetch it or consult authoritative documentation before guessing.
## Violation Examples
* Assuming `ex:MyTerm` is a class because it starts with an uppercase letter without checking the `.ttl` file.
* Mapping a LinkML slot to `schema:Thing` (a Class) instead of a Property because you guessed based on the name.
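Once the `rdf:type` values of a term have been read from the ontology file, the classification in the Detail section reduces to a set lookup. The sketch below assumes the types are supplied as CURIE strings; `classify_term` is a hypothetical helper and only covers the types named in this rule.

```python
CLASS_TYPES = {"owl:Class", "rdfs:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def classify_term(rdf_types) -> str:
    """Classify a term from its declared rdf:type values, never from its name."""
    types = set(rdf_types)
    is_class = bool(types & CLASS_TYPES)
    is_property = bool(types & PROPERTY_TYPES)
    if is_class and not is_property:
        return "Class"
    if is_property and not is_class:
        return "Property"
    # No recognised type, or conflicting types: do not guess.
    return "Unknown"
```

An "Unknown" result means the source ontology or its documentation has to be consulted; it is not a cue to fall back on capitalization.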


@ -0,0 +1,306 @@
# Rule 50: Ontology-to-LinkML Mapping Convention
🚨 **CRITICAL**: When mapping base ontology classes and predicates to LinkML schema elements, use LinkML's dedicated mapping properties as documented at https://linkml.io/linkml-model/latest/docs/mappings/
---
## 1. What "LinkML Mapping" Means in This Project
**"LinkML mapping"** refers specifically to:
1. Connecting LinkML schema elements (classes, slots, enums) to external ontology URIs
2. Using LinkML's built-in mapping properties (`class_uri`, `slot_uri`, `*_mappings`)
3. Following SKOS-based vocabulary alignment standards
**LinkML mapping does NOT mean**:
- Creating arbitrary crosswalks in spreadsheets
- Writing prose descriptions of how concepts relate
- Inventing custom `@context` JSON-LD mappings outside the schema
---
## 2. LinkML Mapping Property Reference
### Primary Identity Properties
| Property | Applies To | Purpose | Example |
|----------|-----------|---------|---------|
| `class_uri` | Classes | Primary RDF class URI | `class_uri: ore:Aggregation` |
| `slot_uri` | Slots | Primary RDF predicate URI | `slot_uri: rico:hasOrHadHolder` |
| `enum_uri` | Enums | Enum namespace URI | `enum_uri: hc:PlatformTypeEnum` |
### SKOS-Based Mapping Properties
These properties express **semantic relationships** to external ontology terms:
| Property | SKOS Predicate | Meaning | Use When |
|----------|---------------|---------|----------|
| `exact_mappings` | `skos:exactMatch` | **IDENTICAL meaning** | Different ontology, **SAME semantics** (interchangeable) |
| `close_mappings` | `skos:closeMatch` | Very similar meaning | Similar but **NOT interchangeable** |
| `related_mappings` | `skos:relatedMatch` | Semantically related | Broader conceptual relationship |
| `narrow_mappings` | `skos:narrowMatch` | External term is more specific | External term is narrower than this element |
| `broad_mappings` | `skos:broadMatch` | External term is more general | External term is broader than this element |
### ⚠️ CRITICAL: `exact_mappings` Requires PRECISE Semantic Equivalence
**`exact_mappings` means the terms are INTERCHANGEABLE** - you could substitute one for the other in any context without changing meaning.
**Requirements for `exact_mappings`**:
1. **Same definition**: Both terms must have equivalent definitions
2. **Same scope**: Both terms cover the same set of instances
3. **Same constraints**: Same domain/range restrictions apply
4. **Bidirectional**: If A exactMatch B, then B exactMatch A
**DO NOT use `exact_mappings` when**:
- One term is a subset of the other (use `narrow_mappings`/`broad_mappings`)
- Terms are similar but have different scopes (use `close_mappings`)
- Terms are related but not equivalent (use `related_mappings`)
- You're uncertain about equivalence (default to `close_mappings`)
**Example - WRONG**:
```yaml
# PersonProfile is NOT equivalent to foaf:Person
# PersonProfile is a structured document ABOUT a person, not the person themselves
exact_mappings:
- foaf:Person # ❌ WRONG - different semantics!
```
**Example - CORRECT**:
```yaml
# foaf:Person and schema:Person ARE equivalent
# Both define "a person" with the same scope
exact_mappings:
- schema:Person # ✅ CORRECT - truly equivalent
```
---
## 3. Mapping Workflow: Ontology → LinkML
### Step 1: Identify External Ontology Class/Predicate
Search the base ontology files in `data/ontology/` (paths relative to the repository root):
```bash
# Find aggregation-related classes
rg -i "aggregation|aggregate" data/ontology/*.ttl data/ontology/*.rdf data/ontology/*.owl
# Check specific ontology
rg "rdfs:Class|owl:Class" data/ontology/ore.rdf | grep -i "aggregation"
```
### Step 2: Determine Mapping Strength
| Scenario | Mapping Property |
|----------|------------------|
| **This IS that ontology class** (identity) | `class_uri` |
| **Equivalent in another vocabulary** | `exact_mappings` |
| **Similar concept, different scope** | `close_mappings` |
| **Related but different granularity** | `narrow_mappings` / `broad_mappings` |
| **Conceptually related** | `related_mappings` |
### Step 3: Document Mapping in LinkML Schema
#### For Classes
```yaml
classes:
  DataAggregator:
    class_uri: ore:Aggregation  # Primary identity - THIS IS an ORE Aggregation
    description: |
      A platform that harvests and STORES copies of metadata/content, causing data duplication.
      ore:Aggregation - "A set of related resources grouped together."
      Mapped to ORE because aggregators create aggregations of harvested metadata.
    exact_mappings:
      - edm:EuropeanaAggregation  # Europeana's specialization
    close_mappings:
      - dcat:Catalog  # Similar (collects datasets) but broader scope
    narrow_mappings:
      - edm:ProvidedCHO  # More specific (single cultural object)
```
#### For Slots
```yaml
slots:
  aggregates_from:
    slot_uri: ore:aggregates  # Primary predicate
    description: |
      Institutions whose data is aggregated (harvested and stored) by this platform.
      ore:aggregates - "Aggregations assert ore:aggregates relationships."
    exact_mappings:
      - edm:aggregatedCHO  # Europeana equivalent
    range: HeritageCustodian
    multivalued: true
```
---
## 4. Aggregation vs. Linking: A Mapping Example
This project requires **semantic precision** in distinguishing:
| Concept | Primary Mapping | Semantic Pattern |
|---------|-----------------|------------------|
| **Data Aggregation** | `ore:Aggregation` | Data is COPIED to aggregator's server |
| **Linking/Federation** | `dcat:DataService` | Data REMAINS at source; only links provided |
### Aggregation Pattern (Data Duplication)
```yaml
classes:
  DataAggregator:
    class_uri: ore:Aggregation
    description: |
      Harvests and stores copies of metadata from partner institutions.
      Key semantic: Data DUPLICATION occurs - the aggregator maintains its own copy.
      Examples: Europeana, DPLA, Archives Portal Europe
    exact_mappings:
      - edm:EuropeanaAggregation
    annotations:
      data_storage_pattern: AGGREGATION
      causes_data_duplication: true
```
### Linking Pattern (Single Source of Truth)
```yaml
classes:
  FederatedDiscoveryPortal:
    class_uri: dcat:DataService
    description: |
      Provides unified search across multiple institutions but LINKS to original sources.
      Key semantic: NO data duplication - users are redirected to source institutions.
      Data remains at partner institutions' platforms (single source of truth).
    close_mappings:
      - schema:SearchAction  # The search functionality
    related_mappings:
      - ore:Aggregation  # Related but crucially different
    annotations:
      data_storage_pattern: LINKING
      causes_data_duplication: false
```
### Linking Properties from EDM
Use `edm:isShownAt` and `edm:isShownBy` to express links to source:
```yaml
slots:
  is_shown_at:
    slot_uri: edm:isShownAt
    description: |
      Unambiguous URL to the digital object on the provider's web site
      in its full information context.
      edm:isShownAt - "The URL of a web view of the object in full context."
      This property LINKS to the source institution - no data duplication.
    range: uri
  is_shown_by:
    slot_uri: edm:isShownBy
    description: |
      Direct URL to the object in best available resolution on provider's site.
      edm:isShownBy - "The URL of the object itself (not the context page)."
    range: uri
```
---
## 5. Complete Mapping Documentation Template
When creating or updating a class with ontology mappings:
```yaml
classes:
  MyNewClass:
    # === PRIMARY IDENTITY ===
    class_uri: {prefix}:{ClassName}  # The ontology class this IS

    # === DESCRIPTION WITH ONTOLOGY REFERENCE ===
    description: |
      {Human-readable description of what this class represents}
      {Ontology}: {class} - "{Definition from ontology documentation}"
      Mapping rationale:
      - Chosen because: {why this ontology class fits}
      - Not using X because: {why alternatives were rejected}

    # === SKOS-BASED MAPPINGS ===
    exact_mappings:
      - {prefix}:{EquivalentClass}  # Same meaning, different vocabulary
    close_mappings:
      - {prefix}:{SimilarClass}  # Very similar but not identical
    narrow_mappings:
      - {prefix}:{MoreSpecificClass}  # External is narrower than ours
    broad_mappings:
      - {prefix}:{MoreGeneralClass}  # External is broader than ours
    related_mappings:
      - {prefix}:{RelatedClass}  # Conceptually related

    # === OPTIONAL ANNOTATIONS ===
    annotations:
      ontology_source: "{Full name of source ontology}"
      ontology_version: "{Version if applicable}"
      mapping_confidence: "high|medium|low"
      mapping_notes: "{Additional context}"
```
---
## 6. Validation Checklist
Before committing ontology mappings:
- [ ] `class_uri` / `slot_uri` points to a real URI in `data/ontology/` files
- [ ] Description includes ontology definition (quoted from source)
- [ ] Mapping rationale documented for non-obvious choices
- [ ] `exact_mappings` used ONLY for truly equivalent terms
- [ ] `close_mappings` documented with difference explanation
- [ ] All prefixes declared in schema's `prefixes:` block
- [ ] Prefixes resolve to valid ontology namespaces
---
## 7. Common Ontology Prefixes for Mappings
| Prefix | Namespace | Ontology | Use For |
|--------|-----------|----------|---------|
| `ore:` | `http://www.openarchives.org/ore/terms/` | OAI-ORE | Aggregation patterns |
| `edm:` | `http://www.europeana.eu/schemas/edm/` | Europeana Data Model | Cultural heritage aggregation |
| `dcat:` | `http://www.w3.org/ns/dcat#` | DCAT | Data catalogs, services |
| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | Records in Contexts | Archival description |
| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | CIDOC-CRM | Cultural heritage events |
| `schema:` | `http://schema.org/` | Schema.org | Web semantics |
| `skos:` | `http://www.w3.org/2004/02/skos/core#` | SKOS | Concepts, labels |
| `dcterms:` | `http://purl.org/dc/terms/` | Dublin Core | Metadata properties |
| `prov:` | `http://www.w3.org/ns/prov#` | PROV-O | Provenance |
| `org:` | `http://www.w3.org/ns/org#` | W3C Organization | Organizations |
| `foaf:` | `http://xmlns.com/foaf/0.1/` | FOAF | People, agents |
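To see what a CURIE from the table resolves to, the prefix is simply replaced by its namespace. The following is a small sketch using the namespaces listed above; `expand_curie` is a hypothetical helper for illustration and not the mechanism LinkML itself uses.

```python
# Namespaces copied from the prefix table above.
NAMESPACES = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "org": "http://www.w3.org/ns/org#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie: str) -> str:
    """Expand prefix:local into a full URI; raise KeyError for unknown prefixes."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in NAMESPACES:
        raise KeyError(f"Undeclared prefix in {curie!r}")
    return NAMESPACES[prefix] + local_name
```

Raising on an unknown prefix, rather than returning the input unchanged, makes a missing `prefixes:` declaration visible immediately.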
---
## See Also
- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [LinkML URIs and Mappings Guide](https://linkml.io/linkml/schemas/uris-and-mappings.html)
- [LinkML class_uri Reference](https://linkml.io/linkml-model/latest/docs/class_uri/)
- [LinkML slot_uri Reference](https://linkml.io/linkml-model/latest/docs/slot_uri/)
- Rule 1: Ontology Files Are Your Primary Reference
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 42: No Ontology Prefixes in Slot Names
---
**Version**: 1.0.0
**Created**: 2026-01-12
**Author**: OpenCODE


@ -0,0 +1,45 @@
# Rule: Polished Slot Storage Location
## Summary
Polished (refactored) canonical slot files MUST be stored in the parent `slots/` directory:
```
schemas/20251121/linkml/modules/slots/
```
They must **NOT** be stored in the `20260202_matang/` subdirectory.
## Rationale
The `20260202_matang/` subdirectory and its nested `new/` subdirectory contain **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
## Directory Structure
```
schemas/20251121/linkml/modules/slots/
├── *.yaml              ← Polished canonical slot files go HERE
└── 20260202_matang/
    ├── *.yaml          ← Draft/unpolished canonical slots (staging area)
    └── new/
        └── *.yaml      ← Raw/draft slot definitions pending triage
```
## Rule
- When polishing a slot file, write the result to `schemas/20251121/linkml/modules/slots/{slot_name}.yaml`
- If the source file was in `20260202_matang/`, remove it from there after writing to `slots/`
- If the source file was in `20260202_matang/new/`, it should only be deleted after user confirmation of alias absorption (per the no-autonomous-alias-assignment rule)
- If a file already exists in `slots/` (i.e., it was previously polished in an earlier session), overwrite it in place
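The four bullets above amount to a mapping from the source location to a destination and a cleanup step. A minimal sketch, assuming POSIX-style repository-relative paths; `plan_polish` and its cleanup labels are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath

SLOTS_DIR = PurePosixPath("schemas/20251121/linkml/modules/slots")
STAGING_DIR = "20260202_matang"


def plan_polish(source: str):
    """Return (destination, cleanup) for a slot file that is being polished."""
    src = PurePosixPath(source)
    # relative_to raises ValueError when the file is not under slots/.
    parent_parts = src.relative_to(SLOTS_DIR).parts[:-1]
    destination = str(SLOTS_DIR / src.name)
    if parent_parts == ():
        return destination, "overwrite_in_place"
    if parent_parts == (STAGING_DIR,):
        return destination, "remove_source"
    if parent_parts == (STAGING_DIR, "new"):
        # Deletion waits for user confirmation of alias absorption.
        return destination, "delete_source_after_user_confirmation"
    raise ValueError(f"Unexpected slot location: {source}")
```

The function only plans the move; writing the polished file and removing the source remain separate, explicit steps.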
## Examples
**CORRECT:**
```
schemas/20251121/linkml/modules/slots/has_pattern.yaml ← polished file
```
**WRONG:**
```
schemas/20251121/linkml/modules/slots/20260202_matang/has_pattern.yaml ← should not be here after polishing
```


@ -0,0 +1,32 @@
# Rule: Preserve Bespoke Slots Until Refactoring
**Identifier**: `preserve-bespoke-slots-until-refactoring`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT remove or migrate "additional" bespoke slots during generic migration passes unless they are the specific target of the current task.**
## Context
When migrating a specific slot (e.g., `has_approval_date`), you may encounter other bespoke or legacy slots in the same class file (e.g., `innovation_budget`, `operating_budget`).
**YOU MUST**:
* ✅ Migrate ONLY the specific slot you were instructed to work on.
* ✅ Leave other bespoke slots exactly as they are.
* ✅ Focus strictly on the current migration target.
**YOU MUST NOT**:
* ❌ Proactively migrate "nearby" slots just because they look like they need refactoring.
* ❌ Remove slots that seem unused or redundant without specific instruction.
* ❌ "Clean up" the class file by removing legacy attributes.
## Rationale
Refactoring is a separate, planned phase. Mixing opportunistic refactoring with systematic slot migration increases the risk of regression and makes changes harder to review. "We will refactor those later."
## Workflow
1. **Identify Target**: Identify the specific slot(s) assigned for migration (from `slot_fixes.yaml` or user prompt).
2. **Execute Migration**: Apply changes ONLY for those slots.
3. **Ignore Others**: Do not touch other slots in the file, even if they violate other rules (like Rule 39 or Rule 53). Those will be handled in their own dedicated tasks.


@ -0,0 +1,190 @@
# Rule 56: Semantic Consistency Over Simplicity - Always Execute slot_fixes.yaml Revisions
🚨 **CRITICAL**: When `slot_fixes.yaml` specifies a revision, agents MUST execute it. Perceived simplicity of the existing slot is NOT a valid reason to reject a migration.
## The Core Problem
Previous agents marked migrations as "NO MIGRATION NEEDED" or "WELL_STRUCTURED_NO_MIGRATION_NEEDED" citing reasons like:
- "simple enum appropriate"
- "would add unnecessary indirection"
- "already has proper slot_uri"
- "string range is sufficient for this use case"
- "well-structured, no migration needed"
- "current implementation is adequate"
**These judgments were INCORRECT.**
**Over 95% of entries in `slot_fixes.yaml` represent carefully curated migrations that MUST be executed.**
The revisions were manually designed based on:
- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment)
- Semantic correctness
- Pattern consistency (Rule 39: RiC-O style naming)
- Type/Types class hierarchy design (Rule 0b)
## Why Revisions MUST Be Executed
| Principle | Explanation |
|-----------|-------------|
| **Schema Consistency** | Ontology achieves semantic power through consistent patterns, not local optimizations |
| **LinkML Mapping Separation** | `slot_uri` handles external ontology alignment; slot structure handles internal consistency |
| **Single Responsibility Principle** | Predicates should have single, focused purposes |
| **Extensibility First** | Structured classes enable future extension even if current use is simple |
| **Curated Quality** | Revisions were manually designed with ontology expertise - trust them |
## Invalid Reasons to Reject Migrations
| Rejected Reason | Why It's Invalid |
|-----------------|------------------|
| "Already has proper slot_uri" | slot_uri is for external mapping; internal structure is separate concern |
| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity |
| "Would add unnecessary indirection" | Indirection enables reuse and future extension |
| "Creating a class would over-engineer" | Ontology design favors class-based modeling |
| "Well-structured, no migration needed" | **If revision exists, migration IS needed** |
| "Current implementation is adequate" | Adequacy is not the standard; consistency is |
| "WELL_STRUCTURED_NO_MIGRATION_NEEDED" | **This note type is PROHIBITED** |
## Valid Reasons to PAUSE Migrations (Not Reject)
These warrant discussion with the user, NOT unilateral rejection:
1. **Semantic Conflict**: Proposed slot_uri contradicts the semantic intent of the slot
2. **Class Already Exists**: The target class exists under a different name (propose merge)
3. **Circular Dependency**: Migration would create import cycles
4. **Breaking External API**: Change would break published external consumers
**Even these reasons require escalation to the user** - agents should NOT mark as "NO MIGRATION NEEDED".
## The Key Insight
Agents confused "has good external mapping" with "needs no migration". These are **independent concerns**:
- **External mapping** (slot_uri, exact_mappings): How this slot relates to external ontologies
- **Internal structure** (range, class design): How this slot fits within the LinkML schema
A slot can have perfect external mappings AND still need structural migration for internal consistency.
## Correct Agent Behavior
```yaml
# When encountering a slot_fixes.yaml entry:
1. READ the revision specification completely
2. DO NOT evaluate whether migration is "necessary" - it IS necessary
3. CREATE the specified slots and classes
4. UPDATE the consuming class(es)
5. ARCHIVE the old slot
6. ADD processed section with status: true
# NEVER add notes like:
# - "NO MIGRATION NEEDED"
# - "WELL_STRUCTURED_NO_MIGRATION_NEEDED"
# - "Current implementation is sufficient"
# - "Would add unnecessary complexity"
# - "Simple enum appropriate"
```
## Handling Previous "NO MIGRATION NEEDED" Entries
Entries previously marked with notes like "WELL_STRUCTURED_NO_MIGRATION_NEEDED" should be:
1. **Re-processed**: The migration should be executed as specified in `revision`
2. **Notes replaced**: Replace the incorrect notes with actual migration documentation
3. **Status verified**: Ensure `status: true` reflects ACTUAL migration, not skipped migration
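Finding the entries that need this treatment can be sketched as a filter over the loaded entries. The example assumes each entry is a dict shaped like the `slot_fixes.yaml` examples in this rule; `needs_reprocessing` and the marker list are illustrative assumptions, not an existing project script.

```python
# Note fragments that indicate a skipped migration (compared case-insensitively).
PROHIBITED_NOTE_MARKERS = ("NO MIGRATION NEEDED", "NO_MIGRATION_NEEDED")


def needs_reprocessing(entry: dict) -> bool:
    """Return True for entries marked done whose notes record a skipped migration."""
    processed = entry.get("processed") or {}
    notes = str(processed.get("notes") or "").upper()
    marked_done = bool(processed.get("status"))
    return marked_done and any(marker in notes for marker in PROHIBITED_NOTE_MARKERS)
```

Entries selected this way still have to be migrated as specified in `revision`; the filter only locates them.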
## Example - WRONG Agent Behavior
```yaml
# WRONG - Agent decided migration wasn't needed
- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix
  revision:
  - label: is_or_was_listed_in
    type: slot
  - label: CITESAppendix
    type: class
  processed:
    status: true  # ← Marked complete but NOT actually migrated!
    notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED: Already has proper slot_uri
      and string range is sufficient for CITES appendix values."
```
## Example - CORRECT Agent Behavior
```yaml
# CORRECT - Agent executed the migration as specified
- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix
  revision:
  - label: is_or_was_listed_in
    type: slot
  - label: CITESAppendix
    type: class
  processed:
    status: true
    timestamp: '2026-01-19T00:00:00Z'
    session: session-2026-01-19-cites-appendix-migration
    notes: 'Migrated 2026-01-19 per Rule 53/56. Created is_or_was_listed_in.yaml.
      Created CITESAppendix.yaml class. Updated BiologicalObject.yaml.
      Archived: modules/slots/archive/cites_appendix_archived_20260119.yaml.'
```
## Feedback Field
The `feedback` field in slot_fixes.yaml entries contains user corrections to agent mistakes. When feedback says things like:
- "I reject this!"
- "Conduct the migration"
- "Please conduct accordingly"
- "I altered the revision"
This means a previous agent incorrectly deferred the migration, and it MUST now be executed.
## Schema Consistency Examples
### Why "Simple URI is fine" is WRONG
```yaml
# WRONG - Agent judgment: "Simple URI is fine"
thumbnail_url:
  range: uri
  slot_uri: schema:thumbnailUrl

# CORRECT - Consistent with all media references
has_or_had_thumbnail:
  range: Thumbnail  # Thumbnail class with has_or_had_url → URL
```
**Rationale**: All media references (images, thumbnails, videos, documents) should use the same structural pattern.
### Why "Simple enum is appropriate" is WRONG
```yaml
# WRONG - "Simple enum is fine"
thinking_mode:
  range: ThinkingModeEnum  # enabled, disabled, interleaved

# CORRECT - Enables extension
has_or_had_mode:
  range: ThinkingMode
  # ThinkingMode can have: mode_type, confidence, effective_date, etc.
```
**Rationale**: Even if current use is simple, structured classes enable future extension without breaking changes.
## Summary
**Trust the revision. Execute the migration. Document the work.**
The `revision` key in `slot_fixes.yaml` represents carefully curated ontology decisions. Agents are **executors** of these decisions, **not evaluators**. The only acceptable output is a completed migration with proper documentation.
## Related Rules
- **Rule 53**: slot_fixes.yaml is AUTHORITATIVE - Full Slot Migration
- **Rule 55**: Broaden Generic Predicate Ranges Instead of Creating Bespoke Predicates
- **Rule 57**: The revision key in slot_fixes.yaml is IMMUTABLE
- **Rule 39**: RiC-O Temporal Naming Conventions
- **Rule 38**: Slot Centralization and Semantic URI Requirements
## Revision History
- 2026-01-19: Strengthened with explicit prohibition of "WELL_STRUCTURED_NO_MIGRATION_NEEDED" notes
- 2026-01-16: Created based on analysis of 51 feedback entries in slot_fixes.yaml

@ -0,0 +1,317 @@
# Rule 38: Slot Centralization and Semantic URI Requirements
🚨 **CRITICAL**: All LinkML slots MUST be centralized in `model/symbolic/schema/modules/slots/` and MUST have semantically sound `slot_uri` predicates from base ontologies.
---
## 1. Slot Centralization is Mandatory
**Location**: All slot definitions MUST be in `model/symbolic/schema/modules/slots/`
**File Naming**: `{slot_name}.yaml` (snake_case)
**Import Pattern**: Classes import slots via relative imports:
```yaml
# In modules/classes/Collection.yaml
imports:
- ../slots/collection_name
- ../slots/collection_type_ref
- ../slots/parent_collection
```
### Why Centralization?
1. **UML Visualization**: The frontend's schema service determines aggregation edges from the slots it loads out of the database into which the `modules/slots/` files are ingested. Inline slots in class files are NOT properly parsed for visualization.
2. **Reusability**: Slots can be used by multiple classes without duplication.
3. **Semantic Consistency**: Single source of truth for slot semantics prevents drift.
4. **Maintainability**: Changes to slot semantics propagate automatically to all classes.
### Anti-Pattern: Inline Slot Definitions
```yaml
# ❌ WRONG - Slots defined inline in class file
classes:
  Collection:
    slots:
      - collection_name
      - parent_collection

slots:  # ← This section in a class file is WRONG
  collection_name:
    range: string
```
```yaml
# ✅ CORRECT - Slots imported from centralized files
# In modules/classes/Collection.yaml
imports:
  - ../slots/collection_name
  - ../slots/parent_collection

classes:
  Collection:
    slots:
      - collection_name
      - parent_collection
```
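The inline-slot anti-pattern above could be detected with a check along these lines. This is a minimal sketch that assumes the class file has already been parsed into a Python dict (for example with a YAML loader); the function name `find_inline_slot_definitions` is hypothetical.

```python
def find_inline_slot_definitions(class_file):
    """Return the names of slots defined in a top-level `slots:` section.

    `class_file` is the parsed content of a file under modules/classes/.
    A class-level `slots:` list (slot names only) is fine; a top-level
    `slots:` mapping with definitions is the anti-pattern.
    """
    inline = class_file.get("slots") or {}
    return sorted(inline) if isinstance(inline, dict) else []
```

An empty result means the class file only references slots by name and relies on imports for their definitions.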
---
## 2. Every Slot MUST Have `slot_uri`
**`slot_uri`** provides the semantic meaning of the slot in a linked data context. It maps your slot to a predicate from an established ontology. Avoid pointing `slot_uri` at an external URI when no exact mapping exists. In that common case, the `slot_uri` should be a self-reference using the `hc` prefix.
### Required Slot File Structure
```yaml
# Global slot definition for {slot_name}
# Used by: {list of classes}
id: https://nde.nl/ontology/hc/slot/{slot_name}
name: {slot_name}

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  # Add ontology prefixes as needed
  rico: https://www.ica.org/standards/RiC/ontology#
  schema: http://schema.org/
  skos: http://www.w3.org/2004/02/skos/core#

slots:
  {slot_name}:
    slot_uri: {ontology_prefix}:{predicate}  # ← REQUIRED
    description: |
      Description of the slot's semantic meaning.
      {OntologyName}: {predicate} - "{definition from ontology}"
    range: {ClassName or primitive}
    required: true/false
    multivalued: true/false
    # Optional mappings for additional semantic relationships
    exact_mappings:
      - schema:alternatePredicate
    close_mappings:
      - dct:relatedPredicate
    examples:
      - value: {example}
        description: {explanation}
```
### Ontology Sources for `slot_uri`
Consult these base ontology files in `/data/ontology/`:
| Ontology | File | Namespace | Use Cases |
|----------|------|-----------|-----------|
| **RiC-O** | `RiC-O_1-1.rdf` | `rico:` | Archival records, record sets, custody |
| **CIDOC-CRM** | `CIDOC_CRM_v7.1.3.rdf` | `crm:` | Cultural heritage objects, events |
| **Schema.org** | `schemaorg.owl` | `schema:` | Web semantics, general properties |
| **SKOS** | `skos.rdf` | `skos:` | Labels, concepts, mappings |
| **Dublin Core** | `dublin_core_elements.rdf` | `dcterms:` | Metadata properties |
| **PROV-O** | `prov-o.ttl` | `prov:` | Provenance tracking |
| **PAV** | `pav.rdf` | `pav:` | Provenance, authoring, versioning |
| **TOOI** | `tooiont.ttl` | `tooi:` | Dutch government organizations |
| **CPOV** | `core-public-organisation-ap.ttl` | `cpov:` | EU public sector |
| **ORG** | `org.rdf` | `org:` | Organizations, units, roles |
| **FOAF** | `foaf.ttl` | `foaf:` | People, agents, social network |
| **GLEIF** | `gleif_base.ttl` | `gleif_base:` | Legal entities |
### Example: Correct Slot with `slot_uri`
```yaml
# modules/slots/preferred_label.yaml
id: https://nde.nl/ontology/hc/slot/preferred_label
name: preferred_label_slot

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
  schema: http://schema.org/
  rdfs: http://www.w3.org/2000/01/rdf-schema#

slots:
  preferred_label:
    slot_uri: skos:prefLabel  # ← REQUIRED
    description: |
      The primary display name for this entity.
      SKOS: prefLabel - "A preferred lexical label for a resource."
      This is the CANONICAL name - the standardized label accepted by the
      entity itself for public representation.
    range: string
    required: false
    exact_mappings:
      - schema:name
      - rdfs:label
    examples:
      - value: "Rijksmuseum"
        description: Primary display name for the Rijksmuseum
```
---
## 3. Mappings Can Apply to Both Classes AND Slots
LinkML provides SKOS-based mapping predicates that work on **both classes and slots**:
| Mapping Type | Predicate | Use Case |
|--------------|-----------|----------|
| `exact_mappings` | `skos:exactMatch` | Identical meaning |
| `close_mappings` | `skos:closeMatch` | Very similar meaning |
| `related_mappings` | `skos:relatedMatch` | Semantically related |
| `narrow_mappings` | `skos:narrowMatch` | More specific |
| `broad_mappings` | `skos:broadMatch` | More general |
### When to Use Mappings vs. slot_uri
| Scenario | Use |
|----------|-----|
| **Primary semantic identity** | `slot_uri` (exactly one) |
| **Equivalent predicates in other ontologies** | `exact_mappings` (multiple allowed) |
| **Similar but not identical predicates** | `close_mappings` |
| **Related predicates with different scope** | `narrow_mappings` / `broad_mappings` |
### Example: Slot with Multiple Mappings
```yaml
slots:
  website:
    slot_uri: gleif_base:hasWebsite  # Primary predicate
    range: uri
    description: |
      Official website URL of the organization or entity.
      gleif_base:hasWebsite - "A website associated with something"
    exact_mappings:
      - schema:url  # Identical meaning in Schema.org
    close_mappings:
      - foaf:homepage  # Similar but specifically "main" page
```
### Example: Class with Multiple Mappings
```yaml
classes:
  Collection:
    class_uri: rico:RecordSet  # Primary class
    exact_mappings:
      - crm:E78_Curated_Holding  # CIDOC-CRM equivalent
    close_mappings:
      - bf:Collection  # BIBFRAME close match
    narrow_mappings:
      - edm:ProvidedCHO  # Europeana (narrower - cultural heritage objects)
```
---
## 4. Workflow for Creating a New Slot
### Step 1: Search Base Ontologies
Before creating a slot, search for existing predicates:
```bash
# Search for relevant predicates
rg "website|homepage|url" /data/ontology/*.ttl /data/ontology/*.rdf /data/ontology/*.owl
# Check specific ontology
rg "rdfs:label|rdfs:comment" /data/ontology/schemaorg.owl | grep -i "name"
```
### Step 2: Document Ontology Alignment
In the slot file, document WHY you chose that predicate:
```yaml
slots:
  source_url:
    slot_uri: pav:retrievedFrom
    description: |
      URL of the web page from which data was retrieved.
      pav:retrievedFrom - "The URI from which the resource was retrieved."
      Chosen over:
      - schema:url (too generic - refers to the entity's URL, not source)
      - dct:source (refers to intellectual source, not retrieval location)
      - prov:wasDerivedFrom (refers to entity derivation, not retrieval)
```
### Step 3: Create Centralized Slot File
```bash
# Create new slot file
touch schemas/20251121/linkml/modules/slots/new_slot_name.yaml
```
### Step 4: Update Manifest
Run the manifest regeneration script, or add the new slot file to the manifest manually:
```bash
cd schemas/20251121/linkml
python3 scripts/regenerate_manifest.py
```
### Step 5: Import in Class Files
Add the import to classes that use this slot.
---
## 5. Validation Checklist
Before committing slot changes:
- [ ] Slot file is in `modules/slots/`
- [ ] Slot has `slot_uri` pointing to an established ontology predicate
- [ ] Predicate is from `data/ontology/` files or standard vocabularies
- [ ] Description includes ontology definition
- [ ] Rationale documented if multiple predicates were considered
- [ ] `exact_mappings`/`close_mappings` added for equivalent predicates
- [ ] Manifest updated to include new slot file
- [ ] Classes using the slot have been updated with import
- [ ] Frontend slot files synced: `frontend/public/schemas/20251121/linkml/modules/slots/`
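A few of the checklist items lend themselves to automation. The sketch below covers only three of them (slot defined, `slot_uri` present with a declared prefix, description present) and assumes the slot file has been parsed into a dict and that `slot_uri` is written as a CURIE such as `skos:prefLabel`; the function name `check_slot_definition` is hypothetical.

```python
def check_slot_definition(slot_name, slot_file):
    """Return a list of checklist problems for one parsed slot file."""
    slot = (slot_file.get("slots") or {}).get(slot_name)
    if slot is None:
        return [f"slot '{slot_name}' is not defined in the file"]

    problems = []
    slot_uri = slot.get("slot_uri")
    if not slot_uri:
        problems.append("missing slot_uri")
    else:
        # Assumes CURIE form; the prefix must be declared under `prefixes`.
        prefix = slot_uri.split(":", 1)[0]
        if ":" not in slot_uri or prefix not in (slot_file.get("prefixes") or {}):
            problems.append(f"slot_uri prefix '{prefix}' is not declared in prefixes")
    if not (slot.get("description") or "").strip():
        problems.append("missing description")
    return problems
```

The remaining items (documented rationale, manifest update, frontend sync) depend on repository state and are left to manual review in this sketch.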
---
## 6. Common Slot URI Mappings
| Slot Concept | Recommended `slot_uri` | Alternative Mappings |
|--------------|------------------------|---------------------|
| Preferred name | `skos:prefLabel` | `schema:name`, `rdfs:label` |
| Alternative names | `skos:altLabel` | `schema:alternateName` |
| Description | `dcterms:description` | `schema:description`, `rdfs:comment` |
| Identifier | `dcterms:identifier` | `schema:identifier` |
| Website URL | `gleif_base:hasWebsite` | `schema:url`, `foaf:homepage` |
| Source URL | `pav:retrievedFrom` | `prov:wasDerivedFrom` |
| Created date | `dcterms:created` | `schema:dateCreated`, `prov:generatedAtTime` |
| Modified date | `dcterms:modified` | `schema:dateModified` |
| Language | `schema:inLanguage` | `dcterms:language` |
| Part of | `dcterms:isPartOf` | `rico:isOrWasPartOf`, `schema:isPartOf` |
| Has part | `dcterms:hasPart` | `rico:hasOrHadPart`, `schema:hasPart` |
| Location | `schema:location` | `locn:address`, `crm:P53_has_former_or_current_location` |
| Start date | `schema:startDate` | `prov:startedAtTime`, `rico:hasBeginningDate` |
| End date | `schema:endDate` | `prov:endedAtTime`, `rico:hasEndDate` |
---
## See Also
- [LinkML slot_uri documentation](https://linkml.io/linkml-model/latest/docs/slot_uri/)
- [LinkML mappings documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [LinkML URIs and Mappings guide](https://linkml.io/linkml/schemas/uris-and-mappings.html)
- Rule 1: Ontology Files Are Your Primary Reference
- Rule 0: LinkML Schemas Are the Single Source of Truth
---
**Version**: 1.0.0
**Created**: 2026-01-06
**Author**: OpenCODE

@ -0,0 +1,29 @@
# Rule: Slot Fixes File is Authoritative
**Scope:** Schema Migration / Slot Fixes
**Description:**
The file `slot_fixes.yaml` is the **single authoritative source** for tracking slot migrations and fixes.
**Directives:**
1. **Authoritative Source:** Always read and update `slot_fixes.yaml`.
2. **Processed Status:** When a slot migration is completed (schema updated, data migrated), you MUST update the entry in `slot_fixes.yaml` with a `processed` block containing:
    * `status: true`
    * `date: 'YYYY-MM-DD'`
    * `notes`: Brief description of what was done.
3. **NEVER DELETE:** You MUST NOT delete entries from `slot_fixes.yaml`. Even if a slot is removed from the schema, the record of its fix MUST remain in this file with `status: true`.
4. **Format Compliance:** New slots added during migration must follow proper LinkML format conventions and use `slot_uri` and mappings (`exact_mappings`, `close_mappings`) that reference **legitimate predicates and classes found in `/Users/kempersc/apps/glam/data/ontology/`**.
**Example of Processed Entry:**
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/has_old_slot
  revision:
    - label: has_new_slot
      type: slot
    - label: NewClass
      type: class
  processed:
    status: true
    date: '2026-01-27'
    notes: Migrated to has_new_slot + NewClass. Old slot archived.
```
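Directives 2 and 3 can be combined in a small helper that updates an entry in place and never removes anything. This is a sketch under the assumption that `slot_fixes.yaml` has been loaded as a list of dicts; the function name `mark_processed` and its `today` parameter are hypothetical.

```python
from datetime import date


def mark_processed(entries, original_slot_id, notes, today=None):
    """Set the `processed` block on the matching entry and return it.

    Entries are only updated, never deleted, in line with directive 3.
    """
    for entry in entries:
        if entry.get("original_slot_id") == original_slot_id:
            entry["processed"] = {
                "status": True,
                "date": (today or date.today()).isoformat(),
                "notes": notes,
            }
            return entry
    raise KeyError(f"no entry for {original_slot_id}")
```

Raising on an unknown `original_slot_id` is a deliberate choice here: silently appending a new entry would bypass human curation of the file.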

@ -0,0 +1,169 @@
# Rule: slot_fixes.yaml Revision Key Immutability
## Status: CRITICAL
## Summary
The `revision` key in `slot_fixes.yaml` is **IMMUTABLE**. AI agents MUST follow revision specifications exactly and are NEVER permitted to modify the content of revision entries.
## The Authoritative Source
The file `slot_fixes.yaml` serves as the **curated migration specification** for all slot consolidations in the Heritage Custodian Ontology. Each entry's `revision` section was manually curated based on:
- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment)
- Semantic correctness
- Pattern consistency (Rule 39: RiC-O style naming)
- Type/Types class hierarchy design (Rule 0b)
## What Agents CAN Do
| Action | Permitted | Location |
|--------|-----------|----------|
| Add completion notes | ✅ YES | `processed.notes` |
| Update status | ✅ YES | `processed.status` |
| Add feedback responses | ✅ YES | `feedback.response` |
| Mark feedback as done | ✅ YES | `feedback.done` |
| Execute the migration per revision | ✅ YES | Class/slot files |
## What Agents CANNOT Do
| Action | Permitted | Reason |
|--------|-----------|--------|
| Modify `revision` content | ❌ NEVER | Authoritative specification |
| Substitute different slots | ❌ NEVER | Violates curated design |
| Skip revision components | ❌ NEVER | Incomplete migration |
| Add new revision items | ❌ NEVER | Requires human curation |
| Change revision labels | ❌ NEVER | Breaks semantic mapping |
| Reorder revision items | ❌ NEVER | `link_branch` dependencies |
## Structure of slot_fixes.yaml Entries
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot
  original_slot_label: example_slot
  revision:  # ← IMMUTABLE - DO NOT MODIFY
    - label: has_or_had_example  # Generic slot to use
      type: slot
    - label: Example  # Class for range
      type: class
    - label: has_or_had_attribute  # Nested attribute (link_branch: 1)
      type: slot
      link_branch: 1
  processed:
    status: false  # ← CAN UPDATE to true
    notes: ""  # ← CAN ADD notes here
  feedback:  # ← CAN ADD responses here
    user: "Simon C. Kemper"
    date: "2026-01-17"
    message: "Migration incomplete"
    done: false  # ← CAN UPDATE to true
    response: ""  # ← CAN ADD response here
```
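One way to guard the immutability constraint is to compare an entry before and after an agent has touched it. The sketch below assumes both versions are available as parsed dicts; the function name `revision_unchanged` is hypothetical. Because list comparison in Python is order-sensitive, reordered revision items are reported as a change too.

```python
def revision_unchanged(before, after):
    """Return True if the `revision` section is identical in both versions.

    Detects modified labels, added or removed items, and reordering.
    """
    return before.get("revision") == after.get("revision")
```

A check like this could run on every entry before the updated `slot_fixes.yaml` is written back.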
## Understanding `link_branch` in Revisions
The `link_branch` field indicates **nested class attributes**:
| Revision Item | Meaning |
|---------------|---------|
| Items **WITHOUT** `link_branch` | PRIMARY slot and class to create |
| Items **WITH** `link_branch: 1` | First attribute the primary class needs |
| Items **WITH** `link_branch: 2` | Second attribute the primary class needs |
**Example**:
```yaml
revision:
  - label: has_or_had_quantity  # PRIMARY SLOT
    type: slot
  - label: Quantity  # PRIMARY CLASS
    type: class
  - label: has_or_had_measurement_unit  # Quantity.has_or_had_measurement_unit
    type: slot
    link_branch: 1
  - label: MeasureUnit  # Range of branch 1 slot
    type: class
    link_branch: 1
```
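To make the `link_branch` semantics concrete, the following sketch splits a parsed revision list into primary items and per-branch items. It assumes each item is a dict with `label`, `type`, and an optional `link_branch`; the function name `group_revision` is hypothetical.

```python
def group_revision(revision):
    """Split revision items into primary labels and labels per link_branch."""
    primary = []
    branches = {}
    for item in revision:
        branch = item.get("link_branch")
        if branch is None:
            # No link_branch: primary slot or class to create
            primary.append(item["label"])
        else:
            # link_branch N: attribute (and its range) on the primary class
            branches.setdefault(branch, []).append(item["label"])
    return primary, branches
```

Applied to the example above, the primary list holds `has_or_had_quantity` and `Quantity`, while branch 1 holds `has_or_had_measurement_unit` and `MeasureUnit`.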
## Migration Workflow
1. **READ** the `revision` section completely
2. **VERIFY** all referenced slots/classes exist (or create them)
3. **REMOVE** old slot from imports, slots list, and slot_usage in consuming classes
4. **ADD** new slot(s) and class import(s) per revision specification
5. **UPDATE** slot_usage to narrow range to specified class
6. **VALIDATE** with `linkml-lint` or `gen-owl`
7. **UPDATE** slot_fixes.yaml:
    - Set `processed.status: true`
    - Add completion note to `processed.notes`
    - If feedback exists, set `feedback.done: true` and add `feedback.response`
## Anti-Patterns
### WRONG - Modifying Revision Content
```yaml
# Agent incorrectly "improves" the revision
revision:
- label: has_description # ❌ CHANGED from has_or_had_description
type: slot
- label: TextDescription # ❌ CHANGED from Description
type: class
```
### WRONG - Substituting Different Slots
```yaml
# Agent uses a different slot than specified
# Revision says: has_or_had_type + BindingType
# Agent uses: binding_classification + BindingClassification ❌ WRONG
```
### WRONG - Partial Migration
```yaml
# Agent only creates the slot, ignores the class
revision:
  - label: has_or_had_type  # ✅ Agent created this
    type: slot
  - label: BindingType  # ❌ Agent ignored this
    type: class
```
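The partial-migration anti-pattern above can be caught mechanically by checking every revision item against what actually exists in the schema. This is a sketch only: it assumes the caller supplies the sets of existing slot and class names (for example, collected from file names under `modules/slots/` and `modules/classes/`), and the function name `missing_components` is hypothetical.

```python
def missing_components(revision, existing_slots, existing_classes):
    """Return (type, label) pairs from the revision that do not exist yet."""
    missing = []
    for item in revision:
        pool = existing_slots if item["type"] == "slot" else existing_classes
        if item["label"] not in pool:
            missing.append((item["type"], item["label"]))
    return missing
```

A migration would only be marked as processed when this list is empty.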
### CORRECT - Following Revision Exactly
```yaml
# Revision specifies:
revision:
  - label: has_or_had_description
    type: slot
  - label: Description
    type: class

# Agent creates/uses EXACTLY:
# 1. Import ../slots/has_or_had_description
# 2. Import ../classes/Description
# 3. slot_usage: has_or_had_description with range: Description
```
## Rationale
1. **Curated Quality**: Revisions were manually designed with ontology expertise
2. **Consistency**: Same patterns applied across all migrations
3. **Auditability**: Clear record of intended vs. actual changes
4. **Reversibility**: Original specifications preserved for review
5. **Trust**: Users can rely on revision specifications being stable
## Related Rules
- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
- **Rule 56**: Semantic Consistency Over Simplicity
- **Rule 39**: Slot Naming Convention (RiC-O Style)
- **Rule 38**: Slot Centralization and Semantic URI Requirements
- **Rule 0b**: Type/Types File Naming Convention
## See Also
- `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` - The authoritative file
- `.opencode/rules/full-slot-migration-rule.md` - Migration execution rules
- `.opencode/rules/semantic-consistency-over-simplicity.md` - Why revisions must be followed

@ -0,0 +1,69 @@
# Rule: Slot Naming Convention (Current Style)
🚨 **CRITICAL**: New LinkML slot names MUST follow the current verb-first naming style used in active slot files under `modules/slots/`.
## Core Naming Rules
1. Use `snake_case`.
2. Prefer short, descriptive verb predicates as canonical names.
3. Keep names ontology-neutral (no ontology namespace prefixes in slot names).
4. Use singular nouns in object positions (including multivalued slots).
5. Keep temporal semantics in mappings/definitions when needed, not by forcing a legacy prefix.
## Preferred Patterns
### 1) Simple verb predicates (default)
Use a single verb when it clearly expresses the relation.
Examples from active slots:
- `accept`
- `contain`
- `catalogue`
- `exhibit`
### 2) Verb + particle/preposition when needed
Use compact phrasal forms when a preposition carries core meaning.
Examples:
- `belong_to`
- `located_in`
- `derived_from`
### 3) Symmetric or directional pair pattern
Use `<present>_or_<past_participle>` when both directions/states are intentionally modeled in one predicate label.
Examples:
- `contains_or_contained`
- `includes_or_included`
- `operates_or_operated`
## Legacy Compatibility
- For migrations, keep backward compatibility via `aliases` when renaming to current-style canonical names.
- Do not rename canonical slots opportunistically; follow migration plans and canonical-slot protection rules.
## Anti-Patterns
- ❌ `rico_has_or_had_holder` (ontology prefix in name)
- ❌ `collections` (plural noun predicate)
- ❌ `has_museum_visitor_count` (class-specific slot name)
- ❌ Creating new `has_or_had_*` names by default when a verb predicate is clearer
## Quick Checklist
- [ ] Is the canonical slot name verb-first and descriptive?
- [ ] Is it `snake_case`?
- [ ] Is the noun part singular?
- [ ] Is the name ontology-neutral?
- [ ] If renaming legacy slots, are aliases/migration constraints handled?
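Two of the checklist items (`snake_case` and ontology neutrality) are easy to test mechanically. The sketch below is illustrative: the prefix tuple `ONTOLOGY_PREFIXES` is a hypothetical, incomplete list, and the function name `lint_slot_name` is an assumption. Verb-first style and singular nouns need human judgment and are not covered.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Hypothetical, incomplete list of namespace prefixes to reject in slot names.
ONTOLOGY_PREFIXES = ("rico", "skos", "prov", "crm", "dcterms", "foaf")


def lint_slot_name(name):
    """Return naming problems found in a candidate slot name."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    if name.split("_", 1)[0] in ONTOLOGY_PREFIXES:
        problems.append("ontology prefix in name")
    return problems
```

For instance, `belong_to` passes, while `rico_has_or_had_holder` is reported for its ontology prefix.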
## See Also
- `.opencode/rules/archive/DEPRECATED-slot-naming-convention-rico-style.md`
- `.opencode/rules/no-ontology-prefix-in-slot-names.md`
- `.opencode/rules/slot-noun-singular-convention.md`
- `.opencode/rules/generic-slots-specific-classes.md`
- `.opencode/rules/canonical-slot-protection-rule.md`

@ -0,0 +1,80 @@
# Rule: Slot Nouns Must Be Singular
🚨 **CRITICAL**: LinkML slot names MUST use singular nouns, even for multivalued slots. The `multivalued: true` property indicates cardinality, not the slot name.
## Rationale
1. **Predicate semantics**: Slots represent predicates/relationships. In RDF, `hasCollection` can have multiple objects without changing the predicate name.
2. **Consistency**: Singular names work for both single-valued and multivalued slots.
3. **Ontology alignment**: Standard ontologies use singular predicates (`skos:broader`, `org:hasMember`, `rico:hasOrHadHolder`).
4. **Readability**: `custodian.has_or_had_custodian_type` reads naturally as "custodian has (or had) custodian type".
## Correct Pattern
```yaml
slots:
  has_or_had_custodian_type:  # ✅ CORRECT - singular noun
    slot_uri: org:classification
    range: CustodianType
    multivalued: true  # Cardinality expressed here, not in name

  has_or_had_collection:  # ✅ CORRECT - singular noun
    slot_uri: rico:hasOrHadPart
    range: CustodianCollection
    multivalued: true

  has_or_had_member:  # ✅ CORRECT - singular noun
    slot_uri: org:hasMember
    range: Custodian
    multivalued: true
```
## Incorrect Pattern
```yaml
slots:
  has_or_had_custodian_types:  # ❌ WRONG - plural noun
    multivalued: true

  collections:  # ❌ WRONG - plural noun
    multivalued: true

  members:  # ❌ WRONG - plural noun
    multivalued: true
```
## Migration Examples
| Old (Plural) | New (Singular) |
|--------------|----------------|
| `custodian_types` | `has_or_had_custodian_type` |
| `collections` | `has_or_had_collection` |
| `identifiers` | `identifier` |
| `alternative_names` | `alternative_name` |
| `staff_members` | `staff_member` |
## Exceptions
**Compound concepts** where the plural is part of the concept name itself:
- `archives_regionales` - French administrative term (proper noun)
- `united_states` - Geographic proper noun
**NOT exceptions** (still use singular):
- `has_or_had_identifier` not `has_or_had_identifiers` (even if institution has multiple)
- `broader_type` not `broader_types` (even if multiple broader types)
## Implementation
When creating or renaming slots:
1. Extract the noun from the slot name
2. Convert to singular form
3. Combine with relationship prefix (`has_or_had_`, `is_or_was_`, etc.)
4. Set `multivalued: true` if multiple values are expected
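The first three steps can be sketched as a small helper. This is a deliberately naive illustration: it only strips a trailing `s` (or turns `ies` into `y`) on the last word, so irregular plurals and words such as `status` would need manual handling. The names `PLURAL_EXCEPTIONS` and `singular_slot_name` are hypothetical; the exception set contains only the two compound concepts listed above.

```python
# Compound concepts where the plural belongs to the name itself.
PLURAL_EXCEPTIONS = {"archives_regionales", "united_states"}


def singular_slot_name(name, prefix=""):
    """Singularize the last word of a slot name and optionally add a prefix.

    Naive heuristic: 'ies' -> 'y', otherwise drop one trailing 's'
    (but keep a double 's').
    """
    if name in PLURAL_EXCEPTIONS:
        return name
    head, _, noun = name.rpartition("_")
    if noun.endswith("ies"):
        noun = noun[:-3] + "y"
    elif noun.endswith("s") and not noun.endswith("ss"):
        noun = noun[:-1]
    base = f"{head}_{noun}" if head else noun
    return base if base.startswith(prefix) else prefix + base
```

With `prefix="has_or_had_"`, `custodian_types` becomes `has_or_had_custodian_type`; without a prefix, `identifiers` becomes `identifier`. Step 4 (setting `multivalued: true`) remains a schema edit, not a naming concern.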
## See Also
- `.opencode/rules/slot-naming-convention-current-style.md` - Current slot naming patterns
- `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` - Slot centralization requirements

@ -0,0 +1,174 @@
# Rule 49: Slot Usage Minimization - No Redundant Overrides
## Summary
LinkML `slot_usage` entries MUST provide meaningful modifications to the generic slot definition. Redundant `slot_usage` entries that merely re-declare the same values as the generic slot MUST be removed.
## Background
### What is slot_usage?
In LinkML, [`slot_usage`](https://linkml.io/linkml-model/latest/docs/slot_usage/) allows a class to customize how an inherited slot behaves within that specific class context. It enables:
- Narrowing the `range` to a more specific type
- Adding class-specific `required`, `multivalued`, or `identifier` constraints
- Providing class-specific `description`, `examples`, or `pattern` overrides
- Adding class-specific semantic mappings (`exact_mappings`, `close_mappings`, etc.)
### The Problem
A code generation process created **874 redundant `slot_usage` entries** across **374 class files** that simply re-declare the same `range` and `inlined` values already defined in the generic slot:
```yaml
# In modules/slots/template_specificity.yaml (GENERIC DEFINITION)
slots:
  template_specificity:
    slot_uri: hc:templateSpecificity
    range: TemplateSpecificityScores
    inlined: true

# In modules/classes/AdministrativeOffice.yaml (REDUNDANT OVERRIDE)
slot_usage:
  template_specificity:
    range: TemplateSpecificityScores  # Same as generic!
    inlined: true  # Same as generic!
```
This creates:
1. **Visual noise** in the schema viewer (slot_usage badge displayed when nothing is actually customized)
2. **Maintenance burden** (changes to generic slot must be mirrored in 374 files)
3. **Semantic confusion** (suggests customization where none exists)
## The Rule
### MUST Remove: Truly Redundant Overrides
A `slot_usage` entry is **truly redundant** and MUST be removed if:
1. **All properties match the generic slot definition exactly**
2. **No additional properties are added** (no extra `examples`, `description`, `required`, etc.)
```yaml
# REDUNDANT - Remove this entire slot_usage entry
slot_usage:
  template_specificity:
    range: TemplateSpecificityScores
    inlined: true
```
### MAY Keep: Description-Only Modifications
A `slot_usage` entry that ONLY modifies the `description` by adding articles or context MAY be kept if it provides **semantic value** by referring to a specific entity rather than a general concept.
**Tolerated Example** (adds definiteness):
```yaml
# Generic slot
slots:
  has_or_had_record_set:
    description: Record sets associated with a custodian.
    range: RecordSet

# Class-specific slot_usage - TOLERABLE
slot_usage:
  has_or_had_record_set:
    description: The record sets held by this archive.  # "The" makes it definite
```
**Rationale**: "The record sets" (definite) vs "record sets" (indefinite) conveys that this class specifically requires/expects record sets, rather than merely allowing them. This is a **semantic distinction** in linguistic terms (definiteness marking).
### MUST Keep: Meaningful Modifications
A `slot_usage` entry MUST be kept if it provides ANY of the following:
| Modification Type | Example |
|-------------------|---------|
| **Range narrowing** | `range: MuseumCollection` (from generic `Collection`) |
| **Required constraint** | `required: true` (when generic is optional) |
| **Pattern override** | `pattern: "^NL-.*"` (Dutch ISIL codes only) |
| **Examples addition** | Class-specific examples not in generic |
| **Inlined change** | `inlined: true` when generic is `false` |
| **Identifier designation** | `identifier: true` for primary key |
## Decision Matrix
| Scenario | Action |
|----------|--------|
| All properties match generic exactly | **REMOVE** |
| Override sets only `range` and/or `inlined`, and both match generic | **REMOVE** |
| Only `description` differs by adding articles | **TOLERATE** (but consider removing) |
| `description` provides substantive new information | **KEEP** |
| Any other property modified | **KEEP** |
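
The decision matrix can be approximated in code as follows. This is a sketch, not the project's actual tooling: the function name `classify_override` and the `REVIEW_DESCRIPTION` outcome are hypothetical. Whether a changed description merely adds articles or carries substantive information cannot be decided mechanically, so that case is returned for human review.

```python
def classify_override(override, generic):
    """Classify a slot_usage override against the generic slot definition.

    Returns "REMOVE", "REVIEW_DESCRIPTION", or "KEEP".
    """
    # Properties whose value differs from (or is absent in) the generic slot
    differing = {key for key, value in override.items() if generic.get(key) != value}
    if not differing:
        return "REMOVE"
    if differing == {"description"}:
        # TOLERATE vs. KEEP needs a human reading of the two descriptions
        return "REVIEW_DESCRIPTION"
    return "KEEP"
```

Any override that narrows `range`, adds `required`, or sets another property not present in the generic slot falls through to `KEEP`.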
## Implementation
### Cleanup Script
Use the following to identify and remove redundant overrides:
```python
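# scripts/cleanup_redundant_slot_usage.py
import glob

import yaml

SLOTS_DIR = 'schemas/20251121/linkml/modules/slots'
CLASSES_GLOB = 'schemas/20251121/linkml/modules/classes/*.yaml'
SLOTS_TO_CHECK = ['template_specificity', 'specificity_annotation']


def load_generic(slot_name):
    """Load the generic definition of a slot from its centralized file."""
    with open(f'{SLOTS_DIR}/{slot_name}.yaml') as f:
        return (yaml.safe_load(f) or {}).get('slots', {}).get(slot_name) or {}


def is_redundant(override, slot_name):
    """True if every property in the override repeats the generic value."""
    generic = load_generic(slot_name)
    return all(generic.get(key) == value for key, value in (override or {}).items())


for class_file in glob.glob(CLASSES_GLOB):
    with open(class_file) as f:
        content = yaml.safe_load(f) or {}

    modified = False
    for cls_name, cls_def in (content.get('classes') or {}).items():
        slot_usage = (cls_def or {}).get('slot_usage') or {}
        for slot_name in SLOTS_TO_CHECK:
            if slot_name in slot_usage:
                # Check if redundant (all properties match the generic slot)
                if is_redundant(slot_usage[slot_name], slot_name):
                    del slot_usage[slot_name]
                    modified = True
        # Remove empty slot_usage (only if the key is actually present)
        if cls_def and 'slot_usage' in cls_def and not cls_def['slot_usage']:
            del cls_def['slot_usage']

    if modified:
        with open(class_file, 'w') as f:
            yaml.dump(content, f, allow_unicode=True, sort_keys=False)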
```
### Validation
After cleanup, validate that:
1. `linkml-validate` passes for all schemas
2. Generated RDF/OWL output is unchanged (redundant overrides have no semantic effect)
3. Frontend slot viewer shows fewer `slot_usage` badges
## Frontend UX Implications
The frontend LinkML viewer should:
1. **Display "Slot Usage"** (with space, no underscore) instead of `slot_usage`
2. **Add tooltip** explaining what slot_usage means, linking to [LinkML documentation](https://linkml.io/linkml-model/latest/docs/slot_usage/)
3. **Only show badge** when `slot_usage` contains meaningful modifications
4. **Comparison view** should highlight actual differences, not redundant re-declarations
## Affected Slots
Current analysis found redundant overrides for:
| Slot | Redundant Overrides | Files Affected |
|------|---------------------|----------------|
| `template_specificity` | 873 | 374 |
| `specificity_annotation` | 874 | 374 |
## References
- [LinkML slot_usage documentation](https://linkml.io/linkml-model/latest/docs/slot_usage/)
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 48: Class Files Must Not Define Inline Slots
## Version History
| Date | Change |
|------|--------|
| 2026-01-12 | Initial rule created after identifying 874 redundant slot_usage entries |

@ -0,0 +1,401 @@
# Rule: Specificity Score Convention for LinkML Schema Annotations
**Version**: 1.0.0
**Created**: 2025-01-04
**Status**: Active
**Applies to**: `schemas/20251121/linkml/modules/classes/*.yaml`
---
## Rule Statement
Every class in the Heritage Custodian Ontology MUST have specificity score annotations to enable intelligent filtering for RAG retrieval and UML visualization.
---
## Annotation Schema
### Required Annotations
Every class YAML file MUST include these annotations:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75  # Required: General specificity (0.0-1.0)
      specificity_rationale: "..."  # Required: Why this score was assigned
```
### Optional Annotations
Template-specific scores for context-aware filtering:
```yaml
classes:
  ClassName:
    annotations:
      specificity_score: 0.75
      specificity_rationale: "..."
      template_specificity:  # Optional: Template-specific scores
        archive_search: 0.95
        museum_search: 0.20
        person_research: 0.30
```
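The annotation schema above could be validated with a check like the one below. It is a minimal sketch that assumes the `annotations` mapping of a class has been parsed into a dict; the function name `check_specificity_annotations` is hypothetical.

```python
def check_specificity_annotations(annotations):
    """Return problems found in a class's specificity annotations."""
    problems = []

    score = annotations.get("specificity_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        problems.append("specificity_score must be a number in [0.0, 1.0]")

    if not str(annotations.get("specificity_rationale") or "").strip():
        problems.append("specificity_rationale is required")

    # template_specificity is optional; validate values only when present
    for template, value in (annotations.get("template_specificity") or {}).items():
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"template_specificity.{template} must be in [0.0, 1.0]")
    return problems
```

An empty list means the two required annotations are present and every score lies in the documented 0.0 to 1.0 range.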
---
## Score Semantics
### General Specificity Score
The `specificity_score` measures how **context-dependent** a class is:
| Score Range | Meaning | Example Classes |
|-------------|---------|-----------------|
| 0.00-0.20 | **Universal** - relevant in almost all contexts | `HeritageCustodian`, `CustodianName`, `Location` |
| 0.20-0.40 | **Broadly useful** - relevant in most contexts | `Collection`, `Identifier`, `GHCID` |
| 0.40-0.60 | **Moderately specific** - relevant in several contexts | `ChangeEvent`, `PersonProfile`, `DigitalPlatform` |
| 0.60-0.80 | **Fairly specific** - relevant in limited contexts | `Archive`, `Museum`, `Library`, `FindingAid` |
| 0.80-1.00 | **Highly specific** - relevant only in specialized contexts | `LinkedInConnectionExtraction`, `GHCIDHistoryEntry` |
**Key Insight**: Lower scores = MORE generally relevant (always useful in RAG); Higher scores = MORE specific (only useful in specialized queries).
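The score ranges in the table can be turned into a lookup. Because adjacent ranges share their boundary values (0.20, 0.40, and so on), this sketch assumes half-open intervals where a boundary value belongs to the higher band, with 1.00 included in the top band; that tie-breaking choice and the names `BANDS` and `specificity_band` are assumptions, not part of the rule.

```python
# (exclusive upper bound, label), in ascending order
BANDS = [
    (0.20, "Universal"),
    (0.40, "Broadly useful"),
    (0.60, "Moderately specific"),
    (0.80, "Fairly specific"),
    (1.00, "Highly specific"),
]


def specificity_band(score):
    """Map a specificity score in [0.0, 1.0] to its band label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for upper, label in BANDS:
        if score < upper:
            return label
    return BANDS[-1][1]  # score == 1.0
```

With this convention, `HeritageCustodian` at 0.15 lands in the Universal band and `Archive` at 0.70 in the Fairly specific band.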
---
### Template Specificity Scores
The `template_specificity` maps class relevance to 10 conversation templates:
| Template ID | Focus Area | Example High-Score Classes |
|-------------|------------|---------------------------|
| `archive_search` | Archives and archival holdings | `Archive`, `RecordSet`, `Fonds` |
| `museum_search` | Museums and exhibitions | `Museum`, `Gallery`, `Exhibition` |
| `library_search` | Libraries and catalogs | `Library`, `Catalog`, `BibliographicCollection` |
| `collection_discovery` | Collections and holdings | `Collection`, `Accession`, `Extent` |
| `person_research` | People and staff | `PersonProfile`, `Staff`, `Role` |
| `location_browse` | Geographic information | `Location`, `Address`, `GeoCoordinates` |
| `identifier_lookup` | Identifiers (ISIL, Wikidata) | `Identifier`, `GHCID`, `ISIL` |
| `organizational_change` | History and changes | `ChangeEvent`, `Founding`, `Merger` |
| `digital_platform` | Online resources | `DigitalPlatform`, `Website`, `API` |
| `general_heritage` | Fallback/general | Uses `specificity_score` directly |
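
A lookup that respects the `general_heritage` fallback might look like this. It is a sketch: the function name `template_score` is hypothetical, and falling back to `specificity_score` when a template has no explicit entry is an assumption (the rule only states this behaviour for `general_heritage`).

```python
def template_score(annotations, template_id):
    """Return the score to use for a class under a conversation template."""
    scores = annotations.get("template_specificity") or {}
    if template_id != "general_heritage" and template_id in scores:
        return scores[template_id]
    # general_heritage uses specificity_score directly; templates without
    # an explicit entry fall back to it as well (assumption)
    return annotations["specificity_score"]
```

For a class annotated with `specificity_score: 0.70` and `archive_search: 0.95`, an archive query would use 0.95 and a general heritage query 0.70.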
---
## Examples
### Example 1: Universal Class (Low Specificity)
```yaml
# modules/classes/HeritageCustodian.yaml
classes:
  HeritageCustodian:
    description: >-
      Base class for all heritage custodian institutions.
    annotations:
      specificity_score: 0.15
      specificity_rationale: >-
        Universal base class relevant in virtually all heritage contexts.
        Every query about heritage institutions implicitly involves this class.
      template_specificity:
        archive_search: 0.65
        museum_search: 0.65
        library_search: 0.65
        collection_discovery: 0.70
        person_research: 0.70
        location_browse: 0.75
        identifier_lookup: 0.70
        organizational_change: 0.75
        digital_platform: 0.70
        general_heritage: 0.15
```
### Example 2: Domain-Specific Class (High Specificity)
```yaml
# modules/classes/Archive.yaml
classes:
  Archive:
    is_a: HeritageCustodian
    description: >-
      An archive institution holding historical records and documents.
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type. Highly relevant for archival research
        but not needed for museum or library queries.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
### Example 3: Technical Class (Very High Specificity)
```yaml
# modules/classes/LinkedInConnectionExtraction.yaml
classes:
  LinkedInConnectionExtraction:
    description: >-
      Technical class for extracting LinkedIn connection data.
    annotations:
      specificity_score: 0.95
      specificity_rationale: >-
        Internal extraction class with no semantic significance for end users.
        Only relevant when specifically researching data extraction processes.
      template_specificity:
        archive_search: 0.05
        museum_search: 0.05
        library_search: 0.05
        collection_discovery: 0.05
        person_research: 0.40
        location_browse: 0.05
        identifier_lookup: 0.10
        organizational_change: 0.05
        digital_platform: 0.15
        general_heritage: 0.95
```
---
## Score Assignment Guidelines
### Factors That LOWER Specificity Score
| Factor | Impact | Example |
|--------|--------|---------|
| Base/parent class | -0.20 to -0.30 | `HeritageCustodian` is parent of all |
| Used in identifiers | -0.10 to -0.15 | `CustodianName` used in GHCID |
| Geographic component | -0.10 to -0.15 | `Location` needed for all institutions |
| Universal attribute | -0.10 to -0.15 | `Provenance` applies to all data |
### Factors That RAISE Specificity Score
| Factor | Impact | Example |
|--------|--------|---------|
| Institution type | +0.30 to +0.40 | `Archive`, `Museum`, `Library` |
| Technical/extraction | +0.30 to +0.40 | `LinkedInConnectionExtraction` |
| Event subtype | +0.20 to +0.30 | `Merger`, `Founding`, `Closure` |
| Domain terminology | +0.15 to +0.25 | `Fonds`, `FindingAid`, `RecordSet` |
### Cross-Class Consistency Rules
1. **Inheritance**: Child classes should have equal or higher specificity than parents
2. **Siblings**: Classes at the same hierarchy level should have similar base scores
3. **Competing types**: Institution types should reduce each other's template scores
```yaml
# CORRECT: Archive (0.70) inherits from HeritageCustodian (0.15)
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.70  # Child: 0.70 >= 0.15 ✓

# WRONG: Child less specific than parent
Archive:
  is_a: HeritageCustodian  # Parent: 0.15
  annotations:
    specificity_score: 0.10  # Child: 0.10 < 0.15
```
---
## Validation Rules
### Required Validations
1. **Range Check**: `0.0 <= specificity_score <= 1.0`
2. **Rationale Present**: `specificity_rationale` must not be empty
3. **Inheritance Consistency**: Child score >= parent score
4. **Template Score Range**: All template scores must be 0.0-1.0
### Recommended Validations
1. **No Orphan Scores**: Every class should have annotations (warn if missing)
2. **Score Distribution**: Flag if >50% of classes have the same score (lack of differentiation)
3. **Template Coverage**: Warn if template_specificity omits common templates
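The score-distribution and template-coverage checks above could be sketched like this. The function names, the plain-dict inputs, and the default 50% threshold parameter are assumptions made for illustration; the template list is the one used throughout this rule.

```python
from collections import Counter

REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage",
]


def check_score_distribution(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Warn when more than `threshold` of all classes share one score."""
    if not scores:
        return []
    value, count = Counter(scores.values()).most_common(1)[0]
    if count / len(scores) > threshold:
        return [f"{count} of {len(scores)} classes share score {value}"]
    return []


def check_template_coverage(class_name: str, template_scores: dict[str, float]) -> list[str]:
    """Warn about templates that are missing from template_specificity."""
    missing = [t for t in REQUIRED_TEMPLATES if t not in template_scores]
    return [f"{class_name}: missing template {t}" for t in missing]
```

Both functions return warnings rather than raising, which matches the "recommended" (non-blocking) status of these checks.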
### Validation Script
```python
# scripts/validate_specificity_scores.py
from linkml_runtime import SchemaView
from pathlib import Path
import sys

# Templates expected under template_specificity (listed for reference; not checked below)
REQUIRED_TEMPLATES = [
    "archive_search", "museum_search", "library_search",
    "collection_discovery", "person_research", "location_browse",
    "identifier_lookup", "organizational_change", "digital_platform",
    "general_heritage"
]


def validate_specificity_scores(schema_path: Path) -> list[str]:
    """Validate all specificity score annotations."""
    errors = []
    schema = SchemaView(str(schema_path))

    for class_name in schema.all_classes():
        cls = schema.get_class(class_name)

        # Check required annotations
        score = cls.annotations.get("specificity_score")
        rationale = cls.annotations.get("specificity_rationale")

        if score is None:
            errors.append(f"{class_name}: Missing specificity_score")
            continue

        # Validate score range; score_val stays None if the value is not numeric
        score_val = None
        try:
            score_val = float(score.value)
            if not 0.0 <= score_val <= 1.0:
                errors.append(f"{class_name}: Score {score_val} out of range [0.0, 1.0]")
        except (ValueError, TypeError):
            errors.append(f"{class_name}: Invalid score value: {score.value}")

        # Check rationale
        if rationale is None or not rationale.value.strip():
            errors.append(f"{class_name}: Missing or empty specificity_rationale")

        # Check inheritance consistency (skipped when the score is not numeric)
        if cls.is_a and score_val is not None:
            parent = schema.get_class(cls.is_a)
            parent_score = parent.annotations.get("specificity_score")
            if parent_score and score_val < float(parent_score.value):
                errors.append(
                    f"{class_name}: Score {score.value} < parent {cls.is_a} score {parent_score.value}"
                )

    return errors


if __name__ == "__main__":
    schema_path = Path("schemas/20251121/linkml/01_custodian_name.yaml")
    errors = validate_specificity_scores(schema_path)

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All specificity scores valid!")
        sys.exit(0)
```
---
## Anti-Patterns
### What NOT to Do
| Anti-Pattern | Why It's Wrong | Correct Approach |
|--------------|----------------|------------------|
| Score without rationale | No audit trail for decisions | Always include rationale |
| All scores = 0.5 | No differentiation, useless for filtering | Differentiate based on semantics |
| Child < parent score | Violates specificity inheritance | Child should be equal or more specific |
| Template score > 1.0 | Invalid score value | Keep all scores in [0.0, 1.0] |
| Empty rationale | Fails validation, no documentation | Write meaningful rationale |
### Example of Incorrect Annotation
```yaml
# WRONG - Multiple issues
classes:
  Archive:
    annotations:
      specificity_score: 1.5  # Out of range!
      specificity_rationale: ""  # Empty rationale!
      template_specificity:
        archive_search: 0.95
        # Missing other templates - incomplete coverage
```
### Example of Correct Annotation
```yaml
# CORRECT
classes:
  Archive:
    annotations:
      specificity_score: 0.70
      specificity_rationale: >-
        Domain-specific institution type for archives. Highly relevant
        for archival research queries but less useful for museum or
        library-focused questions.
      template_specificity:
        archive_search: 0.95
        museum_search: 0.20
        library_search: 0.25
        collection_discovery: 0.75
        person_research: 0.40
        location_browse: 0.65
        identifier_lookup: 0.50
        organizational_change: 0.60
        digital_platform: 0.45
        general_heritage: 0.70
```
---
## Migration Checklist
When adding specificity scores to existing classes:
### Phase 1: Assessment
- [ ] Count classes without annotations
- [ ] Identify class hierarchy (parents → children order)
- [ ] Review existing descriptions for scoring hints
### Phase 2: Annotation
- [ ] Start with root classes (lowest specificity)
- [ ] Work down hierarchy (increasing specificity)
- [ ] Assign template scores based on domain alignment
- [ ] Write rationale explaining score decisions
### Phase 3: Validation
- [ ] Run validation script
- [ ] Check inheritance consistency
- [ ] Verify score distribution (not all same value)
- [ ] Review edge cases (technical classes, mixins)
### Phase 4: Documentation
- [ ] Update class count in plan documents
- [ ] Document any scoring decisions that were difficult
- [ ] Create PR with all changes
---
## Related Rules
- **Rule 0**: LinkML Schemas Are the Single Source of Truth
- **Rule 4**: Technical Classes Are Excluded from Visualizations
- **Rule 13**: Custodian Type Annotations on LinkML Schema Elements
---
## References
- `docs/plan/specificity_score/README.md` - System overview
- `docs/plan/specificity_score/04-prompt-conversation-templates.md` - Template definitions
- `docs/plan/specificity_score/06-uml-visualization.md` - UML filtering integration
---
## Changelog
| Date | Version | Change |
|------|---------|--------|
| 2025-01-04 | 1.0.0 | Initial rule created for specificity score system |


@ -0,0 +1,223 @@
# Rule: LinkML Type/Types File Naming Convention
**Version**: 1.0.0
**Created**: 2025-01-04
**Status**: Active
**Applies to**: `schemas/20251121/linkml/modules/classes/`
---
## Rule Statement
When creating class hierarchies that replace enums in LinkML schemas, follow the **Type/Types** naming pattern to clearly distinguish abstract base classes from their concrete subclasses.
---
## Pattern Definition
| File Name Pattern | Purpose | Contains |
|-------------------|---------|----------|
| `[Entity]Type.yaml` (singular) | Abstract base class | Single abstract class defining the type taxonomy |
| `[Entity]Types.yaml` (plural) | Concrete subclasses | All concrete subclasses inheriting from the base |
---
## Class Naming Convention
🚨 **CRITICAL**: Follow these naming rules for classes within the files:
1. **Abstract Base Class** (`[Entity]Type.yaml`):
   * **MUST** end with `Type` suffix.
   * *Example*: `DigitalPlatformType`, `WarehouseType`.
2. **Concrete Subclasses** (`[Entity]Types.yaml`):
   * **MUST NOT** end with `Type` suffix.
   * Use the natural entity name.
   * *Example*: `DigitalLibrary` (✅), `CentralDepot` (✅).
   * *Incorrect*: `DigitalLibraryType` (❌), `CentralDepotType` (❌).
**Rationale**: The file context (`WarehouseTypes.yaml`) already establishes these are types. Repeating "Type" in the class name is redundant and makes the class name less natural when used as an object instance (e.g., "This object is a CentralDepot").
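The naming rules above lend themselves to an automated check. The following is a minimal sketch, assuming the class names of each file have already been extracted into a list; the function name `check_type_naming` is hypothetical, and the expectation that the single base class is named after its file is inferred from the examples rather than stated by the rule.

```python
def check_type_naming(file_name: str, class_names: list[str]) -> list[str]:
    """Return naming violations for one [Entity]Type.yaml or [Entity]Types.yaml file."""
    stem = file_name.removesuffix(".yaml")
    errors = []
    if stem.endswith("Types"):
        # Plural file: concrete subclasses must not carry the "Type" suffix.
        errors += [
            f"{name}: concrete subclass must not end with 'Type'"
            for name in class_names
            if name.endswith("Type")
        ]
    elif stem.endswith("Type"):
        # Singular file: assumed to hold exactly one class named after the file.
        if class_names != [stem]:
            errors.append(f"{file_name}: expected a single class named {stem}")
    return errors


print(check_type_naming("WarehouseTypes.yaml", ["CentralDepot", "CentralDepotType"]))
```

Files that match neither pattern are ignored, so the check can be run over the whole `modules/classes/` directory.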
---
## Examples
### Current Implementations
| Base Class File | Subclasses File | Subclass Count | Description |
|-----------------|-----------------|----------------|-------------|
| `DigitalPlatformType.yaml` | `DigitalPlatformTypes.yaml` | 69 | Digital platform type taxonomy |
| `WebPortalType.yaml` | `WebPortalTypes.yaml` | ~15 | Web portal type taxonomy |
| `CustodianType.yaml` | `CustodianTypes.yaml` | 19 | Heritage custodian type taxonomy (GLAMORCUBESFIXPHDNT) |
| `DataServiceEndpointType.yaml` | `DataServiceEndpointTypes.yaml` | 7 | API/data service endpoint types |
### File Structure Example
```
modules/classes/
├── DigitalPlatformType.yaml # Abstract base class
├── DigitalPlatformTypes.yaml # 69 concrete subclasses
├── WebPortalType.yaml # Abstract base class
├── WebPortalTypes.yaml # ~15 concrete subclasses
├── CustodianType.yaml # Abstract base class
└── CustodianTypes.yaml # 19 concrete subclasses
```
---
## Import Pattern
The subclasses file MUST import the base class file:
```yaml
# In DigitalPlatformTypes.yaml (subclasses file)
id: https://w3id.org/heritage-custodian/linkml/digital_platform_types
name: digital_platform_types
imports:
  - linkml:types
  - ./DigitalPlatformType  # Import base class (singular)
classes:
  DigitalLibrary:
    is_a: DigitalPlatformType  # Inherit from base
    description: >-
      A digital library platform providing access to digitized collections.
    class_uri: schema:DigitalDocument
  DigitalArchive:
    is_a: DigitalPlatformType
    description: >-
      A digital archive for born-digital or digitized archival materials.
```
---
## Slot Range Pattern
When other classes reference the type taxonomy, use the **base class** (singular) as the range:
```yaml
# In DigitalPlatform.yaml
imports:
  - ./DigitalPlatformType   # Import base class for range
  - ./DigitalPlatformTypes  # Import subclasses for validation
classes:
  DigitalPlatform:
    slots:
      - platform_type
    slot_usage:
      platform_type:
        range: DigitalPlatformType  # Use base class as range
        description: >-
          The type of digital platform. Value must be one of the
          concrete subclasses defined in DigitalPlatformTypes.
```
---
## Anti-Patterns
### What NOT to Do
| Anti-Pattern | Why It's Wrong | Correct Alternative |
|--------------|----------------|---------------------|
| `DigitalPlatformTypeBase.yaml` | "Base" suffix is redundant; singular "Type" already implies base class | `DigitalPlatformType.yaml` |
| `DigitalPlatformTypeClasses.yaml` | "Classes" is less intuitive than "Types" for a type taxonomy | `DigitalPlatformTypes.yaml` |
| All types in single file | Large files are hard to navigate; separation clarifies architecture | Split into Type.yaml + Types.yaml |
| `DigitalPlatformEnum.yaml` | Enums lack extensibility; class hierarchies are preferred | Use class hierarchy pattern |
| `CentralDepotType` (Class Name) | Redundant "Type" suffix on concrete subclass | `CentralDepot` |
### Example of Incorrect Naming
```yaml
# WRONG - Don't use "Base" suffix
# File: DigitalPlatformTypeBase.yaml
classes:
  DigitalPlatformTypeBase:  # Redundant "Base"
    abstract: true
```
```yaml
# CORRECT - Use singular "Type"
# File: DigitalPlatformType.yaml
classes:
  DigitalPlatformType:  # Clean, clear naming
    abstract: true
```
---
## Rationale
1. **Clarity**: "Type" (singular) = one abstract concept; "Types" (plural) = many concrete implementations
2. **Discoverability**: Related files appear adjacent in alphabetical directory listings
3. **Consistency**: Follows established pattern across entire schema
4. **Semantics**: Mirrors natural language ("a platform type" vs "the platform types")
5. **Scalability**: Easy to add new types without modifying base class file
---
## Migration Checklist
When renaming existing files to follow this convention:
### Pre-Migration
- [ ] Identify all files referencing the old name
- [ ] Create backup or ensure version control is clean
- [ ] Document the old → new name mapping
### File Rename
- [ ] Rename file: `[Entity]TypeBase.yaml` → `[Entity]Type.yaml`
- [ ] Update `id:` field in renamed file
- [ ] Update `name:` field in renamed file
- [ ] Update class name inside the file
- [ ] Update all internal documentation references
### Update References
- [ ] Update imports in `[Entity]Types.yaml` (subclasses file)
- [ ] Update `is_a:` in all subclasses
- [ ] Update imports in consuming classes (e.g., `DigitalPlatform.yaml`)
- [ ] Update `range:` in slot definitions
- [ ] Update any `slot_usage:` references
### Documentation
- [ ] Update AGENTS.md if convention is documented there
- [ ] Update any design documents
- [ ] Add migration note to changelog
### Verification
```bash
# Verify no references to old name remain
grep -r "OldClassName" schemas/20251121/linkml/
# Verify new file exists
ls -la schemas/20251121/linkml/modules/classes/NewClassName.yaml
# Verify old file is removed
ls -la schemas/20251121/linkml/modules/classes/OldClassName.yaml # Should fail
# Validate schema
linkml-validate schemas/20251121/linkml/01_custodian_name.yaml
```
---
## Related Rules
- **Rule 0**: LinkML Schemas Are the Single Source of Truth
- **Rule 9**: Enum-to-Class Promotion - Single Source of Truth
---
## Changelog
| Date | Version | Change |
|------|---------|--------|
| 2025-01-04 | 1.0.0 | Initial rule created after DigitalPlatformType refactoring |


@ -0,0 +1,332 @@
# Rule: LinkML "Types" Classes Define SPARQL Template Variables
**Created**: 2025-01-08
**Status**: Active
**Applies to**: SPARQL template design, RAG pipeline slot extraction
## Core Principle
LinkML classes following the `*Type` / `*Types` naming pattern (Rule 0b) serve as the **single source of truth** for valid values in SPARQL template slot variables.
When designing SPARQL templates, **extract variables from the schema** rather than hardcoding values. This enables:
- **Flexibility**: Same template works across all institution types
- **Extensibility**: Adding new types to schema automatically extends templates
- **Consistency**: Variable values always align with ontology
- **Multilingual support**: Type labels in multiple languages available from schema
## Template Variable Sources
### 1. Institution Type Variable (`institution_type`)
**Schema Source**: `CustodianType` abstract class and its 19 subclasses
| Subclass | Code | Description |
|----------|------|-------------|
| `ArchiveOrganizationType` | A | Archives |
| `BioCustodianType` | B | Botanical gardens, zoos |
| `CommercialOrganizationType` | C | Corporations |
| `DigitalPlatformType` | D | Digital platforms |
| `EducationProviderType` | E | Universities, schools |
| `FeatureCustodianType` | F | Geographic features |
| `GalleryType` | G | Art galleries |
| `HolySacredSiteType` | H | Religious sites |
| `IntangibleHeritageGroupType` | I | Folklore organizations |
| `LibraryType` | L | Libraries |
| `MuseumType` | M | Museums |
| `NonProfitType` | N | NGOs |
| `OfficialInstitutionType` | O | Government agencies |
| `PersonalCollectionType` | P | Private collectors |
| `ResearchOrganizationType` | R | Research centers |
| `HeritageSocietyType` | S | Historical societies |
| `TasteScentHeritageType` | T | Culinary heritage |
| `UnspecifiedType` | U | Unknown |
| `MixedCustodianType` | X | Multiple types |
**Template Slot Definition**:
```yaml
slots:
  institution_type:
    type: institution_type
    required: true
    schema_source: "modules/classes/CustodianType.yaml"
    # Valid values derived from CustodianType subclasses
```
### 2. Geographic Scope Variable (`location`)
Geographic scope is a **hierarchical variable** with three levels:
| Level | Schema Source | SPARQL Property | Example |
|-------|---------------|-----------------|---------|
| Country | ISO 3166-1 alpha-2 | `hc:countryCode` | NL, DE, BE |
| Subregion | ISO 3166-2 | `hc:subregionCode` | NL-NH, DE-BY |
| Settlement | GeoNames | `hc:settlementName` | Amsterdam, Berlin |
**Template Slot Definition**:
```yaml
slots:
  location:
    type: location
    required: true
    schema_source:
      - "modules/enums/CountryCodeEnum.yaml"  # if it exists
      - "data/reference/geonames.db"
    resolution_order: [settlement, subregion, country]
    # SlotExtractor detects which level user specified
```
### 3. Digital Platform Type Variable (`platform_type`)
**Schema Source**: `DigitalPlatformType` abstract class and 69+ subclasses in `DigitalPlatformTypes.yaml`
Categories include:
- REPOSITORY: DigitalLibrary, DigitalArchivePlatform, OpenAccessRepository
- AGGREGATOR: Europeana-type aggregators, BibliographicDatabasePlatform
- DISCOVERY: WebPortal, OnlineDatabase, OpenDataPortal
- VIRTUAL_HERITAGE: VirtualMuseum, VirtualLibrary, OnlineArtGallery
- RESEARCH: DisciplinaryRepository, PrePrintServer, GenealogyDatabase
- ...and many more
**Template Slot Definition**:
```yaml
slots:
  platform_type:
    type: platform_type
    required: false
    schema_source: "modules/classes/DigitalPlatformTypes.yaml"
```
## Template Design Pattern
### Before (Hardcoded - WRONG)
```yaml
# Separate templates for each institution type - DO NOT DO THIS
templates:
  count_museums_in_region:
    sparql: |
      SELECT (COUNT(?s) AS ?count) WHERE {
        ?s hc:institutionType "M" ;
           hc:subregionCode "{{ region }}" .
      }
  count_archives_in_region:
    sparql: |
      SELECT (COUNT(?s) AS ?count) WHERE {
        ?s hc:institutionType "A" ;
           hc:subregionCode "{{ region }}" .
      }
```
### After (Parameterized - CORRECT)
```yaml
# Single template with institution_type as variable
templates:
  count_institutions_by_type_location:
    description: "Count heritage institutions by type and location"
    slots:
      institution_type:
        type: institution_type
        required: true
        schema_source: "modules/classes/CustodianType.yaml"
      location:
        type: location
        required: true
        resolution_order: [settlement, subregion, country]
    # Multiple SPARQL variants based on location resolution
    sparql_template: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
                     hc:institutionType "{{ institution_type }}" ;
                     hc:settlementName "{{ location }}" .
      }
    sparql_template_region: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
                     hc:institutionType "{{ institution_type }}" ;
                     hc:subregionCode "{{ location }}" .
      }
    sparql_template_country: |
      SELECT (COUNT(DISTINCT ?institution) AS ?count) WHERE {
        ?institution a hcc:Custodian ;
                     hc:institutionType "{{ institution_type }}" ;
                     hc:countryCode "{{ location }}" .
      }
```
## SlotExtractor Responsibilities
The SlotExtractor module must:
1. **Detect institution type** from user query:
   - "musea" → M (Dutch plural)
   - "archives" → A (English)
   - "bibliotheken" → L (Dutch)
   - Use synonyms from `_slot_types.institution_type.synonyms`
2. **Detect location level** from user query:
   - "Amsterdam" → settlement level → use `sparql_template`
   - "Noord-Holland" → subregion level → use `sparql_template_region`
   - "Nederland" → country level → use `sparql_template_country`
3. **Normalize values** to schema-compliant codes:
   - "Noord-Holland" → "NL-NH"
   - "museum" → "M"
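Location-level detection and template selection, as listed above, might look like the sketch below. It assumes the lookup tables have already been loaded from reference data and are passed in as plain mappings keyed by case-folded name; `ResolvedLocation`, `resolve_location`, and the tiny stand-in tables are hypothetical names and data introduced only for this example. The template keys and the resolution order come from this rule.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Template keys as named in the parameterized template of this rule.
TEMPLATE_BY_LEVEL = {
    "settlement": "sparql_template",
    "subregion": "sparql_template_region",
    "country": "sparql_template_country",
}


@dataclass(frozen=True)
class ResolvedLocation:
    level: str         # "settlement", "subregion" or "country"
    value: str         # normalized value to substitute into the template
    template_key: str  # which SPARQL variant to render


def resolve_location(
    text: str,
    tables: Mapping[str, Mapping[str, str]],
    order: Sequence[str] = ("settlement", "subregion", "country"),
) -> Optional[ResolvedLocation]:
    """Try each level in resolution order and return the first match, else None."""
    key = text.strip().casefold()
    for level in order:
        value = tables.get(level, {}).get(key)
        if value is not None:
            return ResolvedLocation(level, value, TEMPLATE_BY_LEVEL[level])
    return None


# Stand-in tables (hypothetical); real ones would be loaded from reference data.
tables = {
    "settlement": {"amsterdam": "Amsterdam"},
    "subregion": {"noord-holland": "NL-NH"},
    "country": {"nederland": "NL"},
}
print(resolve_location("Noord-Holland", tables))
```

Passing the tables in as arguments keeps the lookup data out of the code, in line with the no-hardcoding requirement of this rule.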
## Dynamic Label Resolution (NO HARDCODING)
**CRITICAL**: Labels MUST be resolved at runtime from schema/reference files, NOT hardcoded in templates or code.
### Institution Type Labels
The `CustodianType` classes contain multilingual labels via `type_label` slot:
```yaml
MuseumType:
  type_label:
    - "Museum"@en
    - "museum"@nl
    - "Museum"@de
    - "museo"@es
```
**Label Resolution Chain**:
1. Load `CustodianType.yaml` and subclass files
2. Parse `type_label` slot for each type code (M, L, A, etc.)
3. Build runtime label dictionary keyed by code + language
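Step 3 of the chain can be sketched as follows. The sketch assumes the labels reach the code as strings in the `"text"@lang` form shown in the schema excerpt, already grouped by type code; how the schema loader yields them is outside its scope, and `build_label_index` is a hypothetical name.

```python
import re

# Matches language-tagged strings such as "Museum"@en
LABEL_RE = re.compile(r'^"(?P<text>.*)"@(?P<lang>[A-Za-z-]+)$')


def build_label_index(tagged_labels_by_code: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Map type code -> {language: label} from language-tagged label strings."""
    index: dict[str, dict[str, str]] = {}
    for code, tagged in tagged_labels_by_code.items():
        for item in tagged:
            match = LABEL_RE.match(item.strip())
            if match is None:
                continue  # skip entries without a language tag
            index.setdefault(code, {})[match.group("lang")] = match.group("text")
    return index


# Labels copied from the MuseumType excerpt in this rule.
raw = {"M": ['"Museum"@en', '"museum"@nl', '"Museum"@de', '"museo"@es']}
labels = build_label_index(raw)
print(labels["M"]["nl"])
```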
### Geographic Labels
Subregion/settlement names come from **reference data files**, not hardcoded:
```yaml
label_sources:
- "data/reference/iso_3166_2_{country}.json" # e.g., iso_3166_2_nl.json
- "data/reference/geonames.db" # GeoNames database
- "data/reference/admin1CodesASCII.txt" # GeoNames fallback
```
**Example**: `iso_3166_2_nl.json` maps province names, including English synonyms such as "North Holland", to subregion codes:
```json
{
  "provinces": {
    "Noord-Holland": "NH",
    "Zuid-Holland": "ZH",
    "North Holland": "NH"
  }
}
```
### SlotExtractor Label Loading
```python
class SlotExtractor:
    def __init__(self, schema_path: str, reference_path: str):
        # Load institution type labels from schema
        self.type_labels = self._load_custodian_type_labels(schema_path)
        # Load geographic labels from reference files
        self.subregion_labels = self._load_subregion_labels(reference_path)

    def _load_custodian_type_labels(self, schema_path: str) -> dict:
        """Load multilingual labels from CustodianType schema files."""
        # Parse YAML, extract type_label slots
        # Return: {"M": {"nl": "musea", "en": "museums"}, ...}

    def _load_subregion_labels(self, reference_path: str) -> dict:
        """Load subregion labels from ISO 3166-2 JSON files."""
        # Load iso_3166_2_nl.json, iso_3166_2_de.json, etc.
        # Return: {"NL-NH": {"nl": "Noord-Holland", "en": "North Holland"}, ...}
```
### UI Template Interpolation
```yaml
ui_template:
  nl: "Er zijn {{ count }} {{ institution_type_nl }} in {{ location }}."
  en: "There are {{ count }} {{ institution_type_en }} in {{ location }}."
```
The RAG pipeline populates `institution_type_nl` / `institution_type_en` from dynamically loaded labels:
```python
# At runtime, NOT hardcoded
template_context["institution_type_nl"] = slot_extractor.type_labels[type_code]["nl"]
template_context["institution_type_en"] = slot_extractor.type_labels[type_code]["en"]
```
## Adding New Types
When the schema gains new institution types:
1. **No template changes needed** - parameterized templates automatically support new types
2. **Update synonyms** in `_slot_types.institution_type.synonyms` for NLP recognition
3. **Labels auto-discovered** from schema files - no code changes needed
## Anti-Patterns (FORBIDDEN)
### Hardcoded Labels in Templates
```yaml
# WRONG - Hardcoded labels
labels:
  NL-NH: {nl: "Noord-Holland", en: "North Holland"}
  NL-ZH: {nl: "Zuid-Holland", en: "South Holland"}
```
```python
# WRONG - Hardcoded labels in code
INSTITUTION_TYPE_LABELS_NL = {
    "M": "musea", "L": "bibliotheken",  # ...
}
```
### Correct Approach
```yaml
# CORRECT - Reference to schema/data source
label_sources:
- "schemas/20251121/linkml/modules/classes/CustodianType.yaml"
- "data/reference/iso_3166_2_{country}.json"
```
```python
# CORRECT - Load labels at runtime
type_labels = load_labels_from_schema("CustodianType.yaml")
region_labels = load_labels_from_reference("iso_3166_2_nl.json")
```
**Why?**
1. **Single source of truth** - Labels defined once in schema/reference files
2. **Automatic sync** - Schema changes automatically propagate to UI
3. **Extensibility** - Adding new countries/types doesn't require code changes
4. **Multilingual** - All language variants come from same source
## Validation
Templates MUST validate slot values against schema:
```python
def validate_institution_type(value: str) -> bool:
    """Validate institution_type against CustodianType schema."""
    valid_codes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
                   'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'X']
    return value in valid_codes
```
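The function above lists the codes inline. To stay consistent with the no-hardcoding principle of this rule, the set of valid codes could instead be built from data loaded at runtime. The sketch below assumes a `{subclass_name: code}` mapping has already been extracted from the `CustodianType` schema files; `make_institution_type_validator` and the three-entry stand-in mapping are hypothetical.

```python
from typing import Callable, Mapping


def make_institution_type_validator(type_codes: Mapping[str, str]) -> Callable[[str], bool]:
    """Build a validator from a {subclass_name: code} mapping loaded at runtime."""
    valid_codes = frozenset(type_codes.values())

    def validate(value: str) -> bool:
        return value in valid_codes

    return validate


# Stand-in for data read from the CustodianType schema files (subset only).
type_codes = {"MuseumType": "M", "LibraryType": "L", "ArchiveOrganizationType": "A"}
validate_institution_type = make_institution_type_validator(type_codes)
print(validate_institution_type("M"), validate_institution_type("Z"))
```

With this shape, adding a new institution type to the schema extends validation without a code change.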
## Related Rules
- **Rule 0b**: Type/Types file naming convention
- **Rule 13**: Custodian type annotations on LinkML schema elements
- **Rule 37**: Specificity score annotations for template filtering
## References
- Schema: `schemas/20251121/linkml/modules/classes/CustodianType.yaml`
- Types: `schemas/20251121/linkml/modules/classes/*Types.yaml`
- Enums: `schemas/20251121/linkml/modules/enums/InstitutionTypeCodeEnum.yaml`
- Templates: `data/sparql_templates.yaml`


@ -0,0 +1,323 @@
# Rule: Verified Ontology Mapping Requirements
## Overview
All LinkML slot files MUST include ontology mappings that are **verified against the actual ontology files** in `data/ontology/`. Never use hallucinated or assumed ontology terms.
---
## 1. Source Ontology Files
The following ontology files are available for verification:
| Prefix | Namespace | File | Key Properties |
|--------|-----------|------|----------------|
| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | `CIDOC_CRM_v7.1.3.rdf` | P1, P2, P22, P23, P70, P82, etc. |
| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | `RiC-O_1-1.rdf` | hasOrHadHolder, isOrWasPartOf, etc. |
| `prov:` | `http://www.w3.org/ns/prov#` | `prov.ttl` | wasInfluencedBy, wasDerivedFrom, used, etc. |
| `schema:` | `http://schema.org/` | `schemaorg.owl` | url, name, description, etc. |
| `dcterms:` | `http://purl.org/dc/terms/` | `dcterms.rdf` | format, rights, source, etc. |
| `skos:` | `http://www.w3.org/2004/02/skos/core#` | `skos.rdf` | prefLabel, notation, inScheme, etc. |
| `foaf:` | `http://xmlns.com/foaf/0.1/` | `foaf.ttl` | page, homepage, name, etc. |
| `dcat:` | `http://www.w3.org/ns/dcat#` | `dcat3.ttl` | mediaType, downloadURL, etc. |
| `time:` | `http://www.w3.org/2006/time#` | `time.ttl` | hasBeginning, hasEnd, etc. |
| `org:` | `http://www.w3.org/ns/org#` | `org.rdf` | siteOf, hasSite, subOrganizationOf, etc. |
| `sosa:` | `http://www.w3.org/ns/sosa/` | `sosa.ttl` | madeBySensor, observes, etc. |
---
## 2. Required Header Documentation
Every slot file MUST include a header comment block with an ontology alignment table:
```yaml
# ==============================================================================
# LinkML Slot Definition: {slot_name}
# ==============================================================================
# {Brief description - one line}
#
# ONTOLOGY ALIGNMENT (verified against data/ontology/):
#
# | Ontology | Property | File/Line | Mapping | Notes |
# |---------------|-----------------------|----------------------|---------|------------------------------------|
# | **PROV-O** | `prov:used` | prov.ttl:1046-1057 | exact | Entity used by activity |
# | **PROV-O** | `prov:wasInfluencedBy`| prov.ttl:1099-1121 | broad | Parent property (subPropertyOf) |
#
# HIERARCHY: prov:used rdfs:subPropertyOf prov:wasInfluencedBy (line 1046)
#
# CREATED: YYYY-MM-DD
# UPDATED: YYYY-MM-DD - Description of changes
# ==============================================================================
```
---
## 3. Mapping Types
Use the correct mapping type based on semantic relationship:
| Mapping Type | Usage | Example |
|--------------|-------|---------|
| `slot_uri` | Primary RDF predicate for this slot | `slot_uri: prov:used` |
| `exact_mappings` | Semantically equivalent properties | `- schema:dateRetrieved` |
| `close_mappings` | Very similar but slightly different semantics | `- prov:wasGeneratedBy` |
| `broad_mappings` | Parent/broader properties (slot is subPropertyOf these) | `- prov:wasInfluencedBy` |
| `narrow_mappings` | Child/narrower properties (these are subPropertyOf slot) | `- prov:qualifiedUsage` |
| `related_mappings` | Conceptually related but different scope | `- dcterms:source` |
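The broad/narrow distinction in this table follows mechanically from `rdfs:subPropertyOf` edges, which a small helper can illustrate. This is a sketch under stated assumptions: the edges are supplied as a plain dict (here a few taken from the PROV-O hierarchy listed in this rule), `classify_mapping` is a hypothetical name, and close or related mappings are deliberately left to human judgement.

```python
from typing import Mapping, Optional, Sequence


def classify_mapping(
    slot_uri: str,
    candidate: str,
    sub_property_of: Mapping[str, Sequence[str]],
) -> Optional[str]:
    """Classify `candidate` relative to `slot_uri` using subPropertyOf edges."""

    def ancestors(prop: str) -> set[str]:
        # Walk child -> parent edges transitively.
        seen: set[str] = set()
        stack = [prop]
        while stack:
            for parent in sub_property_of.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    if candidate == slot_uri:
        return "exact_mappings"
    if candidate in ancestors(slot_uri):
        return "broad_mappings"   # candidate is a parent of the slot property
    if slot_uri in ancestors(candidate):
        return "narrow_mappings"  # candidate is a child of the slot property
    return None  # close/related mappings cannot be derived from the hierarchy


# child -> parents, as listed in the PROV-O hierarchy of this rule
edges = {
    "prov:used": ["prov:wasInfluencedBy"],
    "prov:wasDerivedFrom": ["prov:wasInfluencedBy"],
    "prov:hadPrimarySource": ["prov:wasDerivedFrom"],
}
print(classify_mapping("prov:used", "prov:wasInfluencedBy", edges))
```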
---
## 4. Hierarchy Discovery Process
### Step 1: Search for subPropertyOf relationships
```bash
# Find if our property is subPropertyOf something (-> broad_mappings)
grep -n "OUR_PROPERTY.*subPropertyOf\|subPropertyOf.*OUR_PROPERTY" data/ontology/*.ttl
# Find properties that are subPropertyOf our property (-> narrow_mappings)
grep -n "subPropertyOf.*OUR_PROPERTY" data/ontology/*.rdf
```
### Step 2: Document the hierarchy
When you find a hierarchy, document it in:
1. The header comment block (HIERARCHY line)
2. The appropriate mapping field (`broad_mappings` or `narrow_mappings`)
3. Inline comments with file/line references
---
## 5. Key Ontology Hierarchies Reference
### PROV-O (`prov.ttl`)
```
prov:wasInfluencedBy (parent of many)
├── prov:wasDerivedFrom
│ ├── prov:hadPrimarySource
│ ├── prov:wasQuotedFrom
│ └── prov:wasRevisionOf
├── prov:wasGeneratedBy
├── prov:used
├── prov:wasAssociatedWith
├── prov:wasAttributedTo
└── prov:wasInformedBy
prov:influenced (inverse direction)
├── prov:generated
└── prov:invalidated
```
### CIDOC-CRM (`CIDOC_CRM_v7.1.3.rdf`)
```
crm:P1_is_identified_by
├── crm:P48_has_preferred_identifier
└── crm:P168_place_is_defined_by
crm:P82_at_some_time_within
├── crm:P82a_begin_of_the_begin
└── crm:P82b_end_of_the_end
crm:P81_ongoing_throughout
├── crm:P81a_end_of_the_begin
└── crm:P81b_begin_of_the_end
crm:P67_refers_to
└── crm:P70_documents
```
### RiC-O (`RiC-O_1-1.rdf`)
```
rico:isOrWasUnderAuthorityOf
├── rico:hasOrHadManager
│ └── rico:hasOrHadHolder
└── (other authority relationships)
rico:hasOrHadPart
└── rico:containsOrContained
    └── rico:containsTransitive
rico:isSuccessorOf
├── rico:hasAncestor
├── rico:resultedFromTheMergerOf
└── rico:resultedFromTheSplitOf
```
### Dublin Core Terms (`dcterms.rdf`)
```
dcterms:rights
└── dcterms:accessRights
```
### DCAT (`dcat3.ttl`)
```
dcterms:format
├── dcat:mediaType
├── dcat:compressFormat
└── dcat:packageFormat
```
### FOAF (`foaf.ttl`)
```
foaf:page
├── foaf:homepage
├── foaf:weblog
├── foaf:interest
├── foaf:workplaceHomepage
└── foaf:schoolHomepage
```
### Schema.org (`schemaorg.owl`)
```
schema:workFeatured
├── schema:workPerformed
└── schema:workPresented
```
---
## 6. Verification Commands
### Check if a property exists
```bash
grep -n "PROPERTY_NAME" data/ontology/FILE.ttl
```
### Find all subPropertyOf for a property
```bash
grep -B5 -A5 "subPropertyOf" data/ontology/FILE.ttl | grep -A5 -B5 "PROPERTY_NAME"
```
### Validate YAML after editing
```bash
python3 -c "import yaml; yaml.safe_load(open('FILENAME.yaml')); print('✅ valid')"
```
---
## 7. Complete Slot File Example
```yaml
# ==============================================================================
# LinkML Slot Definition: retrieved_through
# ==============================================================================
# To denote the specific method, protocol, or mechanism by which a resource
# or data was accessed, fetched, or collected.
#
# ONTOLOGY ALIGNMENT (verified against data/ontology/):
#
# | Ontology | Property | File/Line | Mapping | Notes |
# |------------|--------------------------|--------------------|---------|------------------------------------|
# | **PROV-O** | `prov:used` | prov.ttl:1046-1057 | exact | Entity used by activity |
# | **PROV-O** | `prov:wasInfluencedBy` | prov.ttl:1099-1121 | broad | Parent property (subPropertyOf) |
# | **PROV-O** | `prov:qualifiedUsage` | prov.ttl:788-798 | narrow | Qualified usage with details |
#
# HIERARCHY: prov:used rdfs:subPropertyOf prov:wasInfluencedBy (line 1046)
#
# CREATED: 2026-01-26
# UPDATED: 2026-02-03 - Added broad/narrow mappings, header documentation
# ==============================================================================
id: https://nde.nl/ontology/hc/slot/retrieved_through
name: retrieved_through
title: Retrieved Through
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  prov: http://www.w3.org/ns/prov#
  schema: http://schema.org/
imports:
  - linkml:types
default_prefix: hc
slots:
  retrieved_through:
    slot_uri: prov:used
    description: |
      To denote the specific method, protocol, or mechanism by which a resource or data was accessed, fetched, or collected.
    range: string
    exact_mappings:
      - prov:used  # prov.ttl:1046-1057
    broad_mappings:
      - prov:wasInfluencedBy  # prov.ttl:1099-1121 - parent (used subPropertyOf wasInfluencedBy)
    narrow_mappings:
      - prov:qualifiedUsage  # prov.ttl:788-798 - qualified form with details
    comments:
      - |
        **ONTOLOGY ALIGNMENT** (verified against data/ontology/):
        | Ontology | Property | Line | Mapping | Notes |
        |----------|----------|------|---------|-------|
        | PROV-O | prov:used | 1046-1057 | exact | Entity used by activity |
        | PROV-O | prov:wasInfluencedBy | 1099-1121 | broad | Parent property |
        | PROV-O | prov:qualifiedUsage | 788-798 | narrow | Qualified usage |
```
---
## 8. Anti-Patterns
### ❌ WRONG: Hallucinated ontology terms
```yaml
exact_mappings:
- prov:retrievedWith # ❌ Does not exist in PROV-O!
- rico:wasObtainedBy # ❌ Not a real RiC-O property!
```
### ❌ WRONG: No verification references
```yaml
exact_mappings:
- prov:used # No file/line reference - how do we know this is correct?
```
### ✅ CORRECT: Verified with references
```yaml
exact_mappings:
- prov:used # prov.ttl:1046-1057 - "Entity used by activity"
broad_mappings:
- prov:wasInfluencedBy # prov.ttl:1099-1121 - parent property (verified subPropertyOf)
```
---
## 9. Validation Checklist
Before completing a slot file, verify:
- [ ] Header comment block includes ontology alignment table
- [ ] All mappings verified against actual ontology files in `data/ontology/`
- [ ] File/line references provided for each mapping
- [ ] `rdfs:subPropertyOf` relationships checked for broad/narrow mappings
- [ ] HIERARCHY line documents any property hierarchies
- [ ] No hallucinated or assumed ontology terms
- [ ] YAML validates correctly
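The "file/line references provided for each mapping" item lends itself to a mechanical check. This is a rough sketch under stated assumptions: `mappings_missing_references` is a hypothetical helper, it scans the raw YAML text because comments are lost once YAML is parsed, and it assumes references follow the `file.ttl:START-END` comment style shown in the slot file example.

```python
import re

MAPPING_KEYS = (
    "exact_mappings", "close_mappings", "related_mappings",
    "broad_mappings", "narrow_mappings",
)
# Matches comments such as "# prov.ttl:1046-1057" or "# skos.rdf:12".
REFERENCE = re.compile(r"#\s*\S+\.(?:ttl|rdf|owl):\d+(?:-\d+)?")


def mappings_missing_references(yaml_text):
    """Return mapping entries whose trailing comment lacks a file:line reference."""
    missing = []
    in_mapping_block = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and stripped.rstrip(":") in MAPPING_KEYS:
            in_mapping_block = True
        elif in_mapping_block and stripped.startswith("- "):
            if not REFERENCE.search(stripped):
                # Keep only the mapped term, dropping any trailing comment.
                missing.append(stripped[2:].split("#")[0].strip())
        elif stripped and not stripped.startswith("#"):
            # Any other key or content ends the current mapping block.
            in_mapping_block = False
    return missing
```

Wikidata entries (`wd:Q...`) have no local ontology file, so a real check would need to exempt them; that is left out here.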
---
## See Also
- Rule 1: Ontology Files Are Your Primary Reference (`no-hallucinated-ontology-references.md`)
- Rule: Verified Ontology Terms (`verified-ontology-terms.md`)
- Ontology files: `data/ontology/`
---
**Version**: 1.0.0
**Created**: 2026-02-03
**Author**: OpenCODE


@ -0,0 +1,68 @@
# Rule 62: Verified Ontology Terms Reference
🚨 **CRITICAL**: All `class_uri`, `slot_uri`, and mapping properties (`exact_mappings`, `close_mappings`, etc.) MUST use verified classes and predicates that exist in the local ontology files at `data/ontology/`.
## 1. Verified Ontology Files
The following ontologies are locally available in `data/ontology/`. Always verify terms against these specific files. **NO HALLUCINATIONS ALLOWED.**
**Mandatory Verification Step**: Before using any `class_uri`, `slot_uri`, or mapping URI, you MUST `grep` the term in the local ontology file to confirm it exists.
| Prefix | Namespace | Local File | Key Classes/Predicates (Verified) |
|--------|-----------|------------|-----------------------------------|
| `cpov:` | `http://data.europa.eu/m8g/` | `core-public-organisation-ap.ttl` | `PublicOrganisation`, `contactPage`, `email` |
| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | `CIDOC_CRM_v7.1.3.rdf` | `E1_CRM_Entity`, `E5_Event`, `P2_has_type` |
| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | `RiC-O_1-1.rdf` | `Record`, `Agent`, `hasOrHadHolder` (Note: Use v1.1 file) |
| `pico:` | `https://personsincontext.org/model#` | `pico.ttl` | `PersonObservation`, `role` |
| `prov:` | `http://www.w3.org/ns/prov#` | `prov.ttl` | `Activity`, `Agent`, `wasGeneratedBy` |
| `skos:` | `http://www.w3.org/2004/02/skos/core#` | `skos.rdf` | `Concept`, `prefLabel`, `broader` |
| `schema:` | `https://schema.org/` | `frontend/public/ontology/schemaorg.owl` | `Organization`, `Place`, `name`, `url` |
| `dcterms:` | `http://purl.org/dc/terms/` | `dublin_core_elements.rdf` | `identifier`, `title`, `description` |
| `org:` | `http://www.w3.org/ns/org#` | `org.rdf` | `Organization`, `hasMember` |
| `tooi:` | `https://identifier.overheid.nl/tooi/def/ont/` | `tooiont.ttl` | `Overheidsorganisatie` |
| `dcat:` | `http://www.w3.org/ns/dcat#` | `dcat3.ttl` | `Dataset`, `Catalog`, `dataset` |
| `gn:` | `http://www.geonames.org/ontology#` | `geonames_ontology.rdf` | `Feature` |
| `dqv:` | `http://www.w3.org/ns/dqv#` | `dqv.ttl` | `QualityMeasurement`, `hasQualityAnnotation` |
| `premis:` | `http://www.loc.gov/premis/rdf/v3/` | `premis3.owl` | `fixity`, `storedAt`, `Event` |
## 2. Verification Procedure (MANDATORY)
**You MUST verify every term.** Do not assume a term exists just because it sounds standard.
```bash
# 1. Identify the source ontology file
ls data/ontology/
# 2. Grep for the specific term (e.g., 'hasFixity')
grep "hasFixity" data/ontology/premis3.owl
# Result: EMPTY -> Term does not exist! DO NOT USE.
# 3. Grep for the correct term (e.g., 'fixity')
grep "fixity" data/ontology/premis3.owl
# Result: <owl:ObjectProperty rdf:about=".../fixity"> -> Term exists. USE THIS.
```
## 3. LinkML Mapping Requirements
Mappings must be precise and verified.
* `exact_mappings` = `skos:exactMatch` (Semantic equivalence)
* `close_mappings` = `skos:closeMatch` (Near equivalence)
* `related_mappings` = `skos:relatedMatch` (Association)
* `broad_mappings` = `skos:broadMatch` (Broader concept)
* `narrow_mappings` = `skos:narrowMatch` (Narrower concept)
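The correspondence above can be written down as a lookup table. The sketch below is illustrative only: `mapping_triples` is a hypothetical helper, and the element is assumed to be a plain dict as produced by loading a LinkML YAML file.

```python
# LinkML mapping keys and the SKOS predicates they correspond to.
SKOS_PREDICATE = {
    "exact_mappings": "skos:exactMatch",
    "close_mappings": "skos:closeMatch",
    "related_mappings": "skos:relatedMatch",
    "broad_mappings": "skos:broadMatch",
    "narrow_mappings": "skos:narrowMatch",
}


def mapping_triples(subject, element):
    """Yield (subject, predicate, object) for each mapping on a LinkML element dict."""
    for key, predicate in SKOS_PREDICATE.items():
        for target in element.get(key, []):
            yield (subject, predicate, target)
```

Reading a mapping back as a triple is a quick sanity check: `hc:retrieved_through skos:broadMatch prov:wasInfluencedBy` should sound true when said aloud.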
## 4. Prohibited/Invalid Terms (Hallucinations)
Do NOT use these commonly hallucinated or incorrect terms. They have been verified as **non-existent** in our local ontologies:
* ❌ `dqv:ConfidenceScore` (Use `dqv:QualityMeasurement`)
* ❌ `premis:hasFixity` (Use `premis:fixity`)
* ❌ `premis:hasFrameRate` (Verify specific PREMIS properties first)
* ❌ `schema:HeritageBuilding` (Use `schema:LandmarksOrHistoricalBuildings`)
* ❌ `rico:has_provenance` (Use `rico:history`)
* ❌ `rico:hasProvenance` (Use `rico:history`)
* ❌ `schema:archive` (Use `schema:archiveHeld` or `schema:archivedAt`)
**Always verify against the local file content.**
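A denylist scan can catch the terms above before review. This is a sketch, not existing tooling: `find_prohibited_terms` is a hypothetical name, and the replacement hints are copied from the list above (`None` where the list only says to verify first).

```python
import re

# Replacement hints copied from the prohibited-terms list; None means "verify first".
PROHIBITED_TERMS = {
    "dqv:ConfidenceScore": "dqv:QualityMeasurement",
    "premis:hasFixity": "premis:fixity",
    "premis:hasFrameRate": None,
    "schema:HeritageBuilding": "schema:LandmarksOrHistoricalBuildings",
    "rico:has_provenance": "rico:history",
    "rico:hasProvenance": "rico:history",
    "schema:archive": "schema:archiveHeld or schema:archivedAt",
}


def find_prohibited_terms(yaml_text):
    """Return (line_number, term, suggestion) for each prohibited CURIE found."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        for term, suggestion in PROHIBITED_TERMS.items():
            # Require a non-name character after the term so that
            # "schema:archive" does not flag "schema:archiveHeld".
            if re.search(re.escape(term) + r"(?![A-Za-z0-9_])", line):
                hits.append((number, term, suggestion))
    return hits
```

A clean scan does not replace the `grep` step; it only guards against terms already known to be wrong.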


@ -0,0 +1,162 @@
# Wikidata Mapping Discovery Rule
## Rule: Use Wikidata MCP to Discover and Verify Mappings Carefully
When adding Wikidata mappings to class files, you MUST verify the semantic meaning and relationship before adding any mapping.
### 🚨 CRITICAL: Always Verify Before Adding
**NEVER add a Wikidata QID without verifying:**
1. What the entity actually IS (not just the label)
2. That it's the SAME TYPE as your class (organization→organization, NOT organization→building)
3. That the semantic relationship makes sense
### Workflow
#### Step 1: VERIFY Existing Mappings First
Before trusting any existing mapping, verify it:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q22075301 wd:Q1643722 wd:Q185583 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
**Example of WRONG mappings found in codebase:**
| QID | Label | Was Mapped To | WHY WRONG |
|-----|-------|---------------|-----------|
| Q22075301 | textile artwork | FacultyPaperCollection | Not related at all! |
| Q1643722 | building in Vienna | UniversityAdministrativeFonds | Not an archival concept! |
| Q185583 | candy | AcademicStudentRecordSeries | Completely unrelated! |
#### Step 2: Search for Candidates
Search for relevant Wikidata entities by keyword or hierarchy:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
?item wdt:P279 wd:Q166118 . # subclasses of "archives"
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### Step 3: VERIFY Each Candidate
For EVERY candidate found, verify:
1. **Read the description** - does it match your class?
2. **Check instance of (P31)** - is it the same type?
3. **Check subclass of (P279)** - is it in a relevant hierarchy?
```sparql
SELECT ?item ?itemLabel ?itemDescription ?instanceLabel ?subclassLabel WHERE {
VALUES ?item { wd:Q9388534 }
OPTIONAL { ?item wdt:P31 ?instance. }
OPTIONAL { ?item wdt:P279 ?subclass. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### Step 4: Confirm Semantic Relationship
Ask: **Would a domain expert agree this mapping makes sense?**
| Your Class | Wikidata Entity | Verdict |
|------------|-----------------|---------|
| FacultyPaperCollection | Q22075301 (textile artwork) | ❌ NO - completely unrelated |
| CampusDocumentationCollection | Q9388534 (archival collection) | ✅ YES - semantically related |
| AcademicArchive | Q27032435 (academic archive) | ✅ YES - exact match |
### Type Compatibility Rules
| Your Class Type | Valid Wikidata Types | Invalid Wikidata Types |
|-----------------|---------------------|------------------------|
| Organization | organization, institution | building, person, artwork |
| Record Set Type | collection, fonds, series | building, candy, textile |
| Event | activity, occurrence | organization, place |
| Type/Category | type, concept, class | specific instances |
### Common Mistakes to Avoid
❌ **WRONG: Adding any QID found in search without verification**
```
"Found Q1643722 in search results, adding it as mapping"
→ Result: Mapping a "building in Vienna" to "UniversityAdministrativeFonds"
```
✅ **CORRECT: Verify description and type before adding**
```
1. Search finds Q1643722
2. Verify: Q1643722 = "building in Vienna, Austria"
3. Check: Is a building related to "UniversityAdministrativeFonds"?
4. Decision: NO - do not add this mapping
```
### When to Add Wikidata Mappings
Add Wikidata mappings ONLY when:
- [ ] You verified the entity's label AND description
- [ ] The entity is the same type as your class
- [ ] The semantic relationship is clear (exact, broader, narrower, related)
- [ ] A domain expert would agree the mapping makes sense
### When NOT to Add Wikidata Mappings
Do NOT add Wikidata mappings when:
- You only searched but didn't verify the description
- The entity type doesn't match (e.g., building vs. organization)
- The relationship is unclear or forced
- You're just trying to "fill in" mappings
### Mapping Categories
| Category | Wikidata Property | When to Use |
|----------|-------------------|-------------|
| `exact_mappings` | - | Same semantic meaning (rare!) |
| `close_mappings` | - | Similar but not identical |
| `broad_mappings` | P279 (subclass of) | Wikidata entity is BROADER |
| `narrow_mappings` | inverse of P279 | Wikidata entity is NARROWER |
| `related_mappings` | - | Non-hierarchical but semantically related |
### Checklist
For each Wikidata mapping:
- [ ] Verified entity label matches expected meaning
- [ ] Verified entity description confirms semantic fit
- [ ] Entity type is compatible with class type
- [ ] Mapping category (exact/close/broad/narrow/related) is correct
- [ ] A domain expert would agree this makes sense
### Example: Proper Verification for FacultyPaperCollection
**Step 1: What are we looking for?**
- Personal papers of faculty members
- Academic archives
- Manuscript collections
**Step 2: Search**
```sparql
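SELECT ?item ?itemLabel ?itemDescription WHERE {
  # Entity search via MWAPI (bif:contains is Virtuoso-only); repeat per phrase, e.g. "personal papers"
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "faculty papers" ; mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10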
```
**Step 3: Verify candidates**
- If no exact match found → DO NOT add a wrong mapping
- Better to have NO Wikidata mapping than a WRONG one
**Step 4: Decision**
- No exact Wikidata match for "FacultyPaperCollection"
- Keep ontology mappings only (rico-rst:Fonds, bf:Archival)
- Do NOT add unrelated QIDs like Q22075301 (textile artwork!)
### Integration with Other Rules
This rule complements:
- **mapping-specificity-hypernym-rule.md**: Rules for choosing mapping type
- **wikidata-mapping-verification-rule.md**: Rules for verifying QIDs exist
- **verified-ontology-mapping-requirements.md**: General ontology verification


@ -0,0 +1,97 @@
# Wikidata Mapping Verification Rule
## Rule: Always Verify Wikidata Mappings Using Authenticated Tools
When adding or reviewing Wikidata mappings (wd:Qxxxxx), you MUST verify the entity exists and is semantically appropriate using the available tools.
### Verification Methods (in order of preference)
#### 1. Wikidata SPARQL Query (Primary)
Use `wikidata-authenticated_execute_sparql` to verify entity labels and descriptions:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q38723 wd:Q2385804 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### 2. Wikidata Metadata API
Use `wikidata-authenticated_get_metadata` to retrieve label and description:
```
entity_id: Q38723
language: en
```
#### 3. Web Search as Fallback
If authenticated tools fail, use `linkup_linkup-search` or `exa_web_search_exa`:
```
query: "Wikidata Q38723 higher education institution"
```
### Common Errors to Avoid
| Error | Example | Fix |
|-------|---------|-----|
| **Wrong QID** | Q600875 (a person) for "academic program" | Q600134 (course) |
| **Too broad** | Q35120 (entity) for specific class | Use appropriate subclass |
| **Too narrow** | Q3918 (university) for general academic institution | Use Q38723 (higher education institution) |
| **Different concept** | Q416703 (museum building) for museum organization | Use appropriate organizational class |
### Verification Checklist
Before committing any Wikidata mapping:
- [ ] QID exists (not 404)
- [ ] Label matches expected concept
- [ ] Description confirms semantic alignment
- [ ] Mapping specificity follows Rule 63 (exact/broad/narrow/close)
- [ ] Not a duplicate of another mapping in the same class
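The last checklist item (no duplicate mappings within a class) is easy to automate. A minimal sketch, assuming the class definition is a plain dict loaded from YAML; `duplicate_mappings` is a hypothetical helper name.

```python
from collections import defaultdict

MAPPING_KEYS = (
    "exact_mappings", "close_mappings", "related_mappings",
    "broad_mappings", "narrow_mappings",
)


def duplicate_mappings(class_definition):
    """Return {target: [mapping keys]} for targets listed more than once in one class."""
    seen = defaultdict(list)
    for key in MAPPING_KEYS:
        for target in class_definition.get(key, []):
            seen[target].append(key)
    return {target: keys for target, keys in seen.items() if len(keys) > 1}
```

A target that appears under two different keys is also reported, since one entity cannot be both an exact and a close match for the same class.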
### Example Verification
**WRONG:**
```yaml
# Q600875 was not verified - it's actually a person
close_mappings:
- wd:Q600875 # Juan Lindolfo Cuestas - President of Uruguay!
```
**CORRECT:**
```yaml
# Verified via SPARQL: Q600134 = "course"
close_mappings:
- wd:Q600134 # program of study, or unit of teaching
```
### SPARQL Query Template
```sparql
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel WHERE {
VALUES ?item { wd:Q38723 }
OPTIONAL { ?item skos:altLabel ?itemAltLabel. FILTER(LANG(?itemAltLabel) = "en") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
### Batch Verification
For multiple QIDs in a file, verify all at once:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q38723 wd:Q2385804 wd:Q600134 wd:Q3918 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
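Building the batch query by hand invites typos in the `VALUES` list. The following sketch generates it from the QIDs found in a file; `build_verification_query` is a hypothetical helper, it only builds the query string, and executing it is left to whichever tool is available.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def build_verification_query(qids, language="en"):
    """Build one SPARQL query returning label and description for each QID."""
    invalid = [qid for qid in qids if not QID_PATTERN.fullmatch(qid)]
    if invalid:
        raise ValueError(f"Not valid QIDs: {invalid}")
    # De-duplicate while preserving the order in which QIDs were given.
    values = " ".join(f"wd:{qid}" for qid in dict.fromkeys(qids))
    return (
        "SELECT ?item ?itemLabel ?itemDescription WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )
```

Rejecting malformed identifiers up front keeps a stray `wd:` prefix or a pasted label out of the query.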
### Integration with Other Rules
This rule complements:
- **Rule 63** (mapping-specificity-hypernym-rule.md): Determines mapping type (exact/broad/narrow)
- **no-hallucinated-ontology-references.md**: Prevents fake ontology terms
- **verified-ontology-terms.md**: General ontology verification


@ -8,15 +8,108 @@ When mapping LinkML classes to external ontologies, you MUST distinguish between
1. **Exact Mappings (`skos:exactMatch`)**: Use ONLY when the external concept is **semantically equivalent** to your class.
* *Example*: `hc:Person` `exact_mappings` `schema:Person`.
* **CRITICAL**: Exact means the SAME semantic scope - neither broader nor narrower!
* **DO NOT AVOID EXACT BY DEFAULT**: If equivalence is verified (including class/property category match and ontology definition review), `exact_mappings` SHOULD be used.
2. **Broad Mappings (`skos:broadMatch`)**: Use when the external concept is a **hypernym** (a broader, more general category) of your class.
* *Example*: `hc:AcademicArchiveRecordSetType` `broad_mappings` `rico:RecordSetType`.
* *Rationale*: An academic archive record set *is a* record set type, but `rico:RecordSetType` is broader.
* *Common Hypernyms*: `skos:Concept`, `prov:Entity`, `schema:Thing`, `schema:Organization`, `rico:RecordSetType`.
* *Common Hypernyms*: `skos:Concept`, `prov:Entity`, `prov:Activity`, `schema:Thing`, `schema:Organization`, `schema:Action`, `rico:RecordSetType`, `crm:E55_Type`.
3. **Narrow Mappings (`skos:narrowMatch`)**: Use when the external concept is a **hyponym** (a narrower, more specific category) of your class.
* *Example*: `hc:Organization` `narrow_mappings` `hc:Library` (if mapping inversely).
4. **Close Mappings (`skos:closeMatch`)**: Use when the external concept is similar but not exactly equivalent.
* *Example*: `hc:AccessPolicy` `close_mappings` `dcterms:accessRights` (related but different scope).
5. **Related Mappings (`skos:relatedMatch`)**: Use for non-hierarchical relationships.
* *Example*: `hc:Collection` `related_mappings` `rico:RecordSet`.
### 🚨 Type Compatibility Rule
**Classes map to classes, properties map to properties.** Never mix types in mappings.
| Your Element | Valid Mapping Target |
|--------------|---------------------|
| Class | Class (owl:Class, rdfs:Class) |
| Slot | Property (owl:ObjectProperty, owl:DatatypeProperty, rdf:Property) |
**WRONG**:
```yaml
# AccessApplication is a CLASS, schema:Action is a CLASS - but Action is BROADER
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym, not equivalent
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### 🚨 No Self/Internal Exact Mappings
`exact_mappings` MUST NOT contain self-references or internal HC class references for the same concept.
**WRONG**:
```yaml
AcademicArchive:
  exact_mappings:
    - hc:AcademicArchive  # Self/internal reference; not an external equivalence mapping
```
**CORRECT**:
```yaml
AcademicArchive:
  exact_mappings:
    - wd:Q27032435  # External concept with equivalent semantic scope
```
Use `exact_mappings` only for equivalent terms in external ontologies or external controlled vocabularies, not for repeating the class itself.
### ✅ Positive Guidance: When Exact Mapping Is Correct
Use `exact_mappings` when all checks below pass:
- Semantic scope is equivalent (not parent/child, not merely similar)
- Ontological category matches (Class↔Class, Slot↔Property)
- Target term is verified in the ontology source files under `data/ontology/` or verified Wikidata entity metadata
- No self/internal duplication (no `hc:` self-reference for the same concept)
**CORRECT**:
```yaml
Person:
  exact_mappings:
    - schema:Person
Acquisition:
  exact_mappings:
    - crm:E8_Acquisition
```
Do not downgrade a truly equivalent mapping to `close_mappings` or `broad_mappings` just to be conservative.
### Common Hypernyms That Are NEVER Exact Mappings
These terms are always BROADER than your specific class - never use them as `exact_mappings`:
| Hypernym | What It Means | Use Instead |
|----------|---------------|-------------|
| `schema:Action` | Any action | `broad_mappings` |
| `schema:Organization` | Any organization | `broad_mappings` |
| `schema:Thing` | Anything at all | `broad_mappings` |
| `schema:PropertyValue` | Any property value | `broad_mappings` |
| `schema:Permit` | Any permit | `broad_mappings` |
| `prov:Activity` | Any activity | `broad_mappings` |
| `prov:Entity` | Any entity | `broad_mappings` |
| `skos:Concept` | Any concept | `broad_mappings` |
| `crm:E55_Type` | Any type classification | `broad_mappings` |
| `crm:E42_Identifier` | Any identifier | `broad_mappings` |
| `rico:Identifier` | Any identifier | `broad_mappings` |
| `dcat:DataService` | Any data service | `broad_mappings` |
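The table above can double as a lint rule. The sketch below moves any listed hypernym from `exact_mappings` to `broad_mappings`; `demote_generic_hypernyms` is a hypothetical helper, and the class definition is assumed to be a plain dict loaded from YAML.

```python
# Terms from the table above: always broader than a specific HC class.
GENERIC_HYPERNYMS = frozenset({
    "schema:Action", "schema:Organization", "schema:Thing",
    "schema:PropertyValue", "schema:Permit", "prov:Activity", "prov:Entity",
    "skos:Concept", "crm:E55_Type", "crm:E42_Identifier", "rico:Identifier",
    "dcat:DataService",
})


def demote_generic_hypernyms(definition):
    """Return a copy with generic hypernyms moved from exact to broad mappings."""
    exact = definition.get("exact_mappings", [])
    keep = [t for t in exact if t not in GENERIC_HYPERNYMS]
    moved = [t for t in exact if t in GENERIC_HYPERNYMS]
    result = dict(definition)
    if moved:
        result["exact_mappings"] = keep
        broad = list(definition.get("broad_mappings", []))
        # Append only the terms that are not already listed as broad.
        result["broad_mappings"] = broad + [t for t in moved if t not in broad]
    return result
```

The input dict is left untouched, so the result can be diffed against the original before anything is written back.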
### Common Violations to Avoid
**WRONG**:
@ -47,8 +140,46 @@ SocialMovement:
- schema:Organization # CORRECT
```
**WRONG**:
```yaml
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### How to Determine Mapping Type
Ask these questions:
1. **Is it the SAME thing?** → `exact_mappings`
   - "Could I swap these two terms in any context without changing meaning?"
   - If NO, it's not an exact mapping
2. **Is the external term a PARENT category?** → `broad_mappings`
   - "Is my class a TYPE OF the external term?"
   - Example: AccessApplication IS-A Action
3. **Is the external term a CHILD category?** → `narrow_mappings`
   - "Is the external term a TYPE OF my class?"
   - Example: Library IS-A Organization (so Organization has narrow_mapping to Library)
4. **Is it similar but not hierarchical?** → `close_mappings`
   - "Related but not equivalent or hierarchical"
5. **Is there some other relationship?** → `related_mappings`
   - "Connected in some way"
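The five questions are asked in order, and the first "yes" decides. As a sketch (the function and parameter names are hypothetical, and the answers still have to come from a human reading the definitions):

```python
def choose_mapping_type(same_scope, external_is_parent=False,
                        external_is_child=False, similar=False):
    """Return the LinkML mapping key for the first question answered 'yes'."""
    if same_scope:
        return "exact_mappings"
    if external_is_parent:
        return "broad_mappings"
    if external_is_child:
        return "narrow_mappings"
    if similar:
        return "close_mappings"
    # Some other, non-hierarchical connection.
    return "related_mappings"
```

For AccessApplication against `schema:Action`, the scope is not the same but the external term is a parent, so the call returns `broad_mappings`.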
### Verification Checklist
- [ ] Does the `exact_mapping` represent the **exact same scope**?
- [ ] If the external term is a generic parent class (e.g., `Type`, `Concept`, `Entity`), move it to `broad_mappings`.
- [ ] If the external term is a specific instance or subclass, check `narrow_mappings`.
- [ ] Is the external term a generic parent class (e.g., `Type`, `Concept`, `Entity`, `Action`, `Activity`, `Organization`)? → Move to `broad_mappings`
- [ ] Is the external term a specific instance or subclass? → Check `narrow_mappings`
- [ ] Is the external term the same type (class→class, property→property)?
- [ ] Would swapping the terms change the meaning? If yes, not an `exact_mapping`


@ -0,0 +1,53 @@
# Rule: No Version Indicators in Names
## 🚨 Critical
Do not include version identifiers in **class names**, **slot names**, or **enum names**.
Version tags in semantic names create churn, break reuse, and force unnecessary migrations.
## The Rule
1. Use stable semantic names for LinkML elements.
- ✅ `DigitalPlatform`
- ❌ `DigitalPlatformV2`
2. If a model evolves, keep the name and update metadata/provenance.
- Track revision in changelog, annotations, or transformation metadata.
- Do not encode `v2`, `v3`, `_2026`, `beta`, `final` in the element name.
3. Apply this to all naming surfaces:
- `classes:` keys
- `slots:` keys
- `enums:` keys
- `name:` values in module files
## Allowed Versioning Locations
- File-level changelog/comments
- Dedicated metadata classes/slots (e.g., transformation metadata)
- External release tags (git tags, manifest versions)
## Migration Guidance
When you encounter versioned names:
1. Rename semantic elements to stable names.
2. Update references/imports/usages accordingly.
3. Preserve provenance of the migration in comments/annotations.
## Examples
✅ Correct:
```yaml
classes:
  DigitalPlatformTransformationMetadata:
    description: Metadata about record transformation steps.
```
❌ Wrong:
```yaml
classes:
  DigitalPlatformV2TransformationMetadata:
    description: Metadata about V2 transformation.
```
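Versioned names can be spotted with a small pattern check. This is a heuristic sketch only: `version_indicators` is a hypothetical helper, and the token list covers just the markers named in this rule (`v2`/`v3`, four-digit years, `beta`, `final`).

```python
import re

# Split CamelCase and snake_case names into word-like parts.
PART = re.compile(r"[A-Z][a-z]*\d*|[a-z]+\d*|\d+")
# Parts that look like a version tag once lowercased.
VERSION_PART = re.compile(r"v\d+|(?:19|20)\d{2}|beta|final")


def version_indicators(name):
    """Return the parts of `name` that look like version tags."""
    return [
        part for part in PART.findall(name)
        if VERSION_PART.fullmatch(part.lower())
    ]
```

Matching whole parts rather than substrings keeps names such as `finalize` from being flagged.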


@ -0,0 +1,45 @@
# Rule: Polished Slot Storage Location
## Summary
Polished (refactored) canonical slot files MUST be stored in the parent `slots/` directory:
```
schemas/20251121/linkml/modules/slots/
```
They must **NOT** be stored in the `20260202_matang/` subdirectory.
## Rationale
The `20260202_matang/` subdirectory and its nested `new/` subdirectory contain **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
## Directory Structure
```
schemas/20251121/linkml/modules/slots/
├── *.yaml                 ← Polished canonical slot files go HERE
└── 20260202_matang/
    ├── *.yaml             ← Draft/unpolished canonical slots (staging area)
    └── new/
        └── *.yaml         ← Raw/draft slot definitions pending triage
```
## Rule
- When polishing a slot file, write the result to `schemas/20251121/linkml/modules/slots/{slot_name}.yaml`
- If the source file was in `20260202_matang/`, remove it from there after writing to `slots/`
- If the source file was in `20260202_matang/new/`, it should only be deleted after user confirmation of alias absorption (per the no-autonomous-alias-assignment rule)
- If a file already exists in `slots/` (i.e., it was previously polished in an earlier session), overwrite it in place
## Examples
**CORRECT:**
```
schemas/20251121/linkml/modules/slots/has_pattern.yaml ← polished file
```
**WRONG:**
```
schemas/20251121/linkml/modules/slots/20260202_matang/has_pattern.yaml ← should not be here after polishing
```
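The destination rule can be expressed with `pathlib`. The sketch below only computes and checks paths; the helper names are hypothetical, and nothing here moves or deletes files, since deletion from `new/` requires user confirmation.

```python
from pathlib import PurePosixPath

SLOTS_DIR = PurePosixPath("schemas/20251121/linkml/modules/slots")
STAGING_DIR = SLOTS_DIR / "20260202_matang"


def canonical_slot_path(slot_name):
    """Return where the polished file for `slot_name` belongs."""
    return SLOTS_DIR / f"{slot_name}.yaml"


def is_in_staging(path):
    """True when `path` lies inside the staging directory (including `new/`)."""
    return STAGING_DIR in PurePosixPath(path).parents
```

A polished file for which `is_in_staging` returns true has been written to the wrong place.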

Binary file not shown.


@ -3,10 +3,10 @@
**Scope:** Schema Migration / Slot Fixes
**Description:**
The file `/Users/kempersc/apps/glam/data/fixes/slot_fixes.yaml` is the **single authoritative source** for tracking slot migrations and fixes.
The file `slot_fixes.yaml` is the **single authoritative source** for tracking slot migrations and fixes.
**Directives:**
1. **Authoritative Source:** Always read and update `/Users/kempersc/apps/glam/data/fixes/slot_fixes.yaml`. Do NOT use `schemas/.../slot_fixes.yaml` as the master list (though you may need to sync them if they diverge, the `data/fixes` version takes precedence).
1. **Authoritative Source:** Always read and update `slot_fixes.yaml`.
2. **Processed Status:** When a slot migration is completed (schema updated, data migrated), you MUST update the entry in `slot_fixes.yaml` with a `processed` block containing:
* `status: true`
* `date: 'YYYY-MM-DD'`


@ -4,7 +4,7 @@
## Summary
The `revision` key in `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` is **IMMUTABLE**. AI agents MUST follow revision specifications exactly and are NEVER permitted to modify the content of revision entries.
The `revision` key in `slot_fixes.yaml` is **IMMUTABLE**. AI agents MUST follow revision specifications exactly and are NEVER permitted to modify the content of revision entries.
## The Authoritative Source


@ -0,0 +1,69 @@
# Rule: Slot Naming Convention (Current Style)
🚨 **CRITICAL**: New LinkML slot names MUST follow the current verb-first naming style used in active slot files under `modules/slots/`.
## Core Naming Rules
1. Use `snake_case`.
2. Prefer short, descriptive verb predicates as canonical names.
3. Keep names ontology-neutral (no ontology namespace prefixes in slot names).
4. Use singular nouns in object positions (including multivalued slots).
5. Keep temporal semantics in mappings/definitions when needed, not by forcing a legacy prefix.
## Preferred Patterns
### 1) Simple verb predicates (default)
Use a single verb when it clearly expresses the relation.
Examples from active slots:
- `accept`
- `contain`
- `catalogue`
- `exhibit`
### 2) Verb + particle/preposition when needed
Use compact phrasal forms when a preposition carries core meaning.
Examples:
- `belong_to`
- `located_in`
- `derived_from`
### 3) Symmetric or directional pair pattern
Use `<present>_or_<past_participle>` when both directions/states are intentionally modeled in one predicate label.
Examples:
- `contains_or_contained`
- `includes_or_included`
- `operates_or_operated`
## Legacy Compatibility
- For migrations, keep backward compatibility via `aliases` when renaming to current-style canonical names.
- Do not rename canonical slots opportunistically; follow migration plans and canonical-slot protection rules.
## Anti-Patterns
- ❌ `rico_has_or_had_holder` (ontology prefix in name)
- ❌ `collections` (plural noun predicate)
- ❌ `has_museum_visitor_count` (class-specific slot name)
- ❌ Creating new `has_or_had_*` names by default when a verb predicate is clearer
## Quick Checklist
- [ ] Is the canonical slot name verb-first and descriptive?
- [ ] Is it `snake_case`?
- [ ] Is the noun part singular?
- [ ] Is the name ontology-neutral?
- [ ] If renaming legacy slots, are aliases/migration constraints handled?
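Two of the checklist items (`snake_case` and ontology neutrality) can be checked mechanically; the others need human judgement. A rough sketch, where `slot_name_problems` is a hypothetical helper and the prefix list is an assumed subset of the prefixes used in this project:

```python
import re

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
# Assumed subset of ontology prefixes; extend to match the project's prefix table.
ONTOLOGY_PREFIXES = ("rico", "prov", "skos", "schema", "crm", "dcterms", "dcat", "premis")


def slot_name_problems(name):
    """Return a list of naming problems found in a candidate slot name."""
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not snake_case")
    # Only the first word is checked, so a prefix elsewhere in the name is not flagged.
    if name.split("_")[0].lower() in ONTOLOGY_PREFIXES:
        problems.append("ontology prefix in name")
    return problems
```

An empty list means the name passed these two checks, not that it satisfies the whole convention.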
## See Also
- `.opencode/rules/archive/DEPRECATED-slot-naming-convention-rico-style.md`
- `.opencode/rules/no-ontology-prefix-in-slot-names.md`
- `.opencode/rules/slot-noun-singular-convention.md`
- `.opencode/rules/generic-slots-specific-classes.md`
- `.opencode/rules/canonical-slot-protection-rule.md`


@ -76,5 +76,5 @@ When creating or renaming slots:
## See Also
- `.opencode/rules/slot-naming-convention-rico-style.md` - RiC-O naming patterns
- `.opencode/rules/slot-naming-convention-current-style.md` - Current slot naming patterns
- `.opencode/rules/slot-centralization-and-semantic-uri-rule.md` - Slot centralization requirements


@ -0,0 +1,162 @@
# Wikidata Mapping Discovery Rule
## Rule: Use Wikidata MCP to Discover and Verify Mappings Carefully
When adding Wikidata mappings to class files, you MUST verify the semantic meaning and relationship before adding any mapping.
### 🚨 CRITICAL: Always Verify Before Adding
**NEVER add a Wikidata QID without verifying:**
1. What the entity actually IS (not just the label)
2. That it's the SAME TYPE as your class (organization→organization, NOT organization→building)
3. That the semantic relationship makes sense
### Workflow
#### Step 1: VERIFY Existing Mappings First
Before trusting any existing mapping, verify it:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q22075301 wd:Q1643722 wd:Q185583 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
**Example of WRONG mappings found in codebase:**
| QID | Label | Was Mapped To | WHY WRONG |
|-----|-------|---------------|-----------|
| Q22075301 | textile artwork | FacultyPaperCollection | Not related at all! |
| Q1643722 | building in Vienna | UniversityAdministrativeFonds | Not an archival concept! |
| Q185583 | candy | AcademicStudentRecordSeries | Completely unrelated! |
#### Step 2: Search for Candidates
Search for relevant Wikidata entities by keyword or hierarchy:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
?item wdt:P279 wd:Q166118 . # subclasses of "archives"
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### Step 3: VERIFY Each Candidate
For EVERY candidate found, verify:
1. **Read the description** - does it match your class?
2. **Check instance of (P31)** - is it the same type?
3. **Check subclass of (P279)** - is it in a relevant hierarchy?
```sparql
SELECT ?item ?itemLabel ?itemDescription ?instanceLabel ?subclassLabel WHERE {
VALUES ?item { wd:Q9388534 }
OPTIONAL { ?item wdt:P31 ?instance. }
OPTIONAL { ?item wdt:P279 ?subclass. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### Step 4: Confirm Semantic Relationship
Ask: **Would a domain expert agree this mapping makes sense?**
| Your Class | Wikidata Entity | Verdict |
|------------|-----------------|---------|
| FacultyPaperCollection | Q22075301 (textile artwork) | ❌ NO - completely unrelated |
| CampusDocumentationCollection | Q9388534 (archival collection) | ✅ YES - semantically related |
| AcademicArchive | Q27032435 (academic archive) | ✅ YES - exact match |
### Type Compatibility Rules
| Your Class Type | Valid Wikidata Types | Invalid Wikidata Types |
|-----------------|---------------------|------------------------|
| Organization | organization, institution | building, person, artwork |
| Record Set Type | collection, fonds, series | building, candy, textile |
| Event | activity, occurrence | organization, place |
| Type/Category | type, concept, class | specific instances |
### Common Mistakes to Avoid
❌ **WRONG: Adding any QID found in search without verification**
```
"Found Q1643722 in search results, adding it as mapping"
→ Result: Mapping a "building in Vienna" to "UniversityAdministrativeFonds"
```
✅ **CORRECT: Verify description and type before adding**
```
1. Search finds Q1643722
2. Verify: Q1643722 = "building in Vienna, Austria"
3. Check: Is a building related to "UniversityAdministrativeFonds"?
4. Decision: NO - do not add this mapping
```
### When to Add Wikidata Mappings
Add Wikidata mappings ONLY when:
- [ ] You verified the entity's label AND description
- [ ] The entity is the same type as your class
- [ ] The semantic relationship is clear (exact, broader, narrower, related)
- [ ] A domain expert would agree the mapping makes sense
### When NOT to Add Wikidata Mappings
Do NOT add Wikidata mappings when:
- You only searched but didn't verify the description
- The entity type doesn't match (e.g., building vs. organization)
- The relationship is unclear or forced
- You're just trying to "fill in" mappings
### Mapping Categories
| Category | Wikidata Property | When to Use |
|----------|-------------------|-------------|
| `exact_mappings` | - | Same semantic meaning (rare!) |
| `close_mappings` | - | Similar but not identical |
| `broad_mappings` | P279 (subclass of) | Wikidata entity is BROADER |
| `narrow_mappings` | inverse of P279 | Wikidata entity is NARROWER |
| `related_mappings` | - | Non-hierarchical but semantically related |
### Checklist
For each Wikidata mapping:
- [ ] Verified entity label matches expected meaning
- [ ] Verified entity description confirms semantic fit
- [ ] Entity type is compatible with class type
- [ ] Mapping category (exact/close/broad/narrow/related) is correct
- [ ] A domain expert would agree this makes sense
### Example: Proper Verification for FacultyPaperCollection
**Step 1: What are we looking for?**
- Personal papers of faculty members
- Academic archives
- Manuscript collections
**Step 2: Search**
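```sparql
# Label search via the MediaWiki EntitySearch API; run once per search phrase
SELECT ?item ?itemLabel ?itemDescription WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ; wikibase:api "EntitySearch" ;
                    mwapi:search "faculty papers" ; mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
```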
**Step 3: Verify candidates**
- If no exact match found → DO NOT add a wrong mapping
- Better to have NO Wikidata mapping than a WRONG one
**Step 4: Decision**
- No exact Wikidata match for "FacultyPaperCollection"
- Keep ontology mappings only (rico-rst:Fonds, bf:Archival)
- Do NOT add unrelated QIDs like Q22075301 (textile artwork!)
### Integration with Other Rules
This rule complements:
- **mapping-specificity-hypernym-rule.md**: Rules for choosing mapping type
- **wikidata-mapping-verification-rule.md**: Rules for verifying QIDs exist
- **verified-ontology-mapping-requirements.md**: General ontology verification


@ -0,0 +1,97 @@
# Wikidata Mapping Verification Rule
## Rule: Always Verify Wikidata Mappings Using Authenticated Tools
When adding or reviewing Wikidata mappings (wd:Qxxxxx), you MUST verify the entity exists and is semantically appropriate using the available tools.
### Verification Methods (in order of preference)
#### 1. Wikidata SPARQL Query (Primary)
Use `wikidata-authenticated_execute_sparql` to verify entity labels and descriptions:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q38723 wd:Q2385804 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
#### 2. Wikidata Metadata API
Use `wikidata-authenticated_get_metadata` to retrieve label and description:
```
entity_id: Q38723
language: en
```
#### 3. Web Search as Fallback
If authenticated tools fail, use `linkup_linkup-search` or `exa_web_search_exa`:
```
query: "Wikidata Q38723 higher education institution"
```
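The order of preference above amounts to a fallback chain. The following Python sketch shows that control flow only: the callables passed in are hypothetical stand-ins for the three tools, and no real tool signature is assumed.

```python
from typing import Callable, Optional, Sequence


def verify_with_fallback(
    qid: str,
    methods: Sequence[Callable[[str], Optional[dict]]],
) -> Optional[dict]:
    """Try each verification method in order; return the first non-empty result."""
    for method in methods:
        try:
            result = method(qid)
        except Exception:
            # A failing tool should not stop the chain; move to the next method.
            continue
        if result:
            return result
    return None
```

A return value of `None` means no method could confirm the entity, in which case the mapping should not be added.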
### Common Errors to Avoid
| Error | Example | Fix |
|-------|---------|-----|
| **Wrong QID** | Q600875 (a person) for "academic program" | Use Q600134 (course) |
| **Too broad** | Q35120 (entity) for specific class | Use appropriate subclass |
| **Too narrow** | Q3918 (university) for general academic institution | Use Q38723 (higher education institution) |
| **Different concept** | Q416703 (museum building) for museum organization | Use appropriate organizational class |
### Verification Checklist
Before committing any Wikidata mapping:
- [ ] QID exists (not 404)
- [ ] Label matches expected concept
- [ ] Description confirms semantic alignment
- [ ] Mapping specificity follows Rule 63 (exact/broad/narrow/close)
- [ ] Not a duplicate of another mapping in the same class
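The last checklist item can be checked mechanically. This is a minimal sketch, assuming a class definition loaded as a plain dict whose mapping slots hold lists of CURIEs; `find_duplicate_mappings` is a hypothetical helper name.

```python
MAPPING_SLOTS = (
    "exact_mappings", "close_mappings", "broad_mappings",
    "narrow_mappings", "related_mappings",
)


def find_duplicate_mappings(class_def: dict) -> dict:
    """Return {curie: [slots]} for every CURIE listed more than once in a class."""
    seen = {}
    for slot in MAPPING_SLOTS:
        for curie in class_def.get(slot) or []:
            seen.setdefault(curie, []).append(slot)
    return {curie: slots for curie, slots in seen.items() if len(slots) > 1}
```

A CURIE that appears in two different slots is reported with both slot names, which also catches a term that was filed under two mapping categories at once.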
### Example Verification
**WRONG:**
```yaml
# Q600875 was not verified - it's actually a person
close_mappings:
- wd:Q600875 # Juan Lindolfo Cuestas - President of Uruguay!
```
**CORRECT:**
```yaml
# Verified via SPARQL: Q600134 = "course"
close_mappings:
- wd:Q600134 # program of study, or unit of teaching
```
### SPARQL Query Template
```sparql
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel WHERE {
VALUES ?item { wd:Q38723 }
OPTIONAL { ?item skos:altLabel ?itemAltLabel. FILTER(LANG(?itemAltLabel) = "en") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
### Batch Verification
For multiple QIDs in a file, verify all at once:
```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q38723 wd:Q2385804 wd:Q600134 wd:Q3918 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```
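When the QIDs are collected from a file, the batch query can be generated rather than typed by hand. The sketch below only builds the query string shown above; submitting it to the SPARQL tool is left out. `build_batch_query` is a hypothetical helper, and the QID format check is an assumption about what counts as well-formed input.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_batch_query(qids):
    """Build the batch verification query for QIDs given as strings like 'Q38723'."""
    unique = list(dict.fromkeys(qids))  # drop repeats, keep first-seen order
    bad = [q for q in unique if not QID_PATTERN.fullmatch(q)]
    if bad:
        raise ValueError(f"not a QID: {bad}")
    values = " ".join(f"wd:{q}" for q in unique)
    return (
        "SELECT ?item ?itemLabel ?itemDescription WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

Rejecting malformed identifiers before the query is sent keeps a typo from silently returning an empty row that could be mistaken for a missing entity.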
### Integration with Other Rules
This rule complements:
- **Rule 63** (mapping-specificity-hypernym-rule.md): Determines mapping type (exact/broad/narrow)
- **no-hallucinated-ontology-references.md**: Prevents fake ontology terms
- **verified-ontology-terms.md**: General ontology verification


@ -33,6 +33,16 @@ AcademicArchiveRecordSetType:
**See**: `.opencode/rules/mapping-specificity-hypernym-rule.md` for complete documentation.
### Rule 64: Archive Organization Type Descriptions
🚨 **CRITICAL**: Archive classes that do NOT have `recordType` or `hold_record_set` as a primary distinguishing feature represent **archives as organizations**, not just collections of records.
**The Rule**:
- **Archive Organization Types** (e.g., `BankArchive`, `ChurchArchive`, `MunicipalArchive`): Emphasize institutional characteristics—governance, funding, legal status, parent organization relationships, and organizational functions.
- **Record Set Types** (have recordType): Focus on the nature and format of the records themselves.
**See**: `.opencode/rules/archive-organization-type-description-rule.md` for complete documentation.
### Rule: Ontology Detection vs Heuristics
🚨 **CRITICAL**: When detecting classes and predicates in `data/ontology/` or external ontology files, you must **read the actual ontology definitions** (e.g., RDF, OWL, TTL files) to determine if a term is a Class or a Property. Do not rely on naming heuristics (like "Capitalized means Class").
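To illustrate reading declarations instead of guessing from capitalization, here is a minimal Python sketch. It assumes the ontology has been serialized as N-Triples with IRI-only `rdf:type` statements; Turtle, RDF/XML, or OWL files would need a real RDF parser or a conversion step first. `declared_kinds` is a hypothetical helper name.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CLASS_TYPES = {
    "http://www.w3.org/2002/07/owl#Class",
    "http://www.w3.org/2000/01/rdf-schema#Class",
}
PROPERTY_TYPES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "http://www.w3.org/2002/07/owl#ObjectProperty",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
    "http://www.w3.org/2002/07/owl#AnnotationProperty",
}
# Matches only triples whose subject, predicate, and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def declared_kinds(ntriples_text: str) -> dict:
    """Map each term IRI to the kinds ('class', 'property') it is declared as."""
    kinds = {}
    for line in ntriples_text.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue
        subject, predicate, obj = match.groups()
        if predicate != RDF_TYPE:
            continue
        if obj in CLASS_TYPES:
            kinds.setdefault(subject, set()).add("class")
        elif obj in PROPERTY_TYPES:
            kinds.setdefault(subject, set()).add("property")
    return kinds
```

A term that is absent from the result has no declaration in the input, so it should be treated as unverified rather than classified by its name.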


@ -359,7 +359,7 @@ classes:
range: WikidataEnrichment
description: Full Wikidata enrichment data
ghcid:
range: GhcidBlock
range: GHCIDBlock
description: GHCID generation metadata with history
web_claims:
range: WebClaimsBlock
@ -1174,7 +1174,7 @@ classes:
# GHCID BLOCK - Heritage Custodian ID with history
# ---------------------------------------------------------------------------
GhcidBlock:
GHCIDBlock:
description: GHCID generation metadata and history
attributes:
ghcid_current:
@ -1203,7 +1203,7 @@ classes:
range: datetime
description: When GHCID was generated
ghcid_history:
range: GhcidHistoryEntry
range: GHCIDHistoryEntry
multivalued: true
inlined_as_list: true
description: History of GHCID changes
@ -1220,7 +1220,7 @@ classes:
range: boolean
description: Whether a collision was detected and resolved
GhcidHistoryEntry:
GHCIDHistoryEntry:
description: Historical GHCID entry with validity period
attributes:
ghcid:


@ -75,8 +75,8 @@ imports:
- ./modules/classes/ProvenanceSources
- ./modules/classes/SourceRecord
# Identifiers Domain
- ./modules/classes/GhcidBlock
- ./modules/classes/GhcidHistoryEntry
- ./modules/classes/GHCIDBlock
- ./modules/classes/GHCIDHistoryEntry
- ./modules/classes/Identifier
# Location Domain
- ./modules/classes/CoordinateProvenance


@ -66,7 +66,7 @@ instances:
examples:
- name: Private art collector
description: Individual maintaining personal art collection
ghcid_type: P # Personal collection
GHCID_type: P # Personal collection
- name: Family archivist
description: Individual preserving family papers and photographs
- name: Independent researcher


@ -1143,13 +1143,13 @@
"category": "classes"
},
{
"name": "GhcidBlock",
"path": "modules/classes/GhcidBlock.yaml",
"name": "GHCIDBlock",
"path": "modules/classes/GHCIDBlock.yaml",
"category": "classes"
},
{
"name": "GhcidHistoryEntry",
"path": "modules/classes/GhcidHistoryEntry.yaml",
"name": "GHCIDHistoryEntry",
"path": "modules/classes/GHCIDHistoryEntry.yaml",
"category": "classes"
},
{


@ -24,7 +24,7 @@ imports:
- ./CustodianNameConsensus
- ./DigitalPlatform
- ./GenealogiewerkbalkEnrichment
- ./GhcidBlock
- ./GHCIDBlock
- ./GoogleMapsEnrichment
- ./GoogleMapsPlaywrightEnrichment
- ./Identifier
@ -94,7 +94,7 @@ classes:
range: WikidataEnrichment
description: Full Wikidata enrichment data
ghcid:
range: GhcidBlock
range: GHCIDBlock
description: GHCID generation metadata with history
has_or_had_web_claim:
range: WebClaimsBlock


@ -1,10 +1,10 @@
# GhcidBlock - GHCID generation metadata and history
# GHCIDBlock - GHCID generation metadata and history
# Extracted from custodian_source.yaml per Rule 38 (modular schema files)
# Extraction date: 2026-01-08
id: https://nde.nl/ontology/hc/classes/GhcidBlock
name: GhcidBlock
title: GhcidBlock
id: https://nde.nl/ontology/hc/classes/GHCIDBlock
name: GHCIDBlock
title: GHCIDBlock
prefixes:
linkml: https://w3id.org/linkml/
@ -17,12 +17,12 @@ imports:
- linkml:types
- ./GhcidHistoryEntry
- ./GHCIDHistoryEntry
- ./LocationResolution
default_range: string
classes:
GhcidBlock:
GHCIDBlock:
description: GHCID generation metadata and history
attributes:
ghcid_current:
@ -51,7 +51,7 @@ classes:
range: datetime
description: When GHCID was generated
ghcid_history:
range: GhcidHistoryEntry
range: GHCIDHistoryEntry
multivalued: true
inlined_as_list: true
description: History of GHCID changes


@ -1,10 +1,10 @@
# GhcidHistoryEntry - Historical GHCID entry with validity period
# GHCIDHistoryEntry - Historical GHCID entry with validity period
# Extracted from custodian_source.yaml per Rule 38 (modular schema files)
# Extraction date: 2026-01-08
id: https://nde.nl/ontology/hc/classes/GhcidHistoryEntry
name: GhcidHistoryEntry
title: GhcidHistoryEntry
id: https://nde.nl/ontology/hc/classes/GHCIDHistoryEntry
name: GHCIDHistoryEntry
title: GHCIDHistoryEntry
prefixes:
linkml: https://w3id.org/linkml/
@ -20,7 +20,7 @@ imports:
default_range: string
classes:
GhcidHistoryEntry:
GHCIDHistoryEntry:
description: Historical GHCID entry with validity period
attributes:
ghcid:


@ -1,6 +1,6 @@
id: https://nde.nl/ontology/hc/slot/ghcid
name: ghcid_slot
title: Ghcid Slot
title: GHCID Slot
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/


@ -0,0 +1,123 @@
---
Goal
Improve the quality of LinkML class files in a GLAM (Galleries, Libraries, Archives, Museums) ontology project by:
1. Writing proper dictionary-style descriptions without repeating class names
2. Adding multilingual support (alt_descriptions and structured_aliases in 7 languages: nl, de, fr, es, ar, id, zh)
3. Structuring data properly using LinkML annotations (examples:, keywords:, comments:)
4. Ensuring all ontology mappings use the correct category (exact_mappings vs broad_mappings vs close_mappings vs narrow_mappings vs related_mappings)
5. Verifying all Wikidata mappings are semantically correct
6. Creating and updating rules in .opencode/rules/
Instructions
Key Rules Created/Updated
1. class-description-quality-rule.md:
- No repetition of class name in descriptions
- MIGRATE structured data before removing from descriptions
- Use folded block scalar (>-)
- Use examples: annotation properly
2. linkml-yaml-best-practices-rule.md:
- equals_expression anti-pattern → use equals_string or any_of
- Declare all prefixes
- Import referenced classes
- Quote regex patterns and annotation values
3. class-multilingual-support-rule.md:
- Required languages: nl, de, fr, es, ar, id, zh
- Structure for alt_descriptions and structured_aliases
4. mapping-specificity-hypernym-rule.md (updated):
- Classes that are NEVER exact mappings: schema:Action, schema:Organization, schema:Thing, schema:PropertyValue, schema:Permit, prov:Activity, skos:Concept, crm:E55_Type, crm:E42_Identifier, dcat:DataService
- Type compatibility: class→class, property→property
- Decision tree for mapping type
5. wikidata-mapping-verification-rule.md:
- Use wikidata-authenticated_execute_sparql to verify QIDs
- SPARQL templates for batch verification
6. wikidata-mapping-discovery-rule.md (updated):
- ALWAYS verify BOTH label AND description before adding
- Check type compatibility (organization→organization, NOT organization→building)
- Examples of WRONG mappings found: Q22075301 (textile artwork) was mapped to FacultyPaperCollection!
- "Better no mapping than wrong mapping" principle
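The "never exact" list from rule 4 lends itself to an automated check. This is a minimal sketch, assuming a class definition loaded as a plain dict; the set is copied from the rule summary above, and `misplaced_exact_mappings` is a hypothetical helper name.

```python
# Terms the mapping-specificity rule lists as never valid in exact_mappings.
NEVER_EXACT = {
    "schema:Action", "schema:Organization", "schema:Thing",
    "schema:PropertyValue", "schema:Permit", "prov:Activity",
    "skos:Concept", "crm:E55_Type", "crm:E42_Identifier", "dcat:DataService",
}


def misplaced_exact_mappings(class_def: dict) -> list:
    """Return exact_mappings entries that the rule says can never be exact."""
    return sorted(set(class_def.get("exact_mappings") or []) & NEVER_EXACT)
```

Any term returned here is a candidate for `broad_mappings`, subject to the decision tree in the rule file.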
Discoveries
1. Wrong Wikidata mappings found and removed:
- Q22075301 (textile artwork) → was mapped to FacultyPaperCollection ❌
- Q1643722 (building in Vienna) → was mapped to UniversityAdministrativeFonds ❌
- Q185583 (candy) → was mapped to AcademicStudentRecordSeries ❌
2. Wrong mapping categories corrected:
- schema:Action was exact_mappings → should be broad_mappings
- crm:E55_Type was exact_mappings → should be broad_mappings
- prov:Activity was exact_mappings → should be broad_mappings
- schema:Organization was exact_mappings → should be broad_mappings
- crm:E42_Identifier was exact_mappings → should be broad_mappings
3. Verified correct Wikidata mappings:
- Q27032435 (academic archive) → AcademicArchive (exact) ✓
- Q38723 (higher education institution) → AcademicInstitution (exact) ✓
- Q9388534 (archival collection) → CampusDocumentationCollection (related) ✓
4. Conceptual model clarification:
- AcademicArchive = the institution (organizational entity)
- AcademicArchiveRecordSetType = the classification of record sets
Accomplished
Fully Processed Class Files (25 files)
All with: dictionary-style descriptions, 7-language alt_descriptions, 7-language structured_aliases, proper examples, keywords, correct mapping categories, verified Wikidata mappings
| File | Key Mappings |
|------|--------------|
| AcademicArchive.yaml | wd:Q27032435 (exact), wd:Q166118 (broad) |
| AcademicArchiveRecordSetType.yaml | wd:Q27032435 (close), rico:RecordSetType (broad) |
| AcademicArchiveRecordSetTypes.yaml | 4 subclasses, rico-rst mappings |
| AcademicInstitution.yaml | wd:Q38723 (exact), wd:Q4671277 (close) |
| AcademicProgram.yaml | schema:EducationalOccupationalProgram (exact), wd:Q600134 (close) |
| Access.yaml | dcterms:RightsStatement (exact) |
| AccessApplication.yaml | schema:Action (broad) |
| AccessControl.yaml | schema:DigitalDocumentPermission (close) |
| AccessibilityFeature.yaml | schema:LocationFeatureSpecification (close) |
| AccessInterface.yaml | dcat:DataService (broad) |
| AccessionEvent.yaml | crm:E63_Beginning_of_Existence (broad) |
| AccessionNumber.yaml | crm:E42_Identifier (broad), rico:Identifier (broad) |
| AccessLevel.yaml | skos:Concept (broad) |
| AccessTriggerEvent.yaml | prov:Activity (broad) |
| AccountIdentifier.yaml | schema:PropertyValue (broad) |
| AccountStatus.yaml | skos:Concept (broad) |
| Accreditation.yaml | schema:Permit (broad) |
| AccreditationBody.yaml | schema:Organization (broad) |
| AccreditationEvent.yaml | prov:Activity (broad) |
| Accumulation.yaml | rico:AccumulationRelation (exact) |
| AccuracyLevel.yaml | skos:Concept (broad) |
| Acquisition.yaml | crm:E8_Acquisition (exact) |
| AcquisitionEvent.yaml | crm:E10_Transfer_of_Custody (exact) |
| AcquisitionMethod.yaml | crm:E55_Type (broad) |
Rules Created (6 new/updated)
- class-description-quality-rule.md
- class-multilingual-support-rule.md
- linkml-yaml-best-practices-rule.md
- mapping-specificity-hypernym-rule.md (updated)
- wikidata-mapping-verification-rule.md
- wikidata-mapping-discovery-rule.md (updated)
Remaining Work
- Continue processing remaining class files in /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/classes/
- Fix LSP errors in CollectionType.yaml, UniversityArchiveRecordSetTypes.yaml, AccessPolicy.yaml (duplicate keys)
Relevant files / directories
Directories
- Classes: /Users/kempersc/apps/glam/schemas/20251121/linkml/modules/classes/ (~1,378 YAML files)
- Rules: /Users/kempersc/apps/glam/.opencode/rules/ (47 rule files)
- Ontology data: /Users/kempersc/apps/glam/data/ontology/ (RDF/RDFS/OWL files)
Key Processed Class Files
- AcademicArchive.yaml
- AcademicArchiveRecordSetType.yaml
- AcademicArchiveRecordSetTypes.yaml (contains 4 subclasses)
- AcademicInstitution.yaml
- AcademicProgram.yaml
- Access.yaml through AcquisitionMethod.yaml (20 Access/Accr/Acq files)
Rules Created/Updated This Session
- class-description-quality-rule.md
- class-multilingual-support-rule.md
- linkml-yaml-best-practices-rule.md
- mapping-specificity-hypernym-rule.md
- wikidata-mapping-verification-rule.md
- wikidata-mapping-discovery-rule.md
Files with LSP Errors (need fixing)
- CollectionType.yaml (line 81 - duplicate key)
- UniversityArchiveRecordSetTypes.yaml (lines 51, 86, 119 - duplicate keys)
- AccessPolicy.yaml (lines 129, 133, 184 - duplicate require: keys)
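Duplicate keys of this kind can be located without a language server. The Python sketch below is a line-based approximation that assumes space-indented block-style YAML with unquoted keys; it does not parse YAML, so text inside block scalars that looks like `key:` may be reported falsely. `find_duplicate_keys` is a hypothetical helper name.

```python
import re

# An optional "- " list marker, then an unquoted key followed by ":".
KEY_LINE = re.compile(r"^(\s*)(- )?([A-Za-z_][\w.-]*):(\s|$)")


def find_duplicate_keys(yaml_text):
    """Return (line_number, key) pairs for keys repeated within one mapping block."""
    duplicates = []
    stack = []  # (indent, keys seen so far) for each open mapping block
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = KEY_LINE.match(line)
        if not match:
            continue
        indent = len(match.group(1))
        if match.group(2):
            # "- key:" opens a fresh mapping whose keys sit two columns deeper.
            indent += 2
            while stack and stack[-1][0] >= indent:
                stack.pop()
            stack.append((indent, set()))
        else:
            while stack and stack[-1][0] > indent:
                stack.pop()
            if not stack or stack[-1][0] < indent:
                stack.append((indent, set()))
        key = match.group(3)
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((number, key))
        seen.add(key)
    return duplicates
```

The reported line number is that of the second occurrence, which is usually the line an editor flags.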
Do not map a class to itself in the exact mappings. Avoid repeating the terms of the class label in the text under the description header.
Classes whose label contains 'Archive' but not recordType almost always denote the archive as an organisation; please emphasize this in their descriptions.
Never remove structured data that is represented as a string: restructure it using the LinkML conventions and syntax instead. Structured data must stay out of the description, but it should be preserved as structured LinkML data. See https://linkml.io/linkml/
REMEMBER THAT MAPPING HALLUCINATED CLASSES, PREDICATES, OR QIDS IS STRICTLY PROHIBITED! Always double-check the LinkML mappings and the mapping categories (https://linkml.io/linkml-model/latest/docs/mappings/) by carefully studying @data/ontology/ and Wikidata (using the Wikidata MCP). Continue with:


@ -359,7 +359,7 @@ classes:
range: WikidataEnrichment
description: Full Wikidata enrichment data
ghcid:
range: GhcidBlock
range: GHCIDBlock
description: GHCID generation metadata with history
web_claims:
range: WebClaimsBlock
@ -1174,7 +1174,7 @@ classes:
# GHCID BLOCK - Heritage Custodian ID with history
# ---------------------------------------------------------------------------
GhcidBlock:
GHCIDBlock:
description: GHCID generation metadata and history
attributes:
ghcid_current:
@ -1203,7 +1203,7 @@ classes:
range: datetime
description: When GHCID was generated
ghcid_history:
range: GhcidHistoryEntry
range: GHCIDHistoryEntry
multivalued: true
inlined_as_list: true
description: History of GHCID changes
@ -1220,7 +1220,7 @@ classes:
range: boolean
description: Whether a collision was detected and resolved
GhcidHistoryEntry:
GHCIDHistoryEntry:
description: Historical GHCID entry with validity period
attributes:
ghcid:


@ -0,0 +1,165 @@
id: https://nde.nl/ontology/hc/class/BirthPlace
name: birth_place_class
title: Birth Place Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
crm: http://www.cidoc-crm.org/cidoc-crm/
gn: http://www.geonames.org/ontology#
wdt: http://www.wikidata.org/prop/direct/
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../metadata
- ../slots/has_coordinates
- ../slots/in_country
- ../slots/identified_by
- ../slots/has_score
- ../slots/has_name
- ../slots/has_label
- ../slots/has_source
- ../slots/has_code
default_prefix: hc
classes:
BirthPlace:
class_uri: schema:Place
description: >-
Structured representation of where a person was born with support for historical
place names, modern equivalents, and geographic identifiers.
alt_descriptions:
nl: >-
Gestructureerde weergave van waar een persoon geboren is met ondersteuning voor historische
plaatsnamen, moderne equivalenten en geografische identificaties.
de: >-
Strukturierte Darstellung des Geburtsorts einer Person mit Unterstützung für historische
Ortsnamen, moderne Entsprechungen und geografische Kennungen.
fr: >-
Représentation structurée du lieu de naissance d'une personne avec support pour les noms
historiques, équivalents modernes et identifiants géographiques.
es: >-
Representación estructurada de dónde nació una persona con soporte para nombres
históricos de lugares, equivalentes modernos e identificadores geográficos.
ar: >-
تمثيل منظم لمكان ميلاد الشخص مع دعم الأسماء التاريخية
للأماكن والمعادلات الحديثة والمعرفات الجغرافية.
id: >-
Representasi terstruktur tempat seseorang lahir dengan dukungan nama tempat
historis, padanan modern, dan pengidentifikasi geografis.
zh: >-
人员出生地点的结构化表示,支持历史地名、现代等价物和地理标识符。
exact_mappings:
- schema:Place
close_mappings:
- crm:E53_Place
- gn:Feature
slots:
- has_label
- has_name
- in_country
- has_code
- identified_by
- has_coordinates
- has_source
- has_score
slot_usage:
has_label:
required: true
examples:
- value: Amsterdam
- value: Batavia
has_name:
required: false
examples:
- value: Jakarta
in_country:
required: false
pattern: "^[A-Z]{2}$"
examples:
- value: NL
- value: ID
has_code:
required: false
examples:
- value: NH
- value: 2759794
identified_by:
range: WikiDataIdentifier
required: false
examples:
- value:
has_coordinates:
required: false
examples:
- value: 52.3676,4.9041
has_source:
required: false
examples:
- value: born at the family estate in rural Gelderland
comments:
- Replaces simple birth_place string slot (Rule 53)
- Preserves historical place names while linking to modern identifiers
- GeoNames ID is authoritative per AGENTS.md
see_also:
- https://schema.org/birthPlace
- https://www.geonames.org/
examples:
- value:
has_label: Amsterdam
in_country: NL
has_code: NH
identified_by:
has_coordinates: 52.3676,4.9041
description: Dutch city with modern name
- value:
has_label: Batavia
has_name: Jakarta
in_country: ID
identified_by:
description: Historical colonial name with modern equivalent
- value:
has_label: rural Gelderland
in_country: NL
has_code: GE
has_source: born at the family estate in rural Gelderland
description: Imprecise location from source text
keywords:
- birth
- place
- location
- geographic
- historical
annotations:
specificity_score: 0.45
specificity_rationale: Relevant for person research across heritage sectors.
custodian_types: "['*']"
structured_aliases:
- literal_form: geboorteplaats
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: geboren te
predicate: RELATED_SYNONYM
in_language: nl
- literal_form: Geburtsort
predicate: EXACT_SYNONYM
in_language: de
- literal_form: lieu de naissance
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: lugar de nacimiento
predicate: EXACT_SYNONYM
in_language: es
- literal_form: مكان الميلاد
predicate: EXACT_SYNONYM
in_language: ar
- literal_form: tempat lahir
predicate: EXACT_SYNONYM
in_language: id
- literal_form: 出生地
predicate: EXACT_SYNONYM
in_language: zh


@ -19,7 +19,7 @@ description: |
- provenance: Data tier tracking and source lineage
- ghcid: Global Heritage Custodian ID with history
- identifiers: ISIL, Wikidata, GHCID variants
- enrichments: Google Maps, Wikidata, Genealogiewerkbalk, etc.
- enrichments: Google Maps, Wikidata, genealogy archive registries, etc.
- web_claims: Extracted claims with XPath provenance
- custodian_name: Consensus name determination
- location: Normalized geographic data
@ -42,6 +42,8 @@ default_range: string
imports:
- linkml:types
# Slots bundle (required for LinkML JSON Schema generation)
- ./modules/slots/SlotsBundle
# =============================================================================
# ENUMERATIONS (7 enums)
# =============================================================================
@ -57,6 +59,12 @@ imports:
# =============================================================================
# Root Class
- ./modules/classes/CustodianSourceFile
# Shared base classes
- ./modules/classes/CustodianType
- ./modules/classes/DigitalPlatformType
- ./modules/classes/Claim
- ./modules/classes/ReconstructedEntity
- ./modules/classes/Entity
# Original Entry Domain
- ./modules/classes/DuplicateEntry
- ./modules/classes/MowInscription
@ -75,8 +83,8 @@ imports:
- ./modules/classes/ProvenanceSources
- ./modules/classes/SourceRecord
# Identifiers Domain
- ./modules/classes/GhcidBlock
- ./modules/classes/GhcidHistoryEntry
- ./modules/classes/GHCIDBlock
- ./modules/classes/GHCIDHistoryEntry
- ./modules/classes/Identifier
# Location Domain
- ./modules/classes/CoordinateProvenance
@ -93,7 +101,6 @@ imports:
- ./modules/classes/GoogleMapsPlaywrightEnrichment
- ./modules/classes/GooglePhoto
- ./modules/classes/GoogleReview
- ./modules/classes/LlmVerification
- ./modules/classes/OpeningHours
- ./modules/classes/OpeningPeriod
- ./modules/classes/PhotoAttribution
@ -145,7 +152,6 @@ imports:
- ./modules/classes/WebEnrichment
- ./modules/classes/WebSource
# Custodian Name Domain
- ./modules/classes/AlternativeName
- ./modules/classes/CustodianLegalNameClaim
- ./modules/classes/CustodianNameConsensus
- ./modules/classes/FormerName
@ -153,7 +159,7 @@ imports:
- ./modules/classes/MergeNote
# Dutch Enrichments Domain
- ./modules/classes/ArchiveInfo
- ./modules/classes/GenealogiewerkbalkEnrichment
- ./modules/classes/GenealogyArchivesRegistryEnrichment
- ./modules/classes/IsilCodeEntry
- ./modules/classes/MunicipalityInfo
- ./modules/classes/NanIsilEnrichment
@ -182,22 +188,21 @@ imports:
- ./modules/classes/YoutubeTranscript
- ./modules/classes/YoutubeVideo
# CH-Annotator Domain
- ./modules/classes/ChAnnotatorAnnotationMetadata
- ./modules/classes/ChAnnotatorAnnotationProvenance
- ./modules/classes/ChAnnotatorBlock
- ./modules/classes/ChAnnotatorEntityClaim
- ./modules/classes/ChAnnotatorEntityClassification
- ./modules/classes/ChAnnotatorIntegrationNote
- ./modules/classes/ChAnnotatorModel
- ./modules/classes/ChAnnotatorProvenance
- ./modules/classes/AnnotatorAnnotationMetadata
- ./modules/classes/AnnotatorAnnotationProvenance
- ./modules/classes/AnnotatorBlock
- ./modules/classes/AnnotatorEntityClaim
- ./modules/classes/AnnotatorEntityClassification
- ./modules/classes/AnnotatorIntegrationNote
- ./modules/classes/AnnotatorModel
- ./modules/classes/AnnotatorProvenance
- ./modules/classes/ExtractionSourceInfo
- ./modules/classes/PatternClassification
# Person/Staff Domain
- ./modules/classes/CareerEntry
- ./modules/classes/CertificationEntry
- ./modules/classes/CurrentPosition
- ./modules/classes/EducationEntry
- ./modules/classes/ExaSearchMetadata
- ./modules/classes/ExternalSearchMetadata
- ./modules/classes/HeritageExperienceEntry
- ./modules/classes/MediaAppearanceEntry
- ./modules/classes/PersonProfile


@ -66,7 +66,7 @@ instances:
examples:
- name: Private art collector
description: Individual maintaining personal art collection
ghcid_type: P # Personal collection
GHCID_type: P # Personal collection
- name: Family archivist
description: Individual preserving family papers and photographs
- name: Independent researcher

File diff suppressed because it is too large


@ -1,25 +1,94 @@
id: https://nde.nl/ontology/hc/class/APIEndpoint
name: APIEndpoint
title: APIEndpoint
description: An API endpoint.
title: API Endpoint Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
rico: https://www.ica.org/standards/RiC/ontology#
wd: http://www.wikidata.org/entity/
classes:
APIEndpoint:
class_uri: schema:EntryPoint
description: An API endpoint.
slots:
- has_url
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
dcat: http://www.w3.org/ns/dcat#
hydra: http://www.w3.org/ns/hydra/core#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_url
classes:
APIEndpoint:
class_uri: schema:EntryPoint
description: >-
Uniform Resource Locator that provides programmatic access to a service
or data resource through a defined interface specification.
alt_descriptions:
nl: >-
Uniform Resource Locator die programmatische toegang biedt tot een service
of gegevensbron via een gedefinieerde interfacespecificatie.
de: >-
Uniform Resource Locator, der programmatischen Zugriff auf einen Dienst
oder eine Datenressource über eine definierte Schnittstellenspezifikation bietet.
fr: >-
Localisateur uniforme de ressources fournissant un accès programmatique
à un service ou à une ressource de données via une spécification d'interface définie.
es: >-
Localizador uniforme de recursos que proporciona acceso programático a un
servicio o recurso de datos a través de una especificación de interfaz definida.
ar: >-
محدد موقع الموارد المنتظم الذي يوفر وصولاً برمجياً إلى خدمة أو مورد بيانات
من خلال مواصفات واجهة محددة.
id: >-
Uniform Resource Locator yang menyediakan akses terprogram ke layanan atau
sumber daya data melalui spesifikasi antarmuka yang ditentukan.
zh: >-
通过定义的接口规范提供服务或数据资源的编程访问的统一资源定位符。
structured_aliases:
- literal_form: API-eindpunt
in_language: nl
- literal_form: API-Endpunkt
in_language: de
- literal_form: point de terminaison API
in_language: fr
- literal_form: punto final API
in_language: es
- literal_form: نقطة نهاية API
in_language: ar
- literal_form: titik akhir API
in_language: id
- literal_form: API端点
in_language: zh
close_mappings:
- hydra:EntryPoint
- dcat:DataService
broad_mappings:
- schema:EntryPoint
- skos:Concept
slots:
- has_url
- has_description
comments:
- Represents a callable URL for an API operation
- Part of schema.org for describing web services
- Use with APIVersion for versioned endpoints
see_also:
- https://schema.org/EntryPoint
- APIVersion
- APIRequest
examples:
- value:
has_url: "https://api.example.org/v2/collections"
has_description: "Collections listing endpoint"
description: REST API endpoint for collections
- value:
has_url: "https://api.example.org/v2/search"
has_description: "Search endpoint with query parameters"
description: Search API endpoint
keywords:
- API
- endpoint
- URL
- web service
- REST
- programmatic access
annotations:
specificity_score: "0.3"
specificity_rationale: Specific to API/web service endpoints
custodian_types: "['*']"


@ -1,14 +1,14 @@
id: https://nde.nl/ontology/hc/class/APIRequest
name: APIRequest
title: APIRequest
description: An API request event.
title: API Request Class
prefixes:
rov: http://www.w3.org/ns/regorg#
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
rico: https://www.ica.org/standards/RiC/ontology#
wd: http://www.wikidata.org/entity/
prov: http://www.w3.org/ns/prov#
dcat: http://www.w3.org/ns/dcat#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_endpoint
@ -16,15 +16,89 @@ imports:
- ../slots/has_version
classes:
APIRequest:
class_uri: prov:Activity
class_uri: hc:APIRequest
description: >-
Single invocation of an API endpoint, capturing the request context,
provenance, and version information for audit and debugging purposes.
alt_descriptions:
nl: >-
Enkele aanroeping van een API-eindpunt, waarbij de verzoekcontext,
herkomst en versie-informatie worden vastgelegd voor controle en debugging.
de: >-
Einzelner Aufruf eines API-Endpunkts, der den Anfragekontext,
die Provenienz und Versionsinformationen für Audit und Debugging erfasst.
fr: >-
Invocation unique d'un point de terminaison API, capturant le contexte
de la requête, la provenance et les informations de version pour l'audit
et le débogage.
es: >-
Invocación única de un punto final API, capturando el contexto de la solicitud,
la procedencia y la información de versión para auditoría y depuración.
ar: >-
استدعاء واحد لنقطة نهاية API، يلتقط سياق الطلب والمصدر ومعلومات الإصدار
لأغراض التدقيق وتصحيح الأخطاء.
id: >-
Pemanggilan tunggal titik akhir API, menangkap konteks permintaan,
provenans, dan informasi versi untuk audit dan debugging.
zh: >-
对API端点的单次调用捕获请求上下文、来源和版本信息用于审计和调试。
structured_aliases:
- literal_form: API-verzoek
in_language: nl
- literal_form: API-Anfrage
in_language: de
- literal_form: requête API
in_language: fr
- literal_form: solicitud API
in_language: es
- literal_form: طلب API
in_language: ar
- literal_form: permintaan API
in_language: id
- literal_form: API请求
in_language: zh
broad_mappings:
- prov:Activity
- schema:Action
close_mappings:
- schema:Action
description: An API request event.
- dcat:DataService
slots:
- has_provenance
- has_endpoint
- has_version
slot_usage:
has_endpoint:
range: APIEndpoint
required: true
has_provenance:
range: Provenance
required: false
has_version:
range: APIVersion
required: false
comments:
- Captures individual API call events for logging and analysis
- Useful for rate limiting, audit trails, and usage analytics
- Links to endpoint definition and API version
see_also:
- APIEndpoint
- APIVersion
examples:
- value:
has_endpoint: "https://api.example.org/v2/search"
has_provenance:
created_by: "https://example.org/users/app123"
created_at: "2025-01-14T10:00:00Z"
has_version: "v2.1.0"
description: Logged search API request with provenance
keywords:
- API request
- API call
- invocation
- audit
- logging
- provenance
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
specificity_score: "0.4"
specificity_rationale: Specific to API request event tracking
custodian_types: "['*']"


@@ -1,25 +1,107 @@
id: https://nde.nl/ontology/hc/class/APIVersion
name: APIVersion
title: APIVersion
description: Version of an API.
title: API Version Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
rico: https://www.ica.org/standards/RiC/ontology#
wd: http://www.wikidata.org/entity/
dcat: http://www.w3.org/ns/dcat#
dcterms: http://purl.org/dc/terms/
default_prefix: hc
imports:
- linkml:types
- ../slots/identified_by
- ../slots/has_label
classes:
APIVersion:
class_uri: schema:SoftwareApplication
description: Version of an API.
class_uri: hc:APIVersion
description: >-
Specific release or iteration of an Application Programming Interface,
identified by version number and potentially associated with changelogs
and deprecation schedules.
alt_descriptions:
nl: >-
Specifieke release of iteratie van een Application Programming Interface,
geidentificeerd door versienummer en mogelijk gekoppeld aan changelogs
en deprecation-schema's.
de: >-
Spezifische Veroffentlichung oder Iteration einer Programmierschnittstelle,
identifiziert durch Versionsnummer und moglicherweise verknupft mit
Anderungsprotokollen und Deprecation-Zeitplanen.
fr: >-
Version specifique ou iteration d'une interface de programmation d'application,
identifiee par numero de version et potentiellement associee aux journaux
des modifications et aux calendriers d'obsolescence.
es: >-
Version especifica o iteracion de una interfaz de programacion de aplicaciones,
identificada por numero de version y potencialmente asociada con registros
de cambios y calendarios de obsolescencia.
ar: >-
إصدار محدد أو تكرار لواجهة برمجة التطبيقات، محدد برقم الإصدار وربما
مرتبط بسجلات التغيير وجداول الإهمال.
id: >-
Rilis atau iterasi spesifik dari Antarmuka Pemrograman Aplikasi, diidentifikasi
oleh nomor versi dan berpotensi terkait dengan log perubahan dan jadwal penghentian.
zh: >-
应用程序编程接口的特定版本或迭代,由版本号标识,可能与变更日志和弃用计划相关联。
structured_aliases:
- literal_form: API-versie
in_language: nl
- literal_form: API-Version
in_language: de
- literal_form: version d'API
in_language: fr
- literal_form: version de API
in_language: es
- literal_form: إصدار API
in_language: ar
- literal_form: versi API
in_language: id
- literal_form: API版本
in_language: zh
close_mappings:
- schema:SoftwareVersion
- dcterms:hasVersion
broad_mappings:
- skos:Concept
slots:
- has_label
- identified_by
slot_usage:
has_label:
pattern: "^v?[0-9]+\\.[0-9]+(\\.[0-9]+)?(-[a-zA-Z0-9]+)?$"
examples:
- value: "v2.1.0"
- value: "1.0.0-beta"
- value: "2.0"
identified_by:
range: string
required: true
comments:
- Follows semantic versioning convention (MAJOR.MINOR.PATCH)
- Used to track API compatibility and deprecation
- Links to endpoint definitions for versioned access
see_also:
- APIEndpoint
- https://semver.org/
examples:
- value:
has_label: "v2.1.0"
identified_by: "v2.1.0"
description: Semantic version 2.1.0 of an API
- value:
has_label: "v3.0.0-beta"
identified_by: "v3.0.0-beta"
description: Beta release of version 3.0.0
keywords:
- API version
- semantic versioning
- release
- changelog
- deprecation
- compatibility
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
specificity_score: "0.4"
specificity_rationale: Specific to API versioning
custodian_types: "['*']"
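
The `has_label` pattern in the APIVersion diff above can be checked against its own example values. The following is a minimal sketch in Python, not part of the commit: the function name `is_valid_version_label` is hypothetical, and the regular expression is the YAML-unescaped form of the pattern shown in `slot_usage`. It suggests that the pattern is not identical to strict semantic versioning: it accepts two-part versions such as `2.0`, and it rejects dotted pre-release identifiers such as `2.1.0-rc.1`, which semver allows.

```python
import re

# Pattern copied from the has_label slot_usage in the diff, with the YAML
# double-quote escaping ("\\.") reduced to a single regex escape ("\.").
VERSION_LABEL_PATTERN = re.compile(
    r"^v?[0-9]+\.[0-9]+(\.[0-9]+)?(-[a-zA-Z0-9]+)?$"
)


def is_valid_version_label(label: str) -> bool:
    """Return True if the whole label matches the has_label pattern.

    Hypothetical helper for illustration; the schema itself only declares
    the pattern and leaves enforcement to the validator.
    """
    return VERSION_LABEL_PATTERN.fullmatch(label) is not None


if __name__ == "__main__":
    # The first four labels appear as examples in the schema diff;
    # the last two are assumed edge cases added for illustration.
    for candidate in ["v2.1.0", "1.0.0-beta", "2.0", "v3.0.0-beta", "v2", "2.1.0-rc.1"]:
        print(candidate, is_valid_version_label(candidate))
```

If strict MAJOR.MINOR.PATCH conformance is intended, the pattern or the accompanying comment may need adjusting; which of the two reflects the intended behaviour is not stated in the diff.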


@@ -15,16 +15,66 @@ prefixes:
xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_name
- ../slots/has_type
- linkml:types
- ../slots/has_name
- ../slots/has_type
classes:
AVEquipment:
class_uri: schema:Product
description: AV Equipment.
description: >-
Audiovisual equipment used in heritage contexts for playback, digitization,
recording, or presentation of audiovisual materials and collections.
alt_descriptions:
nl: Audiovisuele apparatuur die in erfgoedcontexten wordt gebruikt voor het afspelen, digitaliseren, opnemen of presenteren van audiovisuele materialen en collecties.
de: Audiovisuelle Ausrüstung, die in Heritage-Kontexten für die Wiedergabe, Digitalisierung, Aufnahme oder Präsentation audiovisueller Materialien und Sammlungen verwendet wird.
fr: Équipement audiovisuel utilisé dans les contextes patrimoniaux pour la lecture, la numérisation, l'enregistrement ou la présentation de matériaux et collections audiovisuels.
es: Equipo audiovisual utilizado en contextos patrimoniales para la reproducción, digitalización, grabación o presentación de materiales y colecciones audiovisuales.
ar: معدات سمعية بصرية تُستخدم في السياقات التراثية للتشغيل والرقمنة والتسجيل أو عرض المواد والمجموعات السمعية البصرية.
id: Peralatan audiovisual yang digunakan dalam konteks warisan untuk pemutaran, digitalisasi, perekaman, atau presentasi materi dan koleksi audiovisual.
zh: 在遗产环境中用于播放、数字化、录制或展示视听材料和藏品的视听设备。
broad_mappings:
- schema:Product
close_mappings:
- crm:E19_Physical_Object
related_mappings:
- prov:Entity
slots:
- has_name
- has_type
structured_aliases:
- literal_form: AV-apparatuur
in_language: nl
- literal_form: AV-Ausrüstung
in_language: de
- literal_form: équipement AV
in_language: fr
- literal_form: equipo AV
in_language: es
- literal_form: معدات سمعية بصرية
in_language: ar
- literal_form: peralatan AV
in_language: id
- literal_form: 视听设备
in_language: zh
comments:
- Used for playback, digitization, recording, and presentation
- Includes projectors, players, recorders, and display equipment
keywords:
- audiovisual
- equipment
- playback
- digitization
- recording
- AV hardware
examples:
- value:
has_name: U-matic SP Player
has_type: VIDEO_PLAYER
description: Video playback equipment
- value:
has_name: Studer A810
has_type: AUDIO_RECORDER
description: Professional audio recorder
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration


@@ -1,46 +1,87 @@
id: https://nde.nl/ontology/hc/class/AcademicArchive
name: AcademicArchive
title: Academic Archive Type
title: Academic Archive
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
rico: https://www.ica.org/standards/RiC/ontology#
wd: http://www.wikidata.org/entity/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_hypernym
- ../slots/identified_by
- ../slots/has_label
- ../slots/has_scope
- ../slots/has_score
- ../slots/has_type
- ../slots/hold_record_set
- ../slots/related_to
- linkml:types
- ../classes/ArchiveOrganizationType
- ../classes/WikidataAlignment
- ../slots/has_hypernym
- ../slots/has_label
- ../slots/has_score
- ../slots/has_type
- ../slots/hold_record_set
- ../slots/identified_by
- ../slots/related_to
classes:
AcademicArchive:
description: >-
Organizational unit serving as the official custodian for the documentary heritage of a tertiary educational institution.
alt_descriptions:
nl: >-
Organisatorische eenheid die fungeert als de officiële bewaarder van het documentair erfgoed van een instelling voor hoger onderwijs.
de: >-
Organisatorische Einheit, die als offizieller Verwahrer des dokumentarischen Erbes einer Hochschuleinrichtung dient.
fr: >-
Unite organisationnelle agissant en tant que depositeur officiel du patrimoine documentaire d'un etablissement d'enseignement superieur.
es: >-
Unidad organizativa que sirve como depositario oficial del patrimonio documental de una institucion de educacion superior.
ar: >-
وحدة تنظيمية تعمل كحارس رسمي للتراث الوثائقي لمؤسسة التعليم العالي.
id: >-
Unit organisasi yang berfungsi sebagai penjaga resmi warisan dokumenter institusi pendidikan tinggi.
zh: >-
作为高等教育机构文献遗产官方保管者的组织单位。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: University Archives
has_hypernym: wd:Q166118
hold_record_set:
- hc:UniversityAdministrativeFonds
- hc:FacultyPaperCollection
description: A university archives institution holding administrative records and faculty papers
- value:
has_type: hc:ArchiveOrganizationType
has_label: College Archives
has_hypernym: wd:Q166118
hold_record_set:
- hc:AcademicStudentRecordSeries
description: A college archives institution preserving student records
is_a: ArchiveOrganizationType
class_uri: schema:ArchiveOrganization
description: Archive of a higher education institution (university, college, polytechnic).
slots:
- has_type
- hold_record_set
- has_hypernym
- has_label
- has_hypernym
- hold_record_set
- has_score
- related_to
structured_aliases:
- literal_form: Hochschularchiv
in_language: de
- literal_form: "archivo acad\xE9mico"
in_language: es
- literal_form: "archives acad\xE9miques"
in_language: fr
- literal_form: archivio accademico
in_language: it
- literal_form: academisch archief
in_language: nl
- literal_form: "arquivo acad\xEAmico"
- literal_form: Hochschularchiv
in_language: de
- literal_form: archives academiques
in_language: fr
- literal_form: archivo academico
in_language: es
- literal_form: أرشيف أكاديمي
in_language: ar
- literal_form: arsip akademik
in_language: id
- literal_form: 学术档案馆
in_language: zh
- literal_form: archivio accademico
in_language: it
- literal_form: arquivo academico
in_language: pt
keywords:
- administrative records
@@ -59,43 +100,40 @@ classes:
- campus life documentation
slot_usage:
hold_record_set:
equals_expression: '["hc:UniversityAdministrativeFonds", "hc:StudentRecordSeries", "hc:FacultyPaperCollection", "hc:CampusDocumentationCollection"]
'
equals_string_in:
- "hc:UniversityAdministrativeFonds"
- "hc:AcademicStudentRecordSeries"
- "hc:FacultyPaperCollection"
- "hc:CampusDocumentationCollection"
identified_by:
pattern: ^Q[0-9]+$
pattern: "^Q[0-9]+$"
has_type:
equals_expression: '["hc:ArchiveOrganizationType"]'
equals_string: "hc:ArchiveOrganizationType"
related_to:
range: WikidataAlignment
inlined: true
has_hypernym:
equals_expression: '["wd:Q166118"]'
equals_string: "wd:Q166118"
has_label:
ifabsent: string(archive)
exact_mappings:
- wd:Q27032435
close_mappings:
- rico:CorporateBody
- skos:Concept
- wd:Q27032435
broad_mappings:
- wd:Q166118
- wd:Q124762372
narrow_mappings:
- wd:Q2496264
related_mappings:
- wd:Q1065413
comments:
- Custodian type class for academic/higher education archives
- 'Part of dual-class pattern: custodian type + rico:RecordSetType'
- Parent institution is typically a university or college
- class_uri is schema:ArchiveOrganization - primary semantic meaning
- skos:broader relationship to wd:Q166118 (archive) expressed via broad_mappings
- Institutional custodian type for higher education archives
- Distinguished from institutional repositories (wd:Q1065413) which manage published scholarly outputs
- The actual holdings are represented by AcademicArchiveRecordSetType instances
- 'Preserved from prior description: Organizational unit serving as the official custodian for the documentary heritage of a tertiary educational institution. Charged with acquiring, preserving, and providing access to administrative records, faculty papers, student records, and campus documentation. Distinguished from institutional repositories that primarily manage published scholarly outputs.'
see_also:
- wd:Q2496264
- wd:Q124762372
- wd:Q1065413
- AcademicArchiveRecordSetType
- wd:Q1065413
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: "0.3"
specificity_rationale: Specific to higher education archival custodians
custodian_types: "['AcademicArchive']"
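
The `identified_by` pattern `^Q[0-9]+$` in the AcademicArchive diff above expects a bare Wikidata QID, whereas the mappings and `has_hypernym` values in the same file use `wd:`-prefixed CURIEs. A minimal Python sketch of that difference follows; both `is_bare_qid` and `strip_wd_prefix` are hypothetical helpers introduced for illustration and are not part of the schema or of LinkML.

```python
import re

# Pattern copied from the identified_by slot_usage in the diff.
QID_PATTERN = re.compile(r"^Q[0-9]+$")


def is_bare_qid(value: str) -> bool:
    """Return True if the value is a bare Wikidata QID such as 'Q166118'."""
    return QID_PATTERN.fullmatch(value) is not None


def strip_wd_prefix(curie: str) -> str:
    """Drop a leading 'wd:' prefix, if present (hypothetical helper)."""
    return curie[len("wd:"):] if curie.startswith("wd:") else curie


if __name__ == "__main__":
    # Values taken from the diff: a bare QID and two wd: CURIEs.
    for value in ["Q166118", "wd:Q166118", "wd:Q27032435"]:
        print(value, is_bare_qid(value), is_bare_qid(strip_wd_prefix(value)))
```

Under this reading, a value like `wd:Q166118` would need its prefix removed before it could satisfy `identified_by`; whether the data is stored prefixed or bare is an assumption the diff does not settle.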


@@ -1,62 +1,101 @@
id: https://nde.nl/ontology/hc/class/AcademicArchiveRecordSetType
name: AcademicArchiveRecordSetType
title: AcademicArchive Record Set Type
title: Academic Archive Record Set Type
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
rico: https://www.ica.org/standards/RiC/ontology#
wd: http://www.wikidata.org/entity/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_scope
- ../slots/has_score
- ../slots/has_type
- ../slots/related_to
- linkml:types
- ../classes/CollectionType
- ../classes/WikidataAlignment
- ../slots/has_score
- ../slots/has_type
- ../slots/related_to
classes:
AcademicArchiveRecordSetType:
description: A rico:RecordSetType for classifying collections of academic and
higher education institutional records.
description: >-
Category for grouping documentary materials accumulated by tertiary educational institutions during their administrative, academic, and operational activities.
alt_descriptions:
nl: >-
Categorie voor het groeperen van documentair materiaal dat door hogeronderwijsinstellingen is verzameld tijdens hun administratieve, academische en operationele activiteiten.
de: >-
Kategorie zur Gruppierung von Dokumentenmaterial, das von Hochschulen während ihrer administrativen, akademischen und betrieblichen Aktivitäten angesammelt wurde.
fr: >-
Catégorie de regroupement des documents accumulés par les établissements d'enseignement supérieur au cours de leurs activités administratives, académiques et opérationnelles.
es: >-
Categoría para agrupar materiales documentales acumulados por instituciones de educación superior durante sus actividades administrativas, académicas y operativas.
ar: >-
فئة لتجميع المواد الوثائقية التي جمعتها مؤسسات التعليم العالي خلال أنشطتها الإدارية والأكاديمية والتشغيلية.
id: >-
Kategori untuk mengelompokkan materi dokumenter yang dikumpulkan oleh institusi pendidikan tinggi selama aktivitas administratif, akademik, dan operasional mereka.
zh: >-
高等教育机构在行政、学术和运营活动中积累的文献材料的分类类别。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: University Administrative Records
related_to: wd:Q27032435
description: Administrative fonds containing governance records, committee minutes, and policy documents
- value:
has_type: hc:ArchiveOrganizationType
has_label: Student Records Series
related_to: wd:Q27032435
description: Enrollment records, academic transcripts, and graduation documentation
- value:
has_type: hc:ArchiveOrganizationType
has_label: Faculty Papers Collection
related_to: wd:Q27032435
description: Research documentation, teaching materials, and correspondence
- value:
has_type: hc:ArchiveOrganizationType
has_label: Campus Documentation Collection
related_to: wd:Q27032435
description: Photographs, university publications, and audiovisual materials
is_a: CollectionType
class_uri: rico:RecordSetType
slots:
- has_type
- has_score
- has_scope
- related_to
comments:
- Collection type class for academic/higher education record sets
- Record set TYPE classification, not the custodian organization
- Part of dual-class pattern with AcademicArchive (custodian type)
- Use AcademicArchive for the archive organization; use this class for collection types
- 'Preserved from prior description: Category for grouping documentary materials accumulated by tertiary educational institutions during their administrative, academic, and operational activities. Distinguishes the classification of holdings from the repository organization responsible for their custody.'
structured_aliases:
- literal_form: Hochschularchivbestand
in_language: de
- literal_form: fondo de archivo académico
in_language: es
- literal_form: fonds d'archives académiques
in_language: fr
- literal_form: academisch archiefbestand
in_language: nl
- literal_form: Hochschularchivbestand
in_language: de
- literal_form: fonds d'archives académiques
in_language: fr
- literal_form: fondo de archivo académico
in_language: es
- literal_form: مجموعة الأرشيف الأكاديمي
in_language: ar
- literal_form: koleksi arsip akademik
in_language: id
- literal_form: 学术档案集
in_language: zh
slot_usage:
has_type:
equals_expression: '["hc:ArchiveOrganizationType"]'
equals_string: "hc:ArchiveOrganizationType"
related_to:
range: WikidataAlignment
inlined: true
exact_mappings:
- wd:Q27032435
- rico:RecordSetType
broad_mappings:
- wd:Q27032435
- rico:RecordSetType
close_mappings:
- skos:Concept
- wd:Q27032435
see_also:
- AcademicArchive
- rico:RecordSetType
- UniversityAdministrativeFonds
- StudentRecordSeries
- FacultyPaperCollection
- CampusDocumentationCollection
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
specificity_score: "0.3"
specificity_rationale: Specific to academic/higher education archival collections
custodian_types: "['AcademicArchive']"


@@ -6,44 +6,80 @@ prefixes:
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
crm: http://www.cidoc-crm.org/cidoc-crm/
rico: https://www.ica.org/standards/RiC/ontology#
rico-rst: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#
wd: http://www.wikidata.org/entity/
bf: http://id.loc.gov/ontologies/bibframe/
default_prefix: hc
imports:
- ./AcademicArchiveRecordSetType
- linkml:types
- ../slots/has_score
- ../slots/has_type
- ../slots/has_note
- ../slots/has_scope
- ./AcademicArchiveRecordSetType
- linkml:types
- ../slots/has_score
- ../slots/has_type
- ../slots/has_note
- ../slots/has_scope
classes:
UniversityAdministrativeFonds:
description: >-
Records created or accumulated by a university's central administration in the exercise of governance, policy-making, and operational functions.
alt_descriptions:
nl: >-
Archiefbescheiden gecreeerd of verzameld door de centrale administratie van een universiteit bij de uitoefening van bestuurlijke, beleidsmatige en operationele functies.
de: >-
Unterlagen, die von der Zentralverwaltung einer Universitaet bei der Ausuebung von Regierungs-, Politik- und Betriebsfunktionen erstellt oder gesammelt wurden.
fr: >-
Documents crees ou accumules par l'administration centrale d'une universite dans l'exercice de fonctions de gouvernance, d'elaboration de politiques et operationnelles.
es: >-
Registros creados o acumulados por la administracion central de una universidad en el ejercicio de funciones de gobernanza, formulacion de politicas y operaciones.
ar: >-
سجلات تم إنشاؤها أو تجميعها من قبل الإدارة المركزية للجامعة في ممارسة وظائف الحوكمة وصنع السياسات والعمليات.
id: >-
Catatan yang dibuat atau dikumpulkan oleh administrasi pusat universitas dalam menjalankan fungsi tata kelola, pembuatan kebijakan, dan operasional.
zh: >-
大学中央行政管理部门在行使治理、决策和运营职能时创建或积累的记录。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: Board of Trustees Minutes
has_note: Meeting minutes, resolutions, and supporting documents
description: Governance records from university board of trustees
- value:
has_type: hc:ArchiveOrganizationType
has_label: Faculty Senate Records
has_note: Senate minutes, committee reports, policy proposals
description: Faculty governance documentation from university senate
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for university administrative records organized\
\ as a fonds.\n\n**Definition**:\nRecords created or accumulated by a university's\
\ central administration in the \nexercise of governance, policy-making, and\
\ operational functions. Organized \naccording to archival principles of provenance\
\ (respect des fonds).\n\n**Typical Contents**:\n- Governance records (board\
\ minutes, resolutions, bylaws)\n- Committee records (senate, faculty councils,\
\ standing committees)\n- Policy records (institutional policies, procedures,\
\ guidelines)\n- Strategic planning documents\n- Accreditation and institutional\
\ assessment records\n- Executive correspondence\n\n**RiC-O Alignment**:\nThis\
\ class is a specialized rico:RecordSetType. Records classified with this\n\
type follow the fonds organizational principle as defined by rico-rst:Fonds\n\
(respect des fonds / provenance-based organization from university central administration).\n"
slots:
- has_type
- has_score
- has_note
- has_scope
structured_aliases:
- literal_form: Hochschulverwaltungsbestand
in_language: de
- literal_form: "fondo de administraci\xF3n universitaria"
in_language: es
- literal_form: fonds d'administration universitaire
in_language: fr
- literal_form: universiteitsbestuursarchief
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: "fundo de administra\xE7\xE3o universit\xE1ria"
- literal_form: Hochschulverwaltungsbestand
predicate: EXACT_SYNONYM
in_language: de
- literal_form: fonds d'administration universitaire
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: fondo de administracion universitaria
predicate: EXACT_SYNONYM
in_language: es
- literal_form: أرشيف الإدارة الجامعية
predicate: EXACT_SYNONYM
in_language: ar
- literal_form: arsip administrasi universitas
predicate: EXACT_SYNONYM
in_language: id
- literal_form: 大学行政档案
predicate: EXACT_SYNONYM
in_language: zh
- literal_form: fundo de administracao universitaria
predicate: EXACT_SYNONYM
in_language: pt
keywords:
- governance records
@@ -56,76 +92,90 @@ classes:
- accreditation records
- executive correspondence
- institutional bylaws
- resolutions
- procedures
- guidelines
slot_usage:
has_type:
equals_string: "hc:ArchiveOrganizationType"
broad_mappings:
- rico:RecordSetType
- skos:Concept
- crm:E55_Type
related_mappings:
- rico-rst:Fonds
- wd:Q1643722
- rico:RecordSetType
- skos:Concept
close_mappings:
- skos:Concept
see_also:
- AcademicArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Fonds
comments:
- Records follow the fonds organizational principle reflecting provenance from university central administration
- Subject to records retention schedules and institutional access policies
- "Preserved from prior description: Records created or accumulated by a university's central administration in the exercise of governance, policy-making, and operational functions. Organized according to archival principles of provenance (respect des fonds)."
annotations:
specificity_score: "0.5"
specificity_rationale: Specific to university administrative records
custodian_types: "['AcademicArchive']"
AcademicStudentRecordSeries:
description: >-
Records documenting the academic careers and activities of students, typically organized as series within a larger university fonds.
alt_descriptions:
nl: >-
Archiefbescheiden die de academische carrieeres en activiteiten van studenten documenteren, doorgaans georganiseerd als series binnen een groter universitair fonds.
de: >-
Unterlagen, die die akademischen Karrieren und Aktivitaeten von Studenten dokumentieren, typischerweise als Serie innerhalb eines groesseren Universitaetsfonds organisiert.
fr: >-
Documents recensant les carrieres et activites academiques des etudiants, generalement organises en series au sein d'un fonds universitaire plus vaste.
es: >-
Registros que documentan las carreras academicas y actividades de los estudiantes, generalmente organizados como series dentro de un fondo universitario mas amplio.
ar: >-
سجلات توثق المسارات الأكاديمية وأنشطة الطلاب، وعادة ما تكون منظمة كسلاسل ضمن صندوق جامعي أكبر.
id: >-
Catatan yang mendokumentasikan karir dan aktivitas akademik mahasiswa, biasanya diatur sebagai seri dalam dana universitas yang lebih besar.
zh: >-
记录学生学术生涯和活动的记录,通常作为较大大学档案中的系列组织。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: Registrar Student Records
has_note: Enrollment, transcripts, graduation records with privacy restrictions
description: Student academic records series with 75-year retention period
- value:
has_type: hc:ArchiveOrganizationType
has_label: Historical Student Records
has_note: Pre-1950 student records with fewer access restrictions
description: Historical student records open for research access
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
slots:
- has_type
- has_score
- organizational_principle
- organizational_principle_uri
- has_note
- has_type
- has_scope
- has_scope
slot_usage:
has_type:
equals_expression: '["hc:ArchiveOrganizationType"]'
has_type:
equals_string: UniversityAdministrativeFonds
organizational_principle:
equals_string: fonds
organizational_principle_uri:
equals_string: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
has_note:
equals_string: This RecordSetType classifies record sets following the fonds
principle. The fonds structure reflects provenance from university central
administration.
has_scope:
equals_string: '["governance records", "committee records", "policy records",
"strategic planning", "accreditation records"]'
has_scope:
equals_string: '["student records", "faculty papers", "research data"]'
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
AcademicStudentRecordSeries:
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for student records organized as archival series.\n\
\n**Definition**:\nRecords documenting the academic careers and activities of\
\ students, typically \norganized as series within a larger university fonds.\
\ Subject to retention \nschedules and privacy regulations (FERPA in US, GDPR\
\ in EU, AVG in NL).\n\n**Typical Contents**:\n- Enrollment and registration\
\ records\n- Academic transcripts and grade records\n- Graduation records and\
\ diploma registers\n- Disciplinary records\n- Financial aid records\n- Student\
\ organization records\n\n**Privacy Considerations**:\nAccess restrictions typically\
\ apply due to personally identifiable information.\nHistorical student records\
\ (typically 75+ years) may have fewer restrictions.\n\n**RiC-O Alignment**:\n\
This class is a specialized rico:RecordSetType. Records classified with this\n\
type follow the series organizational principle as defined by rico-rst:Series\n\
(organizational level within the university fonds).\n"
structured_aliases:
- literal_form: Studentenaktenserie
in_language: de
- literal_form: serie de expedientes estudiantiles
in_language: es
- literal_form: "s\xE9rie de dossiers \xE9tudiants"
in_language: fr
- literal_form: studentendossiers
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: "s\xE9rie de registros de alunos"
- literal_form: Studentenaktenserie
predicate: EXACT_SYNONYM
in_language: de
- literal_form: serie de dossiers etudiants
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: serie de expedientes estudiantiles
predicate: EXACT_SYNONYM
in_language: es
- literal_form: سجلات الطلاب
predicate: EXACT_SYNONYM
in_language: ar
- literal_form: seri catatan mahasiswa
predicate: EXACT_SYNONYM
in_language: id
- literal_form: 学生档案系列
predicate: EXACT_SYNONYM
in_language: zh
- literal_form: serie de registros de alunos
predicate: EXACT_SYNONYM
in_language: pt
keywords:
- enrollment records
@@ -138,78 +188,91 @@ classes:
- disciplinary records
- student organizations
- financial aid records
slot_usage:
has_type:
equals_string: "hc:ArchiveOrganizationType"
broad_mappings:
- rico:RecordSetType
- skos:Concept
- crm:E55_Type
related_mappings:
- rico-rst:Series
- wd:Q185583
- rico:RecordSetType
- skos:Concept
close_mappings:
- skos:Concept
see_also:
- AcademicArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Series
- UniversityAdministrativeFonds
comments:
- Records follow the series organizational principle within the university fonds
- Access restrictions typically apply for records less than 75 years old
- Subject to educational records privacy laws (FERPA, GDPR, AVG)
- 'Preserved from prior description: Records documenting the academic careers and activities of students, typically organized as series within a larger university fonds. Subject to retention schedules and privacy regulations (FERPA in US, GDPR in EU, AVG in NL).'
annotations:
specificity_score: "0.5"
specificity_rationale: Specific to student academic records
custodian_types: "['AcademicArchive']"
FacultyPaperCollection:
description: >-
Personal papers of faculty members documenting their academic careers, research
activities, teaching, and professional service. Typically acquired as donations
or bequests, distinct from official university records.
alt_descriptions:
nl: >-
Persoonlijke archieven van hoogleraren die hun academische carrieeres, onderzoeksactiviteiten, onderwijs en professionele dienst documenteren.
de: >-
Persoenliche Papiere von Fakultaetsmitgliedern, die ihre akademischen Karrieren, Forschungsaktivitaeten, Lehre und den professionellen Dienst dokumentieren.
fr: >-
Papiers personnels des membres du corps professoral documentant leurs carrieres academiques, activites de recherche, enseignement et service professionnel.
es: >-
Papeles personales de los miembros de la facultad que documentan sus carreras academicas, actividades de investigacion, docencia y servicio profesional.
ar: >-
أوراق شخصية لأعضاء هيئة التدريس توثق مسيرتهم الأكاديمية وأنشطة البحث والتدريس والخدمة المهنية.
id: >-
Arsip pribadi anggota fakultas yang mendokumentasikan karir akademik, kegiatan penelitian, pengajaran, dan layanan profesional.
zh: >-
教职员个人档案,记录其学术生涯、研究活动、教学和专业服务。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: Professor Smith Papers
has_note: Research notes, correspondence, lecture materials, conference papers
description: Personal papers of a professor including research documentation
- value:
has_type: hc:ArchiveOrganizationType
has_label: Department Chair Archive
has_note: Administrative correspondence, committee service records, teaching materials
description: Faculty papers from a department chair with administrative responsibilities
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
slots:
- has_type
- has_score
- organizational_principle
- organizational_principle_uri
- has_note
- has_note
- has_type
- has_scope
- has_scope
slot_usage:
has_type:
equals_expression: '["hc:ArchiveOrganizationType"]'
has_type:
equals_string: AcademicStudentRecordSeries
organizational_principle:
equals_string: series
organizational_principle_uri:
equals_string: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Series
has_note:
equals_string: This RecordSetType classifies record sets following the series
principle. Typically a series within the university administration fonds
or registrar's office fonds.
has_scope:
equals_string: '["enrollment records", "academic transcripts", "graduation
records", "disciplinary records", "financial aid records"]'
has_scope:
equals_string: '["faculty records", "research records", "administrative policy"]'
has_note:
equals_string: Subject to educational records privacy laws (FERPA, GDPR, AVG). Access
restrictions typically apply for records less than 75 years old.
FacultyPaperCollection:
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for faculty papers and personal archives.\n\
\n**Definition**:\nPersonal papers of faculty members documenting their academic\
\ careers, research \nactivities, teaching, and professional service. These\
\ are typically acquired as \ndonations or bequests, distinct from official\
\ university records.\n\n**Typical Contents**:\n- Research documentation and\
\ notes\n- Correspondence (professional and personal)\n- Lecture notes and course\
\ materials\n- Manuscripts and drafts\n- Conference papers and presentations\n\
- Professional organization records\n- Photographs and audiovisual materials\n\
\n**Provenance**:\nUnlike administrative fonds, faculty papers are personal\
\ archives with the \nindividual faculty member as creator/accumulator. The\
\ university acquires \ncustody but respects original order where it exists.\n\
\n**RiC-O Alignment**:\nThis class is a specialized rico:RecordSetType. Records\
\ classified with this\ntype follow the fonds organizational principle as defined\
\ by rico-rst:Fonds\n(personal papers fonds with the faculty member as creator/accumulator).\n"
structured_aliases:
- literal_form: Professorennachlass
in_language: de
- literal_form: archivo personal de profesores
in_language: es
- literal_form: fonds de professeurs
in_language: fr
- literal_form: hoogleraarspapieren
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: Professorennachlass
predicate: EXACT_SYNONYM
in_language: de
- literal_form: fonds de professeurs
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: archivo personal de profesores
predicate: EXACT_SYNONYM
in_language: es
- literal_form: أوراق هيئة التدريس
predicate: EXACT_SYNONYM
in_language: ar
- literal_form: arsip fakultas
predicate: EXACT_SYNONYM
in_language: id
- literal_form: 教职员档案
predicate: EXACT_SYNONYM
in_language: zh
- literal_form: arquivo pessoal de docentes
predicate: EXACT_SYNONYM
in_language: pt
keywords:
- personal papers
@@ -220,129 +283,136 @@ classes:
- conference papers
- professional papers
- academic papers
- correspondence
- manuscripts
- drafts
- professional organization records
- photographs
- audiovisual materials
slot_usage:
has_type:
equals_string: "hc:ArchiveOrganizationType"
broad_mappings:
- rico:RecordSetType
- skos:Concept
- crm:E55_Type
related_mappings:
- rico-rst:Fonds
- wd:Q22075301
- rico:RecordSetType
- skos:Concept
close_mappings:
- skos:Concept
- bf:Archival
see_also:
- AcademicArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Fonds
comments:
- Personal archives with individual faculty member as creator/accumulator
- Typically acquired through donation or bequest with possible donor restrictions
- Respects original order where it exists
annotations:
acquisition_note: Typically acquired through donation or bequest. May include
restrictions on access or publication specified by donor agreement.
specificity_score: "0.5"
specificity_rationale: Specific to faculty personal papers
custodian_types: "['AcademicArchive']"
acquisition_note: "Typically acquired through donation or bequest. May include restrictions on access or publication specified by donor agreement."
CampusDocumentationCollection:
description: >-
Materials documenting campus life, institutional identity, and university culture beyond formal administrative records.
alt_descriptions:
nl: >-
Materialen die campusleven, institutionele identiteit en universiteitscultuur documenteren buiten formele administratieve archieven om.
de: >-
Materialien, die das Campusleben, die institutionelle Identitaet und die Universitaetskultur jenseits formaler Verwaltungsunterlagen dokumentieren.
fr: >-
Documents recensant la vie du campus, l'identite institutionnelle et la culture universitaire au-dela des archives administratives formelles.
es: >-
Materiales que documentan la vida del campus, la identidad institucional y la cultura universitaria mas alla de los registros administrativos formales.
ar: >-
مواد توثق حياة الحرم الجامعي والهوية المؤسسية وثقافة الجامعة خارج السجلات الإدارية الرسمية.
id: >-
Materi yang mendokumentasikan kehidupan kampus, identitas institusional, dan budaya universitas di luar catatan administratif formal.
zh: >-
记录校园生活、机构身份和大学文化的材料,超越正式行政记录。
examples:
- value:
has_type: hc:ArchiveOrganizationType
has_label: Campus Photograph Collection
has_note: Historical photographs, yearbooks, event programs, oral histories
description: Campus life documentation including photographs and publications
- value:
has_type: hc:ArchiveOrganizationType
has_label: Student Newspaper Archive
has_note: Student newspapers, magazines, ephemera, memorabilia
description: Student publication documentation with campus culture materials
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
slots:
- has_type
- has_score
- organizational_principle
- organizational_principle_uri
- has_note
- has_type
- has_scope
- has_scope
slot_usage:
has_type:
equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType"]'
has_type:
equals_string: FacultyPaperCollection
organizational_principle:
equals_string: fonds
organizational_principle_uri:
equals_string: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Fonds
has_note:
equals_string: This RecordSetType classifies record sets following the fonds
principle. Personal archives with individual faculty member as creator/accumulator.
has_scope:
equals_string: '["research documentation", "correspondence", "lecture notes",
"manuscripts", "conference papers"]'
has_scope:
equals_string: '["official university records", "student records", "administrative
files"]'
CampusDocumentationCollection:
is_a: AcademicArchiveRecordSetType
class_uri: rico:RecordSetType
description: "A rico:RecordSetType for campus life and institutional documentation.\n\
\n**Definition**:\nMaterials documenting campus life, institutional identity,\
\ and university \nculture beyond formal administrative records. Often includes\
\ visual materials, \npublications, and ephemera that capture the lived experience\
\ of the institution.\n\n**Typical Contents**:\n- Campus photographs and audiovisual\
\ materials\n- University publications (yearbooks, newspapers, magazines)\n\
- Ephemera (programs, posters, invitations)\n- Memorabilia and artifacts\n-\
\ Oral histories\n- Event documentation\n- Building and facilities documentation\n\
\n**Collection Nature**:\nMay be assembled collections (artificial) rather than\
\ strictly provenance-based,\nespecially for ephemera and visual materials.\
\ Documentation value often takes\nprecedence over strict archival arrangement.\n\
\n**RiC-O Alignment**:\nThis class is a specialized rico:RecordSetType. Records\
\ classified with this\ntype follow the collection organizational principle\
\ as defined by rico-rst:Collection\n(assembled/artificial collection organized\
\ by subject or documentation purpose).\n"
structured_aliases:
- literal_form: Campus-Dokumentationssammlung
in_language: de
- literal_form: "colecci\xF3n de documentaci\xF3n del campus"
in_language: es
- literal_form: collection de documentation du campus
in_language: fr
- literal_form: campusdocumentatiecollectie
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: "cole\xE7\xE3o de documenta\xE7\xE3o do campus"
- literal_form: Campus-Dokumentationssammlung
predicate: EXACT_SYNONYM
in_language: de
- literal_form: collection de documentation du campus
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: coleccion de documentacion del campus
predicate: EXACT_SYNONYM
in_language: es
- literal_form: توثيق الحرم الجامعي
predicate: EXACT_SYNONYM
in_language: ar
- literal_form: koleksi dokumentasi kampus
predicate: EXACT_SYNONYM
in_language: id
- literal_form: 校园文献集
predicate: EXACT_SYNONYM
in_language: zh
- literal_form: colecao de documentacao do campus
predicate: EXACT_SYNONYM
in_language: pt
keywords:
- campus photographs
- audiovisual materials
- university publications
- student newspapers
- yearbooks
- magazines
- oral histories
- event documentation
- building documentation
- campus life
- ephemera
- programs
- posters
- memorabilia
- artifacts
slot_usage:
has_type:
equals_string: "hc:ArchiveOrganizationType"
broad_mappings:
- rico:RecordSetType
- skos:Concept
- crm:E55_Type
related_mappings:
- rico-rst:Collection
- wd:Q9388534
- rico:RecordSetType
- skos:Concept
close_mappings:
- skos:Concept
- schema:Collection
see_also:
- AcademicArchiveRecordSetType
- rico:RecordSetType
- rico-rst:Collection
comments:
- Often includes assembled/artificial collections organized by subject or documentation purpose
- May prioritize documentation value over strict archival arrangement
- Can include ephemera, memorabilia, and visual materials
- 'Preserved from prior description: Materials documenting campus life, institutional identity, and university culture beyond formal administrative records. Often includes visual materials, publications, and ephemera capturing the lived experience of the institution.'
annotations:
collection_nature_note: Often includes artificial/assembled collections organized
by subject, format, or documentation purpose rather than strict provenance.
slots:
- has_type
- has_score
- organizational_principle
- organizational_principle_uri
- has_note
- has_type
- has_scope
- has_scope
slot_usage:
has_type:
equals_expression: '["hc:ArchiveOrganizationType", "hc:LibraryType", "hc:MuseumType"]'
has_type:
equals_string: CampusDocumentationCollection
organizational_principle:
equals_string: collection
organizational_principle_uri:
equals_string: https://www.ica.org/standards/RiC/vocabularies/recordSetTypes#Collection
has_note:
equals_string: This RecordSetType classifies record sets following the collection
principle. May be assembled collection (artificial) organized by subject
or documentation purpose.
has_scope:
equals_string: '["photographs", "audiovisual materials", "publications", "ephemera",
"oral histories", "memorabilia"]'
has_scope:
equals_string: '["administrative records", "student records", "faculty papers"]'
specificity_score: "0.5"
specificity_rationale: Specific to campus documentation materials
custodian_types: "['AcademicArchive']"
collection_nature_note: "Often includes artificial/assembled collections organized by subject, format, or documentation purpose rather than strict provenance."
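
Two patterns in the hunk above may be easier to see in code: the `has_scope` `equals_string` values hold JSON arrays encoded as strings, and the `slots` lists repeat entries such as `has_type` and `has_scope`. The following is a minimal sketch, not part of the repository; `parse_scope` and `dedupe_slots` are hypothetical helper names, and whether the repeated slots are intentional is not stated in the diff.

```python
import json


def parse_scope(equals_string: str) -> list[str]:
    """Decode an equals_string value that holds a JSON array of strings."""
    values = json.loads(equals_string)
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("expected a JSON array of strings")
    return values


def dedupe_slots(slots: list[str]) -> list[str]:
    """Drop repeated slot names while keeping first-seen order."""
    return list(dict.fromkeys(slots))


# Value copied from the CampusDocumentationCollection slot_usage in the hunk.
excluded = parse_scope('["administrative records", "student records", "faculty papers"]')

# Slot list copied from the same class, including its repeated entries.
slots = dedupe_slots([
    "has_type", "has_score", "organizational_principle",
    "organizational_principle_uri", "has_note", "has_type",
    "has_scope", "has_scope",
])

print(excluded)
print(slots)
```

A helper like this could run as a pre-commit check, assuming the project wants each slot listed once.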

@@ -1,22 +1,103 @@
id: https://nde.nl/ontology/hc/class/AcademicInstitution
name: AcademicInstitution
title: AcademicInstitution
description: An institution of higher education or research.
title: Academic Institution
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
wd: http://www.wikidata.org/entity/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_label
classes:
AcademicInstitution:
description: >-
Organization providing post-secondary education or conducting advanced research.
Includes universities, colleges, polytechnics, institutes of technology, research
institutes, and other tertiary educational entities.
alt_descriptions:
nl: >-
Organisatie die hoger onderwijs biedt of gevorderd onderzoek uitvoert. Omvat
universiteiten, hogescholen, polytechnische instituten, technologische instituten,
onderzoeksinstituten en andere tertiaire onderwijsinstellingen.
de: >-
Organisation, die Hochschulbildung anbietet oder fortgeschrittene Forschung
betreibt. Umfasst Universitaeten, Hochschulen, Polytechnische Institute,
Technologieinstitute, Forschungsinstitute und andere tertiaere Bildungseinrichtungen.
fr: >-
Organisation dispensant un enseignement superieur ou menant des recherches
avancees. Comprend les universites, les colleges, les instituts polytechniques,
les instituts de technologie, les instituts de recherche et autres etablissements
d'enseignement tertiaire.
es: >-
Organizacion que proporciona educacion postsecundaria o realiza investigacion
avanzada. Incluye universidades, colegios, institutos politecnicos, institutos
de tecnologia, institutos de investigacion y otras entidades educativas terciarias.
ar: >-
منظمة توفر تعليمًا ما بعد الثانوي أو تجري أبحاثًا متقدمة. تشمل الجامعات
والكليات والمعاهد الفنية ومعاهد التكنولوجيا ومعاهد البحث وغيرها من
المؤسسات التعليمية العليا.
id: >-
Organisasi yang menyediakan pendidikan pasca-menengah atau melakukan penelitian
lanjutan. Termasuk universitas, perguruan tinggi, politeknik, institut teknologi,
institut penelitian, dan entitas pendidikan tersier lainnya.
zh: >-
提供高等教育或进行高级研究的组织。包括大学、学院、理工学院、技术学院、
研究所和其他高等教育机构。
examples:
- value:
has_label: University of Amsterdam
description: A research university in the Netherlands
- value:
has_label: Technical College of Berlin
description: A technical higher education institution in Germany
class_uri: schema:EducationalOrganization
description: Academic institution.
slots:
- has_label
structured_aliases:
- literal_form: onderwijsinstelling
in_language: nl
- literal_form: Bildungseinrichtung
in_language: de
- literal_form: etablissement d'enseignement
in_language: fr
- literal_form: institucion educativa
in_language: es
- literal_form: مؤسسة تعليمية
in_language: ar
- literal_form: institusi pendidikan
in_language: id
- literal_form: 教育机构
in_language: zh
keywords:
- university
- college
- polytechnic
- institute of technology
- research institute
- higher education
- tertiary education
- academic
close_mappings:
- wd:Q4671277
- wd:Q38723
broad_mappings:
- wd:Q2385804
- schema:EducationalOrganization
- skos:Concept
narrow_mappings:
- wd:Q3918
- schema:CollegeOrUniversity
comments:
- Encompasses both degree-granting institutions and research-focused organizations
- Distinct from primary and secondary educational institutions
- May include specialized academies, conservatories, and professional schools
see_also:
- AcademicArchive
- AcademicProgram
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: "0.4"
specificity_rationale: Specific to tertiary education and research organizations
custodian_types: "['AcademicArchive']"
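
The mapping entries in this file (`wd:Q3918`, `schema:CollegeOrUniversity`, and so on) are CURIEs that resolve against the `prefixes` block at the top of the file. As a rough illustration only, the expansion can be sketched as below; `expand_curie` is a hypothetical helper, not a LinkML API, and the prefix table is copied from the hunk.

```python
# Prefix table copied from the AcademicInstitution file's `prefixes` block.
PREFIXES = {
    "linkml": "https://w3id.org/linkml/",
    "hc": "https://nde.nl/ontology/hc/",
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "wd": "http://www.wikidata.org/entity/",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand prefix:local into a full IRI, failing on undeclared prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


for curie in ("wd:Q3918", "schema:CollegeOrUniversity", "skos:Concept"):
    print(curie, "->", expand_curie(curie))
```

Raising on an undeclared prefix is a design choice for this sketch: it would flag a mapping that uses a namespace the file never declares.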

@@ -1,22 +1,92 @@
id: https://nde.nl/ontology/hc/class/AcademicProgram
name: AcademicProgram
title: AcademicProgram
description: An educational or research program offered by an academic institution.
title: Academic Program
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
wd: http://www.wikidata.org/entity/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_label
- linkml:types
- ../slots/has_label
classes:
AcademicProgram:
description: >-
Course of study or research offered by a tertiary educational institution, leading to a credential such as a degree, diploma, or certificate.
alt_descriptions:
nl: >-
Gestructureerd studieprogramma of onderzoeksprogramma aangeboden door een instelling voor hoger onderwijs, leidend tot een kwalificatie zoals een graad, diploma of certificaat.
de: >-
Strukturierter Studiengang oder Forschungsprogramm, das von einer Hochschuleinrichtung angeboten wird und zu einem Abschluss wie einem Grad, Diplom oder Zertifikat fuehrt.
fr: >-
Programme d'etudes ou de recherche structure propose par un etablissement d'enseignement superieur, conduisant a une qualification telle qu'un diplome, un certificat ou une attestation.
es: >-
Programa de estudios o investigacion estructurado ofrecido por una institucion de educacion superior, que conduce a una credencial como un titulo, diploma o certificado.
ar: >-
برنامج دراسي أو بحثي منظم تقدمه مؤسسة تعليم عالي، يؤدي إلى مؤهل مثل درجة علمية أو دبلوم أو شهادة.
id: >-
Program studi atau penelitian terstruktur yang ditawarkan oleh institusi pendidikan tinggi, yang mengarah ke kredensial seperti gelar, diploma, atau sertifikat.
zh: >-
高等教育机构提供的结构化学习或研究课程,可获学位、文凭或证书等资格。
examples:
- value:
has_label: Bachelor of Computer Science
description: Undergraduate degree program in computer science
- value:
has_label: Master of Arts in History
description: Graduate degree program in historical studies
- value:
has_label: PhD Program in Molecular Biology
description: Doctoral research program in molecular biology
- value:
has_label: Professional Certificate in Digital Archiving
description: Non-degree professional development program
class_uri: schema:EducationalOccupationalProgram
description: Academic program.
slots:
- has_label
structured_aliases:
- literal_form: studieprogramma
in_language: nl
- literal_form: Studiengang
in_language: de
- literal_form: programme d'etudes
in_language: fr
- literal_form: programa de estudios
in_language: es
- literal_form: برنامج أكاديمي
in_language: ar
- literal_form: program akademik
in_language: id
- literal_form: 学术项目
in_language: zh
keywords:
- degree program
- diploma program
- certificate program
- undergraduate
- graduate
- doctoral
- course of study
- curriculum
- major
- specialization
- research program
exact_mappings:
- schema:EducationalOccupationalProgram
close_mappings:
- wd:Q600134
broad_mappings:
- skos:Concept
comments:
- Programs may be full-time, part-time, or hybrid in delivery format
- May include professional, vocational, or academic orientations
- Often organized into semesters, quarters, or modular units
- 'Preserved from prior description: Course of study or research offered by a tertiary educational institution, leading to a credential such as a degree, diploma, or certificate. Comprises a defined sequence of learning opportunities with clear requirements, start and end points, and intended outcomes.'
see_also:
- AcademicInstitution
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: "0.4"
specificity_rationale: Specific to structured academic offerings
custodian_types: "['AcademicArchive']"
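
The `annotations` blocks store `specificity_score` both as a bare number (`0.1`) and as a quoted string (`"0.4"`), and `custodian_types` as a string-encoded list (`"['AcademicArchive']"`, `"['*']"`). A consumer would have to normalize both. The sketch below shows one way to do that; the function names are hypothetical, and the 0 to 1 range check is an assumption inferred from the values in the diff, not a documented rule.

```python
import ast


def parse_score(raw) -> float:
    """Accept 0.1 or "0.4" and return a float; the 0..1 bound is assumed."""
    score = float(raw)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of assumed range: {score}")
    return score


def parse_custodian_types(raw: str) -> list[str]:
    """Decode a string-encoded list such as "['AcademicArchive']" or '["*"]'."""
    value = ast.literal_eval(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a list of strings")
    return value


print(parse_score(0.1), parse_score("0.4"))
print(parse_custodian_types("['AcademicArchive']"))
print(parse_custodian_types('["*"]'))
```

`ast.literal_eval` is used here because it accepts both the single-quoted and the double-quoted list spellings that appear in the hunks, without evaluating arbitrary code.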

@@ -1,12 +1,13 @@
id: https://nde.nl/ontology/hc/class/Access
name: Access
title: Access Class
title: Access
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
@@ -19,43 +20,107 @@ imports:
- ../slots/temporal_extent
classes:
Access:
description: >-
Information describing how heritage collections, services, or facilities
may be used or consulted. Captures access types, eligible user categories,
conditions, and temporal availability.
alt_descriptions:
nl: >-
Gestructureerde informatie over hoe erfgoedcollecties, diensten of faciliteiten
kunnen worden gebruikt of geraadpleegd. Legt toegangstypen, in aanmerking komende
gebruikerscategorieen, voorwaarden en tijdelijke beschikbaarheid vast.
de: >-
Strukturierte Informationen darueber, wie Erbesammlungen, Dienstleistungen oder
Einrichtungen genutzt oder konsultiert werden koennen. Erfasst Zugangsarten,
berechtigte Benutzerkategorien, Bedingungen und zeitliche Verfuegbarkeit.
fr: >-
Informations structurees decrivant comment les collections patrimoniales,
les services ou les installations peuvent etre utilises ou consultes. Capture
les types d'acces, les categories d'utilisateurs eligibles, les conditions
et la disponibilite temporelle.
es: >-
Informacion estructurada que describe como se pueden utilizar o consultar
las colecciones patrimoniales, servicios o instalaciones. Captura los tipos
de acceso, las categorias de usuarios elegibles, las condiciones y la
disponibilidad temporal.
ar: >-
معلومات منظمة تصف كيف يمكن استخدام أو استشارة مجموعات التراث أو الخدمات
أو المرافق. تسجل أنواع الوصول وفئات المستخدمين المؤهلين والشروط
والتوفر الزمني.
id: >-
Informasi terstruktur yang menjelaskan bagaimana koleksi warisan, layanan,
atau fasilitas dapat digunakan atau dikonsultasikan. Merekam jenis akses,
kategori pengguna yang memenuhi syarat, kondisi, dan ketersediaan temporal.
zh: >-
描述如何使用或查阅遗产馆藏、服务或设施的结构化信息。记录访问类型、
符合条件的用户类别、条件和时间可用性。
examples:
- value:
has_type: PUBLIC
has_description: Open to general public during gallery hours
has_user_category:
- general public
description: Museum gallery with unrestricted public access
- value:
has_type: BY_APPOINTMENT
has_user_category:
- credentialed researchers
- graduate students with faculty sponsor
description: Special collections reading room requiring advance booking
- value:
has_type: ACADEMIC
has_description: Open to enrolled students and faculty; public by appointment
has_user_category:
- enrolled students
- faculty
- research staff
description: University library with priority for academic community
- value:
has_type: DIGITAL_ONLY
has_description: Collection accessible only through online database
has_user_category:
- anyone with internet access
description: Digitized collection available remotely
- value:
has_type: RESTRICTED
has_description: Fragile materials require staff supervision
has_user_category:
- senior researchers with institutional affiliation
description: Conservation-restricted materials with supervised access
class_uri: dcterms:RightsStatement
description: |
Structured access information for heritage collections, services, or facilities.
**Purpose**:
Replaces simple string descriptions of access conditions with structured
data capturing access types, eligible users, conditions, and restrictions.
**Key Properties**:
- `has_type`: Type of access (PUBLIC, BY_APPOINTMENT, RESTRICTED, etc.)
- `has_user_category`: Who can access (public, students, faculty, researchers)
- `condition_of_access`: Conditions or requirements for access
- `has_description`: Free-text description
- `temporal_extent`: When this access policy applies
**Access Types**:
- PUBLIC: Open to general public
- BY_APPOINTMENT: Requires advance appointment
- ACADEMIC: Restricted to academic community
- RESEARCHER: Restricted to credentialed researchers
- MEMBER: Requires membership
- RESTRICTED: Limited access with specific conditions
- CLOSED: Not currently accessible
- DIGITAL_ONLY: Available only in digital form
**Ontological Alignment**:
- **Primary**: `dcterms:RightsStatement` - Dublin Core rights statement
- **Close**: `schema:publicAccess` - Schema.org access indicator
- **Related**: `crm:E30_Right` - CIDOC-CRM rights
exact_mappings:
- dcterms:RightsStatement
close_mappings:
- schema:publicAccess
related_mappings:
- crm:E30_Right
slots:
- has_type
- has_user_category
- has_description
- temporal_extent
- has_frequency
- has_type
- has_user_category
- has_description
- temporal_extent
- has_frequency
structured_aliases:
- literal_form: toegang
in_language: nl
- literal_form: Zugang
in_language: de
- literal_form: acces
in_language: fr
- literal_form: acceso
in_language: es
- literal_form: وصول
in_language: ar
- literal_form: akses
in_language: id
- literal_form: 访问
in_language: zh
keywords:
- access policy
- access rights
- opening hours
- appointment required
- restricted materials
- public access
- research access
- reading room
- digital access
- physical access
- user eligibility
slot_usage:
has_type:
range: AccessTypeEnum
@@ -63,9 +128,9 @@ classes:
has_user_category:
required: false
examples:
- value: "enrolled students"
- value: "faculty and staff"
- value: "visiting researchers with credentials"
- value: enrolled students
- value: faculty and staff
- value: visiting researchers with credentials
temporal_extent:
required: false
range: TimeSpan
@@ -75,37 +140,55 @@ classes:
range: Frequency
inlined: true
examples:
- value:
has_label: "Daily"
annotations:
specificity_score: 0.50
specificity_rationale: "Moderately specific - applies to collection and service access contexts"
custodian_types: '["*"]'
custodian_types_rationale: "All institution types offer some form of access"
- value:
has_label: Daily
broad_mappings:
- dcterms:RightsStatement
- skos:Concept
close_mappings:
- schema:publicAccess
related_mappings:
- crm:E30_Right
comments:
- "Created per slot_fixes.yaml revision for collection_access migration"
- "Replaces string-based collection_access with structured access data"
- "RULE 53: Part of collection_access → offers_or_offered_access + Access migration"
examples:
- value:
has_type: PUBLIC
has_description: "Open to general public during gallery hours"
has_user_category:
- "general public"
- value:
has_type: BY_APPOINTMENT
has_user_category:
- "credentialed researchers"
- "graduate students with faculty sponsor"
- value:
has_type: ACADEMIC
has_description: "Open to enrolled students and faculty; public by appointment"
has_user_category:
- "enrolled students"
- "faculty"
- "research staff"
- value:
has_type: DIGITAL_ONLY
has_description: "Collection accessible only through online database"
has_user_category:
- "anyone with internet access"
- Replaces simple string descriptions of access conditions with structured data
- Key slots include has_type (access type), has_user_category (eligible users), has_description (conditions), and temporal_extent (when policy applies)
- Common access types include PUBLIC, BY_APPOINTMENT, ACADEMIC, RESEARCHER, MEMBER, RESTRICTED, CLOSED, and DIGITAL_ONLY
- Created per slot_fixes.yaml revision for collection_access migration
- "RULE 53: Part of collection_access to offers_or_offered_access plus Access migration"
see_also:
- AccessTypeEnum
- dcterms:accessRights
notes:
- |
Preserved from prior description (commit ae09ff81):
Structured access information for heritage collections, services, or facilities.
**Purpose**:
Replaces simple string descriptions of access conditions with structured
data capturing access types, eligible users, conditions, and restrictions.
**Key Properties**:
- `has_type`: Type of access (PUBLIC, BY_APPOINTMENT, RESTRICTED, etc.)
- `has_user_category`: Who can access (public, students, faculty, researchers)
- `condition_of_access`: Conditions or requirements for access
- `has_description`: Free-text description
- `temporal_extent`: When this access policy applies
**Access Types**:
- PUBLIC: Open to general public
- BY_APPOINTMENT: Requires advance appointment
- ACADEMIC: Restricted to academic community
- RESEARCHER: Restricted to credentialed researchers
- MEMBER: Requires membership
- RESTRICTED: Limited access with specific conditions
- CLOSED: Not currently accessible
- DIGITAL_ONLY: Available only in digital form
**Ontological Alignment**:
- **Primary**: `dcterms:RightsStatement` - Dublin Core rights statement
- **Close**: `schema:publicAccess` - Schema.org access indicator
- **Related**: `crm:E30_Right` - CIDOC-CRM rights
annotations:
specificity_score: "0.5"
specificity_rationale: Moderately specific - applies to collection and service access contexts
custodian_types: "['*']"
custodian_types_rationale: All institution types offer some form of access
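
To make the Access class description concrete, here is a minimal sketch of how an instance with `has_type`, `has_user_category`, and `has_description` could be checked. The member list follows the access types named in the description above; the real `AccessTypeEnum` is defined elsewhere in the schema and may differ, and `AccessType`, `Access`, and `access_from_dict` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AccessType(Enum):
    """Access types as listed in the Access class description (assumed complete)."""
    PUBLIC = "PUBLIC"
    BY_APPOINTMENT = "BY_APPOINTMENT"
    ACADEMIC = "ACADEMIC"
    RESEARCHER = "RESEARCHER"
    MEMBER = "MEMBER"
    RESTRICTED = "RESTRICTED"
    CLOSED = "CLOSED"
    DIGITAL_ONLY = "DIGITAL_ONLY"


@dataclass
class Access:
    has_type: AccessType
    has_user_category: list[str] = field(default_factory=list)
    has_description: Optional[str] = None


def access_from_dict(data: dict) -> Access:
    """Build an Access record; an unknown has_type raises KeyError."""
    return Access(
        has_type=AccessType[data["has_type"]],
        has_user_category=list(data.get("has_user_category", [])),
        has_description=data.get("has_description"),
    )


# Instance data copied from the first entry under `examples:` in the diff.
gallery = access_from_dict({
    "has_type": "PUBLIC",
    "has_description": "Open to general public during gallery hours",
    "has_user_category": ["general public"],
})
print(gallery)
```

This mirrors the stated purpose of the class: a typed `has_type` replaces a free-text access string, so a misspelled access type fails early instead of passing through silently.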

Some files were not shown because too many files have changed in this diff.