From 554fe520ea686ea012b6d615f44b757f6b0881b1 Mon Sep 17 00:00:00 2001 From: kempersc Date: Sun, 15 Feb 2026 19:20:09 +0100 Subject: [PATCH] Add comprehensive rules for LinkML schema management and ontology mapping - Introduced Rule 42: No Ontology Prefixes in Slot Names to enforce clean naming conventions. - Established Rule: No Rough Edits in Schema Files to ensure structural integrity during modifications. - Implemented Rule: No Version Indicators in Names to maintain stable semantic naming. - Created Rule: Ontology Detection vs Heuristics to emphasize the importance of verifying ontology definitions. - Defined Rule 50: Ontology-to-LinkML Mapping Convention to standardize mapping practices. - Added Rule: Polished Slot Storage Location to specify directory structure for polished slot files. - Enforced Rule: Preserve Bespoke Slots Until Refactoring to prevent unintended migrations during slot updates. - Instituted Rule 56: Semantic Consistency Over Simplicity to mandate execution of revisions in slot_fixes.yaml. - Added new Genealogy Archives Registry Enrichment class with multilingual support and structured aliases. 
--- .../ontology-driven-cache-segmentation.md | 583 ++++++++++++++++++ ...ring-parsimony-and-domain-modeling-rule.md | 65 ++ .../rules/entity-resolution-no-heuristics.md | 91 +-- .../disambiguation-entity-profiles.md | 248 ++++++++ .../entity-resolution-no-heuristics.md | 307 +++++++++ .../inferred-data-explicit-provenance-rule.md | 422 +++++++++++++ .../kien-authoritative-source-rule.md | 251 ++++++++ .../ppid-birth-date-enrichment-rule.md | 351 +++++++++++ ...act-mapping-predicate-class-distinction.md | 4 +- ...ring-parsimony-and-domain-modeling-rule.md | 65 ++ ...act-mapping-predicate-class-distinction.md | 37 ++ .../feedback-vs-revision-distinction.md | 144 +++++ .../rules/linkml/full-slot-migration-rule.md | 373 +++++++++++ .../linkml/generic-slots-specific-classes.md | 129 ++++ .../linkml-union-type-range-any-rule.md | 157 +++++ .../linkml/linkml-yaml-best-practices-rule.md | 181 ++++++ .../mapping-specificity-hypernym-rule.md | 185 ++++++ .../multilingual-support-requirements.md | 177 ++++++ .../linkml/no-autonomous-alias-assignment.md | 24 + .../linkml/no-deletion-from-slot-fixes.md | 46 ++ .../linkml/no-duplicate-ontology-mappings.md | 189 ++++++ .../no-hallucinated-ontology-references.md | 316 ++++++++++ .../linkml/no-migration-deferral-rule.md | 164 +++++ .../no-ontology-prefix-in-slot-names.md | 215 +++++++ .../rules/linkml/no-rough-edits-in-schema.md | 61 ++ .../no-version-indicators-in-names-rule.md | 53 ++ .../rules/linkml/ontology-detection-rule.md | 15 + .../ontology-to-linkml-mapping-convention.md | 306 +++++++++ .../linkml/polished-slot-storage-location.md | 45 ++ ...reserve-bespoke-slots-until-refactoring.md | 32 + .../semantic-consistency-over-simplicity.md | 190 ++++++ .../rules/no-tool-specific-classes-rule.md | 48 -- .../rules/polished-slot-storage-location.md | 2 +- .../schemas/20251121/linkml/manifest.json | 2 +- schemas/20251121/linkml/custodian_source.yaml | 4 +- schemas/20251121/linkml/manifest.json | 2 +- 
.../modules/classes/DonationScheme.yaml | 18 +- .../linkml/modules/classes/FundingCall.yaml | 38 +- .../linkml/modules/classes/FundingFocus.yaml | 41 +- .../modules/classes/FundingProgram.yaml | 48 +- .../linkml/modules/classes/FundingRate.yaml | 41 +- .../modules/classes/FundingRequirement.yaml | 242 ++------ .../linkml/modules/classes/FundingScheme.yaml | 48 +- .../linkml/modules/classes/FundingSource.yaml | 51 +- .../linkml/modules/classes/Fylkesarkiv.yaml | 46 +- .../modules/classes/GBIFIdentifier.yaml | 43 +- .../linkml/modules/classes/GHCIdentifier.yaml | 17 +- .../linkml/modules/classes/Gallery.yaml | 45 +- .../linkml/modules/classes/GalleryType.yaml | 248 ++------ .../linkml/modules/classes/GalleryTypes.yaml | 28 +- .../modules/classes/GenBankAccession.yaml | 39 +- .../linkml/modules/classes/Gender.yaml | 48 +- .../classes/GenealogiewerkbalkEnrichment.yaml | 33 - .../GenealogyArchivesRegistryEnrichment.yaml | 48 ++ .../modules/classes/GenerationEvent.yaml | 80 +-- .../linkml/modules/classes/GeoFeature.yaml | 35 +- .../modules/classes/GeoFeatureType.yaml | 21 +- .../modules/classes/GeoFeatureTypes.yaml | 67 +- .../modules/classes/GeoNamesIdentifier.yaml | 19 +- .../modules/classes/GeoSpatialPlace.yaml | 158 ++--- .../modules/classes/GeographicExtent.yaml | 21 +- .../modules/classes/GeographicScope.yaml | 20 +- .../linkml/modules/classes/Geometry.yaml | 35 +- .../linkml/modules/classes/GeometryType.yaml | 21 +- .../linkml/modules/classes/GeometryTypes.yaml | 44 +- .../modules/classes/GeospatialIdentifier.yaml | 20 +- .../modules/classes/GeospatialLocation.yaml | 15 +- .../linkml/modules/classes/GhcidBlock.yaml | 42 +- .../linkml/modules/enums/DataTierEnum.yaml | 4 +- 69 files changed, 6113 insertions(+), 1095 deletions(-) create mode 100644 .opencode/rules/context/ontology-driven-cache-segmentation.md create mode 100644 .opencode/rules/engineering-parsimony-and-domain-modeling-rule.md create mode 100644 
.opencode/rules/entity_resolution/disambiguation-entity-profiles.md create mode 100644 .opencode/rules/entity_resolution/entity-resolution-no-heuristics.md create mode 100644 .opencode/rules/entity_resolution/inferred-data-explicit-provenance-rule.md create mode 100644 .opencode/rules/entity_resolution/kien-authoritative-source-rule.md create mode 100644 .opencode/rules/entity_resolution/ppid-birth-date-enrichment-rule.md create mode 100644 .opencode/rules/linkml/engineering-parsimony-and-domain-modeling-rule.md create mode 100644 .opencode/rules/linkml/exact-mapping-predicate-class-distinction.md create mode 100644 .opencode/rules/linkml/feedback-vs-revision-distinction.md create mode 100644 .opencode/rules/linkml/full-slot-migration-rule.md create mode 100644 .opencode/rules/linkml/generic-slots-specific-classes.md create mode 100644 .opencode/rules/linkml/linkml-union-type-range-any-rule.md create mode 100644 .opencode/rules/linkml/linkml-yaml-best-practices-rule.md create mode 100644 .opencode/rules/linkml/mapping-specificity-hypernym-rule.md create mode 100644 .opencode/rules/linkml/multilingual-support-requirements.md create mode 100644 .opencode/rules/linkml/no-autonomous-alias-assignment.md create mode 100644 .opencode/rules/linkml/no-deletion-from-slot-fixes.md create mode 100644 .opencode/rules/linkml/no-duplicate-ontology-mappings.md create mode 100644 .opencode/rules/linkml/no-hallucinated-ontology-references.md create mode 100644 .opencode/rules/linkml/no-migration-deferral-rule.md create mode 100644 .opencode/rules/linkml/no-ontology-prefix-in-slot-names.md create mode 100644 .opencode/rules/linkml/no-rough-edits-in-schema.md create mode 100644 .opencode/rules/linkml/no-version-indicators-in-names-rule.md create mode 100644 .opencode/rules/linkml/ontology-detection-rule.md create mode 100644 .opencode/rules/linkml/ontology-to-linkml-mapping-convention.md create mode 100644 .opencode/rules/linkml/polished-slot-storage-location.md create mode 100644 
.opencode/rules/linkml/preserve-bespoke-slots-until-refactoring.md create mode 100644 .opencode/rules/linkml/semantic-consistency-over-simplicity.md delete mode 100644 .opencode/rules/no-tool-specific-classes-rule.md delete mode 100644 schemas/20251121/linkml/modules/classes/GenealogiewerkbalkEnrichment.yaml create mode 100644 schemas/20251121/linkml/modules/classes/GenealogyArchivesRegistryEnrichment.yaml diff --git a/.opencode/rules/context/ontology-driven-cache-segmentation.md b/.opencode/rules/context/ontology-driven-cache-segmentation.md new file mode 100644 index 0000000000..e230c8bd2e --- /dev/null +++ b/.opencode/rules/context/ontology-driven-cache-segmentation.md @@ -0,0 +1,583 @@ +# Rule 46: Ontology-Driven Cache Segmentation + +🚨 **CRITICAL**: The semantic cache MUST use vocabulary derived from LinkML `*Type.yaml` and `*Types.yaml` schema files to extract entities for cache key generation. Hardcoded regex patterns are deprecated. + +**Status**: Implemented (Evolved v2.0) +**Version**: 2.0 (Epistemological Evolution) +**Updated**: 2026-01-10 + +## Evolution Overview + +Rule 46 v2.0 incorporates insights from Volodymyr Pavlyshyn's work on agentic memory systems: + +1. **Epistemic Provenance** (Phase 1) - Track WHERE, WHEN, HOW data originated +2. **Topological Distance** (Phase 2) - Use ontology structure, not just embeddings +3. **Holarchic Cache** (Phase 3) - Entries as holons with up/down links +4. **Message Passing** (Phase 4, planned) - Smalltalk-style introspectable cache +5.
**Clarity Trading** (Phase 5, planned) - Block ambiguous queries from cache + +## Epistemic Provenance + +Every cached response carries epistemological metadata: + +```typescript +interface EpistemicProvenance { + dataSource: 'ISIL_REGISTRY' | 'WIKIDATA' | 'CUSTODIAN_YAML' | 'LLM_INFERENCE' | ...; + dataTier: 1 | 2 | 3 | 4; // TIER_1_AUTHORITATIVE → TIER_4_INFERRED + sourceTimestamp: string; + derivationChain: string[]; // ["SPARQL:Qdrant", "RAG:retrieve", "LLM:generate"] + revalidationPolicy: 'static' | 'daily' | 'weekly' | 'on_access'; +} +``` + +**Benefit**: Users see "This answer is from TIER_1 ISIL registry data, captured 2025-01-08". + +## Topological Distance + +Beyond embedding similarity, cache matching considers **structural distance** in the type hierarchy: + +``` + HeritageCustodian (*) + │ + ┌──────────────────┼──────────────────┐ + ▼ ▼ ▼ + MuseumType (M) ArchiveType (A) LibraryType (L) + │ │ │ + ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ + ▼ ▼ ▼ ▼ ▼ ▼ +ArtMuseum History Municipal State Public Academic +``` + +**Combined Similarity Formula**: +```typescript +finalScore = 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance) +``` + +**Benefit**: "Art museum" won't match "natural history museum" even with 95% embedding similarity. + +## Holarchic Cache Structure + +Cache entries are **holons** - simultaneously complete AND parts of aggregates: + +| Level | Example | Aggregates | +|-------|---------|------------| +| Micro | "Rijksmuseum details" | None | +| Meso | "Museums in Amsterdam" | List of micro holons | +| Macro | "Heritage in Noord-Holland" | All meso holons in region | + +```typescript +interface CachedQuery { + // ... existing fields ...
+ holonLevel?: 'micro' | 'meso' | 'macro'; + participatesIn?: string[]; // Higher-level cache keys + aggregates?: string[]; // Lower-level entries +} +``` + +## Problem Statement + +The ArchiefAssistent semantic cache prevents geographic false positives using entity extraction: + +``` +Query: "Hoeveel musea in Amsterdam?" +Cached: "Hoeveel musea in Noord-Holland?" +Result: BLOCKED (location mismatch) ✅ +``` + +However, the current implementation uses **hardcoded regex patterns**: + +```typescript +// DEPRECATED: Hardcoded patterns in semantic-cache.ts +const INSTITUTION_PATTERNS: Record<string, RegExp> = { + M: /\b(muse(um|a|ums?)|musea)/i, + A: /\b(archie[fv]en?|archives?|archief)/i, + // ... 19 patterns to maintain manually +}; +``` + +**Problems with hardcoded patterns**: +1. **Maintenance burden** - Every new institution type requires code changes +2. **Missing subtypes** - "kunstmuseum" vs "museum" should cache separately +3. **No multilingual support** - Only Dutch/English, misses German/French labels +4. **Duplication** - Same vocabulary exists in LinkML schemas +5. **No record type awareness** - "burgerlijke stand" queries mixed with general archive queries + +## Solution: Schema-Derived Vocabulary + +The LinkML schema already contains rich vocabulary: + +| Schema File | Content | Cache Utility | +|-------------|---------|---------------| +| `CustodianType.yaml` | 19 top-level types | Primary segmentation (M/A/L/G...) | +| `MuseumType.yaml` | 187+ museum subtypes | Subtype segmentation | +| `ArchiveOrganizationType.yaml` | 144+ archive subtypes | Subtype segmentation | +| `*RecordSetTypes.yaml` | Record type taxonomies | Finding aids specificity | + +### Vocabulary Sources in Schema + +1. **`type_label`** - Multilingual labels via `skos:prefLabel` +2. **`structured_aliases`** - Language-tagged alternative names +3. **`keywords`** - Search terms for entity recognition +4.
**`wikidata_entity`** - Linked Data identifiers + +## Architecture + +### Overview: Two-Tier Embedding Hierarchy + +The system uses a **hierarchical embedding approach** for fast semantic routing: + +1. **Tier 1: Types File Embeddings** - Which category? (Museum vs Archive vs Library) +2. **Tier 2: Individual Type Embeddings** - Which specific type? (ArtMuseum vs NaturalHistoryMuseum) + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ BUILD TIME: Extract vocabulary + generate embeddings │ +│ │ +│ schemas/20251121/linkml/modules/classes/*Type.yaml │ +│ schemas/20251121/linkml/modules/classes/*Types.yaml │ +│ ↓ │ +│ scripts/extract-types-vocab.ts │ +│ ↓ │ +│ ┌────────────────────────────────────────────────────────────┐ │ +│ │ types-vocab.json │ │ +│ │ ├── tier1Embeddings: { MuseumType: [...], ArchiveType: [...]
} │ │ +│ │ ├── tier2Embeddings: { ArtMuseum: [...], MunicipalArchive: [...]}│ │ +│ │ └── termLog: { "kunstmuseum": { type: "M", subtype: "ART_MUSEUM"}│ │ +│ └────────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ + │ + ▼ (loaded at runtime) +┌──────────────────────────────────────────────────────────────────────┐ +│ RUNTIME: Two-Tier Semantic Routing │ +│ │ +│ Query: "Hoeveel gemeentearchieven in Amsterdam?" │ +│ ↓ │ +│ ┌────────────────────────────────────────────────────────────┐ │ +│ │ TIER 1: Types File Selection │ │ +│ │ Query embedding vs Tier1 embeddings (19 categories) │ │ +│ │ Result: ArchiveOrganizationType (similarity: 0.89) │ │ +│ └────────────────────────────────────────────────────────────┘ │ +│ ↓ │ +│ ┌────────────────────────────────────────────────────────────┐ │ +│ │ TIER 2: Specific Type Selection │ │ +│ │ Query embedding vs Tier2 embeddings (144 archive subtypes) │ │ +│ │ Result:
MunicipalArchive (similarity: 0.94) │ │ +│ └────────────────────────────────────────────────────────────┘ │ +│ ↓ │ +│ Structured cache key: "count:A.MUNICIPAL_ARCHIVE:amsterdam" │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### Tier 1: Types File Embeddings + +Each Types file (e.g., `MuseumType.yaml`, `ArchiveOrganizationType.yaml`) gets ONE embedding +representing the **accumulated vocabulary** of all types within that file. + +**Embedding Text Construction**: +``` +MuseumType: museum musea kunstmuseum art museum natural history museum + science museum open-air museum ecomuseum virtual museum + heritage farm national museum regional museum university museum + [... all keywords from all 187 subtypes ...] +``` + +**Purpose**: Fast first-pass filter to identify which GLAMORCUBESFIXPHDNT category the query relates to. + +| Types File | Code | Accumulated Terms Count | +|------------|------|------------------------| +| MuseumType | M | ~500+ terms from 187 subtypes | +| ArchiveOrganizationType | A | ~400+ terms from 144 subtypes | +| LibraryType | L | ~200+ terms from subtypes | +| GalleryType | G | ~100+ terms from subtypes | +| ... | ... | ... | + +### Tier 2: Individual Type Embeddings + +Each **specific type** within a Types file gets its own embedding from its accumulated terms. + +**Embedding Text Construction**: +``` +MunicipalArchive: gemeentearchief stadsarchief city archive municipal archive + town archive local government records burgerlijke stand + bevolkingsregister council minutes building permits + [... all keywords + structured_aliases + labels ...]
+``` + +**Purpose**: Precise subtype identification after Tier 1 narrows the category. + +### Term Log Structure + +A lookup table mapping every extracted term to its type/subtype: + +```json +{ + "termLog": { + "kunstmuseum": { + "typeCode": "M", + "typeName": "MuseumType", + "subtypeName": "ART_MUSEUM", + "wikidata": "Q207694", + "language": "nl" + }, + "art museum": { + "typeCode": "M", + "typeName": "MuseumType", + "subtypeName": "ART_MUSEUM", + "wikidata": "Q207694", + "language": "en" + }, + "gemeentearchief": { + "typeCode": "A", + "typeName": "ArchiveOrganizationType", + "subtypeName": "MUNICIPAL_ARCHIVE", + "wikidata": "Q8362876", + "language": "nl" + } + } +} +``` + +**Purpose**: +1. Fast O(1) keyword lookup (no embedding needed for exact matches) +2. Audit trail of which terms map to which types +3. Debugging which queries match which types + +### Runtime Lookup Strategy + +```typescript +async function extractEntitiesWithEmbeddings(query: string): Promise<ExtractedEntities> { + const vocab = await loadTypesVocabulary(); + const normalized = query.toLowerCase(); + + // FAST PATH: Check termLog for exact keyword matches. + // Try longer terms first so "kunstmuseum" wins over a generic "museum" entry. + const terms = Object.entries(vocab.termLog).sort(([a], [b]) => b.length - a.length); + for (const [term, mapping] of terms) { + if (normalized.includes(term.toLowerCase())) { + return { + institutionType: mapping.typeCode, + institutionSubtype: mapping.subtypeName, + subtypeWikidata: mapping.wikidata, + recordSetType: mapping.recordSetType, + // ...
location and intent extraction + }; + } + } + + // SLOW PATH: Embedding-based semantic matching + const queryEmbedding = await generateEmbedding(query); + + // Tier 1: Find best matching Types file + let bestType: string | null = null; + let bestTypeSimilarity = 0; + for (const [typeName, typeEmbedding] of Object.entries(vocab.tier1Embeddings)) { + const similarity = cosineSimilarity(queryEmbedding, typeEmbedding); + if (similarity > bestTypeSimilarity && similarity > 0.7) { + bestTypeSimilarity = similarity; + bestType = typeName; + } + } + + if (!bestType) return {}; // No type matched + + // Tier 2: Find best matching subtype within the Types file. + // tier1Embeddings is keyed by class name, institutionTypes by type code. + const typeCode = Object.values(vocab.institutionTypes).find((t) => t.className === bestType)?.code; + if (!typeCode) return {}; + let bestSubtype: string | null = null; + let bestSubtypeSimilarity = 0; + + for (const [subtypeName, subtypeEmbedding] of Object.entries(vocab.tier2Embeddings[typeCode] || {})) { + const similarity = cosineSimilarity(queryEmbedding, subtypeEmbedding); + if (similarity > bestSubtypeSimilarity && similarity > 0.75) { + bestSubtypeSimilarity = similarity; + bestSubtype = subtypeName; + } + } + + return { + institutionType: typeCode, + institutionSubtype: bestSubtype, + // ... location and intent extraction + }; +} +``` + +### Embedding Model Choice + +For build-time embedding generation, use the same model as the semantic cache: + +| Option | Model | Dimensions | Quality | +|--------|-------|------------|---------| +| **Primary** | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 384 | Good multilingual | +| Fallback | `all-MiniLM-L6-v2` | 384 | English-focused | +| High Quality | `multilingual-e5-large` | 1024 | Best multilingual | + +**Build-time generation**: Type and subtype embeddings are generated ONCE at build time and stored in JSON. +This avoids embedding the type vocabulary at runtime; only the query itself is embedded, and only when the term-log fast path finds no match.
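The runtime lookup above calls `cosineSimilarity()` without defining it. The Topological Distance section gives the combined score as `0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance)`. The TypeScript below is a minimal sketch of both helpers, not the project's actual implementation. The zero-vector handling is an illustrative choice, and so is the assumption that `topologicalDistance` is already normalised to [0, 1].

```typescript
// Sketch only: cosineSimilarity() is referenced by extractEntitiesWithEmbeddings()
// but not defined in this rule, so this implementation is an assumption.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Embedding dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity" instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Combined score from the Topological Distance section. Assumes
// topologicalDistance is normalised to [0, 1]
// (0 = same node in the type hierarchy, 1 = maximally distant).
function combinedScore(embeddingSimilarity: number, topologicalDistance: number): number {
  if (topologicalDistance < 0 || topologicalDistance > 1) {
    throw new RangeError("topologicalDistance must be normalised to [0, 1]");
  }
  return 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance);
}

// Same embedding similarity, different positions in the hierarchy.
const sameSubtype = combinedScore(0.95, 0);     // identical subtype
const distantBranch = combinedScore(0.95, 0.5); // hypothetical normalised distance
console.log(sameSubtype > distantBranch); // prints true
```

Returning 0 for a zero vector keeps an empty `accumulatedTerms` string from producing `NaN`. A `NaN` score would silently fail every threshold comparison in the lookup above.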
+ +## TypesVocabulary JSON Structure + +Generated at build time with **pre-computed embeddings**: + +```json +{ + "version": "2026-01-10T12:00:00Z", + "schemaVersion": "20251121", + "embeddingModel": "paraphrase-multilingual-MiniLM-L12-v2", + "embeddingDimensions": 384, + + "tier1Embeddings": { + "MuseumType": [0.023, -0.045, 0.087, ...], + "ArchiveOrganizationType": [0.012, 0.056, -0.034, ...], + "LibraryType": [-0.034, 0.089, 0.012, ...], + "GalleryType": [0.045, -0.023, 0.067, ...] + }, + + "tier2Embeddings": { + "M": { + "ART_MUSEUM": [0.034, -0.056, 0.078, ...], + "NATURAL_HISTORY_MUSEUM": [0.045, 0.023, -0.089, ...], + "SCIENCE_MUSEUM": [0.067, -0.012, 0.045, ...] + }, + "A": { + "MUNICIPAL_ARCHIVE": [0.089, 0.034, -0.056, ...], + "NATIONAL_ARCHIVE": [0.012, -0.078, 0.045, ...], + "CHURCH_ARCHIVE": [-0.023, 0.067, 0.034, ...] + } + }, + + "termLog": { + "kunstmuseum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "nl"}, + "art museum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "en"}, + "gemeentearchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"}, + "stadsarchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"}, + "city archive": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "en"}, + "burgerlijke stand": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"}, + "geboorteakte": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"} + }, + + "institutionTypes": { + "M": { + "code": "M", + "className": "MuseumType", + "baseWikidata": "Q33506", + "accumulatedTerms": "museum musea kunstmuseum art museum natural history museum science museum open-air museum ecomuseum virtual museum heritage farm national museum regional museum university museum...", + "keywords": { + "nl": ["museum", "musea"], + "en": ["museum", "museums"], + "de": ["Museum", 
"Museen"] + }, + "subtypes": { + "ART_MUSEUM": { + "className": "ArtMuseum", + "wikidata": "Q207694", + "accumulatedTerms": "kunstmuseum art museum kunstmusea art museums fine art museum visual arts museum painting gallery sculpture museum", + "keywords": { + "nl": ["kunstmuseum", "kunstmusea"], + "en": ["art museum", "art museums"] + } + }, + "NATURAL_HISTORY_MUSEUM": { + "className": "NaturalHistoryMuseum", + "wikidata": "Q559049", + "accumulatedTerms": "natuurhistorisch museum natuurmuseum natural history museum science museum fossils taxidermy specimens geology biology", + "keywords": { + "nl": ["natuurhistorisch museum", "natuurmuseum"], + "en": ["natural history museum"] + } + } + } + }, + "A": { + "code": "A", + "className": "ArchiveOrganizationType", + "baseWikidata": "Q166118", + "accumulatedTerms": "archief archieven archive archives gemeentearchief stadsarchief nationaal archief rijksarchief church archive company archive film archive...", + "keywords": { + "nl": ["archief", "archieven"], + "en": ["archive", "archives"] + }, + "subtypes": { + "MUNICIPAL_ARCHIVE": { + "className": "MunicipalArchive", + "wikidata": "Q8362876", + "accumulatedTerms": "gemeentearchief stadsarchief municipal archive city archive town archive local government records civil registry population register building permits council minutes", + "keywords": { + "nl": ["gemeentearchief", "stadsarchief", "gemeentelijke archiefdienst"], + "en": ["municipal archive", "city archive", "town archive"] + } + }, + "NATIONAL_ARCHIVE": { + "className": "NationalArchive", + "wikidata": "Q1188452", + "accumulatedTerms": "nationaal archief rijksarchief national archive state archive government records national records federal archive", + "keywords": { + "nl": ["nationaal archief", "rijksarchief"], + "en": ["national archive", "state archive"] + } + } + } + } + }, + + "recordSetTypes": { + "CIVIL_REGISTRY": { + "className": "CivilRegistrySeries", + "accumulatedTerms": "burgerlijke stand geboorteakte 
huwelijksakte overlijdensakte bevolkingsregister civil registry birth records marriage records death records population register vital records genealogy", + "keywords": { + "nl": ["burgerlijke stand", "geboorteakte", "huwelijksakte", "overlijdensakte", "bevolkingsregister"], + "en": ["civil registry", "birth records", "marriage records", "death records"] + } + }, + "COUNCIL_GOVERNANCE": { + "className": "CouncilGovernanceFonds", + "accumulatedTerms": "gemeenteraad raadsnotulen raadsbesluit verordening council minutes ordinances resolutions bylaws municipal council town council city council", + "keywords": { + "nl": ["gemeenteraad", "raadsnotulen", "raadsbesluit", "verordening"], + "en": ["council minutes", "ordinances", "resolutions"] + } + } + } +} +``` + +### Key Additions for Embedding Support + +| Field | Purpose | +|-------|---------| +| `tier1Embeddings` | Pre-computed embeddings for each Types file (19 categories) | +| `tier2Embeddings` | Pre-computed embeddings for each subtype (500+ types) | +| `termLog` | Fast O(1) lookup table for exact keyword matches | +| `accumulatedTerms` | Raw text used to generate embeddings (for debugging/regeneration) | +| `embeddingModel` | Model used to generate embeddings (for reproducibility) | + +## Enhanced ExtractedEntities Interface + +```typescript +export interface ExtractedEntities { + // Existing fields + institutionType?: InstitutionTypeCode | null; + location?: string | null; + locationType?: 'city' | 'province' | null; + intent?: 'count' | 'list' | 'info' | null; + + // NEW: Ontology-derived fields + institutionSubtype?: string | null; // e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM' + recordSetType?: string | null; // e.g., 'CIVIL_REGISTRY', 'COUNCIL_GOVERNANCE' + subtypeWikidata?: string | null; // e.g., 'Q8362876' for LOD integration +} +``` + +## Enhanced Cache Key Format + +``` +{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location} + +Examples: +- "count:m:amsterdam" # Basic museum count +- 
"count:m.art_museum:amsterdam" # Art museum count (subtype) +- "list:a.municipal_archive:nh" # Municipal archives in Noord-Holland +- "query:a:civil_registry:utrecht" # Civil registry in Utrecht +- "info:a.national_archive::nl" # National archive info (no location filter) +``` + +## Implementation Files + +| File | Purpose | +|------|---------| +| `scripts/extract-types-vocab.ts` | Build-time vocabulary extraction from LinkML | +| `apps/archief-assistent/public/types-vocab.json` | Generated vocabulary file | +| `apps/archief-assistent/src/lib/types-vocabulary.ts` | Runtime vocabulary loader | +| `apps/archief-assistent/src/lib/semantic-cache.ts` | Updated entity extraction | + +## Build Integration + +Add to `apps/archief-assistent/package.json`: + +```json +{ + "scripts": { + "prebuild": "tsx ../../scripts/extract-types-vocab.ts", + "build": "vite build" + } +} +``` + +## Keyword Extraction Priority + +When extracting keywords from schema files: + +1. **`keywords`** array (highest priority) - Explicit search terms +2. **`structured_aliases.literal_form`** - Multilingual alternative names +3. **`type_label`** - Preferred labels per language +4. 
**Class name conversion** - `MunicipalArchive` → "municipal archive" + +## Cache Segmentation Rules + +### Rule 1: Subtype Specificity + +Queries with **specific subtypes** should NOT match **generic type** cache entries: + +``` +Query: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam" +Cached: "musea in Amsterdam" → key: "count:m:amsterdam" +Result: MISS (subtype mismatch) ✅ +``` + +### Rule 2: Record Set Type Isolation + +Queries about **specific record types** should cache separately: + +``` +Query: "burgerlijke stand Utrecht" → key: "query:a:civil_registry:utrecht" +Cached: "archieven in Utrecht" → key: "list:a:utrecht" +Result: MISS (record set type mismatch) ✅ +``` + +### Rule 3: No Subtype-to-Type Fallback + +Generic queries must NOT match subtype cache entries either (a cached subset cannot answer a superset query): + +``` +Query: "musea in Amsterdam" → key: "count:m:amsterdam" +Cached: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam" +Result: MISS (don't return subset for superset query) ✅ +``` + +## Migration Notes + +1. **Backwards Compatible**: Existing cache entries without `institutionSubtype` continue to work +2. **Gradual Rollout**: New cache entries get subtype, old entries remain valid +3.
**Cache Clear**: Consider clearing cache after deployment to ensure consistency + +## Validation + +Run E2E tests to verify: + +```bash +cd apps/archief-assistent +npm run test:e2e +``` + +Key test cases: +- Geographic isolation (Amsterdam ≠ Rotterdam ≠ Noord-Holland) +- Subtype isolation (kunstmuseum ≠ museum) +- Record set isolation (burgerlijke stand ≠ archive) +- Intent isolation (count ≠ list ≠ info) + +## References + +- **Rule 41**: Types classes define SPARQL template variables +- **Rule 0b**: Type/Types file naming convention +- **CustodianType.yaml**: Base taxonomy definition +- **AGENTS.md**: GLAMORCUBESFIXPHDNT taxonomy documentation + +--- + +**Created**: 2026-01-10 +**Author**: OpenCode Agent +**Status**: Implemented (v2.0) + +## External References + +- Pavlyshyn, V. "Context Graphs and Data Traces: Building Epistemology Layers for Agentic Memory" +- Pavlyshyn, V. "The Shape of Knowledge: Topology Theory for Knowledge Graphs" +- Pavlyshyn, V. "Beyond Hierarchy: Why Agentic AI Systems Need Holarchies" +- Pavlyshyn, V. "Smalltalk: The Language That Changed Everything" +- Pavlyshyn, V. "Clarity Traders: Beyond Vibe Coding" diff --git a/.opencode/rules/engineering-parsimony-and-domain-modeling-rule.md b/.opencode/rules/engineering-parsimony-and-domain-modeling-rule.md new file mode 100644 index 0000000000..5ab6222aa8 --- /dev/null +++ b/.opencode/rules/engineering-parsimony-and-domain-modeling-rule.md @@ -0,0 +1,65 @@ +# Rule: Engineering Parsimony and Domain Modeling + +## Critical Convention + +Our ontology follows an engineering-oriented approach: practical domain utility and +stable interoperability take priority over minimal, tool-specific class catalogs. + +## Rule + +1. Model domain concepts, not implementation tools. + - Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`. + +2. Prefer generic, reusable activity/entity classes for operational provenance.
+ - Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`. + +3. Capture tool/vendor details in slot values, not class names. + - Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`. + +4. Digital platforms acting as custodians are valid domain classes. + - Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed. + - Data processing/search tools are not ontology class candidates. + +5. Avoid ontology growth driven by transient engineering stack choices. + - New class proposals must be justified by cross-tool, domain-stable semantics. + +## Rationale + +- Tool names are volatile implementation details and age quickly. +- Domain-level abstractions maximize reuse, query consistency, and mapping stability. +- This aligns with an engineering ontology practice where strict theoretical + parsimony in candidate theories is not the only optimization criterion; practical + semantic interoperability and maintainability are primary. + +## Examples + +### Wrong + +```yaml +classes: + ExaSearchMetadata: + class_uri: prov:Activity +``` + +### Correct + +```yaml +classes: + ExternalSearchMetadata: + class_uri: prov:Activity + slots: + - has_tool + - has_method + - has_agent +``` + +## References + +1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*. + Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`. + URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764 + +2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*. + Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`. 
+ URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA + diff --git a/.opencode/rules/entity-resolution-no-heuristics.md b/.opencode/rules/entity-resolution-no-heuristics.md index fb07751c03..c7f1971595 100644 --- a/.opencode/rules/entity-resolution-no-heuristics.md +++ b/.opencode/rules/entity-resolution-no-heuristics.md @@ -18,7 +18,7 @@ ## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫 -**DO NOT USE** automated scripts to enrich person profiles with web search data. The `enrich_person_comprehensive.py` script has been deprecated. +**DO NOT USE** automated scripts to enrich person profiles with web search data. **Why automated enrichment failed**: - Web searches return data about DIFFERENT people with similar names @@ -184,95 +184,12 @@ Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.* → Exception: If source explicitly links to living person with verifiable connection ``` -## Implementation in Enrichment Scripts - -```python -def validate_entity_match(profile: dict, search_result: dict) -> tuple[bool, str]: - """ - Validate that a search result refers to the same person as the profile. - - REQUIRES: At least 3 of 5 identity attributes must match. - Name match alone is INSUFFICIENT and automatically rejected.
- - Returns (is_valid, reason) - """ - profile_employer = profile.get('affiliations', [{}])[0].get('custodian_name', '').lower() - profile_location = profile.get('profile_data', {}).get('location', '').lower() - profile_role = profile.get('profile_data', {}).get('headline', '').lower() - - source_text = search_result.get('answer', '').lower() - source_url = search_result.get('source_url', '').lower() - - # AUTOMATIC REJECTION: Genealogy sources - genealogy_domains = ['geni.com', 'ancestry.', 'familysearch.', 'findagrave.', 'myheritage.'] - if any(domain in source_url for domain in genealogy_domains): - return False, "genealogy_source_rejected" - - # AUTOMATIC REJECTION: Profession conflicts - heritage_roles = ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection', 'heritage'] - entertainment_roles = ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete'] - - profile_is_heritage = any(role in profile_role for role in heritage_roles) - source_is_entertainment = any(role in source_text for role in entertainment_roles) - - if profile_is_heritage and source_is_entertainment: - return False, "conflicting_profession" - - # AUTOMATIC REJECTION: Location conflicts - if profile_location: - location_conflicts = [ - ('venezuela', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'), - ('caracas', 'london'), ('mexico city', 'amsterdam') - ] - for source_loc, profile_loc in location_conflicts: - if source_loc in source_text and profile_loc in profile_location: - return False, "conflicting_location" - - # Count positive identity attribute matches (need 3 of 5) - matches = 0 - match_details = [] - - # 1. Employer match - if profile_employer and profile_employer in source_text: - matches += 1 - match_details.append(f"employer:{profile_employer}") - - # 2. Location match - if profile_location and profile_location in source_text: - matches += 1 - match_details.append(f"location:{profile_location}") - - # 3. 
Role/profession match - if profile_role: - role_words = [w for w in profile_role.split() if len(w) > 4] - if any(word in source_text for word in role_words): - matches += 1 - match_details.append(f"role_match") - - # 4. Education/institution match (if available) - profile_education = profile.get('profile_data', {}).get('education', []) - if profile_education: - edu_names = [e.get('school', '').lower() for e in profile_education if e.get('school')] - if any(edu in source_text for edu in edu_names): - matches += 1 - match_details.append(f"education_match") - - # 5. Time period match (career dates) - # (implementation depends on available data) - - # REQUIRE 3 OF 5 MATCHES - if matches < 3: - return False, f"insufficient_identity_verification (only {matches}/5 attributes matched)" - - return True, f"verified ({matches}/5 matches: {', '.join(match_details)})" -``` - ## Claim Rejection Patterns -The following patterns should trigger automatic claim rejection: +The following inconsistent patterns should trigger automatic claim rejection: ```python -# Genealogy sources - ALWAYS REJECT +# Genealogy sources conflict - ALWAYS REJECT GENEALOGY_DOMAINS = [ 'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org', 'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org' @@ -293,7 +210,7 @@ LOCATION_PAIRS = [ ('caracas', 'london'), ('caracas', 'amsterdam'), ] -# Age impossibility - if birth year makes current career implausible, REJECT +# Age impossibility - if birth year makes current career implausible, REJECT.
For instance, for a Junior role: MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles ``` diff --git a/.opencode/rules/entity_resolution/disambiguation-entity-profiles.md b/.opencode/rules/entity_resolution/disambiguation-entity-profiles.md new file mode 100644 index 0000000000..9b1998b873 --- /dev/null +++ b/.opencode/rules/entity_resolution/disambiguation-entity-profiles.md @@ -0,0 +1,248 @@ +# Rule 47: Disambiguation Entity Profiles - Prevent Repeated Entity Resolution Errors + +## Status: CRITICAL + +## Summary + +When entity resolution determines that a web source describes a **different person** with a similar name, **create a PPID profile for that person** in `data/person/`. The PPID system is universal - ANY person who ever lived can have a profile, regardless of heritage relevance. + +--- + +## The Universal PPID Principle + +**In principle, all persons on Earth should be assigned PPIDs** - whether or not they are active in the heritage field. This includes: + +- Heritage workers (curators, archivists, librarians, etc.) +- Non-heritage professionals (actors, doctors, athletes, etc.) +- Historical persons (deceased individuals from any era) +- Public figures and private individuals + +The `heritage_relevance` field indicates whether someone works in the heritage sector, but does NOT determine whether they can have a profile. 
**Anyone can have a PPID.** + +--- + +## The Problem + +During entity resolution, we often discover that web search results describe a **different person** with a similar name: + +| Heritage Profile | Namesake Discovered | Why Different | +|------------------|---------------------|---------------| +| Carmen Juliá (UK curator) | Carmen Julia Álvarez (Venezuelan actress) | Different profession, location, timeline | +| Jan de Vries (Rijksmuseum curator) | Jan de Vries (footballer) | Different profession | +| Robert Ritter (heritage worker) | Robert Ritter (Nazi doctor, 1901-1951) | Different era, profession | + +Without creating a profile for the namesake, future enrichment attempts may: +1. Re-discover the same namesake +2. Waste time re-investigating +3. Risk attributing false claims again + +--- + +## The Solution: Create PPID Profiles for Namesakes + +When entity resolution proves two entities are different, **create a regular PPID profile for the namesake**: + +1. Use standard PPID naming convention (no special prefix) +2. Set `heritage_relevance.is_heritage_relevant: false` +3.
Document the disambiguation in BOTH profiles + +--- + +## Example: Venezuelan Actress Profile + +```json +{ + "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ", + "profile_data": { + "full_name": "Carmen Julia Álvarez", + "profession": "actress", + "nationality": "Venezuelan", + "birth_year": 1952, + "birth_location": "Caracas, Venezuela", + "active_period": "1970s-2000s" + }, + "heritage_relevance": { + "is_heritage_relevant": false, + "relevance_score": 0.0, + "reason": "Entertainment industry professional - actress in film and television" + }, + "disambiguation_notes": { + "commonly_confused_with": [ + { + "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA", + "name": "Carmen Juliá", + "profession": "curator", + "employer": "New Contemporaries", + "location": "UK", + "why_different": "Different profession (actress vs curator), different location (Venezuela vs UK), overlapping active periods in incompatible roles" + } + ], + "disambiguation_note": "This is the Venezuelan actress, NOT the UK-based art curator."
+ }, + "web_claims": [ + { + "claim_type": "birth_year", + "claim_value": 1952, + "provenance": { + "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez", + "retrieved_on": "2026-01-11T14:30:00Z", + "retrieval_agent": "manual-human-curator" + } + }, + { + "claim_type": "profession", + "claim_value": "actress", + "provenance": { + "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez", + "retrieved_on": "2026-01-11T14:30:00Z", + "retrieval_agent": "manual-human-curator" + } + } + ], + "extraction_metadata": { + "created_at": "2026-01-11T15:00:00Z", + "created_by": "manual-human-curator", + "creation_reason": "Created during entity resolution to distinguish from heritage worker Carmen Juliá" + } +} +``` + +--- + +## Update the Heritage Profile Too + +The heritage profile should also reference the disambiguation: + +```json +{ + "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA", + "profile_data": { + "full_name": "Carmen Juliá", + "headline": "Curator at New Contemporaries" + }, + "heritage_relevance": { + "is_heritage_relevant": true, + "relevance_score": 0.85 + }, + "disambiguation_notes": { + "known_namesakes": [ + { + "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ", + "name": "Carmen Julia Álvarez", + "profession": "actress", + "location": "Venezuela", + "why_not_same_person": "Different profession, location, timeline" + } + ], + "disambiguation_warning": "Web searches for 'Carmen Julia' return data about Venezuelan actress Carmen Julia Álvarez (born 1952). This is a DIFFERENT person." + } +} +``` + +--- + +## When to Create Namesake Profiles + +Create a PPID profile for a namesake when: + +1. **Entity resolution proves they are a different person** +2. **They are notable enough** to appear in search results repeatedly (Wikipedia, IMDB, news) +3.
**The confusion risk is high** (similar name, some overlapping attributes) + +**Do NOT create profiles for**: +- Random social media accounts with no notable presence +- Obvious mismatches unlikely to recur in searches + +--- + +## Benefits + +1. **Universal person database**: Any person can have a PPID +2. **Prevents repeated mistakes**: Future enrichment can check for known namesakes +3. **Bidirectional linking**: Both profiles reference each other +4. **Consistent data model**: No special file naming or profile types needed +5. **Audit trail**: Documents why profiles were created + +--- + +## Workflow + +### Step 1: During Entity Resolution + +When you reject a claim due to identity mismatch with a notable namesake: + +``` +1. Document WHY the source describes a different person +2. Check if the namesake is notable (Wikipedia, IMDB, frequent search results) +3. If notable → Create PPID profile for the namesake +4. Link both profiles via disambiguation_notes +``` + +### Step 2: Create Namesake Profile + +Use standard PPID naming: +``` +ID_{birth-location}_{birth-decade}_{current-location}_{death-decade}_{NAME}.json +``` + +Example: `ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ.json` + +### Step 3: Update Both Profiles + +- Namesake profile: Add `commonly_confused_with` pointing to heritage profile +- Heritage profile: Add `known_namesakes` pointing to namesake profile + +--- + +## Historical Persons + +Historical persons (deceased) can also have PPID profiles: + +```json +{ + "ppid": "ID_DE-XX-XXX_1901_DE-XX-XXX_1951_ROBERT-RITTER", + "profile_data": { + "full_name": "Robert Ritter", + "profession": "physician", + "birth_year": 1901, + "death_year": 1951, + "nationality": "German", + "historical_note": "Nazi-era physician involved in racial hygiene programs" + }, + "heritage_relevance": { + "is_heritage_relevant": false, + "relevance_score": 0.0 + }, + "disambiguation_notes": { + "commonly_confused_with": [ + { + "ppid": 
"ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_ROBERT-RITTER", + "name": "Robert Ritter", + "profession": "heritage worker", + "why_different": "Different era - historical figure (1901-1951) vs living heritage professional" + } + ] + } +} +``` + +--- + +## Related Rules + +- **Rule 46**: Entity Resolution - Names Are NEVER Sufficient +- **Rule 21**: Data Fabrication is Strictly Prohibited +- **Rule 26**: Person Data Provenance - Web Claims for Staff Information + +--- + +## Summary + +**The PPID system is universal.** When you discover during entity resolution that a web source describes a different person: + +1. **Create a regular PPID profile** for the namesake (actress, historical figure, etc.) +2. **Set `heritage_relevance.is_heritage_relevant: false`** (unless they happen to also work in heritage) +3. **Link both profiles** via `disambiguation_notes` +4. **Use standard PPID naming** - no special prefixes needed + +This builds a comprehensive person database while preventing entity resolution errors. diff --git a/.opencode/rules/entity_resolution/entity-resolution-no-heuristics.md b/.opencode/rules/entity_resolution/entity-resolution-no-heuristics.md new file mode 100644 index 0000000000..c7f1971595 --- /dev/null +++ b/.opencode/rules/entity_resolution/entity-resolution-no-heuristics.md @@ -0,0 +1,307 @@ +# Rule 46: Entity Resolution - Names Are NEVER Sufficient + +## Status: CRITICAL + +## 🚨 DATA QUALITY IS OF UTMOST IMPORTANCE 🚨 + +**Wrong data is worse than no data.** Attributing a birth year, spouse, or social media profile to the wrong person is a **critical data quality failure** that undermines the entire dataset's trustworthiness. + +**ALL enrichments MUST be done MANUALLY and double-checked.** Automated web search enrichment has been DISABLED due to catastrophic entity resolution failures (540+ false claims removed in Jan 2026).
+ +**The cost of false data**: +- Corrupts downstream analysis and reporting +- Creates legal/privacy risks (attributing data to wrong person) +- Destroys user trust in the dataset +- Requires expensive manual cleanup + +--- + +## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫 + +**DO NOT USE** automated scripts to enrich person profiles with web search data. + +**Why automated enrichment failed**: +- Web searches return data about DIFFERENT people with similar names +- Regex pattern matching cannot distinguish between namesakes +- Wikipedia, IMDB, ResearchGate, Instagram all returned data from wrong people +- Example: "Carmen Juliá" search returned Venezuelan actress, Mexican hydrogeologist, Spanish medievalist - NONE were the UK art curator + +**ONLY ALLOWED enrichment methods**: +1. **Manual research** - Human curator verifies source refers to the correct person +2. **Institutional sources** - Data from the person's employer website (verified) +3. **LinkedIn profile data** - Already verified via direct profile access +4. **ORCID/Wikidata** - If the person has a verified identifier + +--- + +## The Core Principle + +🚨 **SIMILAR OR IDENTICAL NAMES ARE NEVER SUFFICIENT FOR ENTITY RESOLUTION.** + +A web search result mentioning "Carmen Juliá born 1952" is **NOT** evidence that the Carmen Juliá in our person profile was born in 1952. Names are not unique identifiers - there are thousands of people with the same name worldwide.
+ +**Entity resolution requires verification of MULTIPLE independent identity attributes:** + +| Attribute | Purpose | Example | +|-----------|---------|---------| +| **Age/Birth Year** | Temporal consistency | Both sources describe someone in their 40s | +| **Career Path** | Professional identity | Both are art curators, not one curator and one actress | +| **Location** | Geographic consistency | Both are based in UK, not one UK and one Venezuela | +| **Employer** | Institutional affiliation | Both work at New Contemporaries | +| **Education** | Academic background | Same university or field | + +**Minimum Requirement**: At least **3 of 5** attributes must match before attributing ANY claim from a web source. Name match alone = **AUTOMATIC REJECTION**. + +## Problem Statement + +When enriching person profiles via web search (Linkup, Exa, etc.), search results often return data about **different people with similar or identical names**. Without proper entity resolution, the enrichment process can attribute false claims to the wrong person. + +**Example Failure** (Carmen Juliá - UK Art Curator): +- Source profile: Carmen Juliá, Curator at New Contemporaries (UK) +- Birth year extracted: 1952 from Carmen Julia **Álvarez** (Venezuelan actress) +- Spouse extracted: "actors Eduardo Serrano" from the Venezuelan actress +- ResearchGate: Carmen Julia **Navarro** (Mexican hydrogeologist) +- Academia.edu: Carmen Julia **Gutiérrez** (Spanish medieval studies) + +All data is from **different people** - none is the actual Carmen Juliá who is a UK-based art curator. + +**Why This Happened**: The enrichment script used regex pattern matching to extract "born 1952" without verifying that the Wikipedia article described the SAME person. + +## The Rule + +### DO NOT use name matching as the basis for entity resolution. EVER.
+ +For person enrichment via web search: + +**FORBIDDEN** (Name-based extraction): +- ❌ Extracting birth years from any search result mentioning "Carmen Julia born..." +- ❌ Attributing social media profiles just because the name appears +- ❌ Claiming relationships (spouse, parent, child) from web text pattern matching +- ❌ Assigning academic profiles (ResearchGate, Academia.edu, Google Scholar) based on name matching alone +- ❌ Using Wikipedia articles without verifying ALL identity attributes +- ❌ Trusting genealogy sites (Geni, Ancestry, MyHeritage) which describe historical namesakes +- ❌ Using IMDB for birth years (actors with same names) + +**REQUIRED** (Multi-Attribute Entity Resolution): +1. **Verify identity via MULTIPLE attributes** - name alone is INSUFFICIENT +2. **Cross-reference with known facts** (employer, location, job title from LinkedIn) +3. **Detect conflicting signals** - actress vs curator, Venezuela vs UK, 1950s birth vs active 2020s career +4. **Reject ambiguous matches** - if source doesn't clearly identify the same person, reject the claim +5.
**Document rejection rationale** - log why claim was rejected for audit trail + +## Entity Resolution Verification Checklist + +Before attributing a web claim to a person profile, verify MULTIPLE identity attributes: + +| # | Attribute | What to Check | Example Match | Example Conflict | +|---|-----------|---------------|---------------|------------------| +| 1 | **Career/Profession** | Same field/industry | Both are curators | Source says "actress", profile is curator | +| 2 | **Employer** | Same institution | Both at Rijksmuseum | Source says "film studio", profile is museum | +| 3 | **Location** | Same city/country | Both UK-based | Source says Venezuela, profile is UK | +| 4 | **Age Range** | Plausible for career | Birth 1980s, active 2020s | Birth 1952, still active in 2025 as junior | +| 5 | **Education** | Same university/field | Both art history | Source says "medical school" | + +**Minimum requirement**: At least **3 of 5** attributes must match. Name match alone = **AUTOMATIC REJECTION**. + +**Any conflicting signal = AUTOMATIC REJECTION** (e.g., source says "actress" when profile is "curator"). 
+ +## Sources with High Entity Resolution Risk + +These sources are NOT forbidden, but require **stricter verification thresholds** due to high false-positive rates: + +| Source Type | Risk Level | Why | Required Matches | +|-------------|------------|-----|------------------| +| Genealogy sites | CRITICAL | Historical persons with same name | 5/5 attributes (or explicit link to living person) | +| IMDB | CRITICAL | Actors with common names | 5/5 attributes (unless person works in film/TV) | +| Wikipedia | HIGH | Many people with same name have pages | 4/5 attributes match | +| Academic profiles | HIGH | Multiple researchers with same name | 4/5 attributes + institution match | +| Social media | HIGH | Many accounts with similar handles | 4/5 attributes + verify employer/location in bio | +| News articles | MEDIUM | May mention multiple people | 3/5 attributes + read full context | +| Institutional websites | LOW | Usually about their own staff | 2/5 attributes (good source if person works there) | + +**Key point**: High-risk sources CAN be used if you verify enough identity attributes. The risk level determines the verification threshold, not whether the source is allowed. + +## Red Flags Requiring Investigation + +The following are **red flags** that require careful investigation - NOT automatic rejection. People change careers and relocate. + +### Profession Differences +If source profession differs from profile profession, **investigate**: +``` +Source: "actress", "actor", "singer" +Profile: "curator", "archivist", "librarian" + +ASK: Did this person change careers? +- Check timeline: Did acting career END before heritage career BEGAN? 
+- Check for transition evidence: "former actress turned curator" +- If careers overlap in time → likely different people → REJECT +- If sequential careers with clear transition → may be same person → ACCEPT with documentation +``` + +### Location Differences +If source location differs from profile location, **investigate**: +``` +Source: "Venezuela", "Mexico", "Brazil" +Profile: "UK", "Netherlands", "France" + +ASK: Did this person relocate? +- Check timeline: When were they in each location? +- Check for migration evidence: education abroad, international career moves +- If locations overlap in time → likely different people → REJECT +- If sequential locations with clear move → may be same person → ACCEPT with documentation +``` + +### When to Actually REJECT + +Reject when investigation shows **no plausible connection**: +``` +Example: Carmen Julia Álvarez (Venezuelan actress, active 1970s-2000s) + vs Carmen Juliá (UK curator, active 2015-present) + +- Overlapping active periods in DIFFERENT professions on DIFFERENT continents +- No evidence of career change or relocation +- Birth year 1952 makes current junior curator role implausible +→ REJECT: These are clearly different people +``` + +### Age Conflicts (Still Automatic Rejection) +If source age is **physically implausible** for profile career stage, REJECT: +``` +Source: Born 1922, 1915, 1939 +Profile: Currently active professional in 2025 +→ REJECT (person would be 86-110 years old) + +Source: Born 2007, 2004 +Profile: Senior curator +→ REJECT (person would be 18-21, too young) +``` + +### Genealogy Source +Genealogy sources require **5 of 5 attribute matches** due to high false-positive rates: +``` +Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.* +→ REQUIRE 5/5 attribute matches (these often describe historical namesakes) +→ Exception: If source explicitly links to living person with verifiable connection +``` + +## Claim Rejection Patterns + +The
following inconsistent patterns should trigger automatic claim rejection: + +```python +# Genealogy sources conflict - ALWAYS REJECT +GENEALOGY_DOMAINS = [ + 'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org', + 'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org' +] + +# Profession conflicts - if profile has one and source has another, REJECT +PROFESSION_CONFLICTS = { + 'heritage': ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection manager'], + 'entertainment': ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete'], + 'medical': ['doctor', 'nurse', 'surgeon', 'physician'], + 'tech': ['software engineer', 'developer', 'programmer'], +} + +# Location conflicts - if source describes person in location X and profile is location Y, REJECT +LOCATION_PAIRS = [ + ('venezuela', 'uk'), ('venezuela', 'netherlands'), ('venezuela', 'germany'), + ('mexico', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'), + ('caracas', 'london'), ('caracas', 'amsterdam'), +] + +# Age impossibility - if birth year makes current career implausible, REJECT.
For instance, for a Junior role: +MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify +MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles +``` + +## Handling Rejected Claims + +When a claim fails entity resolution: + +```json +{ + "claim_type": "birth_year", + "claim_value": 1952, + "entity_resolution": { + "status": "REJECTED", + "reason": "conflicting_profession", + "details": "Source describes Venezuelan actress, profile is UK curator", + "source_identity": "Carmen Julia Álvarez (Venezuelan actress)", + "profile_identity": "Carmen Juliá (UK art curator)", + "rejected_at": "2026-01-11T15:00:00Z", + "rejected_by": "entity_resolution_validator_v1" + } +} +``` + +## Special Cases + +### Common Names + +For very common names (e.g., "John Smith", "Maria García", "Jan de Vries"), require **4 of 5** verification checks instead of 3. The more common the name, the higher the threshold. + +| Name Commonality | Required Matches | +|------------------|------------------| +| Unique name (e.g., "Xander Vermeulen-Oosterhuis") | 2 of 5 | +| Moderately common (e.g., "Carmen Juliá") | 3 of 5 | +| Very common (e.g., "Jan de Vries") | 4 of 5 | +| Extremely common (e.g., "John Smith") | 5 of 5 or reject | + +### Abbreviated Names + +For profiles with abbreviated names (e.g., "J.
Smith"), entity resolution is inherently uncertain: +- Set `entity_resolution_confidence: "very_low"` +- Require **human review** for all claims +- Do NOT attribute web claims automatically + +### Historical Persons + +When sources describe historical/deceased persons: +- Check if death date conflicts with profile activity (living person active in 2025) +- **ALWAYS REJECT** genealogy site data +- Reject any source describing events before 1950 unless profile is known to be historical + +### Wikipedia Articles + +Wikipedia is particularly dangerous because: +- Many people with the same name have articles +- Search engines return Wikipedia first +- The Wikipedia Carmen Julia Álvarez article describes a Venezuelan actress born 1952 +- This is a DIFFERENT PERSON from Carmen Juliá the UK curator + +**For Wikipedia sources**: +1. Read the FULL article, not just snippets +2. Verify the Wikipedia subject's profession matches the profile +3. Verify the Wikipedia subject's location matches the profile +4.
If ANY conflict detected → REJECT + +## Audit Trail + +All entity resolution decisions must be logged: + +```json +{ + "enrichment_history": [ + { + "enrichment_timestamp": "2026-01-11T15:00:00Z", + "enrichment_agent": "enrich_person_comprehensive.py v1.4.0", + "entity_resolution_decisions": [ + { + "source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez", + "decision": "REJECTED", + "reason": "Different person - Venezuelan actress, not UK curator" + } + ], + "claims_rejected_count": 5, + "claims_accepted_count": 1 + } + ] +} +``` + +## See Also + +- Rule 21: Data Fabrication is Strictly Prohibited +- Rule 26: Person Data Provenance - Web Claims for Staff Information +- Rule 45: Inferred Data Must Be Explicit with Provenance diff --git a/.opencode/rules/entity_resolution/inferred-data-explicit-provenance-rule.md b/.opencode/rules/entity_resolution/inferred-data-explicit-provenance-rule.md new file mode 100644 index 0000000000..262a5bb03c --- /dev/null +++ b/.opencode/rules/entity_resolution/inferred-data-explicit-provenance-rule.md @@ -0,0 +1,422 @@ +# Rule 45: Inferred Data Must Be Explicit with Provenance + +**Status**: Active +**Created**: 2025-01-09 +**Applies to**: PPID enrichment, person entity profiles, any data inference + +## Core Principle + +**All inferred data MUST be stored in explicit `inferred_*` fields with full provenance statements. Inferred values MUST NEVER silently replace or merge with verified data.** + +This ensures: +1. **Transparency**: Users can distinguish verified facts from heuristic estimates +2. **Auditability**: The inference method and source observations are traceable +3. **Reversibility**: Inferred data can be corrected when verified data becomes available +4.
**Quality Signals**: Confidence levels and argument chains are preserved + +## Required Structure for Inferred Data + +Every inferred claim MUST include: + +```yaml +inferred_[field_name]: + value: "the inferred value" + edtf: "196X" # For dates: EDTF notation + formatted: "NL-UT-UTR" # For locations: CC-RR-PPP format + confidence: "low|medium|high" + inference_provenance: + method: "heuristic_name" + inference_chain: + - step: 1 + observation: "University start year 1986" + source_field: "profile_data.education[0].date_range" + source_value: "1986 - 1990" + - step: 2 + assumption: "University entry at age 18" + rationale: "Standard Dutch university entry age" + - step: 3 + calculation: "1986 - 18 = 1968" + result: "Estimated birth year 1968" + - step: 4 + generalization: "Round to decade โ†’ 196X" + rationale: "EDTF decade notation for uncertain years" + inferred_at: "2025-01-09T18:00:00Z" + inferred_by: "enrich_ppids.py" +``` + +## Explicit Inferred Fields + +### For Person Profiles (PPID) + +| Inferred Field | Source Observations | Heuristic | +|----------------|---------------------|-----------| +| `inferred_birth_year` | Earliest education/job dates | Entry age assumptions | +| `inferred_birth_decade` | Birth year estimate | EDTF decade notation | +| `inferred_birth_settlement` | School/university location | Residential proximity | +| `inferred_birth_region` | Settlement location | GeoNames admin1 | +| `inferred_birth_country` | Settlement location | GeoNames country | +| `inferred_current_settlement` | Profile location, current job | Direct extraction | +| `inferred_current_region` | Settlement location | GeoNames admin1 | +| `inferred_current_country` | Settlement location | GeoNames country | + +### Example: Complete Inferred Birth Data + +```json +{ + "ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN", + + "birth_date": { + "edtf": "XXXX", + "precision": "unknown", + "note": "See inferred_birth_decade for heuristic estimate" + }, + + 
"inferred_birth_decade": { + "value": "196X", + "edtf": "196X", + "precision": "decade", + "confidence": "low", + "inference_provenance": { + "method": "earliest_education_heuristic", + "inference_chain": [ + { + "step": 1, + "observation": "University education record found", + "source_field": "profile_data.education[0]", + "source_value": { + "institution": "Universiteit Utrecht", + "degree": "Social & Organisational psychology, doctoraal", + "date_range": "1986 - 1990" + } + }, + { + "step": 2, + "extraction": "Start year extracted from date_range", + "extracted_value": 1986 + }, + { + "step": 3, + "assumption": "University entry age", + "assumed_value": 18, + "rationale": "Standard Dutch university entry age (post-VWO)", + "confidence_impact": "Assumption reduces confidence; actual age 17-20 possible" + }, + { + "step": 4, + "calculation": "1986 - 18 = 1968", + "result": "Estimated birth year: 1968" + }, + { + "step": 5, + "generalization": "Convert to EDTF decade", + "input": 1968, + "output": "196X", + "rationale": "Decade precision appropriate for heuristic estimate" + } + ], + "inferred_at": "2025-01-09T18:00:00Z", + "inferred_by": "enrich_ppids.py" + } + }, + + "inferred_birth_settlement": { + "value": "Utrecht", + "formatted": "NL-UT-UTR", + "confidence": "low", + "inference_provenance": { + "method": "earliest_education_location", + "inference_chain": [ + { + "step": 1, + "observation": "Earliest education institution identified", + "source_field": "profile_data.education[0].institution", + "source_value": "Universiteit Utrecht" + }, + { + "step": 2, + "lookup": "Institution location mapping", + "mapping_key": "Universiteit Utrecht", + "mapping_value": "Utrecht, Netherlands" + }, + { + "step": 3, + "geocoding": "GeoNames resolution", + "query": "Utrecht", + "country_code": "NL", + "result": { + "geonames_id": 2745912, + "name": "Utrecht", + "admin1_code": "09", + "admin1_name": "Utrecht" + } + }, + { + "step": 4, + "formatting": "CC-RR-PPP generation", + 
"country_code": "NL", + "region_code": "UT", + "settlement_code": "UTR", + "result": "NL-UT-UTR" + } + ], + "assumption_note": "University location used as proxy for birth location; student may have relocated for education", + "inferred_at": "2025-01-09T18:00:00Z", + "inferred_by": "enrich_ppids.py" + } + } +} +``` + +## List-Valued Inferred Data (EDTF Set Notation) + +When inference yields multiple plausible values (e.g., an estimated birth year of 1968 could fall in either the 1960s or the 1970s), store them as a **list** with EDTF set notation. + +### EDTF Set Notation Standards + +| Notation | Meaning | Use Case | +|----------|---------|----------| +| `[196X,197X]` | One of these values | Estimated birth year near a decade boundary (uncertainty spans two decades) | +| `{196X,197X}` | All of these values | NOT for birth decade (use `[...]`) | +| `[1965..1970]` | One year from a range | Birth year between 1965 and 1970 | + +### When to Use List Values + +1. **Decade Boundary Cases**: Estimated birth year is within 3 years of a decade boundary + - Estimated 1968 → `[196X,197X]` (could be late 1960s or early 1970s, given the variance in the entry-age assumption) + - Estimated 1972 → `[196X,197X]` (same logic) + - Estimated 1975 → `197X` (safely mid-decade) + +2. 
**Multiple Plausible Locations**: Person studied in different cities + - `["NL-UT-UTR", "NL-NH-AMS"]` with provenance explaining each candidate + +### Example: List-Valued Birth Decade + +```json +{ + "inferred_birth_decade": { + "values": ["196X", "197X"], + "edtf": "[196X,197X]", + "edtf_meaning": "one of: 1960s or 1970s", + "precision": "decade_set", + "confidence": "very_low", + "primary_value": "196X", + "primary_rationale": "Point estimate 1968 falls within the 1960s", + "inference_provenance": { + "method": "earliest_observation_heuristic", + "inference_chain": [ + { + "step": 1, + "observation": "University start 1986", + "source_field": "profile_data.education[0].date_range" + }, + { + "step": 2, + "assumption": "University entry at age 18 (±3 years)", + "rationale": "Dutch university entry typically 17-21" + }, + { + "step": 3, + "calculation": "1986 - 18 = 1968 (range: 1965-1971)", + "result": "Birth year estimate: 1968 with variance 1965-1971" + }, + { + "step": 4, + "generalization": "Birth year range spans decade boundary", + "input_range": [1965, 1971], + "output": ["196X", "197X"], + "rationale": "Cannot determine which decade without additional evidence" + } + ], + "inferred_at": "2025-01-09T18:00:00Z", + "inferred_by": "enrich_ppids.py" + } + } +} +``` + +### PPID Generation with List Values + +When `inferred_birth_decade` is a list, use `primary_value` for PPID: + +```json +{ + "ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN", + "ppid_components": { + "first_date": "196X", + "first_date_source": "inferred_birth_decade.primary_value", + "first_date_alternatives": ["197X"] + } +} +``` + +### Example: List-Valued Location + +```json +{ + "inferred_birth_settlement": { + "values": [ + {"settlement": "Utrecht", "formatted": "NL-UT-UTR"}, + {"settlement": "Amsterdam", "formatted": "NL-NH-AMS"} + ], + "primary_value": "NL-UT-UTR", + "primary_rationale": "Earlier education (1986) in Utrecht; later education (UvA, 1990) in Amsterdam", + "confidence": 
"very_low", + "inference_provenance": { + "method": "education_locations", + "inference_chain": [ + { + "step": 1, + "observation": "Multiple education institutions found", + "source_field": "profile_data.education", + "candidates": ["Universiteit Utrecht (1986)", "UvA (1990)"] + }, + { + "step": 2, + "assumption": "Earlier education more likely near birth location", + "rationale": "Students often attend local university first" + } + ] + } + } +} +``` + +## Confidence Levels + +| Level | Criteria | Example | +|-------|----------|---------| +| **high** | Direct extraction from authoritative source | Profile states "Born in Amsterdam" | +| **medium** | Single-step inference with reliable source | Current job location from employment record | +| **low** | Multi-step heuristic with assumptions | Birth year from university start date | +| **very_low** | Speculative, multiple assumptions, or list-valued | Birth location from first observed location, or decade spanning boundary | + +## Anti-Patterns (FORBIDDEN) + +### ❌ Silent Replacement +```json +{ + "birth_date": { + "edtf": "196X", + "precision": "decade" + } +} +``` +**Problem**: No indication this is inferred, no provenance, no confidence level. + +### ❌ Hidden in Metadata +```json +{ + "birth_date": { + "edtf": "196X" + }, + "enrichment_metadata": { + "birth_date_inferred": true + } +} +``` +**Problem**: Inference metadata separated from the value; easy to miss. + +### ❌ Missing Inference Chain +```json +{ + "inferred_birth_decade": { + "value": "196X", + "method": "heuristic" + } +} +``` +**Problem**: No explanation of HOW the value was derived; not auditable.
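The auditable counterpart to this anti-pattern is a record that is built step by step, with each inference step written down. The Python sketch below illustrates that pattern under stated assumptions. It is not the `enrich_ppids.py` script referenced in the examples. It assumes an entry age of 18 and the three-year decade-boundary margin from *When to Use List Values*. The names `decade_edtf` and `infer_birth_decade` are hypothetical. Following the Confidence Levels table, a list-valued result is marked `very_low`.

```python
from datetime import datetime, timezone

ASSUMED_ENTRY_AGE = 18  # assumption: typical Dutch university entry age
BOUNDARY_MARGIN = 3     # "within 3 years of a decade boundary"


def decade_edtf(year: int) -> str:
    """Return the EDTF decade for a year, e.g. 1968 -> '196X'."""
    return f"{year // 10}X"


def infer_birth_decade(start_year: int, source_field: str, inferred_by: str) -> dict:
    """Build an inferred_birth_decade record with an explicit inference chain."""
    estimate = start_year - ASSUMED_ENTRY_AGE
    primary = decade_edtf(estimate)
    offset = estimate % 10  # position of the estimate within its decade (0-9)

    values = [primary]
    if offset >= 10 - BOUNDARY_MARGIN:   # e.g. 1967-1969: next decade plausible
        values.append(decade_edtf(estimate + 10))
    elif offset <= BOUNDARY_MARGIN:      # e.g. 1970-1973: previous decade plausible
        values.insert(0, decade_edtf(estimate - 10))

    chain = [
        {"step": 1, "observation": "Education start year found",
         "source_field": source_field, "source_value": start_year},
        {"step": 2, "assumption": "University entry age",
         "assumed_value": ASSUMED_ENTRY_AGE},
        {"step": 3, "calculation": f"{start_year} - {ASSUMED_ENTRY_AGE} = {estimate}",
         "result": f"Estimated birth year: {estimate}"},
        {"step": 4, "generalization": "Convert to EDTF decade",
         "input": estimate, "output": values if len(values) > 1 else primary},
    ]

    if len(values) == 1:
        record = {"value": primary, "edtf": primary,
                  "precision": "decade", "confidence": "low"}
    else:
        record = {"values": values, "edtf": "[" + ",".join(values) + "]",
                  "precision": "decade_set", "confidence": "very_low",
                  "primary_value": primary}

    record["inference_provenance"] = {
        "method": "earliest_education_heuristic",
        "inference_chain": chain,
        "inferred_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "inferred_by": inferred_by,
    }
    return record


record = infer_birth_decade(1986, "profile_data.education[0].date_range", "enrich_ppids.py")
print(record["edtf"], record["primary_value"])  # [196X,197X] 196X
```

With the 1986 start year from the examples, this yields the list-valued case. A 1993 start year (estimate 1975) yields the single value `197X` with `low` confidence.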
+ + ## Correct Pattern ✅ + +```json +{ + "birth_date": { + "edtf": "XXXX", + "precision": "unknown", + "note": "See inferred_birth_decade" + }, + "inferred_birth_decade": { + "value": "196X", + "edtf": "196X", + "confidence": "low", + "inference_provenance": { + "method": "earliest_education_heuristic", + "inference_chain": [ + {"step": 1, "observation": "...", "source_field": "...", "source_value": "..."}, + {"step": 2, "assumption": "...", "rationale": "..."}, + {"step": 3, "calculation": "...", "result": "..."} + ], + "inferred_at": "2025-01-09T18:00:00Z", + "inferred_by": "enrich_ppids.py" + } + } +} +``` + +## PPID Component Handling + +When inferred values are used in PPID components: + +```json +{ + "ppid": "ID_NL-UT-UTR_196X_NL-NH-AMS_XXXX_AART-HARTEN", + "ppid_components": { + "type": "ID", + "first_location": "NL-UT-UTR", + "first_location_source": "inferred_birth_settlement", + "first_date": "196X", + "first_date_source": "inferred_birth_decade", + "last_location": "NL-NH-AMS", + "last_location_source": "inferred_current_settlement", + "last_date": "XXXX", + "name_tokens": ["AART", "HARTEN"] + } +} +``` + +The `*_source` fields document which inferred field was used for PPID generation. + +## Upgrade Path: Inferred → Verified + +When verified data becomes available: + +1. **Keep inferred data** in `inferred_*` fields for audit trail +2. **Add verified data** to canonical fields +3. 
**Mark inferred as superseded**: + +```json +{ + "birth_date": { + "edtf": "1967-03-15", + "precision": "day", + "verified": true, + "source": "official_record" + }, + "inferred_birth_decade": { + "value": "196X", + "superseded": true, + "superseded_by": "birth_date", + "superseded_at": "2025-01-15T10:00:00Z", + "accuracy_assessment": "Inferred decade was correct (1960s), actual year 1967" + } +} +``` + +## Implementation Checklist + +For any enrichment script: + +- [ ] Create explicit `inferred_*` fields for ALL inferred data +- [ ] Include `inference_provenance` with complete `inference_chain` +- [ ] Record each step: observation → assumption → calculation → result +- [ ] Set appropriate `confidence` level +- [ ] Add `*_source` references in PPID components +- [ ] Preserve original unknown values (`XXXX`, `XX-XX-XXX`) +- [ ] Add `note` in canonical fields pointing to inferred alternatives + +## Related Rules + +- **Rule 44**: PPID Birth Date Enrichment and EDTF Unknown Date Notation +- **Rule 35**: Provenance Statements MUST Have Dual Timestamps +- **Rule 6**: WebObservation Claims MUST Have XPath Provenance diff --git a/.opencode/rules/entity_resolution/kien-authoritative-source-rule.md b/.opencode/rules/entity_resolution/kien-authoritative-source-rule.md new file mode 100644 index 0000000000..55460bb372 --- /dev/null +++ b/.opencode/rules/entity_resolution/kien-authoritative-source-rule.md @@ -0,0 +1,251 @@ +# Rule 40: KIEN Registry is Authoritative for Intangible Heritage Custodians + +## Summary + +For Intangible Heritage Custodians (Type I), the KIEN registry at `https://www.immaterieelerfgoed.nl/` is the **TIER_1_AUTHORITATIVE** source for contact data and addresses. Google Maps enrichment is **TIER_3_CROWD_SOURCED** and should NEVER override KIEN data.
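The precedence stated in this summary can be expressed as a field-level resolver. This is an illustrative Python sketch, not an existing script. The source keys and their ranking are assumptions that follow the Data Tier Hierarchy table in this rule, and `resolve_field` is a hypothetical helper. A lower-ranked source is consulted only when every higher-ranked source lacks the field. Google Maps can therefore fill gaps but never override KIEN, and any value it supplies still needs the validation described in Step 2.

```python
from typing import Dict, Optional, Tuple

# Lower rank = higher priority (assumed ranking, per the Data Tier Hierarchy table)
SOURCE_RANK = {
    "kien": 1,              # TIER_1_AUTHORITATIVE
    "official_website": 2,  # TIER_2_VERIFIED
    "wikidata": 3,          # TIER_3_CROWD_SOURCED
    "google_maps": 4,       # TIER_3_CROWD_SOURCED, lowest trust
}


def resolve_field(field: str, sources: Dict[str, dict]) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) from the highest-priority source that has the field."""
    # Unknown source keys are ranked after every known source.
    for source in sorted(sources, key=lambda s: SOURCE_RANK.get(s, len(SOURCE_RANK) + 1)):
        value = sources[source].get(field)
        if value:  # skip missing or empty values
            return value, source
    return None, None


sources = {
    "google_maps": {"street": "Burgemeester Backxlaan 321", "city": "Nieuwleusen"},
    "kien": {"street": "De Hazelaar 41", "city": "Zevenaar"},
}
print(resolve_field("city", sources))  # ('Zevenaar', 'kien')
```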
+ + ## Empirical Validation (January 2025) + +A comprehensive audit of 188 Type I custodian files revealed: + +| Category | Count | Percentage | +|----------|-------|------------| +| ✅ Google Maps matches OK | 101 | 53.7% | +| 🔧 **FALSE_MATCH detected** | **62** | **33.0%** | +| ⚠️ No official website (valid) | 20 | 10.6% | +| 📭 No Google Maps data | 5 | 2.7% | + +**Key Finding: in 62 of the 188 Type I custodian files (33%), the Google Maps enrichment matched the wrong entity.** + +### False Match Categories Identified + +1. **Domain mismatches** (39 files): Google Maps website ≠ KIEN official website +2. **Name mismatches** (8 files): Completely different organizations (e.g., "Ria Bos" heritage practitioner → "Ria Money Transfer Agent") +3. **Wrong location** (6 files): Similar name but different city (Amsterdam→Den Haag, Netherlands→Suriname!) +4. **Wrong organization type** (5 files): Federation vs specific member, heritage org vs webshop +5. **Different entity type** (3 files): Organization vs location/street name +6. **Different event** (3 files): Horse racing vs festival, different village's event + +### Why Google Maps Fails for Type I + +Google Maps is optimized for commercial businesses with physical storefronts. Type I intangible heritage custodians are fundamentally different: + +- **Virtual organizations** without commercial presence +- **Person-based heritage** (individual practitioners preserving traditional crafts) +- **Volunteer networks** meeting in private residences +- **Event-based organizations** that exist only during festivals +- **Federations** that coordinate member organizations without premises of their own + +## Rationale + +Google Maps frequently returns **false matches** for intangible heritage organizations because: + +1. **Virtual Organizations**: Many intangible heritage custodians operate as networks/platforms without commercial storefronts +2. **Name Collisions**: Common words in organization names (e.g., "Platform") match unrelated businesses +3. 
**No Physical Presence**: Organizations focused on intangible heritage (handwriting, oral traditions, crafts) often have no Google Maps listing +4. **Volunteer-Run**: Contact addresses are often private residences, not businesses + +KIEN (Kenniscentrum Immaterieel Erfgoed Nederland) is the official Dutch registry for intangible cultural heritage and maintains verified contact information directly from the organizations. + +## Data Tier Hierarchy for Type I Custodians + +| Priority | Source | Data Tier | Trust Level | +|----------|--------|-----------|-------------| +| 1st | KIEN Registry (`immaterieelerfgoed.nl`) | TIER_1_AUTHORITATIVE | Highest | +| 2nd | Organization's Official Website | TIER_2_VERIFIED | High | +| 3rd | Wikidata | TIER_3_CROWD_SOURCED | Medium | +| 4th | Google Maps | TIER_3_CROWD_SOURCED | Low (verify!) | + +## Required Workflow for Type I Enrichment + +### Step 1: Scrape KIEN Page First + +For every intangible heritage custodian, the KIEN profile page MUST be scraped to extract: + +```yaml +kien_enrichment: + kien_name: "Platform Handschriftontwikkeling" + kien_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling" + heritage_page_url: "https://www.immaterieelerfgoed.nl/nl/handschrift" + heritage_forms: + - "Ambachten, handwerk en techniek" + - "Sociale praktijken" + address: + street: "De Hazelaar 41" + postal_code: "6903 BB" + city: "Zevenaar" + province: "Gelderland" + country: "NL" + website: "http://www.handschriftontwikkeling.nl/" + registered_since: "2019-11" + enrichment_timestamp: "2025-01-08T00:00:00Z" + source: "https://www.immaterieelerfgoed.nl" +``` + +### Step 2: Validate Google Maps Match (If Any) + +If Google Maps enrichment exists, compare it against the KIEN data (the domain check needs the KIEN `website` field): + +```python +from urllib.parse import urlparse + +from rapidfuzz import fuzz # assumed fuzzy-matching library; thefuzz offers the same fuzz.ratio + + +def extract_domain(url): + """Return the lower-cased host of a URL without a leading 'www.', or None.""" + if not url: + return None + netloc = urlparse(url if '://' in url else 'http://' + url).netloc.lower() + return netloc[4:] if netloc.startswith('www.') else netloc + + +def validate_google_maps_match(kien_data, gmaps_data): + """Check if Google Maps data matches KIEN authoritative source.""" + + # Check website domain match + kien_domain = extract_domain(kien_data.get('website')) + gmaps_domain = extract_domain(gmaps_data.get('website')) + + if 
kien_domain and gmaps_domain and kien_domain != gmaps_domain: + return { + 'status': 'FALSE_MATCH', + 'reason': f'Website mismatch: KIEN={kien_domain}, GMaps={gmaps_domain}' + } + + # Check name similarity + kien_name = kien_data.get('kien_name', '').lower() + gmaps_name = gmaps_data.get('name', '').lower() + + if fuzz.ratio(kien_name, gmaps_name) < 70: + return { + 'status': 'FALSE_MATCH', + 'reason': f'Name mismatch: KIEN="{kien_name}", GMaps="{gmaps_name}"' + } + + return {'status': 'VERIFIED'} +``` + +### Step 3: Mark False Matches + +When Google Maps returns a different organization: + +```yaml +google_maps_enrichment: + status: FALSE_MATCH + false_match_reason: >- + Google Maps returned "Platform 9 BV" (a health/coaching business at + Nieuwleusen) instead of "Platform Handschriftontwikkeling" (a virtual + handwriting development platform). These are completely different + organizations. KIEN registry is authoritative for this Type I custodian. + original_false_match: + place_id: ChIJNZ6o7H_fx0cR-TURAN3Bj54 + name: Platform 9 BV + formatted_address: Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen + website: http://www.platform9.nl/ + correction_timestamp: "2025-01-08T00:00:00Z" + correction_agent: opencode-claude-sonnet-4 +``` + +## KIEN Contact Data Extraction + +The KIEN heritage pages follow a consistent structure. 
Extract from the "Contact" section: + +``` +## Contact +[Organization Name](link-to-profile-page) +Street Address +Postal Code +City +Province +[Website](url) +Bijgeschreven in inventaris vanaf: [date] +``` + +### Example Extraction (from immaterieelerfgoed.nl/nl/handschrift): + +```yaml +contact: + organization: "Platform Handschriftontwikkeling" + profile_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling" + address: + street: "De Hazelaar 41" + postal_code: "6903 BB" + city: "Zevenaar" + province: "Gelderland" + website: "http://www.handschriftontwikkeling.nl/" + registered_since: "november 2019" +``` + +## Location Resolution for Type I + +When KIEN provides an address: + +1. **Use KIEN address** for `location.formatted_address` +2. **Geocode KIEN address** to get coordinates (NOT Google Maps coordinates) +3. **Update location_resolution** with method `KIEN_ADDRESS_GEOCODE` + +```yaml +location: + street_address: "De Hazelaar 41" + postal_code: "6903 BB" + city: Zevenaar + region_code: GE + country: NL + coordinate_provenance: + source_type: KIEN_ADDRESS_GEOCODE + source_url: "https://www.immaterieelerfgoed.nl/nl/handschrift" + geocoding_service: nominatim + geocoding_timestamp: "2025-01-08T00:00:00Z" +``` + +## Batch Re-Enrichment Script + +To fix all Type I custodians with potentially incorrect Google Maps data: + +```bash +# Find all Type I custodians +python scripts/rescrape_kien_contacts.py --type I --output data/custodian/ + +# This script should: +# 1. Read all NL-*-I-*.yaml files +# 2. Fetch KIEN page for each (from kien_enrichment.kien_url) +# 3. Extract contact/address from KIEN +# 4. Compare with google_maps_enrichment +# 5. Mark mismatches as FALSE_MATCH +# 6. 
Update location with KIEN address +``` + +## Anti-Patterns + +### WRONG - Using Google Maps as primary source for Type I: + +```yaml +# WRONG - Google Maps overriding KIEN data +location: + formatted_address: "Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen" + coordinate_provenance: + source_type: GOOGLE_MAPS # WRONG for Type I! +``` + +### CORRECT - KIEN as primary source: + +```yaml +# CORRECT - KIEN is authoritative +location: + street_address: "De Hazelaar 41" + postal_code: "6903 BB" + city: Zevenaar + coordinate_provenance: + source_type: KIEN_ADDRESS_GEOCODE # Correct! +``` + +## Affected Files + +This rule affects approximately 100+ Type I custodian files: +- `data/custodian/NL-*-I-*.yaml` + +All should be reviewed to ensure: +1. `kien_enrichment` contains address from KIEN page +2. `google_maps_enrichment` is validated against KIEN +3. `location` uses KIEN address (not Google Maps) +4. False matches are properly documented + +## Related Rules + +- **Rule 5**: NEVER Delete Enriched Data - Keep false match data in `original_false_match` +- **Rule 6**: WebObservation Claims - KIEN data should have provenance +- **Rule 22**: Custodian YAML Files Are Single Source of Truth +- **Rule 35**: Provenance Timestamps - Include KIEN fetch timestamps + +## See Also + +- KIEN Registry: https://www.immaterieelerfgoed.nl/ +- UNESCO Intangible Cultural Heritage: https://ich.unesco.org/ +- Dutch Intangible Heritage Network documentation diff --git a/.opencode/rules/entity_resolution/ppid-birth-date-enrichment-rule.md b/.opencode/rules/entity_resolution/ppid-birth-date-enrichment-rule.md new file mode 100644 index 0000000000..022e0e6fcd --- /dev/null +++ b/.opencode/rules/entity_resolution/ppid-birth-date-enrichment-rule.md @@ -0,0 +1,351 @@ +# Rule 44: PPID Birth Date Enrichment and Unknown Date Notation + +**Version**: 1.0.0 +**Created**: 2025-01-09 +**Status**: ACTIVE +**Related**: [PPID-GHCID Alignment](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md) | [EDTF 
Specification](https://www.loc.gov/standards/datetime/) + +--- + +## 1. Summary + +When birth/death dates are missing from person entity sources, agents MUST: + +1. **Search for dates** using Exa Search and Linkup tools +2. **Record all enrichment data** as web claims with provenance +3. **If not found**, use **EDTF-compliant notation** for estimated/unknown dates +4. **Never fabricate** specific dates without source evidence + +--- + +## 2. Enrichment Workflow + +### 2.1 Required Search Before Using Unknown Notation + +Before marking a date as unknown, agents MUST attempt enrichment: + +``` +Person Entity (missing birth_date) + ↓ +1. Search Exa: "{full_name} born birth date" + ↓ +2. Search Exa: "{full_name} {known_employer}" + ↓ +3. Search Linkup: "{full_name} biography" + ↓ +4. If found → Record as web_claim with provenance + ↓ +5. If NOT found → Use EDTF unknown notation + ↓ +6. Record enrichment_attempt in metadata +``` + +### 2.2 Enrichment Search Requirements + +| Search Tool | Query Pattern | When to Use | +|-------------|---------------|-------------| +| `exa_web_search_exa` | `"{name}" born birthday birth date year` | Primary search | +| `exa_linkedin_search_exa` | `"{name}" at "{employer}"` | For work context | +| `linkup_linkup-search` | `"{name}" biography personal` | Deep research | + +### 2.3 Recording Successful Enrichment + +When birth date is found, record as web claim: + +```yaml +web_claims: + - claim_type: birth_date + claim_value: "1985-03-15" + source_url: "https://example.org/person/bio" + retrieved_on: "2025-01-09T14:30:00Z" + retrieval_agent: "opencode-claude-sonnet-4" + confidence_score: 0.85 + notes: "Found in biography section" +``` + +### 2.4 Recording Failed Enrichment Attempts + +Always record that enrichment was attempted: + +```yaml +enrichment_metadata: + birth_date_search: + attempted: true + search_date: "2025-01-09T14:30:00Z" + search_agent: "opencode-claude-sonnet-4" + search_tools_used: + - exa_web_search_exa + 
- linkup_linkup-search + queries_tried: + - '"Jan van Berg" born birthday' + - '"Jan van Berg" biography' + result: "NOT_FOUND" + notes: "No publicly available birth date found after comprehensive search" +``` + +--- + +## 3. EDTF-Compliant Unknown Date Notation + +### 3.1 Standard: Extended Date/Time Format (EDTF) + +This project follows the **Library of Congress EDTF Specification** (ISO 8601-2:2019) for representing uncertain, approximate, and unspecified dates. + +**Key EDTF Characters**: + +| Character | Meaning | EDTF Level | Example | +|-----------|---------|------------|---------| +| `X` | Unspecified digit | Level 1+ | `19XX` = some year 1900-1999 | +| `~` | Approximate (circa) | Level 1+ | `1985~` = circa 1985 | +| `?` | Uncertain | Level 1+ | `1985?` = possibly 1985 | +| `%` | Uncertain AND approximate | Level 1+ | `1985%` = possibly circa 1985 | +| `S` | Significant digits | Level 2 | `1950S2` = 1900-1999, estimated 1950 | +| `[..]` | One of set | Level 2 | `[1970,1980]` = either 1970 or 1980 | +| `{..}` | All of set | Level 2 | `{1970..1980}` = all years 1970-1980 | + +### 3.2 Unspecified Date Components (X Notation) + +Use `X` to replace unknown digits: + +| Known Information | EDTF Format | Meaning | +|-------------------|-------------|---------| +| Only decade known (1970s) | `197X` | Some year 1970-1979 | +| Only century known (1900s) | `19XX` | Some year 1900-1999 | +| Year unknown entirely | `XXXX` | Year unknown | +| Year known, month unknown | `1985-XX` | Some month in 1985 | +| Year+month known, day unknown | `1985-04-XX` | Some day in April 1985 | +| Year known, month+day unknown | `1985-XX-XX` | Some day in 1985 | +| Only decade and final digit known | `197X-XX-XX` or use set | 1970-1979 | + +### 3.3 Multiple Possible Decades (Set Notation) + +When the decade is uncertain but constrained to specific options: + +| Scenario | EDTF Format | Meaning | +|----------|-------------|---------| +| Born in 1970s OR 1980s | `[197X,198X]` | One of: some 
year in 1970s or 1980s | +| Born in specific years | `[1975,1985]` | Either 1975 or 1985 | +| Born 1970-1985 range | `1970/1985` | Interval: between 1970 and 1985 | + +### 3.4 Estimated Dates with Significant Digits + +When you can estimate a year with confidence bounds: + +``` +1975S2 = Estimated 1975, significant to 2 digits (1900-1999) +1975S3 = Estimated 1975, significant to 3 digits (1970-1979) +``` + +This is useful when you can estimate based on career timeline (e.g., "started working 1998, likely born 1970s"). + +### 3.5 Living Persons - Birth Date Estimation + +For living persons in LinkedIn data, estimate birth decade from: + +1. **Graduation year** (if available): Subtract ~22 years for bachelor's degree +2. **Career start** (first job): Subtract ~22-25 years +3. **Current role seniority**: "Senior" roles suggest 35+ years old + +```yaml +# Example: Person graduated 2010 +birth_date_estimate: + edtf: "1988S2" # Estimated 1988, significant to 2 digits (1900-1999) + estimation_method: "graduation_year_inference" + estimation_basis: "Graduated bachelor's 2010, estimated birth ~1988" + confidence: 0.60 +``` + +--- + +## 4. PPID Format with Unknown Dates + +### 4.1 PPID Date Component Rules + +The PPID format includes birth and death dates: + +``` +{TYPE}_{FL}_{FD}_{LL}_{LD}_{NT} + │ │ + │ └── Last Date (death) - EDTF format + └── First Date (birth) - EDTF format +``` + +### 4.2 Examples with Unknown Components + +| Scenario | PPID Example | +|----------|--------------| +| All known | `PID_NL-NH-AMS_1985-03-15_NL-NH-HAA_2020-08-22_JAN-BERG` | +| Birth year only | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` | +| Birth decade only | `ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_JAN-BERG` | +| Nothing known | `ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_JAN-BERG` | +| Living person | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` | + +### 4.3 Filename Safety + +Not all EDTF characters are **filename-safe**: + +| Character | Filename Safe? 
| Notes | +|-----------|----------------|-------| +| `X` | YES | Uppercase letter | +| `~` | YES | Allowed on macOS/Linux/Windows | +| `?` | NO | Not allowed on Windows | +| `%` | CAUTION | URL encoding issues | +| `[` `]` | CAUTION | Shell escaping issues | +| `,` | YES | Allowed | +| `/` | NO | Directory separator | +| `\|` | NO | Shell pipe; not allowed on Windows | + +**Recommendation**: For filenames, use only: +- `X` for unknown digits +- `~` for approximate (suffix only) +- Avoid `?`, `%`, `[]`, `/`, `|` in filenames + +When set notation `[..]` is needed, store in metadata but use simplified form in filename: +- Filename: `ID_XX-XX-XXX_197X_...` (simplified) +- Metadata: `birth_date_edtf: "[1975,1985]"` (full EDTF) + +--- + +## 5. Decision Tree + +``` +┌─────────────────────────────────────────┐ +│ Person entity missing birth_date │ +└────────────────────┬────────────────────┘ + ▼ +┌─────────────────────────────────────────┐ +│ Search Exa + Linkup for birth date │ +└────────────────────┬────────────────────┘ + ▼ + ┌───────┴───────┐ + │ Date found? │ + └───────┬───────┘ + YES │ NO + ▼ │ ▼ +┌─────────────────┐ ┌─────────────────────────────┐ +│ Record as │ │ Can estimate from career? │ +│ web_claim with │ └───────────┬─────────────────┘ +│ provenance │ YES │ NO +└─────────────────┘ ▼ │ ▼ + ┌───────────────┐ ┌───────────────┐ + │ Use EDTF │ │ Use XXXX │ + │ estimate: │ │ (unknown) │ + │ 1988S2 or │ │ │ + │ 198X │ │ │ + └───────────────┘ └───────────────┘ +``` + +--- + +## 6. Examples + +### 6.1 Fully Unknown (No Enrichment Found) + +```yaml +# Person: Nora Ruijs (student, no public birth info) +ppid: ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_NORA-RUIJS + +birth_date: + edtf: "XXXX" + precision: "unknown" + +enrichment_metadata: + birth_date_search: + attempted: true + search_date: "2025-01-09T14:30:00Z" + result: "NOT_FOUND" +``` + +### 6.2 Decade Estimated from Career + +```yaml +# Person: Senior curator, started career 1995 +ppid: ID_NL-NH-AMS_197X_XX-XX-XXX_XXXX_JAN-BERG + +birth_date: + edtf: "197X" + edtf_full: "1972S3" # Estimated 1972, significant to 3 digits + precision: "decade" + estimation_method: "career_start_inference" + estimation_basis: "Career started 1995 as junior curator, estimated age 23" +``` + +### 6.3 Multiple Possible Decades + +```yaml +# Person: Could be born 1970s or 1980s based on conflicting sources +ppid: ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_MARIA-SILVA # Simplified for filename + +birth_date: + edtf: "[197X,198X]" # Full EDTF with set notation + edtf_filename: "197X" # Simplified for filename (earlier estimate) + precision: "decade_uncertain" + notes: "Sources conflict: LinkedIn suggests 1980s, university bio suggests 1970s" +``` + +### 6.4 Exact Date Found via Enrichment + +```yaml +# Person: Birth date found on institutional bio page +ppid: ID_NL-NH-AMS_1985-03-15_XX-XX-XXX_XXXX_JAN-BERG + +birth_date: + edtf: "1985-03-15" + precision: "day" + 
+web_claims: + - claim_type: birth_date + claim_value: "1985-03-15" + source_url: "https://museum.nl/team/jan-berg" + retrieved_on: "2025-01-09T14:30:00Z" + retrieval_agent: "opencode-claude-sonnet-4" +``` + +--- + +## 7. Anti-Patterns + +### 7.1 FORBIDDEN: Fabricating Dates + +```yaml +# WRONG - No source, no search attempted +birth_date: + edtf: "1985-03-15" # Where did this come from?! +``` + +### 7.2 FORBIDDEN: Using Non-EDTF Notation + +```yaml +# WRONG - Not EDTF compliant +birth_date: "197~8~" # Invalid notation +birth_date: "1970s" # Use 197X instead +birth_date: "circa 1985" # Use 1985~ instead +birth_date: "unknown" # Use XXXX instead +``` + +### 7.3 FORBIDDEN: Skipping Enrichment Search + +```yaml +# WRONG - No search attempted +birth_date: + edtf: "XXXX" + # No enrichment_metadata showing search was attempted! +``` + +--- + +## 8. Validation Rules + +1. **Search Required**: Cannot use `XXXX` without `enrichment_metadata.birth_date_search.attempted: true` +2. **EDTF Compliance**: All dates must parse as valid EDTF (use validator) +3. **Filename Safety**: PPID filenames must avoid `?`, `%`, `[]`, `/`, `|` +4. **Provenance Required**: All found dates must have `web_claims` with source + +--- + +## 9. References + +- [EDTF Specification (Library of Congress)](https://www.loc.gov/standards/datetime/) +- [ISO 8601-2:2019](https://www.iso.org/standard/70908.html) +- [PPID-GHCID Alignment Document](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md) +- [Rule 21: Data Fabrication Prohibition](../DATA_FABRICATION_PROHIBITION.md) diff --git a/.opencode/rules/exact-mapping-predicate-class-distinction.md b/.opencode/rules/exact-mapping-predicate-class-distinction.md index 662e6d6f42..6976454c71 100644 --- a/.opencode/rules/exact-mapping-predicate-class-distinction.md +++ b/.opencode/rules/exact-mapping-predicate-class-distinction.md @@ -5,8 +5,8 @@ ## The Rule 1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties). 
- * ❌ INVALID: Slot `analyzes_or_analyzed` maps to `schema:object` (a Class). - * ✅ VALID: Slot `analyzes_or_analyzed` maps to `crm:P129_is_about` (a Property). + * ❌ INVALID: Slot `analyze` maps to `schema:object` (a Class). + * ✅ VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property). 2. **Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities). * ❌ INVALID: Class `Person` maps to `foaf:name` (a Property). diff --git a/.opencode/rules/linkml/engineering-parsimony-and-domain-modeling-rule.md b/.opencode/rules/linkml/engineering-parsimony-and-domain-modeling-rule.md new file mode 100644 index 0000000000..5ab6222aa8 --- /dev/null +++ b/.opencode/rules/linkml/engineering-parsimony-and-domain-modeling-rule.md @@ -0,0 +1,65 @@ +# Rule: Engineering Parsimony and Domain Modeling + +## Critical Convention + +Our ontology follows an engineering-oriented approach: practical domain utility and +stable interoperability take priority over minimal, tool-specific class catalogs. + +## Rule + +1. Model domain concepts, not implementation tools. + - Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`. + +2. Prefer generic, reusable activity/entity classes for operational provenance. + - Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`. + +3. Capture tool/vendor details in slot values, not class names. + - Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`. + +4. Digital platforms acting as custodians are valid domain classes. + - Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed. + - Data processing/search tools are not ontology class candidates. + +5. Avoid ontology growth driven by transient engineering stack choices. + - New class proposals must be justified by cross-tool, domain-stable semantics. + +## Rationale + 
+- Domain-level abstractions maximize reuse, query consistency, and mapping stability. +- This aligns with an engineering ontology practice where strict theoretical + parsimony in candidate theories is not the only optimization criterion; practical + semantic interoperability and maintainability are primary. + +## Examples + +### Wrong + +```yaml +classes: + ExaSearchMetadata: + class_uri: prov:Activity +``` + +### Correct + +```yaml +classes: + ExternalSearchMetadata: + class_uri: prov:Activity + slots: + - has_tool + - has_method + - has_agent +``` + +## References + +1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*. + Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`. + URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764 + +2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*. + Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`. + URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA + diff --git a/.opencode/rules/linkml/exact-mapping-predicate-class-distinction.md b/.opencode/rules/linkml/exact-mapping-predicate-class-distinction.md new file mode 100644 index 0000000000..6976454c71 --- /dev/null +++ b/.opencode/rules/linkml/exact-mapping-predicate-class-distinction.md @@ -0,0 +1,37 @@ +# Exact Mapping Predicate/Class Distinction Rule + +๐Ÿšจ **CRITICAL**: The `exact_mappings` property implies semantic equivalence. Equivalence can only exist between elements of the same ontological category. + +## The Rule + +1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties). + * โŒ INVALID: Slot `analyze` maps to `schema:object` (a Class). + * โœ… VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property). + +2. 
**Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities). + * โŒ INVALID: Class `Person` maps to `foaf:name` (a Property). + * โœ… VALID: Class `Person` maps to `foaf:Person` (a Class). + +3. **When true equivalence exists and is verified, exact mapping is preferred.** + * โœ… VALID: Class `Acquisition` maps to `crm:E8_Acquisition`. + * โœ… VALID: Slot mapped to an actually equivalent ontology property. + * โ— Do not avoid `exact_mappings` by default; avoid only when scope is broader/narrower/similar-but-not-equal. + +## Rationale + +Mapping a slot (which defines a relationship or attribute) to a class (which defines a type of entity) is a category error. `schema:object` represents the *class* of objects, not the *relationship* of "having an object" or "analyzing an object". + +## Verification Checklist + +When adding or reviewing `exact_mappings`: +- [ ] Is the LinkML element a Class or a Slot? +- [ ] Did you verify the target term type in the ontology definition files (do not rely on naming heuristics)? +- [ ] Do they match? (Classโ†”Class, Slotโ†”Property) +- [ ] If the target ontology uses opaque IDs (like CIDOC-CRM `E55_Type`), verify the type definition in the ontology file. +- [ ] If semantic scope is truly equivalent, use `exact_mappings` (not `close`/`broad` as a conservative fallback). + +## Common Pitfalls to Fix + +- Mapping slots to `schema:Object` or `schema:Thing`. +- Mapping slots to `skos:Concept`. +- Mapping classes to `schema:name` or `dc:title`. 
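The category check in the verification checklist above can be sketched as a small, stdlib-only Python function. This is an illustration only. It assumes each target term's type (class or property) has already been verified in the ontology definition files and loaded into a dictionary. `TERM_TYPES`, `Element`, and `check_exact_mappings` are hypothetical names and are not part of this repository or of LinkML.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table. Each entry is assumed to have been verified
# against the ontology definition files, never inferred from the spelling.
TERM_TYPES = {
    "crm:P129_is_about": "property",
    "crm:E8_Acquisition": "class",
    "foaf:Person": "class",
    "foaf:name": "property",
}

# Slots may only be exactly mapped to properties; classes only to classes.
REQUIRED_TARGET = {"slot": "property", "class": "class"}


@dataclass
class Element:
    """Minimal stand-in for a LinkML slot or class definition."""
    name: str
    kind: str  # "slot" or "class"
    exact_mappings: list = field(default_factory=list)


def check_exact_mappings(element, term_types=TERM_TYPES):
    """Return a list of problems; an empty list means every mapping passes."""
    problems = []
    expected = REQUIRED_TARGET[element.kind]
    for curie in element.exact_mappings:
        actual = term_types.get(curie)
        if actual is None:
            # Unknown terms are flagged rather than guessed at.
            problems.append(f"{element.name}: {curie} not verified in ontology files")
        elif actual != expected:
            problems.append(
                f"{element.name}: {curie} is a {actual}, expected a {expected}"
            )
    return problems


if __name__ == "__main__":
    print(check_exact_mappings(Element("Person", "class", ["foaf:name"])))
    print(check_exact_mappings(Element("analyze", "slot", ["crm:P129_is_about"])))
```

Unverified terms are reported instead of passed silently. This mirrors the checklist's instruction not to rely on naming heuristics.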
diff --git a/.opencode/rules/linkml/feedback-vs-revision-distinction.md b/.opencode/rules/linkml/feedback-vs-revision-distinction.md
new file mode 100644
index 0000000000..e96d3f67d9
--- /dev/null
+++ b/.opencode/rules/linkml/feedback-vs-revision-distinction.md
@@ -0,0 +1,144 @@
+# Rule 58: Feedback vs Revision Distinction in slot_fixes.yaml
+
+## Summary
+
+The `feedback` and `revision` fields in `slot_fixes.yaml` serve distinct purposes and MUST NOT be conflated or renamed.
+
+## Field Definitions
+
+### `revision` Field
+- **Purpose**: Defines WHAT the migration target is
+- **Content**: List of slots and classes to create
+- **Authority**: IMMUTABLE (per Rule 57)
+- **Format**: Structured YAML list with `label`, `type`, optional `link_branch`
+
+### `feedback` Field
+- **Purpose**: Contains user instructions on HOW the revision needs to be applied or corrected
+- **Content**: Can be string or structured format
+- **Authority**: User directives that override previous `notes`
+- **Action Required**: Agent must interpret and act upon feedback
+
+## Feedback Formats
+
+### Format 1: Structured (with `done` field)
+```yaml
+feedback:
+  - timestamp: '2026-01-17T00:01:57Z'
+    user: Simon C. Kemper
+    done: false  # Becomes true after agent processes
+    comment: |
+      The migration should use X instead of Y.
+    response: ""  # Agent fills this after completing
+```
+
+### Format 2: String (direct instruction)
+```yaml
+feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
+```
+
+Or:
+```yaml
+feedback: I altered the revision based on this feedback. Conduct this new migration accordingly.
+```
+
+## Interpretation Rules
+
+| Feedback Contains | Meaning | Action Required |
+|-------------------|---------|-----------------|
+| "I reject this" | Previous `notes` were WRONG | Follow `revision` field instead |
+| "I altered the revision" | User updated `revision` | Execute migration per NEW revision |
+| "Conduct the migration" | Migration not yet done | Execute migration now |
+| "Please conduct accordingly" | Migration pending | Execute migration now |
+| "ADDRESSED" or `done: true` | Already processed | No action needed |
+
+## Decision Tree
+
+```
+Is feedback field present?
+├─ NO → Check `processed.status`
+│  ├─ true → Migration complete
+│  └─ false → Execute revision
+│
+└─ YES → What format?
+   ├─ Structured with `done: true` → No action needed
+   ├─ Structured with `done: false` → Process feedback, then set done: true
+   └─ String format → Parse for keywords:
+      ├─ "reject" → Previous notes invalid, follow revision
+      ├─ "altered/adjusted revision" → Execute NEW revision
+      ├─ "conduct/please" → Migration pending, execute now
+      └─ "ADDRESSED" → Already done, no action
+```
+
+## Anti-Patterns
+
+### WRONG: Renaming feedback to revision
+```yaml
+# DO NOT DO THIS
+# feedback contains instructions, not migration specs
+revision:  # Was: feedback
+  - I reject this! Use has_or_had_identifier
+```
+
+### WRONG: Ignoring string feedback
+```yaml
+feedback: Please conduct the migration accordingly.
+notes: "NO MIGRATION NEEDED"  # WRONG - feedback overrides notes
+```
+
+### WRONG: Treating all feedback as completed
+```yaml
+feedback: I altered the revision. Conduct this new migration.
+processed:
+  status: true  # WRONG if migration not actually done
+```
+
+## Correct Workflow
+
+1. **Read feedback** - Understand user instruction
+2. **Check revision** - This defines the target migration
+3. **Execute migration** - Create/update slots and classes per revision
+4. **Update processed.status** - Set to `true`
+5. **Add response** - Document what was done
+   - For structured feedback: Set `done: true` and fill `response`
+   - For string feedback: Add new structured feedback entry confirming completion
+
+## Example: Processing String Feedback
+
+Before:
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
+  feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
+  revision:
+    - label: has_or_had_identifier
+      type: slot
+    - label: Identifier
+      type: class
+  processed:
+    status: false
+    notes: "Previously marked as no migration needed"
+```
+
+After processing:
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
+  feedback:
+    - timestamp: '2026-01-17T12:00:00Z'
+      user: System
+      done: true
+      comment: "Original string feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier"
+      response: "Migration completed. type_id.yaml archived, consuming classes updated to use has_or_had_identifier slot with Identifier range."
+  revision:
+    - label: has_or_had_identifier
+      type: slot
+    - label: Identifier
+      type: class
+  processed:
+    status: true
+    notes: "Migration completed per user feedback rejecting previous notes."
+```
+
+## See Also
+
+- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
+- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE
+- **Rule 39**: Slot Naming Convention (RiC-O Style)
diff --git a/.opencode/rules/linkml/full-slot-migration-rule.md b/.opencode/rules/linkml/full-slot-migration-rule.md
new file mode 100644
index 0000000000..6764f100d6
--- /dev/null
+++ b/.opencode/rules/linkml/full-slot-migration-rule.md
@@ -0,0 +1,373 @@
+# Rule 53: Full Slot Migration - No Deprecation Notes
+
+🚨 **CRITICAL**: When migrating slots from `slot_fixes.yaml`:
+
+1. **Follow the `revision` section EXACTLY** - The `slot_fixes.yaml` file specifies the exact replacement slots and classes to use
+2. **Perform FULL MIGRATION** - Completely remove the deprecated slot from the entity class
+3. **Do NOT add deprecation notes** - Never keep both old and new slots with deprecation markers
+
+---
+
+## 🚨 slot_fixes.yaml is AUTHORITATIVE AND CURATED 🚨
+
+**File Location**: `schemas/20251121/linkml/modules/slots/slot_fixes.yaml`
+
+**THIS FILE IS THE SINGLE SOURCE OF TRUTH FOR ALL SLOT MIGRATIONS.**
+
+The `slot_fixes.yaml` file has been **manually curated** to specify the exact replacement slots and classes for each deprecated slot. The revisions are based on:
+
+1. **Ontology analysis** - Each replacement was chosen based on alignment with base ontologies (CIDOC-CRM, RiC-O, PROV-O, Schema.org, etc.)
+2. **Semantic correctness** - Revisions reflect the intended meaning of the original slot
+3. **Pattern consistency** - Follows established naming conventions (Rule 39: RiC-O style, Rule 43: singular nouns)
+4. **Class hierarchy design** - Type/Types pattern (Rule 0b) applied where appropriate
+
+**YOU MUST NOT**:
+- ❌ Substitute different slots than those specified in `revision`
+- ❌ Use your own judgment to pick "similar" slots
+- ❌ Skip the revision and invent new mappings
+- ❌ Partially apply the revision (e.g., use the slot but not the class)
+
+**YOU MUST**:
+- ✅ Follow the `revision` section TO THE LETTER
+- ✅ Use EXACTLY the slots and classes specified
+- ✅ Apply ALL components of the revision (both slots AND classes)
+- ✅ Interpret `link_branch` fields correctly (see below)
+- ✅ Update `processed.status: true` after completing migration
+
+---
+
+## Understanding `link_branch` in Revision Plans
+
+🚨 **CRITICAL**: The `link_branch` field in revision plans indicates **nested class attributes**. Items with `link_branch: N` are slots/classes that belong TO the primary class, not standalone replacements.
+
+### How to Interpret `link_branch`
+
+| Revision Item | Meaning |
+|---------------|---------|
+| Items **WITHOUT** `link_branch` | **PRIMARY** slot and class to create |
+| Items **WITH** `link_branch: 1` | First attribute branch that the primary class needs |
+| Items **WITH** `link_branch: 2` | Second attribute branch that the primary class needs |
+| Items **WITH** `link_branch: N` | Nth attribute branch for the primary class |
+
+### Example: `visitor_count` Revision
+
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_count
+  revision:
+    - label: has_or_had_quantity  # PRIMARY SLOT (no link_branch)
+      type: slot
+    - label: Quantity  # PRIMARY CLASS (no link_branch)
+      type: class
+    - label: has_or_had_measurement_unit  # Quantity needs this slot
+      type: slot
+      link_branch: 1  # ← Branch 1: unit attribute
+    - label: MeasureUnit  # Range of has_or_had_measurement_unit
+      type: class
+      value:
+        - visitors
+      link_branch: 1
+    - label: temporal_extent  # Quantity needs this slot too
+      type: slot
+      link_branch: 2  # ← Branch 2: time attribute
+    - label: TimeSpan  # Range of temporal_extent
+      type: class
+      link_branch: 2
+```
+
+**Interpretation**: This creates:
+1. **Primary**: `has_or_had_quantity` slot → `Quantity` class
+2. **Branch 1**: `Quantity.has_or_had_measurement_unit` → `MeasureUnit` (with value "visitors")
+3. **Branch 2**: `Quantity.temporal_extent` → `TimeSpan`
+
+### Resulting Class Structure
+
+```yaml
+# The Quantity class should have these slots:
+Quantity:
+  slots:
+    - has_or_had_measurement_unit  # From link_branch: 1
+    - temporal_extent  # From link_branch: 2
+```
+
+### Complex Example: `visitor_conversion_rate`
+
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_conversion_rate
+  revision:
+    - label: has_or_had_conversion_rate  # PRIMARY SLOT
+      type: slot
+    - label: ConversionRate  # PRIMARY CLASS
+      type: class
+    - label: has_or_had_type  # ConversionRate.has_or_had_type
+      type: slot
+      link_branch: 1
+    - label: ConversionRateType  # Abstract type class
+      type: class
+      link_branch: 1
+    - label: includes_or_included  # ConversionRateType hierarchy slot
+      type: slot
+      link_branch: 1
+    - label: ConversionRateTypes  # Concrete subclasses file
+      type: class
+      link_branch: 1
+    - label: temporal_extent  # ConversionRate.temporal_extent
+      type: slot
+      link_branch: 2
+    - label: TimeSpan  # Range of temporal_extent
+      type: class
+      link_branch: 2
+```
+
+**Interpretation**:
+1. **Primary**: `has_or_had_conversion_rate` → `ConversionRate`
+2. **Branch 1**: Type hierarchy with `ConversionRateType` (abstract) + `ConversionRateTypes` (concrete subclasses)
+3. **Branch 2**: Temporal tracking via `temporal_extent` → `TimeSpan`
+
+### Migration Checklist for `link_branch` Revisions
+
+- [ ] Create/verify PRIMARY slot exists
+- [ ] Create/verify PRIMARY class exists
+- [ ] For EACH `link_branch: N`:
+  - [ ] Add the branch slot to PRIMARY class's `slots:` list
+  - [ ] Import the branch slot file
+  - [ ] Import the branch class file (if creating new class)
+  - [ ] Verify range of branch slot points to branch class
+- [ ] Update consuming class to use PRIMARY slot (not deprecated slot)
+- [ ] Update examples to show nested structure
+
+---
+
+## Mandatory: Follow slot_fixes.yaml Revisions Exactly
+
+**The `revision` section in `slot_fixes.yaml` is AUTHORITATIVE.** Do not substitute different slots based on your own judgment.
+
+**Example from slot_fixes.yaml**:
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
+  revision:
+    - label: begin_of_the_begin  # ← USE THIS SLOT
+      type: slot
+    - label: TimeSpan  # ← USE THIS CLASS
+      type: class
+```
+
+**CORRECT**: Use `begin_of_the_begin` slot (as specified)
+**WRONG**: Substitute `has_actual_start_date` (not in revision)
+
+## The Problem
+
+Adding deprecation notes while keeping both old and new slots:
+- Creates schema bloat with redundant properties
+- Confuses data consumers about which slot to use
+- Violates single-source-of-truth principle
+- Complicates future data validation
+
+## Anti-Pattern (WRONG)
+
+```yaml
+# WRONG - Keeping deprecated slot with deprecation note
+classes:
+  TemporaryLocation:
+    slots:
+      - actual_start  # OLD - kept with deprecation note
+      - actual_end  # OLD - kept with deprecation note
+      - has_actual_start_date  # NEW
+      - has_actual_end_date  # NEW
+    slot_usage:
+      actual_start:
+        deprecated: |
+          DEPRECATED: Use has_actual_start_date instead.
+          # ... more deprecation documentation
+```
+
+## Correct Pattern
+
+```yaml
+# CORRECT - Only new slots, old slots completely removed
+classes:
+  TemporaryLocation:
+    slots:
+      - has_actual_start_date  # NEW - only new slots present
+      - has_actual_end_date  # NEW
+    # NO slot_usage for deprecated slots - they don't exist in this class
+```
+
+## Migration Steps
+
+When processing a slot from `slot_fixes.yaml`:
+
+1. **Identify affected entity class(es)**
+2. **Remove old slot from imports** (if dedicated import file exists)
+3. **Remove old slot from slots list**
+4. **Remove any slot_usage for old slot**
+5. **Add new slot import** (if not already present)
+6. **Add new slot to slots list**
+7. **Add slot_usage for new slot** (if range override or customization needed)
+8. **Update examples** to use new slot
+9. **Validate with gen-owl**
+
+## What Happens to Old Slot Files
+
+The old slot files in `modules/slots/` (e.g., `actual_start.yaml`, `activities_societies.yaml`) are **NOT deleted** because:
+- Other entity classes might still use them
+- They serve as documentation of the old schema
+- They can be archived when all usages are migrated
+
+However, the old slots are **removed from the entity class** being migrated.
+
+## Example: TemporaryLocation Migration
+
+**Before** (with old slots):
+```yaml
+imports:
+  - ../slots/actual_end
+  - ../slots/actual_start
+  - ../slots/has_actual_start_date
+  - ../slots/has_actual_end_date
+
+slots:
+  - actual_end
+  - actual_start
+  - has_actual_start_date
+  - has_actual_end_date
+```
+
+**After** (fully migrated):
+```yaml
+imports:
+  # actual_end and actual_start imports REMOVED
+  - ../slots/has_actual_start_date
+  - ../slots/has_actual_end_date
+
+slots:
+  # actual_end and actual_start REMOVED from list
+  - has_actual_start_date
+  - has_actual_end_date
+```
+
+## Slot Usage for New Slots
+
+Only add `slot_usage` for the new slot if you need to:
+- Override the range for this specific class
+- Add class-specific examples
+- Add class-specific constraints
+
+Do NOT add `slot_usage` just to document that it replaces an old slot.
+
+## Recording in slot_fixes.yaml
+
+When marking a slot as processed:
+
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
+  processed:
+    status: true
+    timestamp: '2026-01-14T16:00:00Z'
+    session: "session-2026-01-14-type-migration"
+    notes: "FULLY MIGRATED: TemporaryLocation - actual_start REMOVED, using temporal_extent with TimeSpan.begin_of_the_begin (Rule 53)"
+```
+
+Note the "FULLY MIGRATED" prefix in notes to confirm this was a complete removal, not a deprecation-in-place.
+
+---
+
+## ⚠️ Common Mistakes to Avoid ⚠️
+
+### Mistake 1: Substituting Different Slots
+
+**slot_fixes.yaml specifies**:
+```yaml
+- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
+  revision:
+    - label: begin_of_the_begin  # ← MUST USE THIS
+      type: slot
+    - label: TimeSpan  # ← WITH THIS CLASS
+      type: class
+```
+
+| Action | Status |
+|--------|--------|
+| Using `begin_of_the_begin` with `TimeSpan` | ✅ CORRECT |
+| Using `has_actual_start_date` (invented) | ❌ WRONG |
+| Using `start_date` (different slot) | ❌ WRONG |
+| Using `begin_of_the_begin` WITHOUT `TimeSpan` | ❌ WRONG (incomplete) |
+
+### Mistake 2: Partial Application
+
+The revision often specifies MULTIPLE components that work together:
+
+```yaml
+revision:
+  - label: has_or_had_type  # ← Slot for linking
+    type: slot
+  - label: BackupType  # ← Abstract base class
+    type: class
+  - label: includes_or_included  # ← Slot for hierarchy
+    type: slot
+  - label: BackupTypes  # ← Concrete subclasses
+    type: class
+```
+
+**All four components** are part of the migration. Don't just use `has_or_had_type` and ignore the class structure.
+
+### Mistake 3: Using `temporal_extent` Slot Correctly
+
+When `slot_fixes.yaml` specifies TimeSpan-based revision:
+
+```yaml
+revision:
+  - label: begin_of_the_begin
+    type: slot
+  - label: TimeSpan
+    type: class
+```
+
+This means: **Use the `temporal_extent` slot** (which has `range: TimeSpan`) and access the temporal bounds via TimeSpan's slots:
+
+```yaml
+# CORRECT: Use temporal_extent with TimeSpan structure
+temporal_extent:
+  begin_of_the_begin: '2020-06-15'
+  end_of_the_end: '2022-03-15'
+
+# WRONG: Create new has_actual_start_date slot
+has_actual_start_date: '2020-06-15'  # ❌ Not in revision!
+```
+
+### Mistake 4: Not Updating Examples
+
+When migrating slots, **update ALL examples** in the class file:
+- Description examples (in class description)
+- slot_usage examples
+- Class-level examples (at bottom of file)
+
+---
+
+## Verification Checklist
+
+Before marking a slot as processed:
+
+- [ ] Read the `revision` section completely
+- [ ] Identified ALL slots and classes in revision
+- [ ] Removed old slot from imports
+- [ ] Removed old slot from slots list
+- [ ] Removed old slot from slot_usage
+- [ ] Added new slot(s) per revision
+- [ ] Added new class import(s) per revision
+- [ ] Updated ALL examples to use new slots
+- [ ] Validated with `linkml-lint` or `gen-owl`
+- [ ] Updated `slot_fixes.yaml` with:
+  - `status: true`
+  - `timestamp` (ISO 8601)
+  - `session` identifier
+  - `notes` with "FULLY MIGRATED:" prefix
+
+---
+
+## See Also
+
+- Rule 9: Enum-to-Class Promotion (single source of truth principle)
+- Rule 0b: Type/Types File Naming Convention
+- Rule: Slot Naming Convention (Current Style)
+- `.opencode/ENUM_TO_CLASS_PRINCIPLE.md`
+- `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` - **AUTHORITATIVE** master list of migrations
diff --git a/.opencode/rules/linkml/generic-slots-specific-classes.md b/.opencode/rules/linkml/generic-slots-specific-classes.md
new file mode 100644
index 0000000000..39b6f5a5b5
--- /dev/null
+++ b/.opencode/rules/linkml/generic-slots-specific-classes.md
@@ -0,0 +1,129 @@
+# Rule: Generic Slots, Specific Classes
+
+**Identifier**: `generic-slots-specific-classes`
+**Severity**: **CRITICAL**
+
+## Core Principle
+
+**Slots MUST be generic predicates** that can be reused across multiple classes. **Classes MUST be specific** to provide context and constraints.
+
+**DO NOT** create class-specific slots when a generic predicate can be used.
+
+## Rationale
+
+1. **Predicate Proliferation**: Creating bespoke slots for every class explodes the schema size (e.g., `has_museum_name`, `has_library_name`, `has_archive_name` instead of `has_name`).
+2. **Interoperability**: Generic predicates (`has_name`, `has_identifier`, `has_part`) map cleanly to standard ontologies (Schema.org, Dublin Core, RiC-O).
+3. **Querying**: It's easier to query "all entities with a name" than "all entities with museum_name OR library_name OR archive_name".
+4. **Maintenance**: Updating one generic slot propagates to all classes.
+
+## Examples
+
+### ❌ Anti-Pattern: Class-Specific Slots
+
+```yaml
+# WRONG: Creating specific slots for each class
+slots:
+  has_museum_visitor_count:
+    range: integer
+  has_library_patron_count:
+    range: integer
+
+classes:
+  Museum:
+    slots:
+      - has_museum_visitor_count
+  Library:
+    slots:
+      - has_library_patron_count
+```
+
+### ✅ Correct Pattern: Generic Slot, Specific Class Usage
+
+```yaml
+# CORRECT: One generic slot reused
+slots:
+  has_or_had_quantity:
+    slot_uri: rico:hasOrHadQuantity
+    range: Quantity
+    multivalued: true
+
+classes:
+  Museum:
+    slots:
+      - has_or_had_quantity
+    slot_usage:
+      has_or_had_quantity:
+        description: The number of visitors to the museum.
+
+  Library:
+    slots:
+      - has_or_had_quantity
+    slot_usage:
+      has_or_had_quantity:
+        description: The number of registered patrons.
+```
+
+## Intermediate Class Pattern
+
+Making slots generic often requires introducing **Intermediate Classes** to hold structured data, rather than flattening attributes onto the parent class.
+
+### ❌ Anti-Pattern: Specific Flattened Slots
+
+```yaml
+# WRONG: Flattened specific attributes
+classes:
+  Museum:
+    slots:
+      - has_museum_budget_amount
+      - has_museum_budget_currency
+      - has_museum_budget_year
+```
+
+### ✅ Correct Pattern: Generic Slot + Intermediate Class
+
+```yaml
+# CORRECT: Generic slot pointing to structured class
+slots:
+  has_or_had_budget:
+    range: Budget
+    multivalued: true
+
+classes:
+  Museum:
+    slots:
+      - has_or_had_budget
+
+  Budget:
+    slots:
+      - has_or_had_amount
+      - has_or_had_currency
+      - has_or_had_year
+```
+
+## Specificity Levels
+
+| Level | Component | Example |
+|-------|-----------|---------|
+| **Generic** | **Slot (Predicate)** | `has_or_had_identifier` |
+| **Specific** | **Class (Subject/Object)** | `ISILCode` |
+| **Specific** | **Slot Usage (Context)** | "The ISIL code assigned to this library" |
+
+## Migration Guide
+
+If you encounter an overly specific slot:
+
+1. **Identify the generic concept** (e.g., `has_museum_opening_hours` → `has_opening_hours`).
+2. **Check if a generic slot exists** in `modules/slots/`.
+3. **If yes**, use the generic slot and add `slot_usage` to the class.
+4. **If no**, create the **generic** slot, not a specific one.
+
+## Naming Indicators
+
+**Reject slots containing:**
+* Class names (e.g., `has_custodian_name` → `has_name`)
+* Narrow types (e.g., `has_isbn_identifier` → `has_identifier`)
+* Contextual specifics (e.g., `has_primary_email` → `has_email` + type/role)
+
+## See Also
+* Rule 55: Broaden Generic Predicate Ranges
+* Rule: Slot Naming Convention (Current Style)
diff --git a/.opencode/rules/linkml/linkml-union-type-range-any-rule.md b/.opencode/rules/linkml/linkml-union-type-range-any-rule.md
new file mode 100644
index 0000000000..65fbe33442
--- /dev/null
+++ b/.opencode/rules/linkml/linkml-union-type-range-any-rule.md
@@ -0,0 +1,157 @@
+# Rule 59: LinkML Union Types Require `range: Any`
+
+🚨 **CRITICAL**: When using `any_of` for union types in LinkML, you MUST also specify `range: Any` at the attribute level. Without it, the union type validation does NOT work.
+
+## The Problem
+
+LinkML's `any_of` construct allows defining slots that accept multiple types (e.g., string OR integer). However, there's a critical implementation detail:
+
+**Without `range: Any`, the `any_of` constraint is silently ignored during validation.**
+
+This leads to validation failures where data that should be valid (e.g., integer value in a string/integer union field) is rejected.
+
+## Correct Pattern
+
+```yaml
+slots:
+  identifier_value:
+    range: Any  # ← REQUIRED for any_of to work
+    any_of:
+      - range: string
+      - range: integer
+    description: The identifier value (can be string or integer)
+```
+
+## Incorrect Pattern (WILL FAIL)
+
+```yaml
+slots:
+  identifier_value:
+    # Missing range: Any - validation will fail!
+    any_of:
+      - range: string
+      - range: integer
+    description: The identifier value (can be string or integer)
+```
+
+## Common Use Cases
+
+This pattern is required for:
+
+| Use Case | Types | Example Fields |
+|----------|-------|----------------|
+| Identifier values | string \| integer | `identifier_value`, `geonames_id`, `viaf_id` |
+| Social media IDs | string \| array | `youtube_channel_id`, `facebook_id`, `twitter_username` |
+| Flexible identifiers | object \| array | `identifiers` (dict or list format) |
+| Numeric strings | string \| integer | `postal_code`, `kvk_number` |
+
+## Real-World Examples from GLAM Schema
+
+### Example 1: OriginalEntryIdentifier.yaml
+
+```yaml
+# Before (BROKEN):
+attributes:
+  identifier_value:
+    any_of:
+      - range: string
+      - range: integer
+
+# After (WORKING):
+attributes:
+  identifier_value:
+    range: Any  # Added
+    any_of:
+      - range: string
+      - range: integer
+```
+
+### Example 2: WikidataSocialMedia.yaml
+
+```yaml
+# Social media fields that can be single value or array
+attributes:
+  youtube_channel_id:
+    range: Any  # Required for string|array union
+    any_of:
+      - range: string
+      - range: string
+        multivalued: true
+    description: YouTube channel ID (single value or array)
+
+  facebook_id:
+    range: Any
+    any_of:
+      - range: string
+      - range: string
+        multivalued: true
+```
+
+### Example 3: OriginalEntry.yaml (object|array union)
+
+```yaml
+# identifiers field that accepts both dict and array formats
+attributes:
+  identifiers:
+    range: Any  # Required for flexible typing
+    description: >-
+      Identifiers from original source. Accepts both dict format
+      (e.g., {isil: "XX-123"}) and array format
+      (e.g., [{scheme: "isil", value: "XX-123"}])
+```
+
+### Example 4: OriginalEntryLocation.yaml
+
+```yaml
+attributes:
+  geonames_id:
+    range: Any  # Required for string|integer
+    any_of:
+      - range: string
+      - range: integer
+    description: GeoNames ID (may be string or integer depending on source)
+```
+
+## Validation Behavior
+
+| Schema Definition | Integer Data | String Data | Result |
+|-------------------|--------------|-------------|--------|
+| `range: string` | ❌ FAIL | ✅ PASS | Strict string only |
+| `range: integer` | ✅ PASS | ❌ FAIL | Strict integer only |
+| `any_of` without `range: Any` | ❌ FAIL | ❌ FAIL | Broken - nothing works |
+| `any_of` with `range: Any` | ✅ PASS | ✅ PASS | Correct union behavior |
+
+## Why This Happens
+
+LinkML's validation engine processes `range` first to determine the basic type constraint. When `range` is not specified (or defaults to `string`), it applies that constraint before checking `any_of`. The `range: Any` tells the validator to defer type checking to the `any_of` constraints.
+
+## Checklist for Union Types
+
+When adding a field that accepts multiple types:
+
+- [ ] Define the `any_of` block with all acceptable ranges
+- [ ] Add `range: Any` at the same level as `any_of`
+- [ ] Test with sample data of each type
+- [ ] Document the accepted types in the description
+
+## See Also
+
+- LinkML Documentation: [Union Types](https://linkml.io/linkml/schemas/advanced.html#union-types)
+- GLAM Validation: `schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml`
+- Validation command: `linkml-validate -s .yaml .yaml`
+
+## Migration Notes
+
+**Affected Files (Fixed January 2026)**:
+- `OriginalEntryIdentifier.yaml` - `identifier_value`
+- `Identifier.yaml` - `identifier_value` slot_usage
+- `WikidataSocialMedia.yaml` - `youtube_channel_id`, `facebook_id`, `instagram_username`, `linkedin_company_id`, `twitter_username`, `facebook_page_id`
+- `YoutubeEnrichment.yaml` - `channel_id`
+- `OriginalEntryLocation.yaml` - `geonames_id`
+- `OriginalEntry.yaml` - `identifiers`
+
+---
+
+**Version**: 1.0
+**Created**: 2026-01-18
+**Author**: AI Agent (OpenCode Claude)
diff --git a/.opencode/rules/linkml/linkml-yaml-best-practices-rule.md b/.opencode/rules/linkml/linkml-yaml-best-practices-rule.md
new file mode 100644
index 0000000000..1c237ef1f5
--- /dev/null
+++ b/.opencode/rules/linkml/linkml-yaml-best-practices-rule.md
@@ -0,0 +1,181 @@
+# LinkML YAML Best Practices Rule
+
+## Rule: Follow LinkML Conventions for Valid, Interoperable Schema Files
+
+### 1. equals_expression Anti-Pattern
+
+`equals_expression` is for dynamic formula evaluation (e.g., `"{age_in_years} * 12"`). Never use it for static value constraints.
+ +**WRONG:** +```yaml +slot_usage: + has_type: + equals_expression: '["hc:ArchiveOrganizationType"]' + hold_record_set: + equals_expression: '["hc:Fonds", "hc:Series"]' +``` + +**CORRECT** (single value): +```yaml +slot_usage: + has_type: + equals_string: "hc:ArchiveOrganizationType" +``` + +**CORRECT** (multiple allowed values - if classes): +```yaml +slot_usage: + hold_record_set: + any_of: + - range: UniversityAdministrativeFonds + - range: StudentRecordSeries + - range: FacultyPaperCollection +``` + +**CORRECT** (multiple allowed values - if literals): +```yaml +slot_usage: + status: + equals_string_in: + - "active" + - "inactive" + - "pending" +``` + +### 2. Declare All Used Prefixes + +Every CURIE prefix used in the file must be declared in the `prefixes:` block. + +**WRONG:** +```yaml +prefixes: + linkml: https://w3id.org/linkml/ + skos: http://www.w3.org/2004/02/skos/core# +slot_usage: + has_type: + equals_string: "hc:ArchiveOrganizationType" # hc: not declared! +``` + +**CORRECT:** +```yaml +prefixes: + linkml: https://w3id.org/linkml/ + hc: https://nde.nl/ontology/hc/ + skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc +slot_usage: + has_type: + equals_string: "hc:ArchiveOrganizationType" +``` + +### 3. Import Referenced Classes + +When using external classes in `is_a`, `range`, or other references, import them. + +**WRONG:** +```yaml +imports: + - linkml:types +classes: + AcademicArchive: + is_a: ArchiveOrganizationType # Not imported! + slot_usage: + related_to: + range: WikidataAlignment # Not imported! +``` + +**CORRECT:** +```yaml +imports: + - linkml:types + - ../classes/ArchiveOrganizationType + - ../classes/WikidataAlignment +classes: + AcademicArchive: + is_a: ArchiveOrganizationType + slot_usage: + related_to: + range: WikidataAlignment +``` + +### 4. 
Quote Regex Patterns and Annotation Values + +**Regex patterns:** +```yaml +# WRONG +pattern: ^Q[0-9]+$ + +# CORRECT +pattern: "^Q[0-9]+$" +``` + +**Annotation values (must be strings):** +```yaml +# WRONG +annotations: + specificity_score: 0.1 + +# CORRECT +annotations: + specificity_score: "0.1" +``` + +### 5. Remove Unused Imports + +Only import slots and classes that are actually used in the file. + +**WRONG:** +```yaml +imports: + - ../slots/has_scope # Never used in slots: or slot_usage: + - ../slots/has_score + - ../slots/has_type +``` + +**CORRECT:** +```yaml +imports: + - ../slots/has_score + - ../slots/has_type +``` + +### 6. Slot Usage Requires Slot Presence + +A slot referenced in `slot_usage:` must either be: +- Listed in the `slots:` array, OR +- Inherited from a parent class via `is_a` + +**WRONG:** +```yaml +classes: + MyClass: + slots: + - has_type + slot_usage: + has_type: {...} + identified_by: {...} # Not in slots: and not inherited! +``` + +**CORRECT:** +```yaml +classes: + MyClass: + slots: + - has_type + - identified_by + slot_usage: + has_type: {...} + identified_by: {...} +``` + +## Checklist for Class Files + +- [ ] All prefixes used in CURIEs are declared +- [ ] `default_prefix` set if module belongs to that namespace +- [ ] All referenced classes are imported +- [ ] All used slots are imported +- [ ] No `equals_expression` with static JSON arrays +- [ ] Regex patterns are quoted +- [ ] Annotation values are quoted strings +- [ ] No unused imports +- [ ] `slot_usage` only references slots that exist (via slots: or inheritance) diff --git a/.opencode/rules/linkml/mapping-specificity-hypernym-rule.md b/.opencode/rules/linkml/mapping-specificity-hypernym-rule.md new file mode 100644 index 0000000000..66672c46b9 --- /dev/null +++ b/.opencode/rules/linkml/mapping-specificity-hypernym-rule.md @@ -0,0 +1,185 @@ +# Mapping Specificity Rule: Broad vs Narrow vs Exact Mappings + +## ๐Ÿšจ CRITICAL: Mapping Semantics + +When mapping LinkML classes to 
external ontologies, you MUST distinguish between **equivalence**, **hypernyms** (broader concepts), and **hyponyms** (narrower concepts). + +### The Rule + +1. **Exact Mappings (`skos:exactMatch`)**: Use ONLY when the external concept is **semantically equivalent** to your class. + * *Example*: `hc:Person` `exact_mappings` `schema:Person`. + * **CRITICAL**: Exact means the SAME semantic scope - neither broader nor narrower! + * **DO NOT AVOID EXACT BY DEFAULT**: If equivalence is verified (including class/property category match and ontology definition review), `exact_mappings` SHOULD be used. + +2. **Broad Mappings (`skos:broadMatch`)**: Use when the external concept is a **hypernym** (a broader, more general category) of your class. + * *Example*: `hc:AcademicArchiveRecordSetType` `broad_mappings` `rico:RecordSetType`. + * *Rationale*: An academic archive record set *is a* record set type, but `rico:RecordSetType` is broader. + * *Common Hypernyms*: `skos:Concept`, `prov:Entity`, `prov:Activity`, `schema:Thing`, `schema:Organization`, `schema:Action`, `rico:RecordSetType`, `crm:E55_Type`. + +3. **Narrow Mappings (`skos:narrowMatch`)**: Use when the external concept is a **hyponym** (a narrower, more specific category) of your class. + * *Example*: `hc:Organization` `narrow_mappings` `hc:Library` (if mapping inversely). + +4. **Close Mappings (`skos:closeMatch`)**: Use when the external concept is similar but not exactly equivalent. + * *Example*: `hc:AccessPolicy` `close_mappings` `dcterms:accessRights` (related but different scope). + +5. **Related Mappings (`skos:relatedMatch`)**: Use for non-hierarchical relationships. + * *Example*: `hc:Collection` `related_mappings` `rico:RecordSet`. + +### 🚨 Type Compatibility Rule + +**Classes map to classes, properties map to properties.** Never mix types in mappings.
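The type compatibility rule can be checked mechanically once each mapping target's `rdf:type` is known. The sketch below assumes a hypothetical `TERM_KINDS` lookup (CURIE to declared kind) that would be filled in by verifying each target against `data/ontology/`. The two entries shown are illustrative only, and nothing here is an existing project tool.

```python
# Sketch only: TERM_KINDS is a hypothetical lookup that would be built by
# verifying each target in data/ontology/; the two entries are illustrative.
CLASS_KINDS = {"owl:Class", "rdfs:Class"}
PROPERTY_KINDS = {"owl:ObjectProperty", "owl:DatatypeProperty", "rdf:Property"}
MAPPING_KEYS = ("exact_mappings", "close_mappings", "related_mappings",
                "narrow_mappings", "broad_mappings")

TERM_KINDS = {"schema:Action": "rdfs:Class", "schema:url": "rdf:Property"}


def type_mismatches(element_kind, definition, term_kinds=TERM_KINDS):
    """List (mapping_key, curie, found_kind) for targets of the wrong kind.

    element_kind is "class" or "slot". A target missing from term_kinds is
    reported with found_kind None, because it has not been verified yet.
    """
    allowed = CLASS_KINDS if element_kind == "class" else PROPERTY_KINDS
    problems = []
    for key in MAPPING_KEYS:
        for curie in definition.get(key) or []:
            kind = term_kinds.get(curie)
            if kind not in allowed:
                problems.append((key, curie, kind))
    return problems


# A class mapped to a property is flagged; the class-to-class mapping is not.
print(type_mismatches("class", {"broad_mappings": ["schema:Action", "schema:url"]}))
# [('broad_mappings', 'schema:url', 'rdf:Property')]
```

Unverified targets are reported rather than silently accepted, which matches the requirement that targets be checked against the local ontology files.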
+ +| Your Element | Valid Mapping Target | +|--------------|---------------------| +| Class | Class (owl:Class, rdfs:Class) | +| Slot | Property (owl:ObjectProperty, owl:DatatypeProperty, rdf:Property) | + +❌ **WRONG**: +```yaml +# AccessApplication is a CLASS, schema:Action is a CLASS - but Action is BROADER +AccessApplication: + exact_mappings: + - schema:Action # WRONG: Action is a hypernym, not equivalent +``` + +✅ **CORRECT**: +```yaml +AccessApplication: + broad_mappings: + - schema:Action # CORRECT: Action is the broader category +``` + +### 🚨 No Self/Internal Exact Mappings + +`exact_mappings` MUST NOT contain self-references or internal HC class references for the same concept. + +❌ **WRONG**: +```yaml +AcademicArchive: + exact_mappings: + - hc:AcademicArchive # Self/internal reference; not an external equivalence mapping +``` + +✅ **CORRECT**: +```yaml +AcademicArchive: + exact_mappings: + - wd:Q27032435 # External concept with equivalent semantic scope +``` + +Use `exact_mappings` only for equivalent terms in external ontologies or external controlled vocabularies, not for repeating the class itself. + +### ✅ Positive Guidance: When Exact Mapping Is Correct + +Use `exact_mappings` when all checks below pass: + +- Semantic scope is equivalent (not parent/child, not merely similar) +- Ontological category matches (Class↔Class, Slot↔Property) +- Target term is verified in the ontology source files under `data/ontology/` or verified Wikidata entity metadata +- No self/internal duplication (no `hc:` self-reference for the same concept) + +✅ **CORRECT**: +```yaml +Person: + exact_mappings: + - schema:Person + +Acquisition: + exact_mappings: + - crm:E8_Acquisition +``` + +Do not downgrade a truly equivalent mapping to `close_mappings` or `broad_mappings` just to be conservative.
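Two of the checks above, no internal `hc:` targets and no known hypernyms, can be linted automatically. The sketch below works on an already-parsed class definition. The hypernym set is copied from the "Common Hypernyms That Are NEVER Exact Mappings" table of this rule, and the function name is hypothetical. An empty result does not prove equivalence; that still requires reviewing the ontology definition.

```python
# Hypernyms copied from the "Common Hypernyms That Are NEVER Exact Mappings"
# table of this rule.
NEVER_EXACT = {
    "schema:Action", "schema:Organization", "schema:Thing",
    "schema:PropertyValue", "schema:Permit", "prov:Activity", "prov:Entity",
    "skos:Concept", "crm:E55_Type", "crm:E42_Identifier", "rico:Identifier",
    "dcat:DataService",
}


def lint_exact_mappings(name, definition, internal_prefix="hc:"):
    """Return problems found in one class's exact_mappings (empty list if none)."""
    problems = []
    for curie in definition.get("exact_mappings") or []:
        if curie.startswith(internal_prefix):
            problems.append(f"{name}: {curie} is internal; exact_mappings need an external term")
        elif curie in NEVER_EXACT:
            problems.append(f"{name}: {curie} is a hypernym; move it to broad_mappings")
    return problems


for cls, defn in {
    "Person": {"exact_mappings": ["schema:Person"]},
    "AcademicArchive": {"exact_mappings": ["hc:AcademicArchive"]},
    "AccessApplication": {"exact_mappings": ["schema:Action"]},
}.items():
    print(cls, lint_exact_mappings(cls, defn))
```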
+ +### Common Hypernyms That Are NEVER Exact Mappings + +These terms are always BROADER than your specific class - never use them as `exact_mappings`: + +| Hypernym | What It Means | Use Instead | +|----------|---------------|-------------| +| `schema:Action` | Any action | `broad_mappings` | +| `schema:Organization` | Any organization | `broad_mappings` | +| `schema:Thing` | Anything at all | `broad_mappings` | +| `schema:PropertyValue` | Any property value | `broad_mappings` | +| `schema:Permit` | Any permit | `broad_mappings` | +| `prov:Activity` | Any activity | `broad_mappings` | +| `prov:Entity` | Any entity | `broad_mappings` | +| `skos:Concept` | Any concept | `broad_mappings` | +| `crm:E55_Type` | Any type classification | `broad_mappings` | +| `crm:E42_Identifier` | Any identifier | `broad_mappings` | +| `rico:Identifier` | Any identifier | `broad_mappings` | +| `dcat:DataService` | Any data service | `broad_mappings` | + +### Common Violations to Avoid + +❌ **WRONG**: +```yaml +AcademicArchiveRecordSetType: + exact_mappings: + - rico:RecordSetType # WRONG: This implies AcademicArchiveRecordSetType == RecordSetType +``` + +✅ **CORRECT**: +```yaml +AcademicArchiveRecordSetType: + broad_mappings: + - rico:RecordSetType # CORRECT: RecordSetType is broader +``` + +❌ **WRONG**: +```yaml +SocialMovement: + exact_mappings: + - schema:Organization # WRONG: SocialMovement is a specific TYPE of Organization +``` + +✅ **CORRECT**: +```yaml +SocialMovement: + broad_mappings: + - schema:Organization # CORRECT +``` + +❌ **WRONG**: +```yaml +AccessApplication: + exact_mappings: + - schema:Action # WRONG: Action is a hypernym +``` + +✅ **CORRECT**: +```yaml +AccessApplication: + broad_mappings: + - schema:Action # CORRECT: Action is the broader category +``` + +### How to Determine Mapping Type + +Ask these questions: + +1. **Is it the SAME thing?** → `exact_mappings` + - "Could I swap these two terms in any context without changing meaning?"
+ - If NO, it's not an exact mapping + +2. **Is the external term a PARENT category?** → `broad_mappings` + - "Is my class a TYPE OF the external term?" + - Example: AccessApplication IS-A Action + +3. **Is the external term a CHILD category?** → `narrow_mappings` + - "Is the external term a TYPE OF my class?" + - Example: Library IS-A Organization (so Organization has narrow_mapping to Library) + +4. **Is it similar but not hierarchical?** → `close_mappings` + - "Related but not equivalent or hierarchical" + +5. **Is there some other relationship?** → `related_mappings` + - "Connected in some way" + +### Verification Checklist + +- [ ] Does the `exact_mapping` represent the **exact same scope**? +- [ ] Is the external term a generic parent class (e.g., `Type`, `Concept`, `Entity`, `Action`, `Activity`, `Organization`)? → Move to `broad_mappings` +- [ ] Is the external term a specific instance or subclass? → Check `narrow_mappings` +- [ ] Is the external term the same type (class→class, property→property)? +- [ ] Would swapping the terms change the meaning? If yes, not an `exact_mapping` diff --git a/.opencode/rules/linkml/multilingual-support-requirements.md b/.opencode/rules/linkml/multilingual-support-requirements.md new file mode 100644 index 0000000000..35aee580ff --- /dev/null +++ b/.opencode/rules/linkml/multilingual-support-requirements.md @@ -0,0 +1,177 @@ +# Rule: Multilingual Support Requirements + +## Overview + +All LinkML slot files MUST include multilingual support with translations in the following languages: + +| Code | Language | Required | +|------|----------|----------| +| `nl` | Dutch | ✅ Yes | +| `de` | German | ✅ Yes | +| `fr` | French | ✅ Yes | +| `ar` | Arabic | ✅ Yes | +| `id` | Indonesian | ✅ Yes | +| `zh` | Chinese (Simplified) | ✅ Yes | +| `es` | Spanish | ✅ Yes | + +--- + +## Required Multilingual Fields + +### 1.
`alt_descriptions` + +Provide faithful translations of the English `description` field: + +```yaml +slots: + my_slot: + description: >- + To possess a specific structural arrangement or encoding standard. + alt_descriptions: + nl: >- + Het bezitten van een specifieke structurele rangschikking of coderingsstandaard. + de: >- + Das Besitzen einer spezifischen strukturellen Anordnung oder eines Kodierungsstandards. + fr: >- + Posséder un arrangement structurel spécifique ou une norme de codage. + ar: >- + امتلاك ترتيب هيكلي محدد أو معيار ترميز. + id: >- + Memiliki susunan struktural tertentu atau standar pengkodean. + zh: >- + 拥有特定的结构安排或编码标准。 + es: >- + Poseer una disposición estructural específica o un estándar de codificación. +``` + +### 2. `structured_aliases` + +Provide translated slot names/labels for each language: + +```yaml +slots: + has_format: + structured_aliases: + - literal_form: heeft formaat + predicate: EXACT_SYNONYM + in_language: nl + - literal_form: hat Format + predicate: EXACT_SYNONYM + in_language: de + - literal_form: a un format + predicate: EXACT_SYNONYM + in_language: fr + - literal_form: لديه تنسيق + predicate: EXACT_SYNONYM + in_language: ar + - literal_form: memiliki format + predicate: EXACT_SYNONYM + in_language: id + - literal_form: 具有格式 + predicate: EXACT_SYNONYM + in_language: zh + - literal_form: tiene formato + predicate: EXACT_SYNONYM + in_language: es +``` + +--- + +## Translation Guidelines + +### DO: +- Translate the semantic meaning faithfully +- Preserve technical precision +- Use natural phrasing for each language +- Keep translations concise (similar length to English) + +### DON'T: +- Paraphrase or expand beyond the original meaning +- Add information not present in the English description +- Use machine translation without review +- Skip any of the required languages + +--- + +## Complete Example + +```yaml +id:
https://nde.nl/ontology/hc/slot/catalogue +name: catalogue +title: catalogue + +slots: + catalogue: + slot_uri: crm:P70_documents + description: >- + To systematically record, classify, and organize items within a structured + inventory or database for the purposes of documentation and retrieval. + alt_descriptions: + nl: >- + Het systematisch vastleggen, classificeren en ordenen van items binnen een + gestructureerde inventaris of database voor documentatie en terugvinding. + de: >- + Das systematische Erfassen, Klassifizieren und Ordnen von Objekten in einem + strukturierten Inventar oder einer Datenbank für Dokumentation und Abruf. + fr: >- + Enregistrer, classer et organiser systématiquement des éléments dans un + inventaire structuré ou une base de données à des fins de documentation et de récupération. + ar: >- + تسجيل وتصنيف وتنظيم العناصر بشكل منهجي ضمن جرد منظم أو قاعدة بيانات لأغراض التوثيق والاسترجاع. + id: >- + Mencatat, mengklasifikasikan, dan mengatur item secara sistematis dalam + inventaris terstruktur atau database untuk tujuan dokumentasi dan pengambilan. + zh: >- + 在结构化清单或数据库中系统地记录、分类和组织项目，以便于文档编制和检索。 + es: >- + Registrar, clasificar y organizar sistemáticamente elementos dentro de un + inventario estructurado o base de datos con fines de documentación y recuperación.
+ structured_aliases: + - literal_form: catalogiseren + predicate: EXACT_SYNONYM + in_language: nl + - literal_form: katalogisieren + predicate: EXACT_SYNONYM + in_language: de + - literal_form: cataloguer + predicate: EXACT_SYNONYM + in_language: fr + - literal_form: فهرسة + predicate: EXACT_SYNONYM + in_language: ar + - literal_form: mengkatalogkan + predicate: EXACT_SYNONYM + in_language: id + - literal_form: 编目 + predicate: EXACT_SYNONYM + in_language: zh + - literal_form: catalogar + predicate: EXACT_SYNONYM + in_language: es +``` + +--- + +## Validation Checklist + +Before completing a slot file, verify: + +- [ ] `alt_descriptions` provided for all 7 languages (nl, de, fr, ar, id, zh, es) +- [ ] `structured_aliases` provided for all 7 languages +- [ ] Translations are faithful to the English original +- [ ] No language is skipped or left empty +- [ ] Arabic and Chinese characters render correctly + +--- + +## See Also + +- Rule 1: Preserve Original Descriptions (LINKML_EDITING_RULES.md) +- Rule 2: Translation Accuracy (LINKML_EDITING_RULES.md) +- Rule 3: Description Field Purity (LINKML_EDITING_RULES.md) + +--- + +**Version**: 1.0.0 +**Created**: 2026-02-03 +**Author**: OpenCODE diff --git a/.opencode/rules/linkml/no-autonomous-alias-assignment.md b/.opencode/rules/linkml/no-autonomous-alias-assignment.md new file mode 100644 index 0000000000..c0584c613a --- /dev/null +++ b/.opencode/rules/linkml/no-autonomous-alias-assignment.md @@ -0,0 +1,24 @@ +# Rule: No Autonomous Alias Assignment + +**Status**: ACTIVE +**Created**: 2026-02-10 + +## Rule + +The agent MUST NOT assign aliases to canonical slot files on its own. Only the user decides which `new/` slot files are absorbed as aliases into which canonical slots. + +## Rationale + +Alias assignment is a semantic decision that determines the conceptual scope of a canonical slot. Incorrect alias assignment conflates distinct concepts.
For example, `membership_criteria` (eligibility rules for joining) is not an alias of `has_mission` (organizational purpose), even though both relate to organizational governance. + +## What the agent MUST do + +1. When creating or polishing a canonical slot file, leave the `aliases` field empty unless the user has explicitly specified which aliases to include. +2. When processing `new/` files, present candidates to the user and wait for their alias assignment decisions. +3. Do NOT delete `new/` files until the user confirms the alias mapping. + +## What the agent MUST NOT do + +- Autonomously decide that a `new/` file should become an alias of a canonical slot. +- Add alias entries without explicit user instruction. +- Delete `new/` files based on self-determined alias assignments. diff --git a/.opencode/rules/linkml/no-deletion-from-slot-fixes.md b/.opencode/rules/linkml/no-deletion-from-slot-fixes.md new file mode 100644 index 0000000000..247e9a2b9b --- /dev/null +++ b/.opencode/rules/linkml/no-deletion-from-slot-fixes.md @@ -0,0 +1,46 @@ +# Rule: Do Not Delete From slot_fixes.yaml + +**Identifier**: `no-deletion-from-slot-fixes` +**Severity**: **CRITICAL** + +## Core Directive + +**NEVER delete entries from `slot_fixes.yaml`.** + +The `slot_fixes.yaml` file serves as the historical record and audit trail for all schema migrations. Removing entries destroys this history and violates the project's data integrity principles. + +## Workflow + +When processing a migration: + +1. **Do NOT Remove**: Never delete the entry for the slot you are working on. +2. **Update `processed`**: Instead, update the `processed` block: + * Set `status: true`. + * Set `date` to the current date (YYYY-MM-DD). + * Add a detailed `notes` string explaining what was done (e.g., "Fully migrated to [new_slot] + [Class] (Rule 53). [File].yaml updated. Slot archived."). +3. **Preserve History**: The entry must remain in the file permanently as a record of the migration. 
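The workflow above can be sketched as a small helper that operates on the parsed entry list (for example, the output of `yaml.safe_load`). It updates the matching entry's `processed` block in place and never removes anything. The function name and the `on` parameter are hypothetical. Writing the list back to `slot_fixes.yaml` is left out.

```python
from datetime import date


def mark_processed(entries, original_slot_id, notes, on=None):
    """Record a finished migration in place; never removes an entry.

    entries: the parsed list from slot_fixes.yaml (e.g. via yaml.safe_load).
    Other keys of the entry, such as revision, are left untouched.
    Raises KeyError if no entry matches original_slot_id.
    """
    on = on or date.today()
    for entry in entries:
        if entry.get("original_slot_id") == original_slot_id:
            processed = entry.get("processed") or {}
            entry["processed"] = processed
            processed["status"] = True
            processed["date"] = on.isoformat()  # YYYY-MM-DD
            processed["notes"] = notes
            return entry
    raise KeyError(original_slot_id)


entries = [{"original_slot_id": "https://nde.nl/ontology/hc/slot/has_some_slot",
            "processed": {"status": False}, "revision": []}]
mark_processed(entries, "https://nde.nl/ontology/hc/slot/has_some_slot",
               "Fully migrated to has_or_had_new_slot + NewClass (Rule 53).",
               on=date(2026, 1, 27))
print(len(entries), entries[0]["processed"]["date"])  # 1 2026-01-27
```

An unknown ID raises an error instead of silently adding or dropping entries, so the file keeps its full history.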
+ +## Rationale + +* **Audit Trail**: We need to know what was migrated, when, and how. +* **Reversibility**: If a migration introduces a bug, the record helps us understand the original state. +* **Completeness**: The file tracks the total progress of the schema refactoring project. + +## Example + +**WRONG (Deletion)**: +```yaml +# DELETED from file +# - original_slot_id: ... +``` + +**CORRECT (Update)**: +```yaml +- original_slot_id: https://nde.nl/ontology/hc/slot/has_some_slot + processed: + status: true + date: '2026-01-27' + notes: Fully migrated to has_or_had_new_slot + NewClass (Rule 53). + revision: + ... +``` diff --git a/.opencode/rules/linkml/no-duplicate-ontology-mappings.md b/.opencode/rules/linkml/no-duplicate-ontology-mappings.md new file mode 100644 index 0000000000..c5736ea840 --- /dev/null +++ b/.opencode/rules/linkml/no-duplicate-ontology-mappings.md @@ -0,0 +1,189 @@ +# Rule 52: No Duplicate Ontology Mappings + +## Summary + +Each ontology URI MUST appear in only ONE mapping category per schema element. A URI cannot simultaneously have multiple semantic relationships to the same class or slot. + +## The Problem + +LinkML provides five mapping annotation types based on SKOS vocabulary alignment: + +| Property | SKOS Predicate | Meaning | +|----------|---------------|---------| +| `exact_mappings` | `skos:exactMatch` | "This IS that" (equivalent) | +| `close_mappings` | `skos:closeMatch` | "This is very similar to that" | +| `related_mappings` | `skos:relatedMatch` | "This is conceptually related to that" | +| `narrow_mappings` | `skos:narrowMatch` | "That is MORE SPECIFIC than this" | +| `broad_mappings` | `skos:broadMatch` | "That is MORE GENERAL than this" | + +These relationships are **mutually exclusive**.
A URI cannot simultaneously: +- BE the element (`exact_mappings`) AND be broader than it (`broad_mappings`) +- Be closely similar (`close_mappings`) AND be more general (`broad_mappings`) + +## Anti-Pattern (WRONG) + +```yaml +# WRONG - schema:url appears in TWO mapping types +slots: + source_url: + slot_uri: prov:atLocation + exact_mappings: + - schema:url # Says "source_url IS schema:url" + broad_mappings: + - schema:url # Says "schema:url is MORE GENERAL than source_url" +``` + +This is a **logical contradiction**: `source_url` cannot simultaneously BE `schema:url` AND be more specific than `schema:url`. + +## Correct Pattern + +```yaml +# CORRECT - each URI appears in only ONE mapping type +slots: + source_url: + slot_uri: prov:atLocation + exact_mappings: + - schema:url # source_url IS schema:url + close_mappings: + - dcterms:source # Similar but not identical +``` + +## Decision Guide: Which Mapping to Keep + +When a URI appears in multiple categories, keep the **most precise** one: + +### Precedence Order (keep the first match) + +1. **exact_mappings** - Strongest claim: semantic equivalence +2. **close_mappings** - Strong claim: nearly equivalent +3. **narrow_mappings** / **broad_mappings** - Hierarchical relationship +4. **related_mappings** - Weakest claim: conceptual association + +### Decision Matrix + +| If URI appears in... | Keep | Remove | +|---------------------|------|--------| +| exact + broad | exact | broad | +| exact + close | exact | close | +| exact + related | exact | related | +| close + broad | close | broad | +| close + related | close | related | +| related + broad | broad | related | +| narrow + broad | investigate (see below) | the incorrect one (contradictory!) | + +### Special Case: narrow + broad + +If a URI appears in BOTH `narrow_mappings` AND `broad_mappings`, this is a **data error** - the same URI cannot be both more specific AND more general. Investigate which is correct based on the ontology definition.
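The precedence order above can be applied mechanically to one parsed class or slot definition. In the sketch below, narrow and broad share a rank, following item 3 of the list. A URI left in both of those categories is reported as a conflict instead of being resolved, because that case calls for checking the ontology definition. The function name and return shape are hypothetical choices for illustration.

```python
# Precedence ranks from the numbered list; narrow and broad share a rank.
RANK = {"exact_mappings": 0, "close_mappings": 1,
        "narrow_mappings": 2, "broad_mappings": 2, "related_mappings": 3}


def dedupe_mappings(definition):
    """Keep each CURIE only in its highest-precedence mapping category.

    Returns (cleaned, conflicts). A CURIE whose best categories are both
    narrow_mappings and broad_mappings is left in both and listed in
    conflicts, since that case needs a manual check against the ontology.
    """
    keys_by_curie = {}
    for key in RANK:
        for curie in definition.get(key) or []:
            keys_by_curie.setdefault(curie, set()).add(key)
    best = {c: min(RANK[k] for k in keys) for c, keys in keys_by_curie.items()}

    cleaned = dict(definition)
    for key in RANK:
        if key in cleaned:
            kept = [c for c in definition[key] or [] if RANK[key] == best[c]]
            if kept:
                cleaned[key] = kept
            else:
                del cleaned[key]  # category emptied by the cleanup

    conflicts = sorted(
        c for c, keys in keys_by_curie.items()
        if best[c] == 2 and {"narrow_mappings", "broad_mappings"} <= keys)
    return cleaned, conflicts


print(dedupe_mappings({"slot_uri": "prov:atLocation",
                       "exact_mappings": ["schema:url"],
                       "broad_mappings": ["schema:url"]}))
# ({'slot_uri': 'prov:atLocation', 'exact_mappings': ['schema:url']}, [])
```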
+ +## Real Examples Fixed + +### Example 1: source_url + +```yaml +# BEFORE (wrong) +slots: + source_url: + exact_mappings: + - schema:url + broad_mappings: + - schema:url # Duplicate! + +# AFTER (correct) +slots: + source_url: + exact_mappings: + - schema:url # Keep exact (strongest) + # broad_mappings removed +``` + +### Example 2: Custodian class + +```yaml +# BEFORE (wrong) +classes: + Custodian: + close_mappings: + - cpov:PublicOrganisation + narrow_mappings: + - cpov:PublicOrganisation # Duplicate! + +# AFTER (correct) +classes: + Custodian: + close_mappings: + - cpov:PublicOrganisation # Keep close (Custodian ≈ PublicOrganisation) + # narrow_mappings: use for URIs that are MORE SPECIFIC than Custodian +``` + +### Example 3: geonames_id (narrow + broad conflict) + +```yaml +# BEFORE (wrong - logical contradiction!) +slots: + geonames_id: + narrow_mappings: + - dcterms:identifier # Says dcterms:identifier is MORE SPECIFIC than geonames_id + broad_mappings: + - dcterms:identifier # Says dcterms:identifier is MORE GENERAL than geonames_id + +# AFTER (correct) +slots: + geonames_id: + broad_mappings: + - dcterms:identifier # geonames_id IS a specific type of identifier + # narrow_mappings removed (was contradictory) +``` + +## Detection Script + +Run this to find duplicate mappings in the schema: + +```python +import yaml +from pathlib import Path +from collections import defaultdict + +mapping_types = ['exact_mappings', 'close_mappings', 'related_mappings', + 'narrow_mappings', 'broad_mappings'] + +dirs = [ + Path('schemas/20251121/linkml/modules/slots'), + Path('schemas/20251121/linkml/modules/classes'), +] + +for d in dirs: + for yaml_file in d.glob('*.yaml'): + try: + with open(yaml_file) as f: + content = yaml.safe_load(f) + except Exception: + continue + if not content: + continue + + for section in ['classes', 'slots']: + items = content.get(section, {}) + if not isinstance(items, dict): + continue + for name, defn in items.items(): + if not isinstance(defn, dict): + continue + uri_to_types =
defaultdict(list) + for mt in mapping_types: + for uri in defn.get(mt, []) or []: + uri_to_types[uri].append(mt) + for uri, types in uri_to_types.items(): + if len(types) > 1: + print(f"{yaml_file}: {name} - {uri} in {types}") +``` + +## Validation Rule + +**Pre-commit check**: Before committing LinkML schema changes, run the detection script. If any duplicates are found, the commit should fail. + +## References + +- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/) +- [SKOS Mapping Properties](https://www.w3.org/TR/skos-reference/#mapping) +- Rule 50: Ontology-to-LinkML Mapping Convention (parent rule) +- Rule 51: No Hallucinated Ontology References diff --git a/.opencode/rules/linkml/no-hallucinated-ontology-references.md b/.opencode/rules/linkml/no-hallucinated-ontology-references.md new file mode 100644 index 0000000000..c3ed118a3e --- /dev/null +++ b/.opencode/rules/linkml/no-hallucinated-ontology-references.md @@ -0,0 +1,316 @@ +# Rule 51: No Hallucinated Ontology References + +**Priority**: CRITICAL +**Scope**: All LinkML schema files (`schemas/20251121/linkml/`) +**Created**: 2025-01-13 + +--- + +## Summary + +All ontology references in LinkML schema files (`class_uri`, `slot_uri`, `*_mappings`) MUST be verifiable against actual ontology files in `/data/ontology/`. References to predicates or classes that do not exist in local ontology files are considered **hallucinated** and are prohibited. + +--- + +## The Problem + +AI agents may suggest ontology mappings based on training data without verifying that: +1. The ontology file exists in `/data/ontology/` +2. The specific predicate/class exists within that ontology file +3. The prefix is declared and resolvable + +This leads to schema files containing references like `dqv:value` or `adms:status` that cannot be validated or serialized to RDF. + +--- + +## Requirements + +### 1. 
All Ontology Prefixes Must Have Local Files + +Before using a prefix (e.g., `prov:`, `schema:`, `org:`), verify the ontology file exists: + +```bash +# Check if ontology exists +ls data/ontology/ | grep -i "prov\|schema\|org" +``` + +**Available Ontologies** (as of 2025-01-13): + +| Prefix | File | Verified | +|--------|------|----------| +| `prov:` | `prov-o.ttl`, `prov.ttl` | ✅ | +| `schema:` | `schemaorg.owl` | ✅ | +| `org:` | `org.rdf` | ✅ | +| `skos:` | `skos.rdf` | ✅ | +| `dcterms:` | `dublin_core_elements.rdf` | ✅ | +| `foaf:` | `foaf.ttl` | ✅ | +| `rico:` | `RiC-O_1-1.rdf` | ✅ | +| `crm:` | `CIDOC_CRM_v7.1.3.rdf` | ✅ | +| `geo:` | `geo.ttl` | ✅ | +| `sosa:` | `sosa.ttl` | ✅ | +| `bf:` | `bibframe.rdf` | ✅ | +| `edm:` | `edm.owl` | ✅ | +| `premis:` | `premis3.owl` | ✅ | +| `dcat:` | `dcat3.ttl` | ✅ | +| `ore:` | `ore.rdf` | ✅ | +| `pico:` | `pico.ttl` | ✅ | +| `gn:` | `geonames_ontology.rdf` | ✅ | +| `time:` | `time.ttl` | ✅ | +| `locn:` | `locn.ttl` | ✅ | +| `dqv:` | `dqv.ttl` | ✅ | +| `adms:` | `adms.ttl` | ✅ | + +**NOT Available** (do not use without adding): + +| Prefix | Status | Alternative | +|--------|--------|-------------| +| `qudt:` | Only referenced in era_ontology.ttl | Use `hc:` with close_mappings annotation | + +### 2. Predicates Must Exist in Ontology Files + +Before using a predicate, verify it exists: + +```bash +# Verify predicate exists +grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl + +# Check specific predicate definition +grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl +``` + +### 3.
Use hc: Prefix for Domain-Specific Concepts + +When no standard ontology predicate exists, use the Heritage Custodian namespace: + +```yaml +# CORRECT - Use hc: with documentation +slots: + heritage_relevance_score: + slot_uri: hc:heritageRelevanceScore + description: Heritage sector relevance score (0.0-1.0) + annotations: + ontology_note: >- + No standard ontology predicate for heritage relevance scoring. + Domain-specific metric for this project. + +# WRONG - Hallucinated predicate +slots: + heritage_relevance_score: + slot_uri: dqv:heritageScore # Does not exist! +``` + +### 4. Document External References in close_mappings + +When a similar concept exists in an ontology we don't have locally, document it in `close_mappings` with a note: + +```yaml +slots: + confidence_score: + slot_uri: hc:confidenceScore + close_mappings: + - dqv:value # W3C Data Quality Vocabulary (not in local files) + annotations: + external_ontology_note: >- + dqv:value from W3C Data Quality Vocabulary would be semantically + appropriate but ontology not included in project. See + https://www.w3.org/TR/vocab-dqv/ +``` + +--- + +## Verification Workflow + +### Before Adding New Mappings + +1. **Check if ontology file exists**: + ```bash + ls data/ontology/ | grep -i "" + ``` + +2. **Search for predicate in ontology**: + ```bash + grep -l "" data/ontology/* + ``` + +3. **Verify predicate definition**: + ```bash + grep -B2 -A5 "" data/ontology/ + ``` + +4. **If not found**: Use `hc:` prefix with appropriate documentation + +### When Reviewing Existing Mappings + +Run validation script: + +```bash +# Find all slot_uri references +grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \ + grep -v "hc:" | \ + cut -d: -f3 | \ + sort -u + +# Verify each prefix has a local file +for prefix in prov schema org skos dcterms foaf rico; do + echo "Checking $prefix:" + ls data/ontology/ | grep -i "$prefix" || echo " NOT FOUND!" 
+done +``` + +--- + +## Ontology Addition Process + +If a new ontology is genuinely needed: + +1. **Download the ontology**: + ```bash + curl -L -o data/ontology/.ttl "" -H "Accept: text/turtle" + ``` + +2. **Update ONTOLOGY_CATALOG.md**: + ```bash + # Add entry to data/ontology/ONTOLOGY_CATALOG.md + ``` + +3. **Verify predicates exist**: + ```bash + grep "" data/ontology/.ttl + ``` + +4. **Update LinkML prefixes** in schema files + +--- + +## Examples + +### CORRECT: Verified Mapping + +```yaml +slots: + retrieval_timestamp: + slot_uri: prov:atTime # Verified in data/ontology/prov-o.ttl + range: datetime +``` + +### CORRECT: Domain-Specific with External Reference + +```yaml +slots: + confidence_score: + slot_uri: hc:confidenceScore # HC namespace (always valid) + range: float + close_mappings: + - dqv:value # External reference (documented, not required locally) + annotations: + ontology_note: >- + Uses HC namespace as dqv: ontology not in local files. + dqv:value would be semantically appropriate alternative. +``` + +### WRONG: Hallucinated Mapping + +```yaml +slots: + confidence_score: + slot_uri: dqv:value # INVALID - dqv: not in data/ontology/! + range: float +``` + +### WRONG: Non-Existent Predicate + +```yaml +slots: + frame_rate: + slot_uri: premis:hasFrameRate # INVALID - predicate not in premis3.owl! + range: float +``` + +--- + +## Consequences of Violation + +1. **RDF serialization fails** - Invalid prefixes cause gen-owl errors +2. **Schema validation errors** - LinkML validates prefix declarations +3. **Broken interoperability** - External systems cannot resolve URIs +4. **Data quality issues** - Semantic web tooling cannot process data + +--- + +## PREMIS Ontology Reference (premis3.owl) + +**CRITICAL**: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified. 
+ +### Valid PREMIS Classes + +``` +Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic, +Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy, +IntellectualEntity, License, Object, Organization, OutcomeStatus, Person, +PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature, +SignatureEncoding, SignificantProperties, SoftwareAgent, Statute, +StorageLocation, StorageMedium +``` + +### Valid PREMIS Properties + +``` +act, allows, basis, characteristic, citation, compositionLevel, dependency, +determinationDate, documentation, encoding, endDate, fixity, governs, +identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note, +originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale, +relationship, restriction, rightsStatus, signature, size, startDate, +storedAt, terms, validationRules, version +``` + +### Known Hallucinated PREMIS Terms (DO NOT USE) + +| Hallucinated Term | Correction | +|-------------------|------------| +| `premis:PreservationEvent` | Use `premis:Event` | +| `premis:RightsDeclaration` | Use `premis:RightsBasis` or `premis:RightsStatus` | +| `premis:hasRightsStatement` | Use `premis:rightsStatus` | +| `premis:hasRightsDeclaration` | Use `premis:rightsStatus` | +| `premis:hasRepresentation` | Use `premis:relationship` or `dcterms:hasFormat` | +| `premis:hasRelatedStatementInformation` | Use `premis:note` or `adms:status` | +| `premis:hasObjectCharacteristics` | Use `premis:characteristic` | +| `premis:rightsGranted` | Use `premis:RightsStatus` class with `premis:restriction` | +| `premis:rightsEndDate` | Use `premis:endDate` | +| `premis:linkingAgentIdentifier` | Use `premis:Agent` class | +| `premis:storageLocation` (lowercase) | Use `premis:storedAt` property or `premis:StorageLocation` class | +| `premis:hasFrameRate` | Does not exist - use `hc:frameRate` | +| `premis:environmentCharacteristic` (lowercase) | Use `premis:EnvironmentCharacteristic` (class) 
| + +### PREMIS Verification Commands + +```bash +# List all PREMIS classes +grep -E "owl:Class.*premis" data/ontology/premis3.owl | \ + sed 's/.*v3\///' | sed 's/".*//' | sort -u + +# List all PREMIS properties +grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \ + grep -oP 'v3/\K[^"]+' | sort -u + +# Verify a specific term exists +grep -c "YourTermHere" data/ontology/premis3.owl +``` + +--- + +## See Also + +- Rule 38: Slot Centralization and Semantic URI Requirements +- Rule 50: Ontology-to-LinkML Mapping Convention +- `/data/ontology/ONTOLOGY_CATALOG.md` - Available ontologies +- `.opencode/rules/slot-ontology-mapping-reference.md` - Mapping reference + +--- + +## Version History + +- **2025-01-13**: Added 7 more hallucinated PREMIS terms discovered during schema audit: + - `premis:hasRightsStatement`, `premis:hasRightsDeclaration`, `premis:hasRepresentation` + - `premis:hasRelatedStatementInformation`, `premis:rightsGranted`, `premis:rightsEndDate` + - `premis:linkingAgentIdentifier` +- **2025-01-13**: Initial creation after discovering dqv:, adms:, qudt: references without local files diff --git a/.opencode/rules/linkml/no-migration-deferral-rule.md b/.opencode/rules/linkml/no-migration-deferral-rule.md new file mode 100644 index 0000000000..cc50631574 --- /dev/null +++ b/.opencode/rules/linkml/no-migration-deferral-rule.md @@ -0,0 +1,164 @@ +# Rule 60: No Migration Deferral - Execute All slot_fixes.yaml Revisions + +## Status: MANDATORY + +## Summary + +**AI agents MUST execute ALL migrations specified in `slot_fixes.yaml`.** The pattern of marking migrations as "WELL_STRUCTURED_NO_MIGRATION_NEEDED" or similar deferral notes is **STRICTLY PROHIBITED** except in the rarest cases of genuine semantic conflict. 
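A quick way to find earlier deferrals is to scan the parsed `slot_fixes.yaml` entries for items marked done whose notes use the deferral phrases quoted in this rule. The sketch below is a keyword heuristic for triage only. Every flagged entry still needs a human read, and migration notes that happen to use one of these phrases will be flagged too. The function name and the second example entry are hypothetical.

```python
import re

# Deferral phrases quoted in this rule; a keyword heuristic for triage only.
DEFERRAL = re.compile(
    r"NO_MIGRATION_NEEDED|already has proper slot_uri|simple (?:string|enum)"
    r"|unnecessary indirection|over-engineer|current structure is"
    r"|no semantic benefit",
    re.IGNORECASE,
)


def find_deferrals(entries):
    """Return original_slot_id of entries marked done with deferral-style notes."""
    flagged = []
    for entry in entries:
        processed = entry.get("processed") or {}
        notes = processed.get("notes") or ""
        if processed.get("status") is True and DEFERRAL.search(notes):
            flagged.append(entry.get("original_slot_id"))
    return flagged


entries = [
    {"original_slot_id": "https://nde.nl/ontology/hc/slot/example_slot",
     "processed": {"status": True,
                   "notes": "WELL_STRUCTURED_NO_MIGRATION_NEEDED - slot already has proper slot_uri"}},
    {"original_slot_id": "https://nde.nl/ontology/hc/slot/other_slot",  # hypothetical
     "processed": {"status": True, "notes": "Migrated 2026-01-19 per Rule 53/56."}},
]
print(find_deferrals(entries))  # ['https://nde.nl/ontology/hc/slot/example_slot']
```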
+ +## The Problem + +Previous AI agents incorrectly deferred >95% of slot_fixes.yaml migrations with notes like: +- "WELL_STRUCTURED_NO_MIGRATION_NEEDED" +- "Already has proper slot_uri" +- "Simple enum is appropriate" +- "Would add unnecessary indirection" +- "Current structure is adequate" + +**These deferrals were WRONG.** The slot_fixes.yaml revisions were carefully curated based on: +- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment) +- Semantic correctness and consistency +- Pattern adherence (Rule 39: RiC-O naming, Rule 43: singular nouns) +- Type/Types class hierarchy design (Rule 0b) + +## The Rule + +### MANDATORY: Execute ALL Migrations + +When encountering a slot_fixes.yaml entry with `status: false`: + +1. **READ the `revision` field** - This specifies EXACTLY what to do +2. **EXECUTE the migration** - Create/update slots, classes, imports as specified +3. **UPDATE status to `true`** - Document what was done in `notes` + +### PROHIBITED: Deferral Reasoning + +The following justifications are **INVALID** and **NEVER acceptable**: + +| Invalid Justification | Why It's Wrong | +|-----------------------|----------------| +| "Already has proper slot_uri" | slot_uri handles external mapping; internal structure is separate concern | +| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity | +| "Would add unnecessary indirection" | Indirection enables reuse and future extension | +| "Creating a class would over-engineer" | Ontology design favors class-based modeling | +| "Current structure is well-designed" | Individual optimization conflicts with schema-wide patterns | +| "No semantic benefit" | Agent lacks context to make this judgment | + +### VALID: Reasons to Pause (Not Defer) + +The ONLY valid reasons to pause a migration (NOT defer indefinitely) are: + +1. **Semantic Conflict**: The proposed revision contradicts the slot's documented semantic intent +2. 
**Circular Dependency**: Migration would create an import cycle +3. **External Consumer Impact**: Breaking change affecting downstream systems +4. **Class Already Exists**: Target class exists under a different name (document and merge) + +**When pausing**: Add a `feedback` entry explaining the specific conflict and request human review. Do NOT mark as "NO_MIGRATION_NEEDED". + +## Decision Tree + +``` +Is there a slot_fixes.yaml entry with status: false? +├─ YES → Read the revision field +│ ├─ Does revision specify slots/classes to create? +│ │ └─ YES → EXECUTE THE MIGRATION (mandatory) +│ └─ Is there a genuine semantic conflict? +│ ├─ NO → EXECUTE THE MIGRATION (mandatory) +│ └─ YES → Document conflict in feedback, request human review +│ (Do NOT mark as "no migration needed") +└─ NO → Nothing to do +``` + +## Examples + +### WRONG: Deferral Note + +```yaml +- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot + revision: + - label: has_or_had_example + type: slot + - label: Example + type: class + processed: + status: true # WRONG - marked true without doing work + notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED - slot already has proper + slot_uri and the current structure is adequate" # INVALID +``` + +### CORRECT: Execute Migration + +```yaml +- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot + revision: + - label: has_or_had_example + type: slot + - label: Example + type: class + processed: + status: true + timestamp: '2026-01-19T12:00:00Z' + notes: 'Migrated 2026-01-19 per Rule 53/56.
+ - Created has_or_had_example.yaml slot file + - Created Example.yaml class file + - Updated ClassA.yaml, ClassB.yaml to use new slot + - Archived: modules/slots/archive/example_slot_archived_20260119.yaml' +``` + +### CORRECT: Pause with Genuine Conflict + +```yaml +- original_slot_id: https://nde.nl/ontology/hc/slot/conflicting_slot + revision: + - label: has_or_had_foo + type: slot + processed: + status: false # Correctly left false + notes: '' + feedback: + - timestamp: '2026-01-19T12:00:00Z' + user: opencode-claude + done: false + comment: | + PAUSED FOR HUMAN REVIEW - Genuine semantic conflict detected: + - Revision specifies has_or_had_foo (temporal relationship) + - But slot is used for immutable birth dates (should be has_*) + - Request clarification on intended temporal semantics +``` + +## Statistics Context + +The slot_fixes.yaml file contains 527 migration entries. Analysis of previous agent behavior: + +- **Incorrectly deferred**: >95% of entries marked "NO_MIGRATION_NEEDED" +- **Actually needing deferral**: <5% (genuine semantic conflicts) +- **Required action**: Execute ALL migrations except those with documented semantic conflicts + +## Related Rules + +- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE +- **Rule 56**: Semantic Consistency Over Simplicity - Always Execute Revisions +- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE +- **Rule 58**: Feedback vs Revision Distinction + +## Anti-Patterns Checklist + +Before marking ANY migration as complete without execution, verify: + +- [ ] Did I actually create the specified slots? +- [ ] Did I actually create the specified classes? +- [ ] Did I update all class files that use this slot? +- [ ] Did I archive the old slot file? +- [ ] Is my "notes" field documenting actual work done, not a deferral excuse? + +If any answer is "no", the migration is NOT complete. + +## Consequences + +Agents that defer migrations without genuine semantic conflict: +1. 
Create technical debt requiring human cleanup +2. Delay schema consistency improvements +3. Waste curator time reviewing false "completions" +4. Undermine trust in AI-assisted schema work + +**Execute the migrations. Do not defer.** diff --git a/.opencode/rules/linkml/no-ontology-prefix-in-slot-names.md b/.opencode/rules/linkml/no-ontology-prefix-in-slot-names.md new file mode 100644 index 0000000000..e80db8d3f9 --- /dev/null +++ b/.opencode/rules/linkml/no-ontology-prefix-in-slot-names.md @@ -0,0 +1,215 @@ +# Rule 42: No Ontology Prefixes in Slot Names + +**CRITICAL**: LinkML slot names MUST NOT include ontology namespace prefixes. Ontology references belong in mapping properties, NOT in element names. + +--- + +## 1. The Problem + +Slot names like `rico_has_or_had_holder` or `skos_broader` violate separation of concerns: + +- **Slot names** should describe the semantic meaning in plain, readable terms +- **Ontology mappings** belong in `slot_uri`, `exact_mappings`, `close_mappings`, `related_mappings`, `narrow_mappings`, `broad_mappings` + +Embedding ontology prefixes in names: +1. Creates coupling between naming and specific ontology versions +2. Reduces readability for non-ontology experts +3. Duplicates information already in mapping properties +4. Makes future ontology migrations harder + +--- + +## 2. Correct Pattern + +### Use Descriptive Names + Mapping Properties + +```yaml +# CORRECT: Clean name with ontology reference in slot_uri +slots: + record_holder: + description: The custodian that holds or held this record set. + slot_uri: rico:hasOrHadHolder + exact_mappings: + - rico:hasOrHadHolder + close_mappings: + - schema:holdingArchive + range: Custodian +``` + +### WRONG: Ontology Prefix in Name + +```yaml +# WRONG: Ontology prefix embedded in slot name +slots: + rico_has_or_had_holder: # BAD - "rico_" prefix + description: The custodian that holds or held this record set. + slot_uri: rico:hasOrHadHolder + range: string +``` + +--- + +## 3. 
Prohibited Prefixes in Slot Names + +The following prefixes MUST NOT appear at the start of slot names: + +| Prefix | Ontology | Example Violation | +|--------|----------|-------------------| +| `rico_` | Records in Contexts | `rico_organizational_principle` | +| `skos_` | SKOS | `skos_broader`, `skos_narrower` | +| `schema_` | Schema.org | `schema_name` | +| `dcterms_` | Dublin Core | `dcterms_created` | +| `dct_` | Dublin Core | `dct_identifier` | +| `prov_` | PROV-O | `prov_generated_by` | +| `org_` | W3C Organization | `org_has_member` | +| `crm_` | CIDOC-CRM | `crm_carried_out_by` | +| `foaf_` | FOAF | `foaf_knows` | +| `owl_` | OWL | `owl_same_as` | +| `rdf_` | RDF | `rdf_type` | +| `rdfs_` | RDFS | `rdfs_label` | +| `cpov_` | CPOV | `cpov_public_organisation` | +| `tooi_` | TOOI | `tooi_overheidsorganisatie` | +| `bf_` | BIBFRAME | `bf_title` | +| `edm_` | Europeana | `edm_provided_cho` | + +--- + +## 4. Migration Examples + +### Example 1: RiC-O Slots + +```yaml +# BEFORE (wrong) +rico_has_or_had_holder: + slot_uri: rico:hasOrHadHolder + range: string + +# AFTER (correct) +record_holder: + description: Reference to the custodian that holds or held this record set. + slot_uri: rico:hasOrHadHolder + exact_mappings: + - rico:hasOrHadHolder + range: Custodian +``` + +### Example 2: SKOS Slots + +```yaml +# BEFORE (wrong) +skos_broader: + slot_uri: skos:broader + range: uriorcurie + +# AFTER (correct) +broader_concept: + description: A broader concept in the hierarchy. + slot_uri: skos:broader + exact_mappings: + - skos:broader + range: uriorcurie +``` + +### Example 3: RiC-O Organizational Principle + +```yaml +# BEFORE (wrong) +rico_organizational_principle: + slot_uri: rico:hasRecordSetType + range: string + +# AFTER (correct) +organizational_principle: + description: The organizational principle (fonds, series, collection) for this record set. + slot_uri: rico:hasRecordSetType + exact_mappings: + - rico:hasRecordSetType + range: string +``` + +--- + +## 5. 
Exceptions + +### 5.1 Identifier Slots + +Slots that store **identifiers from external systems** may include system names (not ontology prefixes): + +```yaml +# ALLOWED: External system identifier +wikidata_id: + description: Wikidata entity identifier (Q-number). + slot_uri: schema:identifier + range: string + pattern: "^Q[0-9]+$" + +# ALLOWED: External system identifier +viaf_id: + description: VIAF identifier for authority control. + slot_uri: schema:identifier + range: string +``` + +### 5.2 Internal Namespace Force Slots + +Technical slots for namespace generation are prefixed with `internal_`: + +```yaml +# ALLOWED: Technical workaround slot +internal_wd_namespace_force: + description: Internal slot to force WD namespace generation. Do not use. + slot_uri: wd:Q35120 + range: string +``` + +--- + +## 6. Validation + +Run this command to find violations: + +```bash +cd schemas/20251121/linkml/modules/slots +ls -1 *.yaml | grep -E "^(rico_|skos_|schema_|dcterms_|dct_|prov_|org_|crm_|foaf_|owl_|rdf_|rdfs_|cpov_|tooi_|bf_|edm_)" +``` + +Expected output: No files (after migration) + +--- + +## 7. 
Rationale + +### LinkML Best Practices + +LinkML provides dedicated properties for ontology alignment: + +| Property | Purpose | Example | +|----------|---------|---------| +| `slot_uri` | Primary ontology predicate | `slot_uri: rico:hasOrHadHolder` | +| `exact_mappings` | Semantically equivalent predicates | `exact_mappings: [schema:holdingArchive]` | +| `close_mappings` | Nearly equivalent predicates | `close_mappings: [dc:creator]` | +| `related_mappings` | Related but different predicates | `related_mappings: [prov:wasAttributedTo]` | +| `narrow_mappings` | More specific predicates | `narrow_mappings: [rico:hasInstantiation]` | +| `broad_mappings` | More general predicates | `broad_mappings: [schema:about]` | + +See: https://linkml.io/linkml-model/latest/docs/mappings/ + +### Clean Separation of Concerns + +- **Names**: Human-readable, domain-focused terminology +- **URIs**: Machine-readable, ontology-specific identifiers +- **Mappings**: Cross-ontology alignment documentation + +This separation allows: +1. Renaming slots without changing ontology bindings +2. Adding new ontology mappings without renaming slots +3. Clear documentation of semantic relationships +4. Easier maintenance and evolution + +--- + +## 8. 
See Also + +- **Rule 38**: Slot Centralization and Semantic URI Requirements +- **Rule 39**: Slot Naming Convention (RiC-O Style) - for temporal naming patterns +- LinkML Mappings Documentation: https://linkml.io/linkml-model/latest/docs/mappings/ diff --git a/.opencode/rules/linkml/no-rough-edits-in-schema.md b/.opencode/rules/linkml/no-rough-edits-in-schema.md new file mode 100644 index 0000000000..d618bd072e --- /dev/null +++ b/.opencode/rules/linkml/no-rough-edits-in-schema.md @@ -0,0 +1,61 @@ +# Rule: No Rough Edits in Schema Files + +**Identifier**: `no-rough-edits-in-schema` +**Severity**: **CRITICAL** + +## Core Directive + +**DO NOT** perform rough, imprecise, or bulk text substitutions (like `sed -i` or regex-based python scripts) on LinkML schema files (`schemas/*/linkml/`) without guaranteeing structural integrity. + +**YOU MUST**: +* ✅ Use proper YAML parsers/dumpers if modifying structure programmatically. +* ✅ Manually verify edits if using text replacement. +* ✅ Ensure indentation and nesting are preserved exactly. +* ✅ Respect comments and ordering (which parsers often destroy, so careful text editing is sometimes necessary, but it must be PRECISE). + +## Rationale + +LinkML schemas are highly structured YAML files where indentation and nesting semantics are critical. Rough edits often cause: +* **Duplicate keys** (e.g., leaving a property behind after deleting its parent key). +* **Invalid indentation** (breaking the parent-child relationship). +* **Silent corruption** (valid YAML but wrong semantics). + +## Examples + +### ❌ Anti-Pattern: Rough Deletion + +Deleting lines containing a string without checking context: + +```python +# WRONG: Deleting lines blindly +for line in lines: + if "some_slot" in line: + continue # Deletes the line, but might leave children orphaned!
+ new_lines.append(line) +``` + +**Resulting Corruption**: +```yaml +# Original +slots: + some_slot: + range: string + +# Corrupted (orphaned child) +slots: + range: string # INVALID! +``` + +### ✅ Correct Pattern: Structural Awareness + +If removing a slot reference, ensure you remove the entire list item or key-value block. + +```python +import re +# BETTER: Skip only an exact list item (`- some_slot`), not every line that mentions it +for line in lines: + if re.match(r'^\s*-\s*some_slot\s*$', line): + continue + new_lines.append(line) +``` + +## Application + +This rule applies to ALL files in `schemas/20251121/linkml/` and future versions. diff --git a/.opencode/rules/linkml/no-version-indicators-in-names-rule.md b/.opencode/rules/linkml/no-version-indicators-in-names-rule.md new file mode 100644 index 0000000000..e2105984ef --- /dev/null +++ b/.opencode/rules/linkml/no-version-indicators-in-names-rule.md @@ -0,0 +1,53 @@ +# Rule: No Version Indicators in Names + +## 🚨 Critical + +Do not include version identifiers in **class names**, **slot names**, or **enum names**. + +Version tags in semantic names create churn, break reuse, and force unnecessary migrations. + +## The Rule + +1. Use stable semantic names for LinkML elements. + - ✅ `DigitalPlatform` + - ❌ `DigitalPlatformV2` + +2. If a model evolves, keep the name and update metadata/provenance. + - Track revision in changelog, annotations, or transformation metadata. + - Do not encode `v2`, `v3`, `_2026`, `beta`, `final` in the element name. + +3. Apply this to all naming surfaces: + - `classes:` keys + - `slots:` keys + - `enums:` keys + - `name:` values in module files + +## Allowed Versioning Locations + +- File-level changelog/comments +- Dedicated metadata classes/slots (e.g., transformation metadata) +- External release tags (git tags, manifest versions) + +## Migration Guidance + +When you encounter versioned names: + +1. Rename semantic elements to stable names. +2. Update references/imports/usages accordingly. +3. Preserve provenance of the migration in comments/annotations.
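Finding existing violations before renaming them can be partly automated. A minimal sketch, assuming names are checked one at a time as plain strings; the `version_tags` helper and its patterns are hypothetical and only cover the tags this rule lists (`v2`, `v3`, `_2026`, `beta`, `final`), so a hit is a candidate for review rather than an automatic rename.

```python
import re

# Heuristic patterns for the version tags named in this rule.
# Hits are candidates for human review, not automatic renames.
VERSION_PATTERNS = [
    re.compile(r"(?:^|_)[vV]\d+(?=$|_)"),                          # snake_case: has_name_v2
    re.compile(r"(?<=[a-z0-9])V\d+(?=$|[A-Z_])"),                  # CamelCase: DigitalPlatformV2
    re.compile(r"(?:^|_)(?:19|20)\d{2}(?=$|_)"),                   # year segment: collection_2026
    re.compile(r"(?:^|_)(?:beta|final)(?=$|_)", re.IGNORECASE),    # snake_case: record_set_beta
    re.compile(r"(?<=[a-z0-9])(?:Beta|Final)(?=$|[A-Z])"),         # CamelCase: PlatformBeta
]


def version_tags(name: str) -> list[str]:
    """Return the version-like fragments found in a class, slot, or enum name."""
    hits = []
    for pattern in VERSION_PATTERNS:
        hits.extend(match.group(0).strip("_") for match in pattern.finditer(name))
    return hits
```

For example, `version_tags("DigitalPlatformV2TransformationMetadata")` yields `["V2"]`, while `DigitalPlatformTransformationMetadata` yields nothing. A token such as `ipv4` in `ipv4_address` is not flagged, because a version tag must start its own snake_case segment or CamelCase word.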
+ +## Examples + +✅ Correct: +```yaml +classes: + DigitalPlatformTransformationMetadata: + description: Metadata about record transformation steps. +``` + +❌ Wrong: +```yaml +classes: + DigitalPlatformV2TransformationMetadata: + description: Metadata about V2 transformation. +``` diff --git a/.opencode/rules/linkml/ontology-detection-rule.md b/.opencode/rules/linkml/ontology-detection-rule.md new file mode 100644 index 0000000000..9685e747cf --- /dev/null +++ b/.opencode/rules/linkml/ontology-detection-rule.md @@ -0,0 +1,15 @@ +# Rule: Ontology Detection vs Heuristics + +## Summary +When detecting classes and predicates in `data/ontology/` or external ontology files, you must **read the actual ontology definitions** (e.g., RDF, OWL, TTL files) to determine if a term is a Class or a Property. Do not rely on naming heuristics (like "Capitalized means Class"). + +## Detail +* **Verification**: Always read the source ontology file or use a semantic lookup tool to verify the `rdf:type` of an entity. + * If `rdf:type` is `owl:Class` or `rdfs:Class`, it is a **Class**. + * If `rdf:type` is `rdf:Property`, `owl:ObjectProperty`, or `owl:DatatypeProperty`, it is a **Property**. +* **Avoid Heuristics**: Do not assume that `skos:Concept` is a class just because it looks like one (it happens to be one, but check anyway), or that `schema:name` is a property just because it's lowercase. Many ontologies have inconsistent naming conventions (e.g., `schema:Person` vs `foaf:Person`). +* **Strictness**: If the ontology file is not available locally, attempt to fetch it or consult authoritative documentation before guessing. + +## Violation Examples +* Assuming `ex:MyTerm` is a class because it starts with an uppercase letter without checking the `.ttl` file. +* Mapping a LinkML slot to `schema:Thing` (a Class) instead of a Property because you guessed based on the name.
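The verification step can be done mechanically once an ontology is available as N-Triples (tools such as rdflib or Apache Jena can convert TTL/OWL/RDF files to that format). The sketch below is a hypothetical helper, not existing project code: it reads only IRI-to-IRI `rdf:type` triples and reports each term as `Class`, `Property`, or `Ambiguous` from its declared type alone, never from its spelling. Literals, blank nodes, and terms typed only with other classes are ignored.

```python
import re
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CLASS_TYPES = {
    "http://www.w3.org/2002/07/owl#Class",
    "http://www.w3.org/2000/01/rdf-schema#Class",
}
PROPERTY_TYPES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "http://www.w3.org/2002/07/owl#ObjectProperty",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
}

# Matches only IRI-to-IRI triples: <s> <p> <o> .  (literals and blank nodes are skipped)
IRI_TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def declared_kinds(ntriples: str) -> dict[str, str]:
    """Map each typed subject IRI to 'Class', 'Property', or 'Ambiguous' using rdf:type only."""
    types = defaultdict(set)
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line.strip())
        if match and match.group(2) == RDF_TYPE:
            types[match.group(1)].add(match.group(3))
    kinds = {}
    for subject, declared in types.items():
        is_class = bool(declared & CLASS_TYPES)
        is_property = bool(declared & PROPERTY_TYPES)
        if is_class and is_property:
            kinds[subject] = "Ambiguous"
        elif is_class:
            kinds[subject] = "Class"
        elif is_property:
            kinds[subject] = "Property"
    return kinds
```

Run against a converted ontology file, a capitalized term declared as `owl:ObjectProperty` comes back as `Property`, which is exactly the case the naming heuristic gets wrong.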
diff --git a/.opencode/rules/linkml/ontology-to-linkml-mapping-convention.md b/.opencode/rules/linkml/ontology-to-linkml-mapping-convention.md new file mode 100644 index 0000000000..ec73290252 --- /dev/null +++ b/.opencode/rules/linkml/ontology-to-linkml-mapping-convention.md @@ -0,0 +1,306 @@ +# Rule 50: Ontology-to-LinkML Mapping Convention + +🚨 **CRITICAL**: When mapping base ontology classes and predicates to LinkML schema elements, use LinkML's dedicated mapping properties as documented at https://linkml.io/linkml-model/latest/docs/mappings/ + +--- + +## 1. What "LinkML Mapping" Means in This Project + +**"LinkML mapping"** refers specifically to: +1. Connecting LinkML schema elements (classes, slots, enums) to external ontology URIs +2. Using LinkML's built-in mapping properties (`class_uri`, `slot_uri`, `*_mappings`) +3. Following SKOS-based vocabulary alignment standards + +**LinkML mapping does NOT mean**: +- Creating arbitrary crosswalks in spreadsheets +- Writing prose descriptions of how concepts relate +- Inventing custom `@context` JSON-LD mappings outside the schema + +--- + +## 2.
LinkML Mapping Property Reference + +### Primary Identity Properties + +| Property | Applies To | Purpose | Example | +|----------|-----------|---------|---------| +| `class_uri` | Classes | Primary RDF class URI | `class_uri: ore:Aggregation` | +| `slot_uri` | Slots | Primary RDF predicate URI | `slot_uri: rico:hasOrHadHolder` | +| `enum_uri` | Enums | Primary URI identifying the enum | `enum_uri: hc:PlatformTypeEnum` | + +### SKOS-Based Mapping Properties + +These properties express **semantic relationships** to external ontology terms: + +| Property | SKOS Predicate | Meaning | Use When | +|----------|---------------|---------|----------| +| `exact_mappings` | `skos:exactMatch` | **IDENTICAL meaning** | Different ontology, **SAME semantics** (interchangeable) | +| `close_mappings` | `skos:closeMatch` | Very similar meaning | Similar but **NOT interchangeable** | +| `related_mappings` | `skos:relatedMatch` | Semantically related | Associative, non-hierarchical relationship | +| `narrow_mappings` | `skos:narrowMatch` | External term is more specific | This element is broader than the external term | +| `broad_mappings` | `skos:broadMatch` | External term is more general | This element is narrower than the external term | + +### ⚠️ CRITICAL: `exact_mappings` Requires PRECISE Semantic Equivalence + +**`exact_mappings` means the terms are INTERCHANGEABLE** - you could substitute one for the other in any context without changing meaning. + +**Requirements for `exact_mappings`**: +1. **Same definition**: Both terms must have equivalent definitions +2. **Same scope**: Both terms cover the same set of instances +3. **Same constraints**: Same domain/range restrictions apply +4.
**Bidirectional**: If A exactMatch B, then B exactMatch A + +**DO NOT use `exact_mappings` when**: +- One term is a subset of the other (use `narrow_mappings`/`broad_mappings`) +- Terms are similar but have different scopes (use `close_mappings`) +- Terms are related but not equivalent (use `related_mappings`) +- You're uncertain about equivalence (default to `close_mappings`) + +**Example - WRONG**: +```yaml +# PersonProfile is NOT equivalent to foaf:Person +# PersonProfile is a structured document ABOUT a person, not the person themselves +exact_mappings: + - foaf:Person # ❌ WRONG - different semantics! +``` + +**Example - CORRECT**: +```yaml +# foaf:Person and schema:Person ARE equivalent +# Both define "a person" with the same scope +exact_mappings: + - schema:Person # ✅ CORRECT - truly equivalent +``` + +--- + +## 3. Mapping Workflow: Ontology → LinkML + +### Step 1: Identify External Ontology Class/Predicate + +Search base ontology files in `/data/ontology/`: + +```bash +# Find aggregation-related classes +rg -i "aggregation|aggregate" data/ontology/*.ttl data/ontology/*.rdf data/ontology/*.owl + +# Check specific ontology +rg "rdfs:Class|owl:Class" data/ontology/ore.rdf | grep -i "aggregation" +``` + +### Step 2: Determine Mapping Strength + +| Scenario | Mapping Property | +|----------|------------------| +| **This IS that ontology class** (identity) | `class_uri` | +| **Equivalent in another vocabulary** | `exact_mappings` | +| **Similar concept, different scope** | `close_mappings` | +| **Related but different granularity** | `narrow_mappings` / `broad_mappings` | +| **Conceptually related** | `related_mappings` | + +### Step 3: Document Mapping in LinkML Schema + +#### For Classes + +```yaml +classes: + DataAggregator: + class_uri: ore:Aggregation # Primary identity - THIS IS an ORE Aggregation + description: | + A platform that harvests and STORES copies of metadata/content, causing data duplication.
+ + ore:Aggregation - "A set of related resources grouped together." + + Mapped to ORE because aggregators create aggregations of harvested metadata. + close_mappings: + - dcat:Catalog # Similar (collects datasets) but broader scope + narrow_mappings: + - edm:EuropeanaAggregation # Europeana's specialization: narrower, so not an exact match + - edm:ProvidedCHO # More specific (single cultural object) +``` + +#### For Slots + +```yaml +slots: + aggregates_from: + slot_uri: ore:aggregates # Primary predicate + description: | + Institutions whose data is aggregated (harvested and stored) by this platform. + + ore:aggregates - "Aggregations assert ore:aggregates relationships." + exact_mappings: + - edm:aggregatedCHO # Europeana equivalent + range: HeritageCustodian + multivalued: true +``` + +--- + +## 4. Aggregation vs. Linking: A Mapping Example + +This project requires **semantic precision** in distinguishing: + +| Concept | Primary Mapping | Semantic Pattern | +|---------|-----------------|------------------| +| **Data Aggregation** | `ore:Aggregation` | Data is COPIED to aggregator's server | +| **Linking/Federation** | `dcat:DataService` | Data REMAINS at source; only links provided | + +### Aggregation Pattern (Data Duplication) + +```yaml +classes: + DataAggregator: + class_uri: ore:Aggregation + description: | + Harvests and stores copies of metadata from partner institutions. + + Key semantic: Data DUPLICATION occurs - the aggregator maintains its own copy. + + Examples: Europeana, DPLA, Archives Portal Europe + narrow_mappings: + - edm:EuropeanaAggregation + annotations: + data_storage_pattern: AGGREGATION + causes_data_duplication: true +``` + +### Linking Pattern (Single Source of Truth) + +```yaml +classes: + FederatedDiscoveryPortal: + class_uri: dcat:DataService + description: | + Provides unified search across multiple institutions but LINKS to original sources. + + Key semantic: NO data duplication - users are redirected to source institutions.
+ + Data remains at partner institutions' platforms (single source of truth). + close_mappings: + - schema:SearchAction # The search functionality + related_mappings: + - ore:Aggregation # Related but crucially different + annotations: + data_storage_pattern: LINKING + causes_data_duplication: false +``` + +### Linking Properties from EDM + +Use `edm:isShownAt` and `edm:isShownBy` to express links to source: + +```yaml +slots: + is_shown_at: + slot_uri: edm:isShownAt + description: | + Unambiguous URL to the digital object on the provider's web site + in its full information context. + + edm:isShownAt - "The URL of a web view of the object in full context." + + This property LINKS to the source institution - no data duplication. + range: uri + + is_shown_by: + slot_uri: edm:isShownBy + description: | + Direct URL to the object in best available resolution on provider's site. + + edm:isShownBy - "The URL of the object itself (not the context page)." + range: uri +``` + +--- + +## 5. Complete Mapping Documentation Template + +When creating or updating a class with ontology mappings: + +```yaml +classes: + MyNewClass: + # === PRIMARY IDENTITY === + class_uri: {prefix}:{ClassName} # The ontology class this IS + + # === DESCRIPTION WITH ONTOLOGY REFERENCE === + description: | + {Human-readable description of what this class represents} + + {Ontology}: {class} - "{Definition from ontology documentation}" + + Mapping rationale: + - Chosen because: {why this ontology class fits} + - Not using X because: {why alternatives were rejected} + + # === SKOS-BASED MAPPINGS === + exact_mappings: + - {prefix}:{EquivalentClass} # Same meaning, different vocabulary + close_mappings: + - {prefix}:{SimilarClass} # Very similar but not identical + narrow_mappings: + - {prefix}:{MoreSpecificClass} # External is narrower than ours + broad_mappings: + - {prefix}:{MoreGeneralClass} # External is broader than ours + related_mappings: + - {prefix}:{RelatedClass} # Conceptually related + + #
=== OPTIONAL ANNOTATIONS === + annotations: + ontology_source: "{Full name of source ontology}" + ontology_version: "{Version if applicable}" + mapping_confidence: "high|medium|low" + mapping_notes: "{Additional context}" +``` + +--- + +## 6. Validation Checklist + +Before committing ontology mappings: + +- [ ] `class_uri` / `slot_uri` points to a real URI in `data/ontology/` files +- [ ] Description includes ontology definition (quoted from source) +- [ ] Mapping rationale documented for non-obvious choices +- [ ] `exact_mappings` used ONLY for truly equivalent terms +- [ ] `close_mappings` documented with difference explanation +- [ ] All prefixes declared in schema's `prefixes:` block +- [ ] Prefixes resolve to valid ontology namespaces + +--- + +## 7. Common Ontology Prefixes for Mappings + +| Prefix | Namespace | Ontology | Use For | +|--------|-----------|----------|---------| +| `ore:` | `http://www.openarchives.org/ore/terms/` | OAI-ORE | Aggregation patterns | +| `edm:` | `http://www.europeana.eu/schemas/edm/` | Europeana Data Model | Cultural heritage aggregation | +| `dcat:` | `http://www.w3.org/ns/dcat#` | DCAT | Data catalogs, services | +| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | Records in Contexts | Archival description | +| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | CIDOC-CRM | Cultural heritage events | +| `schema:` | `http://schema.org/` | Schema.org | Web semantics | +| `skos:` | `http://www.w3.org/2004/02/skos/core#` | SKOS | Concepts, labels | +| `dcterms:` | `http://purl.org/dc/terms/` | Dublin Core | Metadata properties | +| `prov:` | `http://www.w3.org/ns/prov#` | PROV-O | Provenance | +| `org:` | `http://www.w3.org/ns/org#` | W3C Organization | Organizations | +| `foaf:` | `http://xmlns.com/foaf/0.1/` | FOAF | People, agents | + +--- + +## See Also + +- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/) +- [LinkML URIs and Mappings 
Guide](https://linkml.io/linkml/schemas/uris-and-mappings.html) +- [LinkML class_uri Reference](https://linkml.io/linkml-model/latest/docs/class_uri/) +- [LinkML slot_uri Reference](https://linkml.io/linkml-model/latest/docs/slot_uri/) +- Rule 1: Ontology Files Are Your Primary Reference +- Rule 38: Slot Centralization and Semantic URI Requirements +- Rule 42: No Ontology Prefixes in Slot Names + +--- + +**Version**: 1.0.0 +**Created**: 2026-01-12 +**Author**: OpenCODE diff --git a/.opencode/rules/linkml/polished-slot-storage-location.md b/.opencode/rules/linkml/polished-slot-storage-location.md new file mode 100644 index 0000000000..6a5974f265 --- /dev/null +++ b/.opencode/rules/linkml/polished-slot-storage-location.md @@ -0,0 +1,45 @@ +# Rule: Polished Slot Storage Location + +## Summary + +Polished (refactored) canonical slot files MUST be stored in the parent `slots/` directory: + +``` +schemas/20251121/linkml/modules/slots/ +``` + +They must **NOT** be stored in the `20260202_matang/` subdirectory. + +## Rationale + +The `20260202_matang/` subdirectory and its `new/` child hold **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
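The destination rule can be expressed as a small path calculation. This is a hypothetical helper, assuming paths are given relative to the repository root; it only computes where the polished file goes and deliberately does not delete anything, because removing files from `20260202_matang/new/` still requires user confirmation.

```python
from pathlib import PurePosixPath

# Canonical directory for polished slot files, as stated in this rule
SLOTS_DIR = PurePosixPath("schemas/20251121/linkml/modules/slots")


def polished_target(source: str) -> PurePosixPath:
    """Return the canonical path for a polished slot file, given where it currently lives."""
    path = PurePosixPath(source)
    if SLOTS_DIR not in path.parents:
        raise ValueError(f"{source} is not under {SLOTS_DIR}")
    # Drafts in 20260202_matang/ or 20260202_matang/new/ and files already in slots/
    # all end up flat in slots/, keeping only the file name (e.g. has_pattern.yaml)
    return SLOTS_DIR / path.name
```

A file already in `slots/` maps to itself, which matches the instruction to overwrite previously polished files in place.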
+ +## Directory Structure + +``` +schemas/20251121/linkml/modules/slots/ +├── *.yaml ← Polished canonical slot files go HERE +└── 20260202_matang/ + ├── *.yaml ← Draft/unpolished canonical slots (staging area) + └── new/ + └── *.yaml ← Raw/draft slot definitions pending triage +``` + +## Rule + +- When polishing a slot file, write the result to `schemas/20251121/linkml/modules/slots/{slot_name}.yaml` +- If the source file was in `20260202_matang/`, remove it from there after writing to `slots/` +- If the source file was in `20260202_matang/new/`, it should only be deleted after user confirmation of alias absorption (per the no-autonomous-alias-assignment rule) +- If a file already exists in `slots/` (i.e., it was previously polished in an earlier session), overwrite it in place + +## Examples + +**CORRECT:** +``` +schemas/20251121/linkml/modules/slots/has_pattern.yaml ← polished file +``` + +**WRONG:** +``` +schemas/20251121/linkml/modules/slots/20260202_matang/has_pattern.yaml ← should not be here after polishing +``` diff --git a/.opencode/rules/linkml/preserve-bespoke-slots-until-refactoring.md b/.opencode/rules/linkml/preserve-bespoke-slots-until-refactoring.md new file mode 100644 index 0000000000..4df0810fe3 --- /dev/null +++ b/.opencode/rules/linkml/preserve-bespoke-slots-until-refactoring.md @@ -0,0 +1,32 @@ +# Rule: Preserve Bespoke Slots Until Refactoring + +**Identifier**: `preserve-bespoke-slots-until-refactoring` +**Severity**: **CRITICAL** + +## Core Directive + +**DO NOT remove or migrate "additional" bespoke slots during generic migration passes unless they are the specific target of the current task.** + +## Context + +When migrating a specific slot (e.g., `has_approval_date`), you may encounter other bespoke or legacy slots in the same class file (e.g., `innovation_budget`, `operating_budget`). + +**YOU MUST**: +* ✅ Migrate ONLY the specific slot you were instructed to work on.
+* ✅ Leave other bespoke slots exactly as they are. +* ✅ Focus strictly on the current migration target. + +**YOU MUST NOT**: +* ❌ Proactively migrate "nearby" slots just because they look like they need refactoring. +* ❌ Remove slots that seem unused or redundant without specific instruction. +* ❌ "Clean up" the class file by removing legacy attributes. + +## Rationale + +Refactoring is a separate, planned phase. Mixing opportunistic refactoring with systematic slot migration increases the risk of regression and makes changes harder to review. "We will refactor those later." + +## Workflow + +1. **Identify Target**: Identify the specific slot(s) assigned for migration (from `slot_fixes.yaml` or user prompt). +2. **Execute Migration**: Apply changes ONLY for those slots. +3. **Ignore Others**: Do not touch other slots in the file, even if they violate other rules (like Rule 39 or Rule 53). Those will be handled in their own dedicated tasks. diff --git a/.opencode/rules/linkml/semantic-consistency-over-simplicity.md b/.opencode/rules/linkml/semantic-consistency-over-simplicity.md new file mode 100644 index 0000000000..de80a74858 --- /dev/null +++ b/.opencode/rules/linkml/semantic-consistency-over-simplicity.md @@ -0,0 +1,190 @@ +# Rule 56: Semantic Consistency Over Simplicity - Always Execute slot_fixes.yaml Revisions + +🚨 **CRITICAL**: When `slot_fixes.yaml` specifies a revision, agents MUST execute it. Perceived simplicity of the existing slot is NOT a valid reason to reject a migration.
+ +## The Core Problem + +Previous agents marked migrations as "NO MIGRATION NEEDED" or "WELL_STRUCTURED_NO_MIGRATION_NEEDED" citing reasons like: +- "simple enum appropriate" +- "would add unnecessary indirection" +- "already has proper slot_uri" +- "string range is sufficient for this use case" +- "well-structured, no migration needed" +- "current implementation is adequate" + +**These judgments were INCORRECT.** + +**Over 95% of entries in `slot_fixes.yaml` represent carefully curated migrations that MUST be executed.** + +The revisions were manually designed based on: +- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment) +- Semantic correctness +- Pattern consistency (Rule 39: RiC-O style naming) +- Type/Types class hierarchy design (Rule 0b) + +## Why Revisions MUST Be Executed + +| Principle | Explanation | +|-----------|-------------| +| **Schema Consistency** | Ontology achieves semantic power through consistent patterns, not local optimizations | +| **LinkML Mapping Separation** | `slot_uri` handles external ontology alignment; slot structure handles internal consistency | +| **Single Responsibility Principle** | Predicates should have single, focused purposes | +| **Extensibility First** | Structured classes enable future extension even if current use is simple | +| **Curated Quality** | Revisions were manually designed with ontology expertise - trust them | + +## Invalid Reasons to Reject Migrations + +| Rejected Reason | Why It's Invalid | +|-----------------|------------------| +| "Already has proper slot_uri" | slot_uri is for external mapping; internal structure is separate concern | +| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity | +| "Would add unnecessary indirection" | Indirection enables reuse and future extension | +| "Creating a class would over-engineer" | Ontology design favors class-based modeling | +| "Well-structured, no migration needed" | **If revision exists, migration IS 
needed** | +| "Current implementation is adequate" | Adequacy is not the standard; consistency is | +| "WELL_STRUCTURED_NO_MIGRATION_NEEDED" | **This note type is PROHIBITED** | + +## Valid Reasons to PAUSE Migrations (Not Reject) + +These warrant discussion with the user, NOT unilateral rejection: + +1. **Semantic Conflict**: Proposed slot_uri contradicts the semantic intent of the slot +2. **Class Already Exists**: The target class exists under a different name (propose merge) +3. **Circular Dependency**: Migration would create import cycles +4. **Breaking External API**: Change would break published external consumers + +**Even these reasons require escalation to the user** - agents should NOT mark as "NO MIGRATION NEEDED". + +## The Key Insight + +Agents confused "has good external mapping" with "needs no migration". These are **independent concerns**: + +- **External mapping** (slot_uri, exact_mappings): How this slot relates to external ontologies +- **Internal structure** (range, class design): How this slot fits within the LinkML schema + +A slot can have perfect external mappings AND still need structural migration for internal consistency. + +## Correct Agent Behavior + +```yaml +# When encountering a slot_fixes.yaml entry: + +1. READ the revision specification completely +2. DO NOT evaluate whether migration is "necessary" - it IS necessary +3. CREATE the specified slots and classes +4. UPDATE the consuming class(es) +5. ARCHIVE the old slot +6. ADD processed section with status: true + +# NEVER add notes like: +# - "NO MIGRATION NEEDED" +# - "WELL_STRUCTURED_NO_MIGRATION_NEEDED" +# - "Current implementation is sufficient" +# - "Would add unnecessary complexity" +# - "Simple enum appropriate" +``` + +## Handling Previous "NO MIGRATION NEEDED" Entries + +Entries previously marked with notes like "WELL_STRUCTURED_NO_MIGRATION_NEEDED" should be: + +1. **Re-processed**: The migration should be executed as specified in `revision` +2. 
**Notes replaced**: Replace the incorrect notes with actual migration documentation +3. **Status verified**: Ensure `status: true` reflects ACTUAL migration, not skipped migration + +## Example - WRONG Agent Behavior + +```yaml +# WRONG - Agent decided migration wasn't needed +- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix + revision: + - label: is_or_was_listed_in + type: slot + - label: CITESAppendix + type: class + processed: + status: true # ← Marked complete but NOT actually migrated! + notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED: Already has proper slot_uri + and string range is sufficient for CITES appendix values." +``` + +## Example - CORRECT Agent Behavior + +```yaml +# CORRECT - Agent executed the migration as specified +- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix + revision: + - label: is_or_was_listed_in + type: slot + - label: CITESAppendix + type: class + processed: + status: true + timestamp: '2026-01-19T00:00:00Z' + session: session-2026-01-19-cites-appendix-migration + notes: 'Migrated 2026-01-19 per Rule 53/56. Created is_or_was_listed_in.yaml. + Created CITESAppendix.yaml class. Updated BiologicalObject.yaml. + Archived: modules/slots/archive/cites_appendix_archived_20260119.yaml.' +``` + +## Feedback Field + +The `feedback` field in slot_fixes.yaml entries contains user corrections to agent mistakes. Typical phrases include: + +- "I reject this!" +- "Conduct the migration" +- "Please conduct accordingly" +- "I altered the revision" + +Such feedback means a previous agent incorrectly deferred the migration, and it MUST now be executed.
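The re-processing and feedback guidance above can be turned into a simple audit. This is a sketch under stated assumptions: entries are dicts shaped like the WRONG and CORRECT examples, an optional `feedback` string sits at the top level of the entry (its exact placement in `slot_fixes.yaml` is an assumption), and matching is a case-insensitive substring test on the phrases listed in this rule. `audit_entry` is a hypothetical helper name, and the sample `feedback` value is illustrative.

```python
from typing import Any

# Skip notes that Rule 56 lists as prohibited (matched case-insensitively).
PROHIBITED_NOTES = (
    "NO MIGRATION NEEDED",
    "WELL_STRUCTURED_NO_MIGRATION_NEEDED",
    "Current implementation is sufficient",
    "Would add unnecessary complexity",
    "Simple enum appropriate",
)

# Feedback phrases that signal a previously deferred migration.
REJECTION_FEEDBACK = (
    "I reject this",
    "Conduct the migration",
    "Please conduct accordingly",
    "I altered the revision",
)


def audit_entry(entry: dict[str, Any]) -> list[str]:
    """List the reasons why a slot_fixes entry must be (re-)processed."""
    problems: list[str] = []
    notes = str((entry.get("processed") or {}).get("notes", "")).lower()
    feedback = str(entry.get("feedback", "")).lower()
    if any(phrase.lower() in notes for phrase in PROHIBITED_NOTES):
        problems.append("prohibited skip note")
    if any(phrase.lower() in feedback for phrase in REJECTION_FEEDBACK):
        problems.append("feedback rejects a deferral")
    return problems


wrong = {
    "original_slot_id": "https://nde.nl/ontology/hc/slot/cites_appendix",
    "processed": {
        "status": True,
        "notes": "WELL_STRUCTURED_NO_MIGRATION_NEEDED: Already has proper slot_uri",
    },
    "feedback": "I reject this! Conduct the migration.",  # illustrative value
}
print(audit_entry(wrong))
# ['prohibited skip note', 'feedback rejects a deferral']
```

A feedback hit is a prompt to verify rather than proof of a skipped migration, since feedback text may remain on an entry after the migration has been executed.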
+ +## Schema Consistency Examples + +### Why "Simple URI is fine" is WRONG + +```yaml +# WRONG - Agent judgment: "Simple URI is fine" +thumbnail_url: + range: uri + slot_uri: schema:thumbnailUrl + +# CORRECT - Consistent with all media references +has_or_had_thumbnail: + range: Thumbnail # Thumbnail class with has_or_had_url → URL +``` + +**Rationale**: All media references (images, thumbnails, videos, documents) should use the same structural pattern. + +### Why "Simple enum is appropriate" is WRONG + +```yaml +# WRONG - "Simple enum is fine" +thinking_mode: + range: ThinkingModeEnum # enabled, disabled, interleaved + +# CORRECT - Enables extension +has_or_had_mode: + range: ThinkingMode + # ThinkingMode can have: mode_type, confidence, effective_date, etc. +``` + +**Rationale**: Even if current use is simple, structured classes enable future extension without breaking changes. + +## Summary + +**Trust the revision. Execute the migration. Document the work.** + +The `revision` key in `slot_fixes.yaml` represents carefully curated ontology decisions. Agents are **executors** of these decisions, **not evaluators**. The only acceptable output is a completed migration with proper documentation.
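One way to "document the work" consistently is to generate the `processed` block from the artifacts an agent actually produced. The sketch below uses a hypothetical helper, `build_processed_record`, that mirrors the note format of the CORRECT example and refuses to set `status: true` when no created, updated, or archived files are recorded, so the status cannot describe a skipped migration. The file-list arguments are assumptions about how an agent would track its own changes.

```python
from datetime import datetime, timezone
from typing import Optional


def build_processed_record(
    session: str,
    created: list[str],
    updated: list[str],
    archived: list[str],
    when: Optional[datetime] = None,
) -> dict:
    """Build a `processed` block that documents an executed migration.

    Raises ValueError when any of the create/update/archive steps is
    missing, so `status: True` never describes a skipped migration.
    """
    if not (created and updated and archived):
        raise ValueError("incomplete migration; refusing to set status: true")
    when = when or datetime.now(timezone.utc)
    notes = (
        f"Migrated {when:%Y-%m-%d} per Rule 53/56. "
        f"Created {', '.join(created)}. "
        f"Updated {', '.join(updated)}. "
        f"Archived: {', '.join(archived)}."
    )
    return {
        "status": True,
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "session": session,
        "notes": notes,
    }


record = build_processed_record(
    session="session-2026-01-19-cites-appendix-migration",
    created=["is_or_was_listed_in.yaml", "CITESAppendix.yaml"],
    updated=["BiologicalObject.yaml"],
    archived=["modules/slots/archive/cites_appendix_archived_20260119.yaml"],
    when=datetime(2026, 1, 19, tzinfo=timezone.utc),
)
print(record["timestamp"])  # 2026-01-19T00:00:00Z
```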
+ +## Related Rules + +- **Rule 53**: slot_fixes.yaml is AUTHORITATIVE - Full Slot Migration +- **Rule 55**: Broaden Generic Predicate Ranges Instead of Creating Bespoke Predicates +- **Rule 57**: The revision key in slot_fixes.yaml is IMMUTABLE +- **Rule 39**: RiC-O Temporal Naming Conventions +- **Rule 38**: Slot Centralization and Semantic URI Requirements + +## Revision History + +- 2026-01-19: Strengthened with explicit prohibition of "WELL_STRUCTURED_NO_MIGRATION_NEEDED" notes +- 2026-01-16: Created based on analysis of 51 feedback entries in slot_fixes.yaml diff --git a/.opencode/rules/no-tool-specific-classes-rule.md b/.opencode/rules/no-tool-specific-classes-rule.md deleted file mode 100644 index 666225b171..0000000000 --- a/.opencode/rules/no-tool-specific-classes-rule.md +++ /dev/null @@ -1,48 +0,0 @@ -# Rule: No Tool-Specific Classes - -## Critical Convention - -Ontology classes MUST be domain concepts, not wrappers for specific software tools. - -## Rule - -1. Do not model vendor/tool names as primary class concepts. - - Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`. - -2. Model the generic domain activity or entity instead. - - Use names like `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`. - -3. Capture tool provenance through generic slots and values. - - Use `has_tool`, `has_method`, `has_agent`, `has_note` to record implementation details. - -4. Platform custodians are allowed as domain classes. - - Classes for digital platforms that act as custodians (for example YouTube-related custodian classes) are valid. - - Operational tools used to query/process data are not valid ontology classes. - -## Rationale - -- Tool names are implementation details and change faster than domain semantics. -- Tool-specific classes reduce reuse and interoperability. -- Generic classes preserve stable meaning while still supporting full provenance. 
- - ## Examples - - ### Wrong - - ```yaml -classes: - ExaSearchMetadata: - class_uri: prov:Activity -``` - - ### Correct - - ```yaml -classes: - ExternalSearchMetadata: - class_uri: prov:Activity - slots: - - has_tool - - has_method - - has_agent -``` diff --git a/.opencode/rules/polished-slot-storage-location.md b/.opencode/rules/polished-slot-storage-location.md index f42903560b..6a5974f265 100644 --- a/.opencode/rules/polished-slot-storage-location.md +++ b/.opencode/rules/polished-slot-storage-location.md @@ -12,7 +12,7 @@ They must **NOT** be stored in the `20260202_matang/` subdirectory. ## Rationale -The `20260202_matang/` directory and its `new/` subdirectory contain **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory. +The `new/` subdirectory contains **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
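A small path check can enforce this rule before a polished file is committed. This is a sketch assuming repository-relative POSIX paths and a layout in which the draft directory sits somewhere below `slots/` (the exact location of `20260202_matang/` is not shown here); `is_polished_slot_location` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Draft directory named by this rule; polished slots must never live below it.
DRAFT_DIR = "20260202_matang"


def is_polished_slot_location(path: str) -> bool:
    """Return True if a slot file sits directly in a `slots/` directory.

    Files anywhere under the draft directory (including its `new/`
    subdirectory) are rejected.
    """
    p = PurePosixPath(path)
    return p.parent.name == "slots" and DRAFT_DIR not in p.parts


print(is_polished_slot_location("modules/slots/has_label.yaml"))  # True
print(is_polished_slot_location("modules/slots/20260202_matang/new/has_label.yaml"))  # False
```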
## Directory Structure diff --git a/frontend/public/schemas/20251121/linkml/manifest.json b/frontend/public/schemas/20251121/linkml/manifest.json index ec2ea98baa..52bc712175 100644 --- a/frontend/public/schemas/20251121/linkml/manifest.json +++ b/frontend/public/schemas/20251121/linkml/manifest.json @@ -1,5 +1,5 @@ { - "generated": "2026-02-15T15:25:32.418Z", + "generated": "2026-02-15T17:46:11.976Z", "schemaRoot": "/schemas/20251121/linkml", "totalFiles": 2369, "categoryCounts": { diff --git a/schemas/20251121/linkml/custodian_source.yaml b/schemas/20251121/linkml/custodian_source.yaml index d2f600aeec..9cbef8d2df 100644 --- a/schemas/20251121/linkml/custodian_source.yaml +++ b/schemas/20251121/linkml/custodian_source.yaml @@ -19,7 +19,7 @@ description: | - provenance: Data tier tracking and source lineage - ghcid: Global Heritage Custodian ID with history - identifiers: ISIL, Wikidata, GHCID variants - - enrichments: Google Maps, Wikidata, Genealogiewerkbalk, etc. + - enrichments: Google Maps, Wikidata, genealogy archive registries, etc. 
- web_claims: Extracted claims with XPath provenance - custodian_name: Consensus name determination - location: Normalized geographic data @@ -153,7 +153,7 @@ imports: - ./modules/classes/MergeNote # Dutch Enrichments Domain - ./modules/classes/ArchiveInfo - - ./modules/classes/GenealogiewerkbalkEnrichment + - ./modules/classes/GenealogyArchivesRegistryEnrichment - ./modules/classes/IsilCodeEntry - ./modules/classes/MunicipalityInfo - ./modules/classes/NanIsilEnrichment diff --git a/schemas/20251121/linkml/manifest.json b/schemas/20251121/linkml/manifest.json index 52bc712175..2b46d62e83 100644 --- a/schemas/20251121/linkml/manifest.json +++ b/schemas/20251121/linkml/manifest.json @@ -1,5 +1,5 @@ { - "generated": "2026-02-15T17:46:11.976Z", + "generated": "2026-02-15T18:20:10.034Z", "schemaRoot": "/schemas/20251121/linkml", "totalFiles": 2369, "categoryCounts": { diff --git a/schemas/20251121/linkml/modules/classes/DonationScheme.yaml b/schemas/20251121/linkml/modules/classes/DonationScheme.yaml index 5c5d55efac..2ac925fccb 100644 --- a/schemas/20251121/linkml/modules/classes/DonationScheme.yaml +++ b/schemas/20251121/linkml/modules/classes/DonationScheme.yaml @@ -34,12 +34,10 @@ default_prefix: hc classes: DonationScheme: class_uri: schema:DonateAction - description: "A donation or giving scheme offered by a heritage custodian institution.\n\n**PURPOSE**:\n\nDonationScheme provides structured representation of the various ways\nindividuals and organizations can financially support heritage institutions.\nThese range from simple one-time donations to complex membership programs,\nadoption schemes, patron circles, and legacy giving vehicles.\n\n**HERITAGE SECTOR CONTEXT**:\n\nDonation schemes are critical for heritage institution sustainability:\n\n- **Museums**: Friends schemes, patron circles, acquisition fund drives\n- **Libraries**: Adopt-a-book programs, conservation appeals\n- **Archives**: \"Adopt history\" programs, preservation sponsorships\n- 
**Galleries**: Artist support funds, exhibition sponsorships\n- **Historical societies**: Heritage membership, research fellowships\n- **Botanical gardens**: Plant and animal adoption programs\n\n**MULTILINGUAL TERMINOLOGY**:\n\n\"Friends\" scheme terminology varies by country:\n- Dutch:\ - \ Museumvriend, Vrienden van het museum\n- German: F\xF6rderverein, Freundeskreis\n- French: Amis du mus\xE9e, Soci\xE9t\xE9 des amis\n- Spanish: Amigos del museo\n- Italian: Amici del museo\n\n**PROVENANCE CHAIN**:\n\n```\nHeritageCustodian\n \u2502\n \u251C\u2500\u2500 offers_donation_schemes \u2500\u2500\u2192 DonationScheme[]\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 scheme_type: MEMBERSHIP_FRIENDS\n \u2502 \u251C\u2500\u2500 scheme_name: \"Rijksmuseum Vrienden\"\n \u2502 \u251C\u2500\u2500 minimum_amount: 60\n \u2502 \u251C\u2500\u2500 currency: \"EUR\"\n \u2502 \u251C\u2500\u2500 payment_frequency: \"annually\"\n \u2502 \u2502\n \u2502 \u2514\u2500\u2500 observed_in\ - \ \u2500\u2500\u2192 WebObservation\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 source_url: https://rijksmuseum.nl/steun\n \u2502 \u251C\u2500\u2500 retrieved_on: 2026-01-01T10:00:00Z\n \u2502 \u2514\u2500\u2500 extraction_confidence: 0.95\n \u2502\n \u2514\u2500\u2500 web_observations \u2500\u2500\u2192 WebObservation[] (general custodian provenance)\n```\n\n**ONTOLOGY ALIGNMENT**:\n\n- **Schema.org**: `schema:DonateAction` - Action of donating to organization\n- **Schema.org**: `schema:Offer` - Scheme as offer with price specification\n- **W3C Org**: `org:Membership` - For membership-type schemes\n- **Dublin Core**: `dcterms:isPartOf` - Scheme belongs to institution\n- **PROV-O**: `prov:wasDerivedFrom` - Links scheme to observation\n\ - \n**TAX INCENTIVE SCHEMES**:\n\nMany countries provide tax benefits for cultural donations:\n\n| Country | Scheme | Benefit |\n|---------|--------|---------|\n| Netherlands | ANBI | 100% deductible |\n| Netherlands | Cultural ANBI | 125% deductible (extra 25%) |\n| UK | 
Gift Aid | 25% tax reclaim for charity |\n| UK | Cultural Gifts Scheme | Tax relief on objects donated |\n| USA | 501(c)(3) | Itemized deduction |\n| Germany | Gemeinn\xFCtzigkeit | Tax deductible |\n| France | M\xE9c\xE9nat culturel | 60% tax reduction |\n\n**SCHEME CATEGORIES**:\n\nSchemes are classified via DonationSchemeTypeEnum into eight categories:\n\n1. **MEMBERSHIP_*** - Recurring membership/subscription\n - Friends, Young Friends, Family, Corporate, Research Fellow\n \n2. **PATRON_*** - High-value donor circles\n - Circle, Benefactor, Founders Circle, Life, National\n \n3. **ADOPTION_*** - Object sponsorship\n - Book, Artifact, Archive Collection, Artwork, Animal, Plant\n \n4. **LEGACY_*** - Planned/estate\ - \ giving\n - Bequest, Charitable Trust, Endowment, Named Fund\n \n5. **DONATION_*** - Direct monetary gifts\n - One-off, Recurring, Appeal, Project, Tax Incentive\n \n6. **INKIND_*** - Non-monetary contributions\n - Object, Artwork, Archive, Library Collection, Expertise, Volunteer\n \n7. **SPONSORSHIP_*** - Corporate/event support\n - Exhibition, Gallery, Event, Program, Digitization, Conservation\n \n8. **CROWDFUNDING_*** - Campaign-based collective funding\n - Acquisition, Conservation, Building, Exhibition\n\n**EXTRACTION PATTERN**:\n\nWhen extracting donation schemes from institutional websites:\n\n1. Create WebObservation for the support/donate page\n2. 
For each scheme found:\n - Create DonationScheme with observed_in \u2192 WebObservation\n - Classify using DonationSchemeTypeEnum\n - Extract financial details (amounts, currency, frequency)\n - List benefits provided to donors\n - Note tax deductibility and applicable schemes\n - Assign extraction_confidence\ - \ based on clarity\n\n**EXAMPLES**:\n\nSee class examples section for detailed instances.\n" + description: >- + Structured representation of an institutional giving program, including + donation type, financial thresholds, payment frequency, donor benefits, + tax treatment, provider organization, and source observation. alt_descriptions: nl: {text: Gestructureerd model van institutionele geefregelingen met bijdragevorm, voordelen, voorwaarden en toezicht., language: nl} de: {text: Strukturiertes Modell institutioneller Spendenprogramme mit Beitragsform, Vorteilen, Bedingungen und Aufsicht., language: de} @@ -53,6 +51,7 @@ classes: de: [{literal_form: Spendenprogramm, language: de}] fr: [{literal_form: dispositif de don, language: fr}] es: [{literal_form: esquema de donacion, language: es}] + it: [{literal_form: programma di donazione, language: it}] ar: [{literal_form: برنامج تبرع, language: ar}] id: [{literal_form: skema donasi, language: id}] zh: [{literal_form: 捐赠计划, language: zh}] @@ -98,6 +97,8 @@ classes: has_type: required: true range: DonationSchemeTypeEnum + description: Classification for the scheme modality, including membership, patron, + adoption, legacy, direct donation, in-kind, sponsorship, and crowdfunding families. examples: - value: MEMBERSHIP_FRIENDS - value: ADOPTION_BOOK @@ -151,6 +152,7 @@ classes: - value: Bookplate with donor name offered_by: required: true + description: Custodian organization that publishes and administers the scheme.
# range: string # uriorcurie examples: - value: https://nde.nl/ontology/hc/custodian/nl/rijksmuseum @@ -173,6 +175,7 @@ classes: range: TaxScheme multivalued: true inlined_as_list: true + description: Applicable fiscal framework for deductibility or tax relief. examples: - value: has_type: ANBI @@ -206,11 +209,14 @@ classes: - has_percentage: observed_in: required: true + description: Source observation used to extract and verify scheme information. # range: string # uriorcurie examples: - value: https://nde.nl/ontology/hc/observation/web/2026-01-01/rijksmuseum-support comments: - Each scheme links to WebObservation for full provenance chain + - Common domains include museum friends programs, archive adoption campaigns, and library conservation support + - Capture payment rhythm and thresholds as structured values, not embedded narrative - Tax deductibility varies by jurisdiction - always document regulated_by_scheme - Benefits should be extracted as discrete items for comparison - Tiered schemes (e.g., Silver/Gold/Platinum) are separate DonationScheme instances diff --git a/schemas/20251121/linkml/modules/classes/FundingCall.yaml b/schemas/20251121/linkml/modules/classes/FundingCall.yaml index c1dc4f22aa..9ef034286a 100644 --- a/schemas/20251121/linkml/modules/classes/FundingCall.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingCall.yaml @@ -1,19 +1,43 @@ id: https://nde.nl/ontology/hc/class/FundingCall name: FundingCall title: Funding Call -description: A call for applications for funding. MIGRATED from funding_call slot per Rule 53. Follows CallForApplication class (schema:Offer). 
prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ +default_prefix: hc imports: - linkml:types -default_prefix: hc + - ../classes/CallForApplication classes: FundingCall: + class_uri: hc:FundingCall is_a: CallForApplication - class_uri: schema:Offer - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + description: >- + Public invitation that opens a defined application window for submitting + proposals to a specific funding opportunity. + alt_descriptions: + nl: Openbare oproep die een afgebakende indieningsperiode opent voor voorstellen binnen een specifieke financieringskans. + de: Oeffentliche Ausschreibung mit festem Einreichungszeitraum fuer Antraege auf eine bestimmte Foerdermoeglichkeit. + fr: Appel public ouvrant une periode definie pour soumettre des propositions a une opportunite de financement specifique. + es: Convocatoria publica que abre un periodo definido para presentar propuestas a una oportunidad de financiacion concreta. + ar: دعوة عامة تفتح فترة تقديم محددة لإرسال مقترحات لفرصة تمويل معينة. + id: Undangan publik yang membuka jendela aplikasi terdefinisi untuk pengajuan proposal pada peluang pendanaan tertentu.
+ zh: 为特定资助机会开启明确申报期的公开征集通知。 + structured_aliases: + - literal_form: financieringsoproep + in_language: nl + - literal_form: Foerderaufruf + in_language: de + - literal_form: appel de financement + in_language: fr + - literal_form: convocatoria de financiacion + in_language: es + - literal_form: دعوة تمويل + in_language: ar + - literal_form: panggilan pendanaan + in_language: id + - literal_form: 资助征集 + in_language: zh + broad_mappings: + - schema:Offer diff --git a/schemas/20251121/linkml/modules/classes/FundingFocus.yaml b/schemas/20251121/linkml/modules/classes/FundingFocus.yaml index 7d3024a334..dbe7daec7d 100644 --- a/schemas/20251121/linkml/modules/classes/FundingFocus.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingFocus.yaml @@ -1,23 +1,46 @@ id: https://nde.nl/ontology/hc/class/FundingFocus name: FundingFocus title: Funding Focus -description: A thematic focus or priority area for funding. MIGRATED from funding_focus slot per Rule 53. Follows skos:Concept. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label -default_prefix: hc classes: FundingFocus: - class_uri: skos:Concept + class_uri: hc:FundingFocus + description: >- + Thematic priority category used to target funding toward specific policy, + research, or societal objectives. + alt_descriptions: + nl: Thematische prioriteitscategorie die financiering richt op specifieke beleids-, onderzoeks- of maatschappelijke doelen. + de: Thematische Prioritaetskategorie zur Ausrichtung von Foerdermitteln auf bestimmte politische, wissenschaftliche oder gesellschaftliche Ziele. + fr: Categorie de priorite thematique orientant le financement vers des objectifs politiques, de recherche ou societaux specifiques.
+ es: Categoria de prioridad tematica que orienta la financiacion hacia objetivos politicos, de investigacion o sociales especificos. + ar: فئة أولوية موضوعية لتوجيه التمويل نحو أهداف سياساتية أو بحثية أو مجتمعية محددة. + id: Kategori prioritas tematik yang mengarahkan pendanaan ke tujuan kebijakan, riset, atau sosial tertentu. + zh: 用于将资助导向特定政策、研究或社会目标的主题优先类别。 + structured_aliases: + - literal_form: financieringsfocus + in_language: nl + - literal_form: Foerderschwerpunkt + in_language: de + - literal_form: axe de financement + in_language: fr + - literal_form: enfoque de financiacion + in_language: es + - literal_form: محور التمويل + in_language: ar + - literal_form: fokus pendanaan + in_language: id + - literal_form: 资助重点 + in_language: zh slots: - - has_label - - has_description - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_label + - has_description + broad_mappings: + - skos:Concept diff --git a/schemas/20251121/linkml/modules/classes/FundingProgram.yaml b/schemas/20251121/linkml/modules/classes/FundingProgram.yaml index f5a10277e5..5d34cbde67 100644 --- a/schemas/20251121/linkml/modules/classes/FundingProgram.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingProgram.yaml @@ -1,26 +1,50 @@ id: https://nde.nl/ontology/hc/class/FundingProgram name: FundingProgram title: Funding Program -description: A program that provides funding, grants, or subsidies. MIGRATED from funding_program slot per Rule 53. Follows frapo:FundingProgramme.
prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - frapo: http://purl.org/cerif/frapo/ - skos: http://www.w3.org/2004/02/skos/core# + schema: http://schema.org/ +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label - ../slots/targeted_at -default_prefix: hc classes: FundingProgram: - class_uri: frapo:FundingProgramme + class_uri: hc:FundingProgram + description: >- + Structured funding framework that groups related calls, budget lines, and + eligibility logic under a shared strategic objective. + alt_descriptions: + nl: Gestructureerd financieringskader dat verwante oproepen, budgetlijnen en subsidieregels bundelt onder een gedeeld strategisch doel. + de: Strukturiertes Foerderprogramm, das zusammenhaengende Ausschreibungen, Budgetlinien und Foerderlogiken unter einem gemeinsamen strategischen Ziel buendelt. + fr: Cadre de financement structure regroupant appels, lignes budgetaires et regles d'eligibilite autour d'un objectif strategique commun. + es: Marco de financiacion estructurado que agrupa convocatorias, lineas presupuestarias y logica de elegibilidad bajo un objetivo estrategico comun. + ar: إطار تمويلي منظم يجمع الدعوات وخطوط الميزانية ومنطق الأهلية ضمن هدف استراتيجي مشترك. + id: Kerangka pendanaan terstruktur yang mengelompokkan panggilan, lini anggaran, dan logika kelayakan di bawah tujuan strategis bersama.
+ zh: 在共同战略目标下整合相关征集、预算条线与资格逻辑的结构化资助框架。 + structured_aliases: + - literal_form: financieringsprogramma + in_language: nl + - literal_form: Foerderprogramm + in_language: de + - literal_form: programme de financement + in_language: fr + - literal_form: programa de financiacion + in_language: es + - literal_form: برنامج تمويل + in_language: ar + - literal_form: program pendanaan + in_language: id + - literal_form: 资助计划 + in_language: zh slots: - - has_label - - has_description - - targeted_at - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_label + - has_description + - targeted_at + broad_mappings: + - schema:FundingScheme + close_mappings: + - schema:Grant diff --git a/schemas/20251121/linkml/modules/classes/FundingRate.yaml b/schemas/20251121/linkml/modules/classes/FundingRate.yaml index a674a9e1e0..20169eee10 100644 --- a/schemas/20251121/linkml/modules/classes/FundingRate.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingRate.yaml @@ -1,23 +1,46 @@ id: https://nde.nl/ontology/hc/class/FundingRate name: FundingRate title: Funding Rate -description: The rate or percentage of funding provided. MIGRATED from funding_rate slot per Rule 53. Follows schema:MonetaryAmount or Percentage. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ +default_prefix: hc imports: - linkml:types - ../slots/has_rate - ../slots/maximum_of_maximum -default_prefix: hc classes: FundingRate: - class_uri: schema:MonetaryAmount + class_uri: hc:FundingRate + description: >- + Quantified proportion or cap that determines the share of eligible costs + covered by a funding instrument. + alt_descriptions: + nl: Gekwantificeerd percentage of plafond dat bepaalt welk deel van subsidiabele kosten wordt gedekt.
+ de: Quantifizierter Satz oder Hoechstwert, der den Anteil foerderfaehiger Kosten festlegt. + fr: Proportion ou plafond quantifie determinant la part des couts eligibles couverte par le financement. + es: Proporcion o tope cuantificado que determina la parte de costos elegibles cubierta por la financiacion. + ar: نسبة أو سقف كمي يحدد حصة التكاليف المؤهلة التي يغطيها التمويل. + id: Proporsi atau batas kuantitatif yang menentukan porsi biaya layak yang ditanggung instrumen pendanaan. + zh: 决定可资助成本覆盖比例或上限的量化比率指标。 + structured_aliases: + - literal_form: financieringspercentage + in_language: nl + - literal_form: Foerdersatz + in_language: de + - literal_form: taux de financement + in_language: fr + - literal_form: tasa de financiacion + in_language: es + - literal_form: معدل التمويل + in_language: ar + - literal_form: tingkat pendanaan + in_language: id + - literal_form: 资助比例 + in_language: zh slots: - - has_rate - - maximum_of_maximum - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_rate + - maximum_of_maximum + broad_mappings: + - schema:MonetaryAmount diff --git a/schemas/20251121/linkml/modules/classes/FundingRequirement.yaml b/schemas/20251121/linkml/modules/classes/FundingRequirement.yaml index d3f67522b3..e953a32912 100644 --- a/schemas/20251121/linkml/modules/classes/FundingRequirement.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingRequirement.yaml @@ -1,17 +1,15 @@ id: https://nde.nl/ontology/hc/class/FundingRequirement name: FundingRequirement -title: FundingRequirement Class +title: Funding Requirement prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ dcterms: http://purl.org/dc/terms/ + schema: http://schema.org/ prov: http://www.w3.org/ns/prov# - pav: http://purl.org/pav/ - skos:
http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - linkml:types - - ../enums/FundingRequirementTypeEnum - ../slots/apply_to - ../slots/has_note - ../slots/has_score @@ -25,193 +23,77 @@ imports: - ../slots/in_section - ../slots/supersede - ../slots/temporal_extent -default_prefix: hc classes: FundingRequirement: - class_uri: dcterms:Standard - description: "A requirement or criterion that applicants must meet to be eligible for\na funding call. Each requirement is tracked with provenance linking to\nthe source document where it was stated.\n\n**PURPOSE**:\n\nFundingRequirement provides structured, machine-readable representation\nof funding call eligibility criteria. Instead of storing requirements as\nfree-text lists in CallForApplication, each requirement becomes a\ntrackable entity with:\n\n- **Classification**: Categorized by FundingRequirementTypeEnum\n- **Provenance**: Linked to WebObservation documenting source\n- **Values**: Machine-readable value + human-readable text\n- **Temporality**: Valid date range for time-scoped requirements\n\n**PROVENANCE CHAIN**:\n\n```\nCallForApplication\n \u2502\n \u251C\u2500\u2500 requirements \u2500\u2500\u2192 FundingRequirement[]\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 requirement_type: PARTNERSHIP_MINIMUM_PARTNERS\n\ - \ \u2502 \u251C\u2500\u2500 requirement_text: \"At least 3 partners from 3 EU countries\"\n \u2502 \u251C\u2500\u2500 requirement_value: \"3\"\n \u2502 \u251C\u2500\u2500 requirement_unit: \"partners\"\n \u2502 \u2502\n \u2502 \u2514\u2500\u2500 observed_in \u2500\u2500\u2192 WebObservation\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 source_url: https://ec.europa.eu/...\n \u2502 \u251C\u2500\u2500 retrieved_on: 2025-11-29T10:30:00Z\n \u2502 \u2514\u2500\u2500 extraction_confidence: 0.95\n \u2502\n \u2514\u2500\u2500 web_observations \u2500\u2500\u2192 WebObservation[] (general call provenance)\n```\n\n**ONTOLOGY\ - \ ALIGNMENT**:\n\n- **Dublin Core**: `dcterms:Standard` - \"A 
reference point against which\n other things can be evaluated\" (requirements are standards for eligibility)\n- **Dublin Core**: `dcterms:requires` - Relates call to requirement\n- **Dublin Core**: `dcterms:conformsTo` - Applicants must conform to requirements\n- **Schema.org**: `schema:eligibleRegion` - For geographic requirements\n- **Schema.org**: `schema:eligibleQuantity` - For numeric constraints\n- **PROV-O**: `prov:wasDerivedFrom` - Links requirement to observation\n\n**REQUIREMENT CATEGORIES**:\n\nRequirements are classified into six main categories via FundingRequirementTypeEnum:\n\n1. **Eligibility** (ELIGIBILITY_*): Who can apply\n - Geographic: EU Member States, Associated Countries\n - Organizational: Non-profit, public body, SME\n - Heritage type: Museums, archives, libraries\n - Experience: Track record, previous projects\n\n2. **Financial** (FINANCIAL_*): Budget and funding\n - Co-funding: Match\ - \ funding percentages\n - Budget limits: Minimum/maximum grant size\n - Funding rate: Percentage of eligible costs\n - Eligible costs: What can be funded\n\n3. **Partnership** (PARTNERSHIP_*): Consortium requirements\n - Minimum partners: Number required\n - Country diversity: Geographic spread\n - Sector mix: Organisation types needed\n - Coordinator: Lead partner constraints\n\n4. **Thematic** (THEMATIC_*): Topic and scope\n - Focus area: Required research/action themes\n - Heritage scope: Types of heritage addressed\n - Geographic scope: Where activities occur\n\n5. **Technical** (TECHNICAL_*): Outputs and approach\n - Deliverables: Required outputs\n - Open access: Publication requirements\n - Duration: Project length constraints\n - Methodology: Required approaches\n\n6. 
**Administrative** (ADMINISTRATIVE_*): Process requirements\n - Registration: Portal accounts needed\n - Documentation: Supporting documents\n - Language: Submission language\n\ - \ - Format: Templates and page limits\n\n**TEMPORAL TRACKING**:\n\nRequirements can change between call publications. The `supersedes` field\nlinks to previous versions, and `valid_from`/`valid_to` scope applicability:\n\n```\nFundingRequirement (current)\n \u2502\n \u251C\u2500\u2500 valid_from: 2025-01-15\n \u251C\u2500\u2500 requirement_value: \"3\" (minimum partners)\n \u2502\n \u2514\u2500\u2500 supersedes \u2500\u2500\u2192 FundingRequirement (previous)\n \u2502\n \u251C\u2500\u2500 valid_from: 2024-01-15\n \u251C\u2500\u2500 valid_to: 2025-01-14\n \u2514\u2500\u2500 requirement_value: \"4\" (was 4 partners)\n```\n\n**EXTRACTION PATTERN**:\n\nWhen extracting requirements from web sources:\n\n1. Create WebObservation for the source page\n2. For each requirement found:\n - Create FundingRequirement with observed_in \u2192 WebObservation\n\ - \ - Classify using FundingRequirementTypeEnum\n - Extract machine-readable value and unit\n - Record source_section for traceability\n - Assign extraction_confidence based on clarity\n\n**EXAMPLES**:\n\n1. **Partnership Requirement**\n - requirement_type: PARTNERSHIP_MINIMUM_PARTNERS\n - requirement_text: \"Minimum 3 independent legal entities from 3 different EU Member States\"\n - requirement_value: \"3\"\n - requirement_unit: \"partners\"\n - is_mandatory: true\n \n2. **Financial Requirement**\n - requirement_type: FINANCIAL_COFUNDING\n - requirement_text: \"Co-funding of minimum 25% from non-EU sources required\"\n - requirement_value: \"25\"\n - requirement_unit: \"percent\"\n - is_mandatory: true\n \n3. 
**Open Access Requirement**\n - requirement_type: TECHNICAL_OPEN_ACCESS\n - requirement_text: \"All peer-reviewed publications must be open access (Plan S compliant)\"\n - requirement_value: \"immediate\"\n - is_mandatory: true\n" - exact_mappings: - - dcterms:Standard - close_mappings: - - schema:QuantitativeValue - - skos:Concept - related_mappings: - - dcterms:requires - - dcterms:conformsTo - - schema:eligibleRegion - - schema:eligibleQuantity - - prov:wasDerivedFrom + class_uri: hc:FundingRequirement + description: >- + Eligibility or compliance criterion that must be satisfied for a proposal + to qualify under a specific funding call. + alt_descriptions: + nl: Subsidiabiliteits- of nalevingscriterium waaraan een voorstel moet voldoen om in aanmerking te komen binnen een specifieke oproep. + de: Eignungs- oder Compliance-Kriterium, das fuer die Foerderfaehigkeit eines Antrags in einem bestimmten Aufruf erfuellt sein muss. + fr: Critere d'eligibilite ou de conformite devant etre satisfait pour qu'une proposition soit recevable dans un appel donne. + es: Criterio de elegibilidad o cumplimiento que debe satisfacerse para que una propuesta califique en una convocatoria especifica. + ar: معيار أهلية أو امتثال يجب استيفاؤه لكي يتأهل المقترح ضمن دعوة تمويل محددة. + id: Kriteria kelayakan atau kepatuhan yang harus dipenuhi agar proposal memenuhi syarat pada panggilan pendanaan tertentu.
+ zh: 在特定资助征集中，提案必须满足的资格或合规条件。 + structured_aliases: + - literal_form: financieringsvoorwaarde + in_language: nl + - literal_form: Foerdervoraussetzung + in_language: de + - literal_form: condition de financement + in_language: fr + - literal_form: requisito de financiacion + in_language: es + - literal_form: شرط التمويل + in_language: ar + - literal_form: persyaratan pendanaan + in_language: id + - literal_form: 资助要求 + in_language: zh slots: - - apply_to - - has_note - - mandatory - - observed_in - - identified_by - - has_text - - has_type - - has_type - - has_measurement_unit - - has_value - - in_section - - supersede - - has_score - - temporal_extent + - apply_to + - has_note + - mandatory + - observed_in + - identified_by + - has_text + - has_type + - has_measurement_unit + - has_value + - in_section + - supersede + - has_score + - temporal_extent slot_usage: identified_by: identifier: true required: true -# range: string # uriorcurie - pattern: ^https://nde\.nl/ontology/hc/requirement/[a-z0-9-]+/[a-z0-9-]+$ - examples: - - value: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/min-partners-3 - - value: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/cofunding-25pct - has_type: - required: false - range: FundingRequirementTypeEnum - deprecated: 'DEPRECATED 2026-01-13: Use has_type with RequirementType class instead' - examples: - - value: PARTNERSHIP_MINIMUM_PARTNERS - - value: FINANCIAL_COFUNDING - - value: ELIGIBILITY_GEOGRAPHIC - has_type: - required: true - range: RequirementType - examples: - - value: - has_code: PARTNERSHIP_MINIMUM_PARTNERS - has_label: - - Minimum partners requirement@en - - value: - has_code: FINANCIAL_COFUNDING - has_label: - - Co-funding requirement@en has_text: required: true -# range: string - examples: - - value: Minimum 3 independent legal entities from 3 different EU Member States or Horizon Europe Associated Countries - - value: Applications
must demonstrate at least 25% co-funding from non-EU sources - has_value: -# range: string - examples: - - value: '3' - - value: '25' - - value: eu-member-states - - value: immediate - has_measurement_unit: -# range: string - examples: - - value: partners - - value: percent - - value: EUR - - value: months - - value: countries mandatory: range: boolean ifabsent: 'true' - examples: - - value: true - description: 'Mandatory: must meet to be eligible' - - value: false - description: 'Optional: preferred but not required' observed_in: required: true -# range: string # uriorcurie - examples: - - value: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage - in_section: -# range: string - examples: - - value: Section 2.1 - Eligibility Criteria - - value: 'FAQ #7 - Consortium composition' - - value: Work Programme page 45 - supersede: -# range: string # uriorcurie - examples: - - value: https://nde.nl/ontology/hc/requirement/ec-cl2-2024-heritage-01/min-partners-4 - comments: - - Each requirement links to WebObservation for full provenance chain - - requirement_value + requirement_unit enable structured queries - - is_mandatory defaults to true; explicitly set false for optional requirements - - supersedes_or_superseded creates version chain for requirement changes - - extraction_confidence can differ from observation confidence - see_also: - - https://dublincore.org/specifications/dublin-core/dcmi-terms/#Standard - - https://schema.org/QuantitativeValue - - https://www.w3.org/TR/prov-o/#Entity - - http://purl.org/pav/ examples: - - value: - requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/min-partners-3-countries - requirement_type: PARTNERSHIP_MINIMUM_PARTNERS - requirement_text: Proposals must be submitted by a consortium of at least 3 independent legal entities established in 3 different EU Member States or Horizon Europe Associated Countries. 
- requirement_value: '3' - requirement_unit: partners - is_mandatory: true - apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01 - observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage - source_section: Section 2 - Eligibility Conditions - has_score: - has_score: 0.98 - has_note: Clear statement in eligibility section. Standard Horizon Europe RIA requirement. - - value: - requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/cofunding-for-profit - requirement_type: FINANCIAL_COFUNDING - requirement_text: For-profit entities receive 70% funding rate. The remaining 30% must be covered by co-funding or own resources. - requirement_value: '30' - requirement_unit: percent - is_mandatory: true - apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01 - observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage - source_section: Section 3 - Financial Conditions - has_score: - has_score: 0.95 - has_note: Applies only to for-profit partners. Non-profits receive 100% funding. - - value: - requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/open-access - requirement_type: TECHNICAL_OPEN_ACCESS - requirement_text: Beneficiaries must ensure open access to peer-reviewed scientific publications under the conditions required by the Grant Agreement. Immediate open access is mandatory (no embargo period). - requirement_value: immediate - requirement_unit: null - is_mandatory: true - apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01 - observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage - source_section: Section 4.2 - Open Science - has_score: - has_score: 0.99 - has_note: Standard Horizon Europe open access requirement. Plan S compliant. 
- - value: - requirement_id: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/uk-based - requirement_type: ELIGIBILITY_GEOGRAPHIC - requirement_text: Your organisation must be based in the UK (England, Northern Ireland, Scotland or Wales). Projects must take place in the UK. - requirement_value: UK - requirement_unit: country - is_mandatory: true - apply_to: https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4 - observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants - source_section: Eligibility - has_score: - has_score: 0.99 - has_note: Clear UK-only restriction. Devolved nations explicitly included. - - value: - requirement_id: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/non-profit - requirement_type: ELIGIBILITY_ORGANIZATIONAL - requirement_text: We can fund not-for-profit organisations, including charities, community groups, local authorities, and social enterprises. Private individuals and for-profit companies are not eligible. - requirement_value: non-profit - requirement_unit: organization-type - is_mandatory: true - apply_to: https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4 - observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants - source_section: Who can apply - has_score: - has_score: 0.95 - has_note: Explicitly excludes for-profit. Social enterprises may need verification. - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - value: + identified_by: https://nde.nl/ontology/hc/requirement/ec-call/minimum-partners + has_text: Minimum 3 independent legal entities from 3 different countries. + has_value: '3' + has_measurement_unit: partners + mandatory: true + description: Consortium size threshold requirement + - value: + identified_by: https://nde.nl/ontology/hc/requirement/ec-call/open-access + has_text: Immediate open access publication is required. 
+ mandatory: true + description: Technical dissemination requirement + exact_mappings: + - dcterms:Standard + related_mappings: + - dcterms:requires + - dcterms:conformsTo + - schema:eligibleQuantity + - prov:wasDerivedFrom diff --git a/schemas/20251121/linkml/modules/classes/FundingScheme.yaml b/schemas/20251121/linkml/modules/classes/FundingScheme.yaml index 15dc1c83f3..0a4156cdce 100644 --- a/schemas/20251121/linkml/modules/classes/FundingScheme.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingScheme.yaml @@ -1,30 +1,46 @@ id: https://nde.nl/ontology/hc/class/FundingScheme name: FundingScheme title: Funding Scheme -description: A scheme or program providing funding. MIGRATED from funding_scheme slot per Rule 53. Follows schema:FundingScheme. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ - dcterms: http://purl.org/dc/terms/ - prov: http://www.w3.org/ns/prov# - crm: http://www.cidoc-crm.org/cidoc-crm/ - skos: http://www.w3.org/2004/02/skos/core# - rdfs: http://www.w3.org/2000/01/rdf-schema# - org: http://www.w3.org/ns/org# - xsd: http://www.w3.org/2001/XMLSchema# +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label -default_prefix: hc classes: FundingScheme: - class_uri: schema:FundingScheme + class_uri: hc:FundingScheme + description: >- + Rule-governed financing arrangement defining how resources are allocated, + evaluated, and distributed to eligible applicants. + alt_descriptions: + nl: Regeling met regels voor toewijzing, beoordeling en uitkering van middelen aan subsidiabele aanvragers. + de: Regelgebundene Finanzierungsregelung zur Zuweisung, Bewertung und Verteilung von Mitteln an foerderfaehige Antragstellende. + fr: Dispositif de financement reglemente definissant l'allocation, l'evaluation et la distribution des ressources aux candidats eligibles. 
+ es: Esquema de financiacion con reglas que define asignacion, evaluacion y distribucion de recursos a solicitantes elegibles. + ar: نظام تمويلي قائم على قواعد يحدد كيفية تخصيص الموارد وتقييمها وتوزيعها على المتقدمين المؤهلين. + id: Skema pembiayaan berbasis aturan yang menentukan alokasi, evaluasi, dan distribusi sumber daya kepada pelamar yang memenuhi syarat. + zh: 定义资源分配、评审与发放给合格申请者方式的规则化资助机制。 + structured_aliases: + - literal_form: financieringsregeling + in_language: nl + - literal_form: Foerderschema + in_language: de + - literal_form: dispositif de financement + in_language: fr + - literal_form: esquema de financiacion + in_language: es + - literal_form: مخطط التمويل + in_language: ar + - literal_form: skema pendanaan + in_language: id + - literal_form: 资助机制 + in_language: zh slots: - - has_label - - has_description - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_label + - has_description + broad_mappings: + - schema:FundingScheme diff --git a/schemas/20251121/linkml/modules/classes/FundingSource.yaml b/schemas/20251121/linkml/modules/classes/FundingSource.yaml index 26a927ed7d..3db14d443f 100644 --- a/schemas/20251121/linkml/modules/classes/FundingSource.yaml +++ b/schemas/20251121/linkml/modules/classes/FundingSource.yaml @@ -1,33 +1,48 @@ id: https://nde.nl/ontology/hc/class/FundingSource name: FundingSource title: Funding Source -description: A source of funding, such as an organization or grant program. MIGRATED from funding_source slot per Rule 53. Follows frapo:FundingAgency.
prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ - frapo: http://purl.org/cerif/frapo/ - skos: http://www.w3.org/2004/02/skos/core# - dcterms: http://purl.org/dc/terms/ - prov: http://www.w3.org/ns/prov# - crm: http://www.cidoc-crm.org/cidoc-crm/ - rdfs: http://www.w3.org/2000/01/rdf-schema# - org: http://www.w3.org/ns/org# - xsd: http://www.w3.org/2001/XMLSchema# +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label - ../slots/has_type -default_prefix: hc classes: FundingSource: - class_uri: frapo:FundingAgency + class_uri: hc:FundingSource + description: >- + Originating organization or mechanism from which financial support is + provided. + alt_descriptions: + nl: Organisatie of mechanisme van waaruit financiele ondersteuning afkomstig is. + de: Herkunftsorganisation oder Mechanismus, aus dem finanzielle Unterstuetzung bereitgestellt wird. + fr: Organisation ou mecanisme d'origine a partir duquel le soutien financier est fourni. + es: Organizacion o mecanismo de origen desde el cual se proporciona apoyo financiero. + ar: الجهة أو الآلية المصدِّرة التي يُقدَّم منها الدعم المالي. + id: Organisasi atau mekanisme asal dari mana dukungan finansial diberikan.
+ zh: 提供资金支持的来源组织或机制。 + structured_aliases: + - literal_form: financieringsbron + in_language: nl + - literal_form: Finanzierungsquelle + in_language: de + - literal_form: source de financement + in_language: fr + - literal_form: fuente de financiacion + in_language: es + - literal_form: مصدر التمويل + in_language: ar + - literal_form: sumber pendanaan + in_language: id + - literal_form: 资金来源 + in_language: zh slots: - - has_label - - has_description - - has_type - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_label + - has_description + - has_type + broad_mappings: + - schema:Organization diff --git a/schemas/20251121/linkml/modules/classes/Fylkesarkiv.yaml b/schemas/20251121/linkml/modules/classes/Fylkesarkiv.yaml index 580e443bf4..865bb4472b 100644 --- a/schemas/20251121/linkml/modules/classes/Fylkesarkiv.yaml +++ b/schemas/20251121/linkml/modules/classes/Fylkesarkiv.yaml @@ -1,18 +1,46 @@ id: https://w3id.org/nde/ontology/Fylkesarkiv name: Fylkesarkiv -title: Fylkesarkiv (Norwegian County Archive) +title: Fylkesarkiv prefixes: linkml: https://w3id.org/linkml/ + hc: https://nde.nl/ontology/hc/ + schema: http://schema.org/ + wd: http://www.wikidata.org/entity/ +default_prefix: hc imports: - linkml:types + - ../classes/ArchiveOrganizationType classes: Fylkesarkiv: + class_uri: hc:Fylkesarkiv is_a: ArchiveOrganizationType - class_uri: skos:Concept - description: "Norwegian county archive (fylkesarkiv).
These archives serve as regional\narchival institutions at the county (fylke) level in Norway.\n\n**Wikidata**: Q15119463\n\n**Geographic Restriction**: Norway (NO) only.\nThis constraint is enforced via LinkML `rules` with `postconditions`.\n\n**Scope**:\nFylkesarkiv preserve:\n- County administration records (fylkeskommunen)\n- Municipal records from constituent kommuner\n- Regional health and social services documentation\n- Education records (videreg\xE5ende skole)\n- Cultural affairs and heritage documentation\n- Private archives from regional businesses and organizations\n\n**Administrative Context**:\nIn the Norwegian archival system:\n- Arkivverket (National Archives of Norway)\n- Fylkesarkiv (county level) \u2190 This type\n- Kommunearkiv/Byarkiv (municipal level)\n- Interkommunale arkiv (inter-municipal archives)\n\n**Historical Context**:\nNorway has reorganized its counties (2020 regional reform):\n- Some fylkesarkiv have\ - \ merged following county mergers\n- County archives serve both historical fylker and new regions\n- Arkivverket coordinates national archival policy\n\n**Related Types**:\n- Landsarkiv - Regional state archives (under Arkivverket)\n- RegionalArchive (Q27032392) - Generic regional archives\n- CountyArchive - Generic county-level archives\n" - slot_usage: {} - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + description: >- + Regional archival institution at Norwegian county level responsible for + preserving and providing access to county and related local documentation. + alt_descriptions: + nl: Regionale archiefinstelling op Noors provinciaal niveau die provinciale en gerelateerde lokale documentatie bewaart en toegankelijk maakt. + de: Regionale Archiveinrichtung auf norwegischer Kreisebene zur Bewahrung und Bereitstellung von Kreis- und lokalbezogener Dokumentation. 
+ fr: Institution archivistique regionale au niveau des comtes norvegiens, chargee de conserver et diffuser la documentation comtale et locale associee. + es: Institucion archivistica regional a nivel de condado noruego responsable de preservar y facilitar documentacion del condado y ambito local relacionado. + ar: مؤسسة أرشيفية إقليمية على مستوى المقاطعات في النرويج مسؤولة عن حفظ وإتاحة الوثائق الإدارية والإقليمية ذات الصلة. + id: Lembaga arsip regional tingkat county di Norwegia yang bertanggung jawab melestarikan dan menyediakan akses dokumentasi county serta lokal terkait. + zh: 挪威郡级区域档案机构，负责保存并提供郡级及相关地方文献的访问。 + structured_aliases: + - literal_form: Noors provinciaal archief + in_language: nl + - literal_form: norwegisches Kreisarchiv + in_language: de + - literal_form: archives de comte norvegien + in_language: fr + - literal_form: archivo condal noruego + in_language: es + - literal_form: أرشيف المقاطعة النرويجي + in_language: ar + - literal_form: arsip county Norwegia + in_language: id + - literal_form: 挪威郡档案馆 + in_language: zh + exact_mappings: + - wd:Q15119463 + broad_mappings: + - schema:ArchiveOrganization diff --git a/schemas/20251121/linkml/modules/classes/GBIFIdentifier.yaml b/schemas/20251121/linkml/modules/classes/GBIFIdentifier.yaml index d0356b193b..735aae7ca3 100644 --- a/schemas/20251121/linkml/modules/classes/GBIFIdentifier.yaml +++ b/schemas/20251121/linkml/modules/classes/GBIFIdentifier.yaml @@ -1,23 +1,46 @@ id: https://nde.nl/ontology/hc/class/GBIFIdentifier name: GBIFIdentifier title: GBIF Identifier -description: Global Biodiversity Information Facility (GBIF) identifier. MIGRATED from gbif_id slot per Rule 53. Follows dwc:occurrenceID.
prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ dwc: http://rs.tdwg.org/dwc/terms/ + schema: http://schema.org/ +default_prefix: hc imports: - linkml:types -default_prefix: hc + - ./Identifier classes: GBIFIdentifier: - is_a: Identifier class_uri: hc:GBIFIdentifier + is_a: Identifier + description: >- + Persistent identifier used to reference a biodiversity occurrence record in + GBIF-linked data pipelines. + alt_descriptions: + nl: Persistente identifier voor verwijzing naar een biodiversiteitswaarneming in GBIF-gekoppelde datastromen. + de: Persistenter Identifikator zur Referenzierung eines Biodiversitaetsnachweises in GBIF-verbundenen Datenablaeufen. + fr: Identifiant persistant utilise pour referencer un enregistrement d'occurrence de biodiversite dans des flux de donnees lies a GBIF. + es: Identificador persistente para referenciar un registro de ocurrencia de biodiversidad en flujos de datos vinculados a GBIF. + ar: معرّف دائم للإشارة إلى سجل وقوع للتنوع الحيوي ضمن مسارات بيانات مرتبطة بـ GBIF. + id: Pengidentifikasi persisten untuk merujuk catatan kejadian keanekaragaman hayati dalam alur data terkait GBIF. + zh: 用于在 GBIF 关联数据流程中引用生物多样性出现记录的持久标识符。 + structured_aliases: + - literal_form: GBIF-id + in_language: nl + - literal_form: GBIF-Kennung + in_language: de + - literal_form: identifiant GBIF + in_language: fr + - literal_form: identificador GBIF + in_language: es + - literal_form: معرف GBIF + in_language: ar + - literal_form: pengenal GBIF + in_language: id + - literal_form: GBIF 标识符 + in_language: zh + broad_mappings: + - schema:PropertyValue close_mappings: - - dwc:occurrenceID - description: A persistent identifier for a biodiversity occurrence record.
- annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - dwc:occurrenceID diff --git a/schemas/20251121/linkml/modules/classes/GHCIdentifier.yaml b/schemas/20251121/linkml/modules/classes/GHCIdentifier.yaml index a22da20f09..2e5abced64 100644 --- a/schemas/20251121/linkml/modules/classes/GHCIdentifier.yaml +++ b/schemas/20251121/linkml/modules/classes/GHCIdentifier.yaml @@ -1,22 +1,23 @@ id: https://nde.nl/ontology/hc/class/GHCIdentifier name: GHCIdentifier -title: Global Heritage Custodian Identifier -description: The Global Heritage Custodian Identifier (GHCID). MIGRATED from ghcid slot per Rule 53. Follows dcterms:identifier. +title: GHC Identifier Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ dcterms: http://purl.org/dc/terms/ +default_prefix: hc imports: - linkml:types -default_prefix: hc classes: GHCIdentifier: is_a: Identifier class_uri: hc:GHCIdentifier + description: Persistent identifier assigned to a heritage custodian in the GHCID namespace. close_mappings: - - dcterms:identifier - description: 'A persistent, unique identifier for a heritage custodian. Format: CC-RR-LLL-T-ABBREVIATION' + - dcterms:Identifier + related_mappings: + - dcterms:identifier annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.35 + specificity_rationale: Core persistent identifier class for custodian identity resolution. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/Gallery.yaml b/schemas/20251121/linkml/modules/classes/Gallery.yaml index 8b8b6ee393..657110bea8 100644 --- a/schemas/20251121/linkml/modules/classes/Gallery.yaml +++ b/schemas/20251121/linkml/modules/classes/Gallery.yaml @@ -1,30 +1,51 @@ id: https://nde.nl/ontology/hc/class/Gallery name: Gallery title: Gallery -description: An exhibition space or art gallery. MIGRATED from gallery_type_classification context. Follows schema:ArtGallery. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ - skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label - ../slots/has_type -default_prefix: hc classes: Gallery: - class_uri: schema:ArtGallery + class_uri: hc:Gallery + description: >- + Institution or venue dedicated to exhibiting visual art through curated + programs. + alt_descriptions: + nl: Instelling of locatie gewijd aan het tonen van beeldende kunst via gecureerde programma's. + de: Einrichtung oder Ort, der sich der Praesentation bildender Kunst in kuratierten Programmen widmet. + fr: Institution ou lieu dedie a l'exposition d'arts visuels au moyen de programmes cures. + es: Institucion o espacio dedicado a exhibir artes visuales mediante programas curados. + ar: مؤسسة أو فضاء مخصص لعرض الفنون البصرية عبر برامج تقييم/تنسيق فني. + id: Institusi atau tempat yang didedikasikan untuk memamerkan seni visual melalui program kurasi.
+ zh: 通过策展项目展示视觉艺术的机构或场所。 + structured_aliases: + - literal_form: galerie + in_language: nl + - literal_form: Galerie + in_language: de + - literal_form: galerie d'art + in_language: fr + - literal_form: galeria de arte + in_language: es + - literal_form: معرض فني + in_language: ar + - literal_form: galeri seni + in_language: id + - literal_form: 美术馆 + in_language: zh slots: - - has_label - - has_description - - has_type + - has_label + - has_description + - has_type slot_usage: has_type: -# range: string # uriorcurie required: true - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + exact_mappings: + - schema:ArtGallery diff --git a/schemas/20251121/linkml/modules/classes/GalleryType.yaml b/schemas/20251121/linkml/modules/classes/GalleryType.yaml index 3d44c97a5b..94a7728860 100644 --- a/schemas/20251121/linkml/modules/classes/GalleryType.yaml +++ b/schemas/20251121/linkml/modules/classes/GalleryType.yaml @@ -1,216 +1,70 @@ id: https://nde.nl/ontology/hc/class/GalleryType name: GalleryType -title: Gallery Type Classification +title: Gallery Type +prefixes: + linkml: https://w3id.org/linkml/ + hc: https://nde.nl/ontology/hc/ + skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - linkml:types - - ../enums/GalleryTypeEnum - - ../slots/has_hypernym - - ../slots/identified_by # was: wikidata_entity - - ../slots/has_model # was: exhibition_model + - ./CustodianType + - ../slots/identified_by + - ../slots/has_model - ../slots/has_objective - - ../slots/has_percentage - - ../slots/has_score # was: template_specificity + - ../slots/has_score - ../slots/has_service - ../slots/has_type - - ../slots/include # was: gallery_subtype - - ../slots/categorized_as # was: exhibition_focus + - ../slots/include - ../slots/represent - ../slots/has_activity - - ../slots/take_comission classes: GalleryType: + class_uri:
hc:GalleryType is_a: CustodianType - class_uri: skos:Concept - annotations: - skos:prefLabel: Gallery + description: >- + Controlled taxonomy root for classifying gallery organizational models, + exhibition strategies, and commercial posture. + alt_descriptions: + nl: Gecontroleerde taxonomiewortel voor classificatie van galerie-organisatiemodellen, tentoonstellingsstrategieën en commerciële oriëntatie. + de: Kontrollierte Taxonomie-Wurzel zur Klassifikation von Galerie-Organisationsmodellen, Ausstellungsstrategien und kommerzieller Ausrichtung. + fr: Racine taxonomique controlee pour classifier les modeles organisationnels de galeries, les strategies d'exposition et le positionnement commercial. + es: Raiz taxonomica controlada para clasificar modelos organizativos de galeria, estrategias expositivas y orientacion comercial. + ar: جذر تصنيفي مضبوط لتصنيف نماذج تنظيم المعارض واستراتيجيات العرض والاتجاه التجاري. + id: Akar taksonomi terkendali untuk mengklasifikasikan model organisasi galeri, strategi pameran, dan orientasi komersial.
+ zh: 用于分类画廊组织形态、展览策略与商业定位的受控分类根节点。 structured_aliases: - - literal_form: galerie - predicate: EXACT_SYNONYM - in_language: nl - - literal_form: galerijen - predicate: EXACT_SYNONYM - in_language: nl - - literal_form: gallery - predicate: EXACT_SYNONYM - in_language: en - - literal_form: galleries - predicate: EXACT_SYNONYM - in_language: en - - literal_form: art gallery - predicate: EXACT_SYNONYM - in_language: en - - literal_form: Galerie - predicate: EXACT_SYNONYM - in_language: de - - literal_form: Galerien - predicate: EXACT_SYNONYM - in_language: de - - literal_form: kunsthalle - predicate: EXACT_SYNONYM - in_language: de - - literal_form: galeria - predicate: EXACT_SYNONYM - in_language: es - - literal_form: galerías - predicate: EXACT_SYNONYM - in_language: es - - literal_form: galleria - predicate: EXACT_SYNONYM - in_language: it - - literal_form: gallerie - predicate: EXACT_SYNONYM - in_language: it - - literal_form: galeria - predicate: EXACT_SYNONYM - in_language: pt - - literal_form: galerias - predicate: EXACT_SYNONYM - in_language: pt - - literal_form: galerie - predicate: EXACT_SYNONYM - in_language: fr - - literal_form: galeries - predicate: EXACT_SYNONYM - in_language: fr - description: "Specialized custodian type for art galleries - institutions that exhibit\nand sometimes sell visual artworks,\ - \ providing public access to contemporary\nor historical art through temporary or rotating exhibitions.\n\n**Wikidata\ - \ Base Concept**: Q1007870 (art gallery)\n\n**Scope**:\nGalleries are distinguished by their focus on:\n- Exhibition-oriented\ - \ (not collection-based like museums)\n- Contemporary or recent art (not historical artifacts)\n- Temporary exhibitions\ - \ (rotating shows, not permanent displays)\n- Artist representation (commercial) or kunsthalle model (non-commercial)\n\ - - Visual arts (paintings, sculptures, photography, installations)\n\n**Key Gallery Subtypes** (78+
extracted from Wikidata):\n\ - \n**By Business Model**:\n- Commercial art galleries (Q56856618) - For-profit, sell artworks, represent artists\n- Noncommercial\ - \ art galleries (Q67165238) - Exhibition-only, no sales\n- Kunsthalle (Q1475403) - German model, temporary exhibitions,\ - \ no permanent collection\n- Vanity galleries (Q17111940) - Charge artists for exhibition space\n- National galleries\ - \ (Q3844310) - State-run, representative of nation\n\n**By Subject Specialization**:\n- Photography galleries (Q114023739)\ - \ - Photographic art exhibitions\n- Photo galleries (Q12303444) - Physical or digital photograph collections\n- Photography\ - \ centres (Q11900212) - Dedicated photography venues\n- Photothèques (Q135926044) - Photographic heritage preservation\n\ - - Sculpture gardens (Q1759852) - Outdoor sculpture exhibitions\n- Jewellery galleries (Q117072343) - Jewelry and decorative\ - \ arts\n- Design galleries (Q127346204) - Design and applied arts\n- Map galleries (Q125501487) - Cartographic art exhibitions\n\ - - Print rooms (Q445396) - Prints, drawings, watercolors, photographs\n\n**By Organizational Model**:\n- Artist-run centres\ - \ (Q4801243) - Managed and directed by artists\n- Artist-run initiatives (Q3325736) - Gallery operated by artists\n\ - - Artist-run spaces (Q4034417) - Organizations initiated by artists\n- Artist cooperatives (Q4801240) - Jointly owned\ - \ by artist members\n- Canadian artist-run centres (Q16020664) - Canada-specific model (1960s+)\n\n**By Art Period Focus**:\n\ - - Contemporary art galleries (Q16038801) - Current/recent art\n- Modern art galleries (Q3757717) - Modernist period\ - \ (late 19th-20th century)\n- Contemporary arts centres (Q2945053) - Focus on contemporary practice\n- National centres\ - \ for contemporary art (Q109017987) - State contemporary art venues\n\n**By Venue Type**:\n- Alternative exhibition\ - \ spaces (Q16002704) - Non-traditional venues\n- Arts venues (Q15090615) - Places for artistic
works display/performance\n\ - - Arts centers (Q2190251) - Community centers for arts\n- Cast collections (Q29380643) - Plaster cast galleries (educational)\n\ - - Plaster cast galleries (Q3768550) - Sculpture reproduction collections\n\n**By Artist Association**:\n- Artist museums\ - \ (Q1747681) - Dedicated to particular artist\n- Artist houses (Q1797122) - Buildings with artist work rooms\n- Art\ - \ colonies (Q1558054) - Places where artists live and interact\n- Art communes (Q4797182) - Communal living focused\ - \ on art creation\n- Studio houses (Q2699076) - Residential spaces with studio facilities\n\n**Online & Digital**:\n\ - - Online art galleries (Q7094057) - Digital exhibition platforms\n- Galeries Fnac (Q109038036) - French retail chain\ - \ photo galleries (1970s+)\n\n**Specialized Formats**:\n- Pinacotheca (Q740437) - Public art gallery (classical term)\n\ - - Print rooms (Q445396) - Graphic arts collections\n- Photograph collections (Q130486108) - Photography collections\n\ - \n**French Model**:\n- Scientific, technical, and industrial culture centers (Q2946053) - Popular science venues\n\n\ - **Cultural Context**:\n- Arts and Culture Centres (Q4801491) - Newfoundland & Labrador system (Canada)\n- Houses of\ - \ culture (Q5061188) - Cultural institutions in socialist/social democratic contexts\n- Houses of literature (Q27908105)\ - \ - Cultural institutions for written art\n- Centrum Beeldende Kunst (Q2104985) - Dutch visual arts centers\n\n**Supporting\ - \ Organizations**:\n- Not-for-profit arts organizations (Q7062022) - Nonprofit arts foundations\n- Art institutions\ - \ (Q20897549) - Organizations dedicated to art\n- Cultural institutions (Q3152824) - Preservation/promotion of culture\n\ - \n**Commercial vs. 
Non-Commercial Distinction**:\n\n**Commercial Galleries**:\n- Represent artists (exclusive or non-exclusive\ - \ contracts)\n- Sell artworks (earn commission on sales)\n- Participate in art fairs\n- Primary market (new works) or\ - \ secondary market (resale)\n\n**Non-Commercial Galleries** (Kunsthalle model):\n- No permanent collection\n- Exhibition-only\ - \ mission\n- Public or nonprofit funding\n- Educational/cultural programming\n- No artwork sales\n\n**RDF Serialization\ - \ Example**:\n```turtle\n:Custodian_KunsthalRotterdam\n org:classification :GalleryType_Kunsthalle_Q1475403 .\n\n\ - :GalleryType_Kunsthalle_Q1475403\n a glamtype:GalleryType, crm:E55_Type, skos:Concept ;\n skos:prefLabel \"Kunsthalle\"\ - @en, \"kunsthalle\"@nl, \"Kunsthalle\"@de ;\n skos:broader :GalleryType_ArtGallery_Q1007870 ;\n schema:additionalType\ - \ ;\n glamtype:glamorcubesfixphdnt_code \"GALLERY\" ;\n glamtype:has_objective\ - \ false ;\n glamtype:exhibition_focus \"contemporary art\" ;\n glamtype:sales_activity false ;\n glamtype:exhibition_model\ - \ \"temporary rotating exhibitions\" .\n```\n\n**Domain-Specific Properties**:\nThis class adds gallery-specific metadata\ - \ beyond base CustodianType:\n- `has_objective` - Structured profit objective (commercial/nonprofit/mixed)\n- `artist_representation`\ - \ - Artists represented by gallery (for commercial galleries)\n- `exhibition_focus` - Type of art exhibited (contemporary,\ - \ modern, photography, etc.)\n- `sales_activity` - Whether gallery sells artworks (not just exhibits)\n- `exhibition_model`\ - \ - Exhibition strategy (temporary, rotating, curated shows)\n- `has_service` - Art sales service with commission structure (ArtSaleService)\n\n**Getty AAT Integration**:\nThe Getty Art & Architecture Thesaurus provides standardized\ - \ vocabulary:\n- aat:300005768 - art galleries (institutions)\n- aat:300240057 - commercial galleries\n- aat:300240058\ - \ - nonprofit galleries\n- aat:300005741 - kunsthalles\n\n**Art Market 
Context**:\nCommercial galleries operate in the\ - \ art market ecosystem:\n- **Primary market**: Representing living artists, first sales\n- **Secondary market**: Resale\ - \ of works by established artists\n- **Art fairs**: Participation in international art fairs (Basel, Frieze, etc.)\n\ - - **Auction houses**: Different from galleries (auction vs. consignment model)\n\n**Data Population**:\nGallery subtypes\ - \ extracted from 78 Wikidata entities with type='G'\nin `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml`.\n" + - literal_form: galerietype + in_language: nl + - literal_form: Galerietyp + in_language: de + - literal_form: type de galerie + in_language: fr + - literal_form: tipo de galeria + in_language: es + - literal_form: نوع المعرض الفني + in_language: ar + - literal_form: tipe galeri + in_language: id + - literal_form: 画廊类型 + in_language: zh slots: - - represent - # REMOVED 2026-01-22: commercial_operation - migrated to has_objective + Profit (Rule 53) - - has_objective - # REMOVED 2026-01-22: commission_rate - migrated to has_service + ArtSaleService (Rule 53) - - has_service - - has_type - - has_type # was: exhibition_focus - migrated per Rule 53 (2026-01-26) - - has_model # was: exhibition_model - migrated per Rule 53 (2026-01-26) - - include # was: gallery_subtype - migrated per Rule 53 (2026-01-26) - - has_activity - - has_score # was: template_specificity - migrated per Rule 53 (2026-01-17) - - identified_by # was: wikidata_entity - migrated per Rule 53 (2026-01-16) + - represent + - has_objective + - has_service + - has_type + - has_model + - include + - has_activity + - has_score + - identified_by slot_usage: - identified_by: # was: wikidata_entity - migrated per Rule 53 (2026-01-16) - pattern: ^Q[0-9]+$ + identified_by: required: true - has_hypernym: - range: GalleryType - required: false has_type: - equals_expression: '["hc:GalleryType"]' - has_type: # was: exhibition_focus - migrated per Rule 53 (2026-01-26) 
-# range: string - has_model: # was: exhibition_model - migrated per Rule 53 (2026-01-26) -# range: string - include: # was: gallery_subtype - migrated per Rule 53 (2026-01-26) + equals_string: hc:GalleryType + include: range: GalleryType - any_of: - - range: CommercialGallery - - range: NonProfitGallery - - range: ArtistRunSpace - - range: Kunsthalle required: false - exact_mappings: - - skos:Concept - - schema:ArtGallery - close_mappings: - - crm:E55_Type - - aat:300005768 - related_mappings: - - aat:300240057 - - aat:300240058 - comments: - - GalleryType implements SKOS-based classification for art gallery organizations - - Distinguishes commercial (sales-oriented) from non-commercial (kunsthalle) models - - Supports 78+ Wikidata gallery subtypes with multilingual labels - - Getty AAT integration for art market terminology - - 'Artist-run initiatives: Canadian model (1960s+), cooperative ownership' - examples: - - value: - identified_by: https://nde.nl/ontology/hc/type/gallery/Q1475403 - has_type_code: GALLERY - has_label: - - Kunsthalle@en - - kunsthalle@nl - - Kunsthalle@de - has_description: facility that mounts temporary art exhibitions without permanent collection # was: type_description - migrated per Rule 53/56 (2026-01-16) - custodian_type_broader: https://nde.nl/ontology/hc/type/gallery/Q1007870 - # MIGRATED 2026-01-22: commercial_operation → has_objective + Profit (Rule 53) - has_objective: - has_type: contemporary art - sales_activity: false - has_model: temporary rotating exhibitions, no permanent collection - - value: - identified_by: https://nde.nl/ontology/hc/type/gallery/Q56856618 - has_type_code: GALLERY - has_label: - - Commercial Art Gallery@en - - kunstgalerie@nl - has_description: for-profit gallery that sells artworks and represents artists # was: type_description - migrated per Rule 53/56 (2026-01-16) - custodian_type_broader: https://nde.nl/ontology/hc/type/gallery/Q1007870 - # MIGRATED 2026-01-22: commercial_operation → has_objective 
+ Profit (Rule 53) - has_objective: - represents_or_represented: - - has_label: Artist A - - has_label: Artist B - - has_label: Artist C - has_type: contemporary painting and sculpture - sales_activity: true - has_model: curated exhibitions of represented artists - # MIGRATED 2026-01-22: commission_rate → has_service + ArtSaleService (Rule 53) - has_service: - sales_activity: true - takes_or_took_comission: - has_percentage: \ No newline at end of file + broad_mappings: + - skos:Concept diff --git a/schemas/20251121/linkml/modules/classes/GalleryTypes.yaml b/schemas/20251121/linkml/modules/classes/GalleryTypes.yaml index 22b9c0fbb9..4b2e8f3d40 100644 --- a/schemas/20251121/linkml/modules/classes/GalleryTypes.yaml +++ b/schemas/20251121/linkml/modules/classes/GalleryTypes.yaml @@ -1,38 +1,36 @@ id: https://nde.nl/ontology/hc/class/GalleryTypes name: GalleryTypes title: Gallery Type Subclasses -description: Concrete subclasses of GalleryType. MIGRATED from gallery_subtype slot - per Rule 53/0b. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - ./GalleryType - linkml:types -default_prefix: hc classes: CommercialGallery: is_a: GalleryType - description: A gallery that sells art. - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: '[''*'']' + class_uri: hc:CommercialGallery + description: Gallery model that combines exhibition with artwork sales and artist representation. broad_mappings: - - skos:Concept + - skos:Concept NonProfitGallery: is_a: GalleryType - description: A gallery that operates as a non-profit. + class_uri: hc:NonProfitGallery + description: Gallery model operating under nonprofit governance and mission-oriented programming. broad_mappings: - - skos:Concept + - skos:Concept ArtistRunSpace: is_a: GalleryType - description: A gallery run by artists. 
+ class_uri: hc:ArtistRunSpace + description: Gallery model initiated and managed primarily by artists. broad_mappings: - - skos:Concept + - skos:Concept Kunsthalle: is_a: GalleryType - description: An art exhibition space without a permanent collection. + class_uri: hc:Kunsthalle + description: Exhibition-oriented gallery model without a permanent collection. broad_mappings: - - skos:Concept + - skos:Concept diff --git a/schemas/20251121/linkml/modules/classes/GenBankAccession.yaml b/schemas/20251121/linkml/modules/classes/GenBankAccession.yaml index e5b0536cb9..9c861dd96d 100644 --- a/schemas/20251121/linkml/modules/classes/GenBankAccession.yaml +++ b/schemas/20251121/linkml/modules/classes/GenBankAccession.yaml @@ -1,20 +1,43 @@ id: https://nde.nl/ontology/hc/class/GenBankAccession name: GenBankAccession title: GenBank Accession -description: A GenBank accession number for a nucleotide sequence. MIGRATED from genbank_accession slot per Rule 53. Follows BioProject/GenBank identifiers. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ +default_prefix: hc imports: - linkml:types -default_prefix: hc + - ./Identifier classes: GenBankAccession: + class_uri: hc:GenBankAccession is_a: Identifier - class_uri: schema:PropertyValue - description: A persistent identifier for a nucleotide sequence in GenBank. - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + description: >- + Persistent accession identifier assigned to a nucleotide sequence record in + GenBank. + alt_descriptions: + nl: Persistente toegangscode toegekend aan een nucleotide-sequentierecord in GenBank. + de: Persistente Zugriffkennung fuer einen Nukleotidsequenz-Datensatz in GenBank. + fr: Numero d'accession persistant attribue a un enregistrement de sequence nucleotidique dans GenBank. 
+ es: Accesion persistente asignada a un registro de secuencia nucleotidica en GenBank. + ar: رقم إتاحة دائم يُسند إلى سجل تسلسل نوكليوتيدي في GenBank. + id: Nomor aksesi persisten yang ditetapkan pada rekaman sekuens nukleotida di GenBank. + zh: 分配给 GenBank 核苷酸序列记录的持久登录号标识。 + structured_aliases: + - literal_form: GenBank-toegangscode + in_language: nl + - literal_form: GenBank-Zugangsnummer + in_language: de + - literal_form: numero d'accession GenBank + in_language: fr + - literal_form: numero de acceso GenBank + in_language: es + - literal_form: رقم إتاحة GenBank + in_language: ar + - literal_form: nomor aksesi GenBank + in_language: id + - literal_form: GenBank 登录号 + in_language: zh + broad_mappings: + - schema:PropertyValue diff --git a/schemas/20251121/linkml/modules/classes/Gender.yaml b/schemas/20251121/linkml/modules/classes/Gender.yaml index 3ba0ae7814..e760756b11 100644 --- a/schemas/20251121/linkml/modules/classes/Gender.yaml +++ b/schemas/20251121/linkml/modules/classes/Gender.yaml @@ -1,30 +1,48 @@ id: https://nde.nl/ontology/hc/class/Gender name: Gender title: Gender -description: Gender identity or classification. MIGRATED from gender_identity slot per Rule 53. Follows schema:GenderType. prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ schema: http://schema.org/ skos: http://www.w3.org/2004/02/skos/core# - dcterms: http://purl.org/dc/terms/ - prov: http://www.w3.org/ns/prov# - crm: http://www.cidoc-crm.org/cidoc-crm/ - rdfs: http://www.w3.org/2000/01/rdf-schema# - org: http://www.w3.org/ns/org# - xsd: http://www.w3.org/2001/XMLSchema# +default_prefix: hc imports: - linkml:types - ../slots/has_description - ../slots/has_label -default_prefix: hc classes: Gender: - class_uri: schema:GenderType + class_uri: hc:Gender + description: >- + Classification term used to represent stated gender identity in descriptive + metadata contexts. 
+ alt_descriptions: + nl: Classificatieterm voor het weergeven van opgegeven genderidentiteit in beschrijvende metadata. + de: Klassifikationsterm zur Darstellung angegebener Geschlechtsidentitaet in beschreibenden Metadatenkontexten. + fr: Terme de classification utilise pour representer l'identite de genre declaree dans des metadonnees descriptives. + es: Termino de clasificacion para representar identidad de genero declarada en contextos de metadatos descriptivos. + ar: مصطلح تصنيفي لتمثيل الهوية الجندرية المصرّح بها ضمن سياقات البيانات الوصفية. + id: Istilah klasifikasi untuk merepresentasikan identitas gender yang dinyatakan dalam konteks metadata deskriptif. + zh: 用于在描述性元数据语境中表示申报性别认同的分类术语。 + structured_aliases: + - literal_form: gender + in_language: nl + - literal_form: Geschlechtsidentitaet + in_language: de + - literal_form: identite de genre + in_language: fr + - literal_form: identidad de genero + in_language: es + - literal_form: الهوية الجندرية + in_language: ar + - literal_form: identitas gender + in_language: id + - literal_form: 性别认同 + in_language: zh slots: - - has_label - - has_description - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + - has_label + - has_description + broad_mappings: + - schema:GenderType + - skos:Concept diff --git a/schemas/20251121/linkml/modules/classes/GenealogiewerkbalkEnrichment.yaml b/schemas/20251121/linkml/modules/classes/GenealogiewerkbalkEnrichment.yaml deleted file mode 100644 index 0c40501bfb..0000000000 --- a/schemas/20251121/linkml/modules/classes/GenealogiewerkbalkEnrichment.yaml +++ /dev/null @@ -1,33 +0,0 @@ -id: https://nde.nl/ontology/hc/classes/GenealogiewerkbalkEnrichment -name: GenealogiewerkbalkEnrichment -title: GenealogiewerkbalkEnrichment -prefixes: - linkml: 
https://w3id.org/linkml/ - hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ - prov: http://www.w3.org/ns/prov# - xsd: http://www.w3.org/2001/XMLSchema# -imports: - - linkml:types - - ../enums/DataTierEnum -# default_range: string -classes: - GenealogiewerkbalkEnrichment: - description: "Dutch genealogy archives registry (Genealogiewerkbalk) data including\ - \ municipality, province, and associated archive information.\nOntology mapping\ - \ rationale: - class_uri is prov:Entity because this represents enrichment data\n\ - \ derived from the Dutch genealogy archives registry\n- close_mappings includes\ - \ schema:Dataset for registry data semantics - related_mappings includes prov:PrimarySource\ - \ for source registry" - class_uri: prov:Entity - close_mappings: - - schema:Dataset - related_mappings: - - prov:PrimarySource - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: '[''*'']' - slots: - - has_source - - has_url diff --git a/schemas/20251121/linkml/modules/classes/GenealogyArchivesRegistryEnrichment.yaml b/schemas/20251121/linkml/modules/classes/GenealogyArchivesRegistryEnrichment.yaml new file mode 100644 index 0000000000..6070d268b1 --- /dev/null +++ b/schemas/20251121/linkml/modules/classes/GenealogyArchivesRegistryEnrichment.yaml @@ -0,0 +1,48 @@ +id: https://nde.nl/ontology/hc/classes/GenealogyArchivesRegistryEnrichment +name: GenealogyArchivesRegistryEnrichment +title: Genealogy Archives Registry Enrichment Class +prefixes: + linkml: https://w3id.org/linkml/ + hc: https://nde.nl/ontology/hc/ + schema: http://schema.org/ + prov: http://www.w3.org/ns/prov# + xsd: http://www.w3.org/2001/XMLSchema# +imports: + - linkml:types + - ../enums/DataTierEnum +# default_range: string +classes: + GenealogyArchivesRegistryEnrichment: + description: >- + Enrichment data derived from genealogy-focused archive registry sources, + including municipality, province, and linked 
archive information. + class_uri: prov:Entity + alt_descriptions: + nl: {text: Verrijkingsdata uit genealogische archiefregisters, inclusief gemeente, provincie en gekoppelde archiefinformatie., language: nl} + de: {text: Anreicherungsdaten aus genealogischen Archivregistern mit Angaben zu Gemeinde, Provinz und verknuepften Archiven., language: de} + fr: {text: Donnees d enrichissement issues de registres d archives genealogiques, incluant municipalite, province et archives associees., language: fr} + es: {text: Datos de enriquecimiento derivados de registros archivisticos genealogicos, incluidos municipio, provincia e informacion de archivo vinculada., language: es} + ar: {text: بيانات إثراء مشتقة من سجلات أرشيفية خاصة بعلم الأنساب وتشمل البلدية والمقاطعة ومعلومات الأرشيف المرتبطة., language: ar} + id: {text: Data pengayaan dari registri arsip genealogi, termasuk munisipalitas, provinsi, dan informasi arsip terkait., language: id} + zh: {text: 源自家谱档案登记来源的富化数据，包含市镇、省份及关联档案信息。, language: zh} + structured_aliases: + nl: [{literal_form: verrijking genealogisch archiefregister, language: nl}] + de: [{literal_form: Anreicherung genealogisches Archivregister, language: de}] + fr: [{literal_form: enrichissement registre d archives genealogiques, language: fr}] + es: [{literal_form: enriquecimiento de registro archivistico genealogico, language: es}] + ar: [{literal_form: إثراء سجل الأرشيف الجينيالوجي, language: ar}] + id: [{literal_form: pengayaan registri arsip genealogi, language: id}] + zh: [{literal_form: 家谱档案登记富化, language: zh}] + exact_mappings: + - prov:Entity + close_mappings: + - schema:Dataset + related_mappings: + - prov:PrimarySource + annotations: + specificity_score: 0.1 + specificity_rationale: Generic utility class/slot created during migration + custodian_types: 
'[''*'']' + slots: + - has_source + - has_url diff --git a/schemas/20251121/linkml/modules/classes/GenerationEvent.yaml b/schemas/20251121/linkml/modules/classes/GenerationEvent.yaml index 3078857429..fedabae532 100644 --- a/schemas/20251121/linkml/modules/classes/GenerationEvent.yaml +++ b/schemas/20251121/linkml/modules/classes/GenerationEvent.yaml @@ -1,106 +1,42 @@ id: https://nde.nl/ontology/hc/class/GenerationEvent -name: generation_event_class +name: GenerationEvent title: Generation Event Class - prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ prov: http://www.w3.org/ns/prov# schema: http://schema.org/ - +default_prefix: hc imports: - linkml:types - - ../slots/has_description - - ../slots/has_provenance - - ../slots/has_score - ../slots/temporal_extent -default_prefix: hc - + - ../slots/has_provenance + - ../slots/has_description + - ../slots/has_score classes: GenerationEvent: - description: >- - An event representing the generation or creation of an entity. - - **USAGE**: - Used for tracking when and how something was generated, including: - - Video chapter generation (manual, AI, imported) - - Content extraction events - - Automated processing activities - - Confidence scoring for generated content - - **STRUCTURE**: - - temporal_extent: When the generation occurred (TimeSpan) - - has_provenance: Who/what performed the generation (Provenance) - - has_description: Details about the generation process - - has_score: Confidence score for the generated content (ConfidenceScore) - - **ONTOLOGY ALIGNMENT**: - - Maps to prov:Generation (PROV-O generation event) - - Also maps to schema:CreateAction (Schema.org action) - class_uri: prov:Generation - + description: Event in which an entity is created or generated. 
exact_mappings: - prov:Generation - close_mappings: - schema:CreateAction - slots: - temporal_extent - has_provenance - has_description - has_score - slot_usage: temporal_extent: range: TimeSpan - required: false inlined: true - examples: - - value: - begin_of_the_begin: "2024-01-15T10:30:00Z" - end_of_the_end: "2024-01-15T10:30:00Z" has_provenance: range: Provenance - required: false inlined: true - examples: - - value: - has_agent: - has_type: SOFTWARE - has_name: "YouTube Auto-Chapters" - has_description: -# range: string - required: false - examples: - - value: "Generated using Whisper transcript segmentation" has_score: range: ConfidenceScore - required: false inlined: true - examples: - - value: - has_score: 0.95 - has_method: "xpath_extraction" - has_description: "High confidence - exact match at expected location" annotations: custodian_types: '["*"]' - custodian_types_rationale: >- - Generation events are universal for tracking content creation. - custodian_types_primary: "*" - specificity_score: 0.30 - specificity_rationale: >- - Moderately low specificity - used across many content types. - - examples: - - value: - temporal_extent: - begin_of_the_begin: "2024-01-15T10:30:00Z" - has_description: "AI-generated video chapters from transcript" - has_score: - has_score: 0.92 - has_method: "transcript_segmentation" - comments: - - Created from slot_fixes.yaml migration (2026-01-19) - - Updated 2026-01-19 to include has_score for confidence tracking + specificity_score: 0.3 + specificity_rationale: Cross-domain provenance event for generated content. 
diff --git a/schemas/20251121/linkml/modules/classes/GeoFeature.yaml b/schemas/20251121/linkml/modules/classes/GeoFeature.yaml index 243372ea7b..5ba7dbc030 100644 --- a/schemas/20251121/linkml/modules/classes/GeoFeature.yaml +++ b/schemas/20251121/linkml/modules/classes/GeoFeature.yaml @@ -1,40 +1,33 @@ id: https://nde.nl/ontology/hc/class/GeoFeature name: GeoFeature -title: Geographic Feature -description: 'A classification of a geographic feature (e.g., populated place, administrative division). MIGRATED from feature_class/feature_code slots. - - Used to classify GeoSpatialPlace instances according to GeoNames feature codes.' +title: Geo Feature Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ - gn: http://www.geonames.org/ontology# skos: http://www.w3.org/2004/02/skos/core# - dcterms: http://purl.org/dc/terms/ - prov: http://www.w3.org/ns/prov# - crm: http://www.cidoc-crm.org/cidoc-crm/ - rdfs: http://www.w3.org/2000/01/rdf-schema# - org: http://www.w3.org/ns/org# - xsd: http://www.w3.org/2001/XMLSchema# + gn: http://www.geonames.org/ontology# +default_prefix: hc imports: - linkml:types - - ../slots/has_code - ../slots/has_type -default_prefix: hc + - ../slots/has_code classes: GeoFeature: class_uri: skos:Concept + description: Geo feature classification entry, typically aligned to GeoNames coding. + broad_mappings: + - skos:Concept + close_mappings: + - gn:Feature slots: - - has_type - - has_code + - has_type + - has_code slot_usage: has_type: -# range: string # uriorcurie required: true has_code: -# range: string # uriorcurie required: true annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.35 + specificity_rationale: Controlled geospatial classification term. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeoFeatureType.yaml b/schemas/20251121/linkml/modules/classes/GeoFeatureType.yaml index 31cbeafc82..820521810e 100644 --- a/schemas/20251121/linkml/modules/classes/GeoFeatureType.yaml +++ b/schemas/20251121/linkml/modules/classes/GeoFeatureType.yaml @@ -1,25 +1,26 @@ id: https://nde.nl/ontology/hc/class/GeoFeatureType name: GeoFeatureType -title: Geographic Feature Type -description: Abstract base class for geographic feature types (e.g., PopulatedPlace, AdministrativeDivision). MIGRATED from feature_class slot per Rule 0b. +title: Geo Feature Type Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ skos: http://www.w3.org/2004/02/skos/core# - gn: http://www.geonames.org/ontology# +default_prefix: hc imports: - linkml:types - - ../slots/has_description - ../slots/has_label -default_prefix: hc + - ../slots/has_description classes: GeoFeatureType: class_uri: skos:Concept abstract: true + description: Abstract taxonomy node for geographic feature classes. + broad_mappings: + - skos:Concept slots: - - has_label - - has_description + - has_label + - has_description annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.3 + specificity_rationale: Shared hierarchy base for geospatial feature typing. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeoFeatureTypes.yaml b/schemas/20251121/linkml/modules/classes/GeoFeatureTypes.yaml index bf9e1f0b51..d1f9e46174 100644 --- a/schemas/20251121/linkml/modules/classes/GeoFeatureTypes.yaml +++ b/schemas/20251121/linkml/modules/classes/GeoFeatureTypes.yaml @@ -1,82 +1,77 @@ id: https://nde.nl/ontology/hc/class/GeoFeatureTypes name: GeoFeatureTypes -title: Geographic Feature Type Subclasses -description: Concrete subclasses of GeoFeatureType representing specific geographic - feature categories. Based on GeoNames feature classes. +title: Geo Feature Types Class Module prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - skos: http://www.w3.org/2004/02/skos/core# gn: http://www.geonames.org/ontology# + schema: http://schema.org/ + crm: http://www.cidoc-crm.org/cidoc-crm/ +default_prefix: hc imports: - ./GeoFeatureType - linkml:types -default_prefix: hc classes: AdministrativeBoundary: is_a: GeoFeatureType class_uri: gn:A - description: Country, state, region, etc. (GeoNames class A) - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: '[''*'']' + description: Administrative division feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place HydrographicFeature: is_a: GeoFeatureType class_uri: gn:H - description: Stream, lake, etc. (GeoNames class H) + description: Hydrographic feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place AreaFeature: is_a: GeoFeatureType class_uri: gn:L - description: Parks, area, etc. (GeoNames class L) + description: Area feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place PopulatedPlace: is_a: GeoFeatureType class_uri: gn:P - description: City, village, etc. (GeoNames class P) + description: Populated place feature class. 
broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place RoadRailroad: is_a: GeoFeatureType class_uri: gn:R - description: Road, railroad, etc. (GeoNames class R) + description: Transport corridor feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place SpotFeature: is_a: GeoFeatureType class_uri: gn:S - description: Spot, building, farm (GeoNames class S) + description: Spot feature class, including discrete built entities. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place HypsographicFeature: is_a: GeoFeatureType class_uri: gn:T - description: Mountain, hill, rock (GeoNames class T) + description: Terrain elevation feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place UnderseaFeature: is_a: GeoFeatureType class_uri: gn:U - description: Undersea feature (GeoNames class U) + description: Undersea feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place VegetationFeature: is_a: GeoFeatureType class_uri: gn:V - description: Forest, heath, etc. (GeoNames class V) + description: Vegetation feature class. broad_mappings: - - schema:Place - - crm:E53_Place + - schema:Place + - crm:E53_Place diff --git a/schemas/20251121/linkml/modules/classes/GeoNamesIdentifier.yaml b/schemas/20251121/linkml/modules/classes/GeoNamesIdentifier.yaml index 14557c4ce4..ef117ef864 100644 --- a/schemas/20251121/linkml/modules/classes/GeoNamesIdentifier.yaml +++ b/schemas/20251121/linkml/modules/classes/GeoNamesIdentifier.yaml @@ -1,23 +1,24 @@ id: https://nde.nl/ontology/hc/class/GeoNamesIdentifier name: GeoNamesIdentifier -title: GeoNames Identifier -description: Identifier from the GeoNames geographical database. MIGRATED from geonames_id slot per Rule 53. Follows gn:geonamesID. 
+title: GeoNames Identifier Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ + dcterms: http://purl.org/dc/terms/ gn: http://www.geonames.org/ontology# +default_prefix: hc imports: - linkml:types -default_prefix: hc classes: GeoNamesIdentifier: is_a: Identifier class_uri: hc:GeoNamesIdentifier + description: External identifier referencing a feature in the GeoNames gazetteer. close_mappings: - - gn:geonamesID - description: A unique identifier for a GeoNames feature. Typically an integer. + - dcterms:Identifier + related_mappings: + - gn:geonamesID annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.3 + specificity_rationale: Specialized external place identifier class. + custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeoSpatialPlace.yaml b/schemas/20251121/linkml/modules/classes/GeoSpatialPlace.yaml index b7162f73da..fc4d2b86c1 100644 --- a/schemas/20251121/linkml/modules/classes/GeoSpatialPlace.yaml +++ b/schemas/20251121/linkml/modules/classes/GeoSpatialPlace.yaml @@ -1,158 +1,74 @@ id: https://nde.nl/ontology/hc/class/GeoSpatialPlace -name: geospatial_place_class -title: GeoSpatialPlace Class +name: GeoSpatialPlace +title: Geo Spatial Place Class prefixes: - geo: http://www.opengis.net/ont/geosparql# - rov: http://www.w3.org/ns/regorg# + linkml: https://w3id.org/linkml/ + hc: https://nde.nl/ontology/hc/ geosparql: http://www.opengis.net/ont/geosparql# - wgs84: http://www.w3.org/2003/01/geo/wgs84_pos# - sf: http://www.opengis.net/ont/sf# - gn: http://www.geonames.org/ontology# - gn_entity: http://sws.geonames.org/ + geo: http://www.opengis.net/ont/geosparql# + schema: http://schema.org/ + prov: http://www.w3.org/ns/prov# + crm: http://www.cidoc-crm.org/cidoc-crm/ tooi: https://identifier.overheid.nl/tooi/def/ont/ +default_prefix: hc imports: - linkml:types - 
../enums/GeometryTypeEnum - ../metadata - - ../slots/has_reference_system - - ../slots/has_altitude - ../slots/has_coordinates - - ../slots/has_geofeature + - ../slots/has_altitude - ../slots/geographic_extent - ../slots/geometric_extent + - ../slots/has_reference_system + - ../slots/has_geofeature - ../slots/identified_by - - ../slots/has_score - ../slots/has_resolution - ../slots/temporal_extent + - ../slots/has_score types: WktLiteral: uri: geosparql:wktLiteral base: str - description: 'Well-Known Text (WKT) representation of geometry. - See OGC Simple Features specification. - ' - examples: - - value: POINT(4.2894 52.0705) - - value: POLYGON((4.0 52.0, 4.5 52.0, 4.5 52.5, 4.0 52.5, 4.0 52.0)) + description: Well-Known Text representation of geometry. classes: GeoSpatialPlace: class_uri: geosparql:Feature - description: "Geospatial location with coordinates, geometry, and projections.\n\nCRITICAL DISTINCTION FROM CustodianPlace:\n\n| Aspect | CustodianPlace | GeoSpatialPlace |\n|--------|----------------|-----------------|\n| Nature | Nominal reference | Geospatial data |\n| Content | \"het herenhuis in de Schilderswijk\" | lat: 52.0705, lon: 4.2894 |\n| Purpose | Identify custodian by place name | Locate custodian precisely |\n| Ambiguity | May be vague (\"the mansion\") | Precise, measurable |\n| Source | Archival documents, oral history | GPS, cadastral surveys, geocoding |\n\n**TOOI Ontology Alignment**:\n\nThis class follows the TOOI pattern for geospatial data:\n- `tooi:BestuurlijkeRuimte` is a subclass of `geosparql:Feature` and `prov:Entity`\n- `tooi:BestuurlijkeRuimte-hasGeometry` \u2192 `geosparql:Geometry`\n- `tooi:RegistratieveRuimte` for administrative boundaries\n- `tooi:JuridischeRuimte` for legal jurisdiction boundaries\n\nLike TOOI, we separate:\n- **geosparql:Feature**\ - \ (this class): The real-world place with location data\n- **geosparql:Geometry**: The mathematical representation (WKT, GeoJSON)\n\n**Use Cases**:\n\n1. 
**Building-level precision**: Museum building footprint (Polygon)\n2. **City-level approximation**: Heritage institution centroid (Point)\n3. **Administrative boundaries**: Archive jurisdiction area (MultiPolygon)\n4. **Historical boundaries**: Pre-merger municipal territory (Polygon + temporal_extent)\n\n**Relationship to CustodianPlace**:\n\nCustodianPlace.has_geospatial_location \u2192 GeoSpatialPlace\n\nA nominal place reference (\"Rijksmuseum\") links to its geospatial location\n(lat: 52.3600, lon: 4.8852, geometry: building footprint polygon).\n\n**Relationship to AuxiliaryPlace**:\n\nAuxiliaryPlace.has_geospatial_location \u2192 GeoSpatialPlace\n\nSecondary/subordinate locations (branch offices, storage depots, reading rooms)\ncan also link to precise geospatial coordinates. This enables:\n- Mapping all custodian locations\ - \ (primary + auxiliary)\n- Spatial queries across an organization's entire footprint\n- Building footprints for off-site storage facilities\n- Historical boundary tracking for branch offices\n\n**Relationship to OrganizationalChangeEvent**:\n\nOrganizational changes may affect geographic location:\n- RELOCATION: New GeoSpatialPlace, old one gets temporal_extent.end_of_the_end\n- MERGER: Multiple locations \u2192 single primary + auxiliary locations\n- SPLIT: One location \u2192 multiple successor locations\n" + description: Measured geospatial place representation with coordinates, geometry, and reference system. 
exact_mappings: - - geosparql:Feature + - geosparql:Feature close_mappings: - - geo:SpatialThing - - schema:Place - - schema:GeoCoordinates + - schema:Place + - geo:SpatialThing related_mappings: - - prov:Entity - - tooi:BestuurlijkeRuimte - - crm:E53_Place + - prov:Entity + - tooi:BestuurlijkeRuimte + - crm:E53_Place slots: - - has_coordinates - - has_altitude - - geographic_extent - - identified_by - - has_reference_system - - has_geofeature - - geometric_extent - - identified_by - - has_resolution - - has_score - - temporal_extent + - identified_by + - has_coordinates + - has_altitude + - has_reference_system + - has_geofeature + - geographic_extent + - geometric_extent + - has_resolution + - temporal_extent + - has_score slot_usage: has_coordinates: range: Coordinates inlined: true required: true - examples: - - value: - latitude: 52.36 - longitude: 4.8852 has_reference_system: ifabsent: string(EPSG:4326) - identified_by: - description: 'Cadastral identifiers for this geospatial place. MIGRATION NOTE (2026-01-14): Replaces cadastral_id per slot_fixes.yaml. Use Identifier with identifier_scheme=''cadastral'' for parcel IDs. 
Netherlands: Kadaster perceelnummer format {gemeente}-{sectie}-{perceelnummer}' - examples: - - value: temporal_extent: range: TimeSpan inlined: true - required: false - examples: - - value: - begin_of_the_begin: '1920-01-01' - end_of_the_end: '2001-01-01' comments: - - Follows TOOI BestuurlijkeRuimte pattern using GeoSPARQL - - 'CRITICAL: NOT a nominal reference - this is measured/surveyed location data' - - Use CustodianPlace for nominal references, this class for coordinates - - lat/lon required; geometry_wkt optional for point locations - - Link from CustodianPlace via has_geospatial_location slot - - Link from AuxiliaryPlace via has_geospatial_location slot (subordinate sites) - - Link from OrganizationalChangeEvent via has_affected_territory slot - - temporal_extent tracks boundary changes over time (was valid_from_geo/valid_to_geo) - - OSM and GeoNames IDs enable external linking - see_also: - - http://www.opengis.net/ont/geosparql - - https://www.geonames.org/ - - https://www.openstreetmap.org/ - - https://identifier.overheid.nl/tooi/def/ont/ - examples: - - value: - geospatial_id: https://nde.nl/ontology/hc/geo/rijksmuseum-building - has_coordinates: - latitude: 52.36 - longitude: 4.8852 - altitude: 0.0 - geometric_extent: - - has_format: - has_value: POLYGON((4.8830 52.3590, 4.8870 52.3590, 4.8870 52.3610, 4.8830 52.3610, 4.8830 52.3590)) - has_type: - has_label: POLYGON - coordinate_reference_system: EPSG:4326 - osm_id: way/27083908 - spatial_resolution: BUILDING - has_geofeature: - - has_type: SpotFeature - has_code: - has_label: S.MUS - - value: - geospatial_id: https://nde.nl/ontology/hc/geo/amsterdam-centroid - has_coordinates: - latitude: 52.3676 - longitude: 4.9041 - geometric_extent: - - has_type: - has_label: POINT - coordinate_reference_system: EPSG:4326 - spatial_resolution: CITY - has_geofeature: - - has_type: PopulatedPlace - has_code: - has_label: P.PPLC - - value: - geospatial_id: 
https://nde.nl/ontology/hc/geo/noord-holland-archive-territory-pre-2001 - has_coordinates: - latitude: 52.5 - longitude: 4.8 - geometric_extent: - - has_format: - has_value: MULTIPOLYGON(((4.5 52.2, 5.2 52.2, 5.2 52.8, 4.5 52.8, 4.5 52.2))) - has_type: - has_label: MULTIPOLYGON - coordinate_reference_system: EPSG:4326 - spatial_resolution: REGION - has_geofeature: - - has_type: AdministrativeBoundary - has_code: - has_label: A.ADM1 - temporal_extent: - begin_of_the_begin: '1920-01-01' - end_of_the_end: '2001-01-01' + - Use this class for measurable geodata, not nominal place references. + - Link nominal place references through dedicated place classes. + - Temporal extent tracks boundary or footprint change over time. annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.55 + specificity_rationale: Primary geospatial feature class for coordinates and geometry. + custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeographicExtent.yaml b/schemas/20251121/linkml/modules/classes/GeographicExtent.yaml index 101c65b9f0..f7c568949b 100644 --- a/schemas/20251121/linkml/modules/classes/GeographicExtent.yaml +++ b/schemas/20251121/linkml/modules/classes/GeographicExtent.yaml @@ -4,11 +4,9 @@ title: Geographic Extent Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ dcterms: http://purl.org/dc/terms/ - + schema: http://schema.org/ default_prefix: hc - imports: - linkml:types - ../metadata @@ -17,18 +15,15 @@ imports: classes: GeographicExtent: class_uri: dcterms:Location - description: >- - A geographic area defining the scope or extent (e.g., eligible countries). - - **Ontology Alignment**: - - **Primary**: `dcterms:Location` - - **Close**: `schema:Place` - + description: Geographic area used to define spatial applicability or coverage. 
+ exact_mappings: + - dcterms:Location + close_mappings: + - schema:Place slots: - - has_label - identified_by - + - has_label annotations: custodian_types: '["*"]' specificity_score: 0.3 - specificity_rationale: Geographic metadata. + specificity_rationale: Spatial scope descriptor for policies and eligibility. diff --git a/schemas/20251121/linkml/modules/classes/GeographicScope.yaml b/schemas/20251121/linkml/modules/classes/GeographicScope.yaml index 69840b2f24..9b275dc607 100644 --- a/schemas/20251121/linkml/modules/classes/GeographicScope.yaml +++ b/schemas/20251121/linkml/modules/classes/GeographicScope.yaml @@ -1,23 +1,25 @@ id: https://nde.nl/ontology/hc/class/GeographicScope name: GeographicScope -title: Geographic Scope -description: The geographic scope or coverage of an entity (e.g., local, regional, national). MIGRATED from geographic_scope slot per Rule 53. Follows skos:Concept. +title: Geographic Scope Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ skos: http://www.w3.org/2004/02/skos/core# +default_prefix: hc imports: - linkml:types - - ../slots/has_description - ../slots/has_label -default_prefix: hc + - ../slots/has_description classes: GeographicScope: class_uri: skos:Concept + description: Controlled concept describing scale of geographic coverage. + broad_mappings: + - skos:Concept slots: - - has_label - - has_description + - has_label + - has_description annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.25 + specificity_rationale: Controlled scope vocabulary for local-to-global coverage. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/Geometry.yaml b/schemas/20251121/linkml/modules/classes/Geometry.yaml index 96bc4e4343..e94ad6b04b 100644 --- a/schemas/20251121/linkml/modules/classes/Geometry.yaml +++ b/schemas/20251121/linkml/modules/classes/Geometry.yaml @@ -1,34 +1,37 @@ id: https://nde.nl/ontology/hc/class/Geometry name: Geometry -title: Geometry -description: A spatial geometry (point, polygon, etc.). MIGRATED from geometry_type/geometry_wkt slots. Follows GeoSPARQL Geometry. +title: Geometry Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ geosparql: http://www.opengis.net/ont/geosparql# + schema: http://schema.org/ +default_prefix: hc imports: - linkml:types - - ../slots/has_description - - ../slots/has_format - ../slots/has_label + - ../slots/has_description - ../slots/has_type -default_prefix: hc + - ../slots/has_format classes: Geometry: class_uri: geosparql:Geometry + description: Spatial geometry representation, such as point, line, or polygon. + exact_mappings: + - geosparql:Geometry + close_mappings: + - schema:GeoShape slots: - - has_label - - has_description - - has_type - - has_format + - has_label + - has_description + - has_type + - has_format slot_usage: - has_format: -# range: string # uriorcurie - required: true has_type: -# range: string # uriorcurie + required: true + has_format: required: true annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.35 + specificity_rationale: Core geometric encoding object for geospatial data. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeometryType.yaml b/schemas/20251121/linkml/modules/classes/GeometryType.yaml index 514312a096..a606e8cc6b 100644 --- a/schemas/20251121/linkml/modules/classes/GeometryType.yaml +++ b/schemas/20251121/linkml/modules/classes/GeometryType.yaml @@ -1,25 +1,26 @@ id: https://nde.nl/ontology/hc/class/GeometryType name: GeometryType -title: Geometry Type -description: Abstract base class for geometry types (e.g., Point, Polygon). MIGRATED from geometry_type slot per Rule 0b. +title: Geometry Type Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ skos: http://www.w3.org/2004/02/skos/core# - geosparql: http://www.opengis.net/ont/geosparql# +default_prefix: hc imports: - linkml:types - - ../slots/has_description - ../slots/has_label -default_prefix: hc + - ../slots/has_description classes: GeometryType: class_uri: skos:Concept abstract: true + description: Abstract controlled concept for geometry shape types. + broad_mappings: + - skos:Concept slots: - - has_label - - has_description + - has_label + - has_description annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.25 + specificity_rationale: Geometry shape taxonomy base class. + custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeometryTypes.yaml b/schemas/20251121/linkml/modules/classes/GeometryTypes.yaml index 015f9775a1..bffa6536d4 100644 --- a/schemas/20251121/linkml/modules/classes/GeometryTypes.yaml +++ b/schemas/20251121/linkml/modules/classes/GeometryTypes.yaml @@ -1,63 +1,49 @@ id: https://nde.nl/ontology/hc/class/GeometryTypes name: GeometryTypes -title: Geometry Type Subclasses -description: Concrete subclasses of GeometryType representing specific geometry types. - Based on GeoSPARQL geometry types. 
+title: Geometry Types Class Module prefixes: - geo: http://www.opengis.net/ont/geosparql# linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - skos: http://www.w3.org/2004/02/skos/core# - geosparql: http://www.opengis.net/ont/geosparql# sf: http://www.opengis.net/ont/sf# + geo: http://www.opengis.net/ont/geosparql# +default_prefix: hc imports: - ./GeometryType - linkml:types -default_prefix: hc classes: Point: is_a: GeometryType class_uri: sf:Point - description: A single point geometry. - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: '[''*'']' + description: Point geometry type. broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry Polygon: is_a: GeometryType class_uri: sf:Polygon - description: A polygon geometry. + description: Polygon geometry type. broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry MultiPolygon: is_a: GeometryType class_uri: sf:MultiPolygon - description: A collection of polygons. + description: Multi polygon geometry type. broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry LineString: is_a: GeometryType class_uri: sf:LineString - description: A line string geometry. + description: Line string geometry type. broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry MultiLineString: is_a: GeometryType class_uri: sf:MultiLineString - description: A collection of line strings. + description: Multi line string geometry type. broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry MultiPoint: is_a: GeometryType class_uri: sf:MultiPoint - description: A collection of points. + description: Multi point geometry type. 
broad_mappings: - - geo:Geometry - - sf:Geometry + - geo:Geometry diff --git a/schemas/20251121/linkml/modules/classes/GeospatialIdentifier.yaml b/schemas/20251121/linkml/modules/classes/GeospatialIdentifier.yaml index ada4562ada..8f81ea57ff 100644 --- a/schemas/20251121/linkml/modules/classes/GeospatialIdentifier.yaml +++ b/schemas/20251121/linkml/modules/classes/GeospatialIdentifier.yaml @@ -1,20 +1,24 @@ id: https://nde.nl/ontology/hc/class/GeospatialIdentifier name: GeospatialIdentifier -title: Geospatial Identifier -description: A unique identifier for a geospatial feature (e.g., from GeoSPARQL). MIGRATED from geospatial_id slot per Rule 53. Follows geosparql:Feature. +title: Geospatial Identifier Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ + dcterms: http://purl.org/dc/terms/ geosparql: http://www.opengis.net/ont/geosparql# +default_prefix: hc imports: - linkml:types -default_prefix: hc classes: GeospatialIdentifier: is_a: Identifier - class_uri: geosparql:Feature - description: A persistent URI or identifier for a geospatial feature. + class_uri: hc:GeospatialIdentifier + description: Persistent identifier for a geospatial entity in an external or internal system. + close_mappings: + - dcterms:Identifier + related_mappings: + - geosparql:Feature annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.3 + specificity_rationale: Identifier class for geospatial record linking. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GeospatialLocation.yaml b/schemas/20251121/linkml/modules/classes/GeospatialLocation.yaml index 995a7567e0..d57fdcd3d0 100644 --- a/schemas/20251121/linkml/modules/classes/GeospatialLocation.yaml +++ b/schemas/20251121/linkml/modules/classes/GeospatialLocation.yaml @@ -1,7 +1,6 @@ id: https://nde.nl/ontology/hc/class/GeospatialLocation name: GeospatialLocation -title: GeospatialLocation -description: A specific geospatial location. +title: Geospatial Location Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ @@ -13,10 +12,12 @@ imports: classes: GeospatialLocation: class_uri: schema:GeoCoordinates - description: Geospatial location. + description: Coordinate-based geospatial location. + exact_mappings: + - schema:GeoCoordinates slots: - - has_location + - has_location annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: "['*']" + specificity_score: 0.25 + specificity_rationale: Coordinate wrapper used in geospatial modeling. 
+ custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/classes/GhcidBlock.yaml b/schemas/20251121/linkml/modules/classes/GhcidBlock.yaml index 6f3b91ea6b..c9a736ab3b 100644 --- a/schemas/20251121/linkml/modules/classes/GhcidBlock.yaml +++ b/schemas/20251121/linkml/modules/classes/GhcidBlock.yaml @@ -1,40 +1,30 @@ id: https://nde.nl/ontology/hc/classes/GhcidBlock name: GhcidBlock -title: GhcidBlock +title: Ghcid Block Class prefixes: linkml: https://w3id.org/linkml/ hc: https://nde.nl/ontology/hc/ - schema: http://schema.org/ - prov: http://www.w3.org/ns/prov# - xsd: http://www.w3.org/2001/XMLSchema# dcterms: http://purl.org/dc/terms/ - crm: http://www.cidoc-crm.org/cidoc-crm/ - skos: http://www.w3.org/2004/02/skos/core# - rdfs: http://www.w3.org/2000/01/rdf-schema# - org: http://www.w3.org/ns/org# + prov: http://www.w3.org/ns/prov# + schema: http://schema.org/ +default_prefix: hc imports: - linkml:types -# default_range: string + - ../slots/identified_by classes: GhcidBlock: - description: "GHCID (Global Heritage Custodian Identifier) generation metadata\ - \ and history. Contains current GHCID string, UUID variants (v5, v8), numeric\ - \ form, generation timestamp, and history of GHCID changes due to relocations,\ - \ mergers, or collision resolution.\nOntology mapping rationale: - class_uri\ - \ is dcterms:Identifier because GHCID is fundamentally\n an identifier assignment\ - \ with associated metadata\n- close_mappings includes prov:Entity as identifier\ - \ blocks are\n traceable provenance entities themselves\n- related_mappings\ - \ includes schema:PropertyValue (identifier as\n property) and prov:Generation\ - \ (identifier creation event)" class_uri: dcterms:Identifier + description: Identifier metadata block capturing assignment, variants, and lifecycle history for GHCID values. 
+ exact_mappings: + - dcterms:Identifier close_mappings: - - prov:Entity + - prov:Entity related_mappings: - - schema:PropertyValue - - prov:Generation - annotations: - specificity_score: 0.1 - specificity_rationale: Generic utility class/slot created during migration - custodian_types: '[''*'']' + - schema:PropertyValue + - prov:Generation slots: - - identified_by + - identified_by + annotations: + specificity_score: 0.35 + specificity_rationale: Identifier lifecycle container for custody-level identifier governance. + custodian_types: '["*"]' diff --git a/schemas/20251121/linkml/modules/enums/DataTierEnum.yaml b/schemas/20251121/linkml/modules/enums/DataTierEnum.yaml index d9db3c4472..424ee6150f 100644 --- a/schemas/20251121/linkml/modules/enums/DataTierEnum.yaml +++ b/schemas/20251121/linkml/modules/enums/DataTierEnum.yaml @@ -21,8 +21,8 @@ enums: TIER_1_AUTHORITATIVE: description: Official registry data (NDE CSV, Nationaal Archief ISIL) TIER_2_VERIFIED: - description: Verified external sources (Wikidata, Google Maps, Genealogiewerkbalk) + description: Verified external sources (Wikidata, Google Maps, genealogy archive registries) TIER_3_CROWD_SOURCED: description: Community-contributed data (reviews, user edits) TIER_4_INFERRED: - description: Algorithmically extracted (website scrape, Exa search) + description: Algorithmically extracted (website scrape, external search)