Add comprehensive rules for LinkML schema management and ontology mapping

- Introduced Rule 42: No Ontology Prefixes in Slot Names to enforce clean naming conventions.
- Established Rule: No Rough Edits in Schema Files to ensure structural integrity during modifications.
- Implemented Rule: No Version Indicators in Names to maintain stable semantic naming.
- Created Rule: Ontology Detection vs Heuristics to emphasize the importance of verifying ontology definitions.
- Defined Rule 50: Ontology-to-LinkML Mapping Convention to standardize mapping practices.
- Added Rule: Polished Slot Storage Location to specify directory structure for polished slot files.
- Enforced Rule: Preserve Bespoke Slots Until Refactoring to prevent unintended migrations during slot updates.
- Instituted Rule 56: Semantic Consistency Over Simplicity to mandate execution of revisions in slot_fixes.yaml.
- Added new Genealogy Archives Registry Enrichment class with multilingual support and structured aliases.
This commit is contained in:
kempersc 2026-02-15 19:20:09 +01:00
parent ee5e8e5a7c
commit 554fe520ea
69 changed files with 6113 additions and 1095 deletions

@ -0,0 +1,583 @@
# Rule 46: Ontology-Driven Cache Segmentation
🚨 **CRITICAL**: The semantic cache MUST use vocabulary derived from LinkML `*Type.yaml` and `*Types.yaml` schema files to extract entities for cache key generation. Hardcoded regex patterns are deprecated.
**Status**: Implemented (Evolved v2.0)
**Version**: 2.0 (Epistemological Evolution)
**Updated**: 2026-01-10
## Evolution Overview
Rule 46 v2.0 incorporates insights from Volodymyr Pavlyshyn's work on agentic memory systems:
1. **Epistemic Provenance** (Phase 1) - Track WHERE, WHEN, HOW data originated
2. **Topological Distance** (Phase 2) - Use ontology structure, not just embeddings
3. **Holarchic Cache** (Phase 3) - Entries as holons with up/down links
4. **Message Passing** (Phase 4, planned) - Smalltalk-style introspectable cache
5. **Clarity Trading** (Phase 5, planned) - Block ambiguous queries from cache
## Epistemic Provenance
Every cached response carries epistemological metadata:
```typescript
interface EpistemicProvenance {
dataSource: 'ISIL_REGISTRY' | 'WIKIDATA' | 'CUSTODIAN_YAML' | 'LLM_INFERENCE' | ...;
dataTier: 1 | 2 | 3 | 4; // TIER_1_AUTHORITATIVE → TIER_4_INFERRED
sourceTimestamp: string;
derivationChain: string[]; // ["SPARQL:Qdrant", "RAG:retrieve", "LLM:generate"]
revalidationPolicy: 'static' | 'daily' | 'weekly' | 'on_access';
}
```
**Benefit**: Users see "This answer is from TIER_1 ISIL registry data, captured 2025-01-08".
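The `revalidationPolicy` field implies a freshness check before a cached answer is reused. The following is a minimal sketch of such a check; the helper name `needsRevalidation` and the exact semantics of each policy value (never, always, older than 1 day, older than 7 days) are assumptions, not taken from the implementation.

```typescript
type RevalidationPolicy = 'static' | 'daily' | 'weekly' | 'on_access';

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed semantics: 'static' never expires, 'on_access' always revalidates,
// 'daily' / 'weekly' expire once the source timestamp is older than 1 / 7 days.
function needsRevalidation(
  policy: RevalidationPolicy,
  sourceTimestamp: string,
  now: Date = new Date(),
): boolean {
  const ageMs = now.getTime() - new Date(sourceTimestamp).getTime();
  switch (policy) {
    case 'static':
      return false;
    case 'on_access':
      return true;
    case 'daily':
      return ageMs > DAY_MS;
    case 'weekly':
      return ageMs > 7 * DAY_MS;
  }
}

// Two days after capture: a 'daily' entry is stale, a 'weekly' entry is not.
const checkedAt = new Date('2025-01-10T00:00:00Z');
console.log(needsRevalidation('daily', '2025-01-08T00:00:00Z', checkedAt));
console.log(needsRevalidation('weekly', '2025-01-08T00:00:00Z', checkedAt));
```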
## Topological Distance
Beyond embedding similarity, cache matching considers **structural distance** in the type hierarchy:
```
HeritageCustodian (*)
┌──────────────────┼──────────────────┐
▼ ▼ ▼
MuseumType (M) ArchiveType (A) LibraryType (L)
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
▼ ▼ ▼ ▼ ▼ ▼
ArtMuseum History Municipal State Public Academic
```
**Combined Similarity Formula**:
```typescript
finalScore = 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance)
```
**Benefit**: "Art museum" won't match "natural history museum" even with 95% embedding similarity.
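The formula needs a `topologicalDistance` in the range 0 to 1. The sketch below shows one way to derive it from the type hierarchy: count the edges between two types through their lowest common ancestor and divide by the longest possible path. The `PARENT` table, the node names, and the normalisation by `2 * maxDepth` are illustrative assumptions; the real hierarchy comes from the LinkML schema.

```typescript
// Parent links for a fragment of the hierarchy pictured above (illustrative only).
const PARENT: Record<string, string | null> = {
  HeritageCustodian: null,
  MuseumType: 'HeritageCustodian',
  ArchiveType: 'HeritageCustodian',
  ArtMuseum: 'MuseumType',
  HistoryMuseum: 'MuseumType',
  MunicipalArchive: 'ArchiveType',
};

// Path from a node up to the root, e.g. ArtMuseum -> MuseumType -> HeritageCustodian.
function pathToRoot(node: string): string[] {
  const path: string[] = [];
  let current: string | null = node;
  while (current !== null) {
    path.push(current);
    current = PARENT[current] ?? null;
  }
  return path;
}

// Edge count between two types via their lowest common ancestor,
// normalised to [0, 1] by the longest possible path (assumed: 2 * tree depth).
function topologicalDistance(a: string, b: string, maxDepth = 2): number {
  const pathA = pathToRoot(a);
  const pathB = pathToRoot(b);
  for (let i = 0; i < pathA.length; i++) {
    const j = pathB.indexOf(pathA[i]);
    if (j !== -1) return (i + j) / (2 * maxDepth);
  }
  return 1; // no common ancestor: treat as maximally distant
}

// Combined score using the 0.7 / 0.3 weights from the formula above.
function finalScore(embeddingSimilarity: number, a: string, b: string): number {
  return 0.7 * embeddingSimilarity + 0.3 * (1 - topologicalDistance(a, b));
}

console.log(finalScore(0.95, 'ArtMuseum', 'HistoryMuseum'));
console.log(finalScore(0.95, 'ArtMuseum', 'MunicipalArchive'));
```

Under these assumptions, two sibling museum types sit at distance 0.5, so a 0.95 embedding similarity yields a combined score of about 0.815. Whether that pair is rejected depends on the match threshold, which this section does not state.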
## Holarchic Cache Structure
Cache entries are **holons** - simultaneously complete AND parts of aggregates:
| Level | Example | Aggregates |
|-------|---------|------------|
| Micro | "Rijksmuseum details" | None |
| Meso | "Museums in Amsterdam" | List of micro holons |
| Macro | "Heritage in Noord-Holland" | All meso holons in region |
```typescript
interface CachedQuery {
// ... existing fields ...
holonLevel?: 'micro' | 'meso' | 'macro';
participatesIn?: string[]; // Higher-level cache keys
aggregates?: string[]; // Lower-level entries
}
```
## Problem Statement
The ArchiefAssistent semantic cache prevents geographic false positives using entity extraction:
```
Query: "Hoeveel musea in Amsterdam?"
Cached: "Hoeveel musea in Noord-Holland?"
Result: BLOCKED (location mismatch) ✅
```
However, the current implementation uses **hardcoded regex patterns**:
```typescript
// DEPRECATED: Hardcoded patterns in semantic-cache.ts
const INSTITUTION_PATTERNS: Record<InstitutionTypeCode, RegExp> = {
M: /\b(muse(um|a|ums?)|musea)/i,
A: /\b(archie[fv]en?|archives?|archief)/i,
// ... 19 patterns to maintain manually
};
```
**Problems with hardcoded patterns**:
1. **Maintenance burden** - Every new institution type requires code changes
2. **Missing subtypes** - "kunstmuseum" vs "museum" should cache separately
3. **No multilingual support** - Only Dutch/English, misses German/French labels
4. **Duplication** - Same vocabulary exists in LinkML schemas
5. **No record type awareness** - "burgerlijke stand" queries mixed with general archive queries
## Solution: Schema-Derived Vocabulary
The LinkML schema already contains rich vocabulary:
| Schema File | Content | Cache Utility |
|-------------|---------|---------------|
| `CustodianType.yaml` | 19 top-level types | Primary segmentation (M/A/L/G...) |
| `MuseumType.yaml` | 187+ museum subtypes | Subtype segmentation |
| `ArchiveOrganizationType.yaml` | 144+ archive subtypes | Subtype segmentation |
| `*RecordSetTypes.yaml` | Record type taxonomies | Finding aids specificity |
### Vocabulary Sources in Schema
1. **`type_label`** - Multilingual labels via `skos:prefLabel`
2. **`structured_aliases`** - Language-tagged alternative names
3. **`keywords`** - Search terms for entity recognition
4. **`wikidata_entity`** - Linked Data identifiers
## Architecture
### Overview: Two-Tier Embedding Hierarchy
The system uses a **hierarchical embedding approach** for fast semantic routing:
1. **Tier 1: Types File Embeddings** - Which category? (Museum vs Archive vs Library)
2. **Tier 2: Individual Type Embeddings** - Which specific type? (ArtMuseum vs NaturalHistoryMuseum)
```
┌─────────────────────────────────────────────────────────────────────────┐
│ BUILD TIME: Extract vocabulary + generate embeddings │
│ │
│ schemas/20251121/linkml/modules/classes/*Type.yaml │
│ schemas/20251121/linkml/modules/classes/*Types.yaml │
│ ↓ │
│ scripts/extract-types-vocab.ts │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ types-vocab.json │ │
│ │ ├── tier1Embeddings: { MuseumType: [...], ArchiveType: [...] } │ │
│ │ ├── tier2Embeddings: { ArtMuseum: [...], MunicipalArchive: [...]}│ │
│ │ └── termLog: { "kunstmuseum": { type: "M", subtype: "ART_MUSEUM"}│ │
│ └───────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
▼ (loaded at runtime)
┌─────────────────────────────────────────────────────────────────────────┐
│ RUNTIME: Two-Tier Semantic Routing │
│ │
│ Query: "Hoeveel gemeentearchieven in Amsterdam?" │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TIER 1: Types File Selection │ │
│ │ Query embedding vs Tier1 embeddings (19 categories) │ │
│ │ Result: ArchiveOrganizationType (similarity: 0.89) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TIER 2: Specific Type Selection │ │
│ │ Query embedding vs Tier2 embeddings (144 archive subtypes) │ │
│ │ Result: MunicipalArchive (similarity: 0.94) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ Structured cache key: "count:a.municipal_archive:amsterdam" │
└─────────────────────────────────────────────────────────────────────────┘
```
### Tier 1: Types File Embeddings
Each Types file (e.g., `MuseumType.yaml`, `ArchiveOrganizationType.yaml`) gets ONE embedding
representing the **accumulated vocabulary** of all types within that file.
**Embedding Text Construction**:
```
MuseumType: museum musea kunstmuseum art museum natural history museum
science museum open-air museum ecomuseum virtual museum
heritage farm national museum regional museum university museum
[... all keywords from all 187 subtypes ...]
```
**Purpose**: Fast first-pass filter to identify which GLAMORCUBESFIXPHDNT category the query relates to.
| Types File | Code | Accumulated Terms Count |
|------------|------|------------------------|
| MuseumType | M | ~500+ terms from 187 subtypes |
| ArchiveOrganizationType | A | ~400+ terms from 144 subtypes |
| LibraryType | L | ~200+ terms from subtypes |
| GalleryType | G | ~100+ terms from subtypes |
| ... | ... | ... |
### Tier 2: Individual Type Embeddings
Each **specific type** within a Types file gets its own embedding from its accumulated terms.
**Embedding Text Construction**:
```
MunicipalArchive: gemeentearchief stadsarchief city archive municipal archive
town archive local government records burgerlijke stand
bevolkingsregister council minutes building permits
[... all keywords + structured_aliases + labels ...]
```
**Purpose**: Precise subtype identification after Tier 1 narrows the category.
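As a rough illustration of the embedding text construction, the sketch below joins a subtype's class name with its de-duplicated keywords. The `SubtypeVocab` shape and the `buildEmbeddingText` helper are hypothetical; the actual extraction script may also include `structured_aliases` and labels, as noted above.

```typescript
// Hypothetical, simplified view of one subtype entry in the vocabulary.
interface SubtypeVocab {
  className: string;
  keywords: Record<string, string[]>; // language code -> terms
}

// Concatenates the class name and all keywords into one lowercase,
// de-duplicated string that can be passed to the embedding model.
function buildEmbeddingText(subtype: SubtypeVocab): string {
  const terms = new Set<string>();
  for (const languageTerms of Object.values(subtype.keywords)) {
    for (const term of languageTerms) {
      terms.add(term.toLowerCase());
    }
  }
  return `${subtype.className}: ${Array.from(terms).join(' ')}`;
}

const municipalArchiveText = buildEmbeddingText({
  className: 'MunicipalArchive',
  keywords: {
    nl: ['gemeentearchief', 'stadsarchief'],
    en: ['municipal archive', 'city archive'],
  },
});
console.log(municipalArchiveText);
```

The same helper could be applied per Types file by merging the keywords of all its subtypes, which would give the Tier 1 text.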
### Term Log Structure
A lookup table mapping every extracted term to its type/subtype:
```json
{
"termLog": {
"kunstmuseum": {
"typeCode": "M",
"typeName": "MuseumType",
"subtypeName": "ART_MUSEUM",
"wikidata": "Q207694",
"language": "nl"
},
"art museum": {
"typeCode": "M",
"typeName": "MuseumType",
"subtypeName": "ART_MUSEUM",
"wikidata": "Q207694",
"language": "en"
},
"gemeentearchief": {
"typeCode": "A",
"typeName": "ArchiveOrganizationType",
"subtypeName": "MUNICIPAL_ARCHIVE",
"wikidata": "Q8362876",
"language": "nl"
}
}
}
```
**Purpose**:
1. Fast O(1) keyword lookup (no embedding needed for exact matches)
2. Audit trail of which terms map to which types
3. Debugging which queries match which types
### Runtime Lookup Strategy
```typescript
async function extractEntitiesWithEmbeddings(query: string): Promise<ExtractedEntities> {
  const vocab = await loadTypesVocabulary();
  const normalized = query.toLowerCase();

  // FAST PATH: Check termLog for exact keyword matches.
  // Keep the longest matching term so "kunstmuseum" is not shadowed by "museum".
  let bestTerm: string | null = null;
  for (const term of Object.keys(vocab.termLog)) {
    if (normalized.includes(term) && (bestTerm === null || term.length > bestTerm.length)) {
      bestTerm = term;
    }
  }
  if (bestTerm !== null) {
    const mapping = vocab.termLog[bestTerm];
    return {
      institutionType: mapping.typeCode,
      institutionSubtype: mapping.subtypeName,
      subtypeWikidata: mapping.wikidata,
      // ... location and intent extraction
    };
  }

  // SLOW PATH: Embedding-based semantic matching
  const queryEmbedding = await generateEmbedding(query);

  // Tier 1: Find best matching Types file
  let bestType: string | null = null;
  let bestTypeSimilarity = 0;
  for (const [typeName, typeEmbedding] of Object.entries(vocab.tier1Embeddings)) {
    const similarity = cosineSimilarity(queryEmbedding, typeEmbedding);
    if (similarity > bestTypeSimilarity && similarity > 0.7) {
      bestTypeSimilarity = similarity;
      bestType = typeName;
    }
  }
  if (!bestType) return {}; // No type matched

  // Tier 2: Find best matching subtype within the Types file.
  // institutionTypes is keyed by type code ("M", "A"), so resolve the entry via className.
  const typeEntry = Object.values(vocab.institutionTypes).find(
    (entry) => entry.className === bestType,
  );
  if (!typeEntry) return {}; // Tier 1 name has no matching institution type
  const typeCode = typeEntry.code;
  let bestSubtype: string | null = null;
  let bestSubtypeSimilarity = 0;
  for (const [subtypeName, subtypeEmbedding] of Object.entries(vocab.tier2Embeddings[typeCode] || {})) {
    const similarity = cosineSimilarity(queryEmbedding, subtypeEmbedding);
    if (similarity > bestSubtypeSimilarity && similarity > 0.75) {
      bestSubtypeSimilarity = similarity;
      bestSubtype = subtypeName;
    }
  }

  return {
    institutionType: typeCode,
    institutionSubtype: bestSubtype,
    // ... location and intent extraction
  };
}
```
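The lookup above relies on a `cosineSimilarity` helper that is not shown. A minimal sketch of the standard formula follows; the error handling for mismatched dimensions and the zero-vector fallback are assumptions about how the real helper behaves.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report no similarity instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal
```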
### Embedding Model Choice
For build-time embedding generation, use the same model as the semantic cache:
| Option | Model | Dimensions | Quality |
|--------|-------|------------|---------|
| **Primary** | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 384 | Good multilingual |
| Fallback | `all-MiniLM-L6-v2` | 384 | English-focused |
| High Quality | `multilingual-e5-large` | 1024 | Best multilingual |
**Build-time generation**: Embeddings are generated ONCE at build time and stored in JSON.
This avoids runtime embedding API calls for type classification.
## TypesVocabulary JSON Structure
Generated at build time with **pre-computed embeddings**:
```json
{
"version": "2026-01-10T12:00:00Z",
"schemaVersion": "20251121",
"embeddingModel": "paraphrase-multilingual-MiniLM-L12-v2",
"embeddingDimensions": 384,
"tier1Embeddings": {
"MuseumType": [0.023, -0.045, 0.087, ...],
"ArchiveOrganizationType": [0.012, 0.056, -0.034, ...],
"LibraryType": [-0.034, 0.089, 0.012, ...],
"GalleryType": [0.045, -0.023, 0.067, ...]
},
"tier2Embeddings": {
"M": {
"ART_MUSEUM": [0.034, -0.056, 0.078, ...],
"NATURAL_HISTORY_MUSEUM": [0.045, 0.023, -0.089, ...],
"SCIENCE_MUSEUM": [0.067, -0.012, 0.045, ...]
},
"A": {
"MUNICIPAL_ARCHIVE": [0.089, 0.034, -0.056, ...],
"NATIONAL_ARCHIVE": [0.012, -0.078, 0.045, ...],
"CHURCH_ARCHIVE": [-0.023, 0.067, 0.034, ...]
}
},
"termLog": {
"kunstmuseum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "nl"},
"art museum": {"typeCode": "M", "subtypeName": "ART_MUSEUM", "wikidata": "Q207694", "lang": "en"},
"gemeentearchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"},
"stadsarchief": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "nl"},
"city archive": {"typeCode": "A", "subtypeName": "MUNICIPAL_ARCHIVE", "wikidata": "Q8362876", "lang": "en"},
"burgerlijke stand": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"},
"geboorteakte": {"typeCode": "A", "recordSetType": "CIVIL_REGISTRY", "lang": "nl"}
},
"institutionTypes": {
"M": {
"code": "M",
"className": "MuseumType",
"baseWikidata": "Q33506",
"accumulatedTerms": "museum musea kunstmuseum art museum natural history museum science museum open-air museum ecomuseum virtual museum heritage farm national museum regional museum university museum...",
"keywords": {
"nl": ["museum", "musea"],
"en": ["museum", "museums"],
"de": ["Museum", "Museen"]
},
"subtypes": {
"ART_MUSEUM": {
"className": "ArtMuseum",
"wikidata": "Q207694",
"accumulatedTerms": "kunstmuseum art museum kunstmusea art museums fine art museum visual arts museum painting gallery sculpture museum",
"keywords": {
"nl": ["kunstmuseum", "kunstmusea"],
"en": ["art museum", "art museums"]
}
},
"NATURAL_HISTORY_MUSEUM": {
"className": "NaturalHistoryMuseum",
"wikidata": "Q559049",
"accumulatedTerms": "natuurhistorisch museum natuurmuseum natural history museum science museum fossils taxidermy specimens geology biology",
"keywords": {
"nl": ["natuurhistorisch museum", "natuurmuseum"],
"en": ["natural history museum"]
}
}
}
},
"A": {
"code": "A",
"className": "ArchiveOrganizationType",
"baseWikidata": "Q166118",
"accumulatedTerms": "archief archieven archive archives gemeentearchief stadsarchief nationaal archief rijksarchief church archive company archive film archive...",
"keywords": {
"nl": ["archief", "archieven"],
"en": ["archive", "archives"]
},
"subtypes": {
"MUNICIPAL_ARCHIVE": {
"className": "MunicipalArchive",
"wikidata": "Q8362876",
"accumulatedTerms": "gemeentearchief stadsarchief municipal archive city archive town archive local government records civil registry population register building permits council minutes",
"keywords": {
"nl": ["gemeentearchief", "stadsarchief", "gemeentelijke archiefdienst"],
"en": ["municipal archive", "city archive", "town archive"]
}
},
"NATIONAL_ARCHIVE": {
"className": "NationalArchive",
"wikidata": "Q1188452",
"accumulatedTerms": "nationaal archief rijksarchief national archive state archive government records national records federal archive",
"keywords": {
"nl": ["nationaal archief", "rijksarchief"],
"en": ["national archive", "state archive"]
}
}
}
}
},
"recordSetTypes": {
"CIVIL_REGISTRY": {
"className": "CivilRegistrySeries",
"accumulatedTerms": "burgerlijke stand geboorteakte huwelijksakte overlijdensakte bevolkingsregister civil registry birth records marriage records death records population register vital records genealogy",
"keywords": {
"nl": ["burgerlijke stand", "geboorteakte", "huwelijksakte", "overlijdensakte", "bevolkingsregister"],
"en": ["civil registry", "birth records", "marriage records", "death records"]
}
},
"COUNCIL_GOVERNANCE": {
"className": "CouncilGovernanceFonds",
"accumulatedTerms": "gemeenteraad raadsnotulen raadsbesluit verordening council minutes ordinances resolutions bylaws municipal council town council city council",
"keywords": {
"nl": ["gemeenteraad", "raadsnotulen", "raadsbesluit", "verordening"],
"en": ["council minutes", "ordinances", "resolutions"]
}
}
}
}
```
### Key Additions for Embedding Support
| Field | Purpose |
|-------|---------|
| `tier1Embeddings` | Pre-computed embeddings for each Types file (19 categories) |
| `tier2Embeddings` | Pre-computed embeddings for each subtype (500+ types) |
| `termLog` | Fast O(1) lookup table for exact keyword matches |
| `accumulatedTerms` | Raw text used to generate embeddings (for debugging/regeneration) |
| `embeddingModel` | Model used to generate embeddings (for reproducibility) |
## Enhanced ExtractedEntities Interface
```typescript
export interface ExtractedEntities {
// Existing fields
institutionType?: InstitutionTypeCode | null;
location?: string | null;
locationType?: 'city' | 'province' | null;
intent?: 'count' | 'list' | 'info' | null;
// NEW: Ontology-derived fields
institutionSubtype?: string | null; // e.g., 'MUNICIPAL_ARCHIVE', 'ART_MUSEUM'
recordSetType?: string | null; // e.g., 'CIVIL_REGISTRY', 'COUNCIL_GOVERNANCE'
subtypeWikidata?: string | null; // e.g., 'Q8362876' for LOD integration
}
```
## Enhanced Cache Key Format
```
{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location}
Examples:
- "count:m:amsterdam" # Basic museum count
- "count:m.art_museum:amsterdam" # Art museum count (subtype)
- "list:a.municipal_archive:nh" # Municipal archives in Noord-Holland
- "query:a:civil_registry:utrecht" # Civil registry in Utrecht
- "info:a.national_archive::nl" # National archive info (empty record set type, location "nl")
```
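A small sketch of how such keys could be assembled from extracted entities. The `CacheKeyParts` shape and `buildCacheKey` helper are hypothetical, and lowercasing the whole key is an assumption based on the examples in this section.

```typescript
// Hypothetical input shape; field names mirror ExtractedEntities.
interface CacheKeyParts {
  intent: string;
  institutionType: string;
  institutionSubtype?: string | null;
  recordSetType?: string | null;
  location: string;
}

// Builds "{intent}:{institutionType}[.{subtype}][:{recordSetType}]:{location}".
function buildCacheKey(parts: CacheKeyParts): string {
  const type = parts.institutionSubtype
    ? `${parts.institutionType}.${parts.institutionSubtype}`
    : parts.institutionType;
  const segments = [parts.intent, type];
  if (parts.recordSetType) segments.push(parts.recordSetType);
  segments.push(parts.location);
  return segments.join(':').toLowerCase();
}

console.log(buildCacheKey({ intent: 'count', institutionType: 'M', location: 'amsterdam' }));
console.log(buildCacheKey({
  intent: 'count',
  institutionType: 'M',
  institutionSubtype: 'ART_MUSEUM',
  location: 'amsterdam',
}));
```

Because the record set segment is optional, a key with three segments and a key with four segments are distinguished only by segment count; a parser for these keys would need to account for that.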
## Implementation Files
| File | Purpose |
|------|---------|
| `scripts/extract-types-vocab.ts` | Build-time vocabulary extraction from LinkML |
| `apps/archief-assistent/public/types-vocab.json` | Generated vocabulary file |
| `apps/archief-assistent/src/lib/types-vocabulary.ts` | Runtime vocabulary loader |
| `apps/archief-assistent/src/lib/semantic-cache.ts` | Updated entity extraction |
## Build Integration
Add to `apps/archief-assistent/package.json`:
```json
{
"scripts": {
"prebuild": "tsx ../../scripts/extract-types-vocab.ts",
"build": "vite build"
}
}
```
## Keyword Extraction Priority
When extracting keywords from schema files:
1. **`keywords`** array (highest priority) - Explicit search terms
2. **`structured_aliases.literal_form`** - Multilingual alternative names
3. **`type_label`** - Preferred labels per language
4. **Class name conversion** - `MunicipalArchive` → "municipal archive"
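The priority order above can be sketched as a merge that keeps the first occurrence of each term. The `SchemaClass` shape and both helper names are hypothetical and stand in for whatever the extraction script reads from the parsed YAML.

```typescript
// Converts a PascalCase class name to a lowercase phrase:
// MunicipalArchive -> "municipal archive".
function classNameToTerm(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, '$1 $2').toLowerCase();
}

// Hypothetical, flattened view of one schema class.
interface SchemaClass {
  name: string;
  keywords?: string[];
  structuredAliases?: string[]; // literal_form values
  typeLabels?: string[];
}

// Collects terms in priority order; duplicates are dropped, first occurrence wins.
function extractTerms(cls: SchemaClass): string[] {
  const ordered = [
    ...(cls.keywords ?? []),
    ...(cls.structuredAliases ?? []),
    ...(cls.typeLabels ?? []),
    classNameToTerm(cls.name),
  ];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of ordered) {
    const term = raw.trim().toLowerCase();
    if (term && !seen.has(term)) {
      seen.add(term);
      result.push(term);
    }
  }
  return result;
}

console.log(extractTerms({
  name: 'MunicipalArchive',
  keywords: ['gemeentearchief'],
  structuredAliases: ['Stadsarchief', 'gemeentearchief'],
  typeLabels: ['Municipal archive'],
}));
```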
## Cache Segmentation Rules
### Rule 1: Subtype Specificity
Queries with **specific subtypes** should NOT match **generic type** cache entries:
```
Query: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Cached: "musea in Amsterdam" → key: "count:m:amsterdam"
Result: MISS (subtype mismatch) ✅
```
### Rule 2: Record Set Type Isolation
Queries about **specific record types** should cache separately:
```
Query: "burgerlijke stand Utrecht" → key: "query:a:civil_registry:utrecht"
Cached: "archieven in Utrecht" → key: "list:a:utrecht"
Result: MISS (record set type mismatch) ✅
```
### Rule 3: No Subtype-to-Type Fallback
Generic queries must NOT be served from subtype cache entries, because a subtype result covers only a subset of what the generic query asks for:
```
Query: "musea in Amsterdam" → key: "count:m:amsterdam"
Cached: "kunstmusea in Amsterdam" → key: "count:m.art_museum:amsterdam"
Result: MISS (don't return subset for superset query)
```
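The three worked examples above all end in a MISS, which suggests that a cached entry is only reusable when every structured segment matches. The sketch below encodes that reading; the `KeyEntities` shape, the helper name, and the order of the checks are assumptions.

```typescript
// Hypothetical structured form of a cache key.
interface KeyEntities {
  intent: string;
  institutionType: string;
  institutionSubtype?: string | null;
  recordSetType?: string | null;
  location: string;
}

// Returns null when the cached entry may be reused, or the reason for the miss.
function mismatchReason(query: KeyEntities, cached: KeyEntities): string | null {
  if (query.institutionType !== cached.institutionType) return 'type mismatch';
  if ((query.institutionSubtype ?? null) !== (cached.institutionSubtype ?? null)) {
    return 'subtype mismatch';
  }
  if ((query.recordSetType ?? null) !== (cached.recordSetType ?? null)) {
    return 'record set type mismatch';
  }
  if (query.location !== cached.location) return 'location mismatch';
  if (query.intent !== cached.intent) return 'intent mismatch';
  return null;
}

// Generic query against a subtype entry: not reusable.
console.log(mismatchReason(
  { intent: 'count', institutionType: 'm', location: 'amsterdam' },
  { intent: 'count', institutionType: 'm', institutionSubtype: 'art_museum', location: 'amsterdam' },
));
```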
## Migration Notes
1. **Backwards Compatible**: Existing cache entries without `institutionSubtype` continue to work
2. **Gradual Rollout**: New cache entries get subtype, old entries remain valid
3. **Cache Clear**: Consider clearing cache after deployment to ensure consistency
## Validation
Run E2E tests to verify:
```bash
cd apps/archief-assistent
npm run test:e2e
```
Key test cases:
- Geographic isolation (Amsterdam ≠ Rotterdam ≠ Noord-Holland)
- Subtype isolation (kunstmuseum ≠ museum)
- Record set isolation (burgerlijke stand ≠ archive)
- Intent isolation (count ≠ list ≠ info)
## References
- **Rule 41**: Types classes define SPARQL template variables
- **Rule 0b**: Type/Types file naming convention
- **CustodianType.yaml**: Base taxonomy definition
- **AGENTS.md**: GLAMORCUBESFIXPHDNT taxonomy documentation
---
**Created**: 2026-01-10
**Author**: OpenCode Agent
**Status**: Implemented (v2.0)
## References
- Pavlyshyn, V. "Context Graphs and Data Traces: Building Epistemology Layers for Agentic Memory"
- Pavlyshyn, V. "The Shape of Knowledge: Topology Theory for Knowledge Graphs"
- Pavlyshyn, V. "Beyond Hierarchy: Why Agentic AI Systems Need Holarchies"
- Pavlyshyn, V. "Smalltalk: The Language That Changed Everything"
- Pavlyshyn, V. "Clarity Traders: Beyond Vibe Coding"

@ -0,0 +1,65 @@
# Rule: Engineering Parsimony and Domain Modeling
## Critical Convention
Our ontology follows an engineering-oriented approach: practical domain utility and
stable interoperability take priority over minimal, tool-specific class catalogs.
## Rule
1. Model domain concepts, not implementation tools.
- Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`.
2. Prefer generic, reusable activity/entity classes for operational provenance.
- Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`.
3. Capture tool/vendor details in slot values, not class names.
- Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`.
4. Digital platforms acting as custodians are valid domain classes.
- Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed.
- Data processing/search tools are not ontology class candidates.
5. Avoid ontology growth driven by transient engineering stack choices.
- New class proposals must be justified by cross-tool, domain-stable semantics.
## Rationale
- Tool names are volatile implementation details and age quickly.
- Domain-level abstractions maximize reuse, query consistency, and mapping stability.
- This aligns with engineering ontology practice, in which strict theoretical
  parsimony among candidate theories is not the only optimization criterion; practical
  semantic interoperability and maintainability come first.
## Examples
### Wrong
```yaml
classes:
  ExaSearchMetadata:
    class_uri: prov:Activity
```
### Correct
```yaml
classes:
  ExternalSearchMetadata:
    class_uri: prov:Activity
    slots:
      - has_tool
      - has_method
      - has_agent
```
## References
1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`.
URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764
2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`.
URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA

@ -18,7 +18,7 @@
## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫
**DO NOT USE** automated scripts to enrich person profiles with web search data. The `enrich_person_comprehensive.py` script has been deprecated.
**DO NOT USE** automated scripts to enrich person profiles with web search data.
**Why automated enrichment failed**:
- Web searches return data about DIFFERENT people with similar names
@ -184,95 +184,12 @@ Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.*
→ Exception: If source explicitly links to living person with verifiable connection
```
## Implementation in Enrichment Scripts
```python
def validate_entity_match(profile: dict, search_result: dict) -> tuple[bool, str]:
    """
    Validate that a search result refers to the same person as the profile.

    REQUIRES: At least 3 of 5 identity attributes must match.
    Name match alone is INSUFFICIENT and automatically rejected.

    Returns (is_valid, reason)
    """
    # Guard against a missing or empty affiliations list
    affiliations = profile.get('affiliations') or [{}]
    profile_employer = affiliations[0].get('custodian_name', '').lower()
    profile_location = profile.get('profile_data', {}).get('location', '').lower()
    profile_role = profile.get('profile_data', {}).get('headline', '').lower()
    source_text = search_result.get('answer', '').lower()
    source_url = search_result.get('source_url', '').lower()

    # AUTOMATIC REJECTION: Genealogy sources
    genealogy_domains = ['geni.com', 'ancestry.', 'familysearch.', 'findagrave.', 'myheritage.']
    if any(domain in source_url for domain in genealogy_domains):
        return False, "genealogy_source_rejected"

    # AUTOMATIC REJECTION: Profession conflicts
    heritage_roles = ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection', 'heritage']
    entertainment_roles = ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete']
    profile_is_heritage = any(role in profile_role for role in heritage_roles)
    source_is_entertainment = any(role in source_text for role in entertainment_roles)
    if profile_is_heritage and source_is_entertainment:
        return False, "conflicting_profession"

    # AUTOMATIC REJECTION: Location conflicts
    if profile_location:
        location_conflicts = [
            ('venezuela', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'),
            ('caracas', 'london'), ('mexico city', 'amsterdam')
        ]
        for source_loc, profile_loc in location_conflicts:
            if source_loc in source_text and profile_loc in profile_location:
                return False, "conflicting_location"

    # Count positive identity attribute matches (need 3 of 5)
    matches = 0
    match_details = []

    # 1. Employer match
    if profile_employer and profile_employer in source_text:
        matches += 1
        match_details.append(f"employer:{profile_employer}")

    # 2. Location match
    if profile_location and profile_location in source_text:
        matches += 1
        match_details.append(f"location:{profile_location}")

    # 3. Role/profession match
    if profile_role:
        role_words = [w for w in profile_role.split() if len(w) > 4]
        if any(word in source_text for word in role_words):
            matches += 1
            match_details.append("role_match")

    # 4. Education/institution match (if available)
    profile_education = profile.get('profile_data', {}).get('education', [])
    if profile_education:
        edu_names = [e.get('school', '').lower() for e in profile_education if e.get('school')]
        if any(edu in source_text for edu in edu_names):
            matches += 1
            match_details.append("education_match")

    # 5. Time period match (career dates)
    # (implementation depends on available data)

    # REQUIRE 3 OF 5 MATCHES
    if matches < 3:
        return False, f"insufficient_identity_verification (only {matches}/5 attributes matched)"
    return True, f"verified ({matches}/5 matches: {', '.join(match_details)})"
```
## Claim Rejection Patterns
The following patterns should trigger automatic claim rejection:
The following inconsistent patterns should trigger automatic claim rejection:
```python
# Genealogy sources - ALWAYS REJECT
# Genealogy sources conflict - ALWAYS REJECT
GENEALOGY_DOMAINS = [
'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org',
'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org'
@ -293,7 +210,7 @@ LOCATION_PAIRS = [
('caracas', 'london'), ('caracas', 'amsterdam'),
]
# Age impossibility - if birth year makes current career implausible, REJECT
# Age impossibility - if birth year makes current career implausible, REJECT. For instance, for a Junior role:
MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify
MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles
```

@ -0,0 +1,248 @@
# Rule 47: Disambiguation Entity Profiles - Prevent Repeated Entity Resolution Errors
## Status: CRITICAL
## Summary
When entity resolution determines that a web source describes a **different person** with a similar name, **create a PPID profile for that person** in `data/person/`. The PPID system is universal - ANY person who ever lived can have a profile, regardless of heritage relevance.
---
## The Universal PPID Principle
**In principle, all persons on Earth should be assigned PPIDs** - whether or not they are active in the heritage field. This includes:
- Heritage workers (curators, archivists, librarians, etc.)
- Non-heritage professionals (actors, doctors, athletes, etc.)
- Historical persons (deceased individuals from any era)
- Public figures and private individuals
The `heritage_relevance` field indicates whether someone works in the heritage sector, but does NOT determine whether they can have a profile. **Anyone can have a PPID.**
---
## The Problem
During entity resolution, we often discover that web search results describe a **different person** with a similar name:
| Heritage Profile | Namesake Discovered | Why Different |
|------------------|---------------------|---------------|
| Carmen Juliá (UK curator) | Carmen Julia Álvarez (Venezuelan actress) | Different profession, location, timeline |
| Jan de Vries (Rijksmuseum curator) | Jan de Vries (footballer) | Different profession |
| Robert Ritter (heritage worker) | Robert Ritter (Nazi doctor, 1901-1951) | Different era, profession |
Without creating a profile for the namesake, future enrichment attempts may:
1. Re-discover the same namesake
2. Waste time re-investigating
3. Risk attributing false claims again
---
## The Solution: Create PPID Profiles for Namesakes
When entity resolution proves two entities are different, **create a regular PPID profile for the namesake**:
1. Use standard PPID naming convention (no special prefix)
2. Set `heritage_relevance.is_heritage_relevant: false`
3. Document the disambiguation in BOTH profiles
---
## Example: Venezuelan Actress Profile
```json
{
"ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
"profile_data": {
"full_name": "Carmen Julia Álvarez",
"profession": "actress",
"nationality": "Venezuelan",
"birth_year": 1952,
"birth_location": "Caracas, Venezuela",
"active_period": "1970s-2000s"
},
"heritage_relevance": {
"is_heritage_relevant": false,
"relevance_score": 0.0,
"reason": "Entertainment industry professional - actress in film and television"
},
"disambiguation_notes": {
"commonly_confused_with": [
{
"ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
"name": "Carmen Juliá",
"profession": "curator",
"employer": "New Contemporaries",
"location": "UK",
"why_different": "Different profession (actress vs curator), different location (Venezuela vs UK), overlapping active periods in incompatible roles"
}
],
"disambiguation_note": "This is the Venezuelan actress, NOT the UK-based art curator."
},
"web_claims": [
{
"claim_type": "birth_year",
"claim_value": 1952,
"provenance": {
"source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
"retrieved_on": "2026-01-11T14:30:00Z",
"retrieval_agent": "manual-human-curator"
}
},
{
"claim_type": "profession",
"claim_value": "actress",
"provenance": {
"source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
"retrieved_on": "2026-01-11T14:30:00Z",
"retrieval_agent": "manual-human-curator"
}
}
],
"extraction_metadata": {
"created_at": "2026-01-11T15:00:00Z",
"created_by": "manual-human-curator",
"creation_reason": "Created during entity resolution to distinguish from heritage worker Carmen Juliá"
}
}
```
---
## Update the Heritage Profile Too
The heritage profile should also reference the disambiguation:
```json
{
"ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
"profile_data": {
"full_name": "Carmen Juliá",
"headline": "Curator at New Contemporaries"
},
"heritage_relevance": {
"is_heritage_relevant": true,
"relevance_score": 0.85
},
"disambiguation_notes": {
"known_namesakes": [
{
"ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
"name": "Carmen Julia Álvarez",
"profession": "actress",
"location": "Venezuela",
"why_not_same_person": "Different profession, location, timeline"
}
],
"disambiguation_warning": "Web searches for 'Carmen Julia' return data about Venezuelan actress Carmen Julia Álvarez (born 1952). This is a DIFFERENT person."
}
}
```
---
## When to Create Namesake Profiles
Create a PPID profile for a namesake when:
1. **Entity resolution proves they are a different person**
2. **They are notable enough** to appear in search results repeatedly (Wikipedia, IMDB, news)
3. **The confusion risk is high** (similar name, some overlapping attributes)
**Do NOT create profiles for**:
- Random social media accounts with no notable presence
- Obvious mismatches unlikely to recur in searches
---
## Benefits
1. **Universal person database**: Any person can have a PPID
2. **Prevents repeated mistakes**: Future enrichment can check for known namesakes
3. **Bidirectional linking**: Both profiles reference each other
4. **Consistent data model**: No special file naming or profile types needed
5. **Audit trail**: Documents why profiles were created
---
## Workflow
### Step 1: During Entity Resolution
When you reject a claim due to identity mismatch with a notable namesake:
```
1. Document WHY the source describes a different person
2. Check if the namesake is notable (Wikipedia, IMDB, frequent search results)
3. If notable → Create PPID profile for the namesake
4. Link both profiles via disambiguation_notes
```
### Step 2: Create Namesake Profile
Use standard PPID naming:
```
ID_{birth-location}_{birth-decade}_{current-location}_{death-decade}_{NAME}.json
```
Example: `ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ.json`
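The naming pattern above can be sketched as a small helper. This is a minimal illustration, not the project's actual generator: the function names `ascii_token` and `ppid_filename` are hypothetical, and stripping accents via Unicode decomposition is an assumption inferred from the example file name (`Álvarez` becomes `ALVAREZ`).

```python
import unicodedata


def ascii_token(token):
    """Upper-case a name token and drop accents (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii").upper()


def ppid_filename(first_location, first_date, last_location, last_date, name_tokens):
    """Assemble a PPID file name from its five components.

    Missing components fall back to the placeholder notation used in
    this document ("XX-XX-XXX" for locations, "XXXX" for dates).
    """
    first_location = first_location or "XX-XX-XXX"
    last_location = last_location or "XX-XX-XXX"
    first_date = first_date or "XXXX"
    last_date = last_date or "XXXX"
    name = "-".join(ascii_token(token) for token in name_tokens)
    return f"ID_{first_location}_{first_date}_{last_location}_{last_date}_{name}.json"


filename = ppid_filename("VE-XX-CCS", "1952", "VE-XX-CCS", None,
                         ["Carmen", "Julia", "Álvarez"])
```

Passing `None` for an unknown component keeps the placeholder explicit in the file name instead of silently omitting it.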
### Step 3: Update Both Profiles
- Namesake profile: Add `commonly_confused_with` pointing to heritage profile
- Heritage profile: Add `known_namesakes` pointing to namesake profile
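The reciprocal linking in Step 3 could look roughly like the sketch below. The helper name `link_namesakes` is hypothetical; the field names (`commonly_confused_with`, `known_namesakes`, `why_different`, `why_not_same_person`) are taken from the JSON examples in this rule, and the profiles are assumed to be plain dictionaries loaded from the PPID files.

```python
def link_namesakes(heritage_profile, namesake_profile, reason):
    """Add reciprocal disambiguation entries to both profiles (in place)."""

    def entry(profile, reason_key):
        # Each entry identifies the OTHER profile and records why they differ.
        return {
            "ppid": profile["ppid"],
            "name": profile["profile_data"]["full_name"],
            reason_key: reason,
        }

    namesake_notes = namesake_profile.setdefault("disambiguation_notes", {})
    namesake_notes.setdefault("commonly_confused_with", []).append(
        entry(heritage_profile, "why_different"))

    heritage_notes = heritage_profile.setdefault("disambiguation_notes", {})
    heritage_notes.setdefault("known_namesakes", []).append(
        entry(namesake_profile, "why_not_same_person"))


curator = {
    "ppid": "ID_UK-XX-XXX_XXXX_UK-XX-XXX_XXXX_CARMEN-JULIA",
    "profile_data": {"full_name": "Carmen Juliá"},
}
actress = {
    "ppid": "ID_VE-XX-CCS_1952_VE-XX-CCS_XXXX_CARMEN-JULIA-ALVAREZ",
    "profile_data": {"full_name": "Carmen Julia Álvarez"},
}
link_namesakes(curator, actress, "Different profession, location, timeline")
```

Writing both entries in one call makes it harder to end up with a one-directional link.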
---
## Historical Persons
Historical persons (deceased) can also have PPID profiles:
```json
{
"ppid": "ID_DE-XX-XXX_1901_DE-XX-XXX_1951_ROBERT-RITTER",
"profile_data": {
"full_name": "Robert Ritter",
"profession": "physician",
"birth_year": 1901,
"death_year": 1951,
"nationality": "German",
"historical_note": "Nazi-era physician involved in racial hygiene programs"
},
"heritage_relevance": {
"is_heritage_relevant": false,
"relevance_score": 0.0
},
"disambiguation_notes": {
"commonly_confused_with": [
{
"ppid": "ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_ROBERT-RITTER",
"name": "Robert Ritter",
"profession": "heritage worker",
"why_different": "Different era - historical figure (1901-1951) vs living heritage professional"
}
]
}
}
```
---
## Related Rules
- **Rule 46**: Entity Resolution - Names Are NEVER Sufficient
- **Rule 21**: Data Fabrication is Strictly Prohibited
- **Rule 26**: Person Data Provenance - Web Claims for Staff Information
---
## Summary
**The PPID system is universal.** When you discover during entity resolution that a web source describes a different person:
1. **Create a regular PPID profile** for the namesake (actress, historical figure, etc.)
2. **Set `heritage_relevance.is_heritage_relevant: false`** (unless they happen to also work in heritage)
3. **Link both profiles** via `disambiguation_notes`
4. **Use standard PPID naming** - no special prefixes needed
This builds a comprehensive person database while preventing entity resolution errors.

View file

@ -0,0 +1,307 @@
# Rule 46: Entity Resolution - Names Are NEVER Sufficient
## Status: CRITICAL
## 🚨 DATA QUALITY IS OF UTMOST IMPORTANCE 🚨
**Wrong data is worse than no data.** Attributing a birth year, spouse, or social media profile to the wrong person is a **critical data quality failure** that undermines the entire dataset's trustworthiness.
**ALL enrichments MUST be done MANUALLY and double-checked.** Automated web search enrichment has been DISABLED due to catastrophic entity resolution failures (540+ false claims removed in Jan 2026).
**The cost of false data**:
- Corrupts downstream analysis and reporting
- Creates legal/privacy risks (attributing data to wrong person)
- Destroys user trust in the dataset
- Requires expensive manual cleanup
---
## 🚫 AUTOMATED ENRICHMENT IS PROHIBITED 🚫
**DO NOT USE** automated scripts to enrich person profiles with web search data.
**Why automated enrichment failed**:
- Web searches return data about DIFFERENT people with similar names
- Regex pattern matching cannot distinguish between namesakes
- Wikipedia, IMDB, ResearchGate, Instagram all returned data from wrong people
- Example: "Carmen Juliá" search returned Venezuelan actress, Mexican hydrogeologist, Spanish medievalist - NONE were the UK art curator
**ONLY ALLOWED enrichment methods**:
1. **Manual research** - Human curator verifies source refers to the correct person
2. **Institutional sources** - Data from the person's employer website (verified)
3. **LinkedIn profile data** - Already verified via direct profile access
4. **ORCID/Wikidata** - If the person has a verified identifier
---
## The Core Principle
🚨 **SIMILAR OR IDENTICAL NAMES ARE NEVER SUFFICIENT FOR ENTITY RESOLUTION.**
A web search result mentioning "Carmen Juliá born 1952" is **NOT** evidence that the Carmen Juliá in our person profile was born in 1952. Names are not unique identifiers - there are thousands of people with the same name worldwide.
**Entity resolution requires verification of MULTIPLE independent identity attributes:**
| Attribute | Purpose | Example |
|-----------|---------|---------|
| **Age/Birth Year** | Temporal consistency | Both sources describe someone in their 40s |
| **Career Path** | Professional identity | Both are art curators, not one curator and one actress |
| **Location** | Geographic consistency | Both are based in UK, not one UK and one Venezuela |
| **Employer** | Institutional affiliation | Both work at New Contemporaries |
| **Education** | Academic background | Same university or field |
**Minimum Requirement**: At least **3 of 5** attributes must match before attributing ANY claim from a web source. Name match alone = **AUTOMATIC REJECTION**.
## Problem Statement
When enriching person profiles via web search (Linkup, Exa, etc.), search results often return data about **different people with similar or identical names**. Without proper entity resolution, the enrichment process can attribute false claims to the wrong person.
**Example Failure** (Carmen Juliá - UK Art Curator):
- Source profile: Carmen Juliá, Curator at New Contemporaries (UK)
- Birth year extracted: 1952 from Carmen Julia **Álvarez** (Venezuelan actress)
- Spouse extracted: "actors Eduardo Serrano" from the Venezuelan actress
- ResearchGate: Carmen Julia **Navarro** (Mexican hydrogeologist)
- Academia.edu: Carmen Julia **Gutiérrez** (Spanish medieval studies)
All data is from **different people** - none is the actual Carmen Juliá who is a UK-based art curator.
**Why This Happened**: The enrichment script used regex pattern matching to extract "born 1952" without verifying that the Wikipedia article described the SAME person.
## The Rule
### DO NOT use name matching as the basis for entity resolution. EVER.
For person enrichment via web search:
**FORBIDDEN** (Name-based extraction):
- ❌ Extracting birth years from any search result mentioning "Carmen Julia born..."
- ❌ Attributing social media profiles just because the name appears
- ❌ Claiming relationships (spouse, parent, child) from web text pattern matching
- ❌ Assigning academic profiles (ResearchGate, Academia.edu, Google Scholar) based on name matching alone
- ❌ Using Wikipedia articles without verifying ALL identity attributes
- ❌ Trusting genealogy sites (Geni, Ancestry, MyHeritage) which describe historical namesakes
- ❌ Using IMDB for birth years (actors with same names)
**REQUIRED** (Multi-Attribute Entity Resolution):
1. **Verify identity via MULTIPLE attributes** - name alone is INSUFFICIENT
2. **Cross-reference with known facts** (employer, location, job title from LinkedIn)
3. **Detect conflicting signals** - actress vs curator, Venezuela vs UK, 1950s birth vs active 2020s career
4. **Reject ambiguous matches** - if source doesn't clearly identify the same person, reject the claim
5. **Document rejection rationale** - log why claim was rejected for audit trail
## Entity Resolution Verification Checklist
Before attributing a web claim to a person profile, verify MULTIPLE identity attributes:
| # | Attribute | What to Check | Example Match | Example Conflict |
|---|-----------|---------------|---------------|------------------|
| 1 | **Career/Profession** | Same field/industry | Both are curators | Source says "actress", profile is curator |
| 2 | **Employer** | Same institution | Both at Rijksmuseum | Source says "film studio", profile is museum |
| 3 | **Location** | Same city/country | Both UK-based | Source says Venezuela, profile is UK |
| 4 | **Age Range** | Plausible for career | Birth 1980s, active 2020s | Birth 1952, still active in 2025 as junior |
| 5 | **Education** | Same university/field | Both art history | Source says "medical school" |
**Minimum requirement**: At least **3 of 5** attributes must match. Name match alone = **AUTOMATIC REJECTION**.
**Any conflicting signal = AUTOMATIC REJECTION** (e.g., source says "actress" when profile is "curator").
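As an illustration of the two rules above (at least 3 of 5 matches, and rejection on any conflict), here is a minimal sketch. It assumes a human curator has already judged each attribute and recorded `True` (match), `False` (conflict), or `None` (unknown); the function name `resolve_entity` and the attribute keys are hypothetical, and the sketch is a decision aid for manual review, not an automated enrichment step.

```python
ATTRIBUTES = ("profession", "employer", "location", "age_range", "education")


def resolve_entity(checks, required_matches=3):
    """Apply the N-of-5 rule to curator-supplied attribute judgements.

    checks maps an attribute name to True (match), False (conflict),
    or None / missing (could not be verified).
    """
    conflicts = [a for a in ATTRIBUTES if checks.get(a) is False]
    matches = [a for a in ATTRIBUTES if checks.get(a) is True]

    if conflicts:
        # Any conflicting signal rejects the claim outright.
        return {"status": "REJECTED", "reason": f"conflicting_{conflicts[0]}",
                "matches": matches, "conflicts": conflicts}
    if len(matches) < required_matches:
        return {"status": "REJECTED", "reason": "insufficient_matches",
                "matches": matches, "conflicts": conflicts}
    return {"status": "ACCEPTED", "reason": "attribute_match",
            "matches": matches, "conflicts": conflicts}


decision = resolve_entity({"profession": False, "location": False})
```

A name match is deliberately not among the inputs, so it can never contribute to acceptance.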
## Sources with High Entity Resolution Risk
These sources are NOT forbidden, but require **stricter verification thresholds** due to high false-positive rates:
| Source Type | Risk Level | Why | Required Matches |
|-------------|------------|-----|------------------|
| Genealogy sites | CRITICAL | Historical persons with same name | 5/5 attributes (or explicit link to living person) |
| IMDB | CRITICAL | Actors with common names | 5/5 attributes (unless person works in film/TV) |
| Wikipedia | HIGH | Many people with same name have pages | 4/5 attributes match |
| Academic profiles | HIGH | Multiple researchers with same name | 4/5 attributes + institution match |
| Social media | HIGH | Many accounts with similar handles | 4/5 attributes + verify employer/location in bio |
| News articles | MEDIUM | May mention multiple people | 3/5 attributes + read full context |
| Institutional websites | LOW | Usually about their own staff | 2/5 attributes (good source if person works there) |
**Key point**: High-risk sources CAN be used if you verify enough identity attributes. The risk level determines the verification threshold, not whether the source is allowed.
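The threshold table can be expressed as a simple lookup. This is a sketch only: the source-type keys are hypothetical labels for the table rows, and falling back to 3 for unlisted source types is an assumption based on the general 3-of-5 minimum stated earlier in this rule.

```python
# Required attribute matches per source type, copied from the table above.
REQUIRED_MATCHES = {
    "genealogy": 5,
    "imdb": 5,
    "wikipedia": 4,
    "academic_profile": 4,
    "social_media": 4,
    "news_article": 3,
    "institutional_website": 2,
}

DEFAULT_REQUIRED_MATCHES = 3  # assumption: general minimum for unlisted sources


def required_matches(source_type):
    """Return how many of the 5 identity attributes must match."""
    return REQUIRED_MATCHES.get(source_type, DEFAULT_REQUIRED_MATCHES)


def meets_threshold(source_type, match_count):
    """True when enough attributes were verified for this source type."""
    return match_count >= required_matches(source_type)
```

The additional conditions in the table (for example, the institution match for academic profiles) still need a separate manual check; the lookup only covers the numeric threshold.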
## Red Flags Requiring Investigation
The following are **red flags** that require careful investigation - NOT automatic rejection. People change careers and relocate.
### Profession Differences
If source profession differs from profile profession, **investigate**:
```
Source: "actress", "actor", "singer"
Profile: "curator", "archivist", "librarian"
ASK: Did this person change careers?
- Check timeline: Did acting career END before heritage career BEGAN?
- Check for transition evidence: "former actress turned curator"
- If careers overlap in time → likely different people → REJECT
- If sequential careers with clear transition → may be same person → ACCEPT with documentation
```
### Location Differences
If source location differs from profile location, **investigate**:
```
Source: "Venezuela", "Mexico", "Brazil"
Profile: "UK", "Netherlands", "France"
ASK: Did this person relocate?
- Check timeline: When were they in each location?
- Check for migration evidence: education abroad, international career moves
- If locations overlap in time → likely different people → REJECT
- If sequential locations with clear move → may be same person → ACCEPT with documentation
```
### When to Actually REJECT
Reject when investigation shows **no plausible connection**:
```
Example: Carmen Julia Álvarez (Venezuelan actress, active 1970s-2000s)
vs Carmen Juliá (UK curator, active 2015-present)
- Overlapping active periods in DIFFERENT professions on DIFFERENT continents
- No evidence of career change or relocation
- Birth year 1952 makes current junior curator role implausible
→ REJECT: These are clearly different people
```
### Age Conflicts (Still Automatic Rejection)
If source age is **physically implausible** for profile career stage, REJECT:
```
Source: Born 1922, 1915, 1939
Profile: Currently active professional in 2025
→ REJECT (person would be 86-110 years old)
Source: Born 2007, 2004
Profile: Senior curator
→ REJECT (person would be 18-21, too young)
```
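The age check above is plain arithmetic on the birth year and a reference year. A minimal sketch, assuming the caller supplies the plausible age bounds for the profile's career stage (the function name `age_conflict` and its parameters are hypothetical):

```python
def age_conflict(birth_year, min_age, max_age, reference_year=2025):
    """Return a rejection note if the implied age is implausible, else None."""
    age = reference_year - birth_year
    if age > max_age:
        return f"too old: would be {age} in {reference_year}"
    if age < min_age:
        return f"too young: would be {age} in {reference_year}"
    return None


# Born 1922, profile is a currently active professional (assumed bounds 23-80).
note = age_conflict(1922, min_age=23, max_age=80)
```

Returning the computed age in the note gives the curator a ready-made rationale for the audit trail.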
### Genealogy Source
Genealogy sources require **5 of 5 attribute matches** due to high false-positive rates:
```
Domains: geni.com, ancestry.*, familysearch.org, findagrave.com, myheritage.*
→ REQUIRE 5/5 attribute matches (these often describe historical namesakes)
→ Exception: If source explicitly links to living person with verifiable connection
```
## Claim Rejection Patterns
The following inconsistency patterns should trigger automatic claim rejection:
```python
# Genealogy sources conflict - ALWAYS REJECT
GENEALOGY_DOMAINS = [
'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org',
'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org'
]
# Profession conflicts - if profile has one and source has another, REJECT
PROFESSION_CONFLICTS = {
'heritage': ['curator', 'archivist', 'librarian', 'conservator', 'registrar', 'collection manager'],
'entertainment': ['actress', 'actor', 'singer', 'footballer', 'politician', 'model', 'athlete'],
'medical': ['doctor', 'nurse', 'surgeon', 'physician'],
'tech': ['software engineer', 'developer', 'programmer'],
}
# Location conflicts - if source describes person in location X and profile is location Y, REJECT
LOCATION_PAIRS = [
('venezuela', 'uk'), ('venezuela', 'netherlands'), ('venezuela', 'germany'),
('mexico', 'uk'), ('mexico', 'netherlands'), ('brazil', 'france'),
('caracas', 'london'), ('caracas', 'amsterdam'),
]
# Age impossibility - if birth year makes current career implausible, REJECT. For instance, for a Junior role:
MIN_PLAUSIBLE_BIRTH_YEAR = 1945 # Would be 80 in 2025 - still plausible but verify
MAX_PLAUSIBLE_BIRTH_YEAR = 2002 # Would be 23 in 2025 - plausible for junior roles
```
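One way these constants might be used is sketched below. The helper names are hypothetical, the constants are repeated (abridged) so the block runs on its own, and treating subdomains such as `www.geni.com` as belonging to the listed domain is an assumption.

```python
from urllib.parse import urlparse

GENEALOGY_DOMAINS = [
    'geni.com', 'ancestry.com', 'ancestry.co.uk', 'familysearch.org',
    'findagrave.com', 'myheritage.com', 'wikitree.com', 'geneanet.org',
]

PROFESSION_CONFLICTS = {
    'heritage': ['curator', 'archivist', 'librarian', 'conservator'],
    'entertainment': ['actress', 'actor', 'singer', 'model'],
    'medical': ['doctor', 'nurse', 'surgeon', 'physician'],
}


def is_genealogy_source(url):
    """True if the URL's host is a listed genealogy domain or a subdomain of one."""
    host = (urlparse(url).hostname or '').lower()
    return any(host == d or host.endswith('.' + d) for d in GENEALOGY_DOMAINS)


def profession_category(profession):
    """Return the category a profession belongs to, or None if unlisted."""
    needle = profession.strip().lower()
    for category, members in PROFESSION_CONFLICTS.items():
        if needle in members:
            return category
    return None


def professions_conflict(profile_profession, source_profession):
    """True only when both professions are listed and fall in different categories."""
    a = profession_category(profile_profession)
    b = profession_category(source_profession)
    return a is not None and b is not None and a != b
```

Unlisted professions return no conflict here, so an unknown profession should still be reviewed by hand rather than treated as a match.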
## Handling Rejected Claims
When a claim fails entity resolution:
```json
{
"claim_type": "birth_year",
"claim_value": 1952,
"entity_resolution": {
"status": "REJECTED",
"reason": "conflicting_profession",
"details": "Source describes Venezuelan actress, profile is UK curator",
"source_identity": "Carmen Julia Álvarez (Venezuelan actress)",
"profile_identity": "Carmen Juliá (UK art curator)",
"rejected_at": "2026-01-11T15:00:00Z",
"rejected_by": "entity_resolution_validator_v1"
}
}
```
## Special Cases
### Common Names
For common names, raise the threshold above the default 3 of 5: very common names (e.g., "Maria García", "Jan de Vries") require **4 of 5** verification checks, and extremely common names (e.g., "John Smith") require 5 of 5. The more common the name, the higher the threshold.
| Name Commonality | Required Matches |
|------------------|------------------|
| Unique name (e.g., "Xander Vermeulen-Oosterhuis") | 2 of 5 |
| Moderately common (e.g., "Carmen Juliá") | 3 of 5 |
| Very common (e.g., "Jan de Vries") | 4 of 5 |
| Extremely common (e.g., "John Smith") | 5 of 5 or reject |
### Abbreviated Names
For profiles with abbreviated names (e.g., "J. Smith"), entity resolution is inherently uncertain:
- Set `entity_resolution_confidence: "very_low"`
- Require **human review** for all claims
- Do NOT attribute web claims automatically
### Historical Persons
When sources describe historical/deceased persons:
- Check if death date conflicts with profile activity (living person active in 2025)
- **ALWAYS REJECT** genealogy site data
- Reject any source describing events before 1950 unless profile is known to be historical
### Wikipedia Articles
Wikipedia is particularly dangerous because:
- Many people with the same name have articles
- Search engines return Wikipedia first
- The Wikipedia Carmen Julia Álvarez article describes a Venezuelan actress born 1952
- This is a DIFFERENT PERSON from Carmen Juliá the UK curator
**For Wikipedia sources**:
1. Read the FULL article, not just snippets
2. Verify the Wikipedia subject's profession matches the profile
3. Verify the Wikipedia subject's location matches the profile
4. If ANY conflict detected → REJECT
## Audit Trail
All entity resolution decisions must be logged:
```json
{
"enrichment_history": [
{
"enrichment_timestamp": "2026-01-11T15:00:00Z",
"enrichment_agent": "enrich_person_comprehensive.py v1.4.0",
"entity_resolution_decisions": [
{
"source_url": "https://en.wikipedia.org/wiki/Carmen_Julia_Álvarez",
"decision": "REJECTED",
"reason": "Different person - Venezuelan actress, not UK curator"
}
],
"claims_rejected_count": 5,
"claims_accepted_count": 1
}
]
}
```
## See Also
- Rule 21: Data Fabrication is Strictly Prohibited
- Rule 26: Person Data Provenance - Web Claims for Staff Information
- Rule 45: Inferred Data Must Be Explicit with Provenance

View file

@ -0,0 +1,422 @@
# Rule 45: Inferred Data Must Be Explicit with Provenance
**Status**: Active
**Created**: 2025-01-09
**Applies to**: PPID enrichment, person entity profiles, any data inference
## Core Principle
**All inferred data MUST be stored in explicit `inferred_*` fields with full provenance statements. Inferred values MUST NEVER silently replace or merge with verified data.**
This ensures:
1. **Transparency**: Users can distinguish verified facts from heuristic estimates
2. **Auditability**: The inference method and source observations are traceable
3. **Reversibility**: Inferred data can be corrected when verified data becomes available
4. **Quality Signals**: Confidence levels and argument chains are preserved
## Required Structure for Inferred Data
Every inferred claim MUST include:
```yaml
inferred_[field_name]:
  value: "the inferred value"
  edtf: "196X"  # For dates: EDTF notation
  formatted: "NL-UT-UTR"  # For locations: CC-RR-PPP format
  confidence: "low|medium|high"
  inference_provenance:
    method: "heuristic_name"
    inference_chain:
      - step: 1
        observation: "University start year 1986"
        source_field: "profile_data.education[0].date_range"
        source_value: "1986 - 1990"
      - step: 2
        assumption: "University entry at age 18"
        rationale: "Standard Dutch university entry age"
      - step: 3
        calculation: "1986 - 18 = 1968"
        result: "Estimated birth year 1968"
      - step: 4
        generalization: "Round to decade → 196X"
        rationale: "EDTF decade notation for uncertain years"
    inferred_at: "2025-01-09T18:00:00Z"
    inferred_by: "enrich_ppids.py"
```
## Explicit Inferred Fields
### For Person Profiles (PPID)
| Inferred Field | Source Observations | Heuristic |
|----------------|---------------------|-----------|
| `inferred_birth_year` | Earliest education/job dates | Entry age assumptions |
| `inferred_birth_decade` | Birth year estimate | EDTF decade notation |
| `inferred_birth_settlement` | School/university location | Residential proximity |
| `inferred_birth_region` | Settlement location | GeoNames admin1 |
| `inferred_birth_country` | Settlement location | GeoNames country |
| `inferred_current_settlement` | Profile location, current job | Direct extraction |
| `inferred_current_region` | Settlement location | GeoNames admin1 |
| `inferred_current_country` | Settlement location | GeoNames country |
### Example: Complete Inferred Birth Data
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN",
"birth_date": {
"edtf": "XXXX",
"precision": "unknown",
"note": "See inferred_birth_decade for heuristic estimate"
},
"inferred_birth_decade": {
"value": "196X",
"edtf": "196X",
"precision": "decade",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_heuristic",
"inference_chain": [
{
"step": 1,
"observation": "University education record found",
"source_field": "profile_data.education[0]",
"source_value": {
"institution": "Universiteit Utrecht",
"degree": "Social & Organisational psychology, doctoraal",
"date_range": "1986 - 1990"
}
},
{
"step": 2,
"extraction": "Start year extracted from date_range",
"extracted_value": 1986
},
{
"step": 3,
"assumption": "University entry age",
"assumed_value": 18,
"rationale": "Standard Dutch university entry age (post-VWO)",
"confidence_impact": "Assumption reduces confidence; actual age 17-20 possible"
},
{
"step": 4,
"calculation": "1986 - 18 = 1968",
"result": "Estimated birth year: 1968"
},
{
"step": 5,
"generalization": "Convert to EDTF decade",
"input": 1968,
"output": "196X",
"rationale": "Decade precision appropriate for heuristic estimate"
}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
},
"inferred_birth_settlement": {
"value": "Utrecht",
"formatted": "NL-UT-UTR",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_location",
"inference_chain": [
{
"step": 1,
"observation": "Earliest education institution identified",
"source_field": "profile_data.education[0].institution",
"source_value": "Universiteit Utrecht"
},
{
"step": 2,
"lookup": "Institution location mapping",
"mapping_key": "Universiteit Utrecht",
"mapping_value": "Utrecht, Netherlands"
},
{
"step": 3,
"geocoding": "GeoNames resolution",
"query": "Utrecht",
"country_code": "NL",
"result": {
"geonames_id": 2745912,
"name": "Utrecht",
"admin1_code": "09",
"admin1_name": "Utrecht"
}
},
{
"step": 4,
"formatting": "CC-RR-PPP generation",
"country_code": "NL",
"region_code": "UT",
"settlement_code": "UTR",
"result": "NL-UT-UTR"
}
],
"assumption_note": "University location used as proxy for birth location; student may have relocated for education",
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
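The birth-decade inference in the example can be sketched as a function that records each step as it computes it. This is an illustrative sketch, not the code of `enrich_ppids.py`: the function name, its parameters, and the `inferred_by` default are hypothetical, while the output field names follow the example above.

```python
from datetime import datetime, timezone


def infer_birth_decade(start_year, source_field, entry_age=18,
                       inferred_by="example_script.py"):
    """Estimate a birth decade from an education start year, with provenance."""
    birth_year = start_year - entry_age
    decade = f"{birth_year // 10}X"  # e.g. 1968 -> "196X"
    return {
        "value": decade,
        "edtf": decade,
        "precision": "decade",
        "confidence": "low",  # multi-step heuristic with an age assumption
        "inference_provenance": {
            "method": "earliest_education_heuristic",
            "inference_chain": [
                {"step": 1, "observation": "Education start year found",
                 "source_field": source_field, "source_value": start_year},
                {"step": 2, "assumption": "University entry age",
                 "assumed_value": entry_age},
                {"step": 3,
                 "calculation": f"{start_year} - {entry_age} = {birth_year}",
                 "result": f"Estimated birth year: {birth_year}"},
                {"step": 4, "generalization": "Convert to EDTF decade",
                 "input": birth_year, "output": decade},
            ],
            "inferred_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "inferred_by": inferred_by,
        },
    }


inferred = infer_birth_decade(1986, "profile_data.education[0].date_range")
```

Building the chain inside the same function that does the arithmetic keeps the recorded steps from drifting out of sync with the actual calculation.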
## List-Valued Inferred Data (EDTF Set Notation)
When inference yields multiple plausible values (e.g., an estimated birth year of 1968 with ±3 years of variance could fall in either the 1960s or the 1970s), store them as a **list** with EDTF set notation.
### EDTF Set Notation Standards
| Notation | Meaning | Use Case |
|----------|---------|----------|
| `[196X,197X]` | One of these values | Person born in late 1960s (uncertainty spans decades) |
| `{196X,197X}` | All of these values | NOT for birth decade (use `[...]`) |
| `[1965..1970]` | Range within set | Birth year between 1965-1970 |
### When to Use List Values
1. **Decade Boundary Cases**: Estimated birth year is within 3 years of a decade boundary
- Estimated 1968 → `[196X,197X]` (could be late 60s or early 70s due to age assumption variance)
- Estimated 1972 → `[196X,197X]` (same logic)
- Estimated 1975 → `197X` (confidently mid-decade)
2. **Multiple Plausible Locations**: Student attended schools in different cities
- `["NL-UT-UTR", "NL-NH-AMS"]` with provenance explaining each candidate
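The decade-boundary rule can be checked mechanically. A minimal sketch, assuming a symmetric variance of ±3 years around the estimate (the function name `decade_candidates` is hypothetical):

```python
def decade_candidates(estimated_year, variance=3):
    """Return (decades, edtf) for an estimated year with +/- variance.

    A single decade yields plain notation ("197X"); several decades
    yield EDTF "one of" set notation ("[196X,197X]").
    """
    low = estimated_year - variance
    high = estimated_year + variance
    decades = [f"{d}X" for d in range(low // 10, high // 10 + 1)]
    edtf = decades[0] if len(decades) == 1 else "[" + ",".join(decades) + "]"
    return decades, edtf


boundary_case = decade_candidates(1968)
mid_decade_case = decade_candidates(1975)
```

With these inputs, 1968 and 1972 both span the 1960s/1970s boundary, while 1975 stays within a single decade, which matches the three cases listed above.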
### Example: List-Valued Birth Decade
```json
{
"inferred_birth_decade": {
"values": ["196X", "197X"],
"edtf": "[196X,197X]",
"edtf_meaning": "one of: 1960s or 1970s",
"precision": "decade_set",
"confidence": "very_low",
"primary_value": "196X",
"primary_rationale": "1968 is closer to 1960s center than 1970s",
"inference_provenance": {
"method": "earliest_observation_heuristic",
"inference_chain": [
{
"step": 1,
"observation": "University start 1986",
"source_field": "profile_data.education[0].date_range"
},
{
"step": 2,
"assumption": "University entry at age 18 (±3 years)",
"rationale": "Dutch university entry typically 17-21"
},
{
"step": 3,
"calculation": "1986 - 18 = 1968 (range: 1965-1971)",
"result": "Birth year estimate: 1968 with variance 1965-1971"
},
{
"step": 4,
"generalization": "Birth year range spans decade boundary",
"input_range": [1965, 1971],
"output": ["196X", "197X"],
"rationale": "Cannot determine which decade without additional evidence"
}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
### PPID Generation with List Values
When `inferred_birth_decade` is a list, use `primary_value` for PPID:
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-UT-UTR_XXXX_AART-HARTEN",
"ppid_components": {
"first_date": "196X",
"first_date_source": "inferred_birth_decade.primary_value",
"first_date_alternatives": ["197X"]
}
}
```
### Example: List-Valued Location
```json
{
"inferred_birth_settlement": {
"values": [
{"settlement": "Utrecht", "formatted": "NL-UT-UTR"},
{"settlement": "Amsterdam", "formatted": "NL-NH-AMS"}
],
"primary_value": "NL-UT-UTR",
"primary_rationale": "Earlier education (1986) in Utrecht; later education (1990) in Amsterdam",
"confidence": "very_low",
"inference_provenance": {
"method": "education_locations",
"inference_chain": [
{
"step": 1,
"observation": "Multiple education institutions found",
"source_field": "profile_data.education",
"candidates": ["Universiteit Utrecht (1986)", "UvA (1990)"]
},
{
"step": 2,
"assumption": "Earlier education more likely near birth location",
"rationale": "Students often attend local university first"
}
]
}
}
}
```
## Confidence Levels
| Level | Criteria | Example |
|-------|----------|---------|
| **high** | Direct extraction from authoritative source | Profile states "Born in Amsterdam" |
| **medium** | Single-step inference with reliable source | Current job location from employment record |
| **low** | Multi-step heuristic with assumptions | Birth year from university start date |
| **very_low** | Speculative, multiple assumptions, or list-valued | Birth location from first observed location, or decade spanning boundary |
## Anti-Patterns (FORBIDDEN)
### ❌ Silent Replacement
```json
{
"birth_date": {
"edtf": "196X",
"precision": "decade"
}
}
```
**Problem**: No indication this is inferred, no provenance, no confidence level.
### ❌ Hidden in Metadata
```json
{
"birth_date": {
"edtf": "196X"
},
"enrichment_metadata": {
"birth_date_inferred": true
}
}
```
**Problem**: Inference metadata separated from the value; easy to miss.
### ❌ Missing Inference Chain
```json
{
"inferred_birth_decade": {
"value": "196X",
"method": "heuristic"
}
}
```
**Problem**: No explanation of HOW the value was derived; not auditable.
## Correct Pattern ✅
```json
{
"birth_date": {
"edtf": "XXXX",
"precision": "unknown",
"note": "See inferred_birth_decade"
},
"inferred_birth_decade": {
"value": "196X",
"edtf": "196X",
"confidence": "low",
"inference_provenance": {
"method": "earliest_education_heuristic",
"inference_chain": [
{"step": 1, "observation": "...", "source_field": "...", "source_value": "..."},
{"step": 2, "assumption": "...", "rationale": "..."},
{"step": 3, "calculation": "...", "result": "..."}
],
"inferred_at": "2025-01-09T18:00:00Z",
"inferred_by": "enrich_ppids.py"
}
}
}
```
## PPID Component Handling
When inferred values are used in PPID components:
```json
{
"ppid": "ID_NL-UT-UTR_196X_NL-NH-AMS_XXXX_AART-HARTEN",
"ppid_components": {
"type": "ID",
"first_location": "NL-UT-UTR",
"first_location_source": "inferred_birth_settlement",
"first_date": "196X",
"first_date_source": "inferred_birth_decade",
"last_location": "NL-NH-AMS",
"last_location_source": "inferred_current_settlement",
"last_date": "XXXX",
"name_tokens": ["AART", "HARTEN"]
}
}
```
The `*_source` fields document which inferred field was used for PPID generation.
## Upgrade Path: Inferred → Verified
When verified data becomes available:
1. **Keep inferred data** in `inferred_*` fields for audit trail
2. **Add verified data** to canonical fields
3. **Mark inferred as superseded**:
```json
{
"birth_date": {
"edtf": "1967-03-15",
"precision": "day",
"verified": true,
"source": "official_record"
},
"inferred_birth_decade": {
"value": "196X",
"superseded": true,
"superseded_by": "birth_date",
"superseded_at": "2025-01-15T10:00:00Z",
"accuracy_assessment": "Inferred decade was correct (1960s), actual year 1967"
}
}
```
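The upgrade path could be implemented along these lines. This is a sketch under the assumption that profiles are plain dictionaries; the function name `supersede_inferred` is hypothetical, and the `superseded*` field names follow the example above.

```python
import copy


def supersede_inferred(profile, inferred_field, canonical_field,
                       verified_value, superseded_at):
    """Return a copy of the profile with verified data added and the
    inferred field kept but marked as superseded."""
    updated = copy.deepcopy(profile)
    updated[canonical_field] = verified_value

    inferred = updated.get(inferred_field)
    if inferred is not None:
        # Keep the inferred data for the audit trail; only flag it.
        inferred["superseded"] = True
        inferred["superseded_by"] = canonical_field
        inferred["superseded_at"] = superseded_at
    return updated


before = {
    "birth_date": {"edtf": "XXXX", "precision": "unknown"},
    "inferred_birth_decade": {"value": "196X", "confidence": "low"},
}
after = supersede_inferred(
    before,
    inferred_field="inferred_birth_decade",
    canonical_field="birth_date",
    verified_value={"edtf": "1967-03-15", "precision": "day",
                    "verified": True, "source": "official_record"},
    superseded_at="2025-01-15T10:00:00Z",
)
```

Working on a deep copy leaves the original record untouched, so the caller can compare both versions before writing anything to disk.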
## Implementation Checklist
For any enrichment script:
- [ ] Create explicit `inferred_*` fields for ALL inferred data
- [ ] Include `inference_provenance` with complete `inference_chain`
- [ ] Record each step: observation → assumption → calculation → result
- [ ] Set appropriate `confidence` level
- [ ] Add `*_source` references in PPID components
- [ ] Preserve original unknown values (`XXXX`, `XX-XX-XXX`)
- [ ] Add `note` in canonical fields pointing to inferred alternatives
## Related Rules
- **Rule 44**: PPID Birth Date Enrichment and EDTF Unknown Date Notation
- **Rule 35**: Provenance Statements MUST Have Dual Timestamps
- **Rule 6**: WebObservation Claims MUST Have XPath Provenance

View file

@ -0,0 +1,251 @@
# Rule 40: KIEN Registry is Authoritative for Intangible Heritage Custodians
## Summary
For Intangible Heritage Custodians (Type I), the KIEN registry at `https://www.immaterieelerfgoed.nl/` is the **TIER_1_AUTHORITATIVE** source for contact data and addresses. Google Maps enrichment is **TIER_3_CROWD_SOURCED** and should NEVER override KIEN data.
## Empirical Validation (January 2025)
A comprehensive audit of 188 Type I custodian files revealed:
| Category | Count | Percentage |
|----------|-------|------------|
| ✅ Google Maps matches OK | 101 | 53.7% |
| 🔧 **FALSE_MATCH detected** | **62** | **33.0%** |
| ⚠️ No official website (valid) | 20 | 10.6% |
| 📭 No Google Maps data | 5 | 2.7% |
**Key Finding: 33% of Google Maps enrichment data for Type I custodians was incorrect.**
### False Match Categories Identified
1. **Domain mismatches** (39 files): Google Maps website ≠ KIEN official website
2. **Name mismatches** (8 files): Completely different organizations (e.g., "Ria Bos" heritage practitioner → "Ria Money Transfer Agent")
3. **Wrong location** (6 files): Similar name but different city or country (Amsterdam→Den Haag, Netherlands→Suriname!)
4. **Wrong organization type** (5 files): Federation vs specific member, heritage org vs webshop
5. **Different entity type** (3 files): Organization vs location/street name
6. **Different event** (3 files): Horse racing vs festival, different village's event
### Why Google Maps Fails for Type I
Google Maps is optimized for commercial businesses with physical storefronts. Type I intangible heritage custodians are fundamentally different:
- **Virtual organizations** without commercial presence
- **Person-based heritage** (individual practitioners preserving traditional crafts)
- **Volunteer networks** meeting in private residences
- **Event-based organizations** that exist only during festivals
- **Federations** that coordinate member organizations without own premises
## Rationale
Google Maps frequently returns **false matches** for intangible heritage organizations because:
1. **Virtual Organizations**: Many intangible heritage custodians operate as networks/platforms without commercial storefronts
2. **Name Collisions**: Common words in organization names (e.g., "Platform") match unrelated businesses
3. **No Physical Presence**: Organizations focused on intangible heritage (handwriting, oral traditions, crafts) often have no Google Maps listing
4. **Volunteer-Run**: Contact addresses are often private residences, not businesses
KIEN (Kenniscentrum Immaterieel Erfgoed Nederland) is the official Dutch registry for intangible cultural heritage and maintains verified contact information directly from the organizations.
## Data Tier Hierarchy for Type I Custodians
| Priority | Source | Data Tier | Trust Level |
|----------|--------|-----------|-------------|
| 1st | KIEN Registry (`immaterieelerfgoed.nl`) | TIER_1_AUTHORITATIVE | Highest |
| 2nd | Organization's Official Website | TIER_2_VERIFIED | High |
| 3rd | Wikidata | TIER_3_CROWD_SOURCED | Medium |
| 4th | Google Maps | TIER_3_CROWD_SOURCED | Low (verify!) |
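The priority order in the table can be applied per field, as in the sketch below. The source keys and the function name `pick_value` are hypothetical; the tier labels are taken from the table.

```python
# Sources in priority order, with the data tier from the table above.
SOURCE_PRIORITY = [
    ("kien", "TIER_1_AUTHORITATIVE"),
    ("official_website", "TIER_2_VERIFIED"),
    ("wikidata", "TIER_3_CROWD_SOURCED"),
    ("google_maps", "TIER_3_CROWD_SOURCED"),
]


def pick_value(field, candidates):
    """Return the value for a field from the highest-priority source that has one.

    candidates maps a source key to a dict of field values.
    """
    for source, tier in SOURCE_PRIORITY:
        value = candidates.get(source, {}).get(field)
        if value:
            return {"value": value, "source": source, "data_tier": tier}
    return None


chosen = pick_value("city", {
    "google_maps": {"city": "Nieuwleusen"},
    "kien": {"city": "Zevenaar"},
})
```

Because KIEN is checked first, a Google Maps value is only ever used when no higher-tier source provides the field, and it still needs verification.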
## Required Workflow for Type I Enrichment
### Step 1: Scrape KIEN Page First
For every intangible heritage custodian, the KIEN profile page MUST be scraped to extract:
```yaml
kien_enrichment:
  kien_name: "Platform Handschriftontwikkeling"
  kien_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  heritage_page_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
  heritage_forms:
    - "Ambachten, handwerk en techniek"
    - "Sociale praktijken"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
    country: "NL"
  registered_since: "2019-11"
  enrichment_timestamp: "2025-01-08T00:00:00Z"
  source: "https://www.immaterieelerfgoed.nl"
```
### Step 2: Validate Google Maps Match (If Any)
If Google Maps enrichment exists, compare against KIEN data:
```python
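from urllib.parse import urlparse

# Assumption: `fuzz` is the fuzzy-matching module from rapidfuzz
# (thefuzz exposes the same fuzz.ratio API).
from rapidfuzz import fuzz


def extract_domain(url):
    """Return the host of a URL without a leading 'www.', or None.

    Sketch of the helper the original left undefined.
    """
    if not url:
        return None
    host = urlparse(url if '//' in url else f'//{url}').hostname or ''
    return host.lower().removeprefix('www.') or None


def validate_google_maps_match(kien_data, gmaps_data):
    """Check if Google Maps data matches KIEN authoritative source."""
    # Check website domain match
    kien_domain = extract_domain(kien_data.get('website'))
    gmaps_domain = extract_domain(gmaps_data.get('website'))
    if kien_domain and gmaps_domain and kien_domain != gmaps_domain:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Website mismatch: KIEN={kien_domain}, GMaps={gmaps_domain}'
        }
    # Check name similarity (fuzz.ratio returns a score from 0 to 100)
    kien_name = kien_data.get('kien_name', '').lower()
    gmaps_name = gmaps_data.get('name', '').lower()
    if fuzz.ratio(kien_name, gmaps_name) < 70:
        return {
            'status': 'FALSE_MATCH',
            'reason': f'Name mismatch: KIEN="{kien_name}", GMaps="{gmaps_name}"'
        }
    return {'status': 'VERIFIED'}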
```
### Step 3: Mark False Matches
When Google Maps returns a different organization:
```yaml
google_maps_enrichment:
  status: FALSE_MATCH
  false_match_reason: >-
    Google Maps returned "Platform 9 BV" (a health/coaching business at
    Nieuwleusen) instead of "Platform Handschriftontwikkeling" (a virtual
    handwriting development platform). These are completely different
    organizations. KIEN registry is authoritative for this Type I custodian.
  original_false_match:
    place_id: ChIJNZ6o7H_fx0cR-TURAN3Bj54
    name: Platform 9 BV
    formatted_address: Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen
    website: http://www.platform9.nl/
  correction_timestamp: "2025-01-08T00:00:00Z"
  correction_agent: opencode-claude-sonnet-4
```
## KIEN Contact Data Extraction
The KIEN heritage pages follow a consistent structure. Extract from the "Contact" section:
```
## Contact
[Organization Name](link-to-profile-page)
Street Address
Postal Code
City
Province
[Website](url)
Bijgeschreven in inventaris vanaf: [date]
```
### Example Extraction (from immaterieelerfgoed.nl/nl/handschrift):
```yaml
contact:
  organization: "Platform Handschriftontwikkeling"
  profile_url: "https://www.immaterieelerfgoed.nl/nl/page/2476/platform-handschriftontwikkeling"
  address:
    street: "De Hazelaar 41"
    postal_code: "6903 BB"
    city: "Zevenaar"
    province: "Gelderland"
  website: "http://www.handschriftontwikkeling.nl/"
  registered_since: "november 2019"
```
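The raw `registered_since` value on the heritage page is a Dutch month name plus a year ("november 2019"), while `kien_enrichment` stores it as `2019-11`. A minimal sketch of that normalization follows; the function name `normalize_registered_since` is hypothetical, and the assumption is that KIEN always writes the full lowercase or capitalized Dutch month name.

```python
import re

# Full Dutch month names; abbreviated forms are not handled in this sketch.
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}


def normalize_registered_since(raw):
    """Convert e.g. 'november 2019' to '2019-11'; return None if the text does not fit."""
    match = re.fullmatch(r"\s*([a-z]+)\s+(\d{4})\s*", raw.lower())
    if not match or match.group(1) not in DUTCH_MONTHS:
        return None
    month = DUTCH_MONTHS[match.group(1)]
    return f"{match.group(2)}-{month:02d}"
```

Returning `None` instead of guessing keeps unparseable values visible for manual review.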
## Location Resolution for Type I
When KIEN provides an address:
1. **Use KIEN address** for `location.formatted_address`
2. **Geocode KIEN address** to get coordinates (NOT Google Maps coordinates)
3. **Update location_resolution** with method `KIEN_ADDRESS_GEOCODE`
```yaml
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  region_code: GE
  country: NL
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE
    source_url: "https://www.immaterieelerfgoed.nl/nl/handschrift"
    geocoding_service: nominatim
    geocoding_timestamp: "2025-01-08T00:00:00Z"
```
## Batch Re-Enrichment Script
To fix all Type I custodians with potentially incorrect Google Maps data:
```bash
# Find all Type I custodians
python scripts/rescrape_kien_contacts.py --type I --output data/custodian/
# This script should:
# 1. Read all NL-*-I-*.yaml files
# 2. Fetch KIEN page for each (from kien_enrichment.kien_url)
# 3. Extract contact/address from KIEN
# 4. Compare with google_maps_enrichment
# 5. Mark mismatches as FALSE_MATCH
# 6. Update location with KIEN address
```
## Anti-Patterns
### WRONG - Using Google Maps as primary source for Type I:
```yaml
# WRONG - Google Maps overriding KIEN data
location:
  formatted_address: "Burg, Burgemeester Backxlaan 321, 7711 AD Nieuwleusen"
  coordinate_provenance:
    source_type: GOOGLE_MAPS  # WRONG for Type I!
```
### CORRECT - KIEN as primary source:
```yaml
# CORRECT - KIEN is authoritative
location:
  street_address: "De Hazelaar 41"
  postal_code: "6903 BB"
  city: Zevenaar
  coordinate_provenance:
    source_type: KIEN_ADDRESS_GEOCODE  # Correct!
```
## Affected Files
This rule affects more than 100 Type I custodian files:
- `data/custodian/NL-*-I-*.yaml`
All should be reviewed to ensure:
1. `kien_enrichment` contains address from KIEN page
2. `google_maps_enrichment` is validated against KIEN
3. `location` uses KIEN address (not Google Maps)
4. False matches are properly documented
## Related Rules
- **Rule 5**: NEVER Delete Enriched Data - Keep false match data in `original_false_match`
- **Rule 6**: WebObservation Claims - KIEN data should have provenance
- **Rule 22**: Custodian YAML Files Are Single Source of Truth
- **Rule 35**: Provenance Timestamps - Include KIEN fetch timestamps
## See Also
- KIEN Registry: https://www.immaterieelerfgoed.nl/
- UNESCO Intangible Cultural Heritage: https://ich.unesco.org/
- Dutch Intangible Heritage Network documentation


@ -0,0 +1,351 @@
# Rule 44: PPID Birth Date Enrichment and Unknown Date Notation
**Version**: 1.0.0
**Created**: 2025-01-09
**Status**: ACTIVE
**Related**: [PPID-GHCID Alignment](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md) | [EDTF Specification](https://www.loc.gov/standards/datetime/)
---
## 1. Summary
When birth/death dates are missing from person entity sources, agents MUST:
1. **Search for dates** using Exa Search and Linkup tools
2. **Record all enrichment data** as web claims with provenance
3. **If not found**, use **EDTF-compliant notation** for estimated/unknown dates
4. **Never fabricate** specific dates without source evidence
---
## 2. Enrichment Workflow
### 2.1 Required Search Before Using Unknown Notation
Before marking a date as unknown, agents MUST attempt enrichment:
```
Person Entity (missing birth_date)
1. Search Exa: "{full_name} born birth date"
2. Search Exa: "{full_name} {known_employer}"
3. Search Linkup: "{full_name} biography"
4. If found → Record as web_claim with provenance
5. If NOT found → Use EDTF unknown notation
6. Record enrichment_attempt in metadata
```
### 2.2 Enrichment Search Requirements
| Search Tool | Query Pattern | When to Use |
|-------------|---------------|-------------|
| `exa_web_search_exa` | `"{name}" born birthday birth date year` | Primary search |
| `exa_linkedin_search_exa` | `"{name}" at "{employer}"` | For work context |
| `linkup_linkup-search` | `"{name}" biography personal` | Deep research |
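The query patterns in the table can be generated from the person record before any tool is called. This is a sketch only: `build_birth_date_queries` is a hypothetical helper that returns `(tool_name, query)` pairs and does not invoke the search tools itself.

```python
def build_birth_date_queries(full_name, employer=None):
    """Return (tool_name, query) pairs following the query patterns in the table.

    The LinkedIn-style query is only produced when an employer is known.
    """
    quoted_name = f'"{full_name}"'
    queries = [
        ("exa_web_search_exa", f"{quoted_name} born birthday birth date year"),
    ]
    if employer:
        queries.append(("exa_linkedin_search_exa", f'{quoted_name} at "{employer}"'))
    queries.append(("linkup_linkup-search", f"{quoted_name} biography personal"))
    return queries
```

The returned list can be stored directly under `queries_tried` when the search finds nothing.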
### 2.3 Recording Successful Enrichment
When birth date is found, record as web claim:
```yaml
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://example.org/person/bio"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
    confidence_score: 0.85
    notes: "Found in biography section"
```
### 2.4 Recording Failed Enrichment Attempts
Always record that enrichment was attempted:
```yaml
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    search_agent: "opencode-claude-sonnet-4"
    search_tools_used:
      - exa_web_search_exa
      - linkup_linkup-search
    queries_tried:
      - '"Jan van Berg" born birthday'
      - '"Jan van Berg" biography'
    result: "NOT_FOUND"
    notes: "No publicly available birth date found after comprehensive search"
```
---
## 3. EDTF-Compliant Unknown Date Notation
### 3.1 Standard: Extended Date/Time Format (EDTF)
This project follows the **Library of Congress EDTF Specification** (ISO 8601-2:2019) for representing uncertain, approximate, and unspecified dates.
**Key EDTF Characters**:
| Character | Meaning | EDTF Level | Example |
|-----------|---------|------------|---------|
| `X` | Unspecified digit | Level 1+ | `19XX` = some year 1900-1999 |
| `~` | Approximate (circa) | Level 1+ | `1985~` = circa 1985 |
| `?` | Uncertain | Level 1+ | `1985?` = possibly 1985 |
| `%` | Uncertain AND approximate | Level 1+ | `1985%` = possibly circa 1985 |
| `S` | Significant digits | Level 2 | `1950S2` = 1900-1999, estimated 1950 |
| `[..]` | One of set | Level 2 | `[1970,1980]` = either 1970 or 1980 |
| `{..}` | All of set | Level 2 | `{1970..1980}` = all years 1970-1980 |
### 3.2 Unspecified Date Components (X Notation)
Use `X` to replace unknown digits:
| Known Information | EDTF Format | Meaning |
|-------------------|-------------|---------|
| Only decade known (1970s) | `197X` | Some year 1970-1979 |
| Only century known (1900s) | `19XX` | Some year 1900-1999 |
| Year unknown entirely | `XXXX` | Year unknown |
| Year known, month unknown | `1985-XX` | Some month in 1985 |
| Year+month known, day unknown | `1985-04-XX` | Some day in April 1985 |
| Year known, month+day unknown | `1985-XX-XX` | Some day in 1985 |
| Only decade known, month+day unknown | `197X-XX-XX` | Some day in 1970-1979 |
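To make the `X` notation concrete, the year forms in the table can be expanded into the range of years they cover. A minimal sketch, assuming a four-character year whose unknown digits are trailing; `year_bounds` is a hypothetical helper and does not handle month or day components.

```python
import re


def year_bounds(edtf_year):
    """Return (earliest, latest) year for a 4-character year with trailing X digits."""
    if len(edtf_year) != 4 or not re.fullmatch(r"\d{0,4}X{0,4}", edtf_year):
        raise ValueError(f"unsupported year form: {edtf_year!r}")
    earliest = int(edtf_year.replace("X", "0"))
    latest = int(edtf_year.replace("X", "9"))
    return earliest, latest
```

For example, `year_bounds("197X")` covers 1970 through 1979, and a fully specified year maps to itself.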
### 3.3 Multiple Possible Decades (Set Notation)
When the decade is uncertain but constrained to specific options:
| Scenario | EDTF Format | Meaning |
|----------|-------------|---------|
| Born in 1970s OR 1980s | `[197X,198X]` | One of: some year in 1970s or 1980s |
| Born in specific years | `[1975,1985]` | Either 1975 or 1985 |
| Born 1970-1985 range | `1970/1985` | Interval: between 1970 and 1985 |
### 3.4 Estimated Dates with Significant Digits
When you can estimate a year with confidence bounds:
```
1975S2 = Estimated 1975, significant to 2 digits (1900-1999)
1975S3 = Estimated 1975, significant to 3 digits (1970-1979)
```
This is useful when you can estimate based on career timeline (e.g., "started working 1998, likely born 1970s").
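The arithmetic behind significant-digit notation can be checked with a short sketch. `significant_digit_range` is a hypothetical helper that assumes a four-digit year followed by `S` and a digit count from 1 to 4.

```python
import re


def significant_digit_range(edtf):
    """Return (earliest, latest, estimate) for a value such as '1975S3'."""
    match = re.fullmatch(r"(\d{4})S([1-4])", edtf)
    if not match:
        raise ValueError(f"unsupported significant-digit form: {edtf!r}")
    estimate = int(match.group(1))
    digits = int(match.group(2))
    span = 10 ** (4 - digits)          # 2 digits -> century, 3 digits -> decade
    earliest = estimate - estimate % span
    return earliest, earliest + span - 1, estimate
```

With two significant digits the range is always a full century, so `S3` is the form to use when the decade is known.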
### 3.5 Living Persons - Birth Date Estimation
For living persons in LinkedIn data, estimate birth decade from:
1. **Graduation year** (if available): Subtract ~22 years for bachelor's degree
2. **Career start** (first job): Subtract ~22-25 years
3. **Current role seniority**: "Senior" roles suggest 35+ years old
```yaml
# Example: Person graduated 2010
birth_date_estimate:
  edtf: "1988S2"  # Estimated 1988, significant to 2 digits (1900-1999)
  estimation_method: "graduation_year_inference"
  estimation_basis: "Graduated bachelor's 2010, estimated birth ~1988"
  confidence: 0.60
```
---
## 4. PPID Format with Unknown Dates
### 4.1 PPID Date Component Rules
The PPID format includes birth and death dates:
```
{TYPE}_{FL}_{FD}_{LL}_{LD}_{NT}
             │         │
             │         └── Last Date (death) - EDTF format
             └── First Date (birth) - EDTF format
```
### 4.2 Examples with Unknown Components
| Scenario | PPID Example |
|----------|--------------|
| All known | `PID_NL-NH-AMS_1985-03-15_NL-NH-HAA_2020-08-22_JAN-BERG` |
| Birth year only | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` |
| Birth decade only | `ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_JAN-BERG` |
| Nothing known | `ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_JAN-BERG` |
| Living person | `ID_NL-NH-AMS_1985_XX-XX-XXX_XXXX_JAN-BERG` |
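The examples in the table follow one pattern: every unknown location becomes `XX-XX-XXX` and every unknown date becomes `XXXX`. A minimal sketch of that assembly, assuming the six components are already normalized; `build_ppid` and its parameter names are hypothetical, and the choice between the `ID` and `PID` prefixes is left to the caller.

```python
UNKNOWN_LOCATION = "XX-XX-XXX"
UNKNOWN_DATE = "XXXX"


def build_ppid(name_token, id_type="ID", birth_place=None, birth_date=None,
               death_place=None, death_date=None):
    """Join the six PPID components, substituting placeholders for unknown parts."""
    parts = [
        id_type,
        birth_place or UNKNOWN_LOCATION,
        birth_date or UNKNOWN_DATE,
        death_place or UNKNOWN_LOCATION,
        death_date or UNKNOWN_DATE,
        name_token,
    ]
    return "_".join(parts)
```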
### 4.3 Filename Safety
Only some EDTF characters are **filename-safe**:
| Character | Filename Safe? | Notes |
|-----------|----------------|-------|
| `X` | YES | Uppercase letter |
| `~` | YES | Allowed on macOS/Linux/Windows |
| `?` | NO | Not allowed on Windows |
| `%` | CAUTION | URL encoding issues |
| `[` `]` | CAUTION | Shell escaping issues |
| `,` | YES | Allowed |
| `/` | NO | Directory separator |
| `\|` | CAUTION | Shell pipe, Windows disallowed |
**Recommendation**: For filenames, use only:
- `X` for unknown digits
- `~` for approximate (suffix only)
- Avoid `?`, `%`, `[]`, `/`, `|` in filenames
When set notation `[..]` is needed, store in metadata but use simplified form in filename:
- Filename: `ID_XX-XX-XXX_197X_...` (simplified)
- Metadata: `birth_date_edtf: "[197X,198X]"` (full EDTF)
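The filename recommendation can be enforced with a small check. This is a sketch under one stated assumption: a one-of set such as `[197X,198X]` is simplified to its first member, which is the earlier estimate only if the set is written in ascending order. Both function names are hypothetical.

```python
# Characters the recommendation says to avoid in filenames.
UNSAFE_FILENAME_CHARS = set("?%[]/|")


def is_filename_safe(edtf):
    """True when the EDTF string contains none of the characters to avoid."""
    return not (set(edtf) & UNSAFE_FILENAME_CHARS)


def simplify_for_filename(edtf):
    """Reduce a one-of set to its first member and reject anything still unsafe."""
    if edtf.startswith("[") and edtf.endswith("]"):
        edtf = edtf[1:-1].split(",")[0].strip()
    if not is_filename_safe(edtf):
        raise ValueError(f"not filename-safe: {edtf!r}")
    return edtf
```

The full set notation should still be stored in metadata; only the filename uses the simplified form.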
---
## 5. Decision Tree
```
┌─────────────────────────────────────────┐
│ Person entity missing birth_date        │
└─────────────────┬───────────────────────┘
┌─────────────────┴───────────────────────┐
│ Search Exa + Linkup for birth date      │
└─────────────────┬───────────────────────┘
          ┌───────┴───────┐
          │  Date found?  │
          └───────┬───────┘
        YES       │        NO
         ▼        │        ▼
┌─────────────────┐  ┌─────────────────────────────┐
│ Record as       │  │ Can estimate from career?   │
│ web_claim with  │  └───────────┬─────────────────┘
│ provenance      │    YES       │        NO
└─────────────────┘     ▼        │         ▼
                ┌───────────────┐  ┌───────────────┐
                │ Use EDTF      │  │ Use XXXX      │
                │ estimate:     │  │ (unknown)     │
                │ 1988S2 or     │  │               │
                │ 198X          │  │               │
                └───────────────┘  └───────────────┘
```
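The decision tree can be expressed as a single function. This is a sketch, not the project's implementation: `resolve_birth_date` is hypothetical, the age offsets (22 years at graduation, 23 at career start) are assumptions taken from the estimation guidance and examples in this rule, and estimates are emitted at decade precision.

```python
def resolve_birth_date(found_date=None, source_url=None,
                       graduation_year=None, career_start_year=None):
    """Walk the decision tree: sourced date, then career-based estimate, then XXXX."""
    # A date without a source is never accepted (no fabrication).
    if found_date and source_url:
        return {"edtf": found_date, "basis": "web_claim", "source_url": source_url}
    if graduation_year is not None:
        estimate = graduation_year - 22   # assumed age at bachelor's graduation
        return {"edtf": f"{estimate // 10}X", "basis": "graduation_year_inference"}
    if career_start_year is not None:
        estimate = career_start_year - 23  # assumed age at first job
        return {"edtf": f"{estimate // 10}X", "basis": "career_start_inference"}
    return {"edtf": "XXXX", "basis": "unknown"}
```

The search itself happens before this function is called; its outcome still has to be recorded in `enrichment_metadata`.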
---
## 6. Examples
### 6.1 Fully Unknown (No Enrichment Found)
```yaml
# Person: Nora Ruijs (student, no public birth info)
ppid: ID_XX-XX-XXX_XXXX_XX-XX-XXX_XXXX_NORA-RUIJS
birth_date:
  edtf: "XXXX"
  precision: "unknown"
enrichment_metadata:
  birth_date_search:
    attempted: true
    search_date: "2025-01-09T14:30:00Z"
    result: "NOT_FOUND"
```
### 6.2 Decade Estimated from Career
```yaml
# Person: Senior curator, started career 1995
ppid: ID_NL-NH-AMS_197X_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "197X"
  edtf_full: "1972S3"  # Estimated 1972, significant to 3 digits
  precision: "decade"
  estimation_method: "career_start_inference"
  estimation_basis: "Career started 1995 as junior curator, estimated age 23"
```
### 6.3 Multiple Possible Decades
```yaml
# Person: Could be born 1970s or 1980s based on conflicting sources
ppid: ID_XX-XX-XXX_197X_XX-XX-XXX_XXXX_MARIA-SILVA # Simplified for filename
birth_date:
  edtf: "[197X,198X]"  # Full EDTF with set notation
  edtf_filename: "197X"  # Simplified for filename (earlier estimate)
  precision: "decade_uncertain"
  notes: "Sources conflict: LinkedIn suggests 1980s, university bio suggests 1970s"
```
### 6.4 Exact Date Found via Enrichment
```yaml
# Person: Birth date found on institutional bio page
ppid: ID_NL-NH-AMS_1985-03-15_XX-XX-XXX_XXXX_JAN-BERG
birth_date:
  edtf: "1985-03-15"
  precision: "day"
web_claims:
  - claim_type: birth_date
    claim_value: "1985-03-15"
    source_url: "https://museum.nl/team/jan-berg"
    retrieved_on: "2025-01-09T14:30:00Z"
    retrieval_agent: "opencode-claude-sonnet-4"
```
---
## 7. Anti-Patterns
### 7.1 FORBIDDEN: Fabricating Dates
```yaml
# WRONG - No source, no search attempted
birth_date:
  edtf: "1985-03-15"  # Where did this come from?!
```
### 7.2 FORBIDDEN: Using Non-EDTF Notation
```yaml
# WRONG - Not EDTF compliant
birth_date: "197~8~" # Invalid notation
birth_date: "1970s" # Use 197X instead
birth_date: "circa 1985" # Use 1985~ instead
birth_date: "unknown" # Use XXXX instead
```
### 7.3 FORBIDDEN: Skipping Enrichment Search
```yaml
# WRONG - No search attempted
birth_date:
  edtf: "XXXX"
  # No enrichment_metadata showing search was attempted!
```
---
## 8. Validation Rules
1. **Search Required**: Cannot use `XXXX` without `enrichment_metadata.birth_date_search.attempted: true`
2. **EDTF Compliance**: All dates must parse as valid EDTF (use validator)
3. **Filename Safety**: PPID filenames must avoid `?`, `%`, `[]`, `/`, `|`
4. **Provenance Required**: All found dates must have `web_claims` with source
---
## 9. References
- [EDTF Specification (Library of Congress)](https://www.loc.gov/standards/datetime/)
- [ISO 8601-2:2019](https://www.iso.org/standard/70908.html)
- [PPID-GHCID Alignment Document](../../docs/plan/person_pid/10_ppid_ghcid_alignment.md)
- [Rule 21: Data Fabrication Prohibition](../DATA_FABRICATION_PROHIBITION.md)


@ -5,8 +5,8 @@
## The Rule
1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties).
* ❌ INVALID: Slot `analyzes_or_analyzed` maps to `schema:object` (a Class).
* ✅ VALID: Slot `analyzes_or_analyzed` maps to `crm:P129_is_about` (a Property).
* ❌ INVALID: Slot `analyze` maps to `schema:object` (a Class).
* ✅ VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property).
2. **Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities).
* ❌ INVALID: Class `Person` maps to `foaf:name` (a Property).


@ -0,0 +1,65 @@
# Rule: Engineering Parsimony and Domain Modeling
## Critical Convention
Our ontology follows an engineering-oriented approach: practical domain utility and
stable interoperability take priority over minimal, tool-specific class catalogs.
## Rule
1. Model domain concepts, not implementation tools.
   - Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`.
2. Prefer generic, reusable activity/entity classes for operational provenance.
   - Use classes such as `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`.
3. Capture tool/vendor details in slot values, not class names.
   - Record with generic predicates like `has_tool`, `has_method`, `has_agent`, `has_note`.
4. Digital platforms acting as custodians are valid domain classes.
   - Platform-as-custodian classes (for example YouTube-related custodian classes) are allowed.
   - Data processing/search tools are not ontology class candidates.
5. Avoid ontology growth driven by transient engineering stack choices.
   - New class proposals must be justified by cross-tool, domain-stable semantics.
## Rationale
- Tool names are volatile implementation details and age quickly.
- Domain-level abstractions maximize reuse, query consistency, and mapping stability.
- This aligns with an engineering ontology practice where strict theoretical
parsimony in candidate theories is not the only optimization criterion; practical
semantic interoperability and maintainability are primary.
## Examples
### Wrong
```yaml
classes:
  ExaSearchMetadata:
    class_uri: prov:Activity
```
### Correct
```yaml
classes:
  ExternalSearchMetadata:
    class_uri: prov:Activity
    slots:
      - has_tool
      - has_method
      - has_agent
```
## References
1. Liefke, K. (2024). *Natural Language Ontology and Semantic Theory*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009307789`.
URL: https://www.cambridge.org/core/elements/abs/natural-language-ontology-and-semantic-theory/E8DDE548BB8A98137721984E26FAD764
2. Liefke, K. (2025). *Reduction and Unification in Natural Language Ontology*.
Cambridge Elements in Semantics. DOI: `10.1017/9781009559683`.
URL: https://www.cambridge.org/core/elements/abs/reduction-and-unification-in-natural-language-ontology/40F58ABA0D9C08958B5926F0CBDAD3CA


@ -0,0 +1,37 @@
# Exact Mapping Predicate/Class Distinction Rule
🚨 **CRITICAL**: The `exact_mappings` property implies semantic equivalence. Equivalence can only exist between elements of the same ontological category.
## The Rule
1. **Slots (Predicates)** MUST ONLY have `exact_mappings` to ontology **predicates** (properties).
    * ❌ INVALID: Slot `analyze` maps to `schema:Thing` (a Class).
    * ✅ VALID: Slot `analyze` maps to `crm:P129_is_about` (a Property).
2. **Classes (Entities)** MUST ONLY have `exact_mappings` to ontology **classes** (entities).
    * ❌ INVALID: Class `Person` maps to `foaf:name` (a Property).
    * ✅ VALID: Class `Person` maps to `foaf:Person` (a Class).
3. **When true equivalence exists and is verified, exact mapping is preferred.**
    * ✅ VALID: Class `Acquisition` maps to `crm:E8_Acquisition`.
    * ✅ VALID: Slot mapped to an actually equivalent ontology property.
    * ❗ Do not avoid `exact_mappings` by default; avoid them only when the scope is broader, narrower, or similar but not equal.
## Rationale
Mapping a slot (which defines a relationship or attribute) to a class (which defines a type of entity) is a category error. `schema:Thing` denotes a *class* of entities, not the *relationship* of "having an object" or "analyzing an object".
## Verification Checklist
When adding or reviewing `exact_mappings`:
- [ ] Is the LinkML element a Class or a Slot?
- [ ] Did you verify the target term type in the ontology definition files (do not rely on naming heuristics)?
- [ ] Do they match? (Class↔Class, Slot↔Property)
- [ ] If the target ontology uses opaque IDs (like CIDOC-CRM `E55_Type`), verify the type definition in the ontology file.
- [ ] If semantic scope is truly equivalent, use `exact_mappings` (not `close`/`broad` as a conservative fallback).
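The category check in the list above can be automated once term kinds have been read from the ontology files. The sketch below uses a small hand-written lookup as a stand-in for that step; `TERM_KINDS` and `check_exact_mapping` are hypothetical, and a real check must load the kinds from the ontology definitions rather than from names.

```python
# Stand-in lookup; in practice these kinds come from the ontology definition files.
TERM_KINDS = {
    "crm:P129_is_about": "property",
    "crm:E8_Acquisition": "class",
    "foaf:Person": "class",
    "foaf:name": "property",
    "skos:Concept": "class",
}


def check_exact_mapping(element_kind, target_term, term_kinds=TERM_KINDS):
    """Compare a LinkML element kind ('slot' or 'class') with the target term's kind."""
    expected = {"slot": "property", "class": "class"}[element_kind]
    actual = term_kinds.get(target_term)
    if actual is None:
        return "UNVERIFIED"  # unknown term: look it up, do not guess from the name
    return "VALID" if actual == expected else "INVALID"
```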
## Common Pitfalls to Fix
- Mapping slots to `schema:Object` or `schema:Thing`.
- Mapping slots to `skos:Concept`.
- Mapping classes to `schema:name` or `dc:title`.


@ -0,0 +1,144 @@
# Rule 58: Feedback vs Revision Distinction in slot_fixes.yaml
## Summary
The `feedback` and `revision` fields in `slot_fixes.yaml` serve distinct purposes and MUST NOT be conflated or renamed.
## Field Definitions
### `revision` Field
- **Purpose**: Defines WHAT the migration target is
- **Content**: List of slots and classes to create
- **Authority**: IMMUTABLE (per Rule 57)
- **Format**: Structured YAML list with `label`, `type`, optional `link_branch`
### `feedback` Field
- **Purpose**: Contains user instructions on HOW the revision needs to be applied or corrected
- **Content**: Can be string or structured format
- **Authority**: User directives that override previous `notes`
- **Action Required**: Agent must interpret and act upon feedback
## Feedback Formats
### Format 1: Structured (with `done` field)
```yaml
feedback:
  - timestamp: '2026-01-17T00:01:57Z'
    user: Simon C. Kemper
    done: false  # Becomes true after agent processes
    comment: |
      The migration should use X instead of Y.
    response: ""  # Agent fills this after completing
```
### Format 2: String (direct instruction)
```yaml
feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
```
Or:
```yaml
feedback: I altered the revision based on this feedback. Conduct this new migration accordingly.
```
## Interpretation Rules
| Feedback Contains | Meaning | Action Required |
|-------------------|---------|-----------------|
| "I reject this" | Previous `notes` were WRONG | Follow `revision` field instead |
| "I altered the revision" | User updated `revision` | Execute migration per NEW revision |
| "Conduct the migration" | Migration not yet done | Execute migration now |
| "Please conduct accordingly" | Migration pending | Execute migration now |
| "ADDRESSED" or `done: true` | Already processed | No action needed |
## Decision Tree
```
Is feedback field present?
├─ NO → Check `processed.status`
│  ├─ true → Migration complete
│  └─ false → Execute revision
└─ YES → What format?
   ├─ Structured with `done: true` → No action needed
   ├─ Structured with `done: false` → Process feedback, then set done: true
   └─ String format → Parse for keywords:
      ├─ "reject" → Previous notes invalid, follow revision
      ├─ "altered/adjusted revision" → Execute NEW revision
      ├─ "conduct/please" → Migration pending, execute now
      └─ "ADDRESSED" → Already done, no action
```
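The decision tree maps onto a short function over one parsed `slot_fixes.yaml` entry. This is a sketch: `next_action` and the returned action labels are hypothetical, and the keyword precedence (ADDRESSED, then reject, then altered/adjusted, then conduct/please) is an assumption for strings that contain several keywords.

```python
def next_action(entry):
    """Decide what to do with one slot_fixes.yaml entry, following the decision tree."""
    feedback = entry.get("feedback")
    if feedback is None:
        done = entry.get("processed", {}).get("status") is True
        return "NO_ACTION" if done else "EXECUTE_REVISION"
    if isinstance(feedback, list):
        # Structured format: any item not yet marked done still needs processing.
        pending = [item for item in feedback if not item.get("done")]
        return "PROCESS_FEEDBACK" if pending else "NO_ACTION"
    text = feedback.lower()
    if "addressed" in text:
        return "NO_ACTION"
    if "reject" in text:
        return "FOLLOW_REVISION"
    if "altered" in text or "adjusted" in text:
        return "EXECUTE_NEW_REVISION"
    if "conduct" in text or "please" in text:
        return "EXECUTE_REVISION"
    return "NEEDS_REVIEW"  # unrecognized wording: do not guess
```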
## Anti-Patterns
### WRONG: Renaming feedback to revision
```yaml
# DO NOT DO THIS
# feedback contains instructions, not migration specs
revision: # Was: feedback
  - I reject this! Use has_or_had_identifier
```
### WRONG: Ignoring string feedback
```yaml
feedback: Please conduct the migration accordingly.
notes: "NO MIGRATION NEEDED" # WRONG - feedback overrides notes
```
### WRONG: Treating all feedback as completed
```yaml
feedback: I altered the revision. Conduct this new migration.
processed:
  status: true  # WRONG if migration not actually done
```
## Correct Workflow
1. **Read feedback** - Understand user instruction
2. **Check revision** - This defines the target migration
3. **Execute migration** - Create/update slots and classes per revision
4. **Update processed.status** - Set to `true`
5. **Add response** - Document what was done
   - For structured feedback: Set `done: true` and fill `response`
   - For string feedback: Add new structured feedback entry confirming completion
## Example: Processing String Feedback
Before:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
  feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier
  revision:
    - label: has_or_had_identifier
      type: slot
    - label: Identifier
      type: class
  processed:
    status: false
    notes: "Previously marked as no migration needed"
```
After processing:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/type_id
  feedback:
    - timestamp: '2026-01-17T12:00:00Z'
      user: System
      done: true
      comment: "Original string feedback: I reject this! type_id should be migrated to has_or_had_identifier + Identifier"
      response: "Migration completed. type_id.yaml archived, consuming classes updated to use has_or_had_identifier slot with Identifier range."
  revision:
    - label: has_or_had_identifier
      type: slot
    - label: Identifier
      type: class
  processed:
    status: true
    notes: "Migration completed per user feedback rejecting previous notes."
```
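The before/after transformation can be sketched as a pure function on a parsed entry. `close_string_feedback` is a hypothetical helper; it assumes the migration itself has already been carried out, and it leaves the `revision` key untouched.

```python
def close_string_feedback(entry, response, timestamp, user="System"):
    """Return a copy of the entry with string feedback converted to a done, structured item."""
    original = entry["feedback"]
    if not isinstance(original, str):
        raise TypeError("entry does not carry string feedback")
    updated = dict(entry)  # shallow copy; `revision` is carried over unchanged
    updated["feedback"] = [{
        "timestamp": timestamp,
        "user": user,
        "done": True,
        "comment": f"Original string feedback: {original}",
        "response": response,
    }]
    updated["processed"] = {**entry.get("processed", {}), "status": True}
    return updated
```

Keeping the original wording in `comment` preserves the user's instruction for later audits.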
## See Also
- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE
- **Rule 39**: Slot Naming Convention (RiC-O Style)


@ -0,0 +1,373 @@
# Rule 53: Full Slot Migration - No Deprecation Notes
🚨 **CRITICAL**: When migrating slots from `slot_fixes.yaml`:
1. **Follow the `revision` section EXACTLY** - The `slot_fixes.yaml` file specifies the exact replacement slots and classes to use
2. **Perform FULL MIGRATION** - Completely remove the deprecated slot from the entity class
3. **Do NOT add deprecation notes** - Never keep both old and new slots with deprecation markers
---
## 🚨 slot_fixes.yaml is AUTHORITATIVE AND CURATED 🚨
**File Location**: `schemas/20251121/linkml/modules/slots/slot_fixes.yaml`
**THIS FILE IS THE SINGLE SOURCE OF TRUTH FOR ALL SLOT MIGRATIONS.**
The `slot_fixes.yaml` file has been **manually curated** to specify the exact replacement slots and classes for each deprecated slot. The revisions are based on:
1. **Ontology analysis** - Each replacement was chosen based on alignment with base ontologies (CIDOC-CRM, RiC-O, PROV-O, Schema.org, etc.)
2. **Semantic correctness** - Revisions reflect the intended meaning of the original slot
3. **Pattern consistency** - Follows established naming conventions (Rule 39: RiC-O style, Rule 43: singular nouns)
4. **Class hierarchy design** - Type/Types pattern (Rule 0b) applied where appropriate
**YOU MUST NOT**:
- ❌ Substitute different slots than those specified in `revision`
- ❌ Use your own judgment to pick "similar" slots
- ❌ Skip the revision and invent new mappings
- ❌ Partially apply the revision (e.g., use the slot but not the class)
**YOU MUST**:
- ✅ Follow the `revision` section TO THE LETTER
- ✅ Use EXACTLY the slots and classes specified
- ✅ Apply ALL components of the revision (both slots AND classes)
- ✅ Interpret `link_branch` fields correctly (see below)
- ✅ Update `processed.status: true` after completing migration
---
## Understanding `link_branch` in Revision Plans
🚨 **CRITICAL**: The `link_branch` field in revision plans indicates **nested class attributes**. Items with `link_branch: N` are slots/classes that belong TO the primary class, not standalone replacements.
### How to Interpret `link_branch`
| Revision Item | Meaning |
|---------------|---------|
| Items **WITHOUT** `link_branch` | **PRIMARY** slot and class to create |
| Items **WITH** `link_branch: 1` | First attribute branch that the primary class needs |
| Items **WITH** `link_branch: 2` | Second attribute branch that the primary class needs |
| Items **WITH** `link_branch: N` | Nth attribute branch for the primary class |
### Example: `visitor_count` Revision
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_count
  revision:
    - label: has_or_had_quantity  # PRIMARY SLOT (no link_branch)
      type: slot
    - label: Quantity  # PRIMARY CLASS (no link_branch)
      type: class
    - label: has_or_had_measurement_unit  # Quantity needs this slot
      type: slot
      link_branch: 1  # ← Branch 1: unit attribute
    - label: MeasureUnit  # Range of has_or_had_measurement_unit
      type: class
      value:
        - visitors
      link_branch: 1
    - label: temporal_extent  # Quantity needs this slot too
      type: slot
      link_branch: 2  # ← Branch 2: time attribute
    - label: TimeSpan  # Range of temporal_extent
      type: class
      link_branch: 2
```
**Interpretation**: This creates:
1. **Primary**: `has_or_had_quantity` slot → `Quantity` class
2. **Branch 1**: `Quantity.has_or_had_measurement_unit` → `MeasureUnit` (with value "visitors")
3. **Branch 2**: `Quantity.temporal_extent` → `TimeSpan`
### Resulting Class Structure
```yaml
# The Quantity class should have these slots:
Quantity:
  slots:
    - has_or_had_measurement_unit  # From link_branch: 1
    - temporal_extent  # From link_branch: 2
```
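Splitting a revision into its primary items and its branches is a simple grouping step. A minimal sketch, assuming the revision has been parsed into a list of dictionaries; `group_revision` is a hypothetical helper.

```python
def group_revision(revision):
    """Split revision items into primary labels and labels grouped by link_branch."""
    primary = []
    branches = {}
    for item in revision:
        branch = item.get("link_branch")
        if branch is None:
            primary.append(item["label"])       # no link_branch: primary slot or class
        else:
            branches.setdefault(branch, []).append(item["label"])
    return primary, branches


visitor_count_revision = [
    {"label": "has_or_had_quantity", "type": "slot"},
    {"label": "Quantity", "type": "class"},
    {"label": "has_or_had_measurement_unit", "type": "slot", "link_branch": 1},
    {"label": "MeasureUnit", "type": "class", "value": ["visitors"], "link_branch": 1},
    {"label": "temporal_extent", "type": "slot", "link_branch": 2},
    {"label": "TimeSpan", "type": "class", "link_branch": 2},
]
primary, branches = group_revision(visitor_count_revision)
```

For the `visitor_count` revision this yields the primary pair `has_or_had_quantity` and `Quantity`, with one branch per attribute of `Quantity`.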
### Complex Example: `visitor_conversion_rate`
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/visitor_conversion_rate
  revision:
    - label: has_or_had_conversion_rate  # PRIMARY SLOT
      type: slot
    - label: ConversionRate  # PRIMARY CLASS
      type: class
    - label: has_or_had_type  # ConversionRate.has_or_had_type
      type: slot
      link_branch: 1
    - label: ConversionRateType  # Abstract type class
      type: class
      link_branch: 1
    - label: includes_or_included  # ConversionRateType hierarchy slot
      type: slot
      link_branch: 1
    - label: ConversionRateTypes  # Concrete subclasses file
      type: class
      link_branch: 1
    - label: temporal_extent  # ConversionRate.temporal_extent
      type: slot
      link_branch: 2
    - label: TimeSpan  # Range of temporal_extent
      type: class
      link_branch: 2
```
**Interpretation**:
1. **Primary**: `has_or_had_conversion_rate` → `ConversionRate`
2. **Branch 1**: Type hierarchy with `ConversionRateType` (abstract) + `ConversionRateTypes` (concrete subclasses)
3. **Branch 2**: Temporal tracking via `temporal_extent` → `TimeSpan`
### Migration Checklist for `link_branch` Revisions
- [ ] Create/verify PRIMARY slot exists
- [ ] Create/verify PRIMARY class exists
- [ ] For EACH `link_branch: N`:
  - [ ] Add the branch slot to PRIMARY class's `slots:` list
  - [ ] Import the branch slot file
  - [ ] Import the branch class file (if creating new class)
  - [ ] Verify range of branch slot points to branch class
- [ ] Update consuming class to use PRIMARY slot (not deprecated slot)
- [ ] Update examples to show nested structure
---
## Mandatory: Follow slot_fixes.yaml Revisions Exactly
**The `revision` section in `slot_fixes.yaml` is AUTHORITATIVE.** Do not substitute different slots based on your own judgment.
**Example from slot_fixes.yaml**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  revision:
    - label: begin_of_the_begin  # ← USE THIS SLOT
      type: slot
    - label: TimeSpan  # ← USE THIS CLASS
      type: class
```
**CORRECT**: Use `begin_of_the_begin` slot (as specified)
**WRONG**: Substitute `has_actual_start_date` (not in revision)
## The Problem
Adding deprecation notes while keeping both old and new slots:
- Creates schema bloat with redundant properties
- Confuses data consumers about which slot to use
- Violates single-source-of-truth principle
- Complicates future data validation
## Anti-Pattern (WRONG)
```yaml
# WRONG - Keeping deprecated slot with deprecation note
classes:
  TemporaryLocation:
    slots:
      - actual_start  # OLD - kept with deprecation note
      - actual_end  # OLD - kept with deprecation note
      - has_actual_start_date  # NEW
      - has_actual_end_date  # NEW
    slot_usage:
      actual_start:
        deprecated: |
          DEPRECATED: Use has_actual_start_date instead.
        # ... more deprecation documentation
```
## Correct Pattern
```yaml
# CORRECT - Only new slots, old slots completely removed
classes:
  TemporaryLocation:
    slots:
      - has_actual_start_date  # NEW - only new slots present
      - has_actual_end_date  # NEW
    # NO slot_usage for deprecated slots - they don't exist in this class
```
## Migration Steps
When processing a slot from `slot_fixes.yaml`:
1. **Identify affected entity class(es)**
2. **Remove old slot from imports** (if dedicated import file exists)
3. **Remove old slot from slots list**
4. **Remove any slot_usage for old slot**
5. **Add new slot import** (if not already present)
6. **Add new slot to slots list**
7. **Add slot_usage for new slot** (if range override or customization needed)
8. **Update examples** to use new slot
9. **Validate with gen-owl**
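Steps 3 to 6 amount to a small transformation of the class definition. The sketch below works on a parsed class dictionary and covers only the `slots` and `slot_usage` keys; `migrate_class` is a hypothetical helper, and import handling, examples, and validation remain separate steps.

```python
def migrate_class(class_def, old_slot, new_slot):
    """Return a copy of a class definition with old_slot fully removed and new_slot present."""
    migrated = dict(class_def)
    slots = [slot for slot in class_def.get("slots", []) if slot != old_slot]
    if new_slot not in slots:
        slots.append(new_slot)
    migrated["slots"] = slots
    # Drop any slot_usage entry for the old slot; no deprecation note is kept.
    usage = {
        name: spec
        for name, spec in class_def.get("slot_usage", {}).items()
        if name != old_slot
    }
    if usage:
        migrated["slot_usage"] = usage
    else:
        migrated.pop("slot_usage", None)
    return migrated
```

The function never leaves both slots side by side, which is the point of a full migration.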
## What Happens to Old Slot Files
The old slot files in `modules/slots/` (e.g., `actual_start.yaml`, `activities_societies.yaml`) are **NOT deleted** because:
- Other entity classes might still use them
- They serve as documentation of the old schema
- They can be archived when all usages are migrated
However, the old slots are **removed from the entity class** being migrated.
## Example: TemporaryLocation Migration
**Before** (with old slots):
```yaml
imports:
  - ../slots/actual_end
  - ../slots/actual_start
  - ../slots/has_actual_start_date
  - ../slots/has_actual_end_date
slots:
  - actual_end
  - actual_start
  - has_actual_start_date
  - has_actual_end_date
```
**After** (fully migrated):
```yaml
imports:
  # actual_end and actual_start imports REMOVED
  - ../slots/has_actual_start_date
  - ../slots/has_actual_end_date
slots:
  # actual_end and actual_start REMOVED from list
  - has_actual_start_date
  - has_actual_end_date
```
## Slot Usage for New Slots
Only add `slot_usage` for the new slot if you need to:
- Override the range for this specific class
- Add class-specific examples
- Add class-specific constraints
Do NOT add `slot_usage` just to document that it replaces an old slot.
## Recording in slot_fixes.yaml
When marking a slot as processed:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  processed:
    status: true
    timestamp: '2026-01-14T16:00:00Z'
    session: "session-2026-01-14-type-migration"
    notes: "FULLY MIGRATED: TemporaryLocation - actual_start REMOVED, using temporal_extent with TimeSpan.begin_of_the_begin (Rule 53)"
```
Note the "FULLY MIGRATED" prefix in notes to confirm this was a complete removal, not a deprecation-in-place.
---
## ⚠️ Common Mistakes to Avoid ⚠️
### Mistake 1: Substituting Different Slots
**slot_fixes.yaml specifies**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/actual_start
  revision:
    - label: begin_of_the_begin  # ← MUST USE THIS
      type: slot
    - label: TimeSpan  # ← WITH THIS CLASS
      type: class
```
| Action | Status |
|--------|--------|
| Using `begin_of_the_begin` with `TimeSpan` | ✅ CORRECT |
| Using `has_actual_start_date` (invented) | ❌ WRONG |
| Using `start_date` (different slot) | ❌ WRONG |
| Using `begin_of_the_begin` WITHOUT `TimeSpan` | ❌ WRONG (incomplete) |
### Mistake 2: Partial Application
The revision often specifies MULTIPLE components that work together:
```yaml
revision:
  - label: has_or_had_type  # ← Slot for linking
    type: slot
  - label: BackupType  # ← Abstract base class
    type: class
  - label: includes_or_included  # ← Slot for hierarchy
    type: slot
  - label: BackupTypes  # ← Concrete subclasses
    type: class
```
**All four components** are part of the migration. Don't just use `has_or_had_type` and ignore the class structure.
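
A partial application can be detected by comparing the revision list against what the migrated class actually uses. This is a sketch only: `missing_revision_components` is a hypothetical helper, and it assumes the `revision` block has been loaded as a list of dictionaries with `label` and `type` keys, as in the example above.

```python
def missing_revision_components(revision, used_slots, used_classes):
    """Return (type, label) pairs from the revision that were not applied."""
    missing = []
    for component in revision:
        label, kind = component["label"], component["type"]
        # Slots are checked against the slots in use, classes against classes.
        applied = used_slots if kind == "slot" else used_classes
        if label not in applied:
            missing.append((kind, label))
    return missing


revision = [
    {"label": "has_or_had_type", "type": "slot"},
    {"label": "BackupType", "type": "class"},
    {"label": "includes_or_included", "type": "slot"},
    {"label": "BackupTypes", "type": "class"},
]
# Only the linking slot and the base class were applied: two components are missing.
partial = missing_revision_components(revision, {"has_or_had_type"}, {"BackupType"})
```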
### Mistake 3: Not Using the `temporal_extent` Slot for TimeSpan Revisions
When `slot_fixes.yaml` specifies TimeSpan-based revision:
```yaml
revision:
  - label: begin_of_the_begin
    type: slot
  - label: TimeSpan
    type: class
```
This means: **Use the `temporal_extent` slot** (which has `range: TimeSpan`) and access the temporal bounds via TimeSpan's slots:
```yaml
# CORRECT: Use temporal_extent with TimeSpan structure
temporal_extent:
  begin_of_the_begin: '2020-06-15'
  end_of_the_end: '2022-03-15'

# WRONG: Create new has_actual_start_date slot
has_actual_start_date: '2020-06-15'  # ❌ Not in revision!
```
### Mistake 4: Not Updating Examples
When migrating slots, **update ALL examples** in the class file:
- Description examples (in class description)
- slot_usage examples
- Class-level examples (at bottom of file)
---
## Verification Checklist
Before marking a slot as processed:
- [ ] Read the `revision` section completely
- [ ] Identified ALL slots and classes in revision
- [ ] Removed old slot from imports
- [ ] Removed old slot from slots list
- [ ] Removed old slot from slot_usage
- [ ] Added new slot(s) per revision
- [ ] Added new class import(s) per revision
- [ ] Updated ALL examples to use new slots
- [ ] Validated with `linkml-lint` or `gen-owl`
- [ ] Updated `slot_fixes.yaml` with:
  - `status: true`
  - `timestamp` (ISO 8601)
  - `session` identifier
  - `notes` with "FULLY MIGRATED:" prefix
---
## See Also
- Rule 9: Enum-to-Class Promotion (single source of truth principle)
- Rule 0b: Type/Types File Naming Convention
- Rule: Slot Naming Convention (Current Style)
- `.opencode/ENUM_TO_CLASS_PRINCIPLE.md`
- `schemas/20251121/linkml/modules/slots/slot_fixes.yaml` - **AUTHORITATIVE** master list of migrations


@ -0,0 +1,129 @@
# Rule: Generic Slots, Specific Classes
**Identifier**: `generic-slots-specific-classes`
**Severity**: **CRITICAL**
## Core Principle
**Slots MUST be generic predicates** that can be reused across multiple classes. **Classes MUST be specific** to provide context and constraints.
**DO NOT** create class-specific slots when a generic predicate can be used.
## Rationale
1. **Predicate Proliferation**: Creating bespoke slots for every class explodes the schema size (e.g., `has_museum_name`, `has_library_name`, `has_archive_name` instead of `has_name`).
2. **Interoperability**: Generic predicates (`has_name`, `has_identifier`, `has_part`) map cleanly to standard ontologies (Schema.org, Dublin Core, RiC-O).
3. **Querying**: It's easier to query "all entities with a name" than "all entities with museum_name OR library_name OR archive_name".
4. **Maintenance**: Updating one generic slot propagates to all classes.
## Examples
### ❌ Anti-Pattern: Class-Specific Slots
```yaml
# WRONG: Creating specific slots for each class
slots:
  has_museum_visitor_count:
    range: integer
  has_library_patron_count:
    range: integer
classes:
  Museum:
    slots:
      - has_museum_visitor_count
  Library:
    slots:
      - has_library_patron_count
```
### ✅ Correct Pattern: Generic Slot, Specific Class Usage
```yaml
# CORRECT: One generic slot reused
slots:
  has_or_had_quantity:
    slot_uri: rico:hasOrHadQuantity
    range: Quantity
    multivalued: true
classes:
  Museum:
    slots:
      - has_or_had_quantity
    slot_usage:
      has_or_had_quantity:
        description: The number of visitors to the museum.
  Library:
    slots:
      - has_or_had_quantity
    slot_usage:
      has_or_had_quantity:
        description: The number of registered patrons.
```
## Intermediate Class Pattern
Making slots generic often requires introducing **Intermediate Classes** to hold structured data, rather than flattening attributes onto the parent class.
### ❌ Anti-Pattern: Specific Flattened Slots
```yaml
# WRONG: Flattened specific attributes
classes:
  Museum:
    slots:
      - has_museum_budget_amount
      - has_museum_budget_currency
      - has_museum_budget_year
```
### ✅ Correct Pattern: Generic Slot + Intermediate Class
```yaml
# CORRECT: Generic slot pointing to structured class
slots:
  has_or_had_budget:
    range: Budget
    multivalued: true
classes:
  Museum:
    slots:
      - has_or_had_budget
  Budget:
    slots:
      - has_or_had_amount
      - has_or_had_currency
      - has_or_had_year
```
## Specificity Levels
| Level | Component | Example |
|-------|-----------|---------|
| **Generic** | **Slot (Predicate)** | `has_or_had_identifier` |
| **Specific** | **Class (Subject/Object)** | `ISILCode` |
| **Specific** | **Slot Usage (Context)** | "The ISIL code assigned to this library" |
## Migration Guide
If you encounter an overly specific slot:
1. **Identify the generic concept** (e.g., `has_museum_opening_hours` → `has_opening_hours`).
2. **Check if a generic slot exists** in `modules/slots/`.
3. **If yes**, use the generic slot and add `slot_usage` to the class.
4. **If no**, create the **generic** slot, not a specific one.
## Naming Indicators
**Reject slots containing:**
* Class names (e.g., `has_custodian_name` → `has_name`)
* Narrow types (e.g., `has_isbn_identifier` → `has_identifier`)
* Contextual specifics (e.g., `has_primary_email` → `has_email` + type/role)
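
The first indicator (class names embedded in slot names) lends itself to an automated check. The sketch below is illustrative only: `class_specific_slots` and `snake_case` are hypothetical helpers, and the conversion from class names to snake_case tokens is a simple heuristic that will not handle acronym-style names such as `ISILCode` well.

```python
import re


def snake_case(class_name):
    """Convert a CamelCase class name to snake_case, e.g. AcademicArchive -> academic_archive."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def class_specific_slots(slot_names, class_names):
    """Map each slot whose name embeds a class name to that class."""
    flagged = {}
    for slot in slot_names:
        parts = slot.split("_")
        for cls in class_names:
            token = snake_case(cls).split("_")
            n = len(token)
            # Compare whole underscore-separated words to avoid substring false positives.
            if any(parts[i:i + n] == token for i in range(len(parts) - n + 1)):
                flagged[slot] = cls
    return flagged
```

Narrow types and contextual specifics (the second and third indicators) need human judgment and are not covered by this check.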
## See Also
* Rule 55: Broaden Generic Predicate Ranges
* Rule: Slot Naming Convention (Current Style)


@ -0,0 +1,157 @@
# Rule 59: LinkML Union Types Require `range: Any`
🚨 **CRITICAL**: When using `any_of` for union types in LinkML, you MUST also specify `range: Any` at the attribute level. Without it, the union type validation does NOT work.
## The Problem
LinkML's `any_of` construct allows defining slots that accept multiple types (e.g., string OR integer). However, there's a critical implementation detail:
**Without `range: Any`, the `any_of` constraint is silently ignored during validation.**
This leads to validation failures where data that should be valid (e.g., integer value in a string/integer union field) is rejected.
## Correct Pattern
```yaml
slots:
  identifier_value:
    range: Any  # ← REQUIRED for any_of to work
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)
```
## Incorrect Pattern (WILL FAIL)
```yaml
slots:
  identifier_value:
    # Missing range: Any - validation will fail!
    any_of:
      - range: string
      - range: integer
    description: The identifier value (can be string or integer)
```
## Common Use Cases
This pattern is required for:
| Use Case | Types | Example Fields |
|----------|-------|----------------|
| Identifier values | string \| integer | `identifier_value`, `geonames_id`, `viaf_id` |
| Social media IDs | string \| array | `youtube_channel_id`, `facebook_id`, `twitter_username` |
| Flexible identifiers | object \| array | `identifiers` (dict or list format) |
| Numeric strings | string \| integer | `postal_code`, `kvk_number` |
## Real-World Examples from GLAM Schema
### Example 1: OriginalEntryIdentifier.yaml
```yaml
# Before (BROKEN):
attributes:
  identifier_value:
    any_of:
      - range: string
      - range: integer

# After (WORKING):
attributes:
  identifier_value:
    range: Any  # Added
    any_of:
      - range: string
      - range: integer
```
### Example 2: WikidataSocialMedia.yaml
```yaml
# Social media fields that can be single value or array
attributes:
  youtube_channel_id:
    range: Any  # Required for string|array union
    any_of:
      - range: string
      - range: string
        multivalued: true
    description: YouTube channel ID (single value or array)
  facebook_id:
    range: Any
    any_of:
      - range: string
      - range: string
        multivalued: true
```
### Example 3: OriginalEntry.yaml (object|array union)
```yaml
# identifiers field that accepts both dict and array formats
attributes:
  identifiers:
    range: Any  # Required for flexible typing
    description: >-
      Identifiers from original source. Accepts both dict format
      (e.g., {isil: "XX-123"}) and array format
      (e.g., [{scheme: "isil", value: "XX-123"}])
```
### Example 4: OriginalEntryLocation.yaml
```yaml
attributes:
  geonames_id:
    range: Any  # Required for string|integer
    any_of:
      - range: string
      - range: integer
    description: GeoNames ID (may be string or integer depending on source)
```
## Validation Behavior
| Schema Definition | Integer Data | String Data | Result |
|-------------------|--------------|-------------|--------|
| `range: string` | ❌ FAIL | ✅ PASS | Strict string only |
| `range: integer` | ✅ PASS | ❌ FAIL | Strict integer only |
| `any_of` without `range: Any` | ❌ FAIL | ❌ FAIL | Broken - nothing works |
| `any_of` with `range: Any` | ✅ PASS | ✅ PASS | Correct union behavior |
## Why This Happens
LinkML's validation engine processes `range` first to determine the basic type constraint. When `range` is not specified (or defaults to `string`), it applies that constraint before checking `any_of`. The `range: Any` tells the validator to defer type checking to the `any_of` constraints.
## Checklist for Union Types
When adding a field that accepts multiple types:
- [ ] Define the `any_of` block with all acceptable ranges
- [ ] Add `range: Any` at the same level as `any_of`
- [ ] Test with sample data of each type
- [ ] Document the accepted types in the description
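
The second checklist item is easy to forget, so it may be worth linting for. The following is a minimal sketch, assuming the `slots:` or `attributes:` mapping of a schema file has been loaded into a dictionary; `any_of_without_any_range` is a hypothetical name, not an existing LinkML API.

```python
def any_of_without_any_range(definitions):
    """Return names of slots/attributes that use any_of ranges without range: Any."""
    offenders = []
    for name, defn in definitions.items():
        if not isinstance(defn, dict):
            continue
        branches = defn.get("any_of") or []
        # Only any_of blocks that constrain the range form a union type.
        is_range_union = any(isinstance(b, dict) and "range" in b for b in branches)
        if is_range_union and defn.get("range") != "Any":
            offenders.append(name)
    return offenders


attributes = {
    "identifier_value": {"any_of": [{"range": "string"}, {"range": "integer"}]},
    "geonames_id": {"range": "Any", "any_of": [{"range": "string"}, {"range": "integer"}]},
}
broken = any_of_without_any_range(attributes)
```

Here only `identifier_value` is reported, because `geonames_id` already declares `range: Any`.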
## See Also
- LinkML Documentation: [Union Types](https://linkml.io/linkml/schemas/advanced.html#union-types)
- GLAM Validation: `schemas/20251121/linkml/modules/classes/CustodianSourceFile.yaml`
- Validation command: `linkml-validate -s <schema>.yaml <data>.yaml`
## Migration Notes
**Affected Files (Fixed January 2026)**:
- `OriginalEntryIdentifier.yaml` - `identifier_value`
- `Identifier.yaml` - `identifier_value` slot_usage
- `WikidataSocialMedia.yaml` - `youtube_channel_id`, `facebook_id`, `instagram_username`, `linkedin_company_id`, `twitter_username`, `facebook_page_id`
- `YoutubeEnrichment.yaml` - `channel_id`
- `OriginalEntryLocation.yaml` - `geonames_id`
- `OriginalEntry.yaml` - `identifiers`
---
**Version**: 1.0
**Created**: 2026-01-18
**Author**: AI Agent (OpenCode Claude)


@ -0,0 +1,181 @@
# LinkML YAML Best Practices Rule
## Rule: Follow LinkML Conventions for Valid, Interoperable Schema Files
### 1. equals_expression Anti-Pattern
`equals_expression` is for dynamic formula evaluation (e.g., `"{age_in_years} * 12"`). Never use it for static value constraints.
**WRONG:**
```yaml
slot_usage:
  has_type:
    equals_expression: '["hc:ArchiveOrganizationType"]'
  hold_record_set:
    equals_expression: '["hc:Fonds", "hc:Series"]'
```
**CORRECT** (single value):
```yaml
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
**CORRECT** (multiple allowed values - if classes):
```yaml
slot_usage:
  hold_record_set:
    any_of:
      - range: UniversityAdministrativeFonds
      - range: StudentRecordSeries
      - range: FacultyPaperCollection
```
**CORRECT** (multiple allowed values - if literals):
```yaml
slot_usage:
  status:
    equals_string_in:
      - "active"
      - "inactive"
      - "pending"
```
### 2. Declare All Used Prefixes
Every CURIE prefix used in the file must be declared in the `prefixes:` block.
**WRONG:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  skos: http://www.w3.org/2004/02/skos/core#
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"  # hc: not declared!
```
**CORRECT:**
```yaml
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
slot_usage:
  has_type:
    equals_string: "hc:ArchiveOrganizationType"
```
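
Undeclared prefixes can be found with a simple scan of the parsed file. This is a heuristic sketch, not a LinkML feature: `undeclared_prefixes` is a hypothetical helper, it only inspects string values (not dictionary keys), and it treats any whitespace-free `prefix:local` string that is not a URL as a CURIE.

```python
import re

# "prefix:local" with no whitespace; the negative lookahead excludes URLs such as https://...
CURIE = re.compile(r"^([A-Za-z][\w.-]*):(?!//)\S+$")


def undeclared_prefixes(module):
    """Return the sorted CURIE prefixes used in values but missing from `prefixes:`."""
    declared = set(module.get("prefixes") or {})
    used = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key != "prefixes":  # prefix expansions are URLs, not CURIEs
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            match = CURIE.match(node)
            if match:
                used.add(match.group(1))

    walk(module)
    return sorted(used - declared)
```

Because the pattern is deliberately loose, any hit should be reviewed by hand rather than fixed automatically.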
### 3. Import Referenced Classes
When using external classes in `is_a`, `range`, or other references, import them.
**WRONG:**
```yaml
imports:
  - linkml:types
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType  # Not imported!
    slot_usage:
      related_to:
        range: WikidataAlignment  # Not imported!
```
**CORRECT:**
```yaml
imports:
  - linkml:types
  - ../classes/ArchiveOrganizationType
  - ../classes/WikidataAlignment
classes:
  AcademicArchive:
    is_a: ArchiveOrganizationType
    slot_usage:
      related_to:
        range: WikidataAlignment
```
### 4. Quote Regex Patterns and Annotation Values
**Regex patterns:**
```yaml
# WRONG
pattern: ^Q[0-9]+$
# CORRECT
pattern: "^Q[0-9]+$"
```
**Annotation values (must be strings):**
```yaml
# WRONG
annotations:
  specificity_score: 0.1

# CORRECT
annotations:
  specificity_score: "0.1"
```
### 5. Remove Unused Imports
Only import slots and classes that are actually used in the file.
**WRONG:**
```yaml
imports:
  - ../slots/has_scope  # Never used in slots: or slot_usage:
  - ../slots/has_score
  - ../slots/has_type
```
**CORRECT:**
```yaml
imports:
  - ../slots/has_score
  - ../slots/has_type
```
### 6. Slot Usage Requires Slot Presence
A slot referenced in `slot_usage:` must either be:
- Listed in the `slots:` array, OR
- Inherited from a parent class via `is_a`
**WRONG:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
    slot_usage:
      has_type: {...}
      identified_by: {...}  # Not in slots: and not inherited!
```
**CORRECT:**
```yaml
classes:
  MyClass:
    slots:
      - has_type
      - identified_by
    slot_usage:
      has_type: {...}
      identified_by: {...}
```
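
The slot-presence requirement can be checked by walking the `is_a` chain. The sketch below is a simplification under stated assumptions: `orphan_slot_usage` is a hypothetical helper, all classes are assumed to be available in one dictionary, and mixins are ignored.

```python
def orphan_slot_usage(classes, class_name):
    """Return slot_usage keys that are neither listed in slots: nor inherited via is_a."""
    available = set()
    seen = set()
    current = class_name
    # Collect slots from the class and each ancestor; `seen` guards against is_a cycles.
    while current and current not in seen:
        seen.add(current)
        cls = classes.get(current) or {}
        available.update(cls.get("slots") or [])
        available.update(cls.get("attributes") or {})
        current = cls.get("is_a")
    usage = (classes.get(class_name) or {}).get("slot_usage") or {}
    return sorted(set(usage) - available)
```

An empty result means every `slot_usage` entry refers to a slot the class actually has.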
## Checklist for Class Files
- [ ] All prefixes used in CURIEs are declared
- [ ] `default_prefix` set if module belongs to that namespace
- [ ] All referenced classes are imported
- [ ] All used slots are imported
- [ ] No `equals_expression` with static JSON arrays
- [ ] Regex patterns are quoted
- [ ] Annotation values are quoted strings
- [ ] No unused imports
- [ ] `slot_usage` only references slots that exist (via slots: or inheritance)


@ -0,0 +1,185 @@
# Mapping Specificity Rule: Broad vs Narrow vs Exact Mappings
## 🚨 CRITICAL: Mapping Semantics
When mapping LinkML classes to external ontologies, you MUST distinguish between **equivalence**, **hypernyms** (broader concepts), and **hyponyms** (narrower concepts).
### The Rule
1. **Exact Mappings (`skos:exactMatch`)**: Use ONLY when the external concept is **semantically equivalent** to your class.
    * *Example*: `hc:Person` `exact_mappings` `schema:Person`.
    * **CRITICAL**: Exact means the SAME semantic scope - neither broader nor narrower!
    * **DO NOT AVOID EXACT BY DEFAULT**: If equivalence is verified (including class/property category match and ontology definition review), `exact_mappings` SHOULD be used.
2. **Broad Mappings (`skos:broadMatch`)**: Use when the external concept is a **hypernym** (a broader, more general category) of your class.
    * *Example*: `hc:AcademicArchiveRecordSetType` `broad_mappings` `rico:RecordSetType`.
    * *Rationale*: An academic archive record set *is a* record set type, but `rico:RecordSetType` is broader.
    * *Common Hypernyms*: `skos:Concept`, `prov:Entity`, `prov:Activity`, `schema:Thing`, `schema:Organization`, `schema:Action`, `rico:RecordSetType`, `crm:E55_Type`.
3. **Narrow Mappings (`skos:narrowMatch`)**: Use when the external concept is a **hyponym** (a narrower, more specific category) of your class.
    * *Example*: `hc:Organization` `narrow_mappings` `hc:Library` (if mapping inversely).
4. **Close Mappings (`skos:closeMatch`)**: Use when the external concept is similar but not exactly equivalent.
    * *Example*: `hc:AccessPolicy` `close_mappings` `dcterms:accessRights` (related but different scope).
5. **Related Mappings (`skos:relatedMatch`)**: Use for non-hierarchical relationships.
    * *Example*: `hc:Collection` `related_mappings` `rico:RecordSet`.
### 🚨 Type Compatibility Rule
**Classes map to classes, properties map to properties.** Never mix types in mappings.
| Your Element | Valid Mapping Target |
|--------------|---------------------|
| Class | Class (owl:Class, rdfs:Class) |
| Slot | Property (owl:ObjectProperty, owl:DatatypeProperty, rdf:Property) |
**WRONG**:
```yaml
# AccessApplication is a CLASS, schema:Action is a CLASS - but Action is BROADER
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym, not equivalent
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### 🚨 No Self/Internal Exact Mappings
`exact_mappings` MUST NOT contain self-references or internal HC class references for the same concept.
**WRONG**:
```yaml
AcademicArchive:
  exact_mappings:
    - hc:AcademicArchive  # Self/internal reference; not an external equivalence mapping
```
**CORRECT**:
```yaml
AcademicArchive:
  exact_mappings:
    - wd:Q27032435  # External concept with equivalent semantic scope
```
Use `exact_mappings` only for equivalent terms in external ontologies or external controlled vocabularies, not for repeating the class itself.
### ✅ Positive Guidance: When Exact Mapping Is Correct
Use `exact_mappings` when all checks below pass:
- Semantic scope is equivalent (not parent/child, not merely similar)
- Ontological category matches (Class↔Class, Slot↔Property)
- Target term is verified in the ontology source files under `data/ontology/` or verified Wikidata entity metadata
- No self/internal duplication (no `hc:` self-reference for the same concept)
**CORRECT**:
```yaml
Person:
  exact_mappings:
    - schema:Person
Acquisition:
  exact_mappings:
    - crm:E8_Acquisition
```
Do not downgrade a truly equivalent mapping to `close_mappings` or `broad_mappings` just to be conservative.
### Common Hypernyms That Are NEVER Exact Mappings
These terms are always BROADER than your specific class - never use them as `exact_mappings`:
| Hypernym | What It Means | Use Instead |
|----------|---------------|-------------|
| `schema:Action` | Any action | `broad_mappings` |
| `schema:Organization` | Any organization | `broad_mappings` |
| `schema:Thing` | Anything at all | `broad_mappings` |
| `schema:PropertyValue` | Any property value | `broad_mappings` |
| `schema:Permit` | Any permit | `broad_mappings` |
| `prov:Activity` | Any activity | `broad_mappings` |
| `prov:Entity` | Any entity | `broad_mappings` |
| `skos:Concept` | Any concept | `broad_mappings` |
| `crm:E55_Type` | Any type classification | `broad_mappings` |
| `crm:E42_Identifier` | Any identifier | `broad_mappings` |
| `rico:Identifier` | Any identifier | `broad_mappings` |
| `dcat:DataService` | Any data service | `broad_mappings` |
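
The table above can be turned into a quick check over parsed class definitions. This is a sketch under assumptions: `hypernyms_used_as_exact` is a hypothetical helper, and `NEVER_EXACT` simply restates the twelve terms listed in the table.

```python
# The hypernyms from the table above; none of them may appear in exact_mappings.
NEVER_EXACT = frozenset({
    "schema:Action", "schema:Organization", "schema:Thing",
    "schema:PropertyValue", "schema:Permit", "prov:Activity",
    "prov:Entity", "skos:Concept", "crm:E55_Type",
    "crm:E42_Identifier", "rico:Identifier", "dcat:DataService",
})


def hypernyms_used_as_exact(classes):
    """Map each class name to the known hypernyms found in its exact_mappings."""
    findings = {}
    for name, defn in classes.items():
        if not isinstance(defn, dict):
            continue
        hits = sorted(set(defn.get("exact_mappings") or []) & NEVER_EXACT)
        if hits:
            findings[name] = hits
    return findings
```

Each reported term is a candidate for moving to `broad_mappings`.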
### Common Violations to Avoid
**WRONG**:
```yaml
AcademicArchiveRecordSetType:
  exact_mappings:
    - rico:RecordSetType  # WRONG: This implies AcademicArchiveRecordSetType == RecordSetType
```
**CORRECT**:
```yaml
AcademicArchiveRecordSetType:
  broad_mappings:
    - rico:RecordSetType  # CORRECT: RecordSetType is broader
```
**WRONG**:
```yaml
SocialMovement:
  exact_mappings:
    - schema:Organization  # WRONG: SocialMovement is a specific TYPE of Organization
```
**CORRECT**:
```yaml
SocialMovement:
  broad_mappings:
    - schema:Organization  # CORRECT
```
**WRONG**:
```yaml
AccessApplication:
  exact_mappings:
    - schema:Action  # WRONG: Action is a hypernym
```
**CORRECT**:
```yaml
AccessApplication:
  broad_mappings:
    - schema:Action  # CORRECT: Action is the broader category
```
### How to Determine Mapping Type
Ask these questions:
1. **Is it the SAME thing?** → `exact_mappings`
    - "Could I swap these two terms in any context without changing meaning?"
    - If NO, it's not an exact mapping
2. **Is the external term a PARENT category?** → `broad_mappings`
    - "Is my class a TYPE OF the external term?"
    - Example: AccessApplication IS-A Action
3. **Is the external term a CHILD category?** → `narrow_mappings`
    - "Is the external term a TYPE OF my class?"
    - Example: Library IS-A Organization (so Organization has narrow_mapping to Library)
4. **Is it similar but not hierarchical?** → `close_mappings`
    - "Related but not equivalent or hierarchical"
5. **Is there some other relationship?** → `related_mappings`
    - "Connected in some way"
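
The five questions are meant to be asked in order, and the first "yes" decides the mapping type. A minimal sketch of that ordering follows; `choose_mapping_property` and its parameter names are hypothetical, and the answers themselves still have to come from reading the ontology definitions.

```python
def choose_mapping_property(same_thing=False, external_is_parent=False,
                            external_is_child=False, similar=False,
                            otherwise_related=False):
    """Return the mapping property for the first question answered with yes."""
    ordered_answers = [
        (same_thing, "exact_mappings"),            # question 1
        (external_is_parent, "broad_mappings"),    # question 2
        (external_is_child, "narrow_mappings"),    # question 3
        (similar, "close_mappings"),               # question 4
        (otherwise_related, "related_mappings"),   # question 5
    ]
    for answer, mapping_property in ordered_answers:
        if answer:
            return mapping_property
    return None  # no relationship established: add no mapping


# AccessApplication IS-A Action, so schema:Action belongs in broad_mappings.
access_application = choose_mapping_property(external_is_parent=True)
```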
### Verification Checklist
- [ ] Does the `exact_mapping` represent the **exact same scope**?
- [ ] Is the external term a generic parent class (e.g., `Type`, `Concept`, `Entity`, `Action`, `Activity`, `Organization`)? → Move to `broad_mappings`
- [ ] Is the external term a specific instance or subclass? → Check `narrow_mappings`
- [ ] Is the external term the same type (class→class, property→property)?
- [ ] Would swapping the terms change the meaning? If yes, not an `exact_mapping`


@ -0,0 +1,177 @@
# Rule: Multilingual Support Requirements
## Overview
All LinkML slot files MUST include multilingual support with translations in the following languages:
| Code | Language | Required |
|------|----------|----------|
| `nl` | Dutch | ✅ Yes |
| `de` | German | ✅ Yes |
| `fr` | French | ✅ Yes |
| `ar` | Arabic | ✅ Yes |
| `id` | Indonesian | ✅ Yes |
| `zh` | Chinese (Simplified) | ✅ Yes |
| `es` | Spanish | ✅ Yes |
---
## Required Multilingual Fields
### 1. `alt_descriptions`
Provide faithful translations of the English `description` field:
```yaml
slots:
  my_slot:
    description: >-
      To possess a specific structural arrangement or encoding standard.
    alt_descriptions:
      nl: >-
        Het bezitten van een specifieke structurele rangschikking of coderingsstandaard.
      de: >-
        Das Besitzen einer spezifischen strukturellen Anordnung oder eines Kodierungsstandards.
      fr: >-
        Posséder un arrangement structurel spécifique ou une norme de codage.
      ar: >-
        امتلاك ترتيب هيكلي محدد أو معيار ترميز.
      id: >-
        Memiliki susunan struktural tertentu atau standar pengkodean.
      zh: >-
        拥有特定的结构安排或编码标准。
      es: >-
        Poseer una disposición estructural específica o un estándar de codificación.
```
### 2. `structured_aliases`
Provide translated slot names/labels for each language:
```yaml
slots:
  has_format:
    structured_aliases:
      - literal_form: heeft formaat
        predicate: EXACT_SYNONYM
        in_language: nl
      - literal_form: hat Format
        predicate: EXACT_SYNONYM
        in_language: de
      - literal_form: a un format
        predicate: EXACT_SYNONYM
        in_language: fr
      - literal_form: لديه تنسيق
        predicate: EXACT_SYNONYM
        in_language: ar
      - literal_form: memiliki format
        predicate: EXACT_SYNONYM
        in_language: id
      - literal_form: 具有格式
        predicate: EXACT_SYNONYM
        in_language: zh
      - literal_form: tiene formato
        predicate: EXACT_SYNONYM
        in_language: es
```
---
## Translation Guidelines
### DO:
- Translate the semantic meaning faithfully
- Preserve technical precision
- Use natural phrasing for each language
- Keep translations concise (similar length to English)
### DON'T:
- Paraphrase or expand beyond the original meaning
- Add information not present in the English description
- Use machine translation without review
- Skip any of the required languages
---
## Complete Example
```yaml
id: https://nde.nl/ontology/hc/slot/catalogue
name: catalogue
title: catalogue
slots:
  catalogue:
    slot_uri: crm:P70_documents
    description: >-
      To systematically record, classify, and organize items within a structured
      inventory or database for the purposes of documentation and retrieval.
    alt_descriptions:
      nl: >-
        Het systematisch vastleggen, classificeren en ordenen van items binnen een
        gestructureerde inventaris of database voor documentatie en terugvinding.
      de: >-
        Das systematische Erfassen, Klassifizieren und Ordnen von Objekten in einem
        strukturierten Inventar oder einer Datenbank für Dokumentation und Abruf.
      fr: >-
        Enregistrer, classer et organiser systématiquement des éléments dans un
        inventaire structuré ou une base de données à des fins de documentation et de récupération.
      ar: >-
        تسجيل وتصنيف وتنظيم العناصر بشكل منهجي ضمن جرد منظم أو قاعدة بيانات لأغراض التوثيق والاسترجاع.
      id: >-
        Mencatat, mengklasifikasikan, dan mengatur item secara sistematis dalam
        inventaris terstruktur atau database untuk tujuan dokumentasi dan pengambilan.
      zh: >-
        在结构化清单或数据库中系统地记录、分类和组织项目,以便于文档编制和检索。
      es: >-
        Registrar, clasificar y organizar sistemáticamente elementos dentro de un
        inventario estructurado o base de datos con fines de documentación y recuperación.
    structured_aliases:
      - literal_form: catalogiseren
        predicate: EXACT_SYNONYM
        in_language: nl
      - literal_form: katalogisieren
        predicate: EXACT_SYNONYM
        in_language: de
      - literal_form: cataloguer
        predicate: EXACT_SYNONYM
        in_language: fr
      - literal_form: فهرسة
        predicate: EXACT_SYNONYM
        in_language: ar
      - literal_form: mengkatalogkan
        predicate: EXACT_SYNONYM
        in_language: id
      - literal_form: 编目
        predicate: EXACT_SYNONYM
        in_language: zh
      - literal_form: catalogar
        predicate: EXACT_SYNONYM
        in_language: es
```
---
## Validation Checklist
Before completing a slot file, verify:
- [ ] `alt_descriptions` provided for all 7 languages (nl, de, fr, ar, id, zh, es)
- [ ] `structured_aliases` provided for all 7 languages
- [ ] Translations are faithful to the English original
- [ ] No language is skipped or left empty
- [ ] Arabic and Chinese characters render correctly
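
The first, second, and fourth checklist items can be verified automatically; translation fidelity and rendering still need human review. The sketch below assumes a slot definition parsed into a dictionary in the compact form shown in the examples above. `missing_translations` is a hypothetical helper name.

```python
REQUIRED_LANGUAGES = ("nl", "de", "fr", "ar", "id", "zh", "es")


def missing_translations(slot_def):
    """Return, per multilingual field, the required languages that are absent or empty."""
    alt = slot_def.get("alt_descriptions") or {}
    described = {lang for lang, text in alt.items() if text and str(text).strip()}
    aliased = {
        alias.get("in_language")
        for alias in slot_def.get("structured_aliases") or []
        if isinstance(alias, dict) and alias.get("literal_form")
    }
    return {
        "alt_descriptions": [l for l in REQUIRED_LANGUAGES if l not in described],
        "structured_aliases": [l for l in REQUIRED_LANGUAGES if l not in aliased],
    }
```

A slot file passes when both lists in the result are empty.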
---
## See Also
- Rule 1: Preserve Original Descriptions (LINKML_EDITING_RULES.md)
- Rule 2: Translation Accuracy (LINKML_EDITING_RULES.md)
- Rule 3: Description Field Purity (LINKML_EDITING_RULES.md)
---
**Version**: 1.0.0
**Created**: 2026-02-03
**Author**: OpenCODE


@ -0,0 +1,24 @@
# Rule: No Autonomous Alias Assignment
**Status**: ACTIVE
**Created**: 2026-02-10
## Rule
The agent MUST NOT assign aliases to canonical slot files on its own. Only the user decides which `new/` slot files are absorbed as aliases into which canonical slots.
## Rationale
Alias assignment is a semantic decision that determines the conceptual scope of a canonical slot. Incorrect alias assignment conflates distinct concepts. For example, `membership_criteria` (eligibility rules for joining) is not an alias of `has_mission` (organizational purpose), even though both relate to organizational governance.
## What the agent MUST do
1. When creating or polishing a canonical slot file, leave the `aliases` field empty unless the user has explicitly specified which aliases to include.
2. When processing `new/` files, present candidates to the user and wait for their alias assignment decisions.
3. Do NOT delete `new/` files until the user confirms the alias mapping.
## What the agent MUST NOT do
- Autonomously decide that a `new/` file should become an alias of a canonical slot.
- Add alias entries without explicit user instruction.
- Delete `new/` files based on self-determined alias assignments.


@ -0,0 +1,46 @@
# Rule: Do Not Delete From slot_fixes.yaml
**Identifier**: `no-deletion-from-slot-fixes`
**Severity**: **CRITICAL**
## Core Directive
**NEVER delete entries from `slot_fixes.yaml`.**
The `slot_fixes.yaml` file serves as the historical record and audit trail for all schema migrations. Removing entries destroys this history and violates the project's data integrity principles.
## Workflow
When processing a migration:
1. **Do NOT Remove**: Never delete the entry for the slot you are working on.
2. **Update `processed`**: Instead, update the `processed` block:
    * Set `status: true`.
    * Set `date` to the current date (YYYY-MM-DD).
    * Add a detailed `notes` string explaining what was done (e.g., "Fully migrated to [new_slot] + [Class] (Rule 53). [File].yaml updated. Slot archived.").
3. **Preserve History**: The entry must remain in the file permanently as a record of the migration.
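
The workflow amounts to an in-place update that never changes the number of entries. A minimal sketch, assuming the file has been loaded as a list of entry dictionaries; `mark_processed` is a hypothetical helper and not part of the project's tooling.

```python
from datetime import date


def mark_processed(entries, slot_id, notes, today=None):
    """Update the `processed` block of one entry in place; never remove entries."""
    for entry in entries:
        if entry.get("original_slot_id") == slot_id:
            processed = entry.get("processed") or {}
            processed["status"] = True
            processed["date"] = (today or date.today()).isoformat()  # YYYY-MM-DD
            processed["notes"] = notes
            entry["processed"] = processed
            return entry
    # An unknown slot is an error; silently adding or dropping entries would corrupt the audit trail.
    raise KeyError(slot_id)
```

Other keys of the entry, such as `revision`, are left untouched so the history stays intact.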
## Rationale
* **Audit Trail**: We need to know what was migrated, when, and how.
* **Reversibility**: If a migration introduces a bug, the record helps us understand the original state.
* **Completeness**: The file tracks the total progress of the schema refactoring project.
## Example
**WRONG (Deletion)**:
```yaml
# DELETED from file
# - original_slot_id: ...
```
**CORRECT (Update)**:
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/has_some_slot
  processed:
    status: true
    date: '2026-01-27'
    notes: Fully migrated to has_or_had_new_slot + NewClass (Rule 53).
  revision:
    ...
```


@ -0,0 +1,189 @@
# Rule 52: No Duplicate Ontology Mappings
## Summary
Each ontology URI MUST appear in only ONE mapping category per schema element. A URI cannot simultaneously have multiple semantic relationships to the same class or slot.
## The Problem
LinkML provides five mapping annotation types based on SKOS vocabulary alignment:
| Property | SKOS Predicate | Meaning |
|----------|---------------|---------|
| `exact_mappings` | `skos:exactMatch` | "This IS that" (equivalent) |
| `close_mappings` | `skos:closeMatch` | "This is very similar to that" |
| `related_mappings` | `skos:relatedMatch` | "This is conceptually related to that" |
| `narrow_mappings` | `skos:narrowMatch` | "That is MORE SPECIFIC than this" (the target is narrower) |
| `broad_mappings` | `skos:broadMatch` | "That is MORE GENERAL than this" (the target is broader) |
These relationships are **mutually exclusive**. A URI cannot simultaneously:
- BE the element (`exact_mappings`) AND be broader than it (`broad_mappings`)
- Be closely similar (`close_mappings`) AND be more general (`broad_mappings`)
## Anti-Pattern (WRONG)
```yaml
# WRONG - schema:url appears in TWO mapping types
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # Says "source_url IS schema:url"
    broad_mappings:
      - schema:url  # Says "schema:url is MORE GENERAL than source_url"
```
This is a **logical contradiction**: `source_url` cannot simultaneously BE `schema:url` AND be more specific than `schema:url`.
## Correct Pattern
```yaml
# CORRECT - each URI appears in only ONE mapping type
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # source_url IS schema:url
    close_mappings:
      - dcterms:source  # Similar but not identical
```
## Decision Guide: Which Mapping to Keep
When a URI appears in multiple categories, keep the **most precise** one:
### Precedence Order (keep the first match)
1. **exact_mappings** - Strongest claim: semantic equivalence
2. **close_mappings** - Strong claim: nearly equivalent
3. **narrow_mappings** / **broad_mappings** - Hierarchical relationship
4. **related_mappings** - Weakest claim: conceptual association
### Decision Matrix
| If URI appears in... | Keep | Remove |
|---------------------|------|--------|
| exact + broad | exact | broad |
| exact + close | exact | close |
| exact + related | exact | related |
| close + broad | close | broad |
| close + related | close | related |
| related + broad | related | broad |
| narrow + broad | narrow | broad (contradictory!) |
### Special Case: narrow + broad
If a URI appears in BOTH `narrow_mappings` AND `broad_mappings`, this is a **data error** - the same URI cannot be both more specific AND more general. Investigate which is correct based on the ontology definition.
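
The decision matrix can be encoded as a lookup table. The sketch below restates the rows of the matrix as written, except that the narrow + broad pair deliberately returns `None` so that it is investigated by hand, as the special case requires. `mapping_to_keep` is a hypothetical helper name.

```python
EXACT, CLOSE, RELATED, NARROW, BROAD = (
    "exact_mappings", "close_mappings", "related_mappings",
    "narrow_mappings", "broad_mappings",
)

# One entry per automatically resolvable row of the decision matrix.
KEEP = {
    frozenset({EXACT, BROAD}): EXACT,
    frozenset({EXACT, CLOSE}): EXACT,
    frozenset({EXACT, RELATED}): EXACT,
    frozenset({CLOSE, BROAD}): CLOSE,
    frozenset({CLOSE, RELATED}): CLOSE,
    frozenset({RELATED, BROAD}): RELATED,
}


def mapping_to_keep(types):
    """Return the mapping type to keep, or None when a human must decide."""
    # narrow + broad, and any combination not in the matrix, fall through to None.
    return KEEP.get(frozenset(types))
```

The argument order does not matter, because each pair is compared as a set.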
## Real Examples Fixed
### Example 1: source_url
```yaml
# BEFORE (wrong)
slots:
  source_url:
    exact_mappings:
      - schema:url
    broad_mappings:
      - schema:url  # Duplicate!

# AFTER (correct)
slots:
  source_url:
    exact_mappings:
      - schema:url  # Keep exact (strongest)
    # broad_mappings removed
```
### Example 2: Custodian class
```yaml
# BEFORE (wrong)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation
    narrow_mappings:
      - cpov:PublicOrganisation  # Duplicate!

# AFTER (correct)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation  # Keep close (Custodian ≈ PublicOrganisation)
    # narrow_mappings: use for URIs that are MORE SPECIFIC than Custodian
```
### Example 3: geonames_id (narrow + broad conflict)
```yaml
# BEFORE (wrong - logical contradiction!)
slots:
  geonames_id:
    narrow_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE SPECIFIC than geonames_id
    broad_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE GENERAL than geonames_id

# AFTER (correct)
slots:
  geonames_id:
    broad_mappings:
      - dcterms:identifier  # dcterms:identifier is the broader term; geonames_id is a specific type of identifier
    # narrow_mappings removed (was contradictory)
```
## Detection Script
Run this to find duplicate mappings in the schema:
```python
import yaml
from pathlib import Path
from collections import defaultdict

mapping_types = ['exact_mappings', 'close_mappings', 'related_mappings',
                 'narrow_mappings', 'broad_mappings']

dirs = [
    Path('schemas/20251121/linkml/modules/slots'),
    Path('schemas/20251121/linkml/modules/classes'),
]

for d in dirs:
    for yaml_file in d.glob('*.yaml'):
        try:
            with open(yaml_file) as f:
                content = yaml.safe_load(f)
        except Exception:
            continue  # Skip files that cannot be read or parsed
        if not content:
            continue
        for section in ['classes', 'slots']:
            items = content.get(section, {})
            if not isinstance(items, dict):
                continue
            for name, defn in items.items():
                if not isinstance(defn, dict):
                    continue
                # Record every mapping type in which each URI appears
                uri_to_types = defaultdict(list)
                for mt in mapping_types:
                    for uri in defn.get(mt, []) or []:
                        uri_to_types[uri].append(mt)
                for uri, types in uri_to_types.items():
                    if len(types) > 1:
                        print(f"{yaml_file}: {name} - {uri} in {types}")
```
## Validation Rule
**Pre-commit check**: Before committing LinkML schema changes, run the detection script. If any duplicates are found, the commit should fail.
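For a commit to fail, the check has to report duplicates through its exit status, not only by printing them. The sketch below shows one way this could look; `find_duplicates` and `check_definitions` are hypothetical names, and the functions work on definitions that have already been loaded into dicts (file discovery and YAML parsing are left out and would follow the detection script).

```python
import sys
from collections import defaultdict

MAPPING_TYPES = ["exact_mappings", "close_mappings", "related_mappings",
                 "narrow_mappings", "broad_mappings"]


def find_duplicates(defn):
    """Map each URI that occurs in more than one mapping type to those types."""
    uri_to_types = defaultdict(list)
    for mapping_type in MAPPING_TYPES:
        for uri in defn.get(mapping_type) or []:
            if mapping_type not in uri_to_types[uri]:
                uri_to_types[uri].append(mapping_type)
    return {uri: types for uri, types in uri_to_types.items() if len(types) > 1}


def check_definitions(definitions):
    """Return 1 if any definition contains a duplicate mapping, otherwise 0."""
    failed = False
    for name, defn in definitions.items():
        for uri, types in find_duplicates(defn).items():
            print(f"{name}: {uri} in {types}", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

A hook script would pass the result to `sys.exit(...)`, so that any reported duplicate yields a non-zero status and blocks the commit.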
## References
- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [SKOS Mapping Properties](https://www.w3.org/TR/skos-reference/#mapping)
- Rule 50: Ontology-to-LinkML Mapping Convention (parent rule)
- Rule 51: No Hallucinated Ontology References


@ -0,0 +1,316 @@
# Rule 51: No Hallucinated Ontology References
**Priority**: CRITICAL
**Scope**: All LinkML schema files (`schemas/20251121/linkml/`)
**Created**: 2025-01-13
---
## Summary
All ontology references in LinkML schema files (`class_uri`, `slot_uri`, `*_mappings`) MUST be verifiable against actual ontology files in `/data/ontology/`. References to predicates or classes that do not exist in local ontology files are considered **hallucinated** and are prohibited.
---
## The Problem
AI agents may suggest ontology mappings based on training data without verifying that:
1. The ontology file exists in `/data/ontology/`
2. The specific predicate/class exists within that ontology file
3. The prefix is declared and resolvable
This leads to schema files containing references like `dqv:value` or `adms:status` that cannot be validated or serialized to RDF.
---
## Requirements
### 1. All Ontology Prefixes Must Have Local Files
Before using a prefix (e.g., `prov:`, `schema:`, `org:`), verify the ontology file exists:
```bash
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
```
**Available Ontologies** (as of 2025-01-13):
| Prefix | File | Verified |
|--------|------|----------|
| `prov:` | `prov-o.ttl`, `prov.ttl` | ✅ |
| `schema:` | `schemaorg.owl` | ✅ |
| `org:` | `org.rdf` | ✅ |
| `skos:` | `skos.rdf` | ✅ |
| `dcterms:` | `dublin_core_elements.rdf` | ✅ |
| `foaf:` | `foaf.ttl` | ✅ |
| `rico:` | `RiC-O_1-1.rdf` | ✅ |
| `crm:` | `CIDOC_CRM_v7.1.3.rdf` | ✅ |
| `geo:` | `geo.ttl` | ✅ |
| `sosa:` | `sosa.ttl` | ✅ |
| `bf:` | `bibframe.rdf` | ✅ |
| `edm:` | `edm.owl` | ✅ |
| `premis:` | `premis3.owl` | ✅ |
| `dcat:` | `dcat3.ttl` | ✅ |
| `ore:` | `ore.rdf` | ✅ |
| `pico:` | `pico.ttl` | ✅ |
| `gn:` | `geonames_ontology.rdf` | ✅ |
| `time:` | `time.ttl` | ✅ |
| `locn:` | `locn.ttl` | ✅ |
| `dqv:` | `dqv.ttl` | ✅ |
| `adms:` | `adms.ttl` | ✅ |
**NOT Available** (do not use without adding):
| Prefix | Status | Alternative |
|--------|--------|-------------|
| `qudt:` | Only referenced in era_ontology.ttl | Use `hc:` with close_mappings annotation |
### 2. Predicates Must Exist in Ontology Files
Before using a predicate, verify it exists:
```bash
# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl
# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
```
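The same check can be scripted. The sketch below is an assumption-laden illustration: `term_appears` is a hypothetical helper that looks for a local name as a complete token after `:`, `#`, or `/` in the ontology text. Textual presence is a necessary condition only; the definition itself should still be read.

```python
import re


def term_appears(ontology_text, local_name):
    """True if local_name occurs as a complete term after ':', '#' or '/'."""
    # The lookahead rejects longer names that merely start with local_name.
    pattern = r"[:#/]" + re.escape(local_name) + r"(?![A-Za-z0-9_-])"
    return re.search(pattern, ontology_text) is not None


# Illustrative one-line fragment in the style of an OWL file (not copied from premis3.owl).
SAMPLE = '<owl:ObjectProperty rdf:about="http://www.loc.gov/premis/rdf/v3/storedAt"/>'
```

With this fragment, `term_appears(SAMPLE, "storedAt")` is true, while `hasFrameRate` and the partial name `stored` are both rejected.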
### 3. Use hc: Prefix for Domain-Specific Concepts
When no standard ontology predicate exists, use the Heritage Custodian namespace:
```yaml
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.

# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
```
### 4. Document External References in close_mappings
When a similar concept exists in an ontology we don't have locally, document it in `close_mappings` with a note:
```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically
        appropriate but ontology not included in project. See
        https://www.w3.org/TR/vocab-dqv/
```
---
## Verification Workflow
### Before Adding New Mappings
1. **Check if ontology file exists**:
   ```bash
   ls data/ontology/ | grep -i "<ontology-name>"
   ```
2. **Search for predicate in ontology**:
   ```bash
   grep -l "<predicate-name>" data/ontology/*
   ```
3. **Verify predicate definition**:
   ```bash
   grep -B2 -A5 "<predicate-name>" data/ontology/<file>
   ```
4. **If not found**: Use `hc:` prefix with appropriate documentation
### When Reviewing Existing Mappings
Run validation script:
```bash
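# List the distinct ontology prefixes used in slot_uri values
grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
  grep -v "hc:" | \
  cut -d: -f3 | \
  sort -u

# Verify each prefix has a local file.
# Filename matching is approximate: per the table above, rico: lives in
# RiC-O_1-1.rdf and dcterms: in dublin_core_elements.rdf, so confirm any
# "NOT FOUND!" result manually before treating it as a violation.
for prefix in prov schema org skos dcterms foaf rico; do
  echo "Checking $prefix:"
  ls data/ontology/ | grep -i "$prefix" || echo "  NOT FOUND!"
done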
```
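Because ontology filenames do not always contain the prefix, an explicit prefix-to-file table is more reliable than filename matching. The following sketch assumes such a table; `AVAILABLE` copies only a few rows from the "Available Ontologies" table, and `unverified_prefixes` is a hypothetical helper name.

```python
import re

# Prefix -> local file, copied from a few rows of the "Available Ontologies" table.
AVAILABLE = {
    "prov": "prov-o.ttl",
    "schema": "schemaorg.owl",
    "skos": "skos.rdf",
    "dcterms": "dublin_core_elements.rdf",
    "rico": "RiC-O_1-1.rdf",
}

# Captures the CURIE prefix of every slot_uri value in a schema file.
SLOT_URI = re.compile(r"^\s*slot_uri:\s*([A-Za-z][\w.-]*):", re.MULTILINE)


def unverified_prefixes(schema_text, available=AVAILABLE, internal=("hc",)):
    """Return the sorted slot_uri prefixes that have no known local ontology file."""
    used = set(SLOT_URI.findall(schema_text))
    return sorted(used - set(available) - set(internal))
```

Note that a full URI such as `https://...` would be reported with the "prefix" `https`, which is acceptable for a review aid since it also deserves a closer look.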
---
## Ontology Addition Process
If a new ontology is genuinely needed:
1. **Download the ontology**:
   ```bash
   curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
   ```
2. **Update ONTOLOGY_CATALOG.md**:
   ```bash
   # Add entry to data/ontology/ONTOLOGY_CATALOG.md
   ```
3. **Verify predicates exist**:
   ```bash
   grep "<predicate>" data/ontology/<name>.ttl
   ```
4. **Update LinkML prefixes** in schema files
---
## Examples
### CORRECT: Verified Mapping
```yaml
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
```
### CORRECT: Domain-Specific with External Reference
```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.
```
### WRONG: Hallucinated Mapping
```yaml
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
```
### WRONG: Non-Existent Predicate
```yaml
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
```
---
## Consequences of Violation
1. **RDF serialization fails** - Invalid prefixes cause gen-owl errors
2. **Schema validation errors** - LinkML validates prefix declarations
3. **Broken interoperability** - External systems cannot resolve URIs
4. **Data quality issues** - Semantic web tooling cannot process data
---
## PREMIS Ontology Reference (premis3.owl)
**CRITICAL**: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified.
### Valid PREMIS Classes
```
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
```
### Valid PREMIS Properties
```
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version
```
### Known Hallucinated PREMIS Terms (DO NOT USE)
| Hallucinated Term | Correction |
|-------------------|------------|
| `premis:PreservationEvent` | Use `premis:Event` |
| `premis:RightsDeclaration` | Use `premis:RightsBasis` or `premis:RightsStatus` |
| `premis:hasRightsStatement` | Use `premis:rightsStatus` |
| `premis:hasRightsDeclaration` | Use `premis:rightsStatus` |
| `premis:hasRepresentation` | Use `premis:relationship` or `dcterms:hasFormat` |
| `premis:hasRelatedStatementInformation` | Use `premis:note` or `adms:status` |
| `premis:hasObjectCharacteristics` | Use `premis:characteristic` |
| `premis:rightsGranted` | Use `premis:RightsStatus` class with `premis:restriction` |
| `premis:rightsEndDate` | Use `premis:endDate` |
| `premis:linkingAgentIdentifier` | Use `premis:Agent` class |
| `premis:storageLocation` (lowercase) | Use `premis:storedAt` property or `premis:StorageLocation` class |
| `premis:hasFrameRate` | Does not exist - use `hc:frameRate` |
| `premis:environmentCharacteristic` (lowercase) | Use `premis:EnvironmentCharacteristic` (class) |
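
The table above can double as a lookup for automated review. The sketch below includes only four of its rows; `KNOWN_HALLUCINATIONS` and `flag_hallucinated` are hypothetical names, and matching is case-sensitive on purpose because several PREMIS entries differ only in capitalization.

```python
import re

# Subset of the table above: hallucinated term -> suggested correction.
KNOWN_HALLUCINATIONS = {
    "premis:PreservationEvent": "premis:Event",
    "premis:hasObjectCharacteristics": "premis:characteristic",
    "premis:rightsEndDate": "premis:endDate",
    "premis:hasFrameRate": "hc:frameRate",
}

# Matches any premis: CURIE in schema text.
CURIE = re.compile(r"\bpremis:[A-Za-z_][\w-]*")


def flag_hallucinated(schema_text):
    """Return sorted (term, correction) pairs for known-bad terms in schema_text."""
    found = set(CURIE.findall(schema_text))
    return sorted((term, KNOWN_HALLUCINATIONS[term])
                  for term in found if term in KNOWN_HALLUCINATIONS)
```

A term that is absent from the lookup is not thereby valid; it still has to be verified against `premis3.owl`.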
### PREMIS Verification Commands
```bash
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
sed 's/.*v3\///' | sed 's/".*//' | sort -u
# List all PREMIS properties
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
grep -oP 'v3/\K[^"]+' | sort -u
# Verify a specific term exists
grep -c "YourTermHere" data/ontology/premis3.owl
```
---
## See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- `/data/ontology/ONTOLOGY_CATALOG.md` - Available ontologies
- `.opencode/rules/slot-ontology-mapping-reference.md` - Mapping reference
---
## Version History
- **2025-01-13**: Added 7 more hallucinated PREMIS terms discovered during schema audit:
  - `premis:hasRightsStatement`, `premis:hasRightsDeclaration`, `premis:hasRepresentation`
  - `premis:hasRelatedStatementInformation`, `premis:rightsGranted`, `premis:rightsEndDate`
  - `premis:linkingAgentIdentifier`
- **2025-01-13**: Initial creation after discovering dqv:, adms:, qudt: references without local files


@ -0,0 +1,164 @@
# Rule 60: No Migration Deferral - Execute All slot_fixes.yaml Revisions
## Status: MANDATORY
## Summary
**AI agents MUST execute ALL migrations specified in `slot_fixes.yaml`.** The pattern of marking migrations as "WELL_STRUCTURED_NO_MIGRATION_NEEDED" or similar deferral notes is **STRICTLY PROHIBITED** except in the rarest cases of genuine semantic conflict.
## The Problem
Previous AI agents incorrectly deferred >95% of slot_fixes.yaml migrations with notes like:
- "WELL_STRUCTURED_NO_MIGRATION_NEEDED"
- "Already has proper slot_uri"
- "Simple enum is appropriate"
- "Would add unnecessary indirection"
- "Current structure is adequate"
**These deferrals were WRONG.** The slot_fixes.yaml revisions were carefully curated based on:
- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment)
- Semantic correctness and consistency
- Pattern adherence (Rule 39: RiC-O naming, Rule 43: singular nouns)
- Type/Types class hierarchy design (Rule 0b)
## The Rule
### MANDATORY: Execute ALL Migrations
When encountering a slot_fixes.yaml entry with `status: false`:
1. **READ the `revision` field** - This specifies EXACTLY what to do
2. **EXECUTE the migration** - Create/update slots, classes, imports as specified
3. **UPDATE status to `true`** - Document what was done in `notes`
### PROHIBITED: Deferral Reasoning
The following justifications are **INVALID** and **NEVER acceptable**:
| Invalid Justification | Why It's Wrong |
|-----------------------|----------------|
| "Already has proper slot_uri" | slot_uri handles external mapping; internal structure is separate concern |
| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity |
| "Would add unnecessary indirection" | Indirection enables reuse and future extension |
| "Creating a class would over-engineer" | Ontology design favors class-based modeling |
| "Current structure is well-designed" | Individual optimization conflicts with schema-wide patterns |
| "No semantic benefit" | Agent lacks context to make this judgment |
### VALID: Reasons to Pause (Not Defer)
The ONLY valid reasons to pause a migration (NOT defer indefinitely) are:
1. **Semantic Conflict**: The proposed revision contradicts the slot's documented semantic intent
2. **Circular Dependency**: Migration would create import cycle
3. **External Consumer Impact**: Breaking change affecting downstream systems
4. **Class Already Exists**: Target class exists under different name (document and merge)
**When pausing**: Add a `feedback` entry explaining the specific conflict and request human review. Do NOT mark as "NO_MIGRATION_NEEDED".
## Decision Tree
```
Is there a slot_fixes.yaml entry with status: false?
├─ YES → Read the revision field
│   ├─ Does revision specify slots/classes to create?
│   │   └─ YES → EXECUTE THE MIGRATION (mandatory)
│   └─ Is there a genuine semantic conflict?
│       ├─ NO → EXECUTE THE MIGRATION (mandatory)
│       └─ YES → Document conflict in feedback, request human review
│                 (Do NOT mark as "no migration needed")
└─ NO → Nothing to do
```
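The decision tree can be expressed as a short function. This is a sketch under stated assumptions: `next_action` is a hypothetical helper, each entry is assumed to be parsed into a dict with `status` nested under `processed`, and `semantic_conflict` is an illustrative parameter holding a description of a genuine conflict when one has been identified.

```python
def next_action(entry, semantic_conflict=None):
    """Decide what to do with one parsed slot_fixes.yaml entry."""
    processed = entry.get("processed") or {}
    if processed.get("status") is not False:
        return "nothing_to_do"       # only entries with status: false need work
    if semantic_conflict:
        return "pause_for_review"    # document the conflict in feedback
    return "execute_migration"       # the mandatory default
```

There is deliberately no return value for deferral: the only outcomes are executing the migration, pausing with a documented conflict, or having nothing to do.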
## Examples
### WRONG: Deferral Note
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot
  revision:
    - label: has_or_had_example
      type: slot
    - label: Example
      type: class
  processed:
    status: true  # WRONG - marked true without doing work
    notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED - slot already has proper
      slot_uri and the current structure is adequate"  # INVALID
```
### CORRECT: Execute Migration
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/example_slot
  revision:
    - label: has_or_had_example
      type: slot
    - label: Example
      type: class
  processed:
    status: true
    timestamp: '2026-01-19T12:00:00Z'
    notes: 'Migrated 2026-01-19 per Rule 53/56.
      - Created has_or_had_example.yaml slot file
      - Created Example.yaml class file
      - Updated ClassA.yaml, ClassB.yaml to use new slot
      - Archived: modules/slots/archive/example_slot_archived_20260119.yaml'
```
### CORRECT: Pause with Genuine Conflict
```yaml
- original_slot_id: https://nde.nl/ontology/hc/slot/conflicting_slot
  revision:
    - label: has_or_had_foo
      type: slot
  processed:
    status: false  # Correctly left false
    notes: ''
  feedback:
    - timestamp: '2026-01-19T12:00:00Z'
      user: opencode-claude
      done: false
      comment: |
        PAUSED FOR HUMAN REVIEW - Genuine semantic conflict detected:
        - Revision specifies has_or_had_foo (temporal relationship)
        - But slot is used for immutable birth dates (should be has_*)
        - Request clarification on intended temporal semantics
```
## Statistics Context
The slot_fixes.yaml file contains 527 migration entries. Analysis of previous agent behavior:
- **Incorrectly deferred**: >95% of entries marked "NO_MIGRATION_NEEDED"
- **Actually needing deferral**: <5% (genuine semantic conflicts)
- **Required action**: Execute ALL migrations except those with documented semantic conflicts
## Related Rules
- **Rule 53**: Full Slot Migration - slot_fixes.yaml is AUTHORITATIVE
- **Rule 56**: Semantic Consistency Over Simplicity - Always Execute Revisions
- **Rule 57**: slot_fixes.yaml Revision Key is IMMUTABLE
- **Rule 58**: Feedback vs Revision Distinction
## Anti-Patterns Checklist
Before marking ANY migration as complete, verify:
- [ ] Did I actually create the specified slots?
- [ ] Did I actually create the specified classes?
- [ ] Did I update all class files that use this slot?
- [ ] Did I archive the old slot file?
- [ ] Is my "notes" field documenting actual work done, not a deferral excuse?
If any answer is "no", the migration is NOT complete.
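False completions of this kind can be caught mechanically. The sketch below flags entries whose `status` is `true` while the `notes` text matches one of the deferral phrases quoted in "The Problem". The names `DEFERRAL_PHRASES` and `looks_like_deferral` are hypothetical, and the entry layout (`processed.status`, `processed.notes`) is assumed from the examples.

```python
# Deferral phrases quoted in this rule; matching is case-insensitive.
DEFERRAL_PHRASES = [
    "NO_MIGRATION_NEEDED",
    "Already has proper slot_uri",
    "Simple enum is appropriate",
    "Would add unnecessary indirection",
    "Current structure is adequate",
]


def looks_like_deferral(entry):
    """True if an entry is marked done but its notes read like a deferral."""
    processed = entry.get("processed") or {}
    notes = (processed.get("notes") or "").lower()
    return processed.get("status") is True and any(
        phrase.lower() in notes for phrase in DEFERRAL_PHRASES)
```

A match is a prompt for human review, not proof of a violation; notes that merely quote a phrase while describing real work would also be flagged.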
## Consequences
Agents that defer migrations without genuine semantic conflict:
1. Create technical debt requiring human cleanup
2. Delay schema consistency improvements
3. Waste curator time reviewing false "completions"
4. Undermine trust in AI-assisted schema work
**Execute the migrations. Do not defer.**


@ -0,0 +1,215 @@
# Rule 42: No Ontology Prefixes in Slot Names
**CRITICAL**: LinkML slot names MUST NOT include ontology namespace prefixes. Ontology references belong in mapping properties, NOT in element names.
---
## 1. The Problem
Slot names like `rico_has_or_had_holder` or `skos_broader` violate separation of concerns:
- **Slot names** should describe the semantic meaning in plain, readable terms
- **Ontology mappings** belong in `slot_uri`, `exact_mappings`, `close_mappings`, `related_mappings`, `narrow_mappings`, `broad_mappings`
Embedding ontology prefixes in names:
1. Creates coupling between naming and specific ontology versions
2. Reduces readability for non-ontology experts
3. Duplicates information already in mapping properties
4. Makes future ontology migrations harder
---
## 2. Correct Pattern
### Use Descriptive Names + Mapping Properties
```yaml
# CORRECT: Clean name with ontology reference in slot_uri
slots:
  record_holder:
    description: The custodian that holds or held this record set.
    slot_uri: rico:hasOrHadHolder
    exact_mappings:
      - rico:hasOrHadHolder
    close_mappings:
      - schema:holdingArchive
    range: Custodian
```
### WRONG: Ontology Prefix in Name
```yaml
# WRONG: Ontology prefix embedded in slot name
slots:
  rico_has_or_had_holder:  # BAD - "rico_" prefix
    description: The custodian that holds or held this record set.
    slot_uri: rico:hasOrHadHolder
    range: string
```
---
## 3. Prohibited Prefixes in Slot Names
The following prefixes MUST NOT appear at the start of slot names:
| Prefix | Ontology | Example Violation |
|--------|----------|-------------------|
| `rico_` | Records in Contexts | `rico_organizational_principle` |
| `skos_` | SKOS | `skos_broader`, `skos_narrower` |
| `schema_` | Schema.org | `schema_name` |
| `dcterms_` | Dublin Core | `dcterms_created` |
| `dct_` | Dublin Core | `dct_identifier` |
| `prov_` | PROV-O | `prov_generated_by` |
| `org_` | W3C Organization | `org_has_member` |
| `crm_` | CIDOC-CRM | `crm_carried_out_by` |
| `foaf_` | FOAF | `foaf_knows` |
| `owl_` | OWL | `owl_same_as` |
| `rdf_` | RDF | `rdf_type` |
| `rdfs_` | RDFS | `rdfs_label` |
| `cpov_` | CPOV | `cpov_public_organisation` |
| `tooi_` | TOOI | `tooi_overheidsorganisatie` |
| `bf_` | BIBFRAME | `bf_title` |
| `edm_` | Europeana | `edm_provided_cho` |
---
## 4. Migration Examples
### Example 1: RiC-O Slots
```yaml
# BEFORE (wrong)
rico_has_or_had_holder:
  slot_uri: rico:hasOrHadHolder
  range: string

# AFTER (correct)
record_holder:
  description: Reference to the custodian that holds or held this record set.
  slot_uri: rico:hasOrHadHolder
  exact_mappings:
    - rico:hasOrHadHolder
  range: Custodian
```
### Example 2: SKOS Slots
```yaml
# BEFORE (wrong)
skos_broader:
  slot_uri: skos:broader
  range: uriorcurie

# AFTER (correct)
broader_concept:
  description: A broader concept in the hierarchy.
  slot_uri: skos:broader
  exact_mappings:
    - skos:broader
  range: uriorcurie
```
### Example 3: RiC-O Organizational Principle
```yaml
# BEFORE (wrong)
rico_organizational_principle:
  slot_uri: rico:hasRecordSetType
  range: string

# AFTER (correct)
organizational_principle:
  description: The organizational principle (fonds, series, collection) for this record set.
  slot_uri: rico:hasRecordSetType
  exact_mappings:
    - rico:hasRecordSetType
  range: string
```
---
## 5. Exceptions
### 5.1 Identifier Slots
Slots that store **identifiers from external systems** may include system names (not ontology prefixes):
```yaml
# ALLOWED: External system identifier
wikidata_id:
  description: Wikidata entity identifier (Q-number).
  slot_uri: schema:identifier
  range: string
  pattern: "^Q[0-9]+$"

# ALLOWED: External system identifier
viaf_id:
  description: VIAF identifier for authority control.
  slot_uri: schema:identifier
  range: string
```
### 5.2 Internal Namespace Force Slots
Technical slots for namespace generation are prefixed with `internal_`:
```yaml
# ALLOWED: Technical workaround slot
internal_wd_namespace_force:
  description: Internal slot to force WD namespace generation. Do not use.
  slot_uri: wd:Q35120
  range: string
```
---
## 6. Validation
Run this command to find violations:
```bash
cd schemas/20251121/linkml/modules/slots
ls -1 *.yaml | grep -E "^(rico_|skos_|schema_|dcterms_|dct_|prov_|org_|crm_|foaf_|owl_|rdf_|rdfs_|cpov_|tooi_|bf_|edm_)"
```
Expected output: No files (after migration)
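
The same check can be run on slot names taken from parsed schemas instead of filenames. A minimal sketch, with `PROHIBITED_PREFIXES` copied from the table in section 3 and `prefix_violations` as a hypothetical helper name:

```python
# Prefixes from the table in section 3.
PROHIBITED_PREFIXES = (
    "rico_", "skos_", "schema_", "dcterms_", "dct_", "prov_", "org_", "crm_",
    "foaf_", "owl_", "rdf_", "rdfs_", "cpov_", "tooi_", "bf_", "edm_",
)


def prefix_violations(slot_names):
    """Return the slot names that start with a prohibited ontology prefix."""
    # str.startswith accepts a tuple and tests each prefix in turn.
    return [name for name in slot_names if name.startswith(PROHIBITED_PREFIXES)]
```

External-system identifiers such as `wikidata_id` and `internal_` slots pass unchanged, since neither begins with an ontology prefix.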
---
## 7. Rationale
### LinkML Best Practices
LinkML provides dedicated properties for ontology alignment:
| Property | Purpose | Example |
|----------|---------|---------|
| `slot_uri` | Primary ontology predicate | `slot_uri: rico:hasOrHadHolder` |
| `exact_mappings` | Semantically equivalent predicates | `exact_mappings: [schema:holdingArchive]` |
| `close_mappings` | Nearly equivalent predicates | `close_mappings: [dc:creator]` |
| `related_mappings` | Related but different predicates | `related_mappings: [prov:wasAttributedTo]` |
| `narrow_mappings` | More specific predicates | `narrow_mappings: [rico:hasInstantiation]` |
| `broad_mappings` | More general predicates | `broad_mappings: [schema:about]` |
See: https://linkml.io/linkml-model/latest/docs/mappings/
### Clean Separation of Concerns
- **Names**: Human-readable, domain-focused terminology
- **URIs**: Machine-readable, ontology-specific identifiers
- **Mappings**: Cross-ontology alignment documentation
This separation allows:
1. Renaming slots without changing ontology bindings
2. Adding new ontology mappings without renaming slots
3. Clear documentation of semantic relationships
4. Easier maintenance and evolution
---
## 8. See Also
- **Rule 38**: Slot Centralization and Semantic URI Requirements
- **Rule 39**: Slot Naming Convention (RiC-O Style) - for temporal naming patterns
- LinkML Mappings Documentation: https://linkml.io/linkml-model/latest/docs/mappings/


@ -0,0 +1,61 @@
# Rule: No Rough Edits in Schema Files
**Identifier**: `no-rough-edits-in-schema`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT** perform rough, imprecise, or bulk text substitutions (like `sed -i` or regex-based Python scripts) on LinkML schema files (`schemas/*/linkml/`) without guaranteeing structural integrity.
**YOU MUST**:
* ✅ Use proper YAML parsers/dumpers if modifying structure programmatically.
* ✅ Manually verify edits if using text replacement.
* ✅ Ensure indentation and nesting are preserved exactly.
* ✅ Respect comments and ordering (which parsers often destroy, so careful text editing is sometimes necessary, but it must be PRECISE).
## Rationale
LinkML schemas are highly structured YAML files where indentation and nesting semantics are critical. Rough edits often cause:
* **Duplicate keys** (e.g., leaving a property behind after deleting its parent key).
* **Invalid indentation** (breaking the parent-child relationship).
* **Silent corruption** (valid YAML but wrong semantics).
## Examples
### ❌ Anti-Pattern: Rough Deletion
Deleting lines containing a string without checking context:
```python
# WRONG: Deleting lines blindly
new_lines = []
for line in lines:
    if "some_slot" in line:
        continue  # Deletes the line, but might leave children orphaned!
    new_lines.append(line)
```
**Resulting Corruption**:
```yaml
# Original
slots:
  some_slot:
    range: string

# Corrupted (orphaned child)
slots:
    range: string  # INVALID!
```
### ✅ Correct Pattern: Structural Awareness
If removing a slot reference, ensure you remove the entire list item or key-value block.
```python
# BETTER: Check for list item syntax
if re.match(r'^\s*-\s*some_slot\s*$', line):
    continue
```
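When a whole key-value block has to go and a YAML round-trip would lose comments or ordering, a text edit can still be structure-aware by tracking indentation. The sketch below is illustrative only: `remove_yaml_block` is a hypothetical helper that handles the simple case of a block-style `key:` line followed by more deeply indented children, and its result should still be verified manually.

```python
import re


def remove_yaml_block(lines, key):
    """Drop the `key:` line and every following line indented deeper than it."""
    key_re = re.compile(r"^(\s*)" + re.escape(key) + r":\s*(#.*)?$")
    out = []
    skip_indent = None  # indentation of the removed key while its block is skipped
    for line in lines:
        if skip_indent is not None:
            indent = len(line) - len(line.lstrip())
            if line.strip() == "" or indent > skip_indent:
                continue  # child (or blank line) inside the removed block
            skip_indent = None  # back at the key's level: the block has ended
        match = key_re.match(line)
        if match:
            skip_indent = len(match.group(1))
            continue
        out.append(line)
    return out
```

Unlike the blind deletion shown earlier, the children of `some_slot` are removed together with the key, so no orphaned `range:` line is left behind. Blank lines directly after the removed block are dropped as well.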
## Application
This rule applies to ALL files in `schemas/20251121/linkml/` and future versions.


@ -0,0 +1,53 @@
# Rule: No Version Indicators in Names
## 🚨 Critical
Do not include version identifiers in **class names**, **slot names**, or **enum names**.
Version tags in semantic names create churn, break reuse, and force unnecessary migrations.
## The Rule
1. Use stable semantic names for LinkML elements.
   - ✅ `DigitalPlatform`
   - ❌ `DigitalPlatformV2`
2. If a model evolves, keep the name and update metadata/provenance.
   - Track revision in changelog, annotations, or transformation metadata.
   - Do not encode `v2`, `v3`, `_2026`, `beta`, `final` in the element name.
3. Apply this to all naming surfaces:
   - `classes:` keys
   - `slots:` keys
   - `enums:` keys
   - `name:` values in module files
## Allowed Versioning Locations
- File-level changelog/comments
- Dedicated metadata classes/slots (e.g., transformation metadata)
- External release tags (git tags, manifest versions)
## Migration Guidance
When you encounter versioned names:
1. Rename semantic elements to stable names.
2. Update references/imports/usages accordingly.
3. Preserve provenance of the migration in comments/annotations.
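
Finding versioned names can be partly automated. The sketch below is a heuristic, not an exhaustive detector: `has_version_indicator` and `VERSION_PATTERNS` are hypothetical names, and the patterns cover only the indicator styles listed in this rule (`v2`/`V2`, a trailing `_2026`-style year, `beta`, `final`).

```python
import re

VERSION_PATTERNS = [
    # "V2" at a CamelCase boundary, or "v2"/"V2" not preceded by a letter
    re.compile(r"(?:(?<=[a-z0-9_])V|(?<![A-Za-z])[vV])\d+"),
    # trailing four-digit year such as "_2026"
    re.compile(r"_(?:19|20)\d{2}$"),
    # snake_case labels: "_beta", "_final"
    re.compile(r"(?:^|_)(?:beta|final)(?:_|$)"),
    # CamelCase labels: "...Beta", "...Final"
    re.compile(r"(?<=[a-z0-9])(?:Beta|Final)(?![a-z])"),
]


def has_version_indicator(name):
    """True if an element name appears to embed a version tag."""
    return any(pattern.search(name) for pattern in VERSION_PATTERNS)
```

Each hit should be confirmed by a person before renaming, since a name can legitimately contain one of these fragments.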
## Examples
✅ Correct:
```yaml
classes:
  DigitalPlatformTransformationMetadata:
    description: Metadata about record transformation steps.
```
❌ Wrong:
```yaml
classes:
  DigitalPlatformV2TransformationMetadata:
    description: Metadata about V2 transformation.
```


@ -0,0 +1,15 @@
# Rule: Ontology Detection vs Heuristics
## Summary
When detecting classes and predicates in `data/ontology/` or external ontology files, you must **read the actual ontology definitions** (e.g., RDF, OWL, TTL files) to determine if a term is a Class or a Property. Do not rely on naming heuristics (like "Capitalized means Class").
## Detail
* **Verification**: Always read the source ontology file or use a semantic lookup tool to verify the `rdf:type` of an entity.
  * If `rdf:type` is `owl:Class` or `rdfs:Class`, it is a **Class**.
  * If `rdf:type` is `rdf:Property`, `owl:ObjectProperty`, or `owl:DatatypeProperty`, it is a **Property**.
* **Avoid Heuristics**: Do not assume that `skos:Concept` is a class just because it is capitalized (it is a class, but confirm this from the ontology), or that `schema:name` is a property just because it is lowercase. Naming conventions differ between ontologies (e.g., CIDOC-CRM uses names such as `crm:E21_Person` for classes and `crm:P14_carried_out_by` for properties).
* **Strictness**: If the ontology file is not available locally, attempt to fetch it or consult authoritative documentation before guessing.
## Violation Examples
* Assuming `ex:MyTerm` is a class because it starts with an uppercase letter without checking the `.ttl` file.
* Mapping a LinkML slot to `schema:Thing` (a Class) instead of a Property because you guessed based on the name.
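
Reading the declared type can be done with the standard library for RDF/XML files. The sketch below is a simplified illustration: `term_kind` is a hypothetical helper, it handles only terms identified by an absolute `rdf:about` URI, and Turtle files would need a real RDF parser instead.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

CLASS_TYPES = {OWL + "Class", RDFS + "Class"}
PROPERTY_TYPES = {RDF + "Property", OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def term_kind(rdfxml_text, term_uri):
    """Return 'class', 'property', or 'unknown' from the declared rdf:type."""
    root = ET.fromstring(rdfxml_text)
    types = set()
    for elem in root.iter():
        if elem.get(f"{{{RDF}}}about") != term_uri:
            continue
        # A typed node such as <owl:Class rdf:about="..."> declares its type in the tag.
        if elem.tag.startswith("{"):
            namespace, local = elem.tag[1:].split("}", 1)
            types.add(namespace + local)
        # Explicit <rdf:type rdf:resource="..."/> children declare further types.
        for child in elem.findall(f"{{{RDF}}}type"):
            resource = child.get(f"{{{RDF}}}resource")
            if resource:
                types.add(resource)
    if types & CLASS_TYPES:
        return "class"
    if types & PROPERTY_TYPES:
        return "property"
    return "unknown"


# Made-up example terms: the lowercase one is a class, the capitalized one a property.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/ns#myTerm"/>
  <rdf:Description rdf:about="http://example.org/ns#OtherTerm">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
  </rdf:Description>
</rdf:RDF>"""
```

In the sample, a capitalization heuristic would misclassify both terms, while the declared types give the right answer.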


@ -0,0 +1,306 @@
# Rule 50: Ontology-to-LinkML Mapping Convention
🚨 **CRITICAL**: When mapping base ontology classes and predicates to LinkML schema elements, use LinkML's dedicated mapping properties as documented at https://linkml.io/linkml-model/latest/docs/mappings/
---
## 1. What "LinkML Mapping" Means in This Project
**"LinkML mapping"** refers specifically to:
1. Connecting LinkML schema elements (classes, slots, enums) to external ontology URIs
2. Using LinkML's built-in mapping properties (`class_uri`, `slot_uri`, `*_mappings`)
3. Following SKOS-based vocabulary alignment standards
**LinkML mapping does NOT mean**:
- Creating arbitrary crosswalks in spreadsheets
- Writing prose descriptions of how concepts relate
- Inventing custom `@context` JSON-LD mappings outside the schema
---
## 2. LinkML Mapping Property Reference
### Primary Identity Properties
| Property | Applies To | Purpose | Example |
|----------|-----------|---------|---------|
| `class_uri` | Classes | Primary RDF class URI | `class_uri: ore:Aggregation` |
| `slot_uri` | Slots | Primary RDF predicate URI | `slot_uri: rico:hasOrHadHolder` |
| `enum_uri` | Enums | Enum namespace URI | `enum_uri: hc:PlatformTypeEnum` |
### SKOS-Based Mapping Properties
These properties express **semantic relationships** to external ontology terms:
| Property | SKOS Predicate | Meaning | Use When |
|----------|---------------|---------|----------|
| `exact_mappings` | `skos:exactMatch` | **IDENTICAL meaning** | Different ontology, **SAME semantics** (interchangeable) |
| `close_mappings` | `skos:closeMatch` | Very similar meaning | Similar but **NOT interchangeable** |
| `related_mappings` | `skos:relatedMatch` | Semantically related | Broader conceptual relationship |
| `narrow_mappings` | `skos:narrowMatch` | External term is more specific | This element is broader than the external term |
| `broad_mappings` | `skos:broadMatch` | External term is more general | This element is narrower than the external term |
### ⚠️ CRITICAL: `exact_mappings` Requires PRECISE Semantic Equivalence
**`exact_mappings` means the terms are INTERCHANGEABLE** - you could substitute one for the other in any context without changing meaning.
**Requirements for `exact_mappings`**:
1. **Same definition**: Both terms must have equivalent definitions
2. **Same scope**: Both terms cover the same set of instances
3. **Same constraints**: Same domain/range restrictions apply
4. **Bidirectional**: If A exactMatch B, then B exactMatch A
**DO NOT use `exact_mappings` when**:
- One term is a subset of the other (use `narrow_mappings`/`broad_mappings`)
- Terms are similar but have different scopes (use `close_mappings`)
- Terms are related but not equivalent (use `related_mappings`)
- You're uncertain about equivalence (default to `close_mappings`)
**Example - WRONG**:
```yaml
# PersonProfile is NOT equivalent to foaf:Person
# PersonProfile is a structured document ABOUT a person, not the person themselves
exact_mappings:
- foaf:Person # ❌ WRONG - different semantics!
```
**Example - CORRECT**:
```yaml
# foaf:Person and schema:Person ARE equivalent
# Both define "a person" with the same scope
exact_mappings:
- schema:Person # ✅ CORRECT - truly equivalent
```
---
## 3. Mapping Workflow: Ontology → LinkML
### Step 1: Identify External Ontology Class/Predicate
Search base ontology files in `/data/ontology/`:
```bash
# Find aggregation-related classes
rg -i "aggregation|aggregate" data/ontology/*.ttl data/ontology/*.rdf data/ontology/*.owl
# Check specific ontology
rg "rdfs:Class|owl:Class" data/ontology/ore.rdf | grep -i "aggregation"
```
### Step 2: Determine Mapping Strength
| Scenario | Mapping Property |
|----------|------------------|
| **This IS that ontology class** (identity) | `class_uri` |
| **Equivalent in another vocabulary** | `exact_mappings` |
| **Similar concept, different scope** | `close_mappings` |
| **Related but different granularity** | `narrow_mappings` / `broad_mappings` |
| **Conceptually related** | `related_mappings` |
### Step 3: Document Mapping in LinkML Schema
#### For Classes
```yaml
classes:
  DataAggregator:
    class_uri: ore:Aggregation  # Primary identity - THIS IS an ORE Aggregation
    description: |
      A platform that harvests and STORES copies of metadata/content, causing data duplication.
      ore:Aggregation - "A set of related resources grouped together."
      Mapped to ORE because aggregators create aggregations of harvested metadata.
    exact_mappings:
      - edm:EuropeanaAggregation  # Europeana's specialization
    close_mappings:
      - dcat:Catalog  # Similar (collects datasets) but broader scope
    narrow_mappings:
      - edm:ProvidedCHO  # More specific (single cultural object)
```
#### For Slots
```yaml
slots:
  aggregates_from:
    slot_uri: ore:aggregates  # Primary predicate
    description: |
      Institutions whose data is aggregated (harvested and stored) by this platform.
      ore:aggregates - "Aggregations assert ore:aggregates relationships."
    exact_mappings:
      - edm:aggregatedCHO  # Europeana equivalent
    range: HeritageCustodian
    multivalued: true
```
---
## 4. Aggregation vs. Linking: A Mapping Example
This project requires **semantic precision** in distinguishing:
| Concept | Primary Mapping | Semantic Pattern |
|---------|-----------------|------------------|
| **Data Aggregation** | `ore:Aggregation` | Data is COPIED to aggregator's server |
| **Linking/Federation** | `dcat:DataService` | Data REMAINS at source; only links provided |
### Aggregation Pattern (Data Duplication)
```yaml
classes:
  DataAggregator:
    class_uri: ore:Aggregation
    description: |
      Harvests and stores copies of metadata from partner institutions.
      Key semantic: Data DUPLICATION occurs - the aggregator maintains its own copy.
      Examples: Europeana, DPLA, Archives Portal Europe
    exact_mappings:
      - edm:EuropeanaAggregation
    annotations:
      data_storage_pattern: AGGREGATION
      causes_data_duplication: true
```
### Linking Pattern (Single Source of Truth)
```yaml
classes:
  FederatedDiscoveryPortal:
    class_uri: dcat:DataService
    description: |
      Provides unified search across multiple institutions but LINKS to original sources.
      Key semantic: NO data duplication - users are redirected to source institutions.
      Data remains at partner institutions' platforms (single source of truth).
    close_mappings:
      - schema:SearchAction # The search functionality
    related_mappings:
      - ore:Aggregation # Related but crucially different
    annotations:
      data_storage_pattern: LINKING
      causes_data_duplication: false
```
### Linking Properties from EDM
Use `edm:isShownAt` and `edm:isShownBy` to express links to source:
```yaml
slots:
  is_shown_at:
    slot_uri: edm:isShownAt
    description: |
      Unambiguous URL to the digital object on the provider's web site
      in its full information context.
      edm:isShownAt - "The URL of a web view of the object in full context."
      This property LINKS to the source institution - no data duplication.
    range: uri
  is_shown_by:
    slot_uri: edm:isShownBy
    description: |
      Direct URL to the object in best available resolution on provider's site.
      edm:isShownBy - "The URL of the object itself (not the context page)."
    range: uri
```
---
## 5. Complete Mapping Documentation Template
When creating or updating a class with ontology mappings:
```yaml
classes:
  MyNewClass:
    # === PRIMARY IDENTITY ===
    class_uri: {prefix}:{ClassName} # The ontology class this IS
    # === DESCRIPTION WITH ONTOLOGY REFERENCE ===
    description: |
      {Human-readable description of what this class represents}
      {Ontology}: {class} - "{Definition from ontology documentation}"
      Mapping rationale:
      - Chosen because: {why this ontology class fits}
      - Not using X because: {why alternatives were rejected}
    # === SKOS-BASED MAPPINGS ===
    exact_mappings:
      - {prefix}:{EquivalentClass} # Same meaning, different vocabulary
    close_mappings:
      - {prefix}:{SimilarClass} # Very similar but not identical
    narrow_mappings:
      - {prefix}:{MoreSpecificClass} # External is narrower than ours
    broad_mappings:
      - {prefix}:{MoreGeneralClass} # External is broader than ours
    related_mappings:
      - {prefix}:{RelatedClass} # Conceptually related
    # === OPTIONAL ANNOTATIONS ===
    annotations:
      ontology_source: "{Full name of source ontology}"
      ontology_version: "{Version if applicable}"
      mapping_confidence: "high|medium|low"
      mapping_notes: "{Additional context}"
```
---
## 6. Validation Checklist
Before committing ontology mappings:
- [ ] `class_uri` / `slot_uri` points to a real URI in `data/ontology/` files
- [ ] Description includes ontology definition (quoted from source)
- [ ] Mapping rationale documented for non-obvious choices
- [ ] `exact_mappings` used ONLY for truly equivalent terms
- [ ] `close_mappings` documented with difference explanation
- [ ] All prefixes declared in schema's `prefixes:` block
- [ ] Prefixes resolve to valid ontology namespaces
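
The prefix-declaration item in the checklist lends itself to an automated check. The following is a minimal sketch, not existing project tooling: it assumes the schema has already been parsed into a plain dictionary (for example by a YAML loader), and the names `undeclared_prefixes`, `MAPPING_KEYS`, and `CURIE` are hypothetical, introduced here for illustration. It only reports prefixes that are used but not declared; it does not verify that a declared prefix resolves to a valid namespace.

```python
import re

# Fields of a LinkML class or slot definition that hold CURIEs (assumed list).
MAPPING_KEYS = (
    "class_uri", "slot_uri", "exact_mappings", "close_mappings",
    "narrow_mappings", "broad_mappings", "related_mappings",
)

# Matches prefix:local and skips full URIs such as http://...
CURIE = re.compile(r"^([A-Za-z][\w.-]*):(?!//)\S+$")


def undeclared_prefixes(schema: dict) -> set:
    """Return prefixes used in URI/mapping fields but absent from `prefixes:`."""
    declared = set(schema.get("prefixes") or {})
    used = set()
    for section in ("classes", "slots"):
        for definition in (schema.get(section) or {}).values():
            for key in MAPPING_KEYS:
                value = (definition or {}).get(key)
                if value is None:
                    continue
                # A field holds either one CURIE or a list of CURIEs.
                for curie in ([value] if isinstance(value, str) else value):
                    match = CURIE.match(curie)
                    if match:
                        used.add(match.group(1))
    return used - declared


# Illustrative input: `dcat` is used in a mapping but never declared.
schema = {
    "prefixes": {
        "hc": "https://nde.nl/ontology/hc/",
        "ore": "http://www.openarchives.org/ore/terms/",
    },
    "classes": {
        "DataAggregator": {
            "class_uri": "ore:Aggregation",
            "close_mappings": ["dcat:Catalog"],
        }
    },
}
print(undeclared_prefixes(schema))
```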
---
## 7. Common Ontology Prefixes for Mappings
| Prefix | Namespace | Ontology | Use For |
|--------|-----------|----------|---------|
| `ore:` | `http://www.openarchives.org/ore/terms/` | OAI-ORE | Aggregation patterns |
| `edm:` | `http://www.europeana.eu/schemas/edm/` | Europeana Data Model | Cultural heritage aggregation |
| `dcat:` | `http://www.w3.org/ns/dcat#` | DCAT | Data catalogs, services |
| `rico:` | `https://www.ica.org/standards/RiC/ontology#` | Records in Contexts | Archival description |
| `crm:` | `http://www.cidoc-crm.org/cidoc-crm/` | CIDOC-CRM | Cultural heritage events |
| `schema:` | `http://schema.org/` | Schema.org | Web semantics |
| `skos:` | `http://www.w3.org/2004/02/skos/core#` | SKOS | Concepts, labels |
| `dcterms:` | `http://purl.org/dc/terms/` | Dublin Core | Metadata properties |
| `prov:` | `http://www.w3.org/ns/prov#` | PROV-O | Provenance |
| `org:` | `http://www.w3.org/ns/org#` | W3C Organization | Organizations |
| `foaf:` | `http://xmlns.com/foaf/0.1/` | FOAF | People, agents |
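
To make the prefix table concrete, the sketch below expands a CURIE into a full URI using a subset of the namespaces listed above. The function name `expand_curie` and the `PREFIXES` dictionary are hypothetical helpers for illustration only; in a real schema the expansion is driven by the `prefixes:` block rather than a hardcoded table.

```python
# Subset of the prefix table above; extend as needed.
PREFIXES = {
    "ore": "http://www.openarchives.org/ore/terms/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand `prefix:local` to a full URI, failing loudly on unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("ore:Aggregation"))
print(expand_curie("dcat:DataService"))
```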
---
## See Also
- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [LinkML URIs and Mappings Guide](https://linkml.io/linkml/schemas/uris-and-mappings.html)
- [LinkML class_uri Reference](https://linkml.io/linkml-model/latest/docs/class_uri/)
- [LinkML slot_uri Reference](https://linkml.io/linkml-model/latest/docs/slot_uri/)
- Rule 1: Ontology Files Are Your Primary Reference
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 42: No Ontology Prefixes in Slot Names
---
**Version**: 1.0.0
**Created**: 2026-01-12
**Author**: OpenCODE

@ -0,0 +1,45 @@
# Rule: Polished Slot Storage Location
## Summary
Polished (refactored) canonical slot files MUST be stored in the parent `slots/` directory:
```
schemas/20251121/linkml/modules/slots/
```
They must **NOT** be stored in the `20260202_matang/` subdirectory.
## Rationale
The `new/` subdirectory contains **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
## Directory Structure
```
schemas/20251121/linkml/modules/slots/
├── *.yaml ← Polished canonical slot files go HERE
└── 20260202_matang/
    ├── *.yaml ← Draft/unpolished canonical slots (staging area)
    └── new/
        └── *.yaml ← Raw/draft slot definitions pending triage
```
## Rule
- When polishing a slot file, write the result to `schemas/20251121/linkml/modules/slots/{slot_name}.yaml`
- If the source file was in `20260202_matang/`, remove it from there after writing to `slots/`
- If the source file was in `20260202_matang/new/`, it should only be deleted after user confirmation of alias absorption (per the no-autonomous-alias-assignment rule)
- If a file already exists in `slots/` (i.e., it was previously polished in an earlier session), overwrite it in place
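
The four bullets above amount to a small decision table keyed on where the source file lives. A minimal sketch of that table follows; the function names `polished_destination` and `source_cleanup_action` and the returned strings are hypothetical, and the code only computes paths and advice without touching the file system.

```python
from pathlib import PurePosixPath

SLOTS_DIR = PurePosixPath("schemas/20251121/linkml/modules/slots")
STAGING_DIR = SLOTS_DIR / "20260202_matang"


def polished_destination(slot_name: str) -> PurePosixPath:
    """Canonical location for a polished slot file."""
    return SLOTS_DIR / f"{slot_name}.yaml"


def source_cleanup_action(source: PurePosixPath) -> str:
    """Describe what the rule permits for the source file after polishing."""
    if source.parent == STAGING_DIR / "new":
        return "delete only after user confirmation"
    if source.parent == STAGING_DIR:
        return "remove after writing to slots/"
    if source.parent == SLOTS_DIR:
        return "overwrite in place"
    return "outside the slots tree"


print(polished_destination("has_pattern"))
print(source_cleanup_action(STAGING_DIR / "has_pattern.yaml"))
```

Keeping the confirmation case as a distinct return value reflects the rule that files under `new/` are never deleted autonomously.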
## Examples
**CORRECT:**
```
schemas/20251121/linkml/modules/slots/has_pattern.yaml ← polished file
```
**WRONG:**
```
schemas/20251121/linkml/modules/slots/20260202_matang/has_pattern.yaml ← should not be here after polishing
```

@ -0,0 +1,32 @@
# Rule: Preserve Bespoke Slots Until Refactoring
**Identifier**: `preserve-bespoke-slots-until-refactoring`
**Severity**: **CRITICAL**
## Core Directive
**DO NOT remove or migrate "additional" bespoke slots during generic migration passes unless they are the specific target of the current task.**
## Context
When migrating a specific slot (e.g., `has_approval_date`), you may encounter other bespoke or legacy slots in the same class file (e.g., `innovation_budget`, `operating_budget`).
**YOU MUST**:
* ✅ Migrate ONLY the specific slot you were instructed to work on.
* ✅ Leave other bespoke slots exactly as they are.
* ✅ Focus strictly on the current migration target.
**YOU MUST NOT**:
* ❌ Proactively migrate "nearby" slots just because they look like they need refactoring.
* ❌ Remove slots that seem unused or redundant without specific instruction.
* ❌ "Clean up" the class file by removing legacy attributes.
## Rationale
Refactoring is a separate, planned phase. Mixing opportunistic refactoring with systematic slot migration increases the risk of regression and makes changes harder to review. The standing instruction for out-of-scope slots is: "We will refactor those later."
## Workflow
1. **Identify Target**: Identify the specific slot(s) assigned for migration (from `slot_fixes.yaml` or user prompt).
2. **Execute Migration**: Apply changes ONLY for those slots.
3. **Ignore Others**: Do not touch other slots in the file, even if they violate other rules (like Rule 39 or Rule 53). Those will be handled in their own dedicated tasks.
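
The workflow above can be backed by a simple scope check before committing. The sketch below compares the slot definitions of a class before and after an edit and reports every change that falls outside the assigned targets. It is an illustration under assumptions: the function name `out_of_scope_changes` is hypothetical, slot definitions are assumed to be available as plain dictionaries keyed by slot name, and `targets` must contain both the slot being migrated and any replacement slot names named in the revision (the replacement name `is_or_was_approved_on` in the example is invented for illustration).

```python
def out_of_scope_changes(before: dict, after: dict, targets: set) -> dict:
    """Map each non-target slot that was touched to how it changed."""
    report = {}
    for name in sorted(before.keys() | after.keys()):
        if name in targets:
            continue  # changes to assigned slots are expected
        if name not in after:
            report[name] = "removed"
        elif name not in before:
            report[name] = "added"
        elif before[name] != after[name]:
            report[name] = "modified"
    return report


before = {
    "has_approval_date": {"range": "date"},
    "innovation_budget": {"range": "string"},
    "operating_budget": {"range": "string"},
}
after = {
    "is_or_was_approved_on": {"range": "TimeSpan"},  # hypothetical replacement
    "innovation_budget": {"range": "string"},
    # operating_budget was "cleaned up" without instruction
}
targets = {"has_approval_date", "is_or_was_approved_on"}
print(out_of_scope_changes(before, after, targets))
```

An empty report means the edit stayed within the migration target; any entry signals a violation of this rule.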

@ -0,0 +1,190 @@
# Rule 56: Semantic Consistency Over Simplicity - Always Execute slot_fixes.yaml Revisions
🚨 **CRITICAL**: When `slot_fixes.yaml` specifies a revision, agents MUST execute it. Perceived simplicity of the existing slot is NOT a valid reason to reject a migration.
## The Core Problem
Previous agents marked migrations as "NO MIGRATION NEEDED" or "WELL_STRUCTURED_NO_MIGRATION_NEEDED" citing reasons like:
- "simple enum appropriate"
- "would add unnecessary indirection"
- "already has proper slot_uri"
- "string range is sufficient for this use case"
- "well-structured, no migration needed"
- "current implementation is adequate"
**These judgments were INCORRECT.**
**Over 95% of entries in `slot_fixes.yaml` represent carefully curated migrations that MUST be executed.**
The revisions were manually designed based on:
- Ontology analysis (CIDOC-CRM, RiC-O, PROV-O, Schema.org alignment)
- Semantic correctness
- Pattern consistency (Rule 39: RiC-O style naming)
- Type/Types class hierarchy design (Rule 0b)
## Why Revisions MUST Be Executed
| Principle | Explanation |
|-----------|-------------|
| **Schema Consistency** | An ontology achieves semantic power through consistent patterns, not local optimizations |
| **LinkML Mapping Separation** | `slot_uri` handles external ontology alignment; slot structure handles internal consistency |
| **Single Responsibility Principle** | Predicates should have single, focused purposes |
| **Extensibility First** | Structured classes enable future extension even if current use is simple |
| **Curated Quality** | Revisions were manually designed with ontology expertise - trust them |
## Invalid Reasons to Reject Migrations
| Rejected Reason | Why It's Invalid |
|-----------------|------------------|
| "Already has proper slot_uri" | slot_uri is for external mapping; internal structure is a separate concern |
| "Simple string/enum is sufficient" | Consistency and extensibility trump local simplicity |
| "Would add unnecessary indirection" | Indirection enables reuse and future extension |
| "Creating a class would over-engineer" | Ontology design favors class-based modeling |
| "Well-structured, no migration needed" | **If revision exists, migration IS needed** |
| "Current implementation is adequate" | Adequacy is not the standard; consistency is |
| "WELL_STRUCTURED_NO_MIGRATION_NEEDED" | **This note type is PROHIBITED** |
## Valid Reasons to PAUSE Migrations (Not Reject)
These warrant discussion with the user, NOT unilateral rejection:
1. **Semantic Conflict**: Proposed slot_uri contradicts the semantic intent of the slot
2. **Class Already Exists**: The target class exists under a different name (propose merge)
3. **Circular Dependency**: Migration would create import cycles
4. **Breaking External API**: Change would break published external consumers
**Even these reasons require escalation to the user** - agents should NOT mark the entry as "NO MIGRATION NEEDED".
## The Key Insight
Agents confused "has good external mapping" with "needs no migration". These are **independent concerns**:
- **External mapping** (slot_uri, exact_mappings): How this slot relates to external ontologies
- **Internal structure** (range, class design): How this slot fits within the LinkML schema
A slot can have perfect external mappings AND still need structural migration for internal consistency.
## Correct Agent Behavior
```yaml
# When encountering a slot_fixes.yaml entry:
1. READ the revision specification completely
2. DO NOT evaluate whether migration is "necessary" - it IS necessary
3. CREATE the specified slots and classes
4. UPDATE the consuming class(es)
5. ARCHIVE the old slot
6. ADD processed section with status: true
# NEVER add notes like:
# - "NO MIGRATION NEEDED"
# - "WELL_STRUCTURED_NO_MIGRATION_NEEDED"
# - "Current implementation is sufficient"
# - "Would add unnecessary complexity"
# - "Simple enum appropriate"
```
## Handling Previous "NO MIGRATION NEEDED" Entries
Entries previously marked with notes like "WELL_STRUCTURED_NO_MIGRATION_NEEDED" should be:
1. **Re-processed**: The migration should be executed as specified in `revision`
2. **Notes replaced**: Replace the incorrect notes with actual migration documentation
3. **Status verified**: Ensure `status: true` reflects ACTUAL migration, not skipped migration
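
Finding the entries that need re-processing can be automated. The following is a minimal sketch, assuming the entries of `slot_fixes.yaml` have already been loaded as a list of dictionaries shaped like the examples in this rule (`original_slot_id`, `processed.notes`); the function name `entries_to_reprocess` is hypothetical. It flags any entry whose notes contain a "no migration needed" marker, with or without underscores.

```python
def entries_to_reprocess(entries: list) -> list:
    """Return the slot ids whose processed notes carry a prohibited marker."""
    flagged = []
    for entry in entries:
        notes = (entry.get("processed") or {}).get("notes") or ""
        # Normalize so NO_MIGRATION_NEEDED and "No migration needed" both match.
        normalized = notes.upper().replace("_", " ")
        if "NO MIGRATION NEEDED" in normalized:
            flagged.append(entry["original_slot_id"])
    return flagged


entries = [
    {
        "original_slot_id": "https://nde.nl/ontology/hc/slot/cites_appendix",
        "processed": {
            "status": True,
            "notes": "WELL_STRUCTURED_NO_MIGRATION_NEEDED: Already has proper slot_uri",
        },
    },
    {
        "original_slot_id": "https://nde.nl/ontology/hc/slot/example_migrated",
        "processed": {"status": True, "notes": "Migrated 2026-01-19 per Rule 53/56."},
    },
]
print(entries_to_reprocess(entries))
```

The second slot id is invented for the example. Note that `status: true` is deliberately ignored: a flagged entry needs re-processing regardless of what its status claims.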
## Example - WRONG Agent Behavior
```yaml
# WRONG - Agent decided migration wasn't needed
- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix
  revision:
    - label: is_or_was_listed_in
      type: slot
    - label: CITESAppendix
      type: class
  processed:
    status: true # ← Marked complete but NOT actually migrated!
    notes: "WELL_STRUCTURED_NO_MIGRATION_NEEDED: Already has proper slot_uri
      and string range is sufficient for CITES appendix values."
```
## Example - CORRECT Agent Behavior
```yaml
# CORRECT - Agent executed the migration as specified
- original_slot_id: https://nde.nl/ontology/hc/slot/cites_appendix
  revision:
    - label: is_or_was_listed_in
      type: slot
    - label: CITESAppendix
      type: class
  processed:
    status: true
    timestamp: '2026-01-19T00:00:00Z'
    session: session-2026-01-19-cites-appendix-migration
    notes: 'Migrated 2026-01-19 per Rule 53/56. Created is_or_was_listed_in.yaml.
      Created CITESAppendix.yaml class. Updated BiologicalObject.yaml.
      Archived: modules/slots/archive/cites_appendix_archived_20260119.yaml.'
```
## Feedback Field
The `feedback` field in slot_fixes.yaml entries contains user corrections to agent mistakes. When feedback says things like:
- "I reject this!"
- "Conduct the migration"
- "Please conduct accordingly"
- "I altered the revision"
This means a previous agent incorrectly deferred the migration, and it MUST now be executed.
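
A rough way to surface such entries is to scan the `feedback` field for the cue phrases listed above. The sketch below is a heuristic only: the cue list is taken from the examples in this section and is not exhaustive, the function name `feedback_demands_migration` is hypothetical, and entries are assumed to be plain dictionaries. Any non-empty feedback still deserves a human read.

```python
# Cue phrases copied from the examples above; real feedback may be worded differently.
FEEDBACK_CUES = (
    "i reject this",
    "conduct the migration",
    "please conduct accordingly",
    "i altered the revision",
)


def feedback_demands_migration(entry: dict) -> bool:
    """True if the entry's feedback contains one of the known cue phrases."""
    feedback = str(entry.get("feedback") or "").lower()
    return any(cue in feedback for cue in FEEDBACK_CUES)


print(feedback_demands_migration({"feedback": "I reject this! Conduct the migration."}))
print(feedback_demands_migration({"feedback": None}))
```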
## Schema Consistency Examples
### Why "Simple URI is fine" is WRONG
```yaml
# WRONG - Agent judgment: "Simple URI is fine"
thumbnail_url:
  range: uri
  slot_uri: schema:thumbnailUrl
# CORRECT - Consistent with all media references
has_or_had_thumbnail:
  range: Thumbnail # Thumbnail class with has_or_had_url → URL
```
**Rationale**: All media references (images, thumbnails, videos, documents) should use the same structural pattern.
### Why "Simple enum is appropriate" is WRONG
```yaml
# WRONG - "Simple enum is fine"
thinking_mode:
  range: ThinkingModeEnum # enabled, disabled, interleaved
# CORRECT - Enables extension
has_or_had_mode:
  range: ThinkingMode
  # ThinkingMode can have: mode_type, confidence, effective_date, etc.
```
**Rationale**: Even if current use is simple, structured classes enable future extension without breaking changes.
## Summary
**Trust the revision. Execute the migration. Document the work.**
The `revision` key in `slot_fixes.yaml` represents carefully curated ontology decisions. Agents are **executors** of these decisions, **not evaluators**. The only acceptable output is a completed migration with proper documentation.
## Related Rules
- **Rule 53**: slot_fixes.yaml is AUTHORITATIVE - Full Slot Migration
- **Rule 55**: Broaden Generic Predicate Ranges Instead of Creating Bespoke Predicates
- **Rule 57**: The revision key in slot_fixes.yaml is IMMUTABLE
- **Rule 39**: RiC-O Temporal Naming Conventions
- **Rule 38**: Slot Centralization and Semantic URI Requirements
## Revision History
- 2026-01-19: Strengthened with explicit prohibition of "WELL_STRUCTURED_NO_MIGRATION_NEEDED" notes
- 2026-01-16: Created based on analysis of 51 feedback entries in slot_fixes.yaml

@ -1,48 +0,0 @@
# Rule: No Tool-Specific Classes
## Critical Convention
Ontology classes MUST be domain concepts, not wrappers for specific software tools.
## Rule
1. Do not model vendor/tool names as primary class concepts.
- Reject classes like `ExaSearchMetadata`, `OpenAIFetchResult`, `ElasticsearchHit`.
2. Model the generic domain activity or entity instead.
- Use names like `ExternalSearchMetadata`, `RetrievalActivity`, `SearchResult`.
3. Capture tool provenance through generic slots and values.
- Use `has_tool`, `has_method`, `has_agent`, `has_note` to record implementation details.
4. Platform custodians are allowed as domain classes.
- Classes for digital platforms that act as custodians (for example YouTube-related custodian classes) are valid.
- Operational tools used to query/process data are not valid ontology classes.
## Rationale
- Tool names are implementation details and change faster than domain semantics.
- Tool-specific classes reduce reuse and interoperability.
- Generic classes preserve stable meaning while still supporting full provenance.
## Examples
### Wrong
```yaml
classes:
  ExaSearchMetadata:
    class_uri: prov:Activity
```
### Correct
```yaml
classes:
  ExternalSearchMetadata:
    class_uri: prov:Activity
    slots:
      - has_tool
      - has_method
      - has_agent
```

@ -12,7 +12,7 @@ They must **NOT** be stored in the `20260202_matang/` subdirectory.
## Rationale
The `20260202_matang/` directory and its `new/` subdirectory contain **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
The `new/` subdirectory contains **draft/unpolished** slot definitions that are pending review. Once a slot file has been polished (ontology-aligned, translated, cleaned), it graduates to the canonical `slots/` directory.
## Directory Structure

@ -1,5 +1,5 @@
{
"generated": "2026-02-15T15:25:32.418Z",
"generated": "2026-02-15T17:46:11.976Z",
"schemaRoot": "/schemas/20251121/linkml",
"totalFiles": 2369,
"categoryCounts": {

@ -19,7 +19,7 @@ description: |
- provenance: Data tier tracking and source lineage
- ghcid: Global Heritage Custodian ID with history
- identifiers: ISIL, Wikidata, GHCID variants
- enrichments: Google Maps, Wikidata, Genealogiewerkbalk, etc.
- enrichments: Google Maps, Wikidata, genealogy archive registries, etc.
- web_claims: Extracted claims with XPath provenance
- custodian_name: Consensus name determination
- location: Normalized geographic data
@ -153,7 +153,7 @@ imports:
- ./modules/classes/MergeNote
# Dutch Enrichments Domain
- ./modules/classes/ArchiveInfo
- ./modules/classes/GenealogiewerkbalkEnrichment
- ./modules/classes/GenealogyArchivesRegistryEnrichment
- ./modules/classes/IsilCodeEntry
- ./modules/classes/MunicipalityInfo
- ./modules/classes/NanIsilEnrichment

@ -1,5 +1,5 @@
{
"generated": "2026-02-15T17:46:11.976Z",
"generated": "2026-02-15T18:20:10.034Z",
"schemaRoot": "/schemas/20251121/linkml",
"totalFiles": 2369,
"categoryCounts": {

@ -34,12 +34,10 @@ default_prefix: hc
classes:
DonationScheme:
class_uri: schema:DonateAction
description: "A donation or giving scheme offered by a heritage custodian institution.\n\n**PURPOSE**:\n\nDonationScheme provides structured representation of the various ways\nindividuals and organizations can financially support heritage institutions.\nThese range from simple one-time donations to complex membership programs,\nadoption schemes, patron circles, and legacy giving vehicles.\n\n**HERITAGE SECTOR CONTEXT**:\n\nDonation schemes are critical for heritage institution sustainability:\n\n- **Museums**: Friends schemes, patron circles, acquisition fund drives\n- **Libraries**: Adopt-a-book programs, conservation appeals\n- **Archives**: \"Adopt history\" programs, preservation sponsorships\n- **Galleries**: Artist support funds, exhibition sponsorships\n- **Historical societies**: Heritage membership, research fellowships\n- **Botanical gardens**: Plant and animal adoption programs\n\n**MULTILINGUAL TERMINOLOGY**:\n\n\"Friends\" scheme terminology varies by country:\n- Dutch:\
\ Museumvriend, Vrienden van het museum\n- German: F\xF6rderverein, Freundeskreis\n- French: Amis du mus\xE9e, Soci\xE9t\xE9 des amis\n- Spanish: Amigos del museo\n- Italian: Amici del museo\n\n**PROVENANCE CHAIN**:\n\n```\nHeritageCustodian\n \u2502\n \u251C\u2500\u2500 offers_donation_schemes \u2500\u2500\u2192 DonationScheme[]\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 scheme_type: MEMBERSHIP_FRIENDS\n \u2502 \u251C\u2500\u2500 scheme_name: \"Rijksmuseum Vrienden\"\n \u2502 \u251C\u2500\u2500 minimum_amount: 60\n \u2502 \u251C\u2500\u2500 currency: \"EUR\"\n \u2502 \u251C\u2500\u2500 payment_frequency: \"annually\"\n \u2502 \u2502\n \u2502 \u2514\u2500\u2500 observed_in\
\ \u2500\u2500\u2192 WebObservation\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 source_url: https://rijksmuseum.nl/steun\n \u2502 \u251C\u2500\u2500 retrieved_on: 2026-01-01T10:00:00Z\n \u2502 \u2514\u2500\u2500 extraction_confidence: 0.95\n \u2502\n \u2514\u2500\u2500 web_observations \u2500\u2500\u2192 WebObservation[] (general custodian provenance)\n```\n\n**ONTOLOGY ALIGNMENT**:\n\n- **Schema.org**: `schema:DonateAction` - Action of donating to organization\n- **Schema.org**: `schema:Offer` - Scheme as offer with price specification\n- **W3C Org**: `org:Membership` - For membership-type schemes\n- **Dublin Core**: `dcterms:isPartOf` - Scheme belongs to institution\n- **PROV-O**: `prov:wasDerivedFrom` - Links scheme to observation\n\
\n**TAX INCENTIVE SCHEMES**:\n\nMany countries provide tax benefits for cultural donations:\n\n| Country | Scheme | Benefit |\n|---------|--------|---------|\n| Netherlands | ANBI | 100% deductible |\n| Netherlands | Cultural ANBI | 125% deductible (extra 25%) |\n| UK | Gift Aid | 25% tax reclaim for charity |\n| UK | Cultural Gifts Scheme | Tax relief on objects donated |\n| USA | 501(c)(3) | Itemized deduction |\n| Germany | Gemeinn\xFCtzigkeit | Tax deductible |\n| France | M\xE9c\xE9nat culturel | 60% tax reduction |\n\n**SCHEME CATEGORIES**:\n\nSchemes are classified via DonationSchemeTypeEnum into eight categories:\n\n1. **MEMBERSHIP_*** - Recurring membership/subscription\n - Friends, Young Friends, Family, Corporate, Research Fellow\n \n2. **PATRON_*** - High-value donor circles\n - Circle, Benefactor, Founders Circle, Life, National\n \n3. **ADOPTION_*** - Object sponsorship\n - Book, Artifact, Archive Collection, Artwork, Animal, Plant\n \n4. **LEGACY_*** - Planned/estate\
\ giving\n - Bequest, Charitable Trust, Endowment, Named Fund\n \n5. **DONATION_*** - Direct monetary gifts\n - One-off, Recurring, Appeal, Project, Tax Incentive\n \n6. **INKIND_*** - Non-monetary contributions\n - Object, Artwork, Archive, Library Collection, Expertise, Volunteer\n \n7. **SPONSORSHIP_*** - Corporate/event support\n - Exhibition, Gallery, Event, Program, Digitization, Conservation\n \n8. **CROWDFUNDING_*** - Campaign-based collective funding\n - Acquisition, Conservation, Building, Exhibition\n\n**EXTRACTION PATTERN**:\n\nWhen extracting donation schemes from institutional websites:\n\n1. Create WebObservation for the support/donate page\n2. For each scheme found:\n - Create DonationScheme with observed_in \u2192 WebObservation\n - Classify using DonationSchemeTypeEnum\n - Extract financial details (amounts, currency, frequency)\n - List benefits provided to donors\n - Note tax deductibility and applicable schemes\n - Assign extraction_confidence\
\ based on clarity\n\n**EXAMPLES**:\n\nSee class examples section for detailed instances.\n"
description: >-
Structured representation of an institutional giving program, including
donation type, financial thresholds, payment frequency, donor benefits,
tax treatment, provider organization, and source observation.
alt_descriptions:
nl: {text: Gestructureerd model van institutionele geefregelingen met bijdragevorm, voordelen, voorwaarden en toezicht., language: nl}
de: {text: Strukturiertes Modell institutioneller Spendenprogramme mit Beitragsform, Vorteilen, Bedingungen und Aufsicht., language: de}
@ -53,6 +51,7 @@ classes:
de: [{literal_form: Spendenprogramm, language: de}]
fr: [{literal_form: dispositif de don, language: fr}]
es: [{literal_form: esquema de donacion, language: es}]
it: [{literal_form: programma di donazione, language: it}]
ar: [{literal_form: برنامج تبرع, language: ar}]
id: [{literal_form: skema donasi, language: id}]
zh: [{literal_form: 捐赠计划, language: zh}]
@ -98,6 +97,8 @@ classes:
has_type:
required: true
range: DonationSchemeTypeEnum
description: Classification for the scheme modality, including membership, patron,
adoption, legacy, direct donation, in-kind, sponsorship, and crowdfunding families.
examples:
- value: MEMBERSHIP_FRIENDS
- value: ADOPTION_BOOK
@ -151,6 +152,7 @@ classes:
- value: Bookplate with donor name
offered_by:
required: true
description: Custodian organization that publishes and administers the scheme.
# range: string # uriorcurie
examples:
- value: https://nde.nl/ontology/hc/custodian/nl/rijksmuseum
@ -173,6 +175,7 @@ classes:
range: TaxScheme
multivalued: true
inlined_as_list: true
description: Applicable fiscal framework for deductibility or tax relief.
examples:
- value:
has_type: ANBI
@ -206,11 +209,14 @@ classes:
- has_percentage:
observed_in:
required: true
description: Source observation used to extract and verify scheme information.
# range: string # uriorcurie
examples:
- value: https://nde.nl/ontology/hc/observation/web/2026-01-01/rijksmuseum-support
comments:
- Each scheme links to WebObservation for full provenance chain
- Common domains include museum friends programs, archive adoption campaigns, and library conservation support
- Capture payment rhythm and thresholds as structured values, not embedded narrative
- Tax deductibility varies by jurisdiction - always document regulated_by_scheme
- Benefits should be extracted as discrete items for comparison
- Tiered schemes (e.g., Silver/Gold/Platinum) are separate DonationScheme instances

@ -1,19 +1,43 @@
id: https://nde.nl/ontology/hc/class/FundingCall
name: FundingCall
title: Funding Call
description: A call for applications for funding. MIGRATED from funding_call slot per Rule 53. Follows CallForApplication class (schema:Offer).
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
- ../classes/CallForApplication
classes:
FundingCall:
class_uri: hc:FundingCall
is_a: CallForApplication
class_uri: schema:Offer
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
description: >-
Public invitation that opens a defined application window for submitting
proposals to a specific funding opportunity.
alt_descriptions:
nl: Openbare oproep die een afgebakende indieningsperiode opent voor voorstellen binnen een specifieke financieringskans.
de: Oeffentliche Ausschreibung mit festem Einreichungszeitraum fuer Antraege auf eine bestimmte Foerdermoeglichkeit.
fr: Appel public ouvrant une periode definie pour soumettre des propositions a une opportunite de financement specifique.
es: Convocatoria publica que abre un periodo definido para presentar propuestas a una oportunidad de financiacion concreta.
ar: دعوة عامة تفتح فترة تقديم محددة لإرسال مقترحات لفرصة تمويل معينة.
id: Undangan publik yang membuka jendela aplikasi terdefinisi untuk pengajuan proposal pada peluang pendanaan tertentu.
zh: 为特定资助机会开启明确申报期的公开征集通知。
structured_aliases:
- literal_form: financieringsoproep
in_language: nl
- literal_form: Foerderaufruf
in_language: de
- literal_form: appel de financement
in_language: fr
- literal_form: convocatoria de financiacion
in_language: es
- literal_form: دعوة تمويل
in_language: ar
- literal_form: panggilan pendanaan
in_language: id
- literal_form: 资助征集
in_language: zh
broad_mappings:
- schema:Offer

@ -1,23 +1,46 @@
id: https://nde.nl/ontology/hc/class/FundingFocus
name: FundingFocus
title: Funding Focus
description: A thematic focus or priority area for funding. MIGRATED from funding_focus slot per Rule 53. Follows skos:Concept.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
classes:
FundingFocus:
class_uri: skos:Concept
class_uri: hc:FundingFocus
description: >-
Thematic priority category used to target funding toward specific policy,
research, or societal objectives.
alt_descriptions:
nl: Thematische prioriteitscategorie die financiering richt op specifieke beleids-, onderzoeks- of maatschappelijke doelen.
de: Thematische Prioritaetskategorie zur Ausrichtung von Foerdermitteln auf bestimmte politische, wissenschaftliche oder gesellschaftliche Ziele.
fr: Categorie de priorite thematique orientant le financement vers des objectifs politiques, de recherche ou societaux specifiques.
es: Categoria de prioridad tematica que orienta la financiacion hacia objetivos politicos, de investigacion o sociales especificos.
ar: فئة أولوية موضوعية لتوجيه التمويل نحو أهداف سياساتية أو بحثية أو مجتمعية محددة.
id: Kategori prioritas tematik yang mengarahkan pendanaan ke tujuan kebijakan, riset, atau sosial tertentu.
zh: 用于将资助导向特定政策、研究或社会目标的主题优先类别。
structured_aliases:
- literal_form: financieringsfocus
in_language: nl
- literal_form: Foerderschwerpunkt
in_language: de
- literal_form: axe de financement
in_language: fr
- literal_form: enfoque de financiacion
in_language: es
- literal_form: محور التمويل
in_language: ar
- literal_form: fokus pendanaan
in_language: id
- literal_form: 资助重点
in_language: zh
slots:
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_label
- has_description
broad_mappings:
- skos:Concept

@ -1,26 +1,50 @@
id: https://nde.nl/ontology/hc/class/FundingProgram
name: FundingProgram
title: Funding Program
description: A program that provides funding, grants, or subsidies. MIGRATED from funding_program slot per Rule 53. Follows frapo:FundingProgramme.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
frapo: http://purl.org/cerif/frapo/
skos: http://www.w3.org/2004/02/skos/core#
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
- ../slots/targeted_at
default_prefix: hc
classes:
FundingProgram:
class_uri: frapo:FundingProgramme
class_uri: hc:FundingProgram
description: >-
Structured funding framework that groups related calls, budget lines, and
eligibility logic under a shared strategic objective.
alt_descriptions:
nl: Gestructureerd financieringskader dat verwante oproepen, budgetlijnen en subsidieregels bundelt onder een gedeeld strategisch doel.
de: Strukturiertes Foerderprogramm, das zusammenhaengende Ausschreibungen, Budgetlinien und Foerderlogiken unter einem gemeinsamen strategischen Ziel buendelt.
fr: Cadre de financement structure regroupant appels, lignes budgetaires et regles d'eligibilite autour d'un objectif strategique commun.
es: Marco de financiacion estructurado que agrupa convocatorias, lineas presupuestarias y logica de elegibilidad bajo un objetivo estrategico comun.
ar: إطار تمويلي منظم يجمع الدعوات وخطوط الميزانية ومنطق الأهلية ضمن هدف استراتيجي مشترك.
id: Kerangka pendanaan terstruktur yang mengelompokkan panggilan, lini anggaran, dan logika kelayakan di bawah tujuan strategis bersama.
zh: 在共同战略目标下整合相关征集、预算条线与资格逻辑的结构化资助框架。
structured_aliases:
- literal_form: financieringsprogramma
in_language: nl
- literal_form: Foerderprogramm
in_language: de
- literal_form: programme de financement
in_language: fr
- literal_form: programa de financiacion
in_language: es
- literal_form: برنامج تمويل
in_language: ar
- literal_form: program pendanaan
in_language: id
- literal_form: 资助计划
in_language: zh
slots:
- has_label
- has_description
- targeted_at
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_label
- has_description
- targeted_at
broad_mappings:
- schema:FundingScheme
close_mappings:
- schema:Grant

@ -1,23 +1,46 @@
id: https://nde.nl/ontology/hc/class/FundingRate
name: FundingRate
title: Funding Rate
description: The rate or percentage of funding provided. MIGRATED from funding_rate slot per Rule 53. Follows schema:MonetaryAmount or Percentage.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_rate
- ../slots/maximum_of_maximum
default_prefix: hc
classes:
FundingRate:
class_uri: schema:MonetaryAmount
class_uri: hc:FundingRate
description: >-
Quantified proportion or cap that determines the share of eligible costs
covered by a funding instrument.
alt_descriptions:
nl: Gekwantificeerd percentage of plafond dat bepaalt welk deel van subsidiabele kosten wordt gedekt.
de: Quantifizierter Satz oder Hoechstwert, der den Anteil foerderfaehiger Kosten festlegt.
fr: Proportion ou plafond quantifie determinant la part des couts eligibles couverte par le financement.
es: Proporcion o tope cuantificado que determina la parte de costos elegibles cubierta por la financiacion.
ar: نسبة أو سقف كمي يحدد حصة التكاليف المؤهلة التي يغطيها التمويل.
id: Proporsi atau batas kuantitatif yang menentukan porsi biaya layak yang ditanggung instrumen pendanaan.
zh: 决定可资助成本覆盖比例或上限的量化比率指标。
structured_aliases:
- literal_form: financieringspercentage
in_language: nl
- literal_form: Foerdersatz
in_language: de
- literal_form: taux de financement
in_language: fr
- literal_form: tasa de financiacion
in_language: es
- literal_form: معدل التمويل
in_language: ar
- literal_form: tingkat pendanaan
in_language: id
- literal_form: 资助比例
in_language: zh
slots:
- has_rate
- maximum_of_maximum
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_rate
- maximum_of_maximum
broad_mappings:
- schema:MonetaryAmount

@ -1,17 +1,15 @@
id: https://nde.nl/ontology/hc/class/FundingRequirement
name: FundingRequirement
title: FundingRequirement Class
title: Funding Requirement
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
pav: http://purl.org/pav/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
- ../enums/FundingRequirementTypeEnum
- ../slots/apply_to
- ../slots/has_note
- ../slots/has_score
@ -25,193 +23,77 @@ imports:
- ../slots/in_section
- ../slots/supersede
- ../slots/temporal_extent
default_prefix: hc
classes:
FundingRequirement:
class_uri: dcterms:Standard
description: "A requirement or criterion that applicants must meet to be eligible for\na funding call. Each requirement is tracked with provenance linking to\nthe source document where it was stated.\n\n**PURPOSE**:\n\nFundingRequirement provides structured, machine-readable representation\nof funding call eligibility criteria. Instead of storing requirements as\nfree-text lists in CallForApplication, each requirement becomes a\ntrackable entity with:\n\n- **Classification**: Categorized by FundingRequirementTypeEnum\n- **Provenance**: Linked to WebObservation documenting source\n- **Values**: Machine-readable value + human-readable text\n- **Temporality**: Valid date range for time-scoped requirements\n\n**PROVENANCE CHAIN**:\n\n```\nCallForApplication\n \u2502\n \u251C\u2500\u2500 requirements \u2500\u2500\u2192 FundingRequirement[]\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 requirement_type: PARTNERSHIP_MINIMUM_PARTNERS\n\
\ \u2502 \u251C\u2500\u2500 requirement_text: \"At least 3 partners from 3 EU countries\"\n \u2502 \u251C\u2500\u2500 requirement_value: \"3\"\n \u2502 \u251C\u2500\u2500 requirement_unit: \"partners\"\n \u2502 \u2502\n \u2502 \u2514\u2500\u2500 observed_in \u2500\u2500\u2192 WebObservation\n \u2502 \u2502\n \u2502 \u251C\u2500\u2500 source_url: https://ec.europa.eu/...\n \u2502 \u251C\u2500\u2500 retrieved_on: 2025-11-29T10:30:00Z\n \u2502 \u2514\u2500\u2500 extraction_confidence: 0.95\n \u2502\n \u2514\u2500\u2500 web_observations \u2500\u2500\u2192 WebObservation[] (general call provenance)\n```\n\n**ONTOLOGY\
\ ALIGNMENT**:\n\n- **Dublin Core**: `dcterms:Standard` - \"A reference point against which\n other things can be evaluated\" (requirements are standards for eligibility)\n- **Dublin Core**: `dcterms:requires` - Relates call to requirement\n- **Dublin Core**: `dcterms:conformsTo` - Applicants must conform to requirements\n- **Schema.org**: `schema:eligibleRegion` - For geographic requirements\n- **Schema.org**: `schema:eligibleQuantity` - For numeric constraints\n- **PROV-O**: `prov:wasDerivedFrom` - Links requirement to observation\n\n**REQUIREMENT CATEGORIES**:\n\nRequirements are classified into six main categories via FundingRequirementTypeEnum:\n\n1. **Eligibility** (ELIGIBILITY_*): Who can apply\n - Geographic: EU Member States, Associated Countries\n - Organizational: Non-profit, public body, SME\n - Heritage type: Museums, archives, libraries\n - Experience: Track record, previous projects\n\n2. **Financial** (FINANCIAL_*): Budget and funding\n - Co-funding: Match\
\ funding percentages\n - Budget limits: Minimum/maximum grant size\n - Funding rate: Percentage of eligible costs\n - Eligible costs: What can be funded\n\n3. **Partnership** (PARTNERSHIP_*): Consortium requirements\n - Minimum partners: Number required\n - Country diversity: Geographic spread\n - Sector mix: Organisation types needed\n - Coordinator: Lead partner constraints\n\n4. **Thematic** (THEMATIC_*): Topic and scope\n - Focus area: Required research/action themes\n - Heritage scope: Types of heritage addressed\n - Geographic scope: Where activities occur\n\n5. **Technical** (TECHNICAL_*): Outputs and approach\n - Deliverables: Required outputs\n - Open access: Publication requirements\n - Duration: Project length constraints\n - Methodology: Required approaches\n\n6. **Administrative** (ADMINISTRATIVE_*): Process requirements\n - Registration: Portal accounts needed\n - Documentation: Supporting documents\n - Language: Submission language\n\
\ - Format: Templates and page limits\n\n**TEMPORAL TRACKING**:\n\nRequirements can change between call publications. The `supersedes` field\nlinks to previous versions, and `valid_from`/`valid_to` scope applicability:\n\n```\nFundingRequirement (current)\n \u2502\n \u251C\u2500\u2500 valid_from: 2025-01-15\n \u251C\u2500\u2500 requirement_value: \"3\" (minimum partners)\n \u2502\n \u2514\u2500\u2500 supersedes \u2500\u2500\u2192 FundingRequirement (previous)\n \u2502\n \u251C\u2500\u2500 valid_from: 2024-01-15\n \u251C\u2500\u2500 valid_to: 2025-01-14\n \u2514\u2500\u2500 requirement_value: \"4\" (was 4 partners)\n```\n\n**EXTRACTION PATTERN**:\n\nWhen extracting requirements from web sources:\n\n1. Create WebObservation for the source page\n2. For each requirement found:\n - Create FundingRequirement with observed_in \u2192 WebObservation\n\
\ - Classify using FundingRequirementTypeEnum\n - Extract machine-readable value and unit\n - Record source_section for traceability\n - Assign extraction_confidence based on clarity\n\n**EXAMPLES**:\n\n1. **Partnership Requirement**\n - requirement_type: PARTNERSHIP_MINIMUM_PARTNERS\n - requirement_text: \"Minimum 3 independent legal entities from 3 different EU Member States\"\n - requirement_value: \"3\"\n - requirement_unit: \"partners\"\n - is_mandatory: true\n \n2. **Financial Requirement**\n - requirement_type: FINANCIAL_COFUNDING\n - requirement_text: \"Co-funding of minimum 25% from non-EU sources required\"\n - requirement_value: \"25\"\n - requirement_unit: \"percent\"\n - is_mandatory: true\n \n3. **Open Access Requirement**\n - requirement_type: TECHNICAL_OPEN_ACCESS\n - requirement_text: \"All peer-reviewed publications must be open access (Plan S compliant)\"\n - requirement_value: \"immediate\"\n - is_mandatory: true\n"
exact_mappings:
- dcterms:Standard
close_mappings:
- schema:QuantitativeValue
- skos:Concept
related_mappings:
- dcterms:requires
- dcterms:conformsTo
- schema:eligibleRegion
- schema:eligibleQuantity
- prov:wasDerivedFrom
class_uri: hc:FundingRequirement
description: >-
Eligibility or compliance criterion that must be satisfied for a proposal
to qualify under a specific funding call.
alt_descriptions:
nl: Subsidiabiliteits- of nalevingscriterium waaraan een voorstel moet voldoen om in aanmerking te komen binnen een specifieke oproep.
de: Eignungs- oder Compliance-Kriterium, das fuer die Foerderfaehigkeit eines Antrags in einem bestimmten Aufruf erfuellt sein muss.
fr: Critere d'eligibilite ou de conformite devant etre satisfait pour qu'une proposition soit recevable dans un appel donne.
es: Criterio de elegibilidad o cumplimiento que debe satisfacerse para que una propuesta califique en una convocatoria especifica.
ar: معيار أهلية أو امتثال يجب استيفاؤه لكي يتأهل المقترح ضمن دعوة تمويل محددة.
id: Kriteria kelayakan atau kepatuhan yang harus dipenuhi agar proposal memenuhi syarat pada panggilan pendanaan tertentu.
zh: 在特定资助征集中,提案必须满足的资格或合规条件。
structured_aliases:
- literal_form: financieringsvoorwaarde
in_language: nl
- literal_form: Foerdervoraussetzung
in_language: de
- literal_form: condition de financement
in_language: fr
- literal_form: requisito de financiacion
in_language: es
- literal_form: شرط التمويل
in_language: ar
- literal_form: persyaratan pendanaan
in_language: id
- literal_form: 资助要求
in_language: zh
slots:
- apply_to
- has_note
- mandatory
- observed_in
- identified_by
- has_text
- has_type
- has_type
- has_measurement_unit
- has_value
- in_section
- supersede
- has_score
- temporal_extent
- apply_to
- has_note
- mandatory
- observed_in
- identified_by
- has_text
- has_type
- has_measurement_unit
- has_value
- in_section
- supersede
- has_score
- temporal_extent
slot_usage:
identified_by:
identifier: true
required: true
# range: string # uriorcurie
pattern: ^https://nde\.nl/ontology/hc/requirement/[a-z0-9-]+/[a-z0-9-]+$
examples:
- value: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/min-partners-3
- value: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/cofunding-25pct
has_type:
required: false
range: FundingRequirementTypeEnum
deprecated: 'DEPRECATED 2026-01-13: Use has_type with RequirementType class instead'
examples:
- value: PARTNERSHIP_MINIMUM_PARTNERS
- value: FINANCIAL_COFUNDING
- value: ELIGIBILITY_GEOGRAPHIC
has_type:
required: true
range: RequirementType
examples:
- value:
has_code: PARTNERSHIP_MINIMUM_PARTNERS
has_label:
- Minimum partners requirement@en
- value:
has_code: FINANCIAL_COFUNDING
has_label:
- Co-funding requirement@en
has_text:
required: true
# range: string
examples:
- value: Minimum 3 independent legal entities from 3 different EU Member States or Horizon Europe Associated Countries
- value: Applications must demonstrate at least 25% co-funding from non-EU sources
has_value:
# range: string
examples:
- value: '3'
- value: '25'
- value: eu-member-states
- value: immediate
has_measurement_unit:
# range: string
examples:
- value: partners
- value: percent
- value: EUR
- value: months
- value: countries
mandatory:
range: boolean
ifabsent: 'true'
examples:
- value: true
description: 'Mandatory: must meet to be eligible'
- value: false
description: 'Optional: preferred but not required'
observed_in:
required: true
# range: string # uriorcurie
examples:
- value: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
in_section:
# range: string
examples:
- value: Section 2.1 - Eligibility Criteria
- value: 'FAQ #7 - Consortium composition'
- value: Work Programme page 45
supersede:
# range: string # uriorcurie
examples:
- value: https://nde.nl/ontology/hc/requirement/ec-cl2-2024-heritage-01/min-partners-4
comments:
- Each requirement links to WebObservation for full provenance chain
- requirement_value + requirement_unit enable structured queries
- is_mandatory defaults to true; explicitly set false for optional requirements
- supersedes_or_superseded creates version chain for requirement changes
- extraction_confidence can differ from observation confidence
see_also:
- https://dublincore.org/specifications/dublin-core/dcmi-terms/#Standard
- https://schema.org/QuantitativeValue
- https://www.w3.org/TR/prov-o/#Entity
- http://purl.org/pav/
examples:
- value:
requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/min-partners-3-countries
requirement_type: PARTNERSHIP_MINIMUM_PARTNERS
requirement_text: Proposals must be submitted by a consortium of at least 3 independent legal entities established in 3 different EU Member States or Horizon Europe Associated Countries.
requirement_value: '3'
requirement_unit: partners
is_mandatory: true
apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
source_section: Section 2 - Eligibility Conditions
has_score:
has_score: 0.98
has_note: Clear statement in eligibility section. Standard Horizon Europe RIA requirement.
- value:
requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/cofunding-for-profit
requirement_type: FINANCIAL_COFUNDING
requirement_text: For-profit entities receive 70% funding rate. The remaining 30% must be covered by co-funding or own resources.
requirement_value: '30'
requirement_unit: percent
is_mandatory: true
apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
source_section: Section 3 - Financial Conditions
has_score:
has_score: 0.95
has_note: Applies only to for-profit partners. Non-profits receive 100% funding.
- value:
requirement_id: https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/open-access
requirement_type: TECHNICAL_OPEN_ACCESS
requirement_text: Beneficiaries must ensure open access to peer-reviewed scientific publications under the conditions required by the Grant Agreement. Immediate open access is mandatory (no embargo period).
requirement_value: immediate
requirement_unit: null
is_mandatory: true
apply_to: https://nde.nl/ontology/hc/call/ec/cl2-2025-heritage-01
observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-29/eu-horizon-cl2-heritage
source_section: Section 4.2 - Open Science
has_score:
has_score: 0.99
has_note: Standard Horizon Europe open access requirement. Plan S compliant.
- value:
requirement_id: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/uk-based
requirement_type: ELIGIBILITY_GEOGRAPHIC
requirement_text: Your organisation must be based in the UK (England, Northern Ireland, Scotland or Wales). Projects must take place in the UK.
requirement_value: UK
requirement_unit: country
is_mandatory: true
apply_to: https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4
observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants
source_section: Eligibility
has_score:
has_score: 0.99
has_note: Clear UK-only restriction. Devolved nations explicitly included.
- value:
requirement_id: https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/non-profit
requirement_type: ELIGIBILITY_ORGANIZATIONAL
requirement_text: We can fund not-for-profit organisations, including charities, community groups, local authorities, and social enterprises. Private individuals and for-profit companies are not eligible.
requirement_value: non-profit
requirement_unit: organization-type
is_mandatory: true
apply_to: https://nde.nl/ontology/hc/call/nlhf/medium-grants-2025-q4
observed_in: https://nde.nl/ontology/hc/observation/web/2025-11-28/nlhf-medium-grants
source_section: Who can apply
has_score:
has_score: 0.95
has_note: Explicitly excludes for-profit. Social enterprises may need verification.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- value:
identified_by: https://nde.nl/ontology/hc/requirement/ec-call/minimum-partners
has_text: Minimum 3 independent legal entities from 3 different countries.
has_value: '3'
has_measurement_unit: partners
mandatory: true
description: Consortium size threshold requirement
- value:
identified_by: https://nde.nl/ontology/hc/requirement/ec-call/open-access
has_text: Immediate open access publication is required.
mandatory: true
description: Technical dissemination requirement
exact_mappings:
- dcterms:Standard
related_mappings:
- dcterms:requires
- dcterms:conformsTo
- schema:eligibleQuantity
- prov:wasDerivedFrom
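
The `identified_by` pattern in the `FundingRequirement` slot_usage above can be checked against the example identifiers with a short script. This is a minimal sketch, not part of the commit: the helper name `is_valid_requirement_id` is hypothetical, and using a strict full match is an assumption, since the diff does not state how the schema's validator applies `pattern`.

```python
import re

# Pattern copied verbatim from slot_usage.identified_by.pattern in FundingRequirement.
REQUIREMENT_ID_PATTERN = re.compile(
    r"^https://nde\.nl/ontology/hc/requirement/[a-z0-9-]+/[a-z0-9-]+$"
)


def is_valid_requirement_id(value: str) -> bool:
    """Hypothetical helper: True if value is <base>/<call-slug>/<requirement-slug>.

    fullmatch is used (an assumption) so that trailing newlines or extra
    path segments are rejected.
    """
    return REQUIREMENT_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # The first three identifiers are taken from the examples in the schema;
    # the last two are made-up negative cases for illustration.
    candidates = [
        "https://nde.nl/ontology/hc/requirement/ec-cl2-2025-heritage-01/min-partners-3",
        "https://nde.nl/ontology/hc/requirement/nlhf-medium-2025/cofunding-25pct",
        "https://nde.nl/ontology/hc/requirement/ec-call/minimum-partners",
        "https://nde.nl/ontology/hc/requirement/EC-Call/minimum-partners",
        "https://nde.nl/ontology/hc/requirement/ec-call",
    ]
    for candidate in candidates:
        status = "ok" if is_valid_requirement_id(candidate) else "rejected"
        print(f"{status}: {candidate}")
```

Both slugs are restricted to lowercase letters, digits, and hyphens, so an identifier with uppercase characters or a missing requirement segment is rejected.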

@ -1,30 +1,46 @@
id: https://nde.nl/ontology/hc/class/FundingScheme
name: FundingScheme
title: Funding Scheme
description: A scheme or program providing funding. MIGRATED from funding_scheme slot per Rule 53. Follows schema:FundingScheme.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
classes:
FundingScheme:
class_uri: schema:FundingScheme
class_uri: hc:FundingScheme
description: >-
Rule-governed financing arrangement defining how resources are allocated,
evaluated, and distributed to eligible applicants.
alt_descriptions:
nl: Regeling met regels voor toewijzing, beoordeling en uitkering van middelen aan subsidiabele aanvragers.
de: Regelgebundene Finanzierungsregelung zur Zuweisung, Bewertung und Verteilung von Mitteln an foerderfaehige Antragstellende.
fr: Dispositif de financement reglemente definissant l'allocation, l'evaluation et la distribution des ressources aux candidats eligibles.
es: Esquema de financiacion con reglas que define asignacion, evaluacion y distribucion de recursos a solicitantes elegibles.
ar: نظام تمويلي قائم على قواعد يحدد كيفية تخصيص الموارد وتقييمها وتوزيعها على المتقدمين المؤهلين.
id: Skema pembiayaan berbasis aturan yang menentukan alokasi, evaluasi, dan distribusi sumber daya kepada pelamar yang memenuhi syarat.
zh: 定义资源分配、评审与发放给合格申请者方式的规则化资助机制。
structured_aliases:
- literal_form: financieringsregeling
in_language: nl
- literal_form: Foerderschema
in_language: de
- literal_form: dispositif de financement
in_language: fr
- literal_form: esquema de financiacion
in_language: es
- literal_form: مخطط التمويل
in_language: ar
- literal_form: skema pendanaan
in_language: id
- literal_form: 资助机制
in_language: zh
slots:
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_label
- has_description
broad_mappings:
- schema:FundingScheme

@ -1,33 +1,48 @@
id: https://nde.nl/ontology/hc/class/FundingSource
name: FundingSource
title: Funding Source
description: A source of funding, such as an organization or grant program. MIGRATED from funding_source slot per Rule 53. Follows frapo:FundingAgency.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
frapo: http://purl.org/cerif/frapo/
skos: http://www.w3.org/2004/02/skos/core#
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
- ../slots/has_type
default_prefix: hc
classes:
FundingSource:
class_uri: frapo:FundingAgency
class_uri: hc:FundingSource
description: >-
Originating organization or mechanism from which financial support is
provided.
alt_descriptions:
nl: Organisatie of mechanisme van waaruit financiele ondersteuning afkomstig is.
de: Herkunftsorganisation oder Mechanismus, aus dem finanzielle Unterstuetzung bereitgestellt wird.
fr: Organisation ou mecanisme d'origine a partir duquel le soutien financier est fourni.
es: Organizacion o mecanismo de origen desde el cual se proporciona apoyo financiero.
ar: الجهة أو الآلية المصدِّرة التي يُقدَّم منها الدعم المالي.
id: Organisasi atau mekanisme asal dari mana dukungan finansial diberikan.
zh: 提供资金支持的来源组织或机制。
structured_aliases:
- literal_form: financieringsbron
in_language: nl
- literal_form: Finanzierungsquelle
in_language: de
- literal_form: source de financement
in_language: fr
- literal_form: fuente de financiacion
in_language: es
- literal_form: مصدر التمويل
in_language: ar
- literal_form: sumber pendanaan
in_language: id
- literal_form: 资金来源
in_language: zh
slots:
- has_label
- has_description
- has_type
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_label
- has_description
- has_type
broad_mappings:
- schema:Organization

@ -1,18 +1,46 @@
id: https://w3id.org/nde/ontology/Fylkesarkiv
name: Fylkesarkiv
title: Fylkesarkiv (Norwegian County Archive)
title: Fylkesarkiv
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
wd: http://www.wikidata.org/entity/
default_prefix: hc
imports:
- linkml:types
- ../classes/ArchiveOrganizationType
classes:
Fylkesarkiv:
class_uri: hc:Fylkesarkiv
is_a: ArchiveOrganizationType
class_uri: skos:Concept
description: "Norwegian county archive (fylkesarkiv). These archives serve as regional\narchival institutions at the county (fylke) level in Norway.\n\n**Wikidata**: Q15119463\n\n**Geographic Restriction**: Norway (NO) only.\nThis constraint is enforced via LinkML `rules` with `postconditions`.\n\n**Scope**:\nFylkesarkiv preserve:\n- County administration records (fylkeskommunen)\n- Municipal records from constituent kommuner\n- Regional health and social services documentation\n- Education records (videreg\xE5ende skole)\n- Cultural affairs and heritage documentation\n- Private archives from regional businesses and organizations\n\n**Administrative Context**:\nIn the Norwegian archival system:\n- Arkivverket (National Archives of Norway)\n- Fylkesarkiv (county level) \u2190 This type\n- Kommunearkiv/Byarkiv (municipal level)\n- Interkommunale arkiv (inter-municipal archives)\n\n**Historical Context**:\nNorway has reorganized its counties (2020 regional reform):\n- Some fylkesarkiv have\
\ merged following county mergers\n- County archives serve both historical fylker and new regions\n- Arkivverket coordinates national archival policy\n\n**Related Types**:\n- Landsarkiv - Regional state archives (under Arkivverket)\n- RegionalArchive (Q27032392) - Generic regional archives\n- CountyArchive - Generic county-level archives\n"
slot_usage: {}
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
description: >-
Regional archival institution at Norwegian county level responsible for
preserving and providing access to county and related local documentation.
alt_descriptions:
nl: Regionale archiefinstelling op Noors provinciaal niveau die provinciale en gerelateerde lokale documentatie bewaart en toegankelijk maakt.
de: Regionale Archiveinrichtung auf norwegischer Kreisebene zur Bewahrung und Bereitstellung von Kreis- und lokalbezogener Dokumentation.
fr: Institution archivistique regionale au niveau des comtes norvegiens, chargee de conserver et diffuser la documentation comtale et locale associee.
es: Institucion archivistica regional a nivel de condado noruego responsable de preservar y facilitar documentacion del condado y ambito local relacionado.
ar: مؤسسة أرشيفية إقليمية على مستوى المقاطعات في النرويج مسؤولة عن حفظ وإتاحة الوثائق الإدارية والإقليمية ذات الصلة.
id: Lembaga arsip regional tingkat county di Norwegia yang bertanggung jawab melestarikan dan menyediakan akses dokumentasi county serta lokal terkait.
zh: 挪威郡级区域档案机构,负责保存并提供郡级及相关地方文献的访问。
structured_aliases:
- literal_form: Noors provinciaal archief
in_language: nl
- literal_form: norwegisches Kreisarchiv
in_language: de
- literal_form: archives de comte norvegien
in_language: fr
- literal_form: archivo condal noruego
in_language: es
- literal_form: أرشيف المقاطعة النرويجي
in_language: ar
- literal_form: arsip county Norwegia
in_language: id
- literal_form: 挪威郡档案馆
in_language: zh
exact_mappings:
- wd:Q15119463
broad_mappings:
- schema:ArchiveOrganization

@ -1,23 +1,46 @@
id: https://nde.nl/ontology/hc/class/GBIFIdentifier
name: GBIFIdentifier
title: GBIF Identifier
description: Global Biodiversity Information Facility (GBIF) identifier. MIGRATED from gbif_id slot per Rule 53. Follows dwc:occurrenceID.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dwc: http://rs.tdwg.org/dwc/terms/
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
- ./Identifier
classes:
GBIFIdentifier:
is_a: Identifier
class_uri: hc:GBIFIdentifier
is_a: Identifier
description: >-
Persistent identifier used to reference a biodiversity occurrence record in
GBIF-linked data pipelines.
alt_descriptions:
nl: Persistente identifier voor verwijzing naar een biodiversiteitswaarneming in GBIF-gekoppelde datastromen.
de: Persistenter Identifikator zur Referenzierung eines Biodiversitaetsnachweises in GBIF-verbundenen Datenablaeufen.
fr: Identifiant persistant utilise pour referencer un enregistrement d'occurrence de biodiversite dans des flux de donnees lies a GBIF.
es: Identificador persistente para referenciar un registro de ocurrencia de biodiversidad en flujos de datos vinculados a GBIF.
ar: معرّف دائم للإشارة إلى سجل وقوع للتنوع الحيوي ضمن مسارات بيانات مرتبطة بـ GBIF.
id: Pengidentifikasi persisten untuk merujuk catatan kejadian keanekaragaman hayati dalam alur data terkait GBIF.
zh: 用于在 GBIF 关联数据流程中引用生物多样性出现记录的持久标识符。
structured_aliases:
- literal_form: GBIF-id
in_language: nl
- literal_form: GBIF-Kennung
in_language: de
- literal_form: identifiant GBIF
in_language: fr
- literal_form: identificador GBIF
in_language: es
- literal_form: معرف GBIF
in_language: ar
- literal_form: pengenal GBIF
in_language: id
- literal_form: GBIF 标识符
in_language: zh
broad_mappings:
- schema:PropertyValue
close_mappings:
- dwc:occurrenceID
description: A persistent identifier for a biodiversity occurrence record.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- dwc:occurrenceID

@ -1,22 +1,23 @@
id: https://nde.nl/ontology/hc/class/GHCIdentifier
name: GHCIdentifier
title: Global Heritage Custodian Identifier
description: The Global Heritage Custodian Identifier (GHCID). MIGRATED from ghcid slot per Rule 53. Follows dcterms:identifier.
title: GHC Identifier Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
dcterms: http://purl.org/dc/terms/
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
classes:
GHCIdentifier:
is_a: Identifier
class_uri: hc:GHCIdentifier
description: Persistent identifier assigned to a heritage custodian in the GHCID namespace.
close_mappings:
- dcterms:identifier
description: 'A persistent, unique identifier for a heritage custodian. Format: CC-RR-LLL-T-ABBREVIATION'
- dcterms:Identifier
related_mappings:
- dcterms:identifier
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.35
specificity_rationale: Core persistent identifier class for custodian identity resolution.
custodian_types: '["*"]'

@ -1,30 +1,51 @@
id: https://nde.nl/ontology/hc/class/Gallery
name: Gallery
title: Gallery
description: An exhibition space or art gallery. MIGRATED from gallery_type_classification context. Follows schema:ArtGallery.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
- ../slots/has_type
default_prefix: hc
classes:
Gallery:
class_uri: schema:ArtGallery
class_uri: hc:Gallery
description: >-
Institution or venue dedicated to exhibiting visual art through curated
programs.
alt_descriptions:
nl: Instelling of locatie gewijd aan het tonen van beeldende kunst via gecureerde programma's.
de: Einrichtung oder Ort, der sich der Praesentation bildender Kunst in kuratierten Programmen widmet.
fr: Institution ou lieu dedie a l'exposition d'arts visuels au moyen de programmes cures.
es: Institucion o espacio dedicado a exhibir artes visuales mediante programas curados.
ar: مؤسسة أو فضاء مخصص لعرض الفنون البصرية عبر برامج تقييم/تنسيق فني.
id: Institusi atau tempat yang didedikasikan untuk memamerkan seni visual melalui program kurasi.
zh: 通过策展项目展示视觉艺术的机构或场所。
structured_aliases:
- literal_form: galerie
in_language: nl
- literal_form: Galerie
in_language: de
- literal_form: galerie d'art
in_language: fr
- literal_form: galeria de arte
in_language: es
- literal_form: معرض فني
in_language: ar
- literal_form: galeri seni
in_language: id
- literal_form: 美术馆
in_language: zh
slots:
- has_label
- has_description
- has_type
- has_label
- has_description
- has_type
slot_usage:
has_type:
# range: string # uriorcurie
required: true
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
exact_mappings:
- schema:ArtGallery

@ -1,216 +1,70 @@
id: https://nde.nl/ontology/hc/class/GalleryType
name: GalleryType
title: Gallery Type Classification
title: Gallery Type
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
- ../enums/GalleryTypeEnum
- ../slots/has_hypernym
- ../slots/identified_by # was: wikidata_entity
- ../slots/has_model # was: exhibition_model
- ./CustodianType
- ../slots/identified_by
- ../slots/has_model
- ../slots/has_objective
- ../slots/has_percentage
- ../slots/has_score # was: template_specificity
- ../slots/has_score
- ../slots/has_service
- ../slots/has_type
- ../slots/include # was: gallery_subtype
- ../slots/categorized_as # was: exhibition_focus
- ../slots/include
- ../slots/represent
- ../slots/has_activity
- ../slots/take_comission
classes:
GalleryType:
class_uri: hc:GalleryType
is_a: CustodianType
class_uri: skos:Concept
annotations:
skos:prefLabel: Gallery
description: >-
Controlled taxonomy root for classifying gallery organizational models,
exhibition strategies, and commercial posture.
alt_descriptions:
nl: Gecontroleerde taxonomiewortel voor classificatie van galerie-organisatiemodellen, tentoonstellingsstrategieën en commerciële oriëntatie.
de: Kontrollierte Taxonomie-Wurzel zur Klassifikation von Galerie-Organisationsmodellen, Ausstellungsstrategien und kommerzieller Ausrichtung.
fr: Racine taxonomique controlee pour classifier les modeles organisationnels de galeries, les strategies d'exposition et le positionnement commercial.
es: Raiz taxonomica controlada para clasificar modelos organizativos de galeria, estrategias expositivas y orientacion comercial.
ar: جذر تصنيفي مضبوط لتصنيف نماذج تنظيم المعارض واستراتيجيات العرض والاتجاه التجاري.
id: Akar taksonomi terkendali untuk mengklasifikasikan model organisasi galeri, strategi pameran, dan orientasi komersial.
zh: 用于分类画廊组织形态、展览策略与商业定位的受控分类根节点。
structured_aliases:
- literal_form: galerie
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: galerijen
predicate: EXACT_SYNONYM
in_language: nl
- literal_form: gallery
predicate: EXACT_SYNONYM
in_language: en
- literal_form: galleries
predicate: EXACT_SYNONYM
in_language: en
- literal_form: art gallery
predicate: EXACT_SYNONYM
in_language: en
- literal_form: Galerie
predicate: EXACT_SYNONYM
in_language: de
- literal_form: Galerien
predicate: EXACT_SYNONYM
in_language: de
- literal_form: kunsthalle
predicate: EXACT_SYNONYM
in_language: de
- literal_form: galeria
predicate: EXACT_SYNONYM
in_language: es
- literal_form: galerías
predicate: EXACT_SYNONYM
in_language: es
- literal_form: galleria
predicate: EXACT_SYNONYM
in_language: it
- literal_form: gallerie
predicate: EXACT_SYNONYM
in_language: it
- literal_form: galeria
predicate: EXACT_SYNONYM
in_language: pt
- literal_form: galerias
predicate: EXACT_SYNONYM
in_language: pt
- literal_form: galerie
predicate: EXACT_SYNONYM
in_language: fr
- literal_form: galeries
predicate: EXACT_SYNONYM
in_language: fr
description: "Specialized custodian type for art galleries - institutions that exhibit\nand sometimes sell visual artworks,\
\ providing public access to contemporary\nor historical art through temporary or rotating exhibitions.\n\n**Wikidata\
\ Base Concept**: Q1007870 (art gallery)\n\n**Scope**:\nGalleries are distinguished by their focus on:\n- Exhibition-oriented\
\ (not collection-based like museums)\n- Contemporary or recent art (not historical artifacts)\n- Temporary exhibitions\
\ (rotating shows, not permanent displays)\n- Artist representation (commercial) or kunsthalle model (non-commercial)\n\
- Visual arts (paintings, sculptures, photography, installations)\n\n**Key Gallery Subtypes** (78+ extracted from Wikidata):\n\
\n**By Business Model**:\n- Commercial art galleries (Q56856618) - For-profit, sell artworks, represent artists\n- Noncommercial\
\ art galleries (Q67165238) - Exhibition-only, no sales\n- Kunsthalle (Q1475403) - German model, temporary exhibitions,\
\ no permanent collection\n- Vanity galleries (Q17111940) - Charge artists for exhibition space\n- National galleries\
\ (Q3844310) - State-run, representative of nation\n\n**By Subject Specialization**:\n- Photography galleries (Q114023739)\
\ - Photographic art exhibitions\n- Photo galleries (Q12303444) - Physical or digital photograph collections\n- Photography\
\ centres (Q11900212) - Dedicated photography venues\n- Photothèques (Q135926044) - Photographic heritage preservation\n\
- Sculpture gardens (Q1759852) - Outdoor sculpture exhibitions\n- Jewellery galleries (Q117072343) - Jewelry and decorative\
\ arts\n- Design galleries (Q127346204) - Design and applied arts\n- Map galleries (Q125501487) - Cartographic art exhibitions\n\
- Print rooms (Q445396) - Prints, drawings, watercolors, photographs\n\n**By Organizational Model**:\n- Artist-run centres\
\ (Q4801243) - Managed and directed by artists\n- Artist-run initiatives (Q3325736) - Gallery operated by artists\n\
- Artist-run spaces (Q4034417) - Organizations initiated by artists\n- Artist cooperatives (Q4801240) - Jointly owned\
\ by artist members\n- Canadian artist-run centres (Q16020664) - Canada-specific model (1960s+)\n\n**By Art Period Focus**:\n\
- Contemporary art galleries (Q16038801) - Current/recent art\n- Modern art galleries (Q3757717) - Modernist period\
\ (late 19th-20th century)\n- Contemporary arts centres (Q2945053) - Focus on contemporary practice\n- National centres\
\ for contemporary art (Q109017987) - State contemporary art venues\n\n**By Venue Type**:\n- Alternative exhibition\
\ spaces (Q16002704) - Non-traditional venues\n- Arts venues (Q15090615) - Places for artistic works display/performance\n\
- Arts centers (Q2190251) - Community centers for arts\n- Cast collections (Q29380643) - Plaster cast galleries (educational)\n\
- Plaster cast galleries (Q3768550) - Sculpture reproduction collections\n\n**By Artist Association**:\n- Artist museums\
\ (Q1747681) - Dedicated to particular artist\n- Artist houses (Q1797122) - Buildings with artist work rooms\n- Art\
\ colonies (Q1558054) - Places where artists live and interact\n- Art communes (Q4797182) - Communal living focused\
\ on art creation\n- Studio houses (Q2699076) - Residential spaces with studio facilities\n\n**Online & Digital**:\n\
- Online art galleries (Q7094057) - Digital exhibition platforms\n- Galeries Fnac (Q109038036) - French retail chain\
\ photo galleries (1970s+)\n\n**Specialized Formats**:\n- Pinacotheca (Q740437) - Public art gallery (classical term)\n\
- Print rooms (Q445396) - Graphic arts collections\n- Photograph collections (Q130486108) - Photography collections\n\
\n**French Model**:\n- Scientific, technical, and industrial culture centers (Q2946053) - Popular science venues\n\n\
**Cultural Context**:\n- Arts and Culture Centres (Q4801491) - Newfoundland & Labrador system (Canada)\n- Houses of\
\ culture (Q5061188) - Cultural institutions in socialist/social democratic contexts\n- Houses of literature (Q27908105)\
\ - Cultural institutions for written art\n- Centrum Beeldende Kunst (Q2104985) - Dutch visual arts centers\n\n**Supporting\
\ Organizations**:\n- Not-for-profit arts organizations (Q7062022) - Nonprofit arts foundations\n- Art institutions\
\ (Q20897549) - Organizations dedicated to art\n- Cultural institutions (Q3152824) - Preservation/promotion of culture\n\
\n**Commercial vs. Non-Commercial Distinction**:\n\n**Commercial Galleries**:\n- Represent artists (exclusive or non-exclusive\
\ contracts)\n- Sell artworks (earn commission on sales)\n- Participate in art fairs\n- Primary market (new works) or\
\ secondary market (resale)\n\n**Non-Commercial Galleries** (Kunsthalle model):\n- No permanent collection\n- Exhibition-only\
\ mission\n- Public or nonprofit funding\n- Educational/cultural programming\n- No artwork sales\n\n**RDF Serialization\
\ Example**:\n```turtle\n:Custodian_KunsthalRotterdam\n org:classification :GalleryType_Kunsthalle_Q1475403 .\n\n\
:GalleryType_Kunsthalle_Q1475403\n a glamtype:GalleryType, crm:E55_Type, skos:Concept ;\n skos:prefLabel \"Kunsthalle\"\
@en, \"kunsthalle\"@nl, \"Kunsthalle\"@de ;\n skos:broader :GalleryType_ArtGallery_Q1007870 ;\n schema:additionalType\
\ <http://www.wikidata.org/entity/Q1475403> ;\n glamtype:glamorcubesfixphdnt_code \"GALLERY\" ;\n glamtype:has_objective\
\ false ;\n glamtype:exhibition_focus \"contemporary art\" ;\n glamtype:sales_activity false ;\n glamtype:exhibition_model\
\ \"temporary rotating exhibitions\" .\n```\n\n**Domain-Specific Properties**:\nThis class adds gallery-specific metadata\
\ beyond base CustodianType:\n- `has_objective` - Structured profit objective (commercial/nonprofit/mixed)\n- `artist_representation`\
\ - Artists represented by gallery (for commercial galleries)\n- `exhibition_focus` - Type of art exhibited (contemporary,\
\ modern, photography, etc.)\n- `sales_activity` - Whether gallery sells artworks (not just exhibits)\n- `exhibition_model`\
\ - Exhibition strategy (temporary, rotating, curated shows)\n- `has_service` - Art sales service with commission structure (ArtSaleService)\n\n**Getty AAT Integration**:\nThe Getty Art & Architecture Thesaurus provides standardized\
\ vocabulary:\n- aat:300005768 - art galleries (institutions)\n- aat:300240057 - commercial galleries\n- aat:300240058\
\ - nonprofit galleries\n- aat:300005741 - kunsthalles\n\n**Art Market Context**:\nCommercial galleries operate in the\
\ art market ecosystem:\n- **Primary market**: Representing living artists, first sales\n- **Secondary market**: Resale\
\ of works by established artists\n- **Art fairs**: Participation in international art fairs (Basel, Frieze, etc.)\n\
- **Auction houses**: Different from galleries (auction vs. consignment model)\n\n**Data Population**:\nGallery subtypes\
\ extracted from 78 Wikidata entities with type='G'\nin `data/wikidata/GLAMORCUBEPSXHFN/hyponyms_curated_full.yaml`.\n"
- literal_form: galerietype
in_language: nl
- literal_form: Galerietyp
in_language: de
- literal_form: type de galerie
in_language: fr
- literal_form: tipo de galería
in_language: es
- literal_form: نوع المعرض الفني
in_language: ar
- literal_form: tipe galeri
in_language: id
- literal_form: 画廊类型
in_language: zh
slots:
- represent
# REMOVED 2026-01-22: commercial_operation - migrated to has_objective + Profit (Rule 53)
- has_objective
# REMOVED 2026-01-22: commission_rate - migrated to has_service + ArtSaleService (Rule 53)
- has_service
- has_type
- has_type # was: exhibition_focus - migrated per Rule 53 (2026-01-26)
- has_model # was: exhibition_model - migrated per Rule 53 (2026-01-26)
- include # was: gallery_subtype - migrated per Rule 53 (2026-01-26)
- has_activity
- has_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
- identified_by # was: wikidata_entity - migrated per Rule 53 (2026-01-16)
- represent
- has_objective
- has_service
- has_type
- has_model
- include
- has_activity
- has_score
- identified_by
slot_usage:
identified_by: # was: wikidata_entity - migrated per Rule 53 (2026-01-16)
pattern: ^Q[0-9]+$
identified_by:
required: true
has_hypernym:
range: GalleryType
required: false
has_type:
equals_expression: '["hc:GalleryType"]'
has_type: # was: exhibition_focus - migrated per Rule 53 (2026-01-26)
# range: string
has_model: # was: exhibition_model - migrated per Rule 53 (2026-01-26)
# range: string
include: # was: gallery_subtype - migrated per Rule 53 (2026-01-26)
equals_string: hc:GalleryType
include:
range: GalleryType
any_of:
- range: CommercialGallery
- range: NonProfitGallery
- range: ArtistRunSpace
- range: Kunsthalle
required: false
exact_mappings:
- skos:Concept
- schema:ArtGallery
close_mappings:
- crm:E55_Type
- aat:300005768
related_mappings:
- aat:300240057
- aat:300240058
comments:
- GalleryType implements SKOS-based classification for art gallery organizations
- Distinguishes commercial (sales-oriented) from non-commercial (kunsthalle) models
- Supports 78+ Wikidata gallery subtypes with multilingual labels
- Getty AAT integration for art market terminology
- 'Artist-run initiatives: Canadian model (1960s+), cooperative ownership'
examples:
- value:
identified_by: https://nde.nl/ontology/hc/type/gallery/Q1475403
has_type_code: GALLERY
has_label:
- Kunsthalle@en
- kunsthalle@nl
- Kunsthalle@de
has_description: facility that mounts temporary art exhibitions without permanent collection # was: type_description - migrated per Rule 53/56 (2026-01-16)
custodian_type_broader: https://nde.nl/ontology/hc/type/gallery/Q1007870
# MIGRATED 2026-01-22: commercial_operation → has_objective + Profit (Rule 53)
has_objective:
has_type: contemporary art
sales_activity: false
has_model: temporary rotating exhibitions, no permanent collection
- value:
identified_by: https://nde.nl/ontology/hc/type/gallery/Q56856618
has_type_code: GALLERY
has_label:
- Commercial Art Gallery@en
- kunstgalerie@nl
has_description: for-profit gallery that sells artworks and represents artists # was: type_description - migrated per Rule 53/56 (2026-01-16)
custodian_type_broader: https://nde.nl/ontology/hc/type/gallery/Q1007870
# MIGRATED 2026-01-22: commercial_operation → has_objective + Profit (Rule 53)
has_objective:
represents_or_represented:
- has_label: Artist A
- has_label: Artist B
- has_label: Artist C
has_type: contemporary painting and sculpture
sales_activity: true
has_model: curated exhibitions of represented artists
# MIGRATED 2026-01-22: commission_rate → has_service + ArtSaleService (Rule 53)
has_service:
sales_activity: true
takes_or_took_comission:
has_percentage:
broad_mappings:
- skos:Concept

@ -1,38 +1,36 @@
id: https://nde.nl/ontology/hc/class/GalleryTypes
name: GalleryTypes
title: Gallery Type Subclasses
description: Concrete subclasses of GalleryType. MIGRATED from gallery_subtype slot
per Rule 53/0b.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- ./GalleryType
- linkml:types
default_prefix: hc
classes:
CommercialGallery:
is_a: GalleryType
description: A gallery that sells art.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
class_uri: hc:CommercialGallery
description: Gallery model that combines exhibition with artwork sales and artist representation.
broad_mappings:
- skos:Concept
- skos:Concept
NonProfitGallery:
is_a: GalleryType
description: A gallery that operates as a non-profit.
class_uri: hc:NonProfitGallery
description: Gallery model operating under nonprofit governance and mission-oriented programming.
broad_mappings:
- skos:Concept
- skos:Concept
ArtistRunSpace:
is_a: GalleryType
description: A gallery run by artists.
class_uri: hc:ArtistRunSpace
description: Gallery model initiated and managed primarily by artists.
broad_mappings:
- skos:Concept
- skos:Concept
Kunsthalle:
is_a: GalleryType
description: An art exhibition space without a permanent collection.
class_uri: hc:Kunsthalle
description: Exhibition-oriented gallery model without a permanent collection.
broad_mappings:
- skos:Concept
- skos:Concept

@ -1,20 +1,43 @@
id: https://nde.nl/ontology/hc/class/GenBankAccession
name: GenBankAccession
title: GenBank Accession
description: A GenBank accession number for a nucleotide sequence. MIGRATED from genbank_accession slot per Rule 53. Follows BioProject/GenBank identifiers.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
- ./Identifier
classes:
GenBankAccession:
class_uri: hc:GenBankAccession
is_a: Identifier
class_uri: schema:PropertyValue
description: A persistent identifier for a nucleotide sequence in GenBank.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
description: >-
Persistent accession identifier assigned to a nucleotide sequence record in
GenBank.
alt_descriptions:
nl: Persistente toegangscode toegekend aan een nucleotide-sequentierecord in GenBank.
de: Persistente Zugriffskennung für einen Nukleotidsequenz-Datensatz in GenBank.
fr: Numéro d'accession persistant attribué à un enregistrement de séquence nucléotidique dans GenBank.
es: Accesión persistente asignada a un registro de secuencia nucleotídica en GenBank.
ar: رقم إتاحة دائم يُسند إلى سجل تسلسل نوكليوتيدي في GenBank.
id: Nomor aksesi persisten yang ditetapkan pada rekaman sekuens nukleotida di GenBank.
zh: 分配给 GenBank 核苷酸序列记录的持久登录号标识。
structured_aliases:
- literal_form: GenBank-toegangscode
in_language: nl
- literal_form: GenBank-Zugangsnummer
in_language: de
- literal_form: numéro d'accession GenBank
in_language: fr
- literal_form: número de acceso GenBank
in_language: es
- literal_form: رقم إتاحة GenBank
in_language: ar
- literal_form: nomor aksesi GenBank
in_language: id
- literal_form: GenBank 登录号
in_language: zh
broad_mappings:
- schema:PropertyValue

@ -1,30 +1,48 @@
id: https://nde.nl/ontology/hc/class/Gender
name: Gender
title: Gender
description: Gender identity or classification. MIGRATED from gender_identity slot per Rule 53. Follows schema:GenderType.
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
skos: http://www.w3.org/2004/02/skos/core#
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
classes:
Gender:
class_uri: schema:GenderType
class_uri: hc:Gender
description: >-
Classification term used to represent stated gender identity in descriptive
metadata contexts.
alt_descriptions:
nl: Classificatieterm voor het weergeven van opgegeven genderidentiteit in beschrijvende metadata.
de: Klassifikationsterm zur Darstellung angegebener Geschlechtsidentität in beschreibenden Metadatenkontexten.
fr: Terme de classification utilisé pour représenter l'identité de genre déclarée dans des métadonnées descriptives.
es: Término de clasificación para representar identidad de género declarada en contextos de metadatos descriptivos.
ar: مصطلح تصنيفي لتمثيل الهوية الجندرية المصرّح بها ضمن سياقات البيانات الوصفية.
id: Istilah klasifikasi untuk merepresentasikan identitas gender yang dinyatakan dalam konteks metadata deskriptif.
zh: 用于在描述性元数据语境中表示申报性别认同的分类术语。
structured_aliases:
- literal_form: gender
in_language: nl
- literal_form: Geschlechtsidentität
in_language: de
- literal_form: identité de genre
in_language: fr
- literal_form: identidad de género
in_language: es
- literal_form: الهوية الجندرية
in_language: ar
- literal_form: identitas gender
in_language: id
- literal_form: 性别认同
in_language: zh
slots:
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
- has_label
- has_description
broad_mappings:
- schema:GenderType
- skos:Concept

@ -1,33 +0,0 @@
id: https://nde.nl/ontology/hc/classes/GenealogiewerkbalkEnrichment
name: GenealogiewerkbalkEnrichment
title: GenealogiewerkbalkEnrichment
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../enums/DataTierEnum
# default_range: string
classes:
GenealogiewerkbalkEnrichment:
description: "Dutch genealogy archives registry (Genealogiewerkbalk) data including\
\ municipality, province, and associated archive information.\nOntology mapping\
\ rationale: - class_uri is prov:Entity because this represents enrichment data\n\
\ derived from the Dutch genealogy archives registry\n- close_mappings includes\
\ schema:Dataset for registry data semantics - related_mappings includes prov:PrimarySource\
\ for source registry"
class_uri: prov:Entity
close_mappings:
- schema:Dataset
related_mappings:
- prov:PrimarySource
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
slots:
- has_source
- has_url

@ -0,0 +1,48 @@
id: https://nde.nl/ontology/hc/classes/GenealogyArchivesRegistryEnrichment
name: GenealogyArchivesRegistryEnrichment
title: Genealogy Archives Registry Enrichment Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
xsd: http://www.w3.org/2001/XMLSchema#
imports:
- linkml:types
- ../enums/DataTierEnum
# default_range: string
classes:
GenealogyArchivesRegistryEnrichment:
description: >-
Enrichment data derived from genealogy-focused archive registry sources,
including municipality, province, and linked archive information.
class_uri: prov:Entity
alt_descriptions:
nl: Verrijkingsdata uit genealogische archiefregisters, inclusief gemeente, provincie en gekoppelde archiefinformatie.
de: Anreicherungsdaten aus genealogischen Archivregistern mit Angaben zu Gemeinde, Provinz und verknüpften Archiven.
fr: Données d'enrichissement issues de registres d'archives généalogiques, incluant municipalité, province et archives associées.
es: Datos de enriquecimiento derivados de registros archivísticos genealógicos, incluidos municipio, provincia e información de archivo vinculada.
ar: بيانات إثراء مشتقة من سجلات أرشيفية خاصة بعلم الأنساب وتشمل البلدية والمقاطعة ومعلومات الأرشيف المرتبطة.
id: Data pengayaan dari registri arsip genealogi, termasuk munisipalitas, provinsi, dan informasi arsip terkait.
zh: 源自家谱档案登记来源的富化数据,包含市镇、省份及关联档案信息。
structured_aliases:
- literal_form: verrijking genealogisch archiefregister
in_language: nl
- literal_form: Anreicherung genealogisches Archivregister
in_language: de
- literal_form: enrichissement registre d'archives généalogiques
in_language: fr
- literal_form: enriquecimiento de registro archivístico genealógico
in_language: es
- literal_form: إثراء سجل الأرشيف الجينيالوجي
in_language: ar
- literal_form: pengayaan registri arsip genealogi
in_language: id
- literal_form: 家谱档案登记富化
in_language: zh
exact_mappings:
- prov:Entity
close_mappings:
- schema:Dataset
related_mappings:
- prov:PrimarySource
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
slots:
- has_source
- has_url

@ -1,106 +1,42 @@
id: https://nde.nl/ontology/hc/class/GenerationEvent
name: generation_event_class
name: GenerationEvent
title: Generation Event Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
prov: http://www.w3.org/ns/prov#
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_provenance
- ../slots/has_score
- ../slots/temporal_extent
default_prefix: hc
- ../slots/has_provenance
- ../slots/has_description
- ../slots/has_score
classes:
GenerationEvent:
description: >-
An event representing the generation or creation of an entity.
**USAGE**:
Used for tracking when and how something was generated, including:
- Video chapter generation (manual, AI, imported)
- Content extraction events
- Automated processing activities
- Confidence scoring for generated content
**STRUCTURE**:
- temporal_extent: When the generation occurred (TimeSpan)
- has_provenance: Who/what performed the generation (Provenance)
- has_description: Details about the generation process
- has_score: Confidence score for the generated content (ConfidenceScore)
**ONTOLOGY ALIGNMENT**:
- Maps to prov:Generation (PROV-O generation event)
- Also maps to schema:CreateAction (Schema.org action)
class_uri: prov:Generation
description: Event in which an entity is created or generated.
exact_mappings:
- prov:Generation
close_mappings:
- schema:CreateAction
slots:
- temporal_extent
- has_provenance
- has_description
- has_score
slot_usage:
temporal_extent:
range: TimeSpan
required: false
inlined: true
examples:
- value:
begin_of_the_begin: "2024-01-15T10:30:00Z"
end_of_the_end: "2024-01-15T10:30:00Z"
has_provenance:
range: Provenance
required: false
inlined: true
examples:
- value:
has_agent:
has_type: SOFTWARE
has_name: "YouTube Auto-Chapters"
has_description:
# range: string
required: false
examples:
- value: "Generated using Whisper transcript segmentation"
has_score:
range: ConfidenceScore
required: false
inlined: true
examples:
- value:
has_score: 0.95
has_method: "xpath_extraction"
has_description: "High confidence - exact match at expected location"
annotations:
custodian_types: '["*"]'
custodian_types_rationale: >-
Generation events are universal for tracking content creation.
custodian_types_primary: "*"
specificity_score: 0.30
specificity_rationale: >-
Moderately low specificity - used across many content types.
examples:
- value:
temporal_extent:
begin_of_the_begin: "2024-01-15T10:30:00Z"
has_description: "AI-generated video chapters from transcript"
has_score:
has_score: 0.92
has_method: "transcript_segmentation"
comments:
- Created from slot_fixes.yaml migration (2026-01-19)
- Updated 2026-01-19 to include has_score for confidence tracking
specificity_score: 0.3
specificity_rationale: Cross-domain provenance event for generated content.

@ -1,40 +1,33 @@
id: https://nde.nl/ontology/hc/class/GeoFeature
name: GeoFeature
title: Geographic Feature
description: 'A classification of a geographic feature (e.g., populated place, administrative division). MIGRATED from feature_class/feature_code slots.
Used to classify GeoSpatialPlace instances according to GeoNames feature codes.'
title: Geo Feature Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
gn: http://www.geonames.org/ontology#
skos: http://www.w3.org/2004/02/skos/core#
dcterms: http://purl.org/dc/terms/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
xsd: http://www.w3.org/2001/XMLSchema#
gn: http://www.geonames.org/ontology#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_code
- ../slots/has_type
default_prefix: hc
- ../slots/has_code
classes:
GeoFeature:
class_uri: skos:Concept
description: Geo feature classification entry, typically aligned to GeoNames coding.
broad_mappings:
- skos:Concept
close_mappings:
- gn:Feature
slots:
- has_type
- has_code
- has_type
- has_code
slot_usage:
has_type:
# range: string # uriorcurie
required: true
has_code:
# range: string # uriorcurie
required: true
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.35
specificity_rationale: Controlled geospatial classification term.
custodian_types: '["*"]'
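# Illustrative sketch only, not part of this commit: how a GeoFeature
# instance might look given that has_type and has_code are both required.
# Assumptions: has_type carries the GeoFeatureType subclass (formerly
# feature_class) and has_code carries the GeoNames feature code (formerly
# feature_code). The actual slot ranges are not visible in this diff.
#
#   has_type: PopulatedPlace   # GeoNames feature class P
#   has_code: PPL              # GeoNames feature code for "populated place"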

@ -1,25 +1,26 @@
id: https://nde.nl/ontology/hc/class/GeoFeatureType
name: GeoFeatureType
title: Geographic Feature Type
description: Abstract base class for geographic feature types (e.g., PopulatedPlace, AdministrativeDivision). MIGRATED from feature_class slot per Rule 0b.
title: Geo Feature Type Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
gn: http://www.geonames.org/ontology#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
- ../slots/has_description
classes:
GeoFeatureType:
class_uri: skos:Concept
abstract: true
description: Abstract taxonomy node for geographic feature classes.
broad_mappings:
- skos:Concept
slots:
- has_label
- has_description
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.3
specificity_rationale: Shared hierarchy base for geospatial feature typing.
custodian_types: '["*"]'

@ -1,82 +1,77 @@
id: https://nde.nl/ontology/hc/class/GeoFeatureTypes
name: GeoFeatureTypes
title: Geographic Feature Type Subclasses
description: Concrete subclasses of GeoFeatureType representing specific geographic
feature categories. Based on GeoNames feature classes.
title: Geo Feature Types Class Module
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
gn: http://www.geonames.org/ontology#
schema: http://schema.org/
crm: http://www.cidoc-crm.org/cidoc-crm/
default_prefix: hc
imports:
- ./GeoFeatureType
- linkml:types
default_prefix: hc
classes:
AdministrativeBoundary:
is_a: GeoFeatureType
class_uri: gn:A
description: Country, state, region, etc. (GeoNames class A)
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
description: Administrative division feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
HydrographicFeature:
is_a: GeoFeatureType
class_uri: gn:H
description: Stream, lake, etc. (GeoNames class H)
description: Hydrographic feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
AreaFeature:
is_a: GeoFeatureType
class_uri: gn:L
description: Parks, area, etc. (GeoNames class L)
description: Area feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
PopulatedPlace:
is_a: GeoFeatureType
class_uri: gn:P
description: City, village, etc. (GeoNames class P)
description: Populated place feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
RoadRailroad:
is_a: GeoFeatureType
class_uri: gn:R
description: Road, railroad, etc. (GeoNames class R)
description: Transport corridor feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
SpotFeature:
is_a: GeoFeatureType
class_uri: gn:S
description: Spot, building, farm (GeoNames class S)
description: Spot feature class, including discrete built entities.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
HypsographicFeature:
is_a: GeoFeatureType
class_uri: gn:T
description: Mountain, hill, rock (GeoNames class T)
description: Terrain elevation feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
UnderseaFeature:
is_a: GeoFeatureType
class_uri: gn:U
description: Undersea feature (GeoNames class U)
description: Undersea feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place
VegetationFeature:
is_a: GeoFeatureType
class_uri: gn:V
description: Forest, heath, etc. (GeoNames class V)
description: Vegetation feature class.
broad_mappings:
- schema:Place
- crm:E53_Place
- schema:Place
- crm:E53_Place

@ -1,23 +1,24 @@
id: https://nde.nl/ontology/hc/class/GeoNamesIdentifier
name: GeoNamesIdentifier
title: GeoNames Identifier
description: Identifier from the GeoNames geographical database. MIGRATED from geonames_id slot per Rule 53. Follows gn:geonamesID.
title: GeoNames Identifier Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
gn: http://www.geonames.org/ontology#
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
classes:
GeoNamesIdentifier:
is_a: Identifier
class_uri: hc:GeoNamesIdentifier
description: External identifier referencing a feature in the GeoNames gazetteer.
close_mappings:
- gn:geonamesID
description: A unique identifier for a GeoNames feature. Typically an integer.
- dcterms:Identifier
related_mappings:
- gn:geonamesID
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.3
specificity_rationale: Specialized external place identifier class.
custodian_types: '["*"]'

@ -1,158 +1,74 @@
id: https://nde.nl/ontology/hc/class/GeoSpatialPlace
name: geospatial_place_class
title: GeoSpatialPlace Class
name: GeoSpatialPlace
title: Geo Spatial Place Class
prefixes:
geo: http://www.opengis.net/ont/geosparql#
rov: http://www.w3.org/ns/regorg#
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
geosparql: http://www.opengis.net/ont/geosparql#
wgs84: http://www.w3.org/2003/01/geo/wgs84_pos#
sf: http://www.opengis.net/ont/sf#
gn: http://www.geonames.org/ontology#
gn_entity: http://sws.geonames.org/
geo: http://www.opengis.net/ont/geosparql#
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
crm: http://www.cidoc-crm.org/cidoc-crm/
tooi: https://identifier.overheid.nl/tooi/def/ont/
default_prefix: hc
imports:
- linkml:types
- ../enums/GeometryTypeEnum
- ../metadata
- ../slots/has_reference_system
- ../slots/has_altitude
- ../slots/has_coordinates
- ../slots/has_geofeature
- ../slots/has_altitude
- ../slots/geographic_extent
- ../slots/geometric_extent
- ../slots/has_reference_system
- ../slots/has_geofeature
- ../slots/identified_by
- ../slots/has_score
- ../slots/has_resolution
- ../slots/temporal_extent
- ../slots/has_score
types:
WktLiteral:
uri: geosparql:wktLiteral
base: str
description: 'Well-Known Text (WKT) representation of geometry.
See OGC Simple Features specification.
'
examples:
- value: POINT(4.2894 52.0705)
- value: POLYGON((4.0 52.0, 4.5 52.0, 4.5 52.5, 4.0 52.5, 4.0 52.0))
description: Well-Known Text representation of geometry.
classes:
GeoSpatialPlace:
class_uri: geosparql:Feature
description: "Geospatial location with coordinates, geometry, and projections.\n\nCRITICAL DISTINCTION FROM CustodianPlace:\n\n| Aspect | CustodianPlace | GeoSpatialPlace |\n|--------|----------------|-----------------|\n| Nature | Nominal reference | Geospatial data |\n| Content | \"het herenhuis in de Schilderswijk\" | lat: 52.0705, lon: 4.2894 |\n| Purpose | Identify custodian by place name | Locate custodian precisely |\n| Ambiguity | May be vague (\"the mansion\") | Precise, measurable |\n| Source | Archival documents, oral history | GPS, cadastral surveys, geocoding |\n\n**TOOI Ontology Alignment**:\n\nThis class follows the TOOI pattern for geospatial data:\n- `tooi:BestuurlijkeRuimte` is a subclass of `geosparql:Feature` and `prov:Entity`\n- `tooi:BestuurlijkeRuimte-hasGeometry` \u2192 `geosparql:Geometry`\n- `tooi:RegistratieveRuimte` for administrative boundaries\n- `tooi:JuridischeRuimte` for legal jurisdiction boundaries\n\nLike TOOI, we separate:\n- **geosparql:Feature**\
\ (this class): The real-world place with location data\n- **geosparql:Geometry**: The mathematical representation (WKT, GeoJSON)\n\n**Use Cases**:\n\n1. **Building-level precision**: Museum building footprint (Polygon)\n2. **City-level approximation**: Heritage institution centroid (Point)\n3. **Administrative boundaries**: Archive jurisdiction area (MultiPolygon)\n4. **Historical boundaries**: Pre-merger municipal territory (Polygon + temporal_extent)\n\n**Relationship to CustodianPlace**:\n\nCustodianPlace.has_geospatial_location \u2192 GeoSpatialPlace\n\nA nominal place reference (\"Rijksmuseum\") links to its geospatial location\n(lat: 52.3600, lon: 4.8852, geometry: building footprint polygon).\n\n**Relationship to AuxiliaryPlace**:\n\nAuxiliaryPlace.has_geospatial_location \u2192 GeoSpatialPlace\n\nSecondary/subordinate locations (branch offices, storage depots, reading rooms)\ncan also link to precise geospatial coordinates. This enables:\n- Mapping all custodian locations\
\ (primary + auxiliary)\n- Spatial queries across an organization's entire footprint\n- Building footprints for off-site storage facilities\n- Historical boundary tracking for branch offices\n\n**Relationship to OrganizationalChangeEvent**:\n\nOrganizational changes may affect geographic location:\n- RELOCATION: New GeoSpatialPlace, old one gets temporal_extent.end_of_the_end\n- MERGER: Multiple locations \u2192 single primary + auxiliary locations\n- SPLIT: One location \u2192 multiple successor locations\n"
description: Measured geospatial place representation with coordinates, geometry, and reference system.
exact_mappings:
- geosparql:Feature
- geosparql:Feature
close_mappings:
- geo:SpatialThing
- schema:Place
- schema:GeoCoordinates
- schema:Place
- geo:SpatialThing
related_mappings:
- prov:Entity
- tooi:BestuurlijkeRuimte
- crm:E53_Place
- prov:Entity
- tooi:BestuurlijkeRuimte
- crm:E53_Place
slots:
- has_coordinates
- has_altitude
- geographic_extent
- identified_by
- has_reference_system
- has_geofeature
- geometric_extent
- identified_by
- has_resolution
- has_score
- temporal_extent
- identified_by
- has_coordinates
- has_altitude
- has_reference_system
- has_geofeature
- geographic_extent
- geometric_extent
- has_resolution
- temporal_extent
- has_score
slot_usage:
has_coordinates:
range: Coordinates
inlined: true
required: true
examples:
- value:
latitude: 52.36
longitude: 4.8852
has_reference_system:
ifabsent: string(EPSG:4326)
identified_by:
description: 'Cadastral identifiers for this geospatial place. MIGRATION NOTE (2026-01-14): Replaces cadastral_id per slot_fixes.yaml. Use Identifier with identifier_scheme=''cadastral'' for parcel IDs. Netherlands: Kadaster perceelnummer format {gemeente}-{sectie}-{perceelnummer}'
examples:
- value:
temporal_extent:
range: TimeSpan
inlined: true
required: false
examples:
- value:
begin_of_the_begin: '1920-01-01'
end_of_the_end: '2001-01-01'
comments:
- Follows TOOI BestuurlijkeRuimte pattern using GeoSPARQL
- 'CRITICAL: NOT a nominal reference - this is measured/surveyed location data'
- Use CustodianPlace for nominal references, this class for coordinates
- lat/lon required; geometry_wkt optional for point locations
- Link from CustodianPlace via has_geospatial_location slot
- Link from AuxiliaryPlace via has_geospatial_location slot (subordinate sites)
- Link from OrganizationalChangeEvent via has_affected_territory slot
- temporal_extent tracks boundary changes over time (was valid_from_geo/valid_to_geo)
- OSM and GeoNames IDs enable external linking
see_also:
- http://www.opengis.net/ont/geosparql
- https://www.geonames.org/
- https://www.openstreetmap.org/
- https://identifier.overheid.nl/tooi/def/ont/
examples:
- value:
geospatial_id: https://nde.nl/ontology/hc/geo/rijksmuseum-building
has_coordinates:
latitude: 52.36
longitude: 4.8852
altitude: 0.0
geometric_extent:
- has_format:
has_value: POLYGON((4.8830 52.3590, 4.8870 52.3590, 4.8870 52.3610, 4.8830 52.3610, 4.8830 52.3590))
has_type:
has_label: POLYGON
coordinate_reference_system: EPSG:4326
osm_id: way/27083908
spatial_resolution: BUILDING
has_geofeature:
- has_type: SpotFeature
has_code:
has_label: S.MUS
- value:
geospatial_id: https://nde.nl/ontology/hc/geo/amsterdam-centroid
has_coordinates:
latitude: 52.3676
longitude: 4.9041
geometric_extent:
- has_type:
has_label: POINT
coordinate_reference_system: EPSG:4326
spatial_resolution: CITY
has_geofeature:
- has_type: PopulatedPlace
has_code:
has_label: P.PPLC
- value:
geospatial_id: https://nde.nl/ontology/hc/geo/noord-holland-archive-territory-pre-2001
has_coordinates:
latitude: 52.5
longitude: 4.8
geometric_extent:
- has_format:
has_value: MULTIPOLYGON(((4.5 52.2, 5.2 52.2, 5.2 52.8, 4.5 52.8, 4.5 52.2)))
has_type:
has_label: MULTIPOLYGON
coordinate_reference_system: EPSG:4326
spatial_resolution: REGION
has_geofeature:
- has_type: AdministrativeBoundary
has_code:
has_label: A.ADM1
temporal_extent:
begin_of_the_begin: '1920-01-01'
end_of_the_end: '2001-01-01'
- Use this class for measurable geodata, not nominal place references.
- Link nominal place references through dedicated place classes.
- Temporal extent tracks boundary or footprint change over time.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.55
specificity_rationale: Primary geospatial feature class for coordinates and geometry.
custodian_types: '["*"]'
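
The `slot_usage` block for `GeoSpatialPlace` marks `has_coordinates` as required and gives `has_reference_system` an `ifabsent` default of `EPSG:4326`. A minimal Python sketch of what those two constraints could mean for an instance, assuming instances are plain dicts shaped like the examples in this file. The helper name `normalize_place` is hypothetical and is not part of the schema or of LinkML, and the latitude/longitude range checks are an added assumption (WGS 84 degrees), not something the schema states.

```python
from typing import Any

# Mirrors `ifabsent: string(EPSG:4326)` from the slot_usage block.
DEFAULT_CRS = "EPSG:4326"


def normalize_place(place: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: enforce required coordinates, fill the default CRS."""
    coords = place.get("has_coordinates")
    if not coords:
        raise ValueError("has_coordinates is required")

    lat = coords.get("latitude")
    lon = coords.get("longitude")
    if lat is None or lon is None:
        raise ValueError("latitude and longitude are both required")

    # Assumption: coordinates are WGS 84 decimal degrees.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")

    # Copy so the caller's dict is not mutated, then apply the default.
    result = dict(place)
    result.setdefault("has_reference_system", DEFAULT_CRS)
    return result
```

An instance that already names a reference system keeps it; only a missing value receives the default.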


@ -4,11 +4,9 @@ title: Geographic Extent Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
dcterms: http://purl.org/dc/terms/
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
- ../metadata
@ -17,18 +15,15 @@ imports:
classes:
GeographicExtent:
class_uri: dcterms:Location
description: >-
A geographic area defining the scope or extent (e.g., eligible countries).
**Ontology Alignment**:
- **Primary**: `dcterms:Location`
- **Close**: `schema:Place`
description: Geographic area used to define spatial applicability or coverage.
exact_mappings:
- dcterms:Location
close_mappings:
- schema:Place
slots:
- has_label
- identified_by
- has_label
annotations:
custodian_types: '["*"]'
specificity_score: 0.3
specificity_rationale: Geographic metadata.
specificity_rationale: Spatial scope descriptor for policies and eligibility.


@ -1,23 +1,25 @@
id: https://nde.nl/ontology/hc/class/GeographicScope
name: GeographicScope
title: Geographic Scope
description: The geographic scope or coverage of an entity (e.g., local, regional, national). MIGRATED from geographic_scope slot per Rule 53. Follows skos:Concept.
title: Geographic Scope Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
- ../slots/has_description
classes:
GeographicScope:
class_uri: skos:Concept
description: Controlled concept describing scale of geographic coverage.
broad_mappings:
- skos:Concept
slots:
- has_label
- has_description
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.25
specificity_rationale: Controlled scope vocabulary for local-to-global coverage.
custodian_types: '["*"]'


@ -1,34 +1,37 @@
id: https://nde.nl/ontology/hc/class/Geometry
name: Geometry
title: Geometry
description: A spatial geometry (point, polygon, etc.). MIGRATED from geometry_type/geometry_wkt slots. Follows GeoSPARQL Geometry.
title: Geometry Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
geosparql: http://www.opengis.net/ont/geosparql#
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_format
- ../slots/has_label
- ../slots/has_description
- ../slots/has_type
default_prefix: hc
- ../slots/has_format
classes:
Geometry:
class_uri: geosparql:Geometry
description: Spatial geometry representation, such as point, line, or polygon.
exact_mappings:
- geosparql:Geometry
close_mappings:
- schema:GeoShape
slots:
- has_label
- has_description
- has_type
- has_format
- has_label
- has_description
- has_type
- has_format
slot_usage:
has_format:
# range: string # uriorcurie
required: true
has_type:
# range: string # uriorcurie
required: true
has_format:
required: true
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.35
specificity_rationale: Core geometric encoding object for geospatial data.
custodian_types: '["*"]'


@ -1,25 +1,26 @@
id: https://nde.nl/ontology/hc/class/GeometryType
name: GeometryType
title: Geometry Type
description: Abstract base class for geometry types (e.g., Point, Polygon). MIGRATED from geometry_type slot per Rule 0b.
title: Geometry Type Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
geosparql: http://www.opengis.net/ont/geosparql#
default_prefix: hc
imports:
- linkml:types
- ../slots/has_description
- ../slots/has_label
default_prefix: hc
- ../slots/has_description
classes:
GeometryType:
class_uri: skos:Concept
abstract: true
description: Abstract controlled concept for geometry shape types.
broad_mappings:
- skos:Concept
slots:
- has_label
- has_description
- has_label
- has_description
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.25
specificity_rationale: Geometry shape taxonomy base class.
custodian_types: '["*"]'


@ -1,63 +1,49 @@
id: https://nde.nl/ontology/hc/class/GeometryTypes
name: GeometryTypes
title: Geometry Type Subclasses
description: Concrete subclasses of GeometryType representing specific geometry types.
Based on GeoSPARQL geometry types.
title: Geometry Types Class Module
prefixes:
geo: http://www.opengis.net/ont/geosparql#
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
skos: http://www.w3.org/2004/02/skos/core#
geosparql: http://www.opengis.net/ont/geosparql#
sf: http://www.opengis.net/ont/sf#
geo: http://www.opengis.net/ont/geosparql#
default_prefix: hc
imports:
- ./GeometryType
- linkml:types
default_prefix: hc
classes:
Point:
is_a: GeometryType
class_uri: sf:Point
description: A single point geometry.
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
description: Point geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
Polygon:
is_a: GeometryType
class_uri: sf:Polygon
description: A polygon geometry.
description: Polygon geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
MultiPolygon:
is_a: GeometryType
class_uri: sf:MultiPolygon
description: A collection of polygons.
description: Multi polygon geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
LineString:
is_a: GeometryType
class_uri: sf:LineString
description: A line string geometry.
description: Line string geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
MultiLineString:
is_a: GeometryType
class_uri: sf:MultiLineString
description: A collection of line strings.
description: Multi line string geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
MultiPoint:
is_a: GeometryType
class_uri: sf:MultiPoint
description: A collection of points.
description: Multi point geometry type.
broad_mappings:
- geo:Geometry
- sf:Geometry
- geo:Geometry
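
Each concrete `GeometryType` subclass above carries a Simple Features `class_uri` (`sf:Point`, `sf:Polygon`, and so on). A minimal Python sketch of how a WKT string could be matched to one of those CURIEs, assuming the leading WKT keyword identifies the geometry type. The function name `class_uri_for_wkt` and the lookup table are hypothetical illustrations, not part of the schema.

```python
import re

# CURIEs copied from the class_uri values of the GeometryType subclasses.
SF_CLASS_URIS = {
    "POINT": "sf:Point",
    "POLYGON": "sf:Polygon",
    "MULTIPOLYGON": "sf:MultiPolygon",
    "LINESTRING": "sf:LineString",
    "MULTILINESTRING": "sf:MultiLineString",
    "MULTIPOINT": "sf:MultiPoint",
}

# Keyword followed directly by "(". Dimension suffixes such as "POINT Z (...)"
# are deliberately not handled in this sketch.
_WKT_KEYWORD = re.compile(r"\s*([A-Za-z]+)\s*\(")


def class_uri_for_wkt(wkt: str) -> str:
    """Hypothetical helper: return the sf: CURIE for a WKT string's geometry type."""
    match = _WKT_KEYWORD.match(wkt)
    if match is None:
        raise ValueError(f"not a recognizable WKT string: {wkt!r}")
    keyword = match.group(1).upper()
    if keyword not in SF_CLASS_URIS:
        raise ValueError(f"unsupported geometry type: {keyword}")
    return SF_CLASS_URIS[keyword]
```

A lookup table keeps the mapping in one place, so a new subclass only needs one new entry.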


@ -1,20 +1,24 @@
id: https://nde.nl/ontology/hc/class/GeospatialIdentifier
name: GeospatialIdentifier
title: Geospatial Identifier
description: A unique identifier for a geospatial feature (e.g., from GeoSPARQL). MIGRATED from geospatial_id slot per Rule 53. Follows geosparql:Feature.
title: Geospatial Identifier Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
dcterms: http://purl.org/dc/terms/
geosparql: http://www.opengis.net/ont/geosparql#
default_prefix: hc
imports:
- linkml:types
default_prefix: hc
classes:
GeospatialIdentifier:
is_a: Identifier
class_uri: geosparql:Feature
description: A persistent URI or identifier for a geospatial feature.
class_uri: hc:GeospatialIdentifier
description: Persistent identifier for a geospatial entity in an external or internal system.
close_mappings:
- dcterms:Identifier
related_mappings:
- geosparql:Feature
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.3
specificity_rationale: Identifier class for geospatial record linking.
custodian_types: '["*"]'


@ -1,7 +1,6 @@
id: https://nde.nl/ontology/hc/class/GeospatialLocation
name: GeospatialLocation
title: GeospatialLocation
description: A specific geospatial location.
title: Geospatial Location Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
@ -13,10 +12,12 @@ imports:
classes:
GeospatialLocation:
class_uri: schema:GeoCoordinates
description: Geospatial location.
description: Coordinate-based geospatial location.
exact_mappings:
- schema:GeoCoordinates
slots:
- has_location
- has_location
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: "['*']"
specificity_score: 0.25
specificity_rationale: Coordinate wrapper used in geospatial modeling.
custodian_types: '["*"]'


@ -1,40 +1,30 @@
id: https://nde.nl/ontology/hc/classes/GhcidBlock
name: GhcidBlock
title: GhcidBlock
title: Ghcid Block Class
prefixes:
linkml: https://w3id.org/linkml/
hc: https://nde.nl/ontology/hc/
schema: http://schema.org/
prov: http://www.w3.org/ns/prov#
xsd: http://www.w3.org/2001/XMLSchema#
dcterms: http://purl.org/dc/terms/
crm: http://www.cidoc-crm.org/cidoc-crm/
skos: http://www.w3.org/2004/02/skos/core#
rdfs: http://www.w3.org/2000/01/rdf-schema#
org: http://www.w3.org/ns/org#
prov: http://www.w3.org/ns/prov#
schema: http://schema.org/
default_prefix: hc
imports:
- linkml:types
# default_range: string
- ../slots/identified_by
classes:
GhcidBlock:
description: "GHCID (Global Heritage Custodian Identifier) generation metadata\
\ and history. Contains current GHCID string, UUID variants (v5, v8), numeric\
\ form, generation timestamp, and history of GHCID changes due to relocations,\
\ mergers, or collision resolution.\nOntology mapping rationale: - class_uri\
\ is dcterms:Identifier because GHCID is fundamentally\n an identifier assignment\
\ with associated metadata\n- close_mappings includes prov:Entity as identifier\
\ blocks are\n traceable provenance entities themselves\n- related_mappings\
\ includes schema:PropertyValue (identifier as\n property) and prov:Generation\
\ (identifier creation event)"
class_uri: dcterms:Identifier
description: Identifier metadata block capturing assignment, variants, and lifecycle history for GHCID values.
exact_mappings:
- dcterms:Identifier
close_mappings:
- prov:Entity
- prov:Entity
related_mappings:
- schema:PropertyValue
- prov:Generation
annotations:
specificity_score: 0.1
specificity_rationale: Generic utility class/slot created during migration
custodian_types: '[''*'']'
- schema:PropertyValue
- prov:Generation
slots:
- identified_by
- identified_by
annotations:
specificity_score: 0.35
specificity_rationale: Identifier lifecycle container for custody-level identifier governance.
custodian_types: '["*"]'


@ -21,8 +21,8 @@ enums:
TIER_1_AUTHORITATIVE:
description: Official registry data (NDE CSV, Nationaal Archief ISIL)
TIER_2_VERIFIED:
description: Verified external sources (Wikidata, Google Maps, Genealogiewerkbalk)
description: Verified external sources (Wikidata, Google Maps, genealogy archive registries)
TIER_3_CROWD_SOURCED:
description: Community-contributed data (reviews, user edits)
TIER_4_INFERRED:
description: Algorithmically extracted (website scrape, Exa search)
description: Algorithmically extracted (website scrape, external search)