glam/.opencode/rules/no-duplicate-ontology-mappings.md
kempersc 73b3b21017
docs: add Rule 52 prohibiting duplicate ontology mappings
- Create .opencode/rules/no-duplicate-ontology-mappings.md with detection script
- Add Rule 52 to AGENTS.md (after Rule 51)
- Fix 29 duplicate mappings: same URI in multiple mapping categories
  - 26 slot files: remove duplicates keeping most precise mapping
  - 3 class files: ExhibitionSpace, Custodian, DigitalPlatform
- Mapping precedence: exact > close > narrow/broad > related

Each ontology URI must appear in only ONE mapping category per schema
element, following SKOS semantics where mapping properties are mutually
exclusive.
2026-01-13 15:57:26 +01:00


# Rule 52: No Duplicate Ontology Mappings
## Summary
Each ontology URI MUST appear in only ONE mapping category per schema element. A URI cannot simultaneously have multiple semantic relationships to the same class or slot.
## The Problem
LinkML provides five mapping annotation types based on SKOS vocabulary alignment:
| Property | SKOS Predicate | Meaning |
|----------|---------------|---------|
| `exact_mappings` | `skos:exactMatch` | "This IS that" (equivalent) |
| `close_mappings` | `skos:closeMatch` | "This is very similar to that" |
| `related_mappings` | `skos:relatedMatch` | "This is conceptually related to that" |
| `narrow_mappings` | `skos:narrowMatch` | "That is MORE SPECIFIC than this" (the mapped term is narrower) |
| `broad_mappings` | `skos:broadMatch` | "That is MORE GENERAL than this" (the mapped term is broader) |
These relationships are **mutually exclusive**. A URI cannot simultaneously:
- BE the element (`exact_mappings`) AND be broader than it (`broad_mappings`)
- Be closely similar (`close_mappings`) AND be more general (`broad_mappings`)
## Anti-Pattern (WRONG)
```yaml
# WRONG - schema:url appears in TWO mapping types
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # Says "source_url IS schema:url"
    broad_mappings:
      - schema:url  # Says "schema:url is MORE GENERAL than source_url"
```
This is a **logical contradiction**: `source_url` cannot simultaneously BE `schema:url` AND be more specific than `schema:url`.
## Correct Pattern
```yaml
# CORRECT - each URI appears in only ONE mapping type
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url  # source_url IS schema:url
    close_mappings:
      - dcterms:source  # Similar but not identical
```
## Decision Guide: Which Mapping to Keep
When a URI appears in multiple categories, keep the **most precise** one:
### Precedence Order (keep the first match)
1. **exact_mappings** - Strongest claim: semantic equivalence
2. **close_mappings** - Strong claim: nearly equivalent
3. **narrow_mappings** / **broad_mappings** - Hierarchical relationship
4. **related_mappings** - Weakest claim: conceptual association
### Decision Matrix
| If URI appears in... | Keep | Remove |
|---------------------|------|--------|
| exact + broad | exact | broad |
| exact + close | exact | close |
| exact + related | exact | related |
| close + broad | close | broad |
| close + related | close | related |
| related + broad | broad | related |
| narrow + broad | investigate (see Special Case) | whichever contradicts the ontology definition |
### Special Case: narrow + broad
If a URI appears in BOTH `narrow_mappings` AND `broad_mappings`, this is a **data error** - the same URI cannot be both more specific AND more general. Investigate which is correct based on the ontology definition.
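The precedence order can be expressed as a small helper. This is a minimal sketch, not part of the repository: the names `PRECEDENCE` and `mapping_to_keep` are hypothetical, and it assumes narrow and broad share one tier, so a narrow + broad tie is reported as an error instead of being resolved automatically.

```python
from typing import Iterable

# Lower rank = stronger claim. narrow and broad share a rank because the
# precedence order lists them as one tier (assumption of this sketch).
PRECEDENCE = {
    "exact_mappings": 0,
    "close_mappings": 1,
    "narrow_mappings": 2,
    "broad_mappings": 2,
    "related_mappings": 3,
}


def mapping_to_keep(categories: Iterable[str]) -> str:
    """Return the one category to keep for a URI listed in several categories.

    Raises ValueError when narrow and broad tie for the strongest claim,
    because that case needs a manual check against the ontology definition.
    """
    cats = set(categories)
    if not cats:
        raise ValueError("no mapping categories given")
    unknown = cats - set(PRECEDENCE)
    if unknown:
        raise ValueError(f"unknown mapping categories: {sorted(unknown)}")
    best = min(PRECEDENCE[c] for c in cats)
    winners = sorted(c for c in cats if PRECEDENCE[c] == best)
    if len(winners) > 1:
        raise ValueError("narrow + broad conflict: check the ontology definition")
    return winners[0]
```

For example, `mapping_to_keep(["exact_mappings", "broad_mappings"])` selects `exact_mappings`, while passing both `narrow_mappings` and `broad_mappings` raises so that a person decides.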
## Real Examples Fixed
### Example 1: source_url
```yaml
# BEFORE (wrong)
slots:
  source_url:
    exact_mappings:
      - schema:url
    broad_mappings:
      - schema:url  # Duplicate!

# AFTER (correct)
slots:
  source_url:
    exact_mappings:
      - schema:url  # Keep exact (strongest)
    # broad_mappings removed
```
### Example 2: Custodian class
```yaml
# BEFORE (wrong)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation
    narrow_mappings:
      - cpov:PublicOrganisation  # Duplicate!

# AFTER (correct)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation  # Keep close (Custodian ≈ PublicOrganisation)
    # narrow_mappings: use for URIs that are MORE SPECIFIC than Custodian
```
### Example 3: geonames_id (narrow + broad conflict)
```yaml
# BEFORE (wrong - logical contradiction!)
slots:
  geonames_id:
    narrow_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE SPECIFIC than geonames_id
    broad_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE GENERAL than geonames_id

# AFTER (correct)
slots:
  geonames_id:
    broad_mappings:
      - dcterms:identifier  # geonames_id IS a specific type of identifier
    # narrow_mappings removed (was contradictory)
```
## Detection Script
Run this to find duplicate mappings in the schema:
```python
import sys
from collections import defaultdict
from pathlib import Path

import yaml

mapping_types = ['exact_mappings', 'close_mappings', 'related_mappings',
                 'narrow_mappings', 'broad_mappings']
dirs = [
    Path('schemas/20251121/linkml/modules/slots'),
    Path('schemas/20251121/linkml/modules/classes'),
]

found = False
for d in dirs:
    for yaml_file in sorted(d.glob('*.yaml')):
        try:
            with open(yaml_file, encoding='utf-8') as f:
                content = yaml.safe_load(f)
        except Exception:
            continue  # skip files that cannot be read or parsed
        if not isinstance(content, dict):
            continue
        for section in ['classes', 'slots']:
            items = content.get(section, {})
            if not isinstance(items, dict):
                continue
            for name, defn in items.items():
                if not isinstance(defn, dict):
                    continue
                # Record every mapping category in which each URI appears.
                uri_to_types = defaultdict(list)
                for mt in mapping_types:
                    for uri in defn.get(mt, []) or []:
                        uri_to_types[uri].append(mt)
                for uri, types in uri_to_types.items():
                    if len(types) > 1:
                        found = True
                        print(f"{yaml_file}: {name} - {uri} in {types}")

# A non-zero exit status lets a pre-commit hook block the commit.
sys.exit(1 if found else 0)
```
## Validation Rule
**Pre-commit check**: Before committing LinkML schema changes, run the detection script. If any duplicates are found, the commit should fail.
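One way to wire this up is a small Python hook. The sketch below is an assumption, not the repository's actual setup: the script path `scripts/detect_duplicate_mappings.py` and the helper name `check` are hypothetical. Because the detection script reports findings by printing them, the helper treats any output, as well as a non-zero exit status, as a failure.

```python
import subprocess
import sys

# Hypothetical location of the detection script; adjust to where it is saved.
DETECTOR = [sys.executable, "scripts/detect_duplicate_mappings.py"]


def check(cmd: list[str]) -> int:
    """Run cmd and return 1 if it exits non-zero or prints anything, else 0."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0 or result.stdout.strip():
        # Relay the detector's report so the committer sees what to fix.
        sys.stderr.write(result.stdout + result.stderr)
        sys.stderr.write("Rule 52 check failed; commit aborted.\n")
        return 1
    return 0
```

Saved as an executable `.git/hooks/pre-commit` file with a Python shebang and a final line `sys.exit(check(DETECTOR))`, this makes Git abort the commit whenever the check returns 1.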
## References
- [LinkML Mappings Documentation](https://linkml.io/linkml-model/latest/docs/mappings/)
- [SKOS Mapping Properties](https://www.w3.org/TR/skos-reference/#mapping)
- Rule 50: Ontology-to-LinkML Mapping Convention (parent rule)
- Rule 51: No Hallucinated Ontology References