glam/.opencode/rules/no-duplicate-ontology-mappings.md
kempersc 73b3b21017
docs: add Rule 52 prohibiting duplicate ontology mappings
- Create .opencode/rules/no-duplicate-ontology-mappings.md with detection script
- Add Rule 52 to AGENTS.md (after Rule 51)
- Fix 29 duplicate mappings: same URI in multiple mapping categories
  - 26 slot files: remove duplicates keeping most precise mapping
  - 3 class files: ExhibitionSpace, Custodian, DigitalPlatform
- Mapping precedence: exact > close > narrow/broad > related

Each ontology URI must appear in only ONE mapping category per schema
element, following SKOS semantics where mapping properties are mutually
exclusive.
2026-01-13 15:57:26 +01:00

Rule 52: No Duplicate Ontology Mappings

Summary

Each ontology URI MUST appear in only ONE mapping category per schema element. A URI cannot simultaneously have multiple semantic relationships to the same class or slot.

The Problem

LinkML provides five mapping annotation types based on SKOS vocabulary alignment:

| Property | SKOS Predicate | Meaning |
|----------|----------------|---------|
| exact_mappings | skos:exactMatch | "This IS that" (equivalent) |
| close_mappings | skos:closeMatch | "This is very similar to that" |
| related_mappings | skos:relatedMatch | "This is conceptually related to that" |
| narrow_mappings | skos:narrowMatch | "That is MORE SPECIFIC than this" |
| broad_mappings | skos:broadMatch | "That is MORE GENERAL than this" |

These relationships are mutually exclusive. A URI cannot simultaneously:

  • BE the element (exact_mappings) AND be broader than it (broad_mappings)
  • Be closely similar (close_mappings) AND be more general (broad_mappings)

Anti-Pattern (WRONG)

# WRONG - schema:url appears in TWO mapping types
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url      # Says "source_url IS schema:url"
    broad_mappings:
      - schema:url      # Says "schema:url is MORE GENERAL than source_url"

This is a logical contradiction: source_url cannot simultaneously BE schema:url AND be more specific than schema:url.

Correct Pattern

# CORRECT - each URI appears in only ONE mapping type
slots:
  source_url:
    slot_uri: prov:atLocation
    exact_mappings:
      - schema:url      # source_url IS schema:url
    close_mappings:
      - dcterms:source  # Similar but not identical

Decision Guide: Which Mapping to Keep

When a URI appears in multiple categories, keep the most precise one:

Precedence Order (keep the first match)

  1. exact_mappings - Strongest claim: semantic equivalence
  2. close_mappings - Strong claim: nearly equivalent
  3. narrow_mappings / broad_mappings - Hierarchical relationship
  4. related_mappings - Weakest claim: conceptual association

Decision Matrix

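| If URI appears in... | Keep | Remove |
|----------------------|------|--------|
| exact + broad | exact | broad |
| exact + close | exact | close |
| exact + related | exact | related |
| close + broad | close | broad |
| close + related | close | related |
| related + broad | broad | related |
| narrow + broad | whichever the ontology definition supports | the other (contradictory!) |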

Special Case: narrow + broad

If a URI appears in BOTH narrow_mappings AND broad_mappings, this is a data error - the same URI cannot be both more specific AND more general. Investigate which is correct based on the ontology definition.
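
The precedence order and the narrow + broad special case can be sketched as a small Python function. This is only an illustration under stated assumptions: `resolve_duplicates` and `PRECEDENCE` are hypothetical names, not part of LinkML or of this repository, and the function works on an already-parsed class or slot definition (a plain dict). It keeps the highest-precedence category for each duplicated URI and leaves narrow + broad conflicts untouched, returning them for manual review.

```python
# Precedence from this rule: exact > close > narrow/broad > related.
# Lower number = stronger claim. Names here are hypothetical helpers.
PRECEDENCE = {
    "exact_mappings": 0,
    "close_mappings": 1,
    "narrow_mappings": 2,
    "broad_mappings": 2,
    "related_mappings": 3,
}


def resolve_duplicates(defn):
    """Return (cleaned definition, URIs that need manual review)."""
    cleaned = dict(defn)
    categories_by_uri = {}
    for category in PRECEDENCE:
        uris = list(defn.get(category) or [])
        if category in defn:
            cleaned[category] = uris  # work on a copy, not the input list
        for uri in uris:
            seen = categories_by_uri.setdefault(uri, [])
            if category not in seen:
                seen.append(category)

    needs_review = []
    for uri, categories in categories_by_uri.items():
        if len(categories) < 2:
            continue
        best_rank = min(PRECEDENCE[c] for c in categories)
        winners = [c for c in categories if PRECEDENCE[c] == best_rank]
        if len(winners) > 1:
            # narrow + broad tie: contradictory, so a human must decide.
            needs_review.append(uri)
            continue
        for category in categories:
            if category != winners[0]:
                cleaned[category].remove(uri)

    # Drop mapping categories that became empty.
    cleaned = {
        key: value
        for key, value in cleaned.items()
        if not (key in PRECEDENCE and not value)
    }
    return cleaned, needs_review


if __name__ == "__main__":
    source_url = {
        "slot_uri": "prov:atLocation",
        "exact_mappings": ["schema:url"],
        "broad_mappings": ["schema:url"],
    }
    print(resolve_duplicates(source_url))
```

The function deliberately does not pick a side for narrow + broad conflicts, since choosing between them depends on the ontology definition rather than on precedence.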

Real Examples Fixed

Example 1: source_url

# BEFORE (wrong)
slots:
  source_url:
    exact_mappings:
      - schema:url
    broad_mappings:
      - schema:url  # Duplicate!

# AFTER (correct)
slots:
  source_url:
    exact_mappings:
      - schema:url  # Keep exact (strongest)
    # broad_mappings removed

Example 2: Custodian class

# BEFORE (wrong)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation
    narrow_mappings:
      - cpov:PublicOrganisation  # Duplicate!

# AFTER (correct)
classes:
  Custodian:
    close_mappings:
      - cpov:PublicOrganisation  # Keep close (Custodian ≈ PublicOrganisation)
    # narrow_mappings: use for URIs that are MORE SPECIFIC than Custodian

Example 3: geonames_id (narrow + broad conflict)

# BEFORE (wrong - logical contradiction!)
slots:
  geonames_id:
    narrow_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE SPECIFIC than geonames_id
    broad_mappings:
      - dcterms:identifier  # Says dcterms:identifier is MORE GENERAL than geonames_id

# AFTER (correct)
slots:
  geonames_id:
    broad_mappings:
      - dcterms:identifier  # geonames_id IS a specific type of identifier
    # narrow_mappings removed (was contradictory)

Detection Script

Run this to find duplicate mappings in the schema:

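import sys
import yaml  # PyYAML
from pathlib import Path
from collections import defaultdict

# The five LinkML mapping categories that must not share a URI
mapping_types = ['exact_mappings', 'close_mappings', 'related_mappings',
                 'narrow_mappings', 'broad_mappings']

dirs = [
    Path('schemas/20251121/linkml/modules/slots'),
    Path('schemas/20251121/linkml/modules/classes'),
]

found = False
for d in dirs:
    for yaml_file in sorted(d.glob('*.yaml')):
        try:
            with open(yaml_file, encoding='utf-8') as f:
                content = yaml.safe_load(f)
        except Exception as exc:
            # Report unreadable files instead of skipping them silently
            print(f"{yaml_file}: skipped ({exc})", file=sys.stderr)
            continue
        if not isinstance(content, dict):
            continue

        for section in ['classes', 'slots']:
            items = content.get(section) or {}
            if not isinstance(items, dict):
                continue
            for name, defn in items.items():
                if not isinstance(defn, dict):
                    continue
                # Record every mapping category in which each URI appears
                uri_to_types = defaultdict(list)
                for mt in mapping_types:
                    for uri in defn.get(mt, []) or []:
                        uri_to_types[uri].append(mt)
                for uri, types in uri_to_types.items():
                    if len(types) > 1:
                        found = True
                        print(f"{yaml_file}: {name} - {uri} in {types}")

# Exit non-zero on duplicates so a pre-commit hook can block the commit
sys.exit(1 if found else 0)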

Validation Rule

Pre-commit check: Before committing LinkML schema changes, run the detection script. If any duplicates are found, the commit should fail.

References