glam/.opencode/rules/linkml/no-hallucinated-ontology-references.md
kempersc 554fe520ea Add comprehensive rules for LinkML schema management and ontology mapping
2026-02-15 19:20:09 +01:00


Rule 51: No Hallucinated Ontology References

Priority: CRITICAL
Scope: All LinkML schema files (schemas/20251121/linkml/)
Created: 2025-01-13


Summary

All ontology references in LinkML schema files (class_uri, slot_uri, *_mappings) MUST be verifiable against actual ontology files in /data/ontology/. References to predicates or classes that do not exist in local ontology files are considered hallucinated and are prohibited.


The Problem

AI agents may suggest ontology mappings based on training data without verifying that:

  1. The ontology file exists in /data/ontology/
  2. The specific predicate/class exists within that ontology file
  3. The prefix is declared and resolvable

This leads to schema files containing references such as dqv:value or adms:status that cannot be validated or serialized to RDF when the corresponding ontology file is missing locally.


Requirements

1. All Ontology Prefixes Must Have Local Files

Before using a prefix (e.g., prov:, schema:, org:), verify the ontology file exists:

# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"

Available Ontologies (verified as of 2025-01-13):

| Prefix | File |
|--------|------|
| prov: | prov-o.ttl, prov.ttl |
| schema: | schemaorg.owl |
| org: | org.rdf |
| skos: | skos.rdf |
| dcterms: | dublin_core_elements.rdf |
| foaf: | foaf.ttl |
| rico: | RiC-O_1-1.rdf |
| crm: | CIDOC_CRM_v7.1.3.rdf |
| geo: | geo.ttl |
| sosa: | sosa.ttl |
| bf: | bibframe.rdf |
| edm: | edm.owl |
| premis: | premis3.owl |
| dcat: | dcat3.ttl |
| ore: | ore.rdf |
| pico: | pico.ttl |
| gn: | geonames_ontology.rdf |
| time: | time.ttl |
| locn: | locn.ttl |
| dqv: | dqv.ttl |
| adms: | adms.ttl |

NOT Available (do not use without adding):

| Prefix | Status | Alternative |
|--------|--------|-------------|
| qudt: | Only referenced in era_ontology.ttl | Use hc: with close_mappings annotation |
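
The prefix check can also be scripted against an explicit prefix-to-file mapping, which avoids relying on filenames that happen to contain the prefix. The following is a minimal sketch under stated assumptions: `PREFIX_FILES` and `missing_prefixes` are hypothetical names, not part of the project, and the mapping copies only a few rows of the table of available ontologies.

```python
from pathlib import Path

# Hypothetical mapping, copied by hand from a few rows of the table above.
PREFIX_FILES = {
    "prov": ["prov-o.ttl", "prov.ttl"],
    "schema": ["schemaorg.owl"],
    "rico": ["RiC-O_1-1.rdf"],
    "premis": ["premis3.owl"],
}


def missing_prefixes(prefixes, ontology_dir, prefix_files=PREFIX_FILES):
    """Return the prefixes that have no existing file under ontology_dir."""
    root = Path(ontology_dir)
    missing = []
    for prefix in prefixes:
        candidates = prefix_files.get(prefix, [])
        # A prefix counts as available if any of its candidate files exists.
        if not any((root / name).is_file() for name in candidates):
            missing.append(prefix)
    return missing
```

Calling `missing_prefixes(["prov", "qudt"], "data/ontology")` would report `qudt` whenever it has no entry in the mapping, which matches the "do not use without adding" status above.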

2. Predicates Must Exist in Ontology Files

Before using a predicate, verify it exists:

# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl

# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
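
Plain substring matches can produce false positives, for example when the search string is part of a longer term name. The sketch below shows one way to match a whole local name instead; `mentions_term` is a hypothetical helper, and it treats the ontology as plain text rather than parsing RDF. It only confirms that the term is present, so reading the actual definition remains a manual step.

```python
import re


def mentions_term(ontology_text, local_name):
    """Return True if local_name appears as a whole term in the ontology text.

    A term counts when it directly follows '/', '#' or ':' (the end of a
    namespace IRI or a prefix) and is not followed by another name character.
    """
    pattern = r"(?<=[/#:])" + re.escape(local_name) + r"(?![A-Za-z0-9_\-])"
    return re.search(pattern, ontology_text) is not None
```

With this check, searching for `stored` does not match a file that only contains `storedAt`, whereas a bare `grep "stored"` would.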

3. Use hc: Prefix for Domain-Specific Concepts

When no standard ontology predicate exists, use the Heritage Custodian namespace:

# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.        

# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!

4. Document External References in close_mappings

When a similar concept exists in an ontology we don't have locally, document it in close_mappings with a note:

slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically 
        appropriate but ontology not included in project. See 
        https://www.w3.org/TR/vocab-dqv/        
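
Requirements 3 and 4 can be expressed as a small check on one slot definition. This is a sketch only, assuming the slot has already been loaded into a plain dict and that the caller supplies the set of prefixes with local files; `check_slot` and its messages are hypothetical.

```python
def check_slot(name, slot, local_prefixes):
    """Return a list of problems for one slot definition (a plain dict)."""
    problems = []
    # hc: is the project namespace and needs no local ontology file.
    allowed = set(local_prefixes) | {"hc"}

    uri = slot.get("slot_uri", "")
    prefix = uri.split(":", 1)[0]
    if uri and prefix not in allowed:
        problems.append(
            f"{name}: slot_uri {uri} uses a prefix with no local ontology file"
        )

    # External references are tolerated in close_mappings, but only with a note.
    external = [
        mapping for mapping in slot.get("close_mappings", [])
        if mapping.split(":", 1)[0] not in allowed
    ]
    if external and not slot.get("annotations"):
        problems.append(
            f"{name}: external close_mappings {external} lack an explanatory annotation"
        )
    return problems
```

The check is deliberately asymmetric: a non-local prefix in `slot_uri` is always a problem, while the same prefix in `close_mappings` passes once an annotation documents it.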

Verification Workflow

Before Adding New Mappings

  1. Check if ontology file exists:

    ls data/ontology/ | grep -i "<ontology-name>"
    
  2. Search for predicate in ontology:

    grep -l "<predicate-name>" data/ontology/*
    
  3. Verify predicate definition:

    grep -B2 -A5 "<predicate-name>" data/ontology/<file>
    
  4. If not found: Use hc: prefix with appropriate documentation
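
The four steps above can be sketched as a single function. This is an illustration under assumptions, not project tooling: `choose_slot_uri` and the `prefix_files` mapping are hypothetical, the term search is a naive substring test, and the hc: local name would normally be chosen by hand rather than reused from the rejected term.

```python
from pathlib import Path


def choose_slot_uri(prefix, local_name, ontology_dir, prefix_files):
    """Return 'prefix:local_name' if a local file mentions the term, else an hc: fallback.

    prefix_files maps a prefix to candidate filenames (hypothetical input).
    """
    root = Path(ontology_dir)
    for filename in prefix_files.get(prefix, []):
        path = root / filename
        if not path.is_file():          # step 1: does the ontology file exist?
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if local_name in text:          # step 2: is the term present at all?
            # Step 3 (reading the definition) stays a manual review.
            return f"{prefix}:{local_name}"
    return f"hc:{local_name}"           # step 4: fall back to the hc: namespace
```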

When Reviewing Existing Mappings

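Run the validation script:

# List the distinct prefixes used in slot_uri values (hc: lines excluded)
grep -rh "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
  grep -v "hc:" | \
  sed -E 's/.*slot_uri:[[:space:]]*([^:]+):.*/\1/' | \
  sort -u

# Check that each prefix has a local file.
# Filename matching is only a heuristic: dcterms (dublin_core_elements.rdf)
# and rico (RiC-O_1-1.rdf) live in files whose names do not contain the
# prefix, so confirm any NOT FOUND against the table of available ontologies.
for prefix in prov schema org skos dcterms foaf rico; do
  echo "Checking $prefix:"
  ls data/ontology/ | grep -i "$prefix" || echo "  NOT FOUND!"
done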
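
The prefix inventory can also be collected in Python, which makes the result easier to feed into further checks. The sketch below assumes slot files use the `.yaml` extension and reads them as plain text instead of parsing YAML; `slot_uri_prefixes` is a hypothetical name.

```python
import re
from pathlib import Path

# Captures the CURIE prefix of a slot_uri value; full IRIs (scheme://) are skipped.
SLOT_URI = re.compile(r"^[ \t]*slot_uri:[ \t]*([A-Za-z][\w.-]*):(?!//)", re.MULTILINE)


def slot_uri_prefixes(slots_dir):
    """Collect the distinct CURIE prefixes used in slot_uri lines, excluding hc."""
    prefixes = set()
    for path in Path(slots_dir).rglob("*.yaml"):
        text = path.read_text(encoding="utf-8")
        prefixes.update(SLOT_URI.findall(text))
    prefixes.discard("hc")
    return sorted(prefixes)
```

Each returned prefix can then be compared with the list of ontologies that have local files.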

Ontology Addition Process

If a new ontology is genuinely needed:

  1. Download the ontology:

    curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
    
  2. Update ONTOLOGY_CATALOG.md:

    # Add entry to data/ontology/ONTOLOGY_CATALOG.md
    
  3. Verify predicates exist:

    grep "<predicate>" data/ontology/<name>.ttl
    
  4. Update LinkML prefixes in schema files
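
For scripted setups, the download step might look like the following standard-library sketch. `turtle_request` and `download_ontology` are hypothetical helpers; the URL and destination path are placeholders to be supplied by the caller, and the server is assumed to honor the Accept header.

```python
import shutil
from urllib.request import Request, urlopen


def turtle_request(url):
    """Build (but do not send) a GET request that asks for Turtle."""
    return Request(url, headers={"Accept": "text/turtle"})


def download_ontology(url, destination):
    """Fetch url and write the response body to destination.

    urlopen follows HTTP redirects by default, similar to curl -L.
    """
    with urlopen(turtle_request(url)) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

After downloading, the catalog update and the predicate check in steps 2 and 3 still apply.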


Examples

CORRECT: Verified Mapping

slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime

CORRECT: Domain-Specific with External Reference

slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.        

WRONG: Hallucinated Mapping

slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float

WRONG: Non-Existent Predicate

slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float

Consequences of Violation

  1. RDF serialization fails - Invalid prefixes cause gen-owl errors
  2. Schema validation errors - LinkML validates prefix declarations
  3. Broken interoperability - External systems cannot resolve URIs
  4. Data quality issues - Semantic web tooling cannot process data

PREMIS Ontology Reference (premis3.owl)

CRITICAL: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified.

Valid PREMIS Classes

Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium

Valid PREMIS Properties

act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version

Known Hallucinated PREMIS Terms (DO NOT USE)

| Hallucinated Term | Correction |
|-------------------|------------|
| premis:PreservationEvent | Use premis:Event |
| premis:RightsDeclaration | Use premis:RightsBasis or premis:RightsStatus |
| premis:hasRightsStatement | Use premis:rightsStatus |
| premis:hasRightsDeclaration | Use premis:rightsStatus |
| premis:hasRepresentation | Use premis:relationship or dcterms:hasFormat |
| premis:hasRelatedStatementInformation | Use premis:note or adms:status |
| premis:hasObjectCharacteristics | Use premis:characteristic |
| premis:rightsGranted | Use premis:RightsStatus class with premis:restriction |
| premis:rightsEndDate | Use premis:endDate |
| premis:linkingAgentIdentifier | Use premis:Agent class |
| premis:storageLocation (lowercase) | Use premis:storedAt property or premis:StorageLocation class |
| premis:hasFrameRate | Does not exist - use hc:frameRate |
| premis:environmentCharacteristic (lowercase) | Use premis:EnvironmentCharacteristic (class) |
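
A list of known hallucinated terms lends itself to an automated scan. The sketch below is illustrative: `KNOWN_HALLUCINATED` copies only a few rows of the list above, and `find_hallucinated` is a hypothetical helper. Matching is case-sensitive on purpose, because some entries differ from valid terms only in capitalization.

```python
import re

# A few rows from the list above (hallucinated term -> suggested correction).
KNOWN_HALLUCINATED = {
    "premis:PreservationEvent": "premis:Event",
    "premis:hasObjectCharacteristics": "premis:characteristic",
    "premis:rightsEndDate": "premis:endDate",
    "premis:hasFrameRate": "hc:frameRate",
}


def find_hallucinated(schema_text, known=KNOWN_HALLUCINATED):
    """Return (line_number, term, correction) for each known bad term found."""
    hits = []
    for number, line in enumerate(schema_text.splitlines(), start=1):
        for term, correction in known.items():
            # Reject matches that are only the start of a longer term name.
            if re.search(re.escape(term) + r"(?![\w-])", line):
                hits.append((number, term, correction))
    return hits
```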

PREMIS Verification Commands

# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
  sed 's/.*v3\///' | sed 's/".*//' | sort -u

# List all PREMIS properties (grep -P requires GNU grep)
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
  grep -oP 'v3/\K[^"]+' | sort -u

# Verify a specific term exists (prints the number of matching lines; 0 means absent)
grep -c "YourTermHere" data/ontology/premis3.owl
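
Where the grep patterns prove brittle, the same listing can be produced by parsing the file. The sketch below assumes premis3.owl is RDF/XML that declares terms with `rdf:about`, and that the caller passes the PREMIS namespace IRI (for PREMIS 3 this is expected to be `http://www.loc.gov/premis/rdf/v3/`, but confirm it against the file). `local_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OWL = "{http://www.w3.org/2002/07/owl#}"


def local_names(rdf_xml, kinds, namespace):
    """Return sorted local names of OWL elements of the given kinds in namespace.

    kinds is a list such as ["Class"] or ["ObjectProperty", "DatatypeProperty"].
    """
    root = ET.fromstring(rdf_xml)
    names = set()
    for kind in kinds:
        for element in root.iter(OWL + kind):
            iri = element.get(RDF + "about")
            # Keep only terms declared in the requested namespace.
            if iri and iri.startswith(namespace):
                names.add(iri[len(namespace):])
    return sorted(names)
```

Reading the file with `Path("data/ontology/premis3.owl").read_text()` and passing the result with `["Class"]` would give a class list comparable to the first grep command.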

See Also

  • Rule 38: Slot Centralization and Semantic URI Requirements
  • Rule 50: Ontology-to-LinkML Mapping Convention
  • /data/ontology/ONTOLOGY_CATALOG.md - Available ontologies
  • .opencode/rules/slot-ontology-mapping-reference.md - Mapping reference

Version History

  • 2025-01-13: Added 7 more hallucinated PREMIS terms discovered during schema audit:
    • premis:hasRightsStatement, premis:hasRightsDeclaration, premis:hasRepresentation
    • premis:hasRelatedStatementInformation, premis:rightsGranted, premis:rightsEndDate
    • premis:linkingAgentIdentifier
  • 2025-01-13: Initial creation after discovering dqv:, adms:, qudt: references without local files