Rule 51: No Hallucinated Ontology References
Priority: CRITICAL
Scope: All LinkML schema files (schemas/20251121/linkml/)
Created: 2025-01-13
Summary
All ontology references in LinkML schema files (class_uri, slot_uri, *_mappings) MUST be verifiable against actual ontology files in /data/ontology/. References to predicates or classes that do not exist in local ontology files are considered hallucinated and are prohibited.
The Problem
AI agents may suggest ontology mappings based on training data without verifying that:
- The ontology file exists in /data/ontology/
- The specific predicate/class exists within that ontology file
- The prefix is declared and resolvable
This leads to schema files containing references like dqv:value or adms:status that cannot be validated or serialized to RDF.
Requirements
1. All Ontology Prefixes Must Have Local Files
Before using a prefix (e.g., prov:, schema:, org:), verify the ontology file exists:
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
Available Ontologies (as of 2025-01-13):
| Prefix | File | Verified |
|---|---|---|
| prov: | prov-o.ttl, prov.ttl | ✅ |
| schema: | schemaorg.owl | ✅ |
| org: | org.rdf | ✅ |
| skos: | skos.rdf | ✅ |
| dcterms: | dublin_core_elements.rdf | ✅ |
| foaf: | foaf.ttl | ✅ |
| rico: | RiC-O_1-1.rdf | ✅ |
| crm: | CIDOC_CRM_v7.1.3.rdf | ✅ |
| geo: | geo.ttl | ✅ |
| sosa: | sosa.ttl | ✅ |
| bf: | bibframe.rdf | ✅ |
| edm: | edm.owl | ✅ |
| premis: | premis3.owl | ✅ |
| dcat: | dcat3.ttl | ✅ |
| ore: | ore.rdf | ✅ |
| pico: | pico.ttl | ✅ |
| gn: | geonames_ontology.rdf | ✅ |
| time: | time.ttl | ✅ |
| locn: | locn.ttl | ✅ |
| dqv: | dqv.ttl | ✅ |
| adms: | adms.ttl | ✅ |
NOT Available (do not use without adding):
| Prefix | Status | Alternative |
|---|---|---|
| qudt: | Only referenced in era_ontology.ttl | Use hc: with close_mappings annotation |
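The file check above can also be scripted. The following is a minimal sketch, not part of the project's tooling: the function name `missing_prefix_files` and the `PREFIX_FILES` dictionary are hypothetical, and the dictionary copies only a few rows from the Available Ontologies table.

```python
from pathlib import Path

# Hypothetical subset of the "Available Ontologies" table (prefix -> file names).
PREFIX_FILES = {
    "prov": ["prov-o.ttl", "prov.ttl"],
    "schema": ["schemaorg.owl"],
    "skos": ["skos.rdf"],
    "premis": ["premis3.owl"],
}


def missing_prefix_files(ontology_dir, prefix_files=PREFIX_FILES):
    """Return {prefix: [missing file names]} for prefixes lacking a local file."""
    root = Path(ontology_dir)
    missing = {}
    for prefix, names in prefix_files.items():
        # A prefix is reported when any of its expected files is absent.
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            missing[prefix] = absent
    return missing
```

Calling `missing_prefix_files("data/ontology")` from the repository root would return an empty dictionary when every listed file is present. Looking files up by exact name avoids the false hits a substring match such as `grep -i "org"` can produce.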
2. Predicates Must Exist in Ontology Files
Before using a predicate, verify it exists:
# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl
# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
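As a rough Python equivalent of the grep checks above, the sketch below searches an ontology file for a term as a whole token. The function name `term_defined` is hypothetical. Like grep, it is a plain text search rather than an RDF parse, so a hit inside a comment or label also counts and should still be inspected by hand.

```python
import re
from pathlib import Path


def term_defined(ontology_path, local_name):
    """Return True if local_name occurs as a whole token in the ontology file."""
    text = Path(ontology_path).read_text(encoding="utf-8", errors="replace")
    # Reject matches that are part of a longer identifier, e.g. "size" in "resize".
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(local_name) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, text) is not None
```

For example, `term_defined("data/ontology/premis3.owl", "hasFrameRate")` is expected to return `False` if, as this rule states, that predicate is not in the file.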
3. Use hc: Prefix for Domain-Specific Concepts
When no standard ontology predicate exists, use the Heritage Custodian namespace:
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.
# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
4. Document External References in close_mappings
When a similar concept exists in an ontology we don't have locally, document it in close_mappings with a note:
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically
        appropriate but ontology not included in project. See
        https://www.w3.org/TR/vocab-dqv/
Verification Workflow
Before Adding New Mappings
- Check if ontology file exists:
  ls data/ontology/ | grep -i "<ontology-name>"
- Search for predicate in ontology:
  grep -l "<predicate-name>" data/ontology/*
- Verify predicate definition:
  grep -B2 -A5 "<predicate-name>" data/ontology/<file>
- If not found: Use hc: prefix with appropriate documentation
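The decision at the end of this workflow (keep the mapping if it is found locally, otherwise fall back to hc:) could be sketched as follows. The function `choose_slot_uri` and its parameters are hypothetical names introduced for illustration. It uses a substring search like `grep -l`, so it covers only the first two steps; the definition still needs a manual look.

```python
from pathlib import Path


def choose_slot_uri(curie, ontology_dir, prefix_files, hc_fallback):
    """Return curie if a local file for its prefix mentions the term, else hc_fallback."""
    prefix, _, local_name = curie.partition(":")
    root = Path(ontology_dir)
    for file_name in prefix_files.get(prefix, []):
        path = root / file_name
        if not path.is_file():
            continue  # step 1 failed for this file: ontology not present locally
        text = path.read_text(encoding="utf-8", errors="replace")
        if local_name in text:
            return curie  # step 2 passed: the term is mentioned in the file
    return hc_fallback  # nothing found: use the Heritage Custodian namespace
```

A call might look like `choose_slot_uri("premis:hasFrameRate", "data/ontology", {"premis": ["premis3.owl"]}, "hc:frameRate")`.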
When Reviewing Existing Mappings
Run validation script:
# Find all slot_uri references
grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
grep -v "hc:" | \
cut -d: -f3 | \
sort -u
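# Verify each prefix has a local file
# Note: this matches on file name, so dcterms (dublin_core_elements.rdf) and
# rico (RiC-O_1-1.rdf) print NOT FOUND even though their files exist;
# check those two against the Available Ontologies table.
for prefix in prov schema org skos dcterms foaf rico; do
  echo "Checking $prefix:"
  ls data/ontology/ | grep -i "$prefix" || echo "  NOT FOUND!"
done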
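The first pipeline above can be approximated in Python when a per-prefix count is more useful than a sorted list. This is a sketch under stated assumptions: the function name `slot_uri_prefixes` is hypothetical, slot files are assumed to use the `.yaml` extension, and lines are matched with a regular expression rather than a YAML parser.

```python
import re
from collections import Counter
from pathlib import Path

# Matches "slot_uri: prefix:term" (optionally quoted) and skips full URLs like http://...
SLOT_URI = re.compile(
    r"^[ \t]*slot_uri:[ \t]*[\"']?([A-Za-z][\w.-]*):(?!//)", re.MULTILINE
)


def slot_uri_prefixes(slots_dir):
    """Count the non-hc prefixes used in slot_uri lines under slots_dir."""
    counts = Counter()
    for path in Path(slots_dir).rglob("*.yaml"):
        for prefix in SLOT_URI.findall(path.read_text(encoding="utf-8")):
            if prefix != "hc":
                counts[prefix] += 1
    return counts
```

Running `slot_uri_prefixes("schemas/20251121/linkml/modules/slots")` yields the prefixes that then need a matching file in data/ontology/.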
Ontology Addition Process
If a new ontology is genuinely needed:
- Download the ontology:
  curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
- Update ONTOLOGY_CATALOG.md:
  # Add entry to data/ontology/ONTOLOGY_CATALOG.md
- Verify predicates exist:
  grep "<predicate>" data/ontology/<name>.ttl
- Update LinkML prefixes in schema files
Examples
CORRECT: Verified Mapping
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
CORRECT: Domain-Specific with External Reference
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.
WRONG: Hallucinated Mapping
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
WRONG: Non-Existent Predicate
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
Consequences of Violation
- RDF serialization fails: invalid prefixes cause gen-owl errors
- Schema validation errors: LinkML validates prefix declarations
- Broken interoperability: external systems cannot resolve URIs
- Data quality issues: semantic web tooling cannot process data
PREMIS Ontology Reference (premis3.owl)
CRITICAL: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified.
Valid PREMIS Classes
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
Valid PREMIS Properties
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version
Known Hallucinated PREMIS Terms (DO NOT USE)
| Hallucinated Term | Correction |
|---|---|
| premis:PreservationEvent | Use premis:Event |
| premis:RightsDeclaration | Use premis:RightsBasis or premis:RightsStatus |
| premis:hasRightsStatement | Use premis:rightsStatus |
| premis:hasRightsDeclaration | Use premis:rightsStatus |
| premis:hasRepresentation | Use premis:relationship or dcterms:hasFormat |
| premis:hasRelatedStatementInformation | Use premis:note or adms:status |
| premis:hasObjectCharacteristics | Use premis:characteristic |
| premis:rightsGranted | Use premis:RightsStatus class with premis:restriction |
| premis:rightsEndDate | Use premis:endDate |
| premis:linkingAgentIdentifier | Use premis:Agent class |
| premis:storageLocation (lowercase) | Use premis:storedAt property or premis:StorageLocation class |
| premis:hasFrameRate | Does not exist - use hc:frameRate |
| premis:environmentCharacteristic (lowercase) | Use premis:EnvironmentCharacteristic (class) |
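The table above lends itself to an automated scan of schema text. The sketch below is illustrative only: `HALLUCINATED_PREMIS` and `find_hallucinated` are hypothetical names, and the dictionary holds just a subset of the rows with a single suggested replacement each. Matching is case-sensitive, which matters for rows such as premis:storageLocation versus the valid premis:StorageLocation class.

```python
import re

# Subset of the table above: hallucinated term -> suggested replacement.
HALLUCINATED_PREMIS = {
    "premis:PreservationEvent": "premis:Event",
    "premis:hasRightsStatement": "premis:rightsStatus",
    "premis:hasObjectCharacteristics": "premis:characteristic",
    "premis:rightsEndDate": "premis:endDate",
    "premis:hasFrameRate": "hc:frameRate",
}

PREMIS_CURIE = re.compile(r"\bpremis:[A-Za-z]\w*")


def find_hallucinated(schema_text, known=HALLUCINATED_PREMIS):
    """Return (line_number, term, replacement) for each known bad term found."""
    findings = []
    for line_number, line in enumerate(schema_text.splitlines(), start=1):
        for term in PREMIS_CURIE.findall(line):
            if term in known:
                findings.append((line_number, term, known[term]))
    return findings
```

Terms that appear only in YAML comments are reported as well, since the scan does not parse the file.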
PREMIS Verification Commands
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
sed 's/.*v3\///' | sed 's/".*//' | sort -u
# List all PREMIS properties
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
grep -oP 'v3/\K[^"]+' | sort -u
# Verify a specific term exists
grep -c "YourTermHere" data/ontology/premis3.owl
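Where GNU grep's `-P` option is unavailable, the class and property listings can be approximated with Python's standard XML parser. This sketch assumes premis3.owl is serialized as RDF/XML with terms declared as `owl:Class`, `owl:ObjectProperty`, or `owl:DatatypeProperty` elements carrying `rdf:about`, which the grep patterns above suggest but the source does not confirm. The function name `local_names` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def local_names(rdf_xml_path, owl_type):
    """Return sorted local names of owl:<owl_type> elements that have rdf:about."""
    tree = ET.parse(rdf_xml_path)
    names = set()
    for element in tree.iter(f"{{{OWL_NS}}}{owl_type}"):
        about = element.get(f"{{{RDF_NS}}}about")
        if about:  # anonymous classes (e.g. unions) have no rdf:about
            # Keep the part after the last "/" or "#" of the URI.
            names.add(re.split(r"[/#]", about)[-1])
    return sorted(names)
```

For example, `local_names("data/ontology/premis3.owl", "Class")` would list the class names, and passing `"ObjectProperty"` or `"DatatypeProperty"` the properties. Terms declared through `rdf:Description` with an `rdf:type` child are not picked up by this approach.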
See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- /data/ontology/ONTOLOGY_CATALOG.md - Available ontologies
- .opencode/rules/slot-ontology-mapping-reference.md - Mapping reference
Version History
- 2025-01-13: Added 7 more hallucinated PREMIS terms discovered during schema audit: premis:hasRightsStatement, premis:hasRightsDeclaration, premis:hasRepresentation, premis:hasRelatedStatementInformation, premis:rightsGranted, premis:rightsEndDate, premis:linkingAgentIdentifier
- 2025-01-13: Initial creation after discovering dqv:, adms:, qudt: references without local files