Rule 51: No Hallucinated Ontology References
Priority: CRITICAL
Scope: All LinkML schema files (schemas/20251121/linkml/)
Created: 2025-01-13
Summary
All ontology references in LinkML schema files (class_uri, slot_uri, *_mappings) MUST be verifiable against actual ontology files in /data/ontology/. References to predicates or classes that do not exist in local ontology files are considered hallucinated and are prohibited.
The Problem
AI agents may suggest ontology mappings based on training data without verifying that:
- The ontology file exists in /data/ontology/
- The specific predicate/class exists within that ontology file
- The prefix is declared and resolvable
This leads to schema files containing references like dqv:value or adms:status that cannot be validated or serialized to RDF.
Requirements
1. All Ontology Prefixes Must Have Local Files
Before using a prefix (e.g., prov:, schema:, org:), verify the ontology file exists:
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
Available Ontologies (as of 2025-01-13):
| Prefix | File | Verified |
|---|---|---|
| prov: | prov-o.ttl, prov.ttl | ✅ |
| schema: | schemaorg.owl | ✅ |
| org: | org.rdf | ✅ |
| skos: | skos.rdf | ✅ |
| dcterms: | dublin_core_elements.rdf | ✅ |
| foaf: | foaf.ttl | ✅ |
| rico: | RiC-O_1-1.rdf | ✅ |
| crm: | CIDOC_CRM_v7.1.3.rdf | ✅ |
| geo: | geo.ttl | ✅ |
| sosa: | sosa.ttl | ✅ |
| bf: | bibframe.rdf | ✅ |
| edm: | edm.owl | ✅ |
| premis: | premis3.owl | ✅ |
| dcat: | dcat3.ttl | ✅ |
| ore: | ore.rdf | ✅ |
| pico: | pico.ttl | ✅ |
| gn: | geonames_ontology.rdf | ✅ |
| time: | time.ttl | ✅ |
| locn: | locn.ttl | ✅ |
| dqv: | dqv.ttl | ✅ |
| adms: | adms.ttl | ✅ |
NOT Available (do not use without adding):
| Prefix | Status | Alternative |
|---|---|---|
| qudt: | Only referenced in era_ontology.ttl | Use hc: with close_mappings annotation |
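The prefix-to-file table can also be checked by exact file name rather than by pattern. The following is a minimal sketch, not the project's tooling: the function names `ontology_files_for_prefix` and `prefix_has_local_file` are hypothetical, and only a few rows of the table are included for brevity.

```shell
# Hypothetical helper: map a prefix to the file names listed in the table.
# Abridged on purpose; extend the case list with the remaining rows as needed.
ontology_files_for_prefix() {
  case "$1" in
    prov)   echo "prov-o.ttl prov.ttl" ;;
    schema) echo "schemaorg.owl" ;;
    org)    echo "org.rdf" ;;
    skos)   echo "skos.rdf" ;;
    premis) echo "premis3.owl" ;;
    *)      return 1 ;;   # prefix is not in this lookup
  esac
}

# Succeeds only if every file listed for the prefix exists in the ontology directory.
# The directory defaults to data/ontology and can be overridden with a second argument.
prefix_has_local_file() {
  local prefix="$1" dir="${2:-data/ontology}" files f
  files="$(ontology_files_for_prefix "$prefix")" || return 1
  for f in $files; do
    [ -f "$dir/$f" ] || return 1
  done
}
```

For example, `prefix_has_local_file org || echo "org: NOT FOUND"`. Matching on exact file names avoids a false positive that a pattern search can produce: `grep -i "org"` also matches schemaorg.owl.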
2. Predicates Must Exist in Ontology Files
Before using a predicate, verify it exists:
# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl
# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
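A plain substring search can report a hit for a term that only occurs inside a longer name: searching for `outcome` also matches `outcomeNote`. The sketch below tightens the match to a whole local name. The function name `term_appears_in` is hypothetical, and the term is assumed to be a plain name without regex metacharacters.

```shell
# Hypothetical helper: succeed if <term> occurs in <file> as a complete local name,
# i.e. directly after '/', '#' or ':' and not followed by another name character.
term_appears_in() {
  local term="$1" file="$2"
  grep -Eq "[/#:]${term}([^A-Za-z0-9_-]|\$)" "$file"
}
```

Usage: `term_appears_in hasFrameRate data/ontology/premis3.owl || echo "not found"`. A match only shows that the name appears in the file; whether it is declared as a class or property still needs a look at the surrounding definition.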
3. Use hc: Prefix for Domain-Specific Concepts
When no standard ontology predicate exists, use the Heritage Custodian namespace:
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.

# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
4. Document External References in close_mappings
When a similar concept exists in an ontology we don't have locally, document it in close_mappings with a note:
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically
        appropriate but ontology not included in project. See
        https://www.w3.org/TR/vocab-dqv/
Verification Workflow
Before Adding New Mappings
- Check if ontology file exists:
  ls data/ontology/ | grep -i "<ontology-name>"
- Search for predicate in ontology:
  grep -l "<predicate-name>" data/ontology/*
- Verify predicate definition:
  grep -B2 -A5 "<predicate-name>" data/ontology/<file>
- If not found: use the hc: prefix with appropriate documentation
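The steps above can be chained into one check that reports which step failed. This is a rough sketch under stated assumptions: `check_mapping` is a hypothetical name, and it uses the same substring search as the commands above, so a "found" result still calls for reading the definition before the mapping is adopted.

```shell
# Hypothetical helper: report whether <term> can be located in <ontology-file>.
# Prints one of: missing-file, missing-term, found.
check_mapping() {
  local term="$1" file="$2"
  if [ ! -f "$file" ]; then
    echo "missing-file"   # step 1 failed: no local ontology file
    return 1
  fi
  if ! grep -Fq "$term" "$file"; then
    echo "missing-term"   # step 2 failed: fall back to the hc: prefix
    return 1
  fi
  echo "found"            # continue with step 3: review the definition
}
```

For example, `check_mapping atTime data/ontology/prov-o.ttl` prints one of the three states; both failure states lead to the last step, the hc: prefix with documentation.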
When Reviewing Existing Mappings
Run validation script:
# Find all slot_uri references
grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
grep -v "hc:" | \
cut -d: -f3 | \
sort -u
# Verify each prefix has a local file
for prefix in prov schema org skos dcterms foaf rico; do
  echo "Checking $prefix:"
  ls data/ontology/ | grep -i "$prefix" || echo "  NOT FOUND!"
done
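The script above covers slot_uri only, while this rule also applies to class_uri. A possible variant that collects the distinct prefixes from both keys is sketched below. The function name `used_prefixes` is hypothetical, and the pattern assumes CURIE-style values; a full URI such as an http:// address would be reported as the prefix "http".

```shell
# Hypothetical helper: list the distinct non-hc prefixes used in slot_uri and
# class_uri lines anywhere under the given directory.
used_prefixes() {
  grep -rhoE '(slot_uri|class_uri):[[:space:]]*[A-Za-z][A-Za-z0-9_-]*:' "$1" \
    | sed -E 's/^[a-z_]+:[[:space:]]*//; s/:$//' \
    | grep -vx 'hc' \
    | sort -u
}
```

Usage: `used_prefixes schemas/20251121/linkml/` prints one prefix per line, which can then be fed into the prefix check loop instead of a hard-coded list.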
Ontology Addition Process
If a new ontology is genuinely needed:
- Download the ontology:
  curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
- Update ONTOLOGY_CATALOG.md:
  # Add entry to data/ontology/ONTOLOGY_CATALOG.md
- Verify predicates exist:
  grep "<predicate>" data/ontology/<name>.ttl
- Update LinkML prefixes in schema files
Examples
CORRECT: Verified Mapping
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
CORRECT: Domain-Specific with External Reference
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.
WRONG: Hallucinated Mapping
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
WRONG: Non-Existent Predicate
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
Consequences of Violation
- RDF serialization fails - Invalid prefixes cause gen-owl errors
- Schema validation errors - LinkML validates prefix declarations
- Broken interoperability - External systems cannot resolve URIs
- Data quality issues - Semantic web tooling cannot process data
PREMIS Ontology Reference (premis3.owl)
CRITICAL: The PREMIS ontology is frequently hallucinated. ALL premis: references MUST be verified.
Valid PREMIS Classes
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
Valid PREMIS Properties
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version
Known Hallucinated PREMIS Terms (DO NOT USE)
| Hallucinated Term | Correction |
|---|---|
| premis:PreservationEvent | Use premis:Event |
| premis:RightsDeclaration | Use premis:RightsBasis or premis:RightsStatus |
| premis:hasRightsStatement | Use premis:rightsStatus |
| premis:hasRightsDeclaration | Use premis:rightsStatus |
| premis:hasRepresentation | Use premis:relationship or dcterms:hasFormat |
| premis:hasRelatedStatementInformation | Use premis:note or adms:status |
| premis:hasObjectCharacteristics | Use premis:characteristic |
| premis:rightsGranted | Use premis:RightsStatus class with premis:restriction |
| premis:rightsEndDate | Use premis:endDate |
| premis:linkingAgentIdentifier | Use premis:Agent class |
| premis:storageLocation (lowercase) | Use premis:storedAt property or premis:StorageLocation class |
| premis:hasFrameRate | Does not exist - use hc:frameRate |
| premis:environmentCharacteristic (lowercase) | Use premis:EnvironmentCharacteristic (class) |
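The terms in this table can be searched for across the schema files in one pass. The sketch below is an illustration only: `find_hallucinated_premis` is a hypothetical name, and the default directory is taken from the Scope line of this rule. The match is case-sensitive on purpose, so the valid class premis:StorageLocation is not flagged alongside the invalid premis:storageLocation.

```shell
# Hypothetical helper: print file, line number and text of every line that uses
# one of the known hallucinated PREMIS terms from the table.
find_hallucinated_premis() {
  local dir="${1:-schemas/20251121/linkml}"
  local terms='PreservationEvent|RightsDeclaration|hasRightsStatement'
  terms="$terms|hasRightsDeclaration|hasRepresentation"
  terms="$terms|hasRelatedStatementInformation|hasObjectCharacteristics"
  terms="$terms|rightsGranted|rightsEndDate|linkingAgentIdentifier"
  terms="$terms|storageLocation|hasFrameRate|environmentCharacteristic"
  # The trailing group keeps a listed term from matching inside a longer name.
  grep -rnE "premis:(${terms})([^A-Za-z0-9_]|\$)" "$dir"
}
```

Because grep exits with status 0 when it finds a match, a check such as `if find_hallucinated_premis; then exit 1; fi` fails whenever a listed term is still present.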
PREMIS Verification Commands
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
sed 's/.*v3\///' | sed 's/".*//' | sort -u
# List all PREMIS properties
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
grep -oP 'v3/\K[^"]+' | sort -u
# Verify a specific term exists
grep -c "YourTermHere" data/ontology/premis3.owl
See Also
- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- /data/ontology/ONTOLOGY_CATALOG.md - Available ontologies
- .opencode/rules/slot-ontology-mapping-reference.md - Mapping reference
Version History
- 2025-01-13: Added 7 more hallucinated PREMIS terms discovered during schema audit: premis:hasRightsStatement, premis:hasRightsDeclaration, premis:hasRepresentation, premis:hasRelatedStatementInformation, premis:rightsGranted, premis:rightsEndDate, premis:linkingAgentIdentifier
- 2025-01-13: Initial creation after discovering dqv:, adms:, qudt: references without local files