# Rule 51: No Hallucinated Ontology References

**Priority**: CRITICAL

**Scope**: All LinkML schema files (`schemas/20251121/linkml/`)

**Created**: 2025-01-13

---

## Summary

All ontology references in LinkML schema files (`class_uri`, `slot_uri`, `*_mappings`) MUST be verifiable against actual ontology files in `data/ontology/`. References to predicates or classes that do not exist in the local ontology files are considered **hallucinated** and are prohibited.

---

## The Problem

AI agents may suggest ontology mappings based on training data without verifying that:

1. The ontology file exists in `data/ontology/`
2. The specific predicate/class exists within that ontology file
3. The prefix is declared and resolvable

This leads to schema files containing references such as `dqv:value` or `adms:status` that cannot be validated or serialized to RDF.

---

## Requirements

### 1. All Ontology Prefixes Must Have Local Files

Before using a prefix (e.g., `prov:`, `schema:`, `org:`), verify that the ontology file exists:

```bash
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
```

**Available Ontologies** (as of 2025-01-13):

| Prefix | File | Verified |
|--------|------|----------|
| `prov:` | `prov-o.ttl`, `prov.ttl` | ✅ |
| `schema:` | `schemaorg.owl` | ✅ |
| `org:` | `org.rdf` | ✅ |
| `skos:` | `skos.rdf` | ✅ |
| `dcterms:` | `dublin_core_elements.rdf` | ✅ |
| `foaf:` | `foaf.ttl` | ✅ |
| `rico:` | `RiC-O_1-1.rdf` | ✅ |
| `crm:` | `CIDOC_CRM_v7.1.3.rdf` | ✅ |
| `geo:` | `geo.ttl` | ✅ |
| `sosa:` | `sosa.ttl` | ✅ |
| `bf:` | `bibframe.rdf` | ✅ |
| `edm:` | `edm.owl` | ✅ |
| `premis:` | `premis3.owl` | ✅ |
| `dcat:` | `dcat3.ttl` | ✅ |
| `ore:` | `ore.rdf` | ✅ |
| `pico:` | `pico.ttl` | ✅ |
| `gn:` | `geonames_ontology.rdf` | ✅ |
| `time:` | `time.ttl` | ✅ |
| `locn:` | `locn.ttl` | ✅ |
| `dqv:` | `dqv.ttl` | ✅ |
| `adms:` | `adms.ttl` | ✅ |

**NOT Available** (do not use without adding):

| Prefix | Status | Alternative |
|--------|--------|-------------|
| `qudt:` | Only referenced in `era_ontology.ttl` | Use `hc:` with a `close_mappings` annotation |

### 2. Predicates Must Exist in Ontology Files

Before using a predicate, verify that it exists:

```bash
# Verify predicate exists (prints the file name if either spelling occurs)
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl

# Check specific predicate definition (prefixed form or full .../v3/ IRI,
# which is how terms appear in premis3.owl)
grep -E "premis:hasFrameRate|v3/hasFrameRate" data/ontology/premis3.owl
```

### 3. Use hc: Prefix for Domain-Specific Concepts

When no standard ontology predicate exists, use the Heritage Custodian namespace:

```yaml
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.
---
# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
```

### 4. Document External References in close_mappings

When a similar concept exists in an ontology that is not available locally, document it in `close_mappings` with a note:

```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from the W3C Data Quality Vocabulary would be semantically
        appropriate, but the ontology is not included in the project.
        See https://www.w3.org/TR/vocab-dqv/
```

---

## Verification Workflow

### Before Adding New Mappings

1. **Check if ontology file exists**:
   ```bash
   ls data/ontology/ | grep -i "<prefix>"
   ```
2. **Search for predicate in ontology**:
   ```bash
   grep -l "<predicate>" data/ontology/*
   ```
3. **Verify predicate definition**:
   ```bash
   grep -B2 -A5 "<predicate>" data/ontology/<ontology-file>
   ```
4. **If not found**: Use `hc:` prefix with appropriate documentation

### When Reviewing Existing Mappings

Run the following checks:

```bash
# List the distinct non-hc: prefixes used in slot_uri values
grep -rh "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
  grep -v "slot_uri: *hc:" | \
  sed -E 's/.*slot_uri: *([A-Za-z0-9_-]+):.*/\1/' | \
  sort -u

# Verify each prefix has its local file. File names do not always contain
# the prefix (rico: -> RiC-O_1-1.rdf, dcterms: -> dublin_core_elements.rdf),
# so check the exact files listed in the Available Ontologies table.
for entry in prov:prov-o.ttl schema:schemaorg.owl org:org.rdf skos:skos.rdf \
             dcterms:dublin_core_elements.rdf foaf:foaf.ttl rico:RiC-O_1-1.rdf; do
  prefix="${entry%%:*}"
  file="${entry#*:}"
  if [ -f "data/ontology/$file" ]; then
    echo "$prefix: $file"
  else
    echo "$prefix: NOT FOUND! (expected data/ontology/$file)"
  fi
done
```

---

## Ontology Addition Process

If a new ontology is genuinely needed:

1. **Download the ontology**:
   ```bash
   curl -L -o data/ontology/<name>.ttl "<ontology-url>" -H "Accept: text/turtle"
   ```
2. **Update ONTOLOGY_CATALOG.md**:
   ```bash
   # Add entry to data/ontology/ONTOLOGY_CATALOG.md
   ```
3. **Verify predicates exist**:
   ```bash
   grep "<predicate>" data/ontology/<name>.ttl
   ```
4. **Update LinkML prefixes** in schema files

---

## Examples

### CORRECT: Verified Mapping

```yaml
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
```

### CORRECT: Domain-Specific with External Reference

```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses the HC namespace because the dqv: ontology is not in the local files.
        dqv:value would be a semantically appropriate alternative.
```

### WRONG: Hallucinated Mapping

```yaml
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
```

### WRONG: Non-Existent Predicate

```yaml
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
```

---

## Consequences of Violation

1. **RDF serialization fails** - Invalid prefixes cause `gen-owl` errors
2. **Schema validation errors** - LinkML validates prefix declarations
3. **Broken interoperability** - External systems cannot resolve the URIs
4. **Data quality issues** - Semantic web tooling cannot process the data

---

## PREMIS Ontology Reference (premis3.owl)

**CRITICAL**: PREMIS terms are frequently hallucinated. ALL `premis:` references MUST be verified.

### Valid PREMIS Classes

```
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
```

### Valid PREMIS Properties

```
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate, storedAt,
terms, validationRules, version
```

### Known Hallucinated PREMIS Terms (DO NOT USE)

| Hallucinated Term | Correction |
|-------------------|------------|
| `premis:PreservationEvent` | Use `premis:Event` |
| `premis:RightsDeclaration` | Use `premis:RightsBasis` or `premis:RightsStatus` |
| `premis:hasRightsStatement` | Use `premis:rightsStatus` |
| `premis:hasRightsDeclaration` | Use `premis:rightsStatus` |
| `premis:hasRepresentation` | Use `premis:relationship` or `dcterms:hasFormat` |
| `premis:hasRelatedStatementInformation` | Use `premis:note` or `adms:status` |
| `premis:hasObjectCharacteristics` | Use `premis:characteristic` |
| `premis:rightsGranted` | Use `premis:RightsStatus` class with `premis:restriction` |
| `premis:rightsEndDate` | Use `premis:endDate` |
| `premis:linkingAgentIdentifier` | Use `premis:Agent` class |
| `premis:storageLocation` (lowercase) | Use `premis:storedAt` property or `premis:StorageLocation` class |
| `premis:hasFrameRate` | Does not exist - use `hc:frameRate` |
| `premis:environmentCharacteristic` (lowercase) | Use `premis:EnvironmentCharacteristic` (class) |

### PREMIS Verification Commands

```bash
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
  sed 's/.*v3\///' | sed 's/".*//' | sort -u

# List all PREMIS properties (grep -P requires GNU grep)
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
  grep -oP 'v3/\K[^"]+' | sort -u

# Verify a specific term exists: counts lines with a case-sensitive,
# whole-word match (0 means the term is absent)
grep -cw "YourTermHere" data/ontology/premis3.owl
```

---

## See Also

- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- `data/ontology/ONTOLOGY_CATALOG.md` - Available ontologies
- `.opencode/rules/slot-ontology-mapping-reference.md` - Mapping reference

---

## Version History

- **2025-01-13**: Added 7 more hallucinated PREMIS terms discovered during a schema audit:
  - `premis:hasRightsStatement`, `premis:hasRightsDeclaration`, `premis:hasRepresentation`
  - `premis:hasRelatedStatementInformation`, `premis:rightsGranted`, `premis:rightsEndDate`
  - `premis:linkingAgentIdentifier`
- **2025-01-13**: Initial creation after discovering `dqv:`, `adms:`, and `qudt:` references without local files
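
---

## Appendix: Scripted CURIE Check

The manual steps in the Verification Workflow can be bundled into one shell function for quick checks. This is a minimal sketch, not existing project tooling. The function names `check_curie` and `ontology_file_for`, the return codes, and the prefix-to-file mapping (a subset of the Available Ontologies table) are all hypothetical and chosen for illustration. The function does a case-sensitive, whole-word text match on the term's local name, so `premis:storageLocation` does not match a file that only contains `StorageLocation`. A match only shows that the string occurs somewhere in the file, so still confirm the definition with step 3 of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not existing project tooling): check whether a CURIE's
# local name occurs in the local ontology file for its prefix.
# Assumed return codes: 0 = found (or hc:), 1 = not found in file,
# 2 = no local file for prefix, 3 = argument is not a CURIE.

# Prefix -> file mapping; a subset of the "Available Ontologies" table.
ontology_file_for() {
  case "$1" in
    prov)    echo "prov-o.ttl" ;;
    schema)  echo "schemaorg.owl" ;;
    org)     echo "org.rdf" ;;
    skos)    echo "skos.rdf" ;;
    foaf)    echo "foaf.ttl" ;;
    rico)    echo "RiC-O_1-1.rdf" ;;
    crm)     echo "CIDOC_CRM_v7.1.3.rdf" ;;
    premis)  echo "premis3.owl" ;;
    *)       ;;  # unknown prefix: print nothing
  esac
}

check_curie() {
  local curie="$1"
  local dir="${2:-data/ontology}"

  # Require at least one character on each side of the colon.
  case "$curie" in
    ?*:?*) ;;
    *) echo "NOT A CURIE: $curie"; return 3 ;;
  esac

  local prefix="${curie%%:*}"
  local name="${curie#*:}"

  # hc: is the project namespace and needs no external ontology file.
  if [ "$prefix" = "hc" ]; then
    echo "OK (hc namespace): $curie"
    return 0
  fi

  local file
  file="$(ontology_file_for "$prefix")"
  if [ -z "$file" ] || [ ! -f "$dir/$file" ]; then
    echo "NO LOCAL FILE for prefix '$prefix': $curie"
    return 2
  fi

  # Case-sensitive, fixed-string, whole-word match on the local name.
  # A hit may come from a comment or label; it does not prove the term is
  # declared as a class or property.
  if grep -qwF -- "$name" "$dir/$file"; then
    echo "FOUND in $file: $curie"
    return 0
  fi
  echo "NOT FOUND in $file: $curie"
  return 1
}
```

To use it, save the block under a name of your choice, for example the hypothetical `check_curie.sh`. Then source it from the repository root with `. ./check_curie.sh` and run something like `check_curie premis:Event`. The mapping uses a `case` statement instead of a Bash associative array, so the sketch does not depend on Bash 4 features. Associative arrays are unavailable in the Bash 3.2 that macOS ships as `/bin/bash`.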