Schema enhancements (443 files):
- Add class_uri with proper ontology references (schema:, prov:, skos:, rico:)
- Add close_mappings, related_mappings per Rule 50 convention
- Replace stub hc: slot_uri with standard predicates (dcterms:identifier, skos:prefLabel)
- Improve descriptions with ontology mapping rationale
- Add prefixes blocks to all schema modules

Entity Resolution improvements:
- Add entity_resolution module with email semantics parsing
- Enhance build_entity_resolution.py with email-based matching signals
- Extend Entity Review API with filtering by signal types and count
- Add candidates caching and indexing for performance
- Add ReviewLoginPage component

New rules and documentation:
- Add Rule 51: No Hallucinated Ontology References
- Add .opencode/rules/no-hallucinated-ontology-references.md
- Add .opencode/rules/slot-ontology-mapping-reference.md
- Add adms.ttl and dqv.ttl ontology files

Frontend ontology support:
- Add RiC-O_1-1.rdf and schemaorg.owl to public/ontology

# Rule 51: No Hallucinated Ontology References

**Priority**: CRITICAL
**Scope**: All LinkML schema files (`schemas/20251121/linkml/`)
**Created**: 2025-01-13

---

## Summary

All ontology references in LinkML schema files (`class_uri`, `slot_uri`, `*_mappings`) MUST be verifiable against actual ontology files in `/data/ontology/`. References to predicates or classes that do not exist in local ontology files are considered **hallucinated** and are prohibited.

---

## The Problem

AI agents may suggest ontology mappings based on training data without verifying that:

1. The ontology file exists in `/data/ontology/`
2. The specific predicate/class exists within that ontology file
3. The prefix is declared and resolvable

This leads to schema files containing references, such as `dqv:value` or `adms:status` before `dqv.ttl` and `adms.ttl` were added locally, that cannot be validated or serialized to RDF.

---

## Requirements

### 1. All Ontology Prefixes Must Have Local Files

Before using a prefix (e.g., `prov:`, `schema:`, `org:`), verify the ontology file exists:

```bash
# Check if ontology exists
ls data/ontology/ | grep -i "prov\|schema\|org"
```

**Available Ontologies** (as of 2025-01-13):

| Prefix | File | Verified |
|--------|------|----------|
| `prov:` | `prov-o.ttl`, `prov.ttl` | ✅ |
| `schema:` | `schemaorg.owl` | ✅ |
| `org:` | `org.rdf` | ✅ |
| `skos:` | `skos.rdf` | ✅ |
| `dcterms:` | `dublin_core_elements.rdf` | ✅ |
| `foaf:` | `foaf.ttl` | ✅ |
| `rico:` | `RiC-O_1-1.rdf` | ✅ |
| `crm:` | `CIDOC_CRM_v7.1.3.rdf` | ✅ |
| `geo:` | `geo.ttl` | ✅ |
| `sosa:` | `sosa.ttl` | ✅ |
| `bf:` | `bibframe.rdf` | ✅ |
| `edm:` | `edm.owl` | ✅ |
| `premis:` | `premis3.owl` | ✅ |
| `dcat:` | `dcat3.ttl` | ✅ |
| `ore:` | `ore.rdf` | ✅ |
| `pico:` | `pico.ttl` | ✅ |
| `gn:` | `geonames_ontology.rdf` | ✅ |
| `time:` | `time.ttl` | ✅ |
| `locn:` | `locn.ttl` | ✅ |
| `dqv:` | `dqv.ttl` | ✅ |
| `adms:` | `adms.ttl` | ✅ |

**NOT Available** (do not use without adding):

| Prefix | Status | Alternative |
|--------|--------|-------------|
| `qudt:` | Only referenced in `era_ontology.ttl` | Use `hc:` with `close_mappings` annotation |
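
The prefix check can also be wrapped in a small shell function so that it gives a yes/no answer instead of a listing to eyeball. This is a minimal sketch, not project tooling: the function names `ontology_file_for` and `prefix_available` are hypothetical, and the prefix-to-file pairs are copied from a few rows of the Available Ontologies table above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumption: not part of the repository).

# Print the local ontology file name for a prefix; fail for unknown prefixes.
# Only a few rows of the "Available Ontologies" table are reproduced here.
ontology_file_for() {
  case "$1" in
    prov)   echo "prov-o.ttl" ;;
    schema) echo "schemaorg.owl" ;;
    skos)   echo "skos.rdf" ;;
    premis) echo "premis3.owl" ;;
    dqv)    echo "dqv.ttl" ;;
    adms)   echo "adms.ttl" ;;
    *)      return 1 ;;
  esac
}

# Succeed only if the prefix is known AND its file exists in the directory.
# Usage: prefix_available "prov:" data/ontology
prefix_available() {
  local prefix=${1%:}             # accept "prov" or "prov:"
  local dir=${2:-data/ontology}   # default to the project's ontology folder
  local file
  file=$(ontology_file_for "$prefix") || return 1
  [ -f "$dir/$file" ]
}
```

With this sketch, `prefix_available qudt: data/ontology` fails because `qudt` has no entry in the mapping, which mirrors the NOT Available table.
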

### 2. Predicates Must Exist in Ontology Files

Before using a predicate, verify it exists:

```bash
# Verify predicate exists
grep -l "hasFrameRate\|frameRate" data/ontology/premis3.owl

# Check specific predicate definition
grep -E "premis:hasFrameRate|:hasFrameRate" data/ontology/premis3.owl
```
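
A plain `grep` matches substrings, so searching for `outcome` also hits `outcomeNote` and can make a missing term look present. The sketch below uses whole-word, fixed-string matching to avoid that. The function name `term_in_ontology` is hypothetical, and a hit only shows that the term is mentioned somewhere in the file; the definition still has to be inspected by hand.

```bash
# term_in_ontology FILE TERM   (hypothetical helper, not project tooling)
# Exit 0 if TERM occurs as a whole word in FILE, 1 if not, 2 if FILE is missing.
term_in_ontology() {
  local file=$1 term=$2
  [ -f "$file" ] || return 2
  # -F: literal string, -w: whole word only, -q: quiet (exit status only)
  grep -qwF -- "$term" "$file"
}
```

For example, `term_in_ontology data/ontology/premis3.owl hasFrameRate` returns a non-zero status whenever the local file does not contain that exact name.
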

### 3. Use hc: Prefix for Domain-Specific Concepts

When no standard ontology predicate exists, use the Heritage Custodian namespace:

```yaml
# CORRECT - Use hc: with documentation
slots:
  heritage_relevance_score:
    slot_uri: hc:heritageRelevanceScore
    description: Heritage sector relevance score (0.0-1.0)
    annotations:
      ontology_note: >-
        No standard ontology predicate for heritage relevance scoring.
        Domain-specific metric for this project.

# WRONG - Hallucinated predicate
slots:
  heritage_relevance_score:
    slot_uri: dqv:heritageScore  # Does not exist!
```

### 4. Document External References in close_mappings

When a similar concept exists in an ontology that is not available locally, document it in `close_mappings` with a note. The example reflects the state before `dqv.ttl` was added to `data/ontology/`; the same pattern applies to any ontology that is still missing locally, such as `qudt:`.

```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore
    close_mappings:
      - dqv:value  # W3C Data Quality Vocabulary (not in local files)
    annotations:
      external_ontology_note: >-
        dqv:value from W3C Data Quality Vocabulary would be semantically
        appropriate but ontology not included in project. See
        https://www.w3.org/TR/vocab-dqv/
```

---

## Verification Workflow

### Before Adding New Mappings

1. **Check if ontology file exists**:
   ```bash
   ls data/ontology/ | grep -i "<ontology-name>"
   ```

2. **Search for predicate in ontology**:
   ```bash
   grep -l "<predicate-name>" data/ontology/*
   ```

3. **Verify predicate definition**:
   ```bash
   grep -B2 -A5 "<predicate-name>" data/ontology/<file>
   ```

4. **If not found**: Use `hc:` prefix with appropriate documentation
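
The four steps above can be sketched as one function that reports which branch of the workflow applies. The name `check_mapping` and its three verdict strings are hypothetical, chosen only for illustration. A `verified` verdict means the name was found as a whole word; it does not replace reading the definition in step 3.

```bash
# check_mapping CURIE ONTOLOGY_FILE   (hypothetical helper, not project tooling)
# Prints one of: missing-file, verified, missing-term
check_mapping() {
  local curie=$1 file=$2
  local term=${curie#*:}   # local name after the first colon, e.g. atTime
  if [ ! -f "$file" ]; then
    echo "missing-file"    # step 1 failed: no local ontology file
  elif grep -qwF -- "$term" "$file"; then
    echo "verified"        # step 2 passed: term occurs in the file
  else
    echo "missing-term"    # step 4 applies: fall back to hc: with documentation
  fi
}
```

For instance, `check_mapping prov:atTime data/ontology/prov-o.ttl` prints `verified` only when the file exists and contains `atTime`; both other verdicts lead to the `hc:` fallback.
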
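
### When Reviewing Existing Mappings

Run these validation commands:

```bash
# List the distinct non-hc: prefixes used in slot_uri values
grep -r "slot_uri:" schemas/20251121/linkml/modules/slots/ | \
  grep -v "hc:" | \
  cut -d: -f3 | \
  tr -d ' ' | \
  sort -u

# Verify each prefix has a local file. File names do not always contain the
# prefix (rico: is RiC-O_1-1.rdf, dcterms: is dublin_core_elements.rdf), so
# each entry pairs a prefix with a file-name fragment from the table above.
for entry in prov:prov schema:schemaorg org:org.rdf skos:skos \
             dcterms:dublin_core foaf:foaf rico:RiC-O; do
  prefix=${entry%%:*}
  pattern=${entry#*:}
  echo "Checking $prefix:"
  ls data/ontology/ | grep -iF -- "$pattern" || echo "  NOT FOUND!"
done
```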
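
Splitting `grep -r` output on colons depends on the file path containing no colon and skips `class_uri` values. The sketch below is one alternative under stated assumptions: the function name `list_uri_prefixes` is hypothetical, it only inspects `slot_uri` and `class_uri` keys (not `*_mappings` lists), and a full IRI value such as `http://...` would be reported as the prefix `http`.

```bash
# list_uri_prefixes DIR   (hypothetical helper, not project tooling)
# Print the distinct CURIE prefixes used in slot_uri/class_uri values under DIR,
# excluding the project's own hc: namespace.
list_uri_prefixes() {
  # -r: recurse, -h: no file names, -o: print only the matched text
  grep -rhoE '(slot_uri|class_uri):[[:space:]]*[A-Za-z][A-Za-z0-9_-]*:' "$1" \
    | sed -E 's/^(slot_uri|class_uri):[[:space:]]*//; s/:$//' \
    | grep -vx 'hc' \
    | sort -u
}

# Example call (path taken from the commands above):
#   list_uri_prefixes schemas/20251121/linkml/modules/slots/
```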

---

## Ontology Addition Process

If a new ontology is genuinely needed:

1. **Download the ontology**:
   ```bash
   curl -L -o data/ontology/<name>.ttl "<url>" -H "Accept: text/turtle"
   ```

2. **Update ONTOLOGY_CATALOG.md**:
   ```bash
   # Add entry to data/ontology/ONTOLOGY_CATALOG.md
   ```

3. **Verify predicates exist**:
   ```bash
   grep "<predicate>" data/ontology/<name>.ttl
   ```

4. **Update LinkML prefixes** in schema files
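
One caveat for the download step: `curl` without `--fail` exits successfully even on an HTTP error, and a server may ignore the `Accept` header and return an HTML page. A quick sanity check before cataloguing the file can catch both cases. The sketch below is a heuristic, not a parser; the function name `looks_like_rdf` and the specific patterns it tests are assumptions made for illustration.

```bash
# looks_like_rdf FILE   (hypothetical helper, not project tooling)
# Exit 0 if FILE is non-empty, contains no HTML root element, and has at least
# one line that opens like Turtle (@prefix, @base, PREFIX, BASE) or RDF/XML
# (<?xml, <rdf:RDF). Exit 1 otherwise.
looks_like_rdf() {
  local file=$1
  [ -s "$file" ] || return 1                      # missing or empty download
  if grep -qiE '<(!doctype[[:space:]]+)?html' "$file"; then
    return 1                                      # HTML landing or error page
  fi
  grep -qE '^[[:space:]]*(@prefix|@base|PREFIX|BASE|<\?xml|<rdf:RDF)' "$file"
}

# Example call:
#   looks_like_rdf data/ontology/<name>.ttl || echo "download is not RDF"
```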

---

## Examples

The `dqv:` examples in this section predate the addition of `dqv.ttl` to `data/ontology/`; read `dqv:` as a stand-in for any prefix that has no local file.

### CORRECT: Verified Mapping

```yaml
slots:
  retrieval_timestamp:
    slot_uri: prov:atTime  # Verified in data/ontology/prov-o.ttl
    range: datetime
```

### CORRECT: Domain-Specific with External Reference

```yaml
slots:
  confidence_score:
    slot_uri: hc:confidenceScore  # HC namespace (always valid)
    range: float
    close_mappings:
      - dqv:value  # External reference (documented, not required locally)
    annotations:
      ontology_note: >-
        Uses HC namespace as dqv: ontology not in local files.
        dqv:value would be semantically appropriate alternative.
```

### WRONG: Hallucinated Mapping

```yaml
slots:
  confidence_score:
    slot_uri: dqv:value  # INVALID - dqv: not in data/ontology/!
    range: float
```

### WRONG: Non-Existent Predicate

```yaml
slots:
  frame_rate:
    slot_uri: premis:hasFrameRate  # INVALID - predicate not in premis3.owl!
    range: float
```

---

## Consequences of Violation

1. **RDF serialization fails** - Invalid prefixes cause `gen-owl` errors
2. **Schema validation errors** - LinkML validates prefix declarations
3. **Broken interoperability** - External systems cannot resolve URIs
4. **Data quality issues** - Semantic web tooling cannot process data

---

## PREMIS Ontology Reference (premis3.owl)

**CRITICAL**: PREMIS terms are frequently hallucinated. ALL `premis:` references MUST be verified against `premis3.owl`.

### Valid PREMIS Classes

```
Action, Agent, Bitstream, Copyright, Dependency, EnvironmentCharacteristic,
Event, File, Fixity, HardwareAgent, Identifier, Inhibitor, InstitutionalPolicy,
IntellectualEntity, License, Object, Organization, OutcomeStatus, Person,
PreservationPolicy, Representation, RightsBasis, RightsStatus, Rule, Signature,
SignatureEncoding, SignificantProperties, SoftwareAgent, Statute,
StorageLocation, StorageMedium
```

### Valid PREMIS Properties

```
act, allows, basis, characteristic, citation, compositionLevel, dependency,
determinationDate, documentation, encoding, endDate, fixity, governs,
identifier, inhibitedBy, inhibits, jurisdiction, key, medium, note,
originalName, outcome, outcomeNote, policy, prohibits, purpose, rationale,
relationship, restriction, rightsStatus, signature, size, startDate,
storedAt, terms, validationRules, version
```

### Known Hallucinated PREMIS Terms (DO NOT USE)

| Hallucinated Term | Correction |
|-------------------|------------|
| `premis:PreservationEvent` | Use `premis:Event` |
| `premis:RightsDeclaration` | Use `premis:RightsBasis` or `premis:RightsStatus` |
| `premis:hasRightsStatement` | Use `premis:rightsStatus` |
| `premis:hasRightsDeclaration` | Use `premis:rightsStatus` |
| `premis:hasRepresentation` | Use `premis:relationship` or `dcterms:hasFormat` |
| `premis:hasRelatedStatementInformation` | Use `premis:note` or `adms:status` |
| `premis:hasObjectCharacteristics` | Use `premis:characteristic` |
| `premis:rightsGranted` | Use `premis:RightsStatus` class with `premis:restriction` |
| `premis:rightsEndDate` | Use `premis:endDate` |
| `premis:linkingAgentIdentifier` | Use `premis:Agent` class |
| `premis:storageLocation` (lowercase) | Use `premis:storedAt` property or `premis:StorageLocation` class |
| `premis:hasFrameRate` | Does not exist - use `hc:frameRate` |
| `premis:environmentCharacteristic` (lowercase) | Use `premis:EnvironmentCharacteristic` (class) |
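
The table above can double as a denylist for an automated scan. The following is a minimal sketch under stated assumptions: the function name `check_premis_denylist` is hypothetical, the term list is copied from the table and would need updating alongside it, and the scan also flags occurrences inside comments or documentation.

```bash
# check_premis_denylist DIR   (hypothetical helper, not project tooling)
# Print every line under DIR that uses a term from the table above.
# Exit 0 when nothing is found, 1 otherwise.
check_premis_denylist() {
  local dir=$1
  local terms='PreservationEvent|RightsDeclaration|hasRightsStatement'
  terms+='|hasRightsDeclaration|hasRepresentation|hasRelatedStatementInformation'
  terms+='|hasObjectCharacteristics|rightsGranted|rightsEndDate'
  terms+='|linkingAgentIdentifier|storageLocation|hasFrameRate'
  terms+='|environmentCharacteristic'
  # Case-sensitive on purpose: premis:StorageLocation (class) is valid,
  # premis:storageLocation is not. The trailing group rejects partial matches.
  if grep -rnE -- "premis:(${terms})([^A-Za-z0-9_]|\$)" "$dir"; then
    return 1
  fi
  return 0
}
```

Running `check_premis_denylist schemas/20251121/linkml/` prints `file:line:content` for each offending line, so a non-zero exit status could fail a review step until the listed corrections are applied.
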

### PREMIS Verification Commands

```bash
# List all PREMIS classes
grep -E "owl:Class.*premis" data/ontology/premis3.owl | \
  sed 's/.*v3\///' | sed 's/".*//' | sort -u

# List all PREMIS properties (-P requires GNU grep)
grep -E "ObjectProperty|DatatypeProperty" data/ontology/premis3.owl | \
  grep -oP 'v3/\K[^"]+' | sort -u

# Verify a specific term exists (prints the number of matching lines; 0 means absent)
grep -c "YourTermHere" data/ontology/premis3.owl
```

---

## See Also

- Rule 38: Slot Centralization and Semantic URI Requirements
- Rule 50: Ontology-to-LinkML Mapping Convention
- `/data/ontology/ONTOLOGY_CATALOG.md` - Available ontologies
- `.opencode/rules/slot-ontology-mapping-reference.md` - Mapping reference

---

## Version History

- **2025-01-13**: Added 7 more hallucinated PREMIS terms discovered during schema audit:
  - `premis:hasRightsStatement`, `premis:hasRightsDeclaration`, `premis:hasRepresentation`
  - `premis:hasRelatedStatementInformation`, `premis:rightsGranted`, `premis:rightsEndDate`
  - `premis:linkingAgentIdentifier`
- **2025-01-13**: Initial creation after discovering dqv:, adms:, qudt: references without local files