# Slot Ontology Mapping Reference

This document provides the complete ontology predicate mappings for all 970 LinkML slots that previously had the `hc:` (Heritage Custodian) prefix.

## Overview

| Category | Count | Primary Ontologies |
|----------|-------|-------------------|
| Provenance/Extraction | ~50 | `prov:`, `dcterms:` |
| Confidence/Scores | ~62 | `dqv:`, `hc:` |
| Boolean Flags | ~131 | `hc:`, `org:` |
| Names/Descriptions | ~72 | `schema:`, `skos:`, `dcterms:` |
| Types/Status | ~60 | `dcterms:`, `hc:` |
| Identifiers | ~53 | `dcterms:`, `org:`, `schema:` |
| URLs | ~24 | `schema:`, `foaf:`, `dcterms:` |
| Dates/Times | ~22 | `schema:`, `prov:`, `dcterms:` |
| Budget/Financial | ~22 | `schema:`, `hc:` |
| Counts/Quantities | ~28 | `schema:`, `hc:` |
| Relationships | ~30 | `org:`, `dcterms:`, `rico:` |
| Language/Text | ~25 | `schema:`, `dcterms:`, `rdf:` |
| Physical/Spatial | ~30 | `schema:`, `geo:`, `hc:` |
| Audio/Video | ~30 | `schema:`, `dcterms:`, `hc:` |
| **TOTAL** | **~639** | |

---

## 1. Provenance/Extraction Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `retrieval_timestamp` | `prov:atTime` | Timestamp of retrieval event |
| `extraction_timestamp` | `prov:generatedAtTime` | When extraction result was created |
| `archived_at` | `prov:generatedAtTime` | When archived copy was created |
| `retrieval_method` | `prov:hadPlan` | The plan/procedure used for retrieval |
| `generated_by` | `prov:wasGeneratedBy` | Activity that generated this entity |
| `created_by_project` | `prov:wasAttributedTo` | Project agent responsible for creation |
| `approved_by` | `prov:wasAttributedTo` | Agent who approved (with role qualifier) |
| `asserted_by` | `prov:wasAttributedTo` | Agent who made the assertion |
| `documented_by` | `prov:hadPrimarySource` | Primary source documentation |
| `retrieved_by` | `prov:wasAssociatedWith` | Agent who performed retrieval |
| `observed_entity` | `prov:entity` | Entity being observed |
| `observer_name` | `prov:agent` | Observing agent |
| `observer_type` | `prov:hadRole` | Role/type of observer |
| `observer_affiliation` | `prov:actedOnBehalfOf` | Observer's affiliated organization |
| `processing_status` | `hc:processingStatus` | Workflow status (domain-specific) |
| `processing_duration_seconds` | `prov:value` | Duration value |
| `processing_priority` | `hc:processingPriority` | Priority (domain-specific) |
| `validation_status` | `hc:validationStatus` | Validation status (domain-specific) |
| `backup_status` | `hc:backupStatus` | Backup status (domain-specific) |
| `verified` | `prov:wasAttributedTo` | Verified by agent |

---

## 2. Confidence/Score Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `confidence` | `dqv:value` | W3C Data Quality Vocabulary |
| `confidence_score` | `dqv:value` | Data quality measurement |
| `likelihood_score` | `hc:likelihood_score` | Probabilistic assessment |
| `heritage_relevance_score` | `hc:heritage_relevance_score` | Heritage domain metric |
| `museum_search_score` | `hc:museum_search_score` | Template-specific relevance |
| `library_search_score` | `hc:library_search_score` | Template-specific relevance |
| `archive_search_score` | `hc:archive_search_score` | Template-specific relevance |
| `collection_discovery_score` | `hc:collection_discovery_score` | Template-specific relevance |
| `person_research_score` | `hc:person_research_score` | Template-specific relevance |
| `location_browse_score` | `hc:location_browse_score` | Template-specific relevance |
| `identifier_lookup_score` | `hc:identifier_lookup_score` | Template-specific relevance |
| `organizational_change_score` | `hc:organizational_change_score` | Template-specific relevance |
| `digital_platform_score` | `hc:digital_platform_score` | Template-specific relevance |
| `general_heritage_score` | `hc:general_heritage_score` | Fallback template score |
| `specificity_score` | `hc:specificity_score` | Schema design metric (Rule 37) |
| `engagement_rate` | `hc:engagement_rate` | Analytics ratio metric |
| `visitor_conversion_rate` | `hc:visitor_conversion_rate` | Analytics ratio metric |
| `growth_rate` | `hc:growth_rate` | Temporal change rate |
| `funding_rate` | `hc:funding_rate` | Funding rate metric |

---

## 3. Boolean Flag Slots (Selected Examples)

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `is_accredited` | `hc:isAccredited` | Domain-specific (close: schema:hasCredential) |
| `is_active` | `hc:isActive` | Domain-specific (close: adms:status) |
| `is_branch_of` | `org:unitOf` | **Direct match** to W3C Org |
| `is_heritage_relevant` | `hc:isHeritageRelevant` | Core HC domain concept |
| `is_public_facing` | `hc:isPublicFacing` | Close to schema:publicAccess |
| `is_current_position` | `hc:isCurrentPosition` | Employment convenience flag |
| `is_embeddable` | `hc:isEmbeddable` | Platform content policy |
| `is_made_for_kid` | `hc:isMadeForKids` | Close to schema:isFamilyFriendly |

**Note**: Most boolean flags use the `hc:` prefix with `close_mappings` to related ontology terms, as standard ontologies rarely have boolean equivalents.

---

## 4. Name/Description Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `branch_name` | `schema:name` | Entity name |
| `branch_description` | `schema:description` | Entity description |
| `portal_name` | `schema:name` | Entity name |
| `portal_description` | `schema:description` | Entity description |
| `scheme_name` | `skos:prefLabel` | SKOS concept scheme |
| `scheme_description` | `skos:definition` | SKOS definition |
| `call_title` | `dcterms:title` | Formal document title |
| `call_description` | `dcterms:description` | Formal document description |
| `organization_name` | `schema:name` | Organization entity name |
| `employer_name` | `schema:name` | Employer entity name |
| `degree_name` | `schema:name` | Academic credential name |

**Pattern**:
- Entity names → `schema:name`
- Entity descriptions → `schema:description`
- Concept schemes → `skos:prefLabel` / `skos:definition`
- Formal documents → `dcterms:title` / `dcterms:description`

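
The name/description pattern amounts to a lookup from the kind of resource a slot describes to a pair of predicates. A minimal sketch, assuming hypothetical kind labels (`entity`, `concept_scheme`, `formal_document`) that are not part of the schema; only the predicates themselves come from the pattern:

```python
from __future__ import annotations

# Hypothetical kind labels; only the predicates come from the pattern.
NAME_DESCRIPTION_PREDICATES = {
    "entity": ("schema:name", "schema:description"),
    "concept_scheme": ("skos:prefLabel", "skos:definition"),
    "formal_document": ("dcterms:title", "dcterms:description"),
}


def name_description_uris(kind: str) -> tuple[str, str]:
    """Return (name slot_uri, description slot_uri) for a resource kind."""
    try:
        return NAME_DESCRIPTION_PREDICATES[kind]
    except KeyError:
        raise ValueError(f"unknown resource kind: {kind!r}") from None


print(name_description_uris("concept_scheme"))
# ('skos:prefLabel', 'skos:definition')
```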
---

## 5. Type/Status Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `storage_type` | `dcterms:type` | Type classification |
| `branch_type` | `dcterms:type` | Type classification |
| `content_type` | `dcterms:type` | Type classification |
| `document_type` | `dcterms:type` | Type classification |
| `heritage_type` | `dcterms:type` | Type classification |
| `operational_status` | `hc:operationalStatus` | Status (close: adms:status) |
| `portal_status` | `hc:portalStatus` | Status (close: adms:status) |
| `project_status` | `hc:projectStatus` | Status (close: adms:status) |
| `loan_status` | `hc:loanStatus` | Status (close: adms:status) |
| `unesco_list_status` | `hc:unescoListStatus` | Heritage-specific status |

**Pattern**:
- Types → `dcterms:type` (with exact_mappings to `schema:additionalType`)
- Status → `hc:*Status` (with close_mappings to `adms:status`)

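
As a rough illustration of the type/status pattern, the helpers below build slot definitions as plain dictionaries whose keys mirror LinkML slot metadata (`slot_uri`, `exact_mappings`, `close_mappings`). The helper names and the snake_case-to-camelCase conversion are assumptions inferred from the table, not the project's actual tooling, and some status slots (such as `unesco_list_status`) may not carry the `adms:status` mapping in practice.

```python
def to_camel(slot_name: str) -> str:
    """Convert snake_case to lowerCamelCase, e.g. loan_status -> loanStatus."""
    head, *rest = slot_name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def type_slot(slot_name: str) -> dict:
    """Type slots reuse dcterms:type and map exactly to schema:additionalType."""
    return {
        "name": slot_name,
        "slot_uri": "dcterms:type",
        "exact_mappings": ["schema:additionalType"],
    }


def status_slot(slot_name: str) -> dict:
    """Status slots get an hc: URI plus a close mapping to adms:status."""
    return {
        "name": slot_name,
        "slot_uri": f"hc:{to_camel(slot_name)}",
        "close_mappings": ["adms:status"],
    }


print(type_slot("branch_type")["slot_uri"])    # dcterms:type
print(status_slot("loan_status")["slot_uri"])  # hc:loanStatus
```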
---

## 6. Identifier Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `activity_id` | `dcterms:identifier` | General identifier |
| `project_id` | `dcterms:identifier` | General identifier |
| `branch_id` | `org:identifier` | Organizational unit |
| `department_id` | `org:identifier` | Organizational unit |
| `unit_id` | `org:identifier` | Organizational unit |
| `video_id` | `schema:identifier` | Media context |
| `device_id` | `schema:identifier` | Device context |
| `scheme_id` | `adms:identifier` | Formal scheme |

**Pattern**:
- General → `dcterms:identifier`
- Organizational → `org:identifier`
- Web/Media → `schema:identifier`
- Formal schemes → `adms:identifier`

---

## 7. URL Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `portal_url` | `schema:url` | Generic URL |
| `repository_url` | `schema:url` | Generic URL |
| `platform_url` | `schema:url` | Generic URL |
| `recording_url` | `schema:contentUrl` | Media content URL |
| `thumbnail_url` | `schema:contentUrl` | Media content URL |
| `profile_image_url` | `schema:contentUrl` | Media content URL |
| `homepage_web_address` | `foaf:homepage` | Primary homepage |
| `vendor_url` | `foaf:homepage` | Vendor homepage |
| `profile_url` | `foaf:page` | Page about entity |
| `collection_web_address` | `foaf:page` | Page about collection |
| `financial_document_url` | `dcterms:source` | Source document |
| `rights_statement_url` | `dcterms:source` | Authoritative source |
| `scheme_url` | `rdfs:seeAlso` | External reference |

---

## 8. Date/Time Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `actual_start` | `schema:startDate` | Event start |
| `actual_end` | `schema:endDate` | Event end |
| `planned_start` | `schema:startDate` | Planned start |
| `planned_end` | `schema:endDate` | Planned end |
| `fiscal_year_start` | `schema:startDate` | Period boundary |
| `fiscal_year_end` | `schema:endDate` | Period boundary |
| `chapters_generated_at` | `prov:generatedAtTime` | Generation timestamp |
| `comment_published_at` | `schema:datePublished` | Publication timestamp |
| `comment_updated_at` | `schema:dateModified` | Modification timestamp |
| `start_time` | `schema:startTime` | Media timing |
| `end_time` | `schema:endTime` | Media timing |
| `start_seconds` | `schema:startTime` | Media timing (numeric) |
| `end_seconds` | `schema:endTime` | Media timing (numeric) |
| `programme_year` | `dcterms:date` | Single temporal point |

---

## 9. Budget/Financial Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `funding_amount` | `schema:fundingAmount` | **Direct match** |
| `external_funding` | `hc:externalFunding` | Close to schema:fundingAmount |
| `internal_funding` | `hc:internalFunding` | Close to fibo:MonetaryAmount |
| `total_budget` | `hc:totalBudget` | Close to fibo:MonetaryAmount |
| `operating_budget` | `hc:operatingBudget` | Close to fibo:OperatingExpense |
| `preservation_budget` | `hc:preservationBudget` | Heritage domain-specific |
| `total_revenue` | `hc:totalRevenue` | Close to fibo:Revenue |
| `total_expense` | `hc:totalExpense` | Close to fibo:Expense |
| `total_asset` | `hc:totalAsset` | Close to fibo:Asset |
| `total_liability` | `hc:totalLiability` | Close to fibo:Liability |
| `endowment_draw` | `hc:endowmentDraw` | Related to fibo:Endowment |

**Note**: Only `funding_amount` has a direct schema.org match. The other financial terms use `hc:` with FIBO `close_mappings`.

---

## 10. Count/Quantity Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `comment_count` | `schema:commentCount` | **Direct match** |
| `word_count` | `schema:wordCount` | **Direct match** |
| `character_count` | `schema:characterCount` | **Direct match** |
| `visitor_count` | `hc:visitorCount` | Close to maximumAttendeeCapacity |
| `follower_count` | `hc:followerCount` | Close to interactionStatistic |
| `view_count` | `hc:viewCount` | Close to interactionStatistic |
| `like_count` | `hc:likeCount` | Close to interactionStatistic |
| `record_count` | `hc:recordCount` | Close to rico:hasExtent |
| `objects_count` | `hc:objectsCount` | Close to crm:P57_has_number_of_parts |
| `unique_face_count` | `hc:uniqueFaceCount` | Close to sosa:hasSimpleResult |

---

## 11. Relationship Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `aggregates_from` | `dcterms:references` | Data source reference |
| `feeds_portal` | `dcterms:relation` | Generic data flow |
| `provides_access_to` | `dcterms:references` | Resource reference |
| `platform_of` | `org:unitOf` | Organizational unit |
| `hosts_branch` | `org:hasUnit` | Organizational hosting |
| `parent_department` | `org:subOrganizationOf` | Organizational hierarchy |
| `parent_society` | `org:memberOf` | Membership relationship |
| `parent_programme` | `rico:isOrWasPartOf` | Archival/temporal part-of |
| `participating_custodian` | `org:hasMember` | Participation |
| `refers_to_person` | `dcterms:references` | Entity reference |
| `linked_to_collection` | `rico:isOrWasIncludedIn` | Archival inclusion |

---

## 12. Language/Text Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `nl`, `en`, `de`, `fr` | `rdf:value` | Language-tagged values |
| `alpha_2`, `alpha_3` | `skos:notation` | Standardized codes |
| `default_language` | `schema:inLanguage` | Content language |
| `portal_language` | `schema:inLanguage` | Content language |
| `available_caption_languages` | `schema:availableLanguage` | Available options |
| `text_fragment` | `schema:text` | Text content |
| `full_text` | `schema:text` | Text content |
| `speech_text` | `schema:text` | Text content |

---

## 13. Physical/Spatial Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `building_floor_area_sqm` | `schema:floorSize` | **Direct match** |
| `warehouse_floor_area_sqm` | `schema:floorSize` | **Direct match** |
| `square_meters` | `schema:size` | Standard measurement |
| `capacity_linear_meters` | `hc:capacityLinearMeters` | Archival standard (close: qudt) |
| `temperature_min/max/target` | `hc:temperature*` | Preservation (close: qudt) |
| `humidity_min/max/target` | `hc:humidity*` | Preservation (close: qudt) |
| `geometry_type` | `geo:hasGeometry` | GeoSPARQL standard |
| `from_location` | `schema:fromLocation` | **Direct match** |
| `to_location` | `schema:toLocation` | **Direct match** |
| `physical_location` | `schema:location` | **Direct match** |
| `work_location` | `schema:workLocation` | **Direct match** |

---

## 14. Audio/Video Slots

| Slot | slot_uri | Rationale |
|------|----------|-----------|
| `music_genre` | `schema:genre` | **Direct match** |
| `live_broadcast_content` | `schema:isLiveBroadcast` | **Direct match** |
| `subtitle_format` | `dcterms:format` | Format description |
| `transcript_format` | `dcterms:format` | Format description |
| `frame_rate` | `hc:frameRate` | Domain-specific (no standard equivalent) |
| `snr_db` | `hc:signalToNoiseRatio` | Audio quality metric |
| `diarization_*` | `hc:diarization*` | Speaker processing |
| `music_*` | `hc:music*` | Music detection |
| `speech_*` | `hc:speech*` | Speech detection |

---

## Summary: Ontology Usage Distribution

| Ontology Prefix | Full Count | Primary Use Cases |
|-----------------|------------|-------------------|
| `hc:` | ~500+ | Domain-specific heritage concepts |
| `schema:` | ~80 | Names, URLs, dates, counts, locations |
| `dcterms:` | ~60 | Types, identifiers, formats, relations |
| `prov:` | ~30 | Provenance, generation, attribution |
| `org:` | ~20 | Organizational relationships |
| `skos:` | ~15 | Concept schemes, notations, labels |
| `foaf:` | ~10 | Homepage, page references |
| `rico:` | ~5 | Archival relationships |
| `geo:` | ~5 | Geographic/spatial |
| `dqv:` | ~2 | Data quality measurements |
| `adms:` | ~5 | Status, identifiers for assets |

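
A distribution like this can be recomputed from any slot-to-`slot_uri` mapping by counting CURIE prefixes. The sketch below uses a small sample taken from the tables in this document, so its counts are illustrative and not the real totals; the function name and the sample dictionary are assumptions.

```python
from __future__ import annotations

from collections import Counter

# Illustrative sample only; the real schema has far more slots.
SAMPLE_SLOT_URIS = {
    "retrieval_timestamp": "prov:atTime",
    "generated_by": "prov:wasGeneratedBy",
    "confidence": "dqv:value",
    "is_branch_of": "org:unitOf",
    "is_active": "hc:isActive",
    "branch_name": "schema:name",
    "portal_url": "schema:url",
    "call_title": "dcterms:title",
}


def prefix_distribution(slot_uris: dict[str, str]) -> Counter:
    """Count how many slots use each CURIE prefix (the part before ':')."""
    return Counter(uri.split(":", 1)[0] + ":" for uri in slot_uris.values())


for prefix, count in prefix_distribution(SAMPLE_SLOT_URIS).most_common():
    print(f"{prefix:<10}{count}")
```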
---

## Implementation Notes

### When to Use Standard Ontology Predicates

Use standard predicates when:
1. **An exact semantic match exists** (e.g., `schema:name`, `dcterms:identifier`)
2. **A well-established pattern applies** (e.g., `schema:startDate/endDate`)
3. **Interoperability is the primary concern**

### When to Use hc: Prefix

Use the domain-specific `hc:` prefix when:
1. **No standard equivalent exists** (e.g., `hc:heritageRelevanceScore`)
2. **Domain-specific semantics** (e.g., `hc:capacityLinearMeters`)
3. **Boolean flags without standard equivalents**
4. **Technical processing metrics** (e.g., `hc:diarizationConfidence`)

### Required Close/Related Mappings

When using the `hc:` prefix, always include appropriate mappings:
- `close_mappings`: Similar semantics, different scope
- `related_mappings`: Conceptually related
- `broad_mappings`: External term is broader
- `narrow_mappings`: External term is narrower

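
The mapping requirement lends itself to an automated check. The following sketch assumes slot definitions have already been loaded into plain dictionaries keyed by slot name (for example, parsed from the schema YAML); the function name and the sample data are hypothetical.

```python
from __future__ import annotations

MAPPING_KEYS = (
    "close_mappings",
    "related_mappings",
    "broad_mappings",
    "narrow_mappings",
)


def slots_missing_mappings(slots: dict[str, dict]) -> list[str]:
    """Return names of hc: slots that declare none of the mapping keys."""
    missing = []
    for name, definition in slots.items():
        slot_uri = definition.get("slot_uri") or ""
        if not slot_uri.startswith("hc:"):
            continue  # standard predicates are outside this check
        if not any(definition.get(key) for key in MAPPING_KEYS):
            missing.append(name)
    return sorted(missing)


# Hypothetical sample: one hc: slot with a mapping, one without.
sample = {
    "is_active": {"slot_uri": "hc:isActive", "close_mappings": ["adms:status"]},
    "frame_rate": {"slot_uri": "hc:frameRate"},
    "branch_name": {"slot_uri": "schema:name"},
}
print(slots_missing_mappings(sample))  # ['frame_rate']
```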
---

## Version History

- **2025-01-13**: Added dqv.ttl and adms.ttl ontologies; fixed premis:hasFrameRate reference
- **2025-01-13**: Initial comprehensive mapping of 970 slots
- Follows Rule 38 (Slot Centralization and Semantic URI)
- Follows Rule 50 (Ontology-to-LinkML Mapping Convention)
- Follows Rule 51 (No Hallucinated Ontology References)