glam/.opencode/rules/slot-ontology-mapping-reference.md
kempersc 1fb924c412 feat: add ontology mappings to LinkML schema and enhance entity resolution
2026-01-13 13:51:02 +01:00


Slot Ontology Mapping Reference

This document provides the complete ontology predicate mappings for all 970 LinkML slots that previously had the hc: (Heritage Custodian) prefix.

Overview

| Category | Count | Primary Ontologies |
| --- | --- | --- |
| Provenance/Extraction | ~50 | prov:, dcterms: |
| Confidence/Scores | ~62 | dqv:, hc: |
| Boolean Flags | ~131 | hc:, org: |
| Names/Descriptions | ~72 | schema:, skos:, dcterms: |
| Types/Status | ~60 | dcterms:, hc: |
| Identifiers | ~53 | dcterms:, org:, schema: |
| URLs | ~24 | schema:, foaf:, dcterms: |
| Dates/Times | ~22 | schema:, prov:, dcterms: |
| Budget/Financial | ~22 | schema:, hc: |
| Counts/Quantities | ~28 | schema:, hc: |
| Relationships | ~30 | org:, dcterms:, rico: |
| Language/Text | ~25 | schema:, dcterms:, rdf: |
| Physical/Spatial | ~30 | schema:, geo:, hc: |
| Audio/Video | ~30 | schema:, dcterms:, hc: |
| TOTAL | ~639 | |
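
As a quick arithmetic check, the approximate category counts above can be summed and compared with the 970 slots mentioned in the introduction. The sketch below assumes each slot is counted in exactly one category; the variable and function names are illustrative and not part of the schema tooling.

```python
# Approximate per-category counts copied from the overview table.
CATEGORY_COUNTS = {
    "Provenance/Extraction": 50,
    "Confidence/Scores": 62,
    "Boolean Flags": 131,
    "Names/Descriptions": 72,
    "Types/Status": 60,
    "Identifiers": 53,
    "URLs": 24,
    "Dates/Times": 22,
    "Budget/Financial": 22,
    "Counts/Quantities": 28,
    "Relationships": 30,
    "Language/Text": 25,
    "Physical/Spatial": 30,
    "Audio/Video": 30,
}

TOTAL_SLOTS = 970  # total number of slots stated in the introduction


def coverage(counts, total):
    """Return (categorized, remaining, share) for the given counts."""
    categorized = sum(counts.values())
    return categorized, total - categorized, categorized / total


categorized, remaining, share = coverage(CATEGORY_COUNTS, TOTAL_SLOTS)
print(f"{categorized} categorized, {remaining} remaining, {share:.1%} of all slots")
```

Under that assumption, the fourteen categories account for about 639 slots, and roughly a third of the 970 slots fall outside them.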

1. Provenance/Extraction Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| retrieval_timestamp | prov:atTime | Timestamp of retrieval event |
| extraction_timestamp | prov:generatedAtTime | When extraction result was created |
| archived_at | prov:generatedAtTime | When archived copy was created |
| retrieval_method | prov:hadPlan | The plan/procedure used for retrieval |
| generated_by | prov:wasGeneratedBy | Activity that generated this entity |
| created_by_project | prov:wasAttributedTo | Project agent responsible for creation |
| approved_by | prov:wasAttributedTo | Agent who approved (with role qualifier) |
| asserted_by | prov:wasAttributedTo | Agent who made the assertion |
| documented_by | prov:hadPrimarySource | Primary source documentation |
| retrieved_by | prov:wasAssociatedWith | Agent who performed retrieval |
| observed_entity | prov:entity | Entity being observed |
| observer_name | prov:agent | Observing agent |
| observer_type | prov:hadRole | Role/type of observer |
| observer_affiliation | prov:actedOnBehalfOf | Observer's affiliated organization |
| processing_status | hc:processingStatus | Workflow status (domain-specific) |
| processing_duration_seconds | prov:value | Duration value |
| processing_priority | hc:processingPriority | Priority (domain-specific) |
| validation_status | hc:validationStatus | Validation status (domain-specific) |
| backup_status | hc:backupStatus | Backup status (domain-specific) |
| verified | prov:wasAttributedTo | Verified by agent |

2. Confidence/Score Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| confidence | dqv:value | W3C Data Quality Vocabulary |
| confidence_score | dqv:value | Data quality measurement |
| likelihood_score | hc:likelihood_score | Probabilistic assessment |
| heritage_relevance_score | hc:heritage_relevance_score | Heritage domain metric |
| museum_search_score | hc:museum_search_score | Template-specific relevance |
| library_search_score | hc:library_search_score | Template-specific relevance |
| archive_search_score | hc:archive_search_score | Template-specific relevance |
| collection_discovery_score | hc:collection_discovery_score | Template-specific relevance |
| person_research_score | hc:person_research_score | Template-specific relevance |
| location_browse_score | hc:location_browse_score | Template-specific relevance |
| identifier_lookup_score | hc:identifier_lookup_score | Template-specific relevance |
| organizational_change_score | hc:organizational_change_score | Template-specific relevance |
| digital_platform_score | hc:digital_platform_score | Template-specific relevance |
| general_heritage_score | hc:general_heritage_score | Fallback template score |
| specificity_score | hc:specificity_score | Schema design metric (Rule 37) |
| engagement_rate | hc:engagement_rate | Analytics ratio metric |
| visitor_conversion_rate | hc:visitor_conversion_rate | Analytics ratio metric |
| growth_rate | hc:growth_rate | Temporal change rate |
| funding_rate | hc:funding_rate | Funding rate metric |

3. Boolean Flag Slots (Selected Examples)

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| is_accredited | hc:isAccredited | Domain-specific (close: schema:hasCredential) |
| is_active | hc:isActive | Domain-specific (close: adms:status) |
| is_branch_of | org:unitOf | Direct match to W3C Org |
| is_heritage_relevant | hc:isHeritageRelevant | Core HC domain concept |
| is_public_facing | hc:isPublicFacing | Close to schema:publicAccess |
| is_current_position | hc:isCurrentPosition | Employment convenience flag |
| is_embeddable | hc:isEmbeddable | Platform content policy |
| is_made_for_kid | hc:isMadeForKids | Close to schema:isFamilyFriendly |

Note: Most boolean flags use the hc: prefix with close_mappings to related ontology terms, because standard ontologies rarely define boolean equivalents.
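
The note above can be sketched as a small helper that builds a slot definition for a boolean flag. This is an illustration only: `hc_camel_case` and `boolean_flag_slot` are hypothetical names, and the camelCase derivation is an assumption inferred from the table. Explicit table entries take precedence wherever they differ (for example, `is_made_for_kid` maps to `hc:isMadeForKids`).

```python
def hc_camel_case(slot_name: str) -> str:
    """Turn a snake_case slot name into an hc: CURIE with a camelCase local name.

    Assumption: this mirrors the naming seen in the boolean flag table;
    it is not taken from the schema tooling itself.
    """
    head, *tail = slot_name.split("_")
    return "hc:" + head + "".join(part.capitalize() for part in tail)


def boolean_flag_slot(slot_name: str, close_mappings: list[str]) -> dict:
    """Build a dict shaped like a LinkML slot definition for a boolean flag."""
    return {
        slot_name: {
            "slot_uri": hc_camel_case(slot_name),
            "range": "boolean",
            "close_mappings": list(close_mappings),
        }
    }


slot = boolean_flag_slot("is_public_facing", ["schema:publicAccess"])
print(slot)
```

Serialized to YAML, the resulting dictionary would sit under the `slots:` key of a schema module.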


4. Name/Description Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| branch_name | schema:name | Entity name |
| branch_description | schema:description | Entity description |
| portal_name | schema:name | Entity name |
| portal_description | schema:description | Entity description |
| scheme_name | skos:prefLabel | SKOS concept scheme |
| scheme_description | skos:definition | SKOS definition |
| call_title | dcterms:title | Formal document title |
| call_description | dcterms:description | Formal document description |
| organization_name | schema:name | Organization entity name |
| employer_name | schema:name | Employer entity name |
| degree_name | schema:name | Academic credential name |

Pattern:

  • Entity names → schema:name
  • Entity descriptions → schema:description
  • Concept schemes → skos:prefLabel / skos:definition
  • Formal documents → dcterms:title / dcterms:description
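
The pattern above amounts to a lookup from the kind of thing being named to a pair of predicates. A minimal sketch follows, assuming three hypothetical kind labels (`entity`, `concept_scheme`, `formal_document`) that do not appear in the schema itself:

```python
# (name predicate, description predicate) per kind of thing being labelled.
# The kind labels are hypothetical; the predicates come from the pattern above.
NAME_PATTERNS = {
    "entity": ("schema:name", "schema:description"),
    "concept_scheme": ("skos:prefLabel", "skos:definition"),
    "formal_document": ("dcterms:title", "dcterms:description"),
}


def label_predicates(kind: str) -> tuple[str, str]:
    """Return the (name, description) predicates for a kind of thing."""
    try:
        return NAME_PATTERNS[kind]
    except KeyError:
        raise ValueError(f"no naming pattern defined for kind {kind!r}") from None


name_uri, description_uri = label_predicates("concept_scheme")
print(name_uri, description_uri)
```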

5. Type/Status Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| storage_type | dcterms:type | Type classification |
| branch_type | dcterms:type | Type classification |
| content_type | dcterms:type | Type classification |
| document_type | dcterms:type | Type classification |
| heritage_type | dcterms:type | Type classification |
| operational_status | hc:operationalStatus | Status (close: adms:status) |
| portal_status | hc:portalStatus | Status (close: adms:status) |
| project_status | hc:projectStatus | Status (close: adms:status) |
| loan_status | hc:loanStatus | Status (close: adms:status) |
| unesco_list_status | hc:unescoListStatus | Heritage-specific status |

Pattern:

  • Types → dcterms:type (with exact_mappings to schema:additionalType)
  • Status → hc:*Status (with close_mappings to adms:status)
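
The two rules above can be sketched as a suffix-based helper. It assumes that slot names ending in `_type` or `_status` always follow the pattern, which may not hold for every slot (the table describes `unesco_list_status` as heritage-specific, for instance). The function name is hypothetical.

```python
def type_or_status_mapping(slot_name: str) -> dict:
    """Suggest slot_uri and mappings for *_type and *_status slots.

    Assumption: the suffix alone decides the pattern; individual slots
    listed in the table may deviate from it.
    """
    if slot_name.endswith("_type"):
        return {
            "slot_uri": "dcterms:type",
            "exact_mappings": ["schema:additionalType"],
        }
    if slot_name.endswith("_status"):
        head, *tail = slot_name.split("_")
        local_name = head + "".join(part.capitalize() for part in tail)
        return {
            "slot_uri": f"hc:{local_name}",
            "close_mappings": ["adms:status"],
        }
    raise ValueError(f"{slot_name!r} is neither a type nor a status slot")


print(type_or_status_mapping("branch_type"))
print(type_or_status_mapping("operational_status"))
```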

6. Identifier Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| activity_id | dcterms:identifier | General identifier |
| project_id | dcterms:identifier | General identifier |
| branch_id | org:identifier | Organizational unit |
| department_id | org:identifier | Organizational unit |
| unit_id | org:identifier | Organizational unit |
| video_id | schema:identifier | Media context |
| device_id | schema:identifier | Device context |
| scheme_id | adms:identifier | Formal scheme |

Pattern:

  • General → dcterms:identifier
  • Organizational → org:identifier
  • Web/Media → schema:identifier
  • Formal schemes → adms:identifier
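
To make the grouping concrete, the sketch below encodes the identifier rows from the table and groups slots by predicate. Falling back to `dcterms:identifier` for unlisted slots is an assumption drawn from the "General" rule, not a documented behaviour; the function names are hypothetical.

```python
# Identifier slots and their predicates, copied from the table above.
IDENTIFIER_SLOT_URIS = {
    "activity_id": "dcterms:identifier",
    "project_id": "dcterms:identifier",
    "branch_id": "org:identifier",
    "department_id": "org:identifier",
    "unit_id": "org:identifier",
    "video_id": "schema:identifier",
    "device_id": "schema:identifier",
    "scheme_id": "adms:identifier",
}


def identifier_slot_uri(slot_name: str, default: str = "dcterms:identifier") -> str:
    """Look up a slot's predicate, falling back to the general identifier."""
    return IDENTIFIER_SLOT_URIS.get(slot_name, default)


def slots_by_predicate(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group slot names under the predicate they map to."""
    grouped: dict[str, list[str]] = {}
    for slot_name, uri in mapping.items():
        grouped.setdefault(uri, []).append(slot_name)
    return grouped


for uri, slots in slots_by_predicate(IDENTIFIER_SLOT_URIS).items():
    print(uri, slots)
```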

7. URL Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| portal_url | schema:url | Generic URL |
| repository_url | schema:url | Generic URL |
| platform_url | schema:url | Generic URL |
| recording_url | schema:contentUrl | Media content URL |
| thumbnail_url | schema:contentUrl | Media content URL |
| profile_image_url | schema:contentUrl | Media content URL |
| homepage_web_address | foaf:homepage | Primary homepage |
| vendor_url | foaf:homepage | Vendor homepage |
| profile_url | foaf:page | Page about entity |
| collection_web_address | foaf:page | Page about collection |
| financial_document_url | dcterms:source | Source document |
| rights_statement_url | dcterms:source | Authoritative source |
| scheme_url | rdfs:seeAlso | External reference |

8. Date/Time Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| actual_start | schema:startDate | Event start |
| actual_end | schema:endDate | Event end |
| planned_start | schema:startDate | Planned start |
| planned_end | schema:endDate | Planned end |
| fiscal_year_start | schema:startDate | Period boundary |
| fiscal_year_end | schema:endDate | Period boundary |
| chapters_generated_at | prov:generatedAtTime | Generation timestamp |
| comment_published_at | schema:datePublished | Publication timestamp |
| comment_updated_at | schema:dateModified | Modification timestamp |
| start_time | schema:startTime | Media timing |
| end_time | schema:endTime | Media timing |
| start_seconds | schema:startTime | Media timing (numeric) |
| end_seconds | schema:endTime | Media timing (numeric) |
| programme_year | dcterms:date | Single temporal point |

9. Budget/Financial Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| funding_amount | schema:fundingAmount | Direct match |
| external_funding | hc:externalFunding | Close to schema:fundingAmount |
| internal_funding | hc:internalFunding | Close to fibo:MonetaryAmount |
| total_budget | hc:totalBudget | Close to fibo:MonetaryAmount |
| operating_budget | hc:operatingBudget | Close to fibo:OperatingExpense |
| preservation_budget | hc:preservationBudget | Heritage domain-specific |
| total_revenue | hc:totalRevenue | Close to fibo:Revenue |
| total_expense | hc:totalExpense | Close to fibo:Expense |
| total_asset | hc:totalAsset | Close to fibo:Asset |
| total_liability | hc:totalLiability | Close to fibo:Liability |
| endowment_draw | hc:endowmentDraw | Related to fibo:Endowment |

Note: Only funding_amount has a direct schema.org match. The other financial slots use hc: with FIBO close_mappings.


10. Count/Quantity Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| comment_count | schema:commentCount | Direct match |
| word_count | schema:wordCount | Direct match |
| character_count | schema:characterCount | Direct match |
| visitor_count | hc:visitorCount | Close to maximumAttendeeCapacity |
| follower_count | hc:followerCount | Close to interactionStatistic |
| view_count | hc:viewCount | Close to interactionStatistic |
| like_count | hc:likeCount | Close to interactionStatistic |
| record_count | hc:recordCount | Close to rico:hasExtent |
| objects_count | hc:objectsCount | Close to crm:P57_has_number_of_parts |
| unique_face_count | hc:uniqueFaceCount | Close to sosa:hasSimpleResult |

11. Relationship Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| aggregates_from | dcterms:references | Data source reference |
| feeds_portal | dcterms:relation | Generic data flow |
| provides_access_to | dcterms:references | Resource reference |
| platform_of | org:unitOf | Organizational unit |
| hosts_branch | org:hasUnit | Organizational hosting |
| parent_department | org:subOrganizationOf | Organizational hierarchy |
| parent_society | org:memberOf | Membership relationship |
| parent_programme | rico:isOrWasPartOf | Archival/temporal part-of |
| participating_custodian | org:hasMember | Participation |
| refers_to_person | dcterms:references | Entity reference |
| linked_to_collection | rico:isOrWasIncludedIn | Archival inclusion |

12. Language/Text Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| nl, en, de, fr | rdf:value | Language-tagged values |
| alpha_2, alpha_3 | skos:notation | Standardized codes |
| default_language | schema:inLanguage | Content language |
| portal_language | schema:inLanguage | Content language |
| available_caption_languages | schema:availableLanguage | Available options |
| text_fragment | schema:text | Text content |
| full_text | schema:text | Text content |
| speech_text | schema:text | Text content |

13. Physical/Spatial Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| building_floor_area_sqm | schema:floorSize | Direct match |
| warehouse_floor_area_sqm | schema:floorSize | Direct match |
| square_meters | schema:size | Standard measurement |
| capacity_linear_meters | hc:capacityLinearMeters | Archival standard (close: qudt) |
| temperature_min/max/target | hc:temperature* | Preservation (close: qudt) |
| humidity_min/max/target | hc:humidity* | Preservation (close: qudt) |
| geometry_type | geo:hasGeometry | GeoSPARQL standard |
| from_location | schema:fromLocation | Direct match |
| to_location | schema:toLocation | Direct match |
| physical_location | schema:location | Direct match |
| work_location | schema:workLocation | Direct match |

14. Audio/Video Slots

| Slot | slot_uri | Rationale |
| --- | --- | --- |
| music_genre | schema:genre | Direct match |
| live_broadcast_content | schema:isLiveBroadcast | Direct match |
| subtitle_format | dcterms:format | Format description |
| transcript_format | dcterms:format | Format description |
| frame_rate | hc:frameRate | Domain-specific (no standard equivalent) |
| snr_db | hc:signalToNoiseRatio | Audio quality metric |
| diarization_* | hc:diarization* | Speaker processing |
| music_* | hc:music* | Music detection |
| speech_* | hc:speech* | Speech detection |

Summary: Ontology Usage Distribution

| Ontology Prefix | Full Count | Primary Use Cases |
| --- | --- | --- |
| hc: | ~500+ | Domain-specific heritage concepts |
| schema: | ~80 | Names, URLs, dates, counts, locations |
| dcterms: | ~60 | Types, identifiers, formats, relations |
| prov: | ~30 | Provenance, generation, attribution |
| org: | ~20 | Organizational relationships |
| skos: | ~15 | Concept schemes, notations, labels |
| foaf: | ~10 | Homepage, page references |
| rico: | ~5 | Archival relationships |
| geo: | ~5 | Geographic/spatial |
| dqv: | ~2 | Data quality measurements |
| adms: | ~5 | Status, identifiers for assets |
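
Every CURIE in the tables above only resolves if its prefix is declared. The sketch below expands a CURIE against a prefix map and rejects undeclared prefixes. The namespace IRIs are the commonly published ones for each vocabulary and are listed here as assumptions; the `prefixes` blocks in the schema modules remain authoritative. The `hc:` IRI is a placeholder, since the real namespace is not given in this document, and `expand_curie` is a hypothetical helper name.

```python
# Assumed namespace IRIs; check them against the schema's own prefixes blocks.
PREFIXES = {
    "hc": "https://example.org/hc/",  # placeholder, real IRI not stated here
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "org": "http://www.w3.org/ns/org#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "adms": "http://www.w3.org/ns/adms#",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' to a full IRI, rejecting undeclared prefixes."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or not local_name:
        raise ValueError(f"{curie!r} is not a CURIE")
    if prefix not in prefixes:
        raise ValueError(f"prefix {prefix!r} is not declared")
    return prefixes[prefix] + local_name


print(expand_curie("dcterms:identifier"))
print(expand_curie("skos:prefLabel"))
```

A check of this kind catches prefixes that are used in a mapping but never declared. It does not verify that the local name actually exists in the target ontology, which is the concern of Rule 51.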

Implementation Notes

When to Use Standard Ontology Predicates

Use standard predicates when:

  1. Exact semantic match exists (e.g., schema:name, dcterms:identifier)
  2. Well-established patterns (e.g., schema:startDate/endDate)
  3. Interoperability is the primary concern

When to Use hc: Prefix

Use domain-specific hc: prefix when:

  1. No standard equivalent exists (e.g., hc:heritageRelevanceScore)
  2. Domain-specific semantics (e.g., hc:capacityLinearMeters)
  3. Boolean flags without standard equivalents
  4. Technical processing metrics (e.g., hc:diarizationConfidence)

When using the hc: prefix, always include appropriate mappings:

  • close_mappings: Similar semantics, different scope
  • related_mappings: Conceptually related
  • broad_mappings: External term is broader
  • narrow_mappings: External term is narrower
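
The requirement that hc: slots carry at least one mapping lends itself to an automated check. The following is a minimal sketch, assuming slot definitions have already been loaded into plain dictionaries keyed by slot name; the function name and sample data are hypothetical.

```python
# Mapping keys named in the list above.
MAPPING_KEYS = (
    "close_mappings",
    "related_mappings",
    "broad_mappings",
    "narrow_mappings",
)


def hc_slots_missing_mappings(slots: dict[str, dict]) -> list[str]:
    """Return names of hc: slots that declare none of the mapping keys."""
    missing = []
    for slot_name, definition in slots.items():
        slot_uri = definition.get("slot_uri", "")
        has_mapping = any(definition.get(key) for key in MAPPING_KEYS)
        if slot_uri.startswith("hc:") and not has_mapping:
            missing.append(slot_name)
    return sorted(missing)


# Hypothetical sample: one compliant hc: slot, one without mappings,
# and one standard predicate that needs no mapping.
sample_slots = {
    "is_active": {"slot_uri": "hc:isActive", "close_mappings": ["adms:status"]},
    "frame_rate": {"slot_uri": "hc:frameRate"},
    "portal_name": {"slot_uri": "schema:name"},
}

print(hc_slots_missing_mappings(sample_slots))
```

Slots flagged this way are candidates for review rather than errors, since some domain-specific terms may have no reasonable external counterpart.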

Version History

  • 2025-01-13: Added dqv.ttl and adms.ttl ontologies; fixed premis:hasFrameRate reference
  • 2025-01-13: Initial comprehensive mapping of 970 slots
  • Follows Rule 38 (Slot Centralization and Semantic URI)
  • Follows Rule 50 (Ontology-to-LinkML Mapping Convention)
  • Follows Rule 51 (No Hallucinated Ontology References)