- Removed unused slots from TaxonomicAuthority, TechnicalFeature, TelevisionArchive, TentativeWorldHeritageSite, Threat, TimeSpan, Title, TradeRegister, TradeUnionArchive, TradeUnionArchiveRecordSetType, TransferEvent, UNESCODomain, UnitIdentifier, UniversityArchive, UnspecifiedType, UserCommunity, Venue, Vereinsarchiv, Verlagsarchiv, VerlagsarchivRecordSetType, Version, Verwaltungsarchiv, VideoAnnotationTypes, VideoAudioAnnotation, VideoFrame, VideoPost, VideoSubtitle, VideoTextContent, Warehouse, WebArchive, WebClaim, WebClaimsBlock, WebLink, WebPortal, WebPortalTypes, WomensArchives, WordCount, WorldHeritageSite, WritingSystem, and XPathScore.
- Introduced new slot is_or_was_retrieved_at for tracking data retrieval timestamps.
241 lines
10 KiB
YAML
id: https://nde.nl/ontology/hc/class/VideoTextContent
name: video_text_content_class
title: Video Text Content Class
imports:
- linkml:types
- ../enums/GenerationMethodEnum
- ../slots/content_title
- ../slots/has_or_had_language
- ../slots/has_or_had_quantity
- ../slots/has_or_had_score
- ../slots/is_or_was_generated_by
- ../slots/is_or_was_verified_by
- ../slots/is_verified
- ../slots/model_provider
- ../slots/model_version
- ../slots/overall_confidence
- ../slots/processing_duration_seconds
- ../slots/source_video
- ../slots/source_video_url
- ../slots/specificity_annotation
- ../slots/temporal_extent
- ./Methodology
- ./Quantity
- ./SpecificityAnnotation
- ./TemplateSpecificityScore
- ./TemplateSpecificityType
- ./TemplateSpecificityTypes
- ./TimeSpan
- ./Verifier
- ./VideoPost
- ./GenerationEvent
- ./Language
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  skos: http://www.w3.org/2004/02/skos/core#
  oa: http://www.w3.org/ns/oa#
default_prefix: hc
classes:
  VideoTextContent:
    class_uri: crm:E73_Information_Object
    abstract: true
    description: |
      Abstract base class for all textual/derived content from videos.

      **DEFINITION**:

      VideoTextContent is the abstract parent for all text that is extracted,
      transcribed, or derived from video content. This includes:

      | Subclass | Source | Description |
      |----------|--------|-------------|
      | VideoTranscript | Audio | Full text transcription of spoken content |
      | VideoSubtitle | Audio | Time-coded caption entries (SRT/VTT) |
      | VideoAnnotation | Visual | CV/multimodal-derived descriptions |

      **PROVENANCE REQUIREMENTS**:

      All video-derived text MUST include comprehensive provenance:

      1. **Source**: Which video was processed (`source_video`)
      2. **Method**: How was content generated (`is_or_was_generated_by`)
      3. **Agent**: Who/what generated it (`generated_by`)
      4. **Time**: When was it generated (`generation_timestamp`)
      5. **Version**: Tool/model version (`model_version`)
      6. **Quality**: Overall confidence (`overall_confidence`)

      **PROV-O ALIGNMENT**:

      Maps to W3C PROV-O for provenance tracking:

      ```turtle
      :transcript a hc:VideoTranscript ;
          prov:wasGeneratedBy :asr_activity ;
          prov:wasAttributedTo :whisper_model ;
          prov:generatedAtTime "2025-12-01T10:00:00Z" ;
          prov:wasDerivedFrom :source_video .
      ```

      **CIDOC-CRM E73_Information_Object**:

      - E73 is the base for all identifiable immaterial items
      - Includes texts, computer programs, songs, recipes
      - VideoTextContent instances are E73 instances derived from a video (itself an E73)

      **GENERATION METHODS**:

      | Method | Description | Typical Confidence |
      |--------|-------------|-------------------|
      | ASR_AUTOMATIC | Automatic speech recognition | 0.75-0.95 |
      | ASR_ENHANCED | ASR with post-processing | 0.85-0.98 |
      | MANUAL_TRANSCRIPTION | Human transcription | 0.98-1.0 |
      | MANUAL_CORRECTION | Human-corrected ASR | 0.95-1.0 |
      | CV_AUTOMATIC | Computer vision detection | 0.60-0.90 |
      | MULTIMODAL | Combined audio+visual AI | 0.70-0.95 |
      | OCR | Optical character recognition | 0.80-0.98 |
      | PLATFORM_PROVIDED | From YouTube/Vimeo API | 0.85-0.95 |

      **HERITAGE INSTITUTION CONTEXT**:

      Video text content is critical for:
      - **Accessibility**: Deaf/HoH users need accurate captions
      - **Discovery**: Full-text search over video collections
      - **Preservation**: Text outlasts video format obsolescence
      - **Research**: Analyzing spoken content at scale
      - **Translation**: Multilingual access to heritage content

      **LANGUAGE SUPPORT**:

      - `has_or_had_language`: Primary language of text content
      - May differ from video's default_audio_language if translated
      - ISO 639-1 codes (e.g., "nl", "en", "de")
    exact_mappings:
    - crm:E73_Information_Object
    close_mappings:
    - prov:Entity
    related_mappings:
    - schema:CreativeWork
    - dcterms:Text
    slots:
    - has_or_had_language
    - content_title
    - generated_by
    - is_or_was_generated_by
    - temporal_extent
    - is_verified
    - model_provider
    - model_version
    - overall_confidence
    - processing_duration_seconds
    - source_video
    - source_video_url
    - specificity_annotation
    - has_or_had_score
    - is_or_was_verified_by
    - has_or_had_quantity
    slot_usage:
      source_video:
        range: string
        required: true
        examples:
        - value: FbIoC-Owy-M
          description: YouTube video ID as source reference
      source_video_url:
        range: uri
        required: false
        examples:
        - value: https://www.youtube.com/watch?v=FbIoC-Owy-M
          description: Full YouTube video URL
      has_or_had_language:
        range: Language
        required: true
        inlined: true
        multivalued: true
        description: |
          Language of the content.
          MIGRATED from content_language (2026-01-28).
        examples:
        - value:
            iso_639_1: "nl"
            language_name: "Dutch"
          description: Dutch language content
        - value:
            iso_639_1: "en"
            language_name: "English"
          description: English translation
      content_title:
        range: string
        required: false
        examples:
        - value: De Vrijheidsroute Ep.3 - Dutch Transcript
          description: Descriptive title for transcript
      generated_by:
        range: string
        required: true
        examples:
        - value: openai/whisper-large-v3
          description: OpenAI Whisper ASR model
        - value: YouTube Auto-captions
          description: Platform-provided captions
        - value: manual:curator@rijksmuseum.nl
          description: Human transcriber
      is_or_was_generated_by:
        description: |
          Method used to generate this text content.
          MIGRATED from generation_method per Rule 53.
          Uses GenerationEvent linking to Methodology (was GenerationMethodEnum).
        range: GenerationEvent
        required: true
        inlined: true
        examples:
        - value:
            has_or_had_methodology:
              methodology_type: ASR_AUTOMATIC
              has_or_had_label: Automatic Speech Recognition
          description: Automatic speech recognition
        - value:
            has_or_had_methodology:
              methodology_type: MANUAL_TRANSCRIPTION
              has_or_had_label: Manual Transcription
          description: Human transcription
      temporal_extent:
        description: |
          Verification date using CIDOC-CRM TimeSpan.
          MIGRATED from verification_date per slot_fixes.yaml (Rule 53).
          Use begin_of_the_begin for the verification timestamp.
        range: TimeSpan
        inlined: true
        required: false
        examples:
        - value:
            begin_of_the_begin: '2025-12-02T15:00:00Z'
          description: Verified December 2, 2025
      model_version:
        range: string
        required: false
        examples:
        - value: large-v3
          description: Whisper model version
        - value: v2.3.1
          description: Software version number
      model_provider:
        range: string
        required: false
        examples:
        - value: OpenAI
          description: Model provider
        - value: Google Cloud
          description: Cloud service provider
      overall_confidence:
        range: float
        required: false
        minimum_value: 0.0
        maximum_value: 1.0
        examples:
        - value: 0.92
          description: High confidence ASR output
      is_verified:
        range: boolean
        required: false
        ifabsent: 'false'
        examples:
        - value: true
          description: Human-verified transcript
      is_or_was_verified_by:
        range: Verifier
        required: false
        inlined: true
        description: |
          Who verified the annotation.
          MIGRATED from verified_by slot (2026-01-14) per Rule 53.
          Uses Verifier class for structured verifier with name, type, and URI.
        examples:
        - value: |
            verifier_name: curator@rijksmuseum.nl
            verifier_type: PERSON
          description: Staff member who verified
      processing_duration_seconds:
        range: float
        required: false
        minimum_value: 0.0
        examples:
        - value: 45.3
          description: Processed in 45.3 seconds
      has_or_had_quantity:
        range: Quantity
        required: false
        multivalued: true
        inlined: true
        inlined_as_list: true
        description: |
          Quantitative measurements of the text content.
          MIGRATED: word_count (2026-01-14) and character_count (2026-01-18) per Rule 53.
          Uses Quantity class for structured quantity with value, type, and unit.
          Can represent word count, character count, or other text metrics.
        examples:
        - value:
          - quantity_value: 1523
            quantity_type: WORD_COUNT
            has_or_had_measurement_unit:
              has_or_had_type: WORD
              has_or_had_symbol: words
            has_or_had_description: Word count in transcript
          - quantity_value: 8742
            quantity_type: CHARACTER_COUNT
            has_or_had_measurement_unit:
              has_or_had_type: CHARACTER
              has_or_had_symbol: chars
            has_or_had_description: Character count including spaces
          description: Text metrics (word and character count)
    comments:
    - Abstract base for all video-derived text content
    - Comprehensive PROV-O provenance tracking
    - Confidence scoring for AI-generated content
    - Verification workflow support
    - Critical for heritage accessibility and discovery
    see_also:
    - https://www.w3.org/TR/prov-o/
    - http://www.cidoc-crm.org/cidoc-crm/E73_Information_Object
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types: "['*']"
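# Illustrative instance data: a minimal sketch, not part of the schema.
# It assumes a concrete subclass such as VideoTranscript inherits the slots
# declared above unchanged; that inheritance is an assumption, since the
# subclass definitions are not shown in this file. Every value below is
# copied from the slot_usage examples above, and the record covers the four
# slots marked required (source_video, has_or_had_language, generated_by,
# is_or_was_generated_by) plus a few optional ones.
#
# source_video: FbIoC-Owy-M
# source_video_url: https://www.youtube.com/watch?v=FbIoC-Owy-M
# has_or_had_language:
# - iso_639_1: "nl"
#   language_name: "Dutch"
# content_title: De Vrijheidsroute Ep.3 - Dutch Transcript
# generated_by: openai/whisper-large-v3
# is_or_was_generated_by:
#   has_or_had_methodology:
#     methodology_type: ASR_AUTOMATIC
#     has_or_had_label: Automatic Speech Recognition
# model_provider: OpenAI
# model_version: large-v3
# overall_confidence: 0.92
# is_verified: false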