- Renamed `has_or_had_auxiliary_entities` to `is_or_was_associated_with` in DigitalPlatform.yaml to align with naming conventions.
- Updated examples in DigitalPlatform.yaml to reflect new slot names and types.
- Migrated `has_av_equipment` to `has_or_had_equipment` in EducationCenter.yaml, including detailed descriptions and examples.
- Consolidated archival references by migrating `archival_reference` to `has_or_had_identifier` in InformationCarrier.yaml.
- Removed deprecated slots: `has_authority_file_name`, `has_authority_file_url`, `has_auxiliary_place`, `has_auxiliary_place_type`, `has_auxiliary_platform`, `has_auxiliary_platform_type`, and `has_av_equipment`, archiving their definitions.
- Updated slot fixes to reflect the migration of various slots to more generic or appropriate counterparts, ensuring all changes are documented with processing notes.
198 lines
9.3 KiB
YAML
id: https://nde.nl/ontology/hc/class/VideoTranscript
name: video_transcript_class
title: Video Transcript Class
imports:
- linkml:types
- ./VideoTextContent
- ./VideoTimeSegment
- ../slots/contains_or_contained # was: full_text - migrated per Rule 53 (2026-01-26)
- ../slots/includes_speaker
- ../slots/includes_timestamp
- ../slots/paragraph_count
- ../slots/primary_speaker
- ../slots/has_or_had_segment
- ../slots/sentence_count
- ../slots/source_language_auto_detected
- ../slots/speaker_count
- ../slots/specificity_annotation
- ../slots/has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
# REMOVED 2026-01-14: ../slots/transcript_format - migrated to has_or_had_format with TranscriptFormatEnum
- ../slots/has_or_had_format
- ./SpecificityAnnotation
- ./TemplateSpecificityScore # was: TemplateSpecificityScores - migrated per Rule 53 (2026-01-17)
- ./TemplateSpecificityType
- ./TemplateSpecificityTypes
- ../enums/TranscriptFormatEnum
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
classes:
  VideoTranscript:
    is_a: VideoTextContent
    class_uri: crm:E33_Linguistic_Object
    abstract: false
    description: |
      Full text transcription of video audio content.

      **DEFINITION**:

      A VideoTranscript is the complete textual representation of all spoken
      content in a video. It extends VideoTextContent with transcript-specific
      properties and inherits all provenance tracking capabilities.

      **RELATIONSHIP TO VideoSubtitle**:

      VideoSubtitle is a subclass of VideoTranscript because:
      1. A subtitle file contains everything a transcript needs PLUS time codes
      2. You can derive a plain transcript from subtitles by stripping times
      3. This inheritance allows polymorphic handling of text content

      ```
      VideoTranscript          VideoSubtitle (is_a VideoTranscript)
      ├── full_text            ├── full_text (inherited)
      ├── segments[]           ├── segments[] (required, with times)
      └── (optional times)     └── subtitle_format (SRT, VTT, etc.)
      ```

      **SCHEMA.ORG ALIGNMENT**:

      Maps to `schema:transcript` property:
      > "If this MediaObject is an AudioObject or VideoObject,
      > the transcript of that object."

      **CIDOC-CRM E33_Linguistic_Object**:

      E33 is the class comprising:
      > "identifiable expressions in natural language or code"

      A transcript is a linguistic object derived from the audio track of
      a video (which is itself an E73_Information_Object).

      **TRANSCRIPT FORMATS**:

      | Format | Description | Use Case |
      |--------|-------------|----------|
      | PLAIN_TEXT | Continuous text, no structure | Simple search indexing |
      | PARAGRAPHED | Text broken into paragraphs | Human reading |
      | STRUCTURED | Segments with speaker labels | Research, analysis |
      | TIMESTAMPED | Segments with time markers | Navigation, subtitling |

      **GENERATION METHODS** (inherited from VideoTextContent):

      | Method | Typical Use | Quality |
      |--------|-------------|---------|
      | ASR_AUTOMATIC | Whisper, Google STT | 0.80-0.95 |
      | MANUAL_TRANSCRIPTION | Human transcriber | 0.98-1.0 |
      | PLATFORM_PROVIDED | YouTube auto-captions | 0.75-0.90 |
      | HYBRID | ASR + human correction | 0.95-1.0 |

      **HERITAGE INSTITUTION CONTEXT**:

      Transcripts are critical for heritage video collections:

      1. **Discovery**: Full-text search over video content
      2. **Accessibility**: Deaf/HoH access to spoken content
      3. **Preservation**: Text outlasts video format obsolescence
      4. **Research**: Corpus analysis, keyword extraction
      5. **Translation**: Base for multilingual access
      6. **SEO**: Search engine indexing of video content

      **STRUCTURED SEGMENTS**:

      When `segments` is populated, the transcript has structural breakdown:

      ```yaml
      segments:
        - segment_index: 0
          start_seconds: 0.0
          end_seconds: 5.5
          segment_text: "Welcome to the Rijksmuseum."
          speaker_label: "Narrator"
          confidence: 0.94
        - segment_index: 1
          start_seconds: 5.5
          end_seconds: 12.3
          segment_text: "Today we'll explore the Night Watch gallery."
          speaker_label: "Narrator"
          confidence: 0.91
      ```

      **PROVENANCE** (inherited from VideoTextContent):

      All transcripts include:
      - `source_video`: Which video was transcribed
      - `generated_by`: Tool/person that created transcript
      - `generation_method`: ASR_AUTOMATIC, MANUAL_TRANSCRIPTION, etc.
      - `generation_timestamp`: When transcript was created
      - `overall_confidence`: Aggregate quality score
      - `is_verified`: Whether human-reviewed
    exact_mappings:
    - crm:E33_Linguistic_Object
    close_mappings:
    - schema:transcript
    related_mappings:
    - dcterms:Text
    slots:
    - contains_or_contained # was: full_text - migrated per Rule 53 (2026-01-26)
    - includes_speaker
    - includes_timestamp
    - paragraph_count
    - primary_speaker
    - has_or_had_segment
    - sentence_count
    - source_language_auto_detected
    - speaker_count
    - specificity_annotation
    - has_or_had_score # was: template_specificity - migrated per Rule 53 (2026-01-17)
    # REMOVED 2026-01-14: transcript_format - migrated to has_or_had_format with TranscriptFormatEnum
    - has_or_had_format
    slot_usage:
      contains_or_contained: # was: full_text - migrated per Rule 53 (2026-01-26)
        description: |
          Full text content of the transcript.
          MIGRATED from full_text per Rule 53.
          Currently mapped to string range for backward compatibility, but slot supports Text class.
        range: string
        required: true
        examples:
        - value: 'Welcome to the Rijksmuseum. Today we''ll explore the masterpieces

            of Dutch Golden Age painting. Our first stop is the Night Watch

            by Rembrandt van Rijn, painted in 1642.

            '
          description: Plain text transcript excerpt
        - value: '[Narrator] Welcome to the Rijksmuseum.

            [Narrator] Today we''ll explore the masterpieces of Dutch Golden Age painting.

            [Curator] Our first stop is the Night Watch by Rembrandt van Rijn.

            '
          description: Transcript with speaker labels
      # REMOVED 2026-01-14: transcript_format - migrated to has_or_had_format with TranscriptFormatEnum
      # transcript_format:
      #   range: TranscriptFormatEnum
      #   required: false
      #   ifabsent: string(PLAIN_TEXT)
      #   examples:
      #     - value: STRUCTURED
      #       description: Text with speaker labels and paragraph breaks
      has_or_had_format:
        range: TranscriptFormatEnum
        required: false
        description: The format of the transcript (plain text, structured, timestamped, etc.)
        examples:
        - value: STRUCTURED
          description: Text with speaker labels and paragraph breaks
      includes_timestamp:
        range: boolean
        required: false
        ifabsent: 'false'
        examples:
        - value: true
          description: Transcript has time codes
      includes_speaker:
        range: boolean
        required: false
        ifabsent: 'false'
        examples:
        - value: true
          description: Multi-speaker transcript with diarization
      has_or_had_segment:
        range: VideoTimeSegment
        required: false
        multivalued: true
        inlined: true
        inlined_as_list: true
        examples:
        - value: "- segment_index: 0\n  start_seconds: 0.0\n  end_seconds: 3.5\n  segment_text: \"Welcome to the museum.\"\
            \n  confidence: 0.95\n"
          description: Single structured segment
      speaker_count:
        range: integer
        required: false
        minimum_value: 0
        examples:
        - value: 3
          description: Three speakers identified
      primary_speaker:
        range: string
        required: false
        examples:
        - value: Narrator
          description: Generic primary speaker
        - value: Dr. Taco Dibbits, Museum Director
          description: Named primary speaker
      source_language_auto_detected:
        range: boolean
        required: false
        ifabsent: 'false'
        examples:
        - value: true
          description: Language was auto-detected
      paragraph_count:
        range: integer
        required: false
        minimum_value: 0
        examples:
        - value: 15
          description: Transcript has 15 paragraphs
      sentence_count:
        range: integer
        required: false
        minimum_value: 0
        examples:
        - value: 47
          description: Transcript has ~47 sentences
    comments:
    - Full text transcription of video audio content
    - Extends VideoTextContent with transcript-specific properties
    - Base class for VideoSubtitle (subtitles are transcripts + time codes)
    - Supports both plain text and structured segment-based transcripts
    - Critical for accessibility, discovery, and preservation
    see_also:
    - https://schema.org/transcript
    - http://www.cidoc-crm.org/cidoc-crm/E33_Linguistic_Object