glam/schemas/20251121/linkml/modules/classes/VideoTranscript.yaml
kempersc 66adec257e Add scripts for normalizing LinkML schemas and validating schema integrity
- Implement `normalize_linkml_alt_descriptions.py` to convert structured alt_descriptions to the expected scalar form.
- Implement `normalize_linkml_structured_aliases.py` to flatten language-keyed structured_aliases into a standard list-of-objects format.
- Implement `validate_linkml_schema_integrity.py` to validate the integrity of LinkML schema bundles, checking for import resolution, YAML parsing, and reference existence.
2026-02-16 10:16:51 +01:00


id: https://nde.nl/ontology/hc/class/VideoTranscript
name: video_transcript_class
title: Video Transcript Class
imports:
- linkml:types
- ../enums/TranscriptFormatEnum
- ../slots/contain
- ../slots/has_format
- ../slots/has_score
- ../slots/has_segment
- ../slots/has_speaker
- ../slots/identified_by
- ../slots/has_paragraph
- ../slots/has_quantity
- ../slots/has_language
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  skos: http://www.w3.org/2004/02/skos/core#
default_prefix: hc
classes:
  VideoTranscript:
    is_a: VideoTextContent
    class_uri: crm:E33_Linguistic_Object
    abstract: false
    description: >-
      Full text transcription of spoken audio content in a video, optionally
      segmented and annotated with speaker and timing information.
    alt_descriptions:
      nl: Volledige transcriptie van gesproken audio in een video.
      de: Vollständige Transkription des gesprochenen Audioinhalts eines Videos.
      fr: Transcription complète du contenu parlé d'une vidéo.
      es: Transcripción completa del contenido hablado de un video.
      ar: تفريغ نصي كامل لمحتوى الكلام في فيديو.
      id: Transkrip lengkap dari konten ujaran dalam video.
      zh: 视频语音内容的完整文字转录。
    structured_aliases:
    - {literal_form: videotranscript, in_language: nl}
    - {literal_form: Videotranskript, in_language: de}
    - {literal_form: transcription video, in_language: fr}
    - {literal_form: transcripcion de video, in_language: es}
    - {literal_form: تفريغ فيديو, in_language: ar}
    - {literal_form: transkrip video, in_language: id}
    - {literal_form: 视频转录, in_language: zh}
    exact_mappings:
    - crm:E33_Linguistic_Object
    close_mappings:
    - schema:transcript
    related_mappings:
    - dcterms:Text
    slots:
    - contain
    - has_format
    - has_segment
    - has_speaker
    - has_language
    - has_paragraph
    - has_quantity
    - identified_by
    - has_score
    slot_usage:
      contain:
        range: string
        required: true
        examples:
        - value: 'Welcome to the Rijksmuseum. Today we''ll explore the masterpieces of Dutch Golden Age painting.'
        - value: '[Narrator] Welcome to the Rijksmuseum.'
      has_format:
        range: TranscriptFormatEnum
        required: false
        examples:
        - value: STRUCTURED
        - value: PLAIN_TEXT
      has_segment:
        range: VideoTimeSegment
        required: false
        multivalued: true
        inlined: true
        inlined_as_list: true
      has_speaker:
        range: string
        required: false
        multivalued: true
        examples:
        - value: Narrator
        - value: Curator
      has_language:
        range: Language
        required: false
        multivalued: true
        inlined: true
        inlined_as_list: true
        examples:
        - value:
            has_code: nl
            has_label: Dutch
        - value:
            has_code: en
            has_label: English
      has_paragraph:
        range: integer
        required: false
        minimum_value: 0
        examples:
        - value: 15
      has_quantity:
        range: integer
        required: false
        minimum_value: 0
        examples:
        - value: 47
      identified_by:
        range: uriorcurie
        required: false
        identifier: true
        examples:
        - value: https://nde.nl/ontology/hc/video-transcript/example-001
    comments:
    - Full text transcription of video audio content
    - Base class for VideoSubtitle (subtitles are transcripts plus time codes)
    - Supports both plain text and segment-based transcripts
    - Useful for accessibility, discovery, and preservation
    see_also:
    - https://schema.org/transcript
    - http://www.cidoc-crm.org/cidoc-crm/E33_Linguistic_Object
    notes:
    - |
      Preserved from prior description (commit 37852a46):
      "Full text transcription of video audio content.\n\n**DEFINITION**:\n\nA VideoTranscript is the complete textual representation of all spoken\ncontent in a video. It extends VideoTextContent with transcript-specific\nproperties and inherits all provenance tracking capabilities.\n\n**RELATIONSHIP TO VideoSubtitle**:\n\nVideoSubtitle is a subclass of VideoTranscript because:\n1. A subtitle file contains everything a transcript needs PLUS time codes\n2. You can derive a plain transcript from subtitles by stripping times\n3. This inheritance allows polymorphic handling of text content\n\n```\nVideoTranscript VideoSubtitle (is_a VideoTranscript)\n\u251C\u2500\u2500 full_text \u251C\u2500\u2500 full_text (inherited)\n\u251C\u2500\u2500 segments[] \u251C\u2500\u2500 segments[] (required, with times)\n\u2514\u2500\u2500 (optional times) \u2514\u2500\u2500 subtitle_format (SRT, VTT, etc.)\n```\n\n**SCHEMA.ORG ALIGNMENT**:\n\nMaps to `schema:transcript` property:\n\
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types: "['*']"
      modeling_notes: |
        A VideoTranscript is the complete textual representation of spoken
        content in a video.
        VideoSubtitle is modeled as a subclass of VideoTranscript.
        Subtitles add required time codes and a subtitle file format (e.g., VTT/SRT).
        Alignment
        - schema:transcript
        - crm:E33_Linguistic_Object
      legacy_description: |
        Preserved from earlier, more verbose description.
        It included:
        - rationale for the VideoTranscript/VideoSubtitle relationship
        - format comparisons (PLAIN_TEXT, PARAGRAPHED, STRUCTURED, TIMESTAMPED)
        - extended heritage context and provenance guidance
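# Illustrative instance (a sketch only, not part of the schema): what a
# VideoTranscript record might look like as instance data. Every value is
# copied from the examples under slot_usage; the combination is hypothetical.
# has_segment and has_score are omitted because VideoTimeSegment and the
# has_score range are defined in other modules not shown here.
#
# identified_by: https://nde.nl/ontology/hc/video-transcript/example-001
# contain: '[Narrator] Welcome to the Rijksmuseum.'
# has_format: STRUCTURED
# has_speaker:
# - Narrator
# has_language:
# - has_code: en
#   has_label: English
# has_paragraph: 15
# has_quantity: 47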