id: https://nde.nl/ontology/hc/class/VideoAnnotation
name: video_annotation_class
title: Video Annotation Class

imports:
  - linkml:types
  - ./VideoTextContent
  - ./VideoTimeSegment
  - ./AnnotationMotivationType
  - ./AnnotationMotivationTypes
  - ../slots/has_annotation_motivation
  - ../slots/has_annotation_segment
  - ../slots/has_annotation_type
  - ../slots/detection_count
  - ../slots/detection_threshold
  # MIGRATED 2026-01-22: frame_sample_rate → analyzes_or_analyzed + VideoFrame + has_or_had_quantity + Quantity (Rule 53)
  - ./VideoFrame
  - ../slots/has_or_had_quantity
  - ../slots/has_or_had_unit
  - ./Quantity
  - ./Unit
  - ../slots/includes_bounding_box
  - ../slots/includes_segmentation_mask
  - ../slots/keyframe_extraction
  - ../slots/model_architecture
  - ../slots/model_task
  - ../slots/specificity_annotation
  - ../slots/has_or_had_score  # was: template_specificity - migrated per Rule 53 (2026-01-17)
  - ../slots/analyzes_or_analyzed
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScore  # was: TemplateSpecificityScores - migrated per Rule 53 (2026-01-17)
  - ./TemplateSpecificityType
  - ./TemplateSpecificityTypes
  - ../enums/AnnotationTypeEnum

prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  oa: http://www.w3.org/ns/oa#
  as: https://www.w3.org/ns/activitystreams#

default_prefix: hc

classes:
  VideoAnnotation:
    is_a: VideoTextContent
    class_uri: oa:Annotation
    abstract: true
    description: |
      Abstract base class for computer vision and multimodal video annotations.

      **DEFINITION**:

      VideoAnnotation represents structured information derived from visual
      analysis of video content. This includes:

      | Subclass | Analysis Type | Output |
      |----------|---------------|--------|
      | VideoSceneAnnotation | Shot/scene detection | Scene boundaries, types |
      | VideoObjectAnnotation | Object detection | Objects, faces, logos |
      | VideoOCRAnnotation | Text extraction | On-screen text (OCR) |

      **RELATIONSHIP TO W3C WEB ANNOTATION**:

      VideoAnnotation aligns with the W3C Web Annotation Data Model:

      ```turtle
      :annotation a oa:Annotation ;
        oa:hasBody :detection_result ;
        oa:hasTarget [
          oa:hasSource :video ;
          oa:hasSelector [
            a oa:FragmentSelector ;
            dcterms:conformsTo <http://www.w3.org/TR/media-frags/> ;
            rdf:value "t=30,35"
          ]
        ] ;
        oa:motivatedBy oa:classifying .
      ```

      **FRAME-BASED ANALYSIS**:

      Unlike audio transcription (a continuous stream), video annotation is
      typically frame-based:

      - Frame sample rate: frames analyzed per second (e.g., 1 fps, 5 fps);
        since the 2026-01-22 migration this is carried by `analyzes_or_analyzed`
        (VideoFrame + `has_or_had_quantity`) rather than a `frame_sample_rate` slot
      - `analyzes_or_analyzed`: total frames processed and sampling parameters
      - Higher sample rates yield more detections but at higher compute cost

      **DETECTION THRESHOLDS**:

      CV models output confidence scores. Thresholds filter noise:

      | Threshold | Use Case |
      |-----------|----------|
      | 0.9+ | High precision, production display |
      | 0.7-0.9 | Balanced, general use |
      | 0.5-0.7 | High recall, research/review |
      | < 0.5 | Raw output, needs filtering |

      **MODEL ARCHITECTURE TRACKING**:

      Different model architectures have different characteristics:

      | Architecture | Examples | Strengths |
      |--------------|----------|-----------|
      | CNN | ResNet, VGG | Fast inference, good for objects |
      | Transformer | ViT, CLIP | Better context, multimodal |
      | Hybrid | DETR, Swin | Balance of speed and accuracy |

      **HERITAGE INSTITUTION CONTEXT**:

      Video annotations enable:

      - **Discovery**: Find videos containing specific objects/artworks
      - **Accessibility**: Scene descriptions for visually impaired users
      - **Research**: Analyze visual content at scale
      - **Preservation**: Document visual content as text
      - **Linking**: Connect detected artworks to collection records

      **CIDOC-CRM E13_Attribute_Assignment**:

      Annotations are attribute assignments: they assert properties about
      video segments. The CV model or human annotator is the assigning agent.
    exact_mappings:
      - oa:Annotation
    close_mappings:
      - crm:E13_Attribute_Assignment
    related_mappings:
      - as:Activity
      - schema:ClaimReview
    slots:
      - has_annotation_motivation
      - has_annotation_segment
      - has_annotation_type
      - detection_count
      - detection_threshold
      # REMOVED 2026-01-22: frame_sample_rate - migrated to analyzes_or_analyzed + VideoFrame + has_or_had_quantity (Rule 53)
      - includes_bounding_box
      - includes_segmentation_mask
      - keyframe_extraction
      - model_architecture
      - model_task
      - specificity_annotation
      - has_or_had_score  # was: template_specificity - migrated per Rule 53 (2026-01-17)
      - analyzes_or_analyzed
    slot_usage:
      has_annotation_type:
        range: AnnotationTypeEnum
        required: true
        examples:
          - value: OBJECT_DETECTION
            description: Object and face detection annotation
      has_annotation_segment:
        range: VideoTimeSegment
        multivalued: true
        required: false
        inlined_as_list: true
        examples:
          - value: "[{start_seconds: 30.0, end_seconds: 35.0, segment_text: 'Night Watch painting visible'}]"
            description: Object detection segment
      detection_threshold:
        range: float
        required: false
        minimum_value: 0.0
        maximum_value: 1.0
        examples:
          - value: 0.5
            description: Standard detection threshold
      detection_count:
        range: integer
        required: false
        minimum_value: 0
        examples:
          - value: 342
            description: 342 total detections found
      # MIGRATED 2026-01-22: frame_sample_rate → analyzes_or_analyzed + VideoFrame + has_or_had_quantity (Rule 53)
      # frame_sample_rate:
      #   range: float
      #   required: false
      #   minimum_value: 0.0
      #   examples:
      #     - value: 1.0
      #       description: Analyzed 1 frame per second
      analyzes_or_analyzed:
        description: |
          MIGRATED 2026-01-22: Now supports VideoFrame class for frame_sample_rate migration.

          Frame analysis information including:
          - Total frames analyzed (integer, legacy pattern)
          - Frame sample rate and analysis parameters (VideoFrame class)

          MIGRATED SLOTS:
          - frame_sample_rate → VideoFrame.has_or_had_quantity with unit "samples per second"
        range: VideoFrame
        inlined: true
        required: false
        examples:
          - value:
              has_or_had_quantity:
                quantity_value: 1.0
                quantity_type: FRAME_SAMPLE_RATE
                has_or_had_unit:
                  unit_value: "samples per second"
              frame_count: 1800
            description: Analyzed 1,800 frames at 1 fps (30 min video)
          - value:
              has_or_had_quantity:
                quantity_value: 5.0
                quantity_type: FRAME_SAMPLE_RATE
                has_or_had_unit:
                  unit_value: "fps"
            description: 5 frames per second sample rate
      keyframe_extraction:
        range: boolean
        required: false
        examples:
          - value: true
            description: Used keyframe extraction
      model_architecture:
        range: string
        required: false
        examples:
          - value: Transformer
            description: Vision Transformer architecture
          - value: CNN
            description: Convolutional Neural Network
      model_task:
        range: string
        required: false
        examples:
          - value: detection
            description: Object detection task
          - value: captioning
            description: Video captioning task
      includes_bounding_box:
        range: boolean
        required: false
        examples:
          - value: true
            description: Includes bounding box coordinates
      includes_segmentation_mask:
        range: boolean
        required: false
        examples:
          - value: false
            description: No segmentation masks included
      has_annotation_motivation:
        range: AnnotationMotivationType
        required: false
        examples:
          - value: ClassifyingMotivation
            description: Annotation for classification purposes
    comments:
      - Abstract base for all CV/multimodal video annotations
      - Extends VideoTextContent with frame-based analysis parameters
      - W3C Web Annotation compatible structure
      - Supports both temporal and spatial annotation
      - Tracks detection thresholds and model architecture
    see_also:
      - https://www.w3.org/TR/annotation-model/
      - http://www.cidoc-crm.org/cidoc-crm/E13_Attribute_Assignment
      - https://iiif.io/api/presentation/3.0/
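# ---------------------------------------------------------------------------
# Illustrative sketch (kept in YAML comments so the schema file stays
# parseable): a hypothetical data instance of a concrete VideoAnnotation
# subclass, assembled from the slots and slot_usage examples above. The
# nested VideoTimeSegment and VideoFrame shapes are assumptions based on the
# examples in this file; their authoritative definitions live in their own
# schema files.
#
# has_annotation_type: OBJECT_DETECTION
# has_annotation_motivation: ClassifyingMotivation
# detection_threshold: 0.7
# detection_count: 342
# includes_bounding_box: true
# includes_segmentation_mask: false
# model_architecture: Transformer
# model_task: detection
# analyzes_or_analyzed:
#   frame_count: 1800
#   has_or_had_quantity:
#     quantity_value: 1.0
#     quantity_type: FRAME_SAMPLE_RATE
#     has_or_had_unit:
#       unit_value: "samples per second"
# has_annotation_segment:
#   - start_seconds: 30.0
#     end_seconds: 35.0
#     segment_text: "Night Watch painting visible"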