id: https://nde.nl/ontology/hc/class/VideoAnnotation
name: video_annotation_class
title: Video Annotation Class
imports:
  - linkml:types
  - ./VideoTextContent
  - ./VideoTimeSegment
  - ./AnnotationMotivationType
  - ./AnnotationMotivationTypes
  - ../slots/has_annotation_motivation
  - ../slots/has_annotation_segment
  - ../slots/has_annotation_type
  - ../slots/detection_count
  - ../slots/detection_threshold
  - ../slots/frame_sample_rate
  - ../slots/includes_bounding_box
  - ../slots/includes_segmentation_mask
  - ../slots/keyframe_extraction
  - ../slots/model_architecture
  - ../slots/model_task
  - ../slots/specificity_annotation
  - ../slots/template_specificity
  - ../slots/total_frames_analyzed
  - ./SpecificityAnnotation
  - ./TemplateSpecificityScores
  - ../enums/AnnotationTypeEnum
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  oa: http://www.w3.org/ns/oa#
  as: https://www.w3.org/ns/activitystreams#
default_prefix: hc
classes:
  VideoAnnotation:
    is_a: VideoTextContent
    class_uri: oa:Annotation
    abstract: true
    description: |-
      Abstract base class for computer vision and multimodal video annotations.

      **DEFINITION**:

      VideoAnnotation represents structured information derived from visual
      analysis of video content. This includes:

      | Subclass | Analysis Type | Output |
      |----------|---------------|--------|
      | VideoSceneAnnotation | Shot/scene detection | Scene boundaries, types |
      | VideoObjectAnnotation | Object detection | Objects, faces, logos |
      | VideoOCRAnnotation | Text extraction | On-screen text (OCR) |

      **RELATIONSHIP TO W3C WEB ANNOTATION**:

      VideoAnnotation aligns with the W3C Web Annotation Data Model:

      ```turtle
      :annotation a oa:Annotation ;
        oa:hasBody :detection_result ;
        oa:hasTarget [
          oa:hasSource :video ;
          oa:hasSelector [
            a oa:FragmentSelector ;
            dcterms:conformsTo <http://www.w3.org/TR/media-frags/> ;
            rdf:value "t=30,35"
          ]
        ] ;
        oa:motivatedBy oa:classifying .
      ```

      **FRAME-BASED ANALYSIS**:

      Unlike audio transcription (a continuous stream), video annotation is
      typically frame-based:

      - `frame_sample_rate`: frames analyzed per second (e.g., 1 fps, 5 fps)
      - `total_frames_analyzed`: total frames processed
      - Higher sample rates yield more detections but higher compute cost

      **DETECTION THRESHOLDS**:

      CV models output confidence scores. Thresholds filter noise:

      | Threshold | Use Case |
      |-----------|----------|
      | 0.9+ | High precision, production display |
      | 0.7-0.9 | Balanced, general use |
      | 0.5-0.7 | High recall, research/review |
      | < 0.5 | Raw output, needs filtering |

      **MODEL ARCHITECTURE TRACKING**:

      Different model architectures have different characteristics:

      | Architecture | Examples | Strengths |
      |--------------|----------|-----------|
      | CNN | ResNet, VGG | Fast inference, good for objects |
      | Transformer | ViT, CLIP | Better context, multimodal |
      | Hybrid | DETR, Swin | Balance of speed and accuracy |

      **HERITAGE INSTITUTION CONTEXT**:

      Video annotations enable:

      - **Discovery**: find videos containing specific objects/artworks
      - **Accessibility**: scene descriptions for visually impaired users
      - **Research**: analyze visual content at scale
      - **Preservation**: document visual content as text
      - **Linking**: connect detected artworks to collection records

      **CIDOC-CRM E13_Attribute_Assignment**:

      Annotations are attribute assignments: they assert properties about
      video segments. The CV model or human annotator is the assigning agent.
    exact_mappings:
      - oa:Annotation
    close_mappings:
      - crm:E13_Attribute_Assignment
    related_mappings:
      - as:Activity
      - schema:ClaimReview
    slots:
      - has_annotation_motivation
      - has_annotation_segment
      - has_annotation_type
      - detection_count
      - detection_threshold
      - frame_sample_rate
      - includes_bounding_box
      - includes_segmentation_mask
      - keyframe_extraction
      - model_architecture
      - model_task
      - specificity_annotation
      - template_specificity
      - total_frames_analyzed
    slot_usage:
      has_annotation_type:
        range: AnnotationTypeEnum
        required: true
        examples:
          - value: OBJECT_DETECTION
            description: Object and face detection annotation
      has_annotation_segment:
        range: VideoTimeSegment
        multivalued: true
        required: false
        inlined_as_list: true
        examples:
          - value: '[{start_seconds: 30.0, end_seconds: 35.0, segment_text: ''Night Watch painting visible''}]'
            description: Object detection segment
      detection_threshold:
        range: float
        required: false
        minimum_value: 0.0
        maximum_value: 1.0
        examples:
          - value: 0.5
            description: Standard detection threshold
      detection_count:
        range: integer
        required: false
        minimum_value: 0
        examples:
          - value: 342
            description: 342 total detections found
      frame_sample_rate:
        range: float
        required: false
        minimum_value: 0.0
        examples:
          - value: 1.0
            description: Analyzed 1 frame per second
      total_frames_analyzed:
        range: integer
        required: false
        minimum_value: 0
        examples:
          - value: 1800
            description: Analyzed 1,800 frames (30 min video at 1 fps)
      keyframe_extraction:
        range: boolean
        required: false
        examples:
          - value: true
            description: Used keyframe extraction
      model_architecture:
        range: string
        required: false
        examples:
          - value: Transformer
            description: Vision Transformer architecture
          - value: CNN
            description: Convolutional Neural Network
      model_task:
        range: string
        required: false
        examples:
          - value: detection
            description: Object detection task
          - value: captioning
            description: Video captioning task
      includes_bounding_box:
        range: boolean
        required: false
        examples:
          - value: true
            description: Includes bounding box coordinates
      includes_segmentation_mask:
        range: boolean
        required: false
        examples:
          - value: false
            description: No segmentation masks included
      has_annotation_motivation:
        range: AnnotationMotivationType
        required: false
        examples:
          - value: ClassifyingMotivation
            description: Annotation for classification purposes
    comments:
      - Abstract base for all CV/multimodal video annotations
      - Extends VideoTextContent with frame-based analysis parameters
      - W3C Web Annotation compatible structure
      - Supports both temporal and spatial annotation
      - Tracks detection thresholds and model architecture
    see_also:
      - https://www.w3.org/TR/annotation-model/
      - http://www.cidoc-crm.org/cidoc-crm/E13_Attribute_Assignment
      - https://iiif.io/api/presentation/3.0/
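The frame-based parameters and detection thresholds described in the schema can be sketched in a few lines of Python. This is an illustrative sketch only, not LinkML-generated code; the function names `total_frames` and `filter_detections` are hypothetical:

```python
# Illustrative sketch (not part of the schema): how frame_sample_rate,
# total_frames_analyzed, and detection_threshold relate in practice.

def total_frames(duration_seconds: float, frame_sample_rate: float) -> int:
    """Frames processed at a given sample rate (cf. total_frames_analyzed)."""
    return int(duration_seconds * frame_sample_rate)

def filter_detections(detections: list[dict], detection_threshold: float) -> list[dict]:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= detection_threshold]

# A 30-minute video sampled at 1 fps yields 1,800 analyzed frames,
# matching the total_frames_analyzed example in the schema.
assert total_frames(30 * 60, 1.0) == 1800

raw = [
    {"label": "painting", "confidence": 0.93},
    {"label": "person", "confidence": 0.62},
    {"label": "logo", "confidence": 0.41},
]
# The 0.5 threshold (high recall, research/review in the table above)
# keeps two of three detections; 0.9 (high precision) keeps one.
print([d["label"] for d in filter_detections(raw, 0.5)])
```

Raising the threshold trades recall for precision exactly as the table in the class description suggests; the same raw model output can be re-filtered at display time without re-running inference.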
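The Turtle example in the class description has a direct JSON-LD equivalent in the W3C Web Annotation model. The following sketch builds that structure for a detected time segment; the helper name `make_annotation` and the example URIs are placeholders, not part of the schema:

```python
import json

def make_annotation(video_uri: str, body_uri: str, start: float, end: float) -> dict:
    """Serialize a detection as a W3C Web Annotation (JSON-LD), mirroring
    the Turtle example in the VideoAnnotation description."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "classifying",
        "body": body_uri,
        "target": {
            "source": video_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Media Fragments temporal clause, e.g. seconds 30-35
                "value": f"t={start:g},{end:g}",
            },
        },
    }

anno = make_annotation(
    "https://example.org/video/1",          # placeholder video URI
    "https://example.org/detections/42",    # placeholder body URI
    30.0, 35.0,
)
print(json.dumps(anno, indent=2))
```

Emitting annotations in this shape keeps the schema's `has_annotation_segment` data interoperable with generic Web Annotation clients and IIIF viewers, per the `see_also` references.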