id: https://nde.nl/ontology/hc/class/VideoAnnotation
name: video_annotation_class
title: Video Annotation Class
imports:
  - linkml:types
  - ../enums/AnnotationTypeEnum
  - ../slots/analyze
  - ../slots/contain
  - ../slots/filter
  - ../slots/has_quantity
  - ../slots/has_rationale
  - ../slots/has_score
  - ../slots/has_threshold
  - ../slots/has_type
  - ../slots/has_measurement_unit
  - ../slots/has_bounding_box
  - ../slots/mask
  - ../slots/has_method
  - ../slots/has_model
  - ../slots/has_objective
prefixes:
  linkml: https://w3id.org/linkml/
  hc: https://nde.nl/ontology/hc/
  schema: http://schema.org/
  dcterms: http://purl.org/dc/terms/
  prov: http://www.w3.org/ns/prov#
  crm: http://www.cidoc-crm.org/cidoc-crm/
  oa: http://www.w3.org/ns/oa#
  as: https://www.w3.org/ns/activitystreams#
default_prefix: hc
classes:
  VideoAnnotation:
    is_a: VideoTextContent
    class_uri: oa:Annotation
    abstract: true
    description: |
      Abstract base class for computer vision and multimodal video annotations.

      **DEFINITION**:

      VideoAnnotation represents structured information derived from visual
      analysis of video content. Subclasses include:

      | Subclass | Analysis Type | Output |
      |----------|---------------|--------|
      | VideoSceneAnnotation | Shot/scene detection | Scene boundaries, types |
      | VideoObjectAnnotation | Object detection | Objects, faces, logos |
      | VideoOCRAnnotation | Text extraction | On-screen text (OCR) |

      **RELATIONSHIP TO W3C WEB ANNOTATION**:

      VideoAnnotation aligns with the W3C Web Annotation Data Model:

      ```turtle
      :annotation a oa:Annotation ;
          oa:hasBody :detection_result ;
          oa:hasTarget [
              oa:hasSource :video ;
              oa:hasSelector [
                  a oa:FragmentSelector ;
                  dcterms:conformsTo <http://www.w3.org/TR/media-frags/> ;
                  rdf:value "t=30,35"
              ]
          ] ;
          oa:motivatedBy oa:classifying .
      ```

      **FRAME-BASED ANALYSIS**:

      Unlike audio transcription, which processes a continuous stream, video
      annotation is typically frame-based:

      - Frame sample rate: frames analyzed per second (e.g., 1 fps, 5 fps),
        recorded via `analyze` (formerly the `frame_sample_rate` slot)
      - `analyze`: also records the total number of frames processed
      - Higher sample rates yield more detections at a higher compute cost

      **DETECTION THRESHOLDS**:

      CV models output confidence scores. A threshold filters out
      low-confidence detections:

      | Threshold | Use Case |
      |-----------|----------|
      | 0.9+ | High precision, production display |
      | 0.7-0.9 | Balanced, general use |
      | 0.5-0.7 | High recall, research/review |
      | < 0.5 | Raw output, needs filtering |

      **MODEL ARCHITECTURE TRACKING**:

      Model architectures differ in their trade-offs:

      | Architecture | Examples | Strengths |
      |--------------|----------|-----------|
      | CNN | ResNet, VGG | Fast inference, good for objects |
      | Transformer | ViT, CLIP | Better context, multimodal |
      | Hybrid | DETR, Swin | Balance of speed and accuracy |

      **HERITAGE INSTITUTION CONTEXT**:

      Video annotations enable:
      - **Discovery**: Find videos containing specific objects/artworks
      - **Accessibility**: Scene descriptions for visually impaired users
      - **Research**: Analyze visual content at scale
      - **Preservation**: Document visual content as text
      - **Linking**: Connect detected artworks to collection records

      **CIDOC-CRM E13_Attribute_Assignment**:

      Annotations are attribute assignments: they assert properties about
      video segments. The CV model or human annotator is the assigning agent.
    exact_mappings:
    - oa:Annotation
    close_mappings:
    - crm:E13_Attribute_Assignment
    related_mappings:
    - as:Activity
    - schema:ClaimReview
    slots:
    - has_rationale
    - contain
    - has_type
    - filter
    - has_bounding_box
    - mask
    - has_method
    - has_model
    - has_objective
    - has_score
    - analyze
    slot_usage:
      has_type:
#         range: string # uriorcurie
        required: true
        examples:
        - value:
            has_code: OBJECT_DETECTION
            has_label: Object Detection
      contain:
#         range: string
        multivalued: true
        required: false
        inlined_as_list: false # Fixed invalid inline for primitive type
        examples:
        - value:
            has_label: Night Watch painting visible
            has_description: 30.0 - 35.0 seconds
      has_rationale:
#         range: string
        required: false
        examples:
        - value:
            has_label: ClassifyingMotivation
      filter:
        description: |
          MIGRATED 2026-01-25: Replaces detection_count and detection_threshold slots.

          Links to DetectedEntity, which contains:
          - has_quantity → Quantity (for detection_count)
          - has_threshold → DetectionThreshold (for detection_threshold)

          **Migration Pattern**:
          - Old: detection_count: 342, detection_threshold: 0.5
          - New: filter → DetectedEntity with structured data
        range: DetectedEntity
        inlined: true
        required: false
        examples:
        - value:
            has_quantity:
              has_measurement_unit:
            has_threshold:
        - value:
            has_quantity:
              has_measurement_unit:
            has_threshold:
              has_label: High Precision
      analyze:
        description: |
          MIGRATED 2026-01-22: Now supports VideoFrame class for frame_sample_rate migration.

          Frame analysis information including:
          - Total frames analyzed (integer, legacy pattern)
          - Frame sample rate and analysis parameters (VideoFrame class)

          MIGRATED SLOTS:
          - frame_sample_rate → VideoFrame.has_quantity with unit "samples per second"
        range: VideoFrame
        inlined: true
        required: false
        examples:
        - value:
            has_quantity:
              has_measurement_unit:
        - value:
            has_quantity:
              has_measurement_unit:
      has_method:
        range: boolean
        required: false
        examples:
        - value: true
      has_model:
#         range: string
        required: false
        examples:
        - value: Transformer
        - value: CNN
      has_objective:
#         range: string
        required: false
        examples:
        - value: detection
        - value: captioning
      has_bounding_box:
        range: boolean
        required: false
        examples:
        - value: true
      mask:
        range: boolean
        required: false
        examples:
        - value: false
    comments:
    - Abstract base for all CV/multimodal video annotations
    - Extends VideoTextContent with frame-based analysis parameters
    - W3C Web Annotation compatible structure
    - Supports both temporal and spatial annotation
    - Tracks detection thresholds and model architecture
    see_also:
    - https://www.w3.org/TR/annotation-model/
    - http://www.cidoc-crm.org/cidoc-crm/E13_Attribute_Assignment
    - https://iiif.io/api/presentation/3.0/
    annotations:
      specificity_score: 0.1
      specificity_rationale: Generic utility class/slot created during migration
      custodian_types: "['*']"