questionable TP alignment of TD annotation #379

@owencking

Bug Description

I am worried that the TimePoint (TP) to which the TextDocument (TD) is aligned may not be the TP whose frame was actually processed by the VLM.

I ran the captioner on input MMIF from SWT. It produced this in its output view:

        {
          "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
          "properties": {
            "document": "d1",
            "origin": "v_1:tf_18",
            "provenance": "derived",
            "mime": "application/json",
            "text": {
              "@value": "AL GALLETTA BLUEBERRY GROWER",
              "@language": "en"
            },
            "id": "v_2:td_6"
          }
        },
        {
          "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
          "properties": {
            "source": "v_0:tp_919",
            "target": "v_2:td_6",
            "id": "v_2:al_6"
          }
        },

Here are the TimeFrame (TF) and TP annotations that are referenced:

        {
          "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
          "properties": {
            "label": "chyron and person",
            "classification": {
              "chyron and person": 0.9387489855289459
            },
            "targets": [
              "v_0:tp_919",
              "v_0:tp_920",
              "v_0:tp_921",
              "v_0:tp_922",
              "v_0:tp_923",
              "v_0:tp_924",
              "v_0:tp_925",
              "v_0:tp_926"
            ],
            "representatives": [
              "v_0:tp_919"
            ],
            "timeUnit": "milliseconds",
            "id": "v_1:tf_18"
          }
        },
        {
          "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v5",
          "properties": {
            "timePoint": 459026,
            "label": "IN",
            "classification": {
              "GLOTW": 3.5924065741710365e-05,
              "CR": 3.7618651731463615e-06,
              "IN": 0.947616696357727,
              "KU": 0.001030643587000668,
              "B": 2.926498436818542e-25,
              "S": 1.0204522123136162e-11,
              "M": 5.168930283794282e-10,
              "Y": 1.1250751413172111e-05,
              "F": 1.7080129310897973e-08,
              "E": 0.0006756898364983499,
              "P": 0.050621531903743744,
              "-": 4.5278543439053465e-06
            },
            "id": "v_0:tp_919"
          }
        },
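
The chain of references above (TD, then Alignment, then TP, plus the TD's `origin` TF) can be traced mechanically. The following is a minimal sketch that uses only plain Python dictionaries, not the `mmif-python` SDK. The annotations are trimmed copies of the ones quoted in this issue, flattened into a single list (in the full MMIF they sit in separate views), and the helper names `type_name` and `trace_text_document` are hypothetical.

```python
# Trimmed copies of the annotations quoted in this issue, flattened into one list.
# Assumption: long-form ids ("v_2:td_6") are unique across views, so the view
# nesting of a real MMIF file can be ignored for this illustration.
VOCAB = "http://mmif.clams.ai/vocabulary/"
ANNOTATIONS = [
    {"@type": VOCAB + "TextDocument/v1",
     "properties": {"id": "v_2:td_6", "origin": "v_1:tf_18",
                    "text": {"@value": "AL GALLETTA BLUEBERRY GROWER"}}},
    {"@type": VOCAB + "Alignment/v1",
     "properties": {"id": "v_2:al_6", "source": "v_0:tp_919",
                    "target": "v_2:td_6"}},
    {"@type": VOCAB + "TimeFrame/v6",
     "properties": {"id": "v_1:tf_18",
                    "targets": [f"v_0:tp_{n}" for n in range(919, 927)],
                    "representatives": ["v_0:tp_919"]}},
    {"@type": VOCAB + "TimePoint/v5",
     "properties": {"id": "v_0:tp_919", "timePoint": 459026}},
]


def type_name(annotation):
    """Return the short type name, e.g. 'Alignment' for '.../Alignment/v1'."""
    return annotation["@type"].rstrip("/").split("/")[-2]


def trace_text_document(td_id, annotations):
    """Follow a TextDocument back to its aligned TimePoint and origin TimeFrame."""
    by_id = {a["properties"]["id"]: a["properties"] for a in annotations}
    td = by_id[td_id]
    tf = by_id[td["origin"]]
    # Find the Alignment whose target is this TextDocument.
    aligned_tp = next(
        a["properties"]["source"]
        for a in annotations
        if type_name(a) == "Alignment" and a["properties"]["target"] == td_id
    )
    return {
        "text": td["text"]["@value"],
        "time_frame": tf["id"],
        "aligned_tp": aligned_tp,
        "time_point": by_id[aligned_tp]["timePoint"],
        "is_representative": aligned_tp in tf["representatives"],
    }


print(trace_text_document("v_2:td_6", ANNOTATIONS))
```

This only confirms that the annotations are internally consistent: the TD is aligned to the TF's single representative TP. It says nothing about which frame the app actually read from the video.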

However, this is the frame I get from the video when seeking to 459026 (the frame found is at 00459025):
Image

Note that there is no text in that image, not even faintly. However, here are some nearby frames:

Image (from 00459892) Image (from 00461661)

Question: Is it possible that the app is pulling an image from later than the time point it is seeking?
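
To put the question in concrete terms, the distances between the requested time point and the frames mentioned above can be expressed in frames. This is a rough sketch under two assumptions that the issue does not confirm: the numbers in the image labels are millisecond timestamps, and the video runs at 29.97 fps. The helper names are hypothetical.

```python
ASSUMED_FPS = 29.97  # assumption: the issue does not state the video's frame rate


def frame_offset(requested_ms, actual_ms, fps=ASSUMED_FPS):
    """Signed distance, in frames, from the requested time to the actual time."""
    return (actual_ms - requested_ms) * fps / 1000.0


def within_one_frame(requested_ms, actual_ms, fps=ASSUMED_FPS):
    """True if the two timestamps are less than one frame duration apart."""
    return abs(frame_offset(requested_ms, actual_ms, fps)) < 1.0


REQUESTED_MS = 459026  # timePoint of v_0:tp_919
OBSERVED_MS = {
    "frame found when seeking": 459025,
    "first nearby frame": 459892,
    "second nearby frame": 461661,
}

for label, ms in OBSERVED_MS.items():
    offset = frame_offset(REQUESTED_MS, ms)
    print(f"{label}: {ms} ms, offset {offset:+.2f} frames, "
          f"within one frame: {within_one_frame(REQUESTED_MS, ms)}")
```

Under these assumptions, the frame found when seeking is well within one frame of the requested time, while the two frames that do show text lie tens of frames later. A seek that overshoots by that much would be consistent with the caption, but that is only a hypothesis until the app's frame extraction is checked.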

Here is the cataloging aid where we discovered this.

Reproduction steps

Full MMIF file: cpb-aacip-259-9c6s1g2d_NJN_News_pre-1984_2.mmif.json

Media file: cpb-aacip-259-9c6s1g2d.mp4

Expected behavior

The VLM should perform captioning on the exact frame referenced by the TP annotation.
