Fix VideoClassificationText by paulnoirel · Pull Request #2044 · Labelbox/labelbox-python

paulnoirel · 2026-02-26T10:16:28Z

Description

Summary

Fixes NDJSON serialization for VideoClassificationAnnotation with Text values. Previously, frame information and secondary text segments were silently dropped -- only the first annotation's text was emitted as a plain string with no frames. After this fix, the output includes per-segment text values with their frame ranges, matching the format produced by TemporalClassificationText.

Problem

import labelbox.types as lb_types

annotations = [
    lb_types.VideoClassificationAnnotation(
        name="free_text_per_frame", frame=9, segment_index=0,
        value=lb_types.Text(answer="sample text 1"),
    ),
    lb_types.VideoClassificationAnnotation(
        name="free_text_per_frame", frame=15, segment_index=0,
        value=lb_types.Text(answer="sample text 1"),
    ),
    lb_types.VideoClassificationAnnotation(
        name="free_text_per_frame", frame=40, segment_index=0,
        value=lb_types.Text(answer="sample text 2"),
    ),
    lb_types.VideoClassificationAnnotation(
        name="free_text_per_frame", frame=50, segment_index=0,
        value=lb_types.Text(answer="sample text 2"),
    ),
]

Before -- frames lost, second text segment dropped:

{
  "name": "free_text_per_frame",
  "answer": "sample text 1",
  "uuid": "73a61cfa-...",
  "dataRow": {"globalKey": "my-video-global-key"}
}

After -- all segments and frame ranges preserved:

{
  "name": "free_text_per_frame",
  "answer": [
    {"value": "sample text 1", "frames": [{"start": 9, "end": 15}]},
    {"value": "sample text 2", "frames": [{"start": 40, "end": 50}]}
  ],
  "dataRow": {"globalKey": "my-video-global-key"}
}

Changes

classification.py: Added NDVideoText and NDVideoTextAnswer classes (standalone BaseModel, no dependency on temporal pipeline).
label.py: Added a Text branch in _create_video_annotations that groups annotations by text value, computes segment-aware frame ranges, and yields NDVideoText.
test_video.py: Added two tests covering multi-text and single-text scenarios.
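The grouping step described for label.py can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `frame_ranges_by_text` and the plain tuples standing in for VideoClassificationAnnotation are hypothetical, and it assumes that frames sharing a text value and a segment_index collapse into one start/end range.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (frame, segment_index, text) stands in for a VideoClassificationAnnotation
# with a Text value; this tuple shape is an assumption for illustration only.
Annotation = Tuple[int, int, str]


def frame_ranges_by_text(
    annotations: List[Annotation],
) -> Dict[str, List[Dict[str, int]]]:
    """Group frames by text value, then emit one range per segment_index."""
    frames: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for frame, segment_index, text in annotations:
        frames[text][segment_index].append(frame)

    ranges: Dict[str, List[Dict[str, int]]] = {}
    for text, by_segment in frames.items():
        # One {"start", "end"} range per segment, ordered by segment index.
        ranges[text] = [
            {"start": min(fs), "end": max(fs)}
            for _, fs in sorted(by_segment.items())
        ]
    return ranges


# The four annotations from the Problem section, reduced to tuples.
example = [
    (9, 0, "sample text 1"),
    (15, 0, "sample text 1"),
    (40, 0, "sample text 2"),
    (50, 0, "sample text 2"),
]
answer = [
    {"value": text, "frames": rngs}
    for text, rngs in frame_ranges_by_text(example).items()
]
```

Under these assumptions, `answer` has the same shape as the "After" payload above: one entry per distinct text value, each carrying its own frame ranges.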

Scope

Only affects VideoClassificationAnnotation + Text. Video Radio, Checklist, and object annotations are unchanged.
Non-video text annotations are unchanged.
No breaking changes to the Python API -- users keep using VideoClassificationAnnotation(value=Text(...)).

Fixes # (issue)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Document change (fix typo or modifying any markdown files, code comments or anything in the examples folder only)

All Submissions

Have you followed the guidelines in our Contributing document?
Have you provided a description?
Are your changes properly formatted?

New Feature Submissions

Does your submission pass tests?
Have you added thorough tests for your new feature?
Have you commented your code, particularly in hard-to-understand areas?
Have you added a Docstring?

Changes to Core Features

Have you written new tests for your core changes, as applicable?
Have you successfully run tests with your changes locally?
Have you updated any code comments, as applicable?

Note

Medium Risk
Changes the emitted NDJSON shape for video Text classifications from a single string to a structured list with frame ranges, which may impact downstream parsers expecting the old format. Scope is limited to VideoClassificationAnnotation + Text and is covered by new tests.
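For downstream parsers that may see both shapes, one possible way to absorb the change is to normalize the answer field before use. The helper below is a hypothetical sketch (the name `normalize_text_answer` is not part of the SDK); it assumes rows are plain dicts as in the examples in this PR.

```python
from typing import Any, Dict, List


def normalize_text_answer(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the answer as a list of {"value", "frames"} entries for both shapes."""
    answer = row["answer"]
    if isinstance(answer, str):
        # Old shape: a single string with no frame information.
        return [{"value": answer, "frames": []}]
    # New shape: already a list of {"value", "frames"} entries.
    return [
        {"value": entry["value"], "frames": list(entry.get("frames", []))}
        for entry in answer
    ]


old_row = {"name": "free_text_per_frame", "answer": "sample text 1"}
new_row = {
    "name": "free_text_per_frame",
    "answer": [{"value": "sample text 1", "frames": [{"start": 9, "end": 15}]}],
}
old_entries = normalize_text_answer(old_row)
new_entries = normalize_text_answer(new_row)
```

Representing the old string shape with an empty frames list is a design choice of this sketch; a consumer could equally reject old-shape rows outright.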

Overview
Fixes video free-text classification export so NDJSON no longer drops frame information or later text segments.

VideoClassificationAnnotation with Text is now serialized, via a new NDVideoText model, as a single row whose answer is a list of {value, frames} entries. The entries are computed by grouping annotations by text value and deriving segment-aware frame ranges. Tests cover the multi-text and single-text cases.

Written by Cursor Bugbot for commit fcc1107. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Empty dataRow when only external_id is provided
- NDVideoText now builds data_row via DataRow(id=data.uid, global_key=data.global_key), so an external_id-only row raises the same clear validation error instead of serializing an empty dataRow.
✅ Fixed: NDVideoText drops feature_schema_id and requires name
- NDVideoText now supports optional name, preserves feature_schema_id as schemaId, and validates that at least one identifier is set to match other classification serializers.

Or push these changes by commenting:

@cursor push d470be00b2

Preview (d470be00b2)

diff --git a/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py b/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
--- a/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
+++ b/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
@@ -223,10 +223,21 @@
       {"name": "...", "answer": [{"value": "text", "frames": [{"start": 1, "end": 5}]}], ...}
     """
 
-    name: str
+    name: Optional[str] = None
+    schema_id: Optional[Cuid] = Field(
+        default=None, serialization_alias="schemaId"
+    )
     answer: List[NDVideoTextAnswer]
-    dataRow: Dict[str, str]
+    data_row: DataRow = Field(serialization_alias="dataRow")
 
+    model_config = ConfigDict(populate_by_name=True)
+
+    @model_validator(mode="after")
+    def must_set_one(self):
+        if self.schema_id is None and self.name is None:
+            raise ValueError("Schema id or name are not set. Set either one.")
+        return self
+
     @classmethod
     def from_video_text_group(
         cls,
@@ -235,14 +246,10 @@
         data: "GenericDataRowData",
     ) -> "NDVideoText":
         first = annotation_group[0]
-        data_row = {}
-        if data.global_key:
-            data_row["globalKey"] = data.global_key
-        elif data.uid:
-            data_row["id"] = data.uid
         return cls(
             name=first.name,
-            dataRow=data_row,
+            schema_id=first.feature_schema_id,
+            data_row=DataRow(id=data.uid, global_key=data.global_key),
             answer=[
                 NDVideoTextAnswer(value=text_val, frames=ranges)
                 for text_val, ranges in frame_ranges_by_text.items()

diff --git a/libs/labelbox/tests/data/serialization/ndjson/test_video.py b/libs/labelbox/tests/data/serialization/ndjson/test_video.py
--- a/libs/labelbox/tests/data/serialization/ndjson/test_video.py
+++ b/libs/labelbox/tests/data/serialization/ndjson/test_video.py
@@ -1,4 +1,5 @@
 import json
+import pytest
 from labelbox.data.annotation_types.classification.classification import (
     Checklist,
     ClassificationAnnotation,
@@ -722,6 +723,59 @@
     assert answer[0]["frames"] == [{"start": 9, "end": 15}]
 
 
+def test_video_classification_text_with_external_id_raises():
+    label = Label(
+        data=GenericDataRowData(external_id="sample-video-external-id"),
+        annotations=[
+            VideoClassificationAnnotation(
+                name="free_text",
+                frame=9,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            )
+        ],
+    )
+
+    with pytest.raises(ValueError, match="Must set either id or global_key"):
+        list(NDJsonConverter.serialize([label]))
+
+
+def test_video_classification_text_with_feature_schema_id_only():
+    label = Label(
+        data=GenericDataRowData(global_key="sample-video-schema-id-only"),
+        annotations=[
+            VideoClassificationAnnotation(
+                feature_schema_id="ckrb1sfjx099a0y914hl319ie",
+                frame=9,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            ),
+            VideoClassificationAnnotation(
+                feature_schema_id="ckrb1sfjx099a0y914hl319ie",
+                frame=15,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            ),
+        ],
+    )
+
+    serialized = list(NDJsonConverter.serialize([label]))
+    free_text_rows = [
+        r
+        for r in serialized
+        if r.get("schemaId") == "ckrb1sfjx099a0y914hl319ie"
+    ]
+    assert len(free_text_rows) == 1
+
+    row = free_text_rows[0]
+    assert row["schemaId"] == "ckrb1sfjx099a0y914hl319ie"
+    assert "name" not in row
+    assert row["dataRow"] == {"globalKey": "sample-video-schema-id-only"}
+    assert row["answer"] == [
+        {"value": "sample text", "frames": [{"start": 9, "end": 15}]}
+    ]
+
+
 def test_video_classification_nesting_bbox():
     bbox_annotation = [
         VideoObjectAnnotation(

cursor · 2026-02-26T10:30:02Z

libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py

+        if data.global_key:
+            data_row["globalKey"] = data.global_key
+        elif data.uid:
+            data_row["id"] = data.uid


Empty dataRow when only external_id is provided

Low Severity

NDVideoText.from_video_text_group builds data_row manually, checking only data.global_key and data.uid. If a user creates a GenericDataRowData with only external_id, data_row stays {}, producing a silently empty "dataRow": {} in the output. All other annotation types use the DataRow class, which has a validator that raises a clear ValueError("Must set either id or global_key") in this situation. This new path silently drops the data row identifier, which could lead to API rejection or annotation loss.
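The difference between failing loudly and emitting an empty dataRow can be illustrated with a small stand-alone sketch. `build_data_row` is a hypothetical helper, not the SDK's DataRow class; it models only the "neither identifier set" case, reuses the error message quoted above, and carries over the global_key-first preference from the manual code in the diff.

```python
from typing import Dict, Optional


def build_data_row(
    uid: Optional[str] = None, global_key: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical stand-in for the DataRow check: reject rows with no identifier."""
    if uid is None and global_key is None:
        # Fail loudly instead of serializing "dataRow": {}.
        raise ValueError("Must set either id or global_key")
    if global_key is not None:
        return {"globalKey": global_key}
    return {"id": uid}


row = build_data_row(global_key="my-video-global-key")

try:
    build_data_row()  # e.g. a row that only had an external_id
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

How the real DataRow behaves when both identifiers are set is not covered here.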

cursor · 2026-02-26T10:30:02Z

libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py

+        elif data.uid:
+            data_row["id"] = data.uid
+        return cls(
+            name=first.name,


NDVideoText drops feature_schema_id and requires name

Medium Severity

NDVideoText declares name: str (required, non-optional) and has no schema_id/feature_schema_id field. FeatureSchema allows name=None when feature_schema_id is set, and all other annotation types go through NDAnnotation which accepts either. With this change, a VideoClassificationAnnotation using only feature_schema_id as its identifier will crash at NDVideoText(name=None, …) with a Pydantic validation error. Even when both are provided, feature_schema_id is silently dropped from the serialized output, unlike the NDText/NDRadio/NDChecklist paths that preserve it as schemaId.

Additional Locations (1)

libs/labelbox/src/labelbox/data/serialization/ndjson/label.py#L166-L169
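
The "either name or feature_schema_id" rule can be sketched without pydantic as follows. `VideoTextRow` is a hypothetical dataclass used only to illustrate the rule; the error message mirrors the one in the autofix diff, and the `name`/`schemaId` output keys follow the serialized examples in this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VideoTextRow:
    """Hypothetical, pydantic-free stand-in for the row identifier rule."""

    name: Optional[str] = None
    schema_id: Optional[str] = None

    def __post_init__(self) -> None:
        # At least one identifier must be present.
        if self.name is None and self.schema_id is None:
            raise ValueError("Schema id or name are not set. Set either one.")

    def to_row(self) -> Dict[str, Any]:
        # Emit only the identifiers that are set, using the NDJSON key names.
        row: Dict[str, Any] = {}
        if self.name is not None:
            row["name"] = self.name
        if self.schema_id is not None:
            row["schemaId"] = self.schema_id
        return row


schema_only = VideoTextRow(schema_id="ckrb1sfjx099a0y914hl319ie").to_row()
both = VideoTextRow(name="free_text", schema_id="ckrb1sfjx099a0y914hl319ie").to_row()
```

With only a schema id set, the resulting row carries `schemaId` and no `name` key, which is the behavior the autofix test for the feature_schema_id-only case asserts.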

Fix VideoClassificationText

fcc1107

paulnoirel temporarily deployed to Test-PyPI February 26, 2026 10:16 with GitHub Actions

paulnoirel marked this pull request as ready for review February 26, 2026 10:20

paulnoirel requested a review from a team as a code owner February 26, 2026 10:20

paulnoirel requested review from KeshavSahoo, RainIwakura, cyrusj89, gmadaan-hue, golchin-shahriar, ramy1951 and vedsirdeshmukh February 26, 2026 10:20

cursor bot reviewed Feb 26, 2026

paulnoirel marked this pull request as draft February 26, 2026 10:30

use DataRow in NDVideoText

210d84e

paulnoirel deployed to Test-PyPI February 26, 2026 10:36 with GitHub Actions

paulnoirel marked this pull request as ready for review February 26, 2026 10:38

paulnoirel wants to merge 2 commits into develop from PLT-3599-Fix-VideoClassificationAnnotation-text-per-frame
