Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Empty dataRow when only external_id is provided
  - NDVideoText now builds data_row via DataRow(id=data.uid, global_key=data.global_key), so an external_id-only row raises the same clear validation error instead of serializing an empty dataRow.
- ✅ Fixed: NDVideoText drops feature_schema_id and requires name
  - NDVideoText now supports optional name, preserves feature_schema_id as schemaId, and validates that at least one identifier is set to match other classification serializers.
Or push these changes by commenting:
@cursor push d470be00b2
Preview (d470be00b2)
diff --git a/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py b/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
--- a/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
+++ b/libs/labelbox/src/labelbox/data/serialization/ndjson/classification.py
@@ -223,10 +223,21 @@
     {"name": "...", "answer": [{"value": "text", "frames": [{"start": 1, "end": 5}]}], ...}
     """
-    name: str
+    name: Optional[str] = None
+    schema_id: Optional[Cuid] = Field(
+        default=None, serialization_alias="schemaId"
+    )
     answer: List[NDVideoTextAnswer]
-    dataRow: Dict[str, str]
+    data_row: DataRow = Field(serialization_alias="dataRow")
+    model_config = ConfigDict(populate_by_name=True)
+
+    @model_validator(mode="after")
+    def must_set_one(self):
+        if self.schema_id is None and self.name is None:
+            raise ValueError("Schema id or name are not set. Set either one.")
+        return self
+
     @classmethod
     def from_video_text_group(
         cls,
@@ -235,14 +246,10 @@
         data: "GenericDataRowData",
     ) -> "NDVideoText":
         first = annotation_group[0]
-        data_row = {}
-        if data.global_key:
-            data_row["globalKey"] = data.global_key
-        elif data.uid:
-            data_row["id"] = data.uid
         return cls(
             name=first.name,
-            dataRow=data_row,
+            schema_id=first.feature_schema_id,
+            data_row=DataRow(id=data.uid, global_key=data.global_key),
             answer=[
                 NDVideoTextAnswer(value=text_val, frames=ranges)
                 for text_val, ranges in frame_ranges_by_text.items()
diff --git a/libs/labelbox/tests/data/serialization/ndjson/test_video.py b/libs/labelbox/tests/data/serialization/ndjson/test_video.py
--- a/libs/labelbox/tests/data/serialization/ndjson/test_video.py
+++ b/libs/labelbox/tests/data/serialization/ndjson/test_video.py
@@ -1,4 +1,5 @@
 import json
+import pytest
 from labelbox.data.annotation_types.classification.classification import (
     Checklist,
     ClassificationAnnotation,
@@ -722,6 +723,59 @@
     assert answer[0]["frames"] == [{"start": 9, "end": 15}]
 
 
+def test_video_classification_text_with_external_id_raises():
+    label = Label(
+        data=GenericDataRowData(external_id="sample-video-external-id"),
+        annotations=[
+            VideoClassificationAnnotation(
+                name="free_text",
+                frame=9,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            )
+        ],
+    )
+
+    with pytest.raises(ValueError, match="Must set either id or global_key"):
+        list(NDJsonConverter.serialize([label]))
+
+
+def test_video_classification_text_with_feature_schema_id_only():
+    label = Label(
+        data=GenericDataRowData(global_key="sample-video-schema-id-only"),
+        annotations=[
+            VideoClassificationAnnotation(
+                feature_schema_id="ckrb1sfjx099a0y914hl319ie",
+                frame=9,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            ),
+            VideoClassificationAnnotation(
+                feature_schema_id="ckrb1sfjx099a0y914hl319ie",
+                frame=15,
+                segment_index=0,
+                value=Text(answer="sample text"),
+            ),
+        ],
+    )
+
+    serialized = list(NDJsonConverter.serialize([label]))
+    free_text_rows = [
+        r
+        for r in serialized
+        if r.get("schemaId") == "ckrb1sfjx099a0y914hl319ie"
+    ]
+    assert len(free_text_rows) == 1
+
+    row = free_text_rows[0]
+    assert row["schemaId"] == "ckrb1sfjx099a0y914hl319ie"
+    assert "name" not in row
+    assert row["dataRow"] == {"globalKey": "sample-video-schema-id-only"}
+    assert row["answer"] == [
+        {"value": "sample text", "frames": [{"start": 9, "end": 15}]}
+    ]
+
+
 def test_video_classification_nesting_bbox():
     bbox_annotation = [
         VideoObjectAnnotation(

        if data.global_key:
            data_row["globalKey"] = data.global_key
        elif data.uid:
            data_row["id"] = data.uid
Empty dataRow when only external_id is provided
Low Severity
NDVideoText.from_video_text_group builds data_row manually checking only data.global_key and data.uid. If a user creates a GenericDataRowData with only external_id, data_row stays {}, producing a silently empty "dataRow": {} in the output. All other annotation types use the DataRow class, which has a validator that raises a clear ValueError("Must set either id or global_key") in this situation. This new path silently drops the data row identifier, which could lead to API rejection or annotation loss.
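To make the failure mode concrete, here is a minimal, stdlib-only sketch. `build_data_row_manually` mirrors the hand-rolled branch quoted above, and `DataRowSketch` is a hypothetical stand-in for a validating DataRow model. It is not labelbox's actual class (the real one is a Pydantic model). Only the error message is taken from this comment; emitting both keys when both are set is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def build_data_row_manually(
    uid: Optional[str], global_key: Optional[str]
) -> Dict[str, str]:
    """Mirror of the hand-rolled branch: returns {} when neither value is set."""
    data_row: Dict[str, str] = {}
    if global_key:
        data_row["globalKey"] = global_key
    elif uid:
        data_row["id"] = uid
    return data_row


@dataclass
class DataRowSketch:
    """Hypothetical stand-in for a validating DataRow model."""

    id: Optional[str] = None
    global_key: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail loudly instead of emitting an empty identifier.
        if self.id is None and self.global_key is None:
            raise ValueError("Must set either id or global_key")

    def to_dict(self) -> Dict[str, str]:
        # Assumption: emit whichever identifiers are present.
        out: Dict[str, str] = {}
        if self.id is not None:
            out["id"] = self.id
        if self.global_key is not None:
            out["globalKey"] = self.global_key
        return out
```

With only an external_id available, the manual path yields an empty dict, while the validating stand-in raises at construction time, which is the behavior the review asks for.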
        elif data.uid:
            data_row["id"] = data.uid
        return cls(
            name=first.name,
NDVideoText drops feature_schema_id and requires name
Medium Severity
NDVideoText declares name: str (required, non-optional) and has no schema_id/feature_schema_id field. FeatureSchema allows name=None when feature_schema_id is set, and all other annotation types go through NDAnnotation which accepts either. With this change, a VideoClassificationAnnotation using only feature_schema_id as its identifier will crash at NDVideoText(name=None, …) with a Pydantic validation error. Even when both are provided, feature_schema_id is silently dropped from the serialized output, unlike the NDText/NDRadio/NDChecklist paths that preserve it as schemaId.
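The rule being asked for (accept either identifier, require at least one, and keep the schema id in the output) can be sketched without Pydantic. `VideoTextRowSketch` and `to_ndjson_dict` are hypothetical names used only for illustration. The error message and the `schemaId` key come from the autofix diff in this thread; everything else is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class VideoTextRowSketch:
    """Hypothetical stand-in for a serializer row with two optional identifiers."""

    answer: List[Dict[str, Any]]
    name: Optional[str] = None
    schema_id: Optional[str] = None

    def __post_init__(self) -> None:
        # At least one of the two identifiers must be present.
        if self.name is None and self.schema_id is None:
            raise ValueError("Schema id or name are not set. Set either one.")

    def to_ndjson_dict(self) -> Dict[str, Any]:
        # Omit identifiers that were not provided; expose schema_id as schemaId.
        row: Dict[str, Any] = {"answer": self.answer}
        if self.name is not None:
            row["name"] = self.name
        if self.schema_id is not None:
            row["schemaId"] = self.schema_id
        return row
```

A row built with only `schema_id` serializes with `schemaId` and no `name` key, and a row with neither identifier is rejected up front.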



Description
Summary
Fixes NDJSON serialization for VideoClassificationAnnotation with Text values. Previously, frame information and secondary text segments were silently dropped -- only the first annotation's text was emitted as a plain string with no frames. After this fix, the output includes per-segment text values with their frame ranges, matching the format produced by TemporalClassificationText.
Problem
Before -- frames lost, second text segment dropped:
{
  "name": "free_text_per_frame",
  "answer": "sample text 1",
  "uuid": "73a61cfa-...",
  "dataRow": {"globalKey": "my-video-global-key"}
}
After -- all segments and frame ranges preserved:
{
  "name": "free_text_per_frame",
  "answer": [
    {"value": "sample text 1", "frames": [{"start": 9, "end": 15}]},
    {"value": "sample text 2", "frames": [{"start": 40, "end": 50}]}
  ],
  "dataRow": {"globalKey": "my-video-global-key"}
}
Changes
- classification.py: Added NDVideoText and NDVideoTextAnswer classes (standalone BaseModel, no dependency on temporal pipeline).
- label.py: Added a Text branch in _create_video_annotations that groups annotations by text value, computes segment-aware frame ranges, and yields NDVideoText.
- test_video.py: Added two tests covering multi-text and single-text scenarios.
Scope
- VideoClassificationAnnotation + Text. Video Radio, Checklist, and object annotations are unchanged.
- VideoClassificationAnnotation(value=Text(...)).
Fixes # (issue)
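The grouping step described under Changes is not shown in this thread, so the following is only a rough sketch of one way it could work, assuming each segment collapses to a single range from its lowest to its highest frame. The tuple layout and the function names `frame_ranges_by_text` and `to_answer` are hypothetical and do not reflect the actual label.py code.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical flat input: (text value, frame, segment_index)
Annotation = Tuple[str, int, int]


def frame_ranges_by_text(
    annotations: List[Annotation],
) -> Dict[str, List[Dict[str, int]]]:
    """Group frames by (text, segment) and collapse each group to one range."""
    frames_by_key: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for text, frame, segment_index in annotations:
        frames_by_key[(text, segment_index)].append(frame)

    ranges: Dict[str, List[Dict[str, int]]] = {}
    # Visit groups in order of their first frame so the output is chronological.
    for (text, _segment), frames in sorted(
        frames_by_key.items(), key=lambda item: min(item[1])
    ):
        ranges.setdefault(text, []).append(
            {"start": min(frames), "end": max(frames)}
        )
    return ranges


def to_answer(ranges: Dict[str, List[Dict[str, int]]]) -> List[Dict[str, Any]]:
    """Shape the grouped ranges like the 'answer' list in the After example."""
    return [{"value": text, "frames": frames} for text, frames in ranges.items()]
```

Feeding in frames 9 and 15 for "sample text 1" (segment 0) and frames 40 and 50 for "sample text 2" (segment 1, an assumed index) reproduces the two answer entries from the After example.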
Note
Medium Risk
Changes the emitted NDJSON shape for video Text classifications from a single string to a structured list with frame ranges, which may impact downstream parsers expecting the old format. Scope is limited to VideoClassificationAnnotation + Text and is covered by new tests.
Overview
Fixes video free-text classification export so NDJSON no longer drops frame information or later text segments.
VideoClassificationAnnotation with Text is now serialized as a single row whose answer is a list of {value, frames} entries (computed by grouping annotations by text value and segment-aware frame ranges) via a new NDVideoText model, with tests added for multi-text and single-text cases.
Written by Cursor Bugbot for commit fcc1107. This will update automatically on new commits.
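For consumers affected by the shape change, one option is to normalize both shapes on read. The helper below is a hypothetical consumer-side sketch and is not part of labelbox. It assumes the old shape is a plain string in `answer` and the new shape is a list of {value, frames} entries, as in the Before and After examples in the description.

```python
from typing import Any, Dict, List


def normalize_text_answer(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the answer as a list of {value, frames} entries for either shape."""
    answer = row["answer"]
    if isinstance(answer, str):
        # Old shape: a single string with no frame information.
        return [{"value": answer, "frames": []}]
    # New shape: copy each entry, tolerating a missing frames key.
    return [
        {"value": entry["value"], "frames": list(entry.get("frames", []))}
        for entry in answer
    ]
```

A parser written this way keeps working on rows exported before the change while picking up frame ranges from rows exported after it.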