fix(vace): resample vace_input_masks to match target_num_frames on first chunk #722
livepeer-tessa wants to merge 3 commits into main
…ly outputs
Signed-off-by: Rafal Leszko <rafal@livepeer.org>
…eprocessVideoBlock
On the first chunk (current_start_frame == 0), target_num_frames is
num_frame_per_block * vae_temporal_downsample_factor + 1 (e.g. 13 with the
default config). PreprocessVideoBlock already resamples 'video' and
'vace_input_frames' to this count, but 'vace_input_masks' was never
adjusted. When masks arrive from a queue or client parameter they have
the base chunk size (e.g. 12 frames), causing VaceEncodingBlock to
raise:
ValueError: vace_input_masks shape mismatch: expected [B, 1, 13, ...] got [B, 1, 12, ...]
Fix: add vace_input_masks to PreprocessVideoBlock inputs/outputs and
resample its temporal dimension to target_num_frames whenever it does
not already match, using the same linear-interpolation index strategy
used for video/vace_input_frames.
Fixes #721
Signed-off-by: livepeer-robot <robot@livepeer.org>
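As a rough illustration of the frame-count arithmetic and the resampling described in this commit, here is a stdlib-only sketch. It assumes `num_frame_per_block = 3` and `vae_temporal_downsample_factor = 4`, chosen only because they reproduce the 12/13 frame counts quoted above; the real defaults are not stated in this PR. It also assumes the "linear-interpolation index strategy" means selecting evenly spaced source frames with rounded indices. The helper names (`first_chunk_target_frames`, `resample_indices`, `resample_temporal`) are hypothetical, not the pipeline's API; in the actual block the same indices would presumably be applied to the time axis (dim 2) of the `[B, 1, T, H, W]` mask tensor.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def first_chunk_target_frames(num_frame_per_block: int, vae_temporal_downsample_factor: int) -> int:
    # First chunk (current_start_frame == 0): base chunk size plus one frame.
    return num_frame_per_block * vae_temporal_downsample_factor + 1


def resample_indices(src_len: int, dst_len: int) -> List[int]:
    """Return dst_len evenly spaced indices into range(src_len).

    Integer form of round(i * (src_len - 1) / (dst_len - 1)) with ties rounded
    up, so the result does not depend on floating-point error.
    """
    if src_len < 1 or dst_len < 1:
        raise ValueError("src_len and dst_len must be positive")
    if dst_len == 1:
        return [0]
    num, den = src_len - 1, dst_len - 1
    return [(2 * i * num + den) // (2 * den) for i in range(dst_len)]


def resample_temporal(frames: Sequence[T], target: int) -> List[T]:
    """Resample a frame sequence to `target` entries; no-op if it already matches."""
    if len(frames) == target:
        return list(frames)
    return [frames[i] for i in resample_indices(len(frames), target)]


# Hypothetical values: 3 * 4 = 12 (base chunk) and 3 * 4 + 1 = 13 (first chunk).
target = first_chunk_target_frames(num_frame_per_block=3, vae_temporal_downsample_factor=4)
mask_frames = list(range(12))  # stand-ins for 12 per-frame masks
print(target)                                  # 13
print(resample_temporal(mask_frames, target))  # [0, 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
```

Upsampling from 12 to 13 frames duplicates one source frame rather than blending neighbours, which keeps mask values unchanged (binary masks stay binary). Whether the PR's code uses exactly this rounding is not stated.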
🚀 fal.ai Preview Deployment: E2E tests will run automatically against this deployment.
✅ E2E Tests passed
Signed-off-by: livepeer-robot <robot@livepeer.org>
Problem
Fixes #721.
On the first chunk (`current_start_frame == 0`), `PreprocessVideoBlock` sets `target_num_frames = num_frame_per_block * vae_temporal_downsample_factor + 1` (e.g. 13 with defaults). It correctly resamples both `video` and `vace_input_frames` to this count, but `vace_input_masks` was never adjusted. When masks arrive from a queue or a client parameter they have the base chunk size (e.g. 12 frames), so `VaceEncodingBlock._encode_with_conditioning` raises:

`ValueError: vace_input_masks shape mismatch: expected [B, 1, 13, ...] got [B, 1, 12, ...]`

Fix
- Add `vace_input_masks` to `PreprocessVideoBlock` `inputs` and `intermediate_outputs`
- In `__call__`, resample the temporal dimension of `vace_input_masks` to `target_num_frames` whenever it does not already match, using the same linear-interpolation index strategy already used for `video`/`vace_input_frames`

Testing
Repro: any workflow that passes `vace_input_masks` to the longlive pipeline on its first chunk (e.g. pixel-art-preserved-background). After this fix, the first chunk processes without shape errors.
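The repro exercises the whole pipeline. A narrower regression check is also possible; below is a stdlib-only sketch of one. `check_mask_frames` is a hypothetical stand-in for the shape check in `VaceEncodingBlock._encode_with_conditioning`, masks are modelled as nested lists shaped `[B, 1, T, H, W]`, and the rounded, evenly spaced index selection is an assumption about the resampling strategy, not the PR's actual code.

```python
from typing import List


def resample_indices(src_len: int, dst_len: int) -> List[int]:
    # Assumed strategy: evenly spaced source indices, ties rounded up (integer math).
    if dst_len == 1:
        return [0]
    return [(2 * i * (src_len - 1) + (dst_len - 1)) // (2 * (dst_len - 1)) for i in range(dst_len)]


def resample_mask_time(mask: list, target: int) -> list:
    """Resample dim 2 (time) of a nested-list mask shaped [B, 1, T, H, W]."""
    out = []
    for batch in mask:
        channels = []
        for channel in batch:
            if len(channel) != target:  # only touch masks that do not already match
                channel = [channel[i] for i in resample_indices(len(channel), target)]
            channels.append(channel)
        out.append(channels)
    return out


def check_mask_frames(mask: list, expected: int) -> None:
    # Hypothetical stand-in for the shape check that raised the error in #721.
    got = len(mask[0][0])
    if got != expected:
        raise ValueError(f"vace_input_masks shape mismatch: expected {expected} frames, got {got}")


def make_mask(b: int, t: int, h: int, w: int) -> list:
    # Frame k is filled with the value k so the resampled order is easy to inspect.
    return [[[[[float(k)] * w for _ in range(h)] for k in range(t)]] for _ in range(b)]


mask = make_mask(b=2, t=12, h=4, w=4)
try:
    check_mask_frames(mask, 13)
except ValueError as e:
    print("before fix:", e)  # before fix: vace_input_masks shape mismatch: expected 13 frames, got 12

fixed = resample_mask_time(mask, 13)
check_mask_frames(fixed, 13)
print("after fix:", len(fixed[0][0]), "frames")  # after fix: 13 frames
```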