StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation (MLSys 2026)

Tianrui Feng¹, Zhi Li², Shuo Yang², Haocheng Xi², Muyang Li³, Xiuyu Li¹, Lvmin Zhang⁴, Keting Yang⁵, Kelly Peng⁶, Song Han⁷, Maneesh Agrawala⁴, Kurt Keutzer², Akio Kodaira⁸, Chenfeng Xu¹,†

¹UT Austin, ²UC Berkeley, ³Nunchaku AI, ⁴Stanford University, ⁵Independent Researcher, ⁶First Intelligence, ⁷MIT, ⁸Shizhuku AI

† Project lead. Correspondence: xuchenfeng@utexas.edu

Overview

StreamDiffusionV2 is an open-source interactive diffusion pipeline for real-time streaming applications. It scales across diverse GPU setups, supports flexible denoising steps, and delivers high FPS for creators and platforms. Further details are available on our project homepage.

News

[2026-01-26] 🎉 StreamDiffusionV2 has been accepted to MLSys 2026!
[2025-11-10] 🚀 We have released our paper on arXiv. See it for more details!
[2025-10-18] Released our model checkpoints on Hugging Face.
[2025-10-06] 🔥 StreamDiffusionV2 is publicly released! See our project homepage for more details.

Prerequisites

OS: Linux with NVIDIA GPU
CUDA-compatible GPU and drivers

Installation

conda create -n stream python=3.10.0
conda activate stream
# Requires CUDA 12.4 or above; check with `nvcc -V`
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
# Optional but recommended for better throughput
# The project will fall back to PyTorch attention if FlashAttention is unavailable
pip install flash_attn==2.7.4.post1 --no-build-isolation
python setup.py develop
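
The CUDA 12.4 requirement above can be checked before installing. The following is a minimal sketch, not part of this repository: `cuda_release_ok` is a hypothetical helper name, and it assumes `nvcc -V` prints its usual `release X.Y` line.

```shell
# Hypothetical helper (not shipped with StreamDiffusionV2).
# Reads `nvcc -V` text on stdin and succeeds if the reported release is >= 12.4.
cuda_release_ok() {
  local release major minor
  # nvcc prints a line such as: "Cuda compilation tools, release 12.4, V12.4.131"
  release=$(sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
  # No "release X.Y" found: report a distinct error code.
  [ -n "$release" ] || return 2
  major=${release%%.*}
  minor=${release#*.}
  # Accept any major version above 12, or 12.x with x >= 4.
  [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 4 ]; }
}
```

Typical use would be `nvcc -V | cuda_release_ok && echo "CUDA toolkit is new enough"`.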

Download Checkpoints

# 1.3B Model
huggingface-cli download --resume-download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
huggingface-cli download --resume-download jerryfeng/StreamDiffusionV2 --local-dir ./ckpts --include "wan_causal_dmd_v2v/*"

# 14B Model
huggingface-cli download --resume-download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
huggingface-cli download --resume-download jerryfeng/StreamDiffusionV2 --local-dir ./ckpts --include "wan_causal_dmd_v2v_14b/*"

We use the 14B model from CausVid-Plus for the offline inference demo.

Offline Inference

All offline inference entrypoints are unified under run_v2v.sh.

Choose one mode first:

single: single-GPU streaming inference
single-wo: single-GPU inference without Stream-batch
pipe: multi-GPU pipeline inference

Quick start:

./run_v2v.sh single
./run_v2v.sh single-wo
./run_v2v.sh pipe
./run_v2v.sh pipe --profile

Use --profile only when you want synchronized throughput measurements.

The legacy wrappers v2v.sh, v2v_wo.sh, and pipe_v2v.sh still work, but they now forward to the same shared entrypoint.

Common Arguments

The most important options are:

--config_path: model config YAML
--checkpoint_folder: checkpoint directory
--video_path: input video
--prompt_file_path: prompt text file
--output_folder: output directory
--height and --width: output resolution
--fps: target output FPS
--step: number of denoising steps used during inference

You can pass overrides either as CLI flags or as environment variables. For example:

OUTPUT_FOLDER=outputs/run_single ./run_v2v.sh single
VIDEO_PATH=examples/original.mp4 PROMPT_FILE_PATH=examples/prompt.txt ./run_v2v.sh single-wo
NPROC_PER_NODE=2 MASTER_PORT=29511 ./run_v2v.sh pipe
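
For readers unfamiliar with this pattern, the sketch below shows one common way a wrapper script can accept the same option from either source. It is illustrative only and is not the contents of run_v2v.sh: the function name `resolve_output_folder`, the fallback value, and the precedence (default, then environment variable, then CLI flag) are all assumptions. Check the script itself for the actual behavior.

```shell
# Illustrative only: NOT taken from run_v2v.sh.
# Assumed precedence: built-in default < environment variable < CLI flag.
resolve_output_folder() {
  # Start from the environment variable, falling back to a placeholder default.
  local value="${OUTPUT_FOLDER:-outputs/}"
  # A later --output_folder flag overrides whatever the environment provided.
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --output_folder)
        [ "$#" -ge 2 ] || { echo "missing value for --output_folder" >&2; return 1; }
        value="$2"
        shift 2
        ;;
      *)
        # Ignore options this sketch does not model.
        shift
        ;;
    esac
  done
  printf '%s\n' "$value"
}
```

Under these assumptions, `OUTPUT_FOLDER=a resolve_output_folder --output_folder b` prints `b`, since the flag is applied last.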

Single GPU

This is the standard offline path when you run on one GPU.

./run_v2v.sh single \
--config_path configs/wan_causal_dmd_v2v.yaml \
--checkpoint_folder ckpts/wan_causal_dmd_v2v \
--output_folder outputs/ \
--prompt_file_path examples/prompt.txt \
--video_path examples/original.mp4 \
--height 480 \
--width 832 \
--fps 16 \
--step 2
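
To compare several `--step` settings, it can help to generate one command per value and give each run its own output folder. The helper below is a hypothetical dry-run sketch: it only prints commands, and `print_step_sweep` and the `outputs/step_<n>` naming are assumptions, not part of the repository. This README does not state which step counts a given checkpoint supports, so the values are left to the caller.

```shell
# Hypothetical dry-run helper: prints one run_v2v.sh command per step count.
# Usage: print_step_sweep <mode> <step> [<step> ...]
print_step_sweep() {
  local mode="$1"
  shift
  local step
  for step in "$@"; do
    # Each run writes to its own folder so results are not overwritten.
    printf './run_v2v.sh %s --step %s --output_folder outputs/step_%s\n' \
      "$mode" "$step" "$step"
  done
}
```

For example, `print_step_sweep single 1 2 4` prints three commands; piping that output to `sh` would execute them in sequence.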

Multi-GPU

Use this mode when you want to split inference across multiple GPUs.

./run_v2v.sh pipe \
--config_path configs/wan_causal_dmd_v2v.yaml \
--checkpoint_folder ckpts/wan_causal_dmd_v2v \
--output_folder outputs/ \
--prompt_file_path examples/prompt.txt \
--video_path examples/original.mp4 \
--height 480 \
--width 832 \
--fps 16 \
--step 2
# --schedule_block  # optional: enable block scheduling

Notes:

--schedule_block is optional and can improve throughput on some multi-GPU setups.
Adjust NPROC_PER_NODE, --height, --width, and --fps to match your hardware and target workload.
./run_v2v.sh pipe --profile is intended for profiling runs, not normal benchmarking or deployment.
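
When matching NPROC_PER_NODE to the hardware, one starting point is the number of GPUs the driver reports. The sketch below assumes `nvidia-smi -L` prints one `GPU <n>: ...` line per device; `count_gpus` is a hypothetical helper, and using every visible GPU is an assumption that may not suit every setup.

```shell
# Hypothetical helper: counts devices in `nvidia-smi -L` text read from stdin.
# Each physical GPU appears as a line starting with "GPU <index>:".
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}

# Example (assumes every visible GPU should join the pipeline):
#   NPROC_PER_NODE=$(nvidia-smi -L | count_gpus) ./run_v2v.sh pipe
```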

Online Inference (Web UI)

A minimal web demo is available under demo/. For setup and startup instructions, see the demo/ directory.

Access in a browser after startup: http://0.0.0.0:7860 or http://localhost:7860

To-do List

Acknowledgements

StreamDiffusionV2 is inspired by the prior works StreamDiffusion and StreamV2V. Our Causal DiT builds upon CausVid, and the rolling KV cache design is inspired by Self-Forcing.

We are grateful to the StreamDiffusion team members for their support. We also thank First Intelligence and the Daydream team for their great feedback.

We especially thank the Daydream team for the great collaboration and for incorporating our StreamDiffusionV2 pipeline into their demo UI.

Citation

If you find this repository useful in your research, please consider giving a star ⭐ or a citation.

@article{feng2025streamdiffusionv2,
  title={StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation},
  author={Feng, Tianrui and Li, Zhi and Yang, Shuo and Xi, Haocheng and Li, Muyang and Li, Xiuyu and Zhang, Lvmin and Yang, Keting and Peng, Kelly and Han, Song and others},
  journal={arXiv preprint arXiv:2511.07399},
  year={2025}
}
