PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation
Fei Xie, Zhongdao Wang, Weijia Zhang, Chao Ma
The official implementation of the paper.
If you have any questions, please don't hesitate to contact me (jaffe031@sjtu.edu.cn).
You can also refer to this GitHub repository: PVMamba.
- 2025.06 PVMamba is accepted by ICCV2025.
- 2025.06 Release the code for image classification.
- 2025.07 Release the logs/configs for image classification.
- Enhance PVMamba with the DCNv4 operator!
- Publish the paper.
Mamba, an architecture with RNN-like sequence modeling based on the State Space Model (SSM), has demonstrated promising long-range modeling capabilities with high efficiency. However, Mamba models struggle to process structured 2D visual data with sequential computation, and thereby lag behind their attention-based counterparts. In this paper, we propose Parallel Vision Mamba (PVMamba), a novel SSM architecture specifically designed for visual data. PVMamba encompasses two key designs: 1) Exploiting the sparsity and adjacency of visual signals, we parallelize the sequential computation with a scheme termed Dynamic State Aggregation (DSA), which comprises three core steps: parallelization, alignment, and aggregation. DSA generates each hidden state of the SSM via feasible spatial aggregation, thereby overcoming the inherent sequential constraint. 2) While maintaining linear computational complexity, we apply a dynamic operator that learns the spatial sampling locations for each hidden state. To further boost local modeling capability, we restrict the dynamic operator to neighboring pixels in the shallow layers. We also devise a layer multiplexing technique to stabilize training and reduce learning redundancy. PVMamba is a versatile backbone network with dynamic operators for various vision tasks, such as image classification and dense prediction.
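The contrast between the sequential SSM recurrence and DSA-style parallel aggregation can be illustrated with a toy sketch. This is illustrative only, not the paper's actual implementation: the neighbor sets and aggregation weights below are hypothetical stand-ins for the learned dynamic operator.

```python
def sequential_ssm(x, a, b):
    """Vanilla SSM recurrence h_t = a * h_{t-1} + b * x_t.
    Each step depends on the previous one, so computation is strictly sequential."""
    h, prev = [], 0.0
    for xt in x:
        prev = a * prev + b * xt
        h.append(prev)
    return h

def aggregated_state(x, weights, neighbors):
    """Toy stand-in for Dynamic State Aggregation: each position forms its
    hidden state in one shot as a weighted sum over sampled spatial neighbors,
    so all positions can be computed independently (i.e., in parallel)."""
    return [sum(w * x[j] for w, j in zip(weights[t], neighbors[t]))
            for t in range(len(x))]
```

In the real model the sampling locations (`neighbors`) and weights are predicted per hidden state by the dynamic operator, and the aggregation runs in parallel across all spatial positions, which is what removes the sequential bottleneck.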
| name | pretrain | resolution | acc@1 | #params | FLOPs | configs/logs/ckpts |
|---|---|---|---|---|---|---|
| PVMamba-Tiny | ImageNet-1K | 224x224 | 83.9 | 24M | 4.5G | BaiduNetDisk/password=ajh1 |
| PVMamba-Small | ImageNet-1K | 224x224 | 84.2 | 40M | 7.4G | BaiduNetDisk/password=ajh1 |
| PVMamba-Base | ImageNet-1K | 224x224 | 84.8 | 89M | 16.1G | BaiduNetDisk/password=ajh1 |
For installation tips, you can also refer to VMamba.
Environment Setup:
We recommend setting up a conda environment and installing dependencies via pip, using the commands below. We recommend PyTorch >= 2.0 and CUDA >= 11.8, though lower versions of PyTorch and CUDA are also supported.
Create and activate a new conda environment:

```bash
conda create -n pvmamba
conda activate pvmamba
```

Install Dependencies:
For the SSM library, please do as follows:

```bash
pip install -r requirements.txt
cd kernels/selective_scan && pip install .
```

For the DCNv4 library, please do as follows:
```bash
cd kernels
unzip DCNv4_op.zip
cd DCNv4_op && pip install .
cd ../..
cp ./dcnv4.py kernels/DCNv4_op/DCNv4/modules/dcnv4.py
```

Dependencies for Detection and Segmentation (optional):
```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```

Classification
To train PVMamba models for classification on ImageNet, use the following commands with the appropriate configuration. Add `--mesa` if you want to use MESA training.
```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```

If you only want to test the performance (together with params and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --pretrained </path/of/checkpoint>
```

To train with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```

If you find this paper useful, please consider citing it. Thanks!
```bibtex
@inproceedings{xie2025pvmamba,
  title={PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation},
  author={Xie, Fei and Wang, Zhongdao and Zhang, Weijia and Ma, Chao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
This project is based on VMamba, VSSD, Mamba2, and DCNv4. Thanks for their great work!
