PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation
Fei Xie, Zhongdao Wang, Weijia Zhang, Chao Ma
The official implementation of the paper.
If you have any questions, please don't hesitate to contact me (jaffe031@sjtu.edu.cn).
You can also refer to this GitHub repository: PVMamba.
- 2025.06 PVMamba is accepted by ICCV2025.
- 2025.06 Release the code for image classification.
- 2025.07 Release the logs/configs for image classification.
- Enhance PVMamba with the DCNv4 operator!
- Publish the paper.
Mamba, an architecture with RNN-like sequence modeling based on the State Space Model (SSM), has demonstrated promising long-range modeling capabilities with high efficiency. However, Mamba models struggle to process structured 2D visual data with sequential computation, and thereby lag behind their attention-based counterparts. In this paper, we propose Parallel Vision Mamba (PVMamba), a novel SSM architecture specifically designed for visual data. PVMamba encompasses two key designs: 1) Exploiting the sparsity and adjacency of visual signals, we parallelize the sequential computation with a scheme termed Dynamic State Aggregation (DSA), which comprises three core steps: parallelization, alignment, and aggregation. DSA generates each hidden state of the SSM via feasible spatial aggregation, thereby overcoming the inherent sequential constraint. 2) While maintaining linear computational complexity, we apply a dynamic operator that learns the spatial sampling locations for each hidden state. To further boost local modeling capability, we restrict the dynamic operator to neighboring pixels in the shallow layers. We also devise a layer multiplexing technique to stabilize training and reduce learning redundancy. PVMamba is a versatile backbone network with dynamic operators for various vision tasks, such as image classification and dense prediction.
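The contrast between the sequential SSM recurrence and DSA-style parallel aggregation can be illustrated with a toy sketch. This is illustrative only, not the paper's actual implementation: the neighbor sets and aggregation weights below are hypothetical stand-ins for the learned dynamic operator.

```python
def sequential_ssm(x, a, b):
    """Vanilla SSM recurrence h_t = a * h_{t-1} + b * x_t.
    Each step depends on the previous one, so computation is strictly sequential."""
    h, prev = [], 0.0
    for xt in x:
        prev = a * prev + b * xt
        h.append(prev)
    return h

def aggregated_state(x, weights, neighbors):
    """Toy stand-in for Dynamic State Aggregation: each position forms its
    hidden state in one shot as a weighted sum over sampled spatial neighbors,
    so all positions can be computed independently (i.e., in parallel)."""
    return [sum(w * x[j] for w, j in zip(weights[t], neighbors[t]))
            for t in range(len(x))]
```

In the real model the sampling locations (`neighbors`) and weights are predicted per hidden state by the dynamic operator, and the aggregation runs in parallel across all spatial positions, which is what removes the sequential bottleneck.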
| name | pretrain | resolution | acc@1 | #params | FLOPs | configs/logs/ckpts |
|---|---|---|---|---|---|---|
| PVMamba-Tiny | ImageNet-1K | 224x224 | 83.9 | 24M | 4.5G | BaiduNetDisk/password=ajh1 |
| PVMamba-Small | ImageNet-1K | 224x224 | 84.2 | 40M | 7.4G | BaiduNetDisk/password=ajh1 |
| PVMamba-Base | ImageNet-1K | 224x224 | 84.8 | 89M | 16.1G | BaiduNetDisk/password=ajh1 |
For installation tips, you can also refer to VMamba.
Environment Setup:
We recommend setting up a conda environment and installing dependencies via pip, using the commands below. We recommend PyTorch >= 2.0 and CUDA >= 11.8, though lower versions of PyTorch and CUDA are also supported.
Create and activate a new conda environment:

```bash
conda create -n pvmamba
conda activate pvmamba
```

Install Dependencies:
For the SSM library, please do as follows:

```bash
pip install -r requirements.txt
cd kernels/selective_scan && pip install .
```

For the DCNv4 library, please do as follows:
```bash
cd kernels
unzip DCNv4_op.zip
cd DCNv4_op && pip install .
cd ../..
cp ./dcnv4.py kernels/DCNv4_op/DCNv4/modules/dcnv4.py
```

Dependencies for Detection and Segmentation (optional):
```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```

Classification
To train PVMamba models for classification on ImageNet, use the following commands with the appropriate configuration. Add `--mesa` if you want to use MESA training.
```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```

If you only want to test the performance (together with params and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --pretrained </path/of/checkpoint>
```

To train with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```

If you find this paper useful, please consider citing it. Thanks!
```bibtex
@inproceedings{xie2025pvmamba,
  title={PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation},
  author={Xie, Fei and Wang, Zhongdao and Zhang, Weijia and Ma, Chao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
This project is based on VMamba, VSSD, Mamba2, and DCNv4. Thanks for their great work!
