
Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching

This repository contains the source code for our paper:

Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching
ICIAP 2025
Alessio Mingozzi, Stefano Mattoccia, Matteo Poggi, Fatma Güney

@inproceedings{mingozzi2025zero,
  title={Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching},
  author={Mingozzi, Alessio and Mattoccia, Stefano and Poggi, Matteo and Güney, Fatma},
  booktitle={International Conference on Image Analysis and Processing (ICIAP)},
  year={2025}
}

Architecture Diagram

Requirements

The code has been tested with PyTorch 1.13 and CUDA 11.6.

conda env create -f environment.yaml
conda activate rszero

For the Neural Architecture Search process we use the Neural Network Intelligence (NNI) library, version 2.10.1. Install it in the new environment with:

pip install nni==2.10.1

Important

NNI is also required for the demos and for evaluation, since it is used to load the architecture configuration.

Required Data

For dataset preparation we follow the original RAFT-Stereo paper. The setup includes all datasets mentioned in that paper, plus two additional datasets (nerf and CREStereo) that extend the original configuration.

To evaluate/train the model, you will need to download the required datasets.

To download the ETH3D and Middlebury test datasets for the demos, run

bash download_datasets.sh

By default, stereo_datasets.py searches for the datasets in the locations below. You can create symbolic links in the datasets folder pointing to wherever the datasets were downloaded; a sketch of this step follows the layout below.

├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_testing
    ├── nerf
        ├── 0000
        ├── ...
        ├── 0269
    ├── CREStereo
        ├── hole
        ├── reflective
        ├── shapenet
        ├── tree
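
A minimal sketch of the symlink step in Python (the /data/... source paths below are hypothetical placeholders for wherever you downloaded the datasets):

# Minimal sketch: link already-downloaded datasets into ./datasets.
# The /data/... source paths are hypothetical; adjust them to your setup.
import os

links = {
    "datasets/FlyingThings3D": "/data/FlyingThings3D",
    "datasets/Middlebury": "/data/Middlebury",
    "datasets/ETH3D": "/data/ETH3D",
}

os.makedirs("datasets", exist_ok=True)
for link, target in links.items():
    if not os.path.exists(link):
        os.symlink(target, link)

Equivalent ln -s commands in the shell work just as well.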

Demos

Pretrained models can be downloaded by running

bash download_models.sh

You can demo a trained model on pairs of images. To predict stereo for Middlebury, run

python demo.py --restore_ckpt models/raftstereozero.pth --corr_implementation alt --mixed_precision -l=datasets/Middlebury/MiddEval3/testF/*/im0.png -r=datasets/Middlebury/MiddEval3/testF/*/im1.png

Or for ETH3D:

python demo.py --restore_ckpt models/raftstereozero.pth -l=datasets/ETH3D/two_view_testing/*/im0.png -r=datasets/ETH3D/two_view_testing/*/im1.png

Our fastest model (which uses the faster reg_cuda correlation implementation):

python demo.py --restore_ckpt models/raftstereozero-realtime.pth --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru --valid_iters 7 --corr_implementation reg_cuda --mixed_precision

To save the disparity values as .npy files, run any of the demos with the --save_numpy flag.
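
As a minimal sketch, a saved disparity map can then be loaded and inspected with NumPy (the path below is hypothetical; actual filenames depend on the input images and output directory):

# Minimal sketch: load a disparity map saved via --save_numpy.
# "demo_output/im0.npy" is a hypothetical path, not a guaranteed output name.
import numpy as np

disp = np.load("demo_output/im0.npy")
print(disp.shape, disp.dtype)                       # e.g. (H, W), float32
print("disparity range:", disp.min(), disp.max())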

Evaluation

To evaluate a trained model on a validation set (e.g. Middlebury), run

python evaluate_stereo.py --restore_ckpt models/raftstereozero.pth --dataset middlebury_H

Training

Our model is trained on four A100 GPUs using the following command. Training logs will be written to runs/, which can be visualized with TensorBoard.

python train.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision

To train using significantly less memory, change --n_downsample 2 to --n_downsample 3. This will slightly reduce accuracy.

To train the fast model, use the same configuration as above with the following changes:

python train.py --batch_size 8 --train_iters 8 --valid_iters 10 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 3 --num_steps 200000 --mixed_precision --shared_backbone --n_gru_layers 2 --slow_fast_gru

Architecture Search

The architecture search procedure can be executed with the following command:

python architecture_search.py --experiment_duration 24h --trial_concurrency 1 --train_datasets nerf_S --batch_size 1 --num_steps 10000 --train_iters 22 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2

Once the search is complete, the experiment can be extracted using:

sh extract_experiment.sh <experiment_id>

The resulting csv file can be analyzed with pandas to find the best-performing architectures. To extract an architecture from the csv file, save its mutation field to a file called <arch_name>.json (see the sketch below).
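
A minimal sketch of this post-processing step, assuming the exported csv exposes a metric column and a mutation column (the csv filename and both column names are assumptions, not guaranteed by extract_experiment.sh):

# Minimal sketch: rank searched architectures and save the best one's
# mutation field as JSON. "experiment.csv", "metric" and "mutation" are
# assumptions about the exported file, not guaranteed names.
import pandas as pd

df = pd.read_csv("experiment.csv")
best = df.sort_values("metric").iloc[0]   # assumes lower metric = better
with open("best_arch.json", "w") as f:
    f.write(best["mutation"])             # the mutation field is JSON text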
