This repository contains the source code for our paper:
Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching
ICIAP 2025
Alessio Mingozzi, Stefano Mattoccia, Matteo Poggi, Fatma Güney
```bibtex
@inproceedings{mingozzi2025zero,
  title={Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching},
  author={Mingozzi, Alessio and Mattoccia, Stefano and Poggi, Matteo and Güney, Fatma},
  booktitle={International Conference on Image Analysis and Processing (ICIAP)},
  year={2025}
}
```
The code has been tested with PyTorch 1.13 and CUDA 11.6. Create and activate the environment with:
```bash
conda env create -f environment.yaml
conda activate rszero
```
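To sanity-check the installed versions, a minimal snippet (optional; generic PyTorch, nothing specific to this codebase):

```python
# Optional sanity check of the environment.
import torch

print(torch.__version__)          # expect 1.13.x
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # True if a CUDA GPU is visible
```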
For the Neural Architecture Search process we use the Neural Network Intelligence (NNI) library, version 2.10.1. Install it in the new environment with:
```bash
pip install nni==2.10.1
```
**Important:** NNI is also required for the demo and evaluation scripts, since it is used to load the architecture configuration.
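For reference, loading a fixed architecture from a JSON file in NNI 2.x looks roughly like the sketch below. This illustrates NNI's `fixed_arch` API, not the exact code used in this repository, and `build_model` is a hypothetical stand-in:

```python
# Sketch: NNI 2.x instantiates a model from a saved architecture JSON.
# `build_model` is a hypothetical stand-in for this repo's searchable model.
from nni.retiarii import fixed_arch

with fixed_arch("arch.json"):  # JSON produced by the search (see below)
    model = build_model()      # choices in the model space get fixed to the JSON values
```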
For dataset setup we refer to the original RAFT-Stereo paper. The setup includes all datasets mentioned in the paper, plus two new datasets that extend the original configuration.
To evaluate/train the model, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving & Monkaa)
- Middlebury
- ETH3D
- KITTI
- NeRF Stereo [new]
- CREStereo [new]
To download the ETH3D and Middlebury test datasets for the demos, run:
```bash
bash download_datasets.sh
```
By default, `stereo_datasets.py` will search for the datasets in the locations shown below. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were actually downloaded; a sketch follows the layout tree.
```
├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_testing
    ├── nerf
        ├── 0000
        ├── ...
        ├── 0269
    ├── CREStereo
        ├── hole
        ├── reflective
        ├── shapenet
        ├── tree
```
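For example, a minimal symlinking sketch (the target paths on the right are hypothetical; adapt them to where your data actually lives):

```python
# Create symlinks under datasets/ pointing at the real download locations.
# All right-hand-side paths below are placeholders.
import os

links = {
    "datasets/FlyingThings3D": "/data/sceneflow/FlyingThings3D",
    "datasets/Middlebury": "/data/middlebury",
    "datasets/ETH3D": "/data/eth3d",
}
os.makedirs("datasets", exist_ok=True)
for link, target in links.items():
    if not os.path.lexists(link):
        os.symlink(target, link)
```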
Pretrained models can be downloaded by running:
```bash
bash download_models.sh
```
You can demo a trained model on pairs of images. To predict stereo for Middlebury, run:
```bash
python demo.py --restore_ckpt models/raftstereozero.pth --corr_implementation alt --mixed_precision -l=datasets/Middlebury/MiddEval3/testF/*/im0.png -r=datasets/Middlebury/MiddEval3/testF/*/im1.png
```
Or for ETH3D:
```bash
python demo.py --restore_ckpt models/raftstereozero.pth -l=datasets/ETH3D/two_view_testing/*/im0.png -r=datasets/ETH3D/two_view_testing/*/im1.png
```
Our fastest model (using the faster `reg_cuda` correlation implementation):
```bash
python demo.py --restore_ckpt models/raftstereozero-realtime.pth --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru --valid_iters 7 --corr_implementation reg_cuda --mixed_precision
```
To save the disparity values as .npy files, run any of the demos with the `--save_numpy` flag.
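A minimal sketch for inspecting one of the saved arrays (the file path is a placeholder; check the demo's output directory for the actual naming scheme):

```python
# Load a disparity map saved with --save_numpy and render it as a color image.
import numpy as np
import matplotlib.pyplot as plt

disp = np.load("output/im0.npy")      # placeholder path
print(disp.shape, disp.min(), disp.max())
plt.imshow(np.abs(disp), cmap="jet")  # abs() in case disparities are stored signed
plt.colorbar(label="disparity [px]")
plt.savefig("disparity_vis.png", bbox_inches="tight")
```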
To evaluate a trained model on a validation set (e.g. Middlebury), run:
```bash
python evaluate_stereo.py --restore_ckpt models/raftstereozero.pth --dataset middlebury_H
```
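To evaluate over several validation sets in one go, a small wrapper like the following can help (dataset tags other than `middlebury_H` are assumptions; check `evaluate_stereo.py` for the names it accepts):

```python
# Hedged convenience wrapper around evaluate_stereo.py.
import subprocess

for ds in ["middlebury_H", "eth3d", "kitti"]:  # tags besides middlebury_H are guesses
    subprocess.run(
        ["python", "evaluate_stereo.py",
         "--restore_ckpt", "models/raftstereozero.pth",
         "--dataset", ds],
        check=True,
    )
```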
Our model is trained on four A100 GPUs using the following command. Training logs will be written to `runs/`, which can be visualized with TensorBoard (e.g. `tensorboard --logdir runs`).
```bash
python train.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision
```
To train using significantly less memory, change `--n_downsample 2` to `--n_downsample 3`. This will slightly reduce accuracy.
To train the fast model, use the same configuration as above with the following changes:
```bash
python train.py --batch_size 8 --train_iters 8 --valid_iters 10 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 3 --num_steps 200000 --mixed_precision --shared_backbone --n_gru_layers 2 --slow_fast_gru
```
The architecture search procedure can be executed with:
```bash
python architecture_search.py --experiment_duration 24h --trial_concurrency 1 --train_datasets nerf_S --batch_size 1 --num_steps 10000 --train_iters 22 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2
```
Once the search is completed, the experiment data can be extracted with:
```bash
sh extract_experiment.sh <experiment_id>
```
The resulting CSV file can be processed with pandas to find the best-performing architectures; a sketch follows below. To extract an architecture from the CSV, save its `mutation` field to a file called `<arch_name>.json`.
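A minimal pandas sketch under stated assumptions: the CSV filename and the metric column name are guesses (inspect the extracted file's header); only the `mutation` field is named by the extraction step above:

```python
# Rank searched architectures and dump the best mutation to a JSON file.
import json
import pandas as pd

df = pd.read_csv("experiment.csv")       # placeholder: use the file from extract_experiment.sh
best = df.sort_values("metric").iloc[0]  # "metric" is a guess; assumes lower is better

mutation = best["mutation"]              # the field to save as <arch_name>.json
if not isinstance(mutation, str):        # serialize if pandas parsed it as an object
    mutation = json.dumps(mutation)
with open("best_arch.json", "w") as f:
    f.write(mutation)
```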
