This repository contains the source code for our paper:
Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching
ICIAP 2025
Alessio Mingozzi, Stefano Mattoccia, Matteo Poggi, Fatma Güney
```bibtex
@inproceedings{mingozzi2025zero,
  title={Zero-Shot Neural Architecture Search for Efficient Deep Stereo Matching},
  author={Mingozzi, Alessio and Mattoccia, Stefano and Poggi, Matteo and Güney, Fatma},
  booktitle={International Conference on Image Analysis and Processing (ICIAP)},
  year={2025}
}
```
The code has been tested with PyTorch 1.13 and CUDA 11.6. Create and activate the environment with:
```bash
conda env create -f environment.yaml
conda activate rszero
```
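To sanity-check the installed versions, a minimal snippet (optional; generic PyTorch, nothing specific to this codebase):

```python
# Optional sanity check of the environment.
import torch

print(torch.__version__)          # expect 1.13.x
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # True if a CUDA GPU is visible
```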
For the Neural Architecture Search process we use the Neural Network Intelligence (NNI) library, version 2.10.1. Install it in the new environment with:
```bash
pip install nni==2.10.1
```
**Important:** NNI is also required for the demo and evaluation scripts, since it is used to load the architecture configuration.
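For reference, loading a fixed architecture from a JSON file in NNI 2.x looks roughly like the sketch below. This illustrates NNI's `fixed_arch` API, not the exact code used in this repository, and `build_model` is a hypothetical stand-in:

```python
# Sketch: NNI 2.x instantiates a model from a saved architecture JSON.
# `build_model` is a hypothetical stand-in for this repo's searchable model.
from nni.retiarii import fixed_arch

with fixed_arch("arch.json"):  # JSON produced by the search (see below)
    model = build_model()      # choices in the model space get fixed to the JSON values
```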
For dataset setup we refer to the original RAFT-Stereo paper. The setup includes all datasets mentioned in the paper, plus two new datasets that extend the original configuration.
To evaluate/train the model, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving & Monkaa)
- Middlebury
- ETH3D
- KITTI
- NeRF Stereo [new]
- CREStereo [new]
To download the ETH3D and Middlebury test datasets for the demos, run:
```bash
bash download_datasets.sh
```
By default, `stereo_datasets.py` will search for the datasets in the locations shown below. You can create symbolic links in the `datasets` folder pointing to wherever the datasets were actually downloaded; a sketch follows the layout tree.
```
├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_testing
    ├── nerf
        ├── 0000
        ├── ...
        ├── 0269
    ├── CREStereo
        ├── hole
        ├── reflective
        ├── shapenet
        ├── tree
```
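For example, a minimal symlinking sketch (the target paths on the right are hypothetical; adapt them to where your data actually lives):

```python
# Create symlinks under datasets/ pointing at the real download locations.
# All right-hand-side paths below are placeholders.
import os

links = {
    "datasets/FlyingThings3D": "/data/sceneflow/FlyingThings3D",
    "datasets/Middlebury": "/data/middlebury",
    "datasets/ETH3D": "/data/eth3d",
}
os.makedirs("datasets", exist_ok=True)
for link, target in links.items():
    if not os.path.lexists(link):
        os.symlink(target, link)
```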
Pretrained models can be downloaded by running:
```bash
bash download_models.sh
```
You can demo a trained model on pairs of images. To predict stereo for Middlebury, run:
```bash
python demo.py --restore_ckpt models/raftstereozero.pth --corr_implementation alt --mixed_precision -l=datasets/Middlebury/MiddEval3/testF/*/im0.png -r=datasets/Middlebury/MiddEval3/testF/*/im1.png
```
Or for ETH3D:
```bash
python demo.py --restore_ckpt models/raftstereozero.pth -l=datasets/ETH3D/two_view_testing/*/im0.png -r=datasets/ETH3D/two_view_testing/*/im1.png
```
Our fastest model (using the faster `reg_cuda` correlation implementation):
```bash
python demo.py --restore_ckpt models/raftstereozero-realtime.pth --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru --valid_iters 7 --corr_implementation reg_cuda --mixed_precision
```
To save the disparity values as .npy files, run any of the demos with the `--save_numpy` flag.
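A minimal sketch for inspecting one of the saved arrays (the file path is a placeholder; check the demo's output directory for the actual naming scheme):

```python
# Load a disparity map saved with --save_numpy and render it as a color image.
import numpy as np
import matplotlib.pyplot as plt

disp = np.load("output/im0.npy")      # placeholder path
print(disp.shape, disp.min(), disp.max())
plt.imshow(np.abs(disp), cmap="jet")  # abs() in case disparities are stored signed
plt.colorbar(label="disparity [px]")
plt.savefig("disparity_vis.png", bbox_inches="tight")
```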
To evaluate a trained model on a validation set (e.g. Middlebury), run:
```bash
python evaluate_stereo.py --restore_ckpt models/raftstereozero.pth --dataset middlebury_H
```
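To evaluate over several validation sets in one go, a small wrapper like the following can help (dataset tags other than `middlebury_H` are assumptions; check `evaluate_stereo.py` for the names it accepts):

```python
# Hedged convenience wrapper around evaluate_stereo.py.
import subprocess

for ds in ["middlebury_H", "eth3d", "kitti"]:  # tags besides middlebury_H are guesses
    subprocess.run(
        ["python", "evaluate_stereo.py",
         "--restore_ckpt", "models/raftstereozero.pth",
         "--dataset", ds],
        check=True,
    )
```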
Our model is trained on four A100 GPUs using the following command. Training logs will be written to `runs/`, which can be visualized with TensorBoard (e.g. `tensorboard --logdir runs`).
```bash
python train.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision
```
To train using significantly less memory, change `--n_downsample 2` to `--n_downsample 3`. This will slightly reduce accuracy.
To train the fast model, use the same configuration as above with the following changes:
```bash
python train.py --batch_size 8 --train_iters 8 --valid_iters 10 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 3 --num_steps 200000 --mixed_precision --shared_backbone --n_gru_layers 2 --slow_fast_gru
```
The architecture search procedure can be executed with:
```bash
python architecture_search.py --experiment_duration 24h --trial_concurrency 1 --train_datasets nerf_S --batch_size 1 --num_steps 10000 --train_iters 22 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2
```
Once the search is completed, the experiment data can be extracted with:
```bash
sh extract_experiment.sh <experiment_id>
```
The resulting CSV file can be processed with pandas to find the best-performing architectures; a sketch follows below. To extract an architecture from the CSV, save its `mutation` field to a file called `<arch_name>.json`.
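A minimal pandas sketch under stated assumptions: the CSV filename and the metric column name are guesses (inspect the extracted file's header); only the `mutation` field is named by the extraction step above:

```python
# Rank searched architectures and dump the best mutation to a JSON file.
import json
import pandas as pd

df = pd.read_csv("experiment.csv")       # placeholder: use the file from extract_experiment.sh
best = df.sort_values("metric").iloc[0]  # "metric" is a guess; assumes lower is better

mutation = best["mutation"]              # the field to save as <arch_name>.json
if not isinstance(mutation, str):        # serialize if pandas parsed it as an object
    mutation = json.dumps(mutation)
with open("best_arch.json", "w") as f:
    f.write(mutation)
```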
