This repository contains download links to the code and trained deep stereo models of our works "Active Stereo Without Pattern Projector" (ICCV 2023) and its journal extension "Stereo-Depth Fusion through Virtual Pattern Projection".
by Luca Bartolomei<sup>1,2</sup>, Matteo Poggi<sup>2</sup>, Fabio Tosi<sup>2</sup>, Andrea Conti<sup>2</sup>, and Stefano Mattoccia<sup>1,2</sup>
<sup>1</sup>Advanced Research Center on Electronic Systems (ARCES), <sup>2</sup>University of Bologna
Note: this repository is currently under development. We are actively adding and refining features and documentation; we apologize for any incomplete or missing elements and appreciate your patience as we work towards completion.
This paper proposes a novel framework integrating the principles of active stereo into standard passive camera systems without a physical pattern projector. We virtually project a pattern over the left and right images according to the sparse measurements obtained from a depth sensor.
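As a rough illustration of this idea, the hypothetical sketch below (not the repository's actual code; all names and defaults are illustrative) paints the same random patch on both views at the positions linked by each sparse disparity seed, and alpha-blends it with the original image content:

```python
import numpy as np

def virtually_project(left, right, seeds, wsize=3, blending=0.4, rng=None):
    """Paint consistent random patches on both views at corresponding points.

    left, right: HxWx3 float images in [0, 1]
    seeds: iterable of (y, x, d) sparse disparity measurements for the left view
    """
    rng = np.random.default_rng() if rng is None else rng
    left_vpp, right_vpp = left.copy(), right.copy()
    r = wsize // 2
    H, W = left.shape[:2]
    for y, x, d in seeds:
        y, x = int(y), int(x)
        xr = int(round(x - d))  # corresponding column in the right image
        if not (r <= y < H - r and r <= x < W - r and r <= xr < W - r):
            continue  # skip seeds whose patch would fall outside either image
        # The same random pattern on both views creates an unambiguous matching cue
        patch = rng.random((wsize, wsize) + left.shape[2:])
        for img, cx in ((left_vpp, x), (right_vpp, xr)):
            win = img[y - r:y + r + 1, cx - r:cx + r + 1]
            win[:] = (1 - blending) * win + blending * patch
    return left_vpp, right_vpp
```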
Contributions:

- Even with meager amounts of sparse depth seeds (e.g., 1% of the whole image), our approach outperforms state-of-the-art sensor fusion methods based on handcrafted algorithms and deep networks by a large margin.
- When dealing with deep networks trained on synthetic data, it dramatically improves accuracy and shows a compelling ability to tackle domain-shift issues, even without additional training or fine-tuning.
- By dispensing with a physical pattern projector, our solution works under sunlight, both indoors and outdoors, at long and close range, with no additional processing cost for the original stereo matcher.
Extension Contributions:

- Acting before any processing occurs, it can be seamlessly deployed with any stereo algorithm or deep network without modifications, and it benefits from future progress in the field.
- Moreover, in contrast to active stereo systems, using a depth sensor in place of a pattern projector:
  - is more effective even in the specific application domain of projector-based systems, and potentially less expensive;
  - does not require additional hardware (e.g., extra RGB or IR cameras), as depth estimation is performed in the same target visual spectrum;
  - allows the virtual projection paradigm to be tailored on the fly to the image content, and is agnostic to dynamic objects and ego-motion.
If you find this code useful in your research, please cite:
@InProceedings{Bartolomei_2023_ICCV,
author = {Bartolomei, Luca and Poggi, Matteo and Tosi, Fabio and Conti, Andrea and Mattoccia, Stefano},
title = {Active Stereo Without Pattern Projector},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {18470-18482}
}
@misc{bartolomei2024stereodepth,
title={Stereo-Depth Fusion through Virtual Pattern Projection},
author={Luca Bartolomei and Matteo Poggi and Fabio Tosi and Andrea Conti and Stefano Mattoccia},
year={2024},
eprint={2406.04345},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Here you can download the weights of the RAFT-Stereo and PSMNet architectures.
- Vanilla models: pretrained on SceneFlow and Middlebury vanilla images
  - PSMNet vanilla models: psmnet/sceneflow/psmnet.tar, psmnet/middlebury/psmnet.tar
  - RAFT-Stereo vanilla models (raft-stereo/sceneflow/raftstereo.pth and raft-stereo/middlebury/raftstereo.pth) are copies of the weights from the authors' drive
- Fine-tuned models: starting from the vanilla models, these models (*-vpp-ft.tar) are fine-tuned in the same domain but with virtually projected images
- Models trained from scratch: these models (*-vpp-tr.tar) are trained from scratch on virtually projected images
To use these weights, please follow these steps:
- Install the gdown Python package:
pip install gdown
- Download all weights from our drive:
gdown --folder https://drive.google.com/drive/folders/1GqcY-Z-gtWHqDVMx-31uxrPzprM38UJl?usp=drive_link
The Test section provides scripts to evaluate disparity estimation models on datasets like KITTI, Middlebury, and ETH3D. It helps assess the accuracy of the models and saves predicted disparity maps.
Please refer to each section for detailed instructions on setup and execution.
Warning:
- Please be aware that we will not be releasing the training code for deep stereo models. The provided code focuses on evaluation and demonstration purposes only.
- With the latest updates in PyTorch, slight variations in the quantitative results compared to the numbers reported in the paper may occur.
- Dependencies: ensure that you have installed all the necessary dependencies, listed in the ./requirements.txt file.
- Build rSGM:
  - Firstly, initialize and update the git submodules:
    git submodule init; git submodule update
  - Go to ./thirdparty/stereo-vision/reconstruction/base/rSGM/
  - Build and install the pyrSGM package:
    python setup.py build_ext --inplace install
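As a quick sanity check after the build, you can try importing the extension (assuming it is exposed under the module name pyrSGM; the exact import name is our assumption):

python -c "import pyrSGM"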
We used seven datasets for training and evaluation.
Midd-14: We used the MiddEval3 training split for evaluation and fine-tuning purposes.
$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-data-F.zip
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-GT0-F.zip
$ unzip \*.zip
After that, you will get a data structure as follows:
MiddEval3
├── TrainingF
│   ├── Adirondack
│   │   ├── im0.png
│   │   └── ...
│   ...
│   └── Vintage
│       └── ...
└── TestF
    └── ...
Midd-A: We used the Scenes2014 additional split for evaluation and grid-search purposes.
$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Backpack-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Bicycle1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Cable-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Classroom1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Couch-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Flowers-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Mask-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Shopvac-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sticks-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Storage-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword2-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Umbrella-perfect.zip
$ unzip \*.zip
After that, you will get a data structure as follows:
middlebury2014
├── Backpack-perfect
│   ├── im0.png
│   └── ...
...
└── Umbrella-perfect
    └── ...
Midd-21: We used the Scenes2021 split for evaluation purposes.
$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2021/zip/all.zip
$ unzip all.zip
$ mv data/* .
After that, you will get a data structure as follows:
middlebury2021
├── artroom1
│   ├── im0.png
│   └── ...
...
└── traproom2
    └── ...
Note that additional datasets are available at the official website.
We derived our KITTI142 validation split from KITTI141 by adding frame 000124. You can download it from our drive using this script:
$ cd PATH_TO_DOWNLOAD
$ gdown --fuzzy https://drive.google.com/file/d/1A14EMqcGLDhH3nTHTVFpSP2P7We0SY-C/view?usp=drive_link
$ unzip kitti142.zip
After that, you will get a data structure as follows:
kitti142
├── image_2
│   ├── 000002_10.png
│   ...
│   └── 000199_10.png
├── image_3
│   ├── 000002_10.png
│   ...
│   └── 000199_10.png
├── lidar_disp_2
│   ├── 000002_10.png
│   ...
│   └── 000199_10.png
├── disp_occ
│   ├── 000002_10.png
│   ...
│   └── 000199_10.png
...
Note that additional information is available at the official website.
You can download the ETH3D dataset using this script:
$ cd PATH_TO_DOWNLOAD
$ wget https://www.eth3d.net/data/two_view_training.7z
$ wget https://www.eth3d.net/data/two_view_training_gt.7z
$ p7zip -d *.7z
After that, you will get a data structure as follows:
eth3d
├── delivery_area_1l
│   ├── im0.png
│   └── ...
...
└── terrains_2s
    └── ...
Note that the script erases 7z files. Further details are available at the official website.
We provide preprocessed DSEC testing splits Day, Afternoon and Night:
$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1etkvdntDfMdwvx_NP0_QJcUcsogLXYK7?usp=drive_link
$ cd dsec
$ unzip -o \*.zip
$ cd ..
$ mv dsec/* .
$ rmdir dsec
After that, you will get a data structure as follows:
dsec
├── afternoon
│   ├── left
│   │   ├── 000000.png
│   │   ...
│   └── ...
...
└── night
    └── ...
We managed to extract the splits using only data from the official website. We used FasterLIO to de-skew raw LiDAR scans and Open3D to perform ICP registration.
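For reference, an ICP registration step with Open3D looks roughly like the sketch below (a generic example; the file names and correspondence threshold are illustrative, not the exact parameters we used):

```python
import numpy as np
import open3d as o3d

# Two de-skewed LiDAR scans to align (file names are placeholders)
source = o3d.io.read_point_cloud("scan_t0.pcd")
target = o3d.io.read_point_cloud("scan_t1.pcd")

# Point-to-point ICP from an identity initial guess;
# the 0.5 m correspondence distance is purely illustrative
result = o3d.pipelines.registration.registration_icp(
    source, target, 0.5, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(result.transformation)  # 4x4 rigid transform aligning source onto target
```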
We provide preprocessed M3ED testing splits Outdoor Day, Outdoor Night and Indoor:
$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1n-7H11ZfbPcR9_F0Ri2CcTJS2WWQlfCo?usp=drive_link
$ cd m3ed
$ unzip -o \*.zip
$ cd ..
$ mv m3ed/* .
$ rmdir m3ed
After that, you will get a data structure as follows:
m3ed
├── indoor
│   ├── left
│   │   ├── 000000.png
│   │   ...
│   └── ...
...
└── night
    └── ...
We managed to extract the splits using only data from the official website.
We provide preprocessed M3ED Active testing splits Passive and Active:
$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1fv6f2mQUPW8MwSsGy1f0dEHOZCS4sk2-?usp=drive_link
$ cd m3ed_active
$ unzip -o \*.zip
$ cd ..
$ mv m3ed_active/* .
$ rmdir m3ed_active
After that, you will get a data structure as follows:
m3ed_active
├── passive
│   ├── left
│   │   ├── 000000.png
│   │   ...
│   └── ...
└── active
    └── ...
We managed to extract the splits using only data from the official website.
You can download the SIMSTEREO dataset here.
After that, you will get a data structure as follows:
simstereo
├── test
│   ├── nirColormanaged
│   │   ├── abstract_bowls_1_left.jpg
│   │   ├── abstract_bowls_1_right.jpg
│   │   ...
│   ├── rgbColormanaged
│   │   ├── abstract_bowls_1_left.jpg
│   │   ├── abstract_bowls_1_right.jpg
│   │   ...
│   └── pfmDisp
│       ├── abstract_bowls_1_left.pfm
│       ├── abstract_bowls_1_right.pfm
│       ...
└── training
    └── ...
The test script evaluates disparity maps on various datasets, including KITTI (142 split), Middlebury (Training, Additional, 2021), ETH3D, DSEC, M3ED, and SIMSTEREO, allowing you to assess the accuracy of disparity estimation models.
To run the test.py script with the correct arguments, follow the instructions below:
- Run the test:
  - Open a terminal or command prompt.
  - Navigate to the directory containing the test.py script.
- Execute the command: run the following command, replacing the placeholders with the actual values for your images and model:
# Parameters to reproduce Active Stereo Without Pattern Projector (ICCV 2023)
CUDA_VISIBLE_DEVICES=0 python test.py --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
    --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
    --vpp --outdir <save_dmap_dir> --wsize 3 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
    --maskocc

# Parameters to reproduce Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)
CUDA_VISIBLE_DEVICES=0 python test.py --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
    --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
    --vpp --outdir <save_dmap_dir> --wsize 7 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
    --maskocc --bilateralpatch --bilateral_spatial_variance 1 --bilateral_color_variance 2 --bilateral_threshold 0.001 --rsgm_subpixel
Replace the placeholders (<path_to_dataset>, <dataset_type>, <model_name>, etc.) with the actual values for your setup.
The available arguments are:

- --maxdisp: Maximum disparity range for PSMNet and rSGM (default 192).
- --stereomodel: Stereo model type. Options: raft-stereo, psmnet, rsgm
- --normalize: Normalize RAFT-Stereo input to [-1,1] instead of [0,1] (only for official weights)
- --datapath: Dataset path.
- --dataset: Dataset type. Options: kitti_stereo142, middlebury_add, middlebury2021, middlebury, eth3d, simstereo, simstereoir, dsec, m3ed
- --outdir: Output directory to save the disparity maps.
- --loadstereomodel: Path to the pretrained model file.
- --iscale: Rescale input images before applying VPP and stereo matching; the original size is restored before evaluation. Example: --iscale 1 equals full scale, --iscale 2 equals half scale.
- --guideperc: Simulate depth seeds using a certain percentage of randomly sampled GT points (valid only if raw depth seeds do not exist); see the sketch after this list.
- --vpp: Apply virtual patterns to the stereo images.
- --colormethod: Virtual patterning strategy. Options: rnd (random strategy) and maxDistance (histogram-based strategy)
- --uniform_color: Uniform patch strategy
- --wsize: Pattern patch size (e.g., 1, 3, 5, 7, ...)
- --wsizeAgg_x: Histogram-based search window width
- --wsizeAgg_y: Histogram-based search window height
- --blending: Alpha-blending between the original images and the virtual pattern
- --maskocc: Use the proposed occlusion handling
- --discard_occ: Use the occlusion point discard strategy
- --guided: Apply the Guided Stereo Matching strategy
- --bilateralpatch: Use the adaptive patch based on the bilateral filter
- --bilateral_spatial_variance: Spatial variance of the adaptive patch
- --bilateral_color_variance: Color variance of the adaptive patch
- --bilateral_threshold: Adaptive patch classification threshold
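To clarify what --guideperc simulates, here is a minimal, hypothetical sketch (not the repository's code; names and the zero-as-invalid convention are assumptions) that keeps a given percentage of valid ground-truth disparities as sparse seeds:

```python
import numpy as np

def sample_seeds(gt_disp, perc=0.05, rng=None):
    """Randomly keep `perc` of the valid GT disparities as sparse depth seeds."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(gt_disp > 0)   # invalid GT pixels assumed encoded as 0
    keep = rng.choice(len(ys), size=int(perc * len(ys)), replace=False)
    seeds = np.zeros_like(gt_disp)
    seeds[ys[keep], xs[keep]] = gt_disp[ys[keep], xs[keep]]
    return seeds
```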
For more details, please refer to the test.py script.
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
Performance against competitors. VPP generally reaches almost optimal performance with a meager 1% density and, except for a few cases in the -tr configurations at higher densities, achieves much lower error rates.
VPP with off-the-shelf networks. We collect the results yielded by VPP applied to several off-the-shelf stereo models, running the weights provided by the authors. Again, with rare exceptions, VPP noticeably boosts the accuracy of any model, whether trained on synthetic or real data.
Qualitative comparison on KITTI (top) and Middlebury (bottom). From left to right: vanilla left images and disparity maps from the PSMNet model; left images enhanced by our virtual projection and disparity maps from the vanilla PSMNet model; and (rightmost) disparity maps from the VPP fine-tuned PSMNet model.
Fine-detail preservation: our virtual pattern greatly enhances the quality of the disparity maps without introducing noticeable artifacts around thin structures, despite applying the pattern on patches.
For questions, please send an email to [email protected]
We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:
- We would like to thank the authors of PSMNet, RAFT-Stereo, and rSGM for providing their code, which has been instrumental in our stereo matching experiments.
We deeply appreciate the authors of the competing research papers for providing their code and model weights, which greatly aided accurate comparisons.