FormerStereo: Learning Representations from Foundation Models for Domain Generalized Stereo Matching
⭐ ECCV 2024 ⭐

Yongjian Zhang · Longguang Wang · Kunhong Li · Yun Wang · Yulan Guo


FormerStereo is a general framework designed to enhance the zero-shot capability of any learning-based stereo matching algorithm.

Environment

In your Python environment (tested on Ubuntu 22.04, Python 3.11, CUDA 12.1), run:

conda create -n FormerStereo python=3.11
conda activate FormerStereo
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install tqdm matplotlib scikit-image
pip install opencv-python
pip install tensorboardX
pip install timm==0.5.4

We also provide the environment configuration file environment.yaml so that you can check the exact package versions. Note that not all packages are required for testing.
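
As an optional sanity check (not part of the original setup instructions), you can verify that the new environment imports PyTorch and sees the GPU before proceeding:

# Optional: confirm that PyTorch imports and CUDA is visible in the FormerStereo env.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"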

How to use

  1. Modify the path in create_links.sh, then create symbolic links for the datasets and model weights:
sh create_links.sh
  2. Modify the root_path in ./Option/options.json, then generate the root-path JSON file:
cd ./data_generator
python main_generator.py
cd ..
  3. Modify the mode, split, data.datasets, model.name and train.resume entries to test a specific pretrained model on the target split of the selected dataset (a sanity-check sketch follows this list), then run:
sh test.sh
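
Before launching test.sh, a quick check like the one below can confirm that options.json still parses after editing. The field names are taken from step 3, but the nested layout is an assumption, so adjust the keys to match the actual file:

# Illustrative sanity check: verify that options.json is valid JSON and echo the
# top-level mode/split fields (key layout assumed; adjust to the real file structure).
python -c "import json; o = json.load(open('./Option/options.json')); print(o.get('mode'), o.get('split'))"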

Note that the default options.json is configured to test Former-PSMNet (DINOv2-L) on the trainingH split of the Middlebury dataset. The supported pretrained models include Former_PSMNet, Former_GwcNet, Former_CFNet, and Former_RAFT. To switch the vision foundation model (e.g., from DINOv2 to SAM), modify the backbone argument of the feature extractor.
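
If you are unsure where the backbone argument is defined, a simple search such as the one below can locate it; the paths are assumptions, since the exact file that configures the feature extractor is not specified here.

# Illustrative: locate where the feature extractor's backbone argument is set
# before switching foundation models (e.g., DINOv2 -> SAM). Paths are assumptions.
grep -rn "backbone" ./Option/options.json
grep -rn --include="*.py" "backbone" .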

SceneFlow Pretrained Models

Google Drive

Reproduced Results

| Model | KITTI 2015 (all-Bad 3.0) | KITTI 2012 (all-Bad 3.0) | Middlebury-H (noc-Bad 2.0) | Middlebury-Q (noc-Bad 2.0) | ETH3D (noc-Bad 1.0) |
| --- | --- | --- | --- | --- | --- |
| FormerPSMNet-DAM-L | 5.00 | 4.22 | 6.95 | 5.73 | 7.74 |
| FormerPSMNet-DINOv2-L | 4.95 | 3.75 | 7.71 | 6.18 | 6.73 |
| FormerPSMNet-SAM-L | 5.03 | 4.25 | 9.51 | 7.75 | 6.36 |
| FormerGwcNet-DAM-L | 5.11 | 3.94 | 6.60 | 4.90 | 4.03 |
| FormerGwcNet-DINOv2-L | 5.11 | 3.93 | 7.11 | 4.86 | 5.07 |
| FormerCFNet-DAM-L | 5.09 | 3.89 | 8.40 | 6.00 | 4.40 |
| FormerCFNet-DINOv2-B | 4.99 | 3.84 | 8.69 | 6.02 | 4.51 |
| FormerRAFT-DAM-L | 5.18 | 3.94 | 7.97 | 5.60 | 3.51 |

The reproduced results of Former-PSMNet-DAM-L and Former-RAFT-DAM-L are slightly worse than those reported in our paper. We will investigate this in the future.

Running Time

We measured the average running time of DAM-L-integrated models on the KITTI 2015 dataset using an RTX 4090.

| Model | FormerPSMNet | FormerGwcNet | FormerCFNet | FormerRAFT (32 iters) |
| --- | --- | --- | --- | --- |
| Time (ms) | 384.98 | 407.80 | 419.52 | 496.44 |

License

All our code except DINOv2 is released under the MIT license. DINOv2 is released under the Apache 2.0 license.

BibTeX

If you find our models useful, please consider citing our paper!

@InProceedings{formerstereo_zhangyj_eccv2024,
  author    = "Zhang, Yongjian and Wang, Longguang and Li, Kunhong and Wang, Yun and Guo, Yulan",
  title     = "Learning Representations from Foundation Models for Domain Generalized Stereo Matching",
  booktitle = "ECCV",
  year      = "2024",
}
