Two Stage Visual Reasoning

ECCV 2024
Mingyu Zhang* Jiting Cai* Mingyu Liu Yue Xu Cewu Lu Yong-Lu Li
Shanghai Jiao Tong University Zhejiang University

🏠 Background

Through rigorous evaluation on diverse benchmarks, we demonstrate the shortcomings of existing ad-hoc methods in achieving cross-domain reasoning and their tendency to fit data biases. In this paper, we revisit visual reasoning from a two-stage perspective: (1) symbolization and (2) logical reasoning given symbols or their representations. We find that the reasoning stage generalizes better than the symbolization stage. Thus, it is more efficient to implement symbolization via separate encoders for different data domains while using a shared reasoner.
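The two-stage layout described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (one symbolizer per data domain, one reasoner shared by all of them), not the code in this repository: the class TwoStageModel, the toy encoders, and the toy reasoner are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Sequence

Symbols = List[float]  # stand-in for whatever symbol representation a real encoder emits


class TwoStageModel:
    """Illustrative only: domain-specific symbolizers feeding one shared reasoner."""

    def __init__(
        self,
        symbolizers: Dict[str, Callable[[Sequence[float]], Symbols]],
        reasoner: Callable[[Symbols], int],
    ) -> None:
        self.symbolizers = symbolizers  # stage 1: one encoder per data domain
        self.reasoner = reasoner        # stage 2: shared across all domains

    def predict(self, domain: str, raw_input: Sequence[float]) -> int:
        symbols = self.symbolizers[domain](raw_input)  # domain-specific symbolization
        return self.reasoner(symbols)                  # domain-agnostic reasoning


def raven_encoder(pixels: Sequence[float]) -> Symbols:
    # Hypothetical encoder: this domain delivers values in 0..255.
    return [p / 255.0 for p in pixels]


def svrt_encoder(values: Sequence[float]) -> Symbols:
    # Hypothetical encoder: this domain already delivers values in 0..1.
    return [float(v) for v in values]


def shared_reasoner(symbols: Symbols) -> int:
    # Toy reasoner: pick the index of the largest symbol value.
    return max(range(len(symbols)), key=symbols.__getitem__)


model = TwoStageModel(
    symbolizers={"raven": raven_encoder, "svrt": svrt_encoder},
    reasoner=shared_reasoner,
)
raven_answer = model.predict("raven", [0, 255, 128])
svrt_answer = model.predict("svrt", [0.9, 0.1, 0.3])
```

Adding a new domain in this sketch means registering one more symbolizer; the reasoner is left unchanged, which is the design choice the paragraph above argues for.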

📦 Installation

git clone https://github.com/mybearyZhang/TwoStageReason.git
cd TwoStageReason
pip install -r requirements.txt

🚀 Quick Start

Single task training

To train on a single task, run

python train.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth] [-d 0,1,2,3]

-c to specify the configuration file for the training task
-r to resume training from a saved checkpoint (optional)
-d to select the device ID(s) to use (optional)
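For readers who want to see how flags of this shape are typically handled, here is a minimal argparse sketch. It is an assumption about the interface, not a copy of train.py or parse_config.py: the long option names (--config, --resume, --device) and the build_parser helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the -c / -r / -d flags described above.
    parser = argparse.ArgumentParser(description="Single task training (sketch)")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the JSON configuration file")
    parser.add_argument("-r", "--resume", default=None,
                        help="checkpoint to resume from (optional)")
    parser.add_argument("-d", "--device", default=None,
                        help="comma-separated device IDs (optional)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["-c", "config/config_raven.json", "-d", "0,1,2,3"])
```

Omitted optional flags come back as None here, so downstream code can tell "not given" apart from an explicit value.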

Multiple tasks training

To train on multiple tasks jointly, run

python cotrain.py -c config/raven_cvr_svrt.json [-r saved/models/raven_opt/resumed_model.pth] [-d 0,1,2,3]

-c to specify the configuration file for the training tasks
-r to resume training from a saved checkpoint (optional)
-d to select the device ID(s) to use (optional)

Single task testing

To test the trained model, please run

python test.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth]

-c to specify the configuration file for the testing task
-r to load a saved checkpoint (optional)
-d to select the device ID(s) to use (optional)

Multiple tasks testing

To test the trained model, please run

python cotest.py -c config/raven_cvr_svrt.json -r saved/models/RAVEN-CVR-SVRT/mmdd_hhmmss/checkpoint-epoch50.pth -d 3,4,5,7

-c to specify the configuration file for the testing tasks
-r to load a saved checkpoint (optional)
-d to select the device ID(s) to use (optional)
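The -d value is a comma-separated list such as 3,4,5,7. One common way to honor such a list is to restrict the GPUs visible to the process through the CUDA_VISIBLE_DEVICES environment variable before CUDA is initialized. The sketch below shows that approach under the assumption that the IDs refer to CUDA devices; the scripts in this repository may handle the flag differently, and both helper names are hypothetical.

```python
import os
from typing import List


def parse_device_ids(spec: str) -> List[int]:
    """Turn a string like "3,4,5,7" into [3, 4, 5, 7], rejecting bad input."""
    ids = [int(part) for part in spec.split(",") if part.strip()]
    if not ids:
        raise ValueError("no device IDs given")
    if any(i < 0 for i in ids):
        raise ValueError("device IDs must be non-negative")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate device IDs")
    return ids


def restrict_visible_gpus(spec: str) -> List[int]:
    # Hypothetical helper: must run before any CUDA library is initialized,
    # otherwise the environment variable has no effect on this process.
    ids = parse_device_ids(spec)
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return ids


selected = restrict_visible_gpus("3,4,5,7")
```

With this approach the selected GPUs are renumbered from 0 inside the process, so code that runs afterwards would address them as devices 0 to 3 rather than 3, 4, 5, and 7.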

📝 Citation

If you find our work useful in your research, please consider citing:

@article{zhang2024take,
  title={Take A Step Back: Rethinking the Two Stages in Visual Reasoning},
  author={Zhang, Mingyu and Cai, Jiting and Liu, Mingyu and Xu, Yue and Lu, Cewu and Li, Yong-Lu},
  journal={arXiv preprint arXiv:2407.19666},
  year={2024}
}

