Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning

mybearyZhang/TwoStageReason


Two Stage Visual Reasoning

ECCV 2024
Mingyu Zhang*, Jiting Cai*, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li
Shanghai Jiao Tong University, Zhejiang University

arXiv

🏠 Background

Teaser

Through rigorous evaluation on diverse benchmarks, we demonstrate the shortcomings of existing ad-hoc methods in achieving cross-domain reasoning and their tendency to fit data biases. In this paper, we revisit visual reasoning from a two-stage perspective: (1) symbolization and (2) logical reasoning given symbols or their representations. We find that the reasoning stage generalizes better than the symbolization stage. Thus, it is more efficient to implement symbolization via separate encoders for different data domains while using a shared reasoner.
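
The two-stage layout described above can be illustrated with a small, framework-free sketch. Everything in it is hypothetical: the class names, the toy encoders, and the fixed weights are illustrative assumptions, not the models used in this repository. It only shows the routing idea, where each data domain has its own encoder (symbolization) and a single reasoner instance is reused across domains.

```python
from typing import Callable, Dict, List, Sequence

Symbols = List[float]
Encoder = Callable[[Sequence[float]], Symbols]


def halves_encoder(raw: Sequence[float]) -> Symbols:
    """Toy encoder for one domain: the mean of each half of the input."""
    if len(raw) < 2:
        raise ValueError("need at least two values")
    mid = len(raw) // 2
    left, right = raw[:mid], raw[mid:]
    return [sum(left) / len(left), sum(right) / len(right)]


def range_encoder(raw: Sequence[float]) -> Symbols:
    """Toy encoder for another domain: the minimum and maximum of the input."""
    return [float(min(raw)), float(max(raw))]


class SharedReasoner:
    """Stage 2 (hypothetical): one set of weights applied to symbols from any domain."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def __call__(self, symbols: Symbols) -> float:
        if len(symbols) != len(self.weights):
            raise ValueError("symbol vector has the wrong length")
        return sum(w * s for w, s in zip(self.weights, symbols))


class TwoStageModel:
    """Routes each input through its domain encoder, then the shared reasoner."""

    def __init__(self, encoders: Dict[str, Encoder], reasoner: SharedReasoner) -> None:
        self.encoders = encoders
        self.reasoner = reasoner

    def predict(self, domain: str, raw: Sequence[float]) -> float:
        symbols = self.encoders[domain](raw)  # stage 1: domain-specific symbolization
        return self.reasoner(symbols)         # stage 2: domain-agnostic reasoning


reasoner = SharedReasoner(weights=[1.0, -1.0])
model = TwoStageModel(
    {"domain_a": halves_encoder, "domain_b": range_encoder},
    reasoner,
)
score_a = model.predict("domain_a", [1, 2, 3, 4])  # symbols [1.5, 3.5] -> -2.0
score_b = model.predict("domain_b", [1, 2, 3, 4])  # symbols [1.0, 4.0] -> -3.0
```

Adding a new domain in this sketch means registering one more encoder that emits symbols of the same length; the reasoner is left untouched.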

📦 Installation

git clone https://github.com/mybearyZhang/TwoStageReason.git
cd TwoStageReason
pip install -r requirements.txt

🚀 Quick Start

Single task training

To train on a single task, run:

python train.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth] [-d 0,1,2,3]

  • -c: path to the JSON config file for the training task
  • -r: checkpoint to resume from (optional)
  • -d: comma-separated device indices to use, e.g. 0,1,2,3 (optional)
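
For readers who want to see how flags like these are commonly handled, here is a minimal argparse sketch. It is an assumption about a typical entry point, not the actual code in train.py: the function names, the `dest` names, and the use of `CUDA_VISIBLE_DEVICES` are all hypothetical choices made for illustration.

```python
import argparse
import os
from typing import List, Optional, Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -c / -r / -d flags listed above."""
    parser = argparse.ArgumentParser(description="Illustrative training entry point")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to the JSON config file")
    parser.add_argument("-r", dest="resume", default=None,
                        help="checkpoint to resume from (optional)")
    parser.add_argument("-d", dest="device", default=None,
                        help="comma-separated device indices (optional)")
    return parser


def parse_devices(spec: Optional[str]) -> List[int]:
    """Turn a string such as '0,1,2,3' into [0, 1, 2, 3]; empty input gives []."""
    if not spec:
        return []
    return [int(part) for part in spec.split(",") if part.strip()]


def main(argv: Optional[Sequence[str]] = None) -> Tuple[argparse.Namespace, List[int]]:
    args = build_parser().parse_args(argv)
    devices = parse_devices(args.device)
    if devices:
        # Assumption: limit visible GPUs through the standard CUDA environment variable.
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return args, devices
```

Calling `main(["-c", "config/config_raven.json", "-d", "0,1"])` returns the parsed namespace together with the device list `[0, 1]`.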

Multiple tasks training

To train on multiple tasks jointly, run:

python cotrain.py -c config/raven_cvr_svrt.json [-r saved/models/raven_opt/resumed_model.pth] [-d 0,1,2,3]

  • -c: path to the JSON config file for the training tasks
  • -r: checkpoint to resume from (optional)
  • -d: comma-separated device indices to use, e.g. 0,1,2,3 (optional)

Single task testing

To test a model trained on a single task, run:

python test.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth]

  • -c: path to the JSON config file for the testing task
  • -r: checkpoint of the trained model to load (optional)
  • -d: comma-separated device indices to use (optional)

Multiple tasks testing

To test a model trained on multiple tasks, run:

python cotest.py -c config/raven_cvr_svrt.json -r saved/models/RAVEN-CVR-SVRT/mmdd_hhmmss/checkpoint-epoch50.pth -d 3,4,5,7

  • -c: path to the JSON config file for the testing tasks
  • -r: checkpoint of the trained model to load (optional)
  • -d: comma-separated device indices to use, e.g. 3,4,5,7 (optional)
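
The four commands above share the same flag layout, so they can be assembled programmatically when scripting experiments. The helper below is a hypothetical convenience sketch, not part of this repository: `build_command` and its parameters are illustrative names, and only the script names and flags come from the commands shown above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Script names taken from the Quick Start commands: (mode, multi_task) -> script.
SCRIPTS: Dict[Tuple[str, bool], str] = {
    ("train", False): "train.py",
    ("train", True): "cotrain.py",
    ("test", False): "test.py",
    ("test", True): "cotest.py",
}


def build_command(
    mode: str,
    config: str,
    multi_task: bool = False,
    resume: Optional[str] = None,
    devices: Optional[Sequence[int]] = None,
    python: str = "python",
) -> List[str]:
    """Return an argv list such as ['python', 'train.py', '-c', ...]."""
    if (mode, multi_task) not in SCRIPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [python, SCRIPTS[(mode, multi_task)], "-c", config]
    if resume is not None:
        cmd += ["-r", resume]
    if devices:
        cmd += ["-d", ",".join(str(d) for d in devices)]
    return cmd


cotest_cmd = build_command(
    "test",
    "config/raven_cvr_svrt.json",
    multi_task=True,
    resume="saved/models/RAVEN-CVR-SVRT/mmdd_hhmmss/checkpoint-epoch50.pth",
    devices=[3, 4, 5, 7],
)
```

The resulting list can be passed to `subprocess.run(cotest_cmd, check=True)` from the repository root; the `mmdd_hhmmss` part of the path is a placeholder for the timestamp of an actual run.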

📝 Citation

If you find our work useful in your research, please consider citing:

@article{zhang2024take,
  title={Take A Step Back: Rethinking the Two Stages in Visual Reasoning},
  author={Zhang, Mingyu and Cai, Jiting and Liu, Mingyu and Xu, Yue and Lu, Cewu and Li, Yong-Lu},
  journal={arXiv preprint arXiv:2407.19666},
  year={2024}
}
