ECCV 2024
Mingyu Zhang*
Jiting Cai*
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
Shanghai Jiao Tong University, Zhejiang University
Through rigorous evaluation on diverse benchmarks, we demonstrate that existing ad-hoc methods fall short of cross-domain reasoning and tend to fit dataset biases. In this paper, we revisit visual reasoning from a two-stage perspective: (1) symbolization and (2) logical reasoning given symbols or their representations. We find that the reasoning stage generalizes better than the symbolization stage. It is therefore more efficient to implement symbolization with separate encoders for different data domains while using a shared reasoner.
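The two-stage design can be illustrated with a minimal sketch in plain Python. Everything in it is hypothetical and chosen for illustration only: the class name `TwoStageModel`, the toy encoders, and the toy reasoner are not taken from the repository, whose actual models are neural networks. The sketch only shows the wiring: one encoder per data domain, one reasoner shared by all domains.

```python
from typing import Callable, Dict, List, Sequence

# In this toy, a "symbol" representation is a short list of floats.
Symbols = List[float]


class TwoStageModel:
    """Hypothetical wiring: per-domain symbolization, one shared reasoner."""

    def __init__(
        self,
        encoders: Dict[str, Callable[[object], Symbols]],
        reasoner: Callable[[Symbols], int],
    ) -> None:
        self.encoders = encoders  # stage 1: one encoder per data domain
        self.reasoner = reasoner  # stage 2: shared across all domains

    def predict(self, domain: str, raw_input: object) -> int:
        if domain not in self.encoders:
            raise KeyError(f"no encoder registered for domain {domain!r}")
        symbols = self.encoders[domain](raw_input)  # symbolization
        return self.reasoner(symbols)  # logical reasoning on symbols


def encode_grid(grid: Sequence[Sequence[int]]) -> Symbols:
    """Toy encoder for an image-like domain: one symbol per row (its sum)."""
    return [float(sum(row)) for row in grid]


def encode_text(text: str) -> Symbols:
    """Toy encoder for a text-like domain: counts of 'a' and 'b'."""
    return [float(text.count("a")), float(text.count("b"))]


def shared_reasoner(symbols: Symbols) -> int:
    """Toy reasoner: index of the largest symbol (ties pick the first)."""
    return max(range(len(symbols)), key=lambda i: symbols[i])


model = TwoStageModel(
    encoders={"grid": encode_grid, "text": encode_text},
    reasoner=shared_reasoner,
)
grid_answer = model.predict("grid", [[1, 2], [5, 0]])
text_answer = model.predict("text", "aab")
```

Adding a new domain in this sketch means registering one more encoder; the reasoner is left untouched, which mirrors the claim that the reasoning stage is the part worth sharing.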
git clone https://github.com/mybearyZhang/TwoStageReason.git
cd TwoStageReason
pip install -r requirements.txt
To run the task, please run
python train.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth] [-d 0,1,2,3]
-c  configure the settings of the training task
-r  resume from a pretrained model (optional)
-d  assign device(s) (optional)
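For readers who want to see how the three flags could be handled, here is a minimal `argparse` sketch. It is an assumption about the command-line interface, not a copy of the repository's `train.py`: the long option names (`--config`, `--resume`, `--device`) and the helper names `build_parser` and `parse_cli` are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the -c, -r, and -d flags (long names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the JSON config of the training task")
    parser.add_argument("-r", "--resume", default=None,
                        help="checkpoint to resume from (optional)")
    parser.add_argument("-d", "--device", default=None,
                        help="comma-separated device indices (optional)")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


# Equivalent to: python train.py -c config/config_raven.json -d 0,1,2,3
args = parse_cli(["-c", "config/config_raven.json", "-d", "0,1,2,3"])
```

In this sketch `-c` is mandatory while `-r` and `-d` default to `None`, which matches the "(optional)" markers in the option list.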
To run the co-training task, please run
python cotrain.py -c config/raven_cvr_svrt.json [-r saved/models/raven_opt/resumed_model.pth] [-d 0,1,2,3]
-c  configure the settings of the training task
-r  resume from a pretrained model (optional)
-d  assign device(s) (optional)
To test the trained model, please run
python test.py -c config/config_raven.json [-r saved/models/sota-RAVEN/mmdd_hhmmss/model_best.pth]
-c  configure the settings of the training task
-r  resume from a pretrained model (optional)
-d  assign device(s) (optional)
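The `mmdd_hhmmss` part of the checkpoint path is a placeholder for the timestamp of a training run. The sketch below shows one way to look up the newest `model_best.pth` under a run directory. It assumes that each run lives in its own `mmdd_hhmmss` folder, as the example path suggests; the helper name `latest_best_checkpoint` and the timestamps in the demo are hypothetical. Sorting by folder name is chronological only within one calendar year.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_best_checkpoint(run_root: Path) -> Optional[Path]:
    """Return model_best.pth from the newest mmdd_hhmmss folder, or None."""
    if not run_root.is_dir():
        return None
    # Folder names like "0315_120000" sort chronologically as plain strings.
    for run_dir in sorted(run_root.iterdir(), reverse=True):
        checkpoint = run_dir / "model_best.pth"
        if run_dir.is_dir() and checkpoint.is_file():
            return checkpoint
    return None


# Demo on a throwaway directory tree with two made-up run folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "saved" / "models" / "sota-RAVEN"
    for stamp in ("0101_090000", "0315_120000"):
        (root / stamp).mkdir(parents=True)
        (root / stamp / "model_best.pth").touch()
    newest = latest_best_checkpoint(root)
    newest_run = newest.parent.name if newest else None
    missing = latest_best_checkpoint(Path(tmp) / "does_not_exist")
```

The returned path can then be passed to `-r` in the test command.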
To test the co-trained model, please run
python cotest.py -c config/raven_cvr_svrt.json -r saved/models/RAVEN-CVR-SVRT/mmdd_hhmmss/checkpoint-epoch50.pth -d 3,4,5,7
-c  configure the settings of the training task
-r  resume from a pretrained model (optional)
-d  assign device(s) (optional)
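The value passed to `-d` is a comma-separated list such as `3,4,5,7`, and the indices need not be contiguous. A small validation sketch follows; the helper name `parse_device_ids` is hypothetical, and how the repository's scripts actually interpret the list is not shown here.

```python
from typing import List


def parse_device_ids(spec: str) -> List[int]:
    """Turn a string such as "3,4,5,7" into [3, 4, 5, 7], rejecting bad input."""
    ids: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part.isdecimal():  # also rejects empty parts, e.g. "0,,1"
            raise ValueError(f"invalid device index: {part!r}")
        ids.append(int(part))
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device index in {spec!r}")
    return ids


device_ids = parse_device_ids("3,4,5,7")
```

Failing early on a malformed list gives a clearer error than one raised later, when the devices are first used.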
If you find our work useful in your research, please consider citing:
@article{zhang2024take,
title={Take A Step Back: Rethinking the Two Stages in Visual Reasoning},
author={Zhang, Mingyu and Cai, Jiting and Liu, Mingyu and Xu, Yue and Lu, Cewu and Li, Yong-Lu},
journal={arXiv preprint arXiv:2407.19666},
year={2024}
}