This repository contains code for Recurrent Multimodal Interaction for Referring Image Segmentation, ICCV 2017.
If you use the code, please cite
@inproceedings{liu2017recurrent,
title={Recurrent Multimodal Interaction for Referring Image Segmentation},
author={Liu, Chenxi and Lin, Zhe and Shen, Xiaohui and Yang, Jimei and Lu, Xin and Yuille, Alan},
booktitle={{ICCV}},
year={2017}
}
- Tensorflow 1.2.1
- Download or use symlink, such that the MS COCO images are under data/coco/images/train2014/
- Download or use symlink, such that the ReferItGame data are under data/referit/images and data/referit/mask
- Run mkdir external. Download, git clone, or use symlink, such that TF-resnet and TF-deeplab are under external. Then strictly follow the Example Usage section of their README
- Download, git clone, or use symlink, such that refer is under external. Then strictly follow the Setup and Download section of its README. Also put the refer folder in PYTHONPATH
- Download, git clone, or use symlink, such that the MS COCO API is under external (i.e. external/coco/PythonAPI/pycocotools)
- pydensecrf
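Before building batches, it may help to confirm that the directory layout described in the setup list is in place. The following is a minimal sketch, not part of the repository: the helper name missing_paths and the constant EXPECTED_PATHS are hypothetical, and only paths named explicitly in the list are checked. The folder names that TF-resnet, TF-deeplab, and refer take under external are not stated here, so they are left out.

```python
import os

# Paths named in the setup list, relative to the repository root.
# Assumption: the ReferItGame data lives in data/referit/images and
# data/referit/mask.
EXPECTED_PATHS = [
    "data/coco/images/train2014",
    "data/referit/images",
    "data/referit/mask",
    "external",
    "external/coco/PythonAPI/pycocotools",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    return [p for p in EXPECTED_PATHS
            if not os.path.exists(os.path.join(repo_root, p))]


if __name__ == "__main__":
    # Print one line per missing directory; print nothing if all are present.
    for path in missing_paths():
        print("missing:", path)
```

Since os.path.exists follows symlinks, the check should behave the same whether the data was downloaded in place or linked in.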
python build_batches.py -d Gref -t train
python build_batches.py -d Gref -t val
python build_batches.py -d unc -t train
python build_batches.py -d unc -t val
python build_batches.py -d unc -t testA
python build_batches.py -d unc -t testB
python build_batches.py -d unc+ -t train
python build_batches.py -d unc+ -t val
python build_batches.py -d unc+ -t testA
python build_batches.py -d unc+ -t testB
python build_batches.py -d referit -t trainval
python build_batches.py -d referit -t test
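The twelve commands above differ only in dataset and split, so they can also be generated in a loop. This is a small sketch under that assumption: the names SPLITS and build_batch_commands are hypothetical and not part of the repository, and the dataset/split pairs are copied from the commands listed above.

```python
# Dataset -> splits, copied from the build_batches.py commands listed above.
SPLITS = {
    "Gref": ["train", "val"],
    "unc": ["train", "val", "testA", "testB"],
    "unc+": ["train", "val", "testA", "testB"],
    "referit": ["trainval", "test"],
}


def build_batch_commands():
    """Return one argument list per (dataset, split) pair, in listed order."""
    return [["python", "build_batches.py", "-d", dataset, "-t", split]
            for dataset, splits in SPLITS.items()
            for split in splits]


if __name__ == "__main__":
    # Print the commands; nothing is executed here.
    for cmd in build_batch_commands():
        print(" ".join(cmd))
```

To run the commands rather than print them, each argument list could be passed to subprocess.check_call from the repository root.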
Specify several options/flags and then run main.py:
- -g: Which GPU to use. Default is 0.
- -m: train or test. Training mode or testing mode.
- -w: resnet or deeplab. Specify pre-trained weights.
- -n: LSTM or RMI. Model name.
- -d: Gref or unc or unc+ or referit. Specify dataset.
- -t: train or trainval or val or test or testA or testB. Which set to train/test on.
- -i: Number of training iterations in training mode. The iteration number of a snapshot in testing mode.
- -s: Used only in training mode. How many iterations per snapshot.
- -v: Used only in testing mode. Whether to visualize the prediction. Default is False.
- -c: Used only in testing mode. Whether to also apply Dense CRF. Default is False.
For example, to train the ResNet + LSTM model on Google-Ref using GPU 2, run
python main.py -m train -w resnet -n LSTM -d Gref -t train -g 2 -i 750000 -s 50000
To test the 650000-iteration snapshot of the DeepLab + RMI model on UNC testA set using GPU 1 (with visualization and Dense CRF), run
python main.py -m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c
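For readers who want to see how the flags fit together, here is a rough argparse sketch of the options described above. It is an illustration only and is not taken from main.py: the function name build_parser is hypothetical, and treating -g, -i, and -s as integers and -v and -c as on/off switches is an assumption based on the descriptions and the two example commands.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented main.py flags."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py flags")
    parser.add_argument("-g", type=int, default=0, help="which GPU to use")
    parser.add_argument("-m", choices=["train", "test"], help="mode")
    parser.add_argument("-w", choices=["resnet", "deeplab"],
                        help="pre-trained weights")
    parser.add_argument("-n", choices=["LSTM", "RMI"], help="model name")
    parser.add_argument("-d", choices=["Gref", "unc", "unc+", "referit"],
                        help="dataset")
    parser.add_argument("-t", choices=["train", "trainval", "val",
                                       "test", "testA", "testB"],
                        help="which set to train/test on")
    parser.add_argument("-i", type=int,
                        help="training iterations, or snapshot iteration in test mode")
    parser.add_argument("-s", type=int,
                        help="iterations per snapshot (training only)")
    # Assumption: -v and -c are switches that default to False.
    parser.add_argument("-v", action="store_true",
                        help="visualize the prediction (testing only)")
    parser.add_argument("-c", action="store_true",
                        help="also apply Dense CRF (testing only)")
    return parser


# Parse the testing example from the text instead of the real command line.
example = "-m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c"
args = build_parser().parse_args(example.split())
print(args.m, args.w, args.n, args.d, args.t, args.g, args.i, args.v, args.c)
```

Restricting values with choices means a mistyped dataset or split name is rejected when the arguments are parsed, before any training or testing work starts.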
Code and data under util/ and data/referit/ are borrowed from text_objseg and slightly modified for compatibility with Tensorflow 1.2.1.
TODO: Add TensorBoard support.