Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

The code for our work, Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning (EACL 2021). We propose a simple and effective method to remove word-level spurious alignment between images and pseudo-captions, which leads to better performance in unsupervised image captioning. Please refer to our paper for more details.

[arXiv] https://arxiv.org/abs/2104.13872
[ACL Anthology] https://www.aclweb.org/anthology/2021.eacl-main.323/

Citation

@inproceedings{honda-etal-2021-removing,
    title = "Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning",
    author = "Honda, Ukyo  and
      Ushiku, Yoshitaka  and
      Hashimoto, Atsushi  and
      Watanabe, Taro  and
      Matsumoto, Yuji",
    booktitle = "EACL",
    year = "2021",
}

Requirements

git clone https://github.com/ukyh/RemovingSpuriousAlignment.git
cd RemovingSpuriousAlignment
pip install -r requirements.txt

If you would like to build the dataset yourself (optional), install nltk, spacy, and mosesdecoder for text preprocessing; a rough sketch of how these tools are used follows the commands below.

pip install nltk==3.5
python -c "import nltk; nltk.download('punkt')"
pip install spacy==2.2.4
pip install spacy-conll==2.0.0
python -m spacy download en_core_web_lg

cd tools
git clone https://github.com/moses-smt/mosesdecoder
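
The snippet below is only a hypothetical sketch of what these tools are installed for (NLTK for tokenization, spaCy for POS tags and lemmas used to find object words); the actual preprocessing pipeline is in get_data.sh and the scripts it invokes.

# Hypothetical preprocessing sketch, not the repository's exact code.
import nltk
import spacy

nlp = spacy.load("en_core_web_lg")

sentence = "A brown dog sits on the couch."
tokens = nltk.word_tokenize(sentence)  # word-level tokenization
doc = nlp(sentence)                    # POS tags and lemmas
nouns = [tok.lemma_ for tok in doc if tok.pos_ == "NOUN"]
print(tokens)
print(nouns)  # e.g., ['dog', 'couch'] -- candidate object words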

Download Dataset

Download full_data.tar.gz and unpack it in the data directory.

Acknowledgement

  • plural_words.json and word_counts.txt are provided by unsupervised_captioning
  • captions_train2014.json and captions_val2014.json are provided by MS COCO

Make Dataset (Optional: The files can be downloaded as described above)

  1. Download the Shutterstock corpus (sentences.pkl) and Google's Conceptual Captions (Train-GCC-training.tsv), and put them into the data directory.

  2. Follow the Preprocess instructions of unsupervised_captioning_fast to make the following items, and copy them to the data directory.

img_obj_test.json
img_obj_test_v4.json
img_obj_train.json
img_obj_train_v4.json
img_obj_val.json
img_obj_val_v4.json
  3. Run the following commands.
# For Feng et al. (2019) setting
./get_data.sh --corpus ss --max_sent 400 --min_sent_len 5 --max_from_obj 4 --workers 70 --oid v2

# For Laina et al. (2019) setting
./get_data.sh --corpus gcc --max_sent 400 --min_sent_len 5 --max_from_obj 4 --workers 70 --oid v4

Preprocess and Store Image Features

To prepare the features of MS COCO images, follow the Preprocess instructions of unsupervised_captioning_fast. The image features will be saved to ~/mscoco_image_features by default.

Run

Commands to run the experiments.

# Run our full model in Feng et al. (2019) settings
python -u main.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --use_unique --device 0

# Run our full model in Laina et al. (2019) settings
python -u main.py --corpus gcc --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --use_unique --device 0

# Run w/o pseudoL model
python -u main.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_unique --device 0

# Run w/o gate model
python -u main.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_unique --device 0

# Run w/o unique model
python -u main.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --device 0

# Run w/o image model
python -u main_wo_img.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --use_unique --device 0

Preprocess to Combine

This is a preprocessing step to combine our method with unsupervised_captioning_fast.

  1. Train and save our full model. The command below will save the best model to ./saved_models/ss4_full.
python -u main.py --corpus ss --auto_setting --max_pos_dist 4 --max_data -1 --img_dir ~/mscoco_image_features --epoch_size 100 --batch_train 8 --batch_eval 32 --early_stop 20 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --use_unique --device 0 --save --model_path ss4_full 
  2. Load the saved model and generate captions for the training images. The command below will save the generated captions to ./saved_models/ss4_full/selfcap_ss4_mi2.json.
python -u main_selfcap.py --corpus ss --auto_setting --max_pos_dist 4 --min_intersect 2 --img_dir ~/mscoco_image_features --batch_eval 32 --norm_img --use_gate --use_pseudoL --pos_gate_weight 16 --loss_weight 1 --use_unique --device 0 --model_path ss4_full --gen_path selfcap_ss4_mi2.json
  3. Copy the generated caption file to the data directory of unsupervised_captioning_fast, then follow its Combine instructions.

Notes

Modified pseudo-caption preprocessing

We modified our pseudo-caption preprocessing to retain the sentences where 1 < n <= 4 words exist between a pair of detected objects, not 0 < n <= 4 as described in our EACL paper (the description is corrected in the arXiv version). We excluded the n = 1 sentences because they tended to ungrammatically omit articles (e.g., plant on table). All results in our paper were obtained with the 1 < n <= 4 preprocessing.
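
As an illustration of this filter, the hypothetical sketch below keeps a sentence only when more than one and at most four words lie between the two detected object mentions; the actual filtering is done in the dataset scripts invoked by get_data.sh.

# Hypothetical sketch of the pseudo-caption filter described above
# (keep a sentence only if 1 < n <= 4 words lie between the two detected objects).
def keep_sentence(tokens, obj_pos_1, obj_pos_2, min_gap=2, max_gap=4):
    left, right = sorted((obj_pos_1, obj_pos_2))
    n_between = right - left - 1  # number of words between the two object mentions
    return min_gap <= n_between <= max_gap

# "plant on table" has n = 1 between "plant" and "table", so it is rejected
# because such sentences tend to ungrammatically omit articles.
print(keep_sentence(["plant", "on", "table"], 0, 2))                    # False
print(keep_sentence(["a", "dog", "sits", "on", "the", "couch"], 1, 5))  # True (n = 3)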

Incomplete seed fixing

We found that our seed-fixing option cannot completely control the training of a model, so fixing the seed does not reproduce exactly the same results as those shown in our paper. We reran the experiments and confirmed that, on average, the results were almost the same as those in the paper.
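
For reference, a typical PyTorch seed-fixing setup looks like the generic sketch below (not the repository's exact code); even with all of these calls, some CUDA kernels remain nondeterministic, which is a likely cause of the remaining run-to-run variation.

# Generic seed-fixing sketch, not the repository's exact code.
import random

import numpy as np
import torch

def fix_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic where possible; some GPU ops are still
    # nondeterministic, so results can differ slightly across runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False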

References

  • Yang Feng, Lin Ma, Wei Liu, and Jiebo Luo. 2019. Unsupervised image captioning. In CVPR.
  • Iro Laina, Christian Rupprecht, and Nassir Navab. 2019. Towards unsupervised image captioning with shared multimodal embeddings. In ICCV.
