The official repository for “Image Captioning via Dynamic Path Customization”.
Dynamic Transformer Network (DTNet) is a model that generates discriminative yet accurate captions by dynamically assigning customized paths to different samples.
The framework of the proposed Dynamic Transformer Network (DTNet)
The detailed architectures of different cells in the spatial and channel routing space.
- 2023.09.28: Released the code
Please refer to meshed-memory-transformer
- Annotation. Download the annotation file annotation.zip. Extract it and put it in the project root directory.
- Feature. You can download our ResNeXt-101 features (hdf5 file) here. Access code: jcj6.
- Evaluation. Download the evaluation tools here. Access code: jcj6. Extract them and put them in the project root directory.
There are five kinds of keys in our .hdf5 file:
- `['%d_features' % image_id]`: region features, `(N_regions, feature_dim)`
- `['%d_boxes' % image_id]`: bounding boxes of the region features, `(N_regions, 4)`
- `['%d_size' % image_id]`: size of the original image (for normalizing bounding boxes), `(2,)`
- `['%d_grids' % image_id]`: grid features, `(N_grids, feature_dim)`
- `['%d_mask' % image_id]`: geometric alignment graph, `(N_regions, N_grids)`
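As a minimal sketch, the entries for one image can be read with `h5py` as follows. The file path, `image_id`, and the helper name `load_image_feats` are placeholders; the actual shapes depend on your extraction settings.

```python
import h5py

# Hypothetical helper: read all five entries for one image_id
# from the feature file described above.
def load_image_feats(h5_path, image_id):
    with h5py.File(h5_path, "r") as f:
        feats = f["%d_features" % image_id][()]  # (N_regions, feature_dim)
        boxes = f["%d_boxes" % image_id][()]     # (N_regions, 4)
        size = f["%d_size" % image_id][()]       # (2,)
        grids = f["%d_grids" % image_id][()]     # (N_grids, feature_dim)
        mask = f["%d_mask" % image_id][()]       # (N_regions, N_grids)
    return feats, boxes, size, grids, mask
```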
For feature extraction, please follow here
Training:

```shell
python train.py --exp_name DTNet --batch_size 50 --rl_batch_size 100 --workers 4 --head 8 --warmup 10000 --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --logs_folder tensorboard_logs
```

Evaluation:

```shell
python eval.py --batch_size 50 --exp_name DTNet --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --ckpt_path your_model_path
```
Comparisons with SOTAs on the Karpathy test split.
Examples of captions generated by Transformer and DTNet.
Images and the corresponding number of passed cells.
- Thanks to the meshed-memory-transformer.
- Thanks to the amazing work of grid-feats-vqa.
@ARTICLE{ma2024image,
author={Ma, Yiwei and Ji, Jiayi and Sun, Xiaoshuai and Zhou, Yiyi and Hong, Xiaopeng and Wu, Yongjian and Ji, Rongrong},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Image Captioning via Dynamic Path Customization},
year={2024},
volume={},
number={},
pages={1-15},
keywords={Routing;Visualization;Transformers;Adaptation models;Task analysis;Feature extraction;Semantics;Dynamic network;image captioning;input-sensitive;transformer},
doi={10.1109/TNNLS.2024.3409354}}
