Image Captioning via Dynamic Path Customization

Introduction

The official repository for “Image Captioning via Dynamic Path Customization”.

Dynamic Transformer Network (DTNet) is a model that generates discriminative yet accurate captions by dynamically assigning customized computation paths to different samples.


The framework of the proposed Dynamic Transformer Network (DTNet)


The detailed architectures of different cells in the spatial and channel routing space.
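To illustrate the per-sample routing idea (this is a minimal sketch, not the paper's actual implementation; the cell transforms and gate are hypothetical stand-ins), a soft router computes sample-dependent gate weights and mixes the outputs of several candidate cells:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, n_cells = 8, 3

# Hypothetical candidate "cells": here just small linear transforms.
cells = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_cells)]
# Gate that maps each sample to a distribution over the candidate cells.
W_gate = rng.standard_normal((d, n_cells)) * 0.1

def route(x):
    g = softmax(x @ W_gate)                           # (batch, n_cells), rows sum to 1
    outs = np.stack([x @ W for W in cells], axis=1)   # (batch, n_cells, d)
    return (g[..., None] * outs).sum(axis=1)          # per-sample weighted mixture

x = rng.standard_normal((4, d))
y = route(x)
```

Because the gate weights depend on the input, each sample effectively traverses its own weighted combination of cells, which is the intuition behind "customized paths".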

News

  • 2023.09.28: Released code

Environment setup

Please refer to meshed-memory-transformer

Data preparation

  • Annotation. Download the annotation file annotation.zip. Extract it and put it in the project root directory.
  • Feature. You can download our ResNeXt-101 features (an .hdf5 file) here. Access code: jcj6.
  • Evaluation. Download the evaluation tools here. Access code: jcj6. Extract them and put them in the project root directory.

There are five kinds of keys in our .hdf5 file:

  • ['%d_features' % image_id]: region features (N_regions, feature_dim)
  • ['%d_boxes' % image_id]: bounding box of region features (N_regions, 4)
  • ['%d_size' % image_id]: size of original image (for normalizing bounding box), (2,)
  • ['%d_grids' % image_id]: grid features (N_grids, feature_dim)
  • ['%d_mask' % image_id]: geometric alignment graph, (N_regions, N_grids)
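To sanity-check this layout, here is a small h5py sketch that writes and reads a toy file with the five key patterns. Only the key names come from the list above; the image id, the feature dimensions, and the (height, width) ordering of the size entry are assumptions for illustration:

```python
import os
import tempfile

import h5py
import numpy as np

image_id = 12345                      # hypothetical image id
n_regions, n_grids, feat_dim = 5, 49, 2048  # assumed dimensions

path = os.path.join(tempfile.mkdtemp(), "toy_feats.hdf5")

# Write a toy file following the five key patterns described above.
with h5py.File(path, "w") as f:
    f["%d_features" % image_id] = np.zeros((n_regions, feat_dim), np.float32)
    f["%d_boxes" % image_id] = np.zeros((n_regions, 4), np.float32)
    f["%d_size" % image_id] = np.array([480, 640], np.float32)  # assumed (h, w)
    f["%d_grids" % image_id] = np.zeros((n_grids, feat_dim), np.float32)
    f["%d_mask" % image_id] = np.zeros((n_regions, n_grids), np.float32)

# Read it back the way a dataloader might.
with h5py.File(path, "r") as f:
    feats = f["%d_features" % image_id][()]
    boxes = f["%d_boxes" % image_id][()]
    size = f["%d_size" % image_id][()]
```

The size entry is what you would divide the box coordinates by to normalize them to [0, 1] before feeding them to the model.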

Feature extraction can be performed by following the instructions here.

Training

python train.py --exp_name DTNet --batch_size 50 --rl_batch_size 100 --workers 4 --head 8 --warmup 10000 --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --logs_folder tensorboard_logs

Evaluation

python eval.py --batch_size 50 --exp_name DTNet --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --ckpt_path your_model_path

Performance


Comparisons with SOTAs on the Karpathy test split.

Qualitative Results


Examples of captions generated by Transformer and DTNet.


Images and the corresponding number of passed cells.


Path Visualization.

Acknowledgements

Citations

@ARTICLE{ma2024image,
  author={Ma, Yiwei and Ji, Jiayi and Sun, Xiaoshuai and Zhou, Yiyi and Hong, Xiaopeng and Wu, Yongjian and Ji, Rongrong},
  journal={IEEE Transactions on Neural Networks and Learning Systems}, 
  title={Image Captioning via Dynamic Path Customization}, 
  year={2024},
  volume={},
  number={},
  pages={1-15},
  keywords={Routing;Visualization;Transformers;Adaptation models;Task analysis;Feature extraction;Semantics;Dynamic network;image captioning;input-sensitive;transformer},
  doi={10.1109/TNNLS.2024.3409354}}
