Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann [Paper]
Install the required packages:

```shell
pip install timm
pip install transformers[torch]
```
To use VTN models, refer to the configs under configs/Kinetics, or see MODEL_ZOO.md for pre-trained models*.
To train ViT-B-VTN on your dataset (see the paper for details):

```shell
python tools/run_net.py \
  --cfg configs/Kinetics/VIT_B_VTN.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset
```
To test a trained ViT-B-VTN model on the Kinetics-400 dataset:

```shell
python tools/run_net.py \
  --cfg configs/Kinetics/VIT_B_VTN.yaml \
  DATA.PATH_TO_DATA_DIR path_to_kinetics_dataset \
  TRAIN.ENABLE False \
  TEST.CHECKPOINT_FILE_PATH path_to_model \
  TEST.CHECKPOINT_TYPE pytorch
```
* VTN models in MODEL_ZOO.md produce slightly different results than those reported in the paper due to differences between the PySlowFast code base and the original code used to train the models (mainly around data and video loading).
If you find VTN useful for your research, please consider citing the paper using the following BibTeX entry.
```bibtex
@article{neimark2021video,
  title={Video Transformer Network},
  author={Neimark, Daniel and Bar, Omri and Zohar, Maya and Asselmann, Dotan},
  journal={arXiv preprint arXiv:2102.00719},
  year={2021}
}
```