FSM is a new architecture for spatio-temporal video grounding. It consists of an efficient video and text encoder that models spatial multi-modal interactions over sparsely sampled frames, and a space-time decoder that jointly performs spatio-temporal localization.
This repository provides the code for our paper. This includes:
- Software setup, data downloading and preprocessing instructions for the VidSTG, HC-STVG1 and HC-STVG2.0 datasets
- Training scripts and pretrained checkpoints
- Evaluation scripts and demo
Download FFmpeg and add it to the PATH environment variable.
The code was tested with the static build ffmpeg-4.2.2-amd64-static.
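For example, on Linux you could extract the static build named above and prepend it to your PATH (the archive name follows the tested version; adjust it to whatever build you downloaded):
# Assumes you have downloaded the ffmpeg-4.2.2-amd64-static.tar.xz static build archive;
# adjust the file name and path to the build you actually use.
tar -xf ffmpeg-4.2.2-amd64-static.tar.xz
export PATH=$PWD/ffmpeg-4.2.2-amd64-static:$PATH
ffmpeg -version  # should report version 4.2.2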
Then create a conda environment and install the requirements with the following commands:
conda create -n tubedetr_env python=3.8
conda activate tubedetr_env
pip install -r requirements.txt
Set up the paths where you are going to download videos and annotations in the config JSON files (e.g. config/vidstg.json and config/hcstvg.json).
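A quick way to locate the entries to edit (the key names in the pattern below are inferred from the folder names used in this section; confirm them against the actual config files):
# Hypothetical sketch: list the dataset path entries to fill in.
# The exact key names may differ; inspect config/vidstg.json and config/hcstvg.json to confirm.
grep -nE '(vid|ann)_path' config/vidstg.json config/hcstvg.json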
VidSTG: Download VidOR videos and annotations from the VidOR dataset providers.
Then download the VidSTG annotations from the VidSTG dataset providers.
The vidstg_vid_path folder should contain a subfolder named video with the unzipped video folders.
The vidstg_ann_path folder should contain both VidOR and VidSTG annotations.
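A minimal sketch of the expected layout (VIDSTG_VID_PATH and VIDSTG_ANN_PATH stand for the paths set in the config file; the archive and annotation names are placeholders for the files provided by the dataset hosts):
# Placeholders: VIDSTG_VID_PATH / VIDSTG_ANN_PATH are the paths set in config/vidstg.json;
# replace the archive and annotation names with the files obtained from the VidOR and VidSTG providers.
mkdir -p VIDSTG_VID_PATH/video VIDSTG_ANN_PATH
unzip VIDOR_VIDEO_ARCHIVE.zip -d VIDSTG_VID_PATH/video/
cp VIDOR_ANNOTATIONS/* VIDSTG_ANNOTATIONS/* VIDSTG_ANN_PATH/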
HC-STVG: Download HC-STVG1 and HC-STVG2.0 videos and annotations from the HC-STVG dataset providers.
The hcstvg_vid_path folder should contain a subfolder named video with the unzipped video folders.
The hcstvg_ann_path folder should contain both HC-STVG1 and HC-STVG2.0 annotations.
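The same layout applies here (placeholders as above, with the archives obtained from the HC-STVG providers):
# Placeholders: HCSTVG_VID_PATH / HCSTVG_ANN_PATH are the paths set in config/hcstvg.json.
mkdir -p HCSTVG_VID_PATH/video HCSTVG_ANN_PATH
unzip HCSTVG_VIDEO_ARCHIVE.zip -d HCSTVG_VID_PATH/video/
cp HCSTVG_V1_ANNOTATIONS/* HCSTVG_V2_ANNOTATIONS/* HCSTVG_ANN_PATH/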
To preprocess annotation files, run:
python preproc/preproc_vidstg.py
python preproc/preproc_hcstvg.py
python preproc/preproc_hcstvgv2.py
Download the pretrained RoBERTa tokenizer and model weights into the TRANSFORMERS_CACHE folder.
Download the pretrained ResNet-101 model weights into the TORCH_HOME folder.
Download the MDETR pretrained model weights with ResNet-101 backbone into the current folder.
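One possible way to fetch these weights is through the transformers and torchvision download utilities. This is a sketch under the assumption that the text encoder is roberta-base and that the visual backbone uses torchvision's ImageNet-pretrained ResNet-101; the MDETR checkpoint is linked from the MDETR repository:
# Assumption: the text encoder is roberta-base; its weights are cached in TRANSFORMERS_CACHE.
export TRANSFORMERS_CACHE=/path/to/transformers_cache
python -c "from transformers import RobertaTokenizerFast, RobertaModel; RobertaTokenizerFast.from_pretrained('roberta-base'); RobertaModel.from_pretrained('roberta-base')"
# Assumption: the backbone uses torchvision's ImageNet-pretrained ResNet-101, cached in TORCH_HOME.
export TORCH_HOME=/path/to/torch_home
python -c "import torchvision; torchvision.models.resnet101(pretrained=True)"
# Place the MDETR ResNet-101 checkpoint (linked from the MDETR repository) in the current
# folder as pretrained_resnet101_checkpoint.pth, matching the --load flag used below.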
To train on VidSTG, run:
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS --use_env main.py --ema \
--load=pretrained_resnet101_checkpoint.pth --combine_datasets=vidstg --combine_datasets_val=vidstg \
--dataset_config config/vidstg.json --output-dir=OUTPUT_DIR
To train on HC-STVG1, run:
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS --use_env main.py --ema \
--load=pretrained_resnet101_checkpoint.pth --combine_datasets=hcstvg --combine_datasets_val=hcstvg \
--dataset_config config/hcstvg.json --epochs=40 --eval_skip=40 --output-dir=OUTPUT_DIR
For evaluation only, run the same commands as for training with the additional flags --resume=CHECKPOINT --eval.
To evaluate on the test set, also add --test (in this case, predictions and attention weights are saved as well).
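For example, to evaluate a VidSTG checkpoint on the test set (same arguments as the VidSTG training command above; NUM_GPUS, OUTPUT_DIR and CHECKPOINT are placeholders):
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS --use_env main.py --ema \
--load=pretrained_resnet101_checkpoint.pth --combine_datasets=vidstg --combine_datasets_val=vidstg \
--dataset_config config/vidstg.json --output-dir=OUTPUT_DIR --resume=CHECKPOINT --eval --test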