Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering 💻
Chenyang Lyu, Tianbo Ji, Yvette Graham, Jennifer Foster
School of Computing, Dublin City University, Dublin, Ireland 🏠
This repository contains the code for the Semantic-aware Dynamic Retrospective-Prospective Reasoning system for Event-level Video Question Answering (EVQA) 💻. The system utilizes explicit semantic connections between questions and visual information at the event level to improve the reasoning process and provide optimal answers 🔍.
Event-Level Video Question Answering (EVQA) requires complex reasoning across video events to obtain the visual information needed for optimal answers. However, few studies have focused on utilizing explicit semantic connections between questions and visual information, especially at the event level. In this paper, we propose a semantic-aware dynamic retrospective-prospective reasoning approach for video-based question answering. We explicitly incorporate the Semantic Role Labeling (SRL) structure of the question in the dynamic reasoning process, determining which frame to move to based on the focused part of the SRL structure (agent, verb, patient, etc.) 🧐. We evaluate our approach on the TrafficQA benchmark EVQA dataset and demonstrate superior performance compared to previous state-of-the-art models 💪.
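To give a concrete flavour of the core idea, below is a minimal, illustrative sketch (not the exact architecture in this repository) of how SRL-guided retrospective-prospective frame selection can be expressed as cross-attention between SRL argument embeddings and frame features:

```python
import torch
import torch.nn as nn

class SRLGuidedFrameSelector(nn.Module):
    """Illustrative only: matches each SRL argument of the question (agent, verb,
    patient, ...) against frame features and picks the frame to focus on next."""

    def __init__(self, hidden_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, srl_arg_embeds, frame_feats):
        # srl_arg_embeds: (batch, num_args, hidden)   -- one vector per SRL argument
        # frame_feats:    (batch, num_frames, hidden) -- one vector per sampled frame
        attended, attn_weights = self.cross_attn(srl_arg_embeds, frame_feats, frame_feats)
        # The most-attended frame per argument is the one the reasoning process
        # "moves to" next: a smaller index than the current frame is a retrospective
        # step, a larger index a prospective one.
        next_frame_idx = attn_weights.argmax(dim=-1)  # (batch, num_args)
        return attended, next_frame_idx
```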
Please download the TrafficQA dataset (videos and the corresponding annotations) from this link: https://sutdcv.github.io/SUTD-TrafficQA/#/download and then move the files under the data/ directory.
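After downloading, the layout should look roughly like the following (the folder names below are illustrative; keep whatever names the official download uses):

```
data/
├── annotations/   # TrafficQA question/answer annotation files
└── videos/        # TrafficQA video clips
```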
Please use data_preprocess.py to extract frames from the videos in the TrafficQA dataset and to tokenize the annotation data into a tensor dataset.
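For reference, a minimal sketch of frame extraction with MoviePy (the actual data_preprocess.py may sample frames differently):

```python
from pathlib import Path
from moviepy.editor import VideoFileClip  # in MoviePy >= 2.0: from moviepy import VideoFileClip

def extract_frames(video_path: str, out_dir: str, n_frames: int = 10) -> None:
    """Sample n_frames evenly spaced frames from a video and save them as JPEGs."""
    clip = VideoFileClip(video_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(n_frames):
        # Evenly spaced timestamps over the clip's duration.
        t = clip.duration * i / n_frames
        clip.save_frame(str(out / f"frame_{i:02d}.jpg"), t=t)
    clip.close()
```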
Please use the following script to train and evaluate the VideoQA system:
python run_traffic_qa.py --do_train --do_eval --num_train_epochs 2 --n_frames 10 --eval_n_frames 10 --learning_rate 5e-6 --train_batch_size 8 --eval_batch_size 16 --attention_heads 8 --eval_steps 5000

Once the model is trained, you can use it for VideoQA tasks. Provide a video, and the system will give the most probable answer based on the video. 🔎
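If you already have a trained model and only want to re-run evaluation, the same script can be invoked without the training flag (this reuses only the options shown above; checkpoint-loading options, if any, depend on your setup):

```
python run_traffic_qa.py --do_eval --eval_n_frames 10 --eval_batch_size 16 --attention_heads 8
```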
- Python (>=3.8) 🐍
- PyTorch (>=2.0) 🔥
- MoviePy 🧮
- ffmpeg 🐼
Please make sure to install the required dependencies before running the code. ⚙️
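For example, under a standard pip-based setup (ffmpeg is a system-level tool and is usually installed via your OS package manager or conda rather than pip):

```
pip install "torch>=2.0" moviepy
```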
Please cite our paper using the BibTeX entry below if you find it useful:
@article{lyu2023semantic,
  title={Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering},
  author={Lyu, Chenyang and Ji, Tianbo and Graham, Yvette and Foster, Jennifer},
  journal={arXiv preprint arXiv:2305.08059},
  year={2023}
}