2022, Mathematical Statistician and Engineering Applications
Video captioning refers to the process of predicting a semantically consistent textual description from a given video clip. Although a significant amount of research work exists for video captioning in English, the field is nearly unexplored for Bengali. Therefore, this research aims at generating Bengali captions that plausibly describe the gist of a specific short video. To accomplish this, a Long Short-Term Memory (LSTM) based sequence-to-sequence model is used that takes video frame features as input and generates an analogous textual description. In this study, the Microsoft Research Video Description Corpus (MSVD), an English dataset, is used; therefore, a deep learning-based translator and manual labor are used to convert the English captions into appropriate Bengali ones. Finally, the model's performance is evaluated using the popular evaluation metrics BLEU and TER. The proposed approach achieves BLEU and TER scores of 0.38 and 0.76 respectively, establishing a new benchmark for the Bengali video captioning task.
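The sequence-to-sequence setup described above (an encoder LSTM that folds per-frame features into a state, and a decoder LSTM that emits one word at a time) can be sketched in miniature. The following NumPy sketch uses random, untrained weights and a toy vocabulary; it is illustrative only, not the authors' implementation, and every dimension and name here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_HID, V = 4, 8, 6                       # frame-feature dim, hidden size, toy vocab size
VOCAB = ["<bos>", "<eos>", "a", "man", "is", "running"]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell update; gate order: input, forget, output, candidate."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def make_params(d_in, d_hid):
    # Random, untrained parameters: input weights, recurrent weights, bias.
    return (rng.normal(0, 0.1, (4 * d_hid, d_in)),
            rng.normal(0, 0.1, (4 * d_hid, d_hid)),
            np.zeros(4 * d_hid))

enc = make_params(D_FEAT, D_HID)                 # encoder reads frame features
dec = make_params(V, D_HID)                      # decoder reads one-hot of last word
W_out = rng.normal(0, 0.1, (V, D_HID))           # hidden state -> vocab logits

def caption(frames, max_len=5):
    # Encoder: fold the frame-feature sequence into a final (h, c) state.
    h = c = np.zeros(D_HID)
    for f in frames:
        h, c = lstm_step(f, h, c, *enc)
    # Decoder: greedy decoding, feeding back the one-hot of the previous word.
    word, out = 0, []                            # start from <bos>
    for _ in range(max_len):
        x = np.eye(V)[word]
        h, c = lstm_step(x, h, c, *dec)
        word = int(np.argmax(W_out @ h))
        if VOCAB[word] == "<eos>":
            break
        out.append(VOCAB[word])
    return out

frames = rng.normal(size=(10, D_FEAT))           # 10 frames of toy features
print(caption(frames))
```

With trained weights, the same loop structure produces real captions; here the untrained model simply emits some sequence of vocabulary words.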
arXiv (Cornell University), 2023
This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-many mapping of video data to text captions. The many-to-many mapping occurs via an input temporal sequence of video frames to an output sequence of words that forms a caption sentence. Data preprocessing, model construction, and model training are discussed. Caption correctness is evaluated using 2-gram BLEU scores across the different splits of the dataset. Specific examples of output captions are shown to demonstrate model generality over the video temporal dimension. Predicted captions are shown to generalize over video action, even in instances where the video scene changes dramatically. Model architecture changes are discussed to improve sentence grammar and correctness.
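A cumulative 2-gram BLEU score, as used for evaluation above, is the geometric mean of unigram and bigram precision multiplied by a brevity penalty. A minimal single-reference, unsmoothed version (an illustrative sketch, not the paper's evaluation code) can be written as:

```python
from collections import Counter
import math

def bleu2(candidate, reference):
    """Cumulative 2-gram BLEU for one candidate against one reference:
    geometric mean of 1- and 2-gram precision times a brevity penalty."""
    precisions = []
    for n in (1, 2):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())     # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0                               # no smoothing: any zero precision -> 0
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)

cand = "a man is riding a bike".split()
ref = "a man is riding a bicycle".split()
print(round(bleu2(cand, ref), 3))
```

Toolkits such as NLTK provide smoothed, multi-reference variants; the sketch above only shows the core precision/brevity-penalty arithmetic.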
Procedia Computer Science
Automatic image caption generation aims to produce an accurate description of an image in natural language automatically. However, Bangla, the fifth most widely spoken language in the world, lags considerably in research and development in this domain. Besides, while there are many established datasets for image annotation in English, no such resource yet exists for Bangla. Hence, this paper outlines the development of "Chittron", an automatic image captioning system in Bangla. Moreover, to address the dataset availability issue, a collection of 16,000 Bangladeshi contextual images has been accumulated and manually annotated in Bangla. This dataset is then used to train a model which integrates a pre-trained VGG16 image embedding model with stacked LSTM layers. The model is trained to predict the caption from an input image, one word at a time. The results show that the model has successfully learned a working language model and generates captions quite accurately in many cases. The results are evaluated mainly qualitatively; however, BLEU scores are also reported. It is expected that a better result can be obtained with a bigger and more varied dataset.
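Training a captioner to predict one word at a time typically means expanding each caption into (prefix, next-word) supervision pairs. A minimal sketch of that expansion follows; the `<bos>`/`<eos>` markers are conventional assumptions, and this is not the Chittron pipeline itself:

```python
def make_training_pairs(caption_tokens):
    """Expand one caption into (prefix, next-word) supervision pairs, as used
    when a model is trained to predict the caption one word at a time."""
    tokens = ["<bos>"] + caption_tokens + ["<eos>"]
    # Each prefix tokens[:i] supervises the prediction of tokens[i].
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = make_training_pairs(["a", "dog", "runs"])
for prefix, target in pairs:
    print(prefix, "->", target)
```

At training time each prefix is paired with the image features so the model learns p(next word | image, prefix); at inference the same model is unrolled from `<bos>` until it emits `<eos>`.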
International Journal of Advanced Computer Science and Applications, 2020
Visually impaired individuals face many difficulties in their daily lives. In this study, a video captioning system has been developed for visually impaired individuals to analyze events through real-time images and express them in meaningful sentences. It aims to better understand the problems these individuals experience in their daily lives. For this reason, the opinions and suggestions of disabled individuals within the Altınokta Blind Association (a Turkish organization of blind people) have been collected to produce more realistic solutions to their problems. In this study, MSVD, which consists of 1970 YouTube clips, has been used as the training dataset. First, all clips were muted so that audio was not used in the sentence extraction process. CNN and LSTM architectures have been used to create sentences, and experimental results have been compared using BLEU-4, ROUGE-L, CIDEr, and METEOR.
ArXiv, 2021
Image captioning using an Encoder-Decoder approach, where a CNN serves as the encoder and a sequence generator such as an RNN as the decoder, has proven to be very effective. However, this method has the drawback that the sequence must be processed in order. To overcome this drawback, some researchers have utilized the Transformer model to generate captions from images using English datasets; however, none of them generated captions in Bengali using the Transformer model. As a result, we utilized three different Bengali datasets to generate Bengali captions from images using the Transformer model. Additionally, we compared the performance of the Transformer-based model with a visual attention-based Encoder-Decoder approach. Finally, we compared the results of the Transformer-based model with other models that employed different Bengali image captioning datasets.
Neural networks and deep learning have seen an upsurge of research in the past decade due to their improved results. Generating text from a given image is a crucial task that requires combining two sectors, computer vision and natural language processing, in order to understand an image and represent it using natural language. However, existing works have all been done in a particular lingual domain and on the same sets of data, which leads to the resulting systems performing poorly on images that belong to other locales' geographical contexts. TextMage is a system that is capable of understanding visual scenes belonging to the Bangladeshi geographical context and using its knowledge to represent what it understands in Bengali. Hence, we have trained a model on our previously developed and published dataset named BanglaLekhaImageCaptions. This dataset contains 9,154 images along with two annotations for each image. In order to assess performance, the prop...
Journal of Intelligent & Fuzzy Systems, 2019
Understanding the context of an input image and generating a textual description of it is an active and challenging research topic in computer vision and natural language processing. However, in the case of the Bengali language, the problem is still unexplored. In this paper, we address a standard approach for Bengali image caption generation through subsampling the machine-translated dataset. Later, we use several pre-processing techniques with state-of-the-art CNN-LSTM architecture-based models. The experiment is conducted on the standard Flickr-8K dataset, with several modifications applied to adapt it to the Bengali language. The subsampled training-caption dataset is computed for both Bengali and English for further experiments, with 16 distinct models developed in the entire training process. The trained models for both languages are analyzed with respect to several caption evaluation metrics. Further, we establish a baseline performance in Bengali image captioning, defining the limitations of current word embedding approaches compared to internal local embedding.
International Journal of Advanced Computer Science and Applications
Automatic caption generation from images has become an active research topic in the fields of Computer Vision (CV) and Natural Language Processing (NLP). Machine-generated image captions play a vital role for visually impaired people: converting the caption to speech gives them a better understanding of their surroundings. Though a significant amount of research has been conducted on automatic caption generation in other languages, far too little effort has been devoted to Bangla image caption generation. In this paper, we propose an encoder-decoder based model which takes an image as input and generates the corresponding Bangla caption as output. The encoder network consists of a pretrained image feature extractor called ResNet-50, while the decoder network consists of Bidirectional LSTMs for caption generation. The model has been trained and evaluated using a Bangla image captioning dataset named BanglaLekhaImageCaptions. The proposed model achieved a training accuracy of 91% and BLEU-1, BLEU-2, BLEU-3, BLEU-4 scores of 0.81, 0.67, 0.57, and 0.51 respectively. Moreover, a comparative study of different pretrained feature extractors such as VGG-16 and Xception is presented. Finally, the proposed model has been deployed on an embedded device for analysing the inference time and power consumption.
2020 23rd International Conference on Computer and Information Technology (ICCIT), 2020
There is little research on the linguistic characteristics of the Bengali language. Bengali is spoken by about 193 million people globally and is one of the top ten spoken languages worldwide. In this paper, a CNN and Bidirectional GRU architecture is proposed for producing a natural language caption from an image in the Bengali language. Bangladeshi people may use this study to understand one another better, break language barriers, and increase their cultural understanding. This study would also immensely help many blind people in their daily lives. The encoder-decoder approach was used in this paper for captioning. We used a pre-trained deep CNN, InceptionV3, as the image encoder to interpret, identify, and annotate the dataset's images, and a Bidirectional GRU architecture as the decoder to produce captions. In order to deliver the finest and most subtle Bengali captions from our model, argmax search and beam search are included. We proposed a new dataset named BNATURE that contain...
International Journal of Electrical and Computer Engineering (IJECE), 2022
With the development of today's society, demand for applications using digital cameras grows year by year. However, analyzing large amounts of video data is one of the most challenging issues: in addition to storing the data captured by the camera, intelligent systems are required to quickly analyze the data and react to important situations. In this paper, we use deep learning techniques to build automatic models that describe movements in video. To solve the problem, we use three deep learning models: a sequence-to-sequence model based on a recurrent neural network, a sequence-to-sequence model with attention, and a Transformer model. We evaluate the effectiveness of the approaches based on the results of the three models. To train these models, we use the Microsoft Research Video Description Corpus (MSVD) dataset, including 1970 videos and 85,550 captions translated into Vietnamese. In order to ensure the quality of the descriptions in Vietnamese, we also combine the models with a natural language processing (NLP) model for Vietnamese.
International Journal of Advanced Computer Science and Applications (IJACSA), 2023
Image captioning has become a crucial aspect of contemporary artificial intelligence because it tackles two crucial parts of the AI field: Computer Vision and Natural Language Processing. Currently, Bangla stands as the seventh most widely spoken language globally; because of this, Bangla image captioning has gained recognition for its significant research accomplishments. Many established datasets exist in English, but there is no standard dataset in Bangla. For our research, we have used the BAN-Cap dataset, which contains 8091 images with 40455 sentences. Many effective encoder-decoder and visual attention approaches are used for image captioning, where a CNN is utilized as the encoder and an RNN as the decoder. In this study, however, we propose a transformer-based image captioning model with different pre-trained image feature extraction models, namely ResNet50, InceptionV3, and VGG16, on the BAN-Cap dataset, evaluate its efficiency and accuracy using several performance metrics such as BLEU, METEOR, ROUGE, and CIDEr, and identify the drawbacks of other models.
THE 2ND UNIVERSITAS LAMPUNG INTERNATIONAL CONFERENCE ON SCIENCE, TECHNOLOGY, AND ENVIRONMENT (ULICoSTE) 2021
Video captioning is the process of automatically generating a text description of a given video. It is a type of sequence-to-sequence translation that should consider the spatial and temporal features of the input video data. Recurrent Neural Networks arranged in an encoder-decoder architecture are generally used for this type of problem. Models based on Recurrent Neural Networks suffer from high computational cost, and their output generation depends on the final hidden state vector of the encoder, which cannot represent the entire video effectively. Incorporating an attention mechanism in the video captioning process can improve the efficiency of caption generation. Nowadays focus has shifted from RNN-based networks to the Transformer for the video captioning problem. This paper introduces a Transformer-based network architecture over LSTM-based models for captioning video. This architecture is generally used in language translation models. The Transformer network contains multi-head self-attention and encoder-decoder attention to improve the performance of text description generation. The model shows BLEU scores of 42.4 on MSR-VTT and 53.2 on MSVD, which is better than state-of-the-art models.
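The multi-head self-attention that the Transformer relies on is scaled dot-product attention computed per head and concatenated. A compact NumPy sketch with toy dimensions follows; it is illustrative only, and the projection matrices and sizes are assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention over a sequence X of shape (T, d)."""
    T, d = X.shape
    dh = d // n_heads                                  # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        q = Q[:, h * dh:(h + 1) * dh]
        k = K[:, h * dh:(h + 1) * dh]
        v = V[:, h * dh:(h + 1) * dh]
        scores = q @ k.T / np.sqrt(dh)                 # (T, T): every position attends to all others
        heads.append(softmax(scores) @ v)
    return np.concatenate(heads, axis=1) @ Wo          # merge heads, project back to d

rng = np.random.default_rng(1)
T, d, H = 5, 8, 2                                      # 5 positions, model dim 8, 2 heads
X = rng.normal(size=(T, d))
Wq, Wk, Wv, Wo = (rng.normal(0, 0.1, (d, d)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, H)
print(out.shape)
```

Because every position attends to every other position in one step, the sequence need not be processed in order, which is the advantage over RNNs that the abstracts above describe.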
SN Computer Science
Video captioning is the automated generation of natural language phrases that explain the contents of video frames. Because of the incomparable performance of deep learning in the fields of computer vision and natural language processing, research in this field has increased exponentially over the past decade. Numerous approaches, datasets, and measurement metrics have been introduced in the literature, calling for a systematic survey to guide research efforts in this exciting direction. Through statistical analysis, this survey paper focuses mostly on state-of-the-art approaches, emphasizing deep learning models, assessing benchmark datasets on several parameters, and classifying the pros and cons of the various evaluation metrics based on previous works in the deep learning field. This survey shows the most used variants of neural networks for visual and spatio-temporal feature extraction as well as for language generation. The results show that ResNet and VGG are the most used visual feature extractors and the 3D convolutional neural network the most used spatio-temporal feature extractor. Besides that, Long Short-Term Memory (LSTM) has mainly been used as the language model, though nowadays the Gated Recurrent Unit (GRU) and the Transformer are slowly replacing it. Regarding dataset usage, MSVD and MSR-VTT have so far been dominant due to their part in outstanding results among various captioning models. From 2015 to 2020, across all major datasets, models such as Inception-ResNet-v2 + C3D + LSTM, ResNet-101 + I3D + Transformer, and ResNet-152 + ResNext-101 (R3D) + (LSTM, GAN) have achieved by far the best results in video captioning. Despite rapid advancement, our survey reveals that video captioning research still has a long way to go in accessing the full potential of deep learning for classifying and captioning a large number of activities, as well as in creating large datasets covering diversified training video samples.
2015
In this paper, we describe the system for generating textual descriptions of short video clips using recurrent neural networks (RNN), which we used while participating in the Large Scale Movie Description Challenge 2015 in ICCV 2015. Our work builds on static image captioning systems with RNN based language models and extends this framework to videos utilizing both static image features and video-specific features. In addition, we study the usefulness of visual content classifiers as a source of additional information for caption generation. With experimental results we show that utilizing keyframe based features, dense trajectory video features and content classifier outputs together gives better performance than any one of them individually.
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Researchers in the fields of computer vision and natural language processing have in recent years been concentrating their efforts on automatically developing natural language descriptions for videos. Although video comprehension has a variety of applications, such as video retrieval and indexing, video captioning is a difficult topic to master due to the complex and diverse nature of video content. Understanding the relationship between video content and natural language sentences, on the other hand, is still a work in progress, and several approaches for improved video analysis are being developed. Because of their superior performance and high-speed computing capabilities, deep learning approaches have become the focus of video processing. This research presents an end-to-end deep learning based encoder-decoder network for creating natural language descriptions of video sequences. The use of a CNN-RNN model paired with beam search to generate captions for the MSVD dataset is explored in this study, and we compare the results of the beam search and greedy search approaches. The captions generated by this model are often grammatically incorrect; our paper focuses on improving those grammatical errors using an encoder-decoder model. Grammatical errors include spelling mistakes; incorrect use of articles, prepositions, pronouns, and nouns; and poor sentence construction. Using beam search with k=3, the captions generated by our algorithm achieve a BLEU score of 0.72. After passing the generated captions through a grammar error correction mechanism, the result improves to a BLEU score of 0.76, an increase of 5.55%. The BLEU score decreases as the value of k decreases, but so does the time it takes to generate captions.
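Beam search, as compared against greedy search above, keeps the k highest-scoring partial captions at each step instead of committing to the single best next word. A minimal sketch over a hypothetical toy next-word model follows (illustrative only, not the paper's decoder):

```python
import math

def beam_search(next_log_probs, k, max_len, bos="<bos>", eos="<eos>"):
    """Keep the k highest-scoring partial captions at each decoding step."""
    beams = [([bos], 0.0)]                            # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                candidates.append((seq, score))       # finished beam carries over unchanged
                continue
            for word, lp in next_log_probs(seq).items():
                candidates.append((seq + [word], score + lp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]

# Toy "model": fixed next-word distributions keyed on the last word only.
table = {
    "<bos>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a":     {"man": math.log(0.5), "dog": math.log(0.5)},
    "the":   {"man": math.log(0.9), "dog": math.log(0.1)},
    "man":   {"runs": math.log(1.0)},
    "dog":   {"runs": math.log(1.0)},
    "runs":  {"<eos>": math.log(1.0)},
}
print(beam_search(lambda seq: table[seq[-1]], k=3, max_len=5))
```

On this toy model, greedy decoding (k=1) commits to "a" (probability 0.6) and ends with joint probability 0.30, while beam search with k=3 recovers "the man runs" with joint probability 0.36, illustrating why wider beams can yield better captions at the cost of more decoding time.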
2021
Because the amount of video data increases each day, the need for automatically generating clear captions for it is inevitable. Video captioning makes video more accessible in numerous ways: it allows deaf and hard-of-hearing individuals to watch videos, helps people focus on and remember information more easily, and lets people watch in sound-sensitive environments. Video captioning refers to the task of generating a natural language sentence that explains the content of the input video clips. Events are temporally localized in the video with independent start and end times; at the same time, some events may occur concurrently and overlap in time. Classifying the events into present, past, and future, as well as separating them based on their start and end times, helps identify the order of events. Hence the proposed work develops a captioning system that clearly explains each visual feature present in the image conceptually. The Blended-LSTM (Bl-LSTM) model, with the help of an Xception-based Convolutional Neural Network (CNN) and the Fusion Visual Captioning (FVC) system, achieves this with a BLEU score of 75.9%.
2021
Recent advancements in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has done a lot of work on it in common languages like English. Urdu is the national language of Pakistan and is also widely spoken and understood in the subcontinent region of Pakistan and India, yet no work has been done on Urdu caption generation. Our research aims to fill this gap by developing an attention-based deep learning model using sequence modeling techniques specialized for the Urdu language. We have prepared a dataset in Urdu by translating a subset of the "Flickr8k" dataset containing 700 'man' images. We evaluate our proposed technique on this dataset and show that it can achieve a BLEU score of 0.83 in the Urdu language. We improve on the previous state-of-the-art by using better CNN architectures a...
International Journal of Next-Generation Computing, 2021
Video captioning is the process of creating a natural language sentence that summarises the video's contents automatically. Modeling the video's effective temporal composition and effectively integrating that information into a plain language description are both required. It has a variety of applications, including assisting the visually impaired, video subtitling, and video surveillance, among others. Due to the advancement of deep learning in computer vision and natural language processing, there has been a surge in study in this area in recent years. Video captioning is the result of combining these two worlds of computer vision and natural language processing. In this study, we examine and analyse various strategies for addressing this issue, as well as benchmark datasets in terms of domains, repository size, and number of classes; and identify the benefits and drawbacks of various evaluation metrics such as BLEU, METEOR, CIDEr, SPICE, and ROUGE.
2021
Automatic image captioning is the ongoing effort of creating syntactically valid, accurate textual descriptions of an image in natural language with context. The encoder-decoder structure used throughout existing Bengali Image Captioning (BIC) research utilized abstract image feature vectors as the encoder's input. We propose a novel transformer-based architecture with an attention mechanism, using a pre-trained ResNet-101 image encoder for feature extraction. Experiments demonstrate that the language decoder in our technique captures fine-grained information in the caption and, paired with image features, produces accurate and diverse captions on the BanglaLekhaImageCaptions dataset. Our approach outperforms all existing Bengali Image Captioning work and sets a new benchmark by scoring 0.694 on BLEU-1, 0.630 on BLEU-2, 0.582 on BLEU-3, and 0.337 on METEOR.
ArXiv, 2020
In this work, we have introduced Gaussian Smoothen Semantic Features (GSSF) for better semantic selection in Indian regional-language image captioning, along with a procedure that uses existing translations and English crowd-sourced sentences for training. We have shown that this architecture is a promising alternative where resources are scarce. The main contribution of this work is the development of deep learning architectures for the Bengali language (the fifth most widely spoken language in the world), which has completely different grammar and language attributes. We have shown that these work well for complex applications like language generation from image contexts and can diversify the representation by introducing constraints, more extensive features, and unique feature spaces. We also established that we could achieve absolute precision and diversity when we use a smoothened semantic tensor with the traditional LSTM and feature decom...
IOS Press eBooks, 2022
Image captioning has gained a tremendous spotlight in recent years; however, most captioning models generate captions in English. In this paper, we present an image caption generator for the regional language Hindi using ResNet50 and LSTM with an attention module. An experimental study highlights the effect of attention-based learning on the generated Hindi captions. The Flickr8k dataset in Hindi is used to validate the performance of the proposed work in terms of BLEU score.