2023, IRJET
Video prediction aims to generate future frames from given past frames. It is one of the fundamental tasks in computer vision and machine learning, has attracted many researchers, and various methods have been proposed to address it. However, most of them focus on increasing performance while ignoring memory footprint and computation cost. In this paper, we propose a lightweight yet efficient network for video prediction. Inspired by depthwise and pointwise convolution in the image domain, we introduce a 3D depthwise and pointwise convolutional neural network for video prediction. Experimental results show that our proposed framework outperforms state-of-the-art methods in terms of PSNR, SSIM, and LPIPS on standard datasets such as KTH, KITTI, and BAIR.
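The efficiency gain behind the depthwise-plus-pointwise factorization in this abstract comes down to simple parameter counting. The sketch below compares a standard 3D convolution against its depthwise + pointwise factorization; the channel and kernel sizes are illustrative choices, not values from the paper.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    # Standard 3D convolution: every output channel mixes all input
    # channels over the full kt x kh x kw window.
    return c_in * c_out * kt * kh * kw

def depthwise_pointwise3d_params(c_in, c_out, kt, kh, kw):
    # Depthwise: one kt x kh x kw filter per input channel.
    # Pointwise: a 1x1x1 convolution that mixes the channels.
    return c_in * kt * kh * kw + c_in * c_out

# Illustrative layer: 64 -> 64 channels, 3x3x3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
separable = depthwise_pointwise3d_params(64, 64, 3, 3, 3)
reduction = full / separable
```

For this layer the factorization needs roughly 19x fewer parameters, which is why the same trick shrinks both model size and FLOPs.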
IEEE Access, 2024
Video prediction is an essential vision task due to its wide applications in real-world scenarios. However, it is challenging due to the inherent uncertainty and complex spatiotemporal dynamics of video content. Several state-of-the-art deep learning methods have achieved superior video prediction accuracy at the expense of huge computational cost, so they are not suitable for devices with limited memory and computational resources. In light of Green Artificial Intelligence (AI), more environmentally friendly deep learning solutions are desired to tackle the problem of large models and high computational cost. In this work, we propose a novel video prediction network, 3DTransLSTM, which adopts a hybrid transformer-long short-term memory (LSTM) structure to inherit the merits of both self-attention and recurrence. Three-dimensional (3D) depthwise separable convolutions are used in this hybrid structure to extract spatiotemporal features while enhancing model efficiency. We conducted experimental studies on four popular video prediction datasets. Compared to existing methods, our proposed 3DTransLSTM achieved competitive frame prediction accuracy with significantly reduced model size, trainable parameters, and computational complexity. Moreover, we demonstrate the generalization ability of the proposed model by testing it on a dataset completely unseen during training.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it has demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review of deep learning methods for prediction in video sequences. We first define the video prediction fundamentals, the necessary background concepts, and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and significance in the field. The summary of the datasets and methods is accompanied by experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper concludes by drawing some general conclusions, identifying open research challenges, and pointing out future research directions.
2020
Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera. On this benchmark (training: 41k images from the KITTI dataset; testing: the Caltech Pedestrian dataset), we achieve, to our knowledge, the best performance to date as measured by the Structural Similarity Index (SSIM). On two other common measures, MSE and PSNR, the model ranked third and fourth, respectively. Performance was further improved when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully base...
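The core idea of the Rao and Ballard (1999) scheme mentioned above is that a latent representation is refined by the error between its prediction and the observed input. The toy loop below is a minimal sketch of that general principle, not of the PreCNet architecture itself; the generative matrix, learning rate, and step count are all illustrative.

```python
def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def predictive_coding(x, U, steps=2000, lr=0.02):
    # Latent r predicts the input as U @ r; the prediction error is
    # propagated back (via U^T) to update r, as in Rao & Ballard's
    # error-driven inference loop.
    r = [0.0] * len(U[0])
    Ut = transpose(U)
    for _ in range(steps):
        pred = matvec(U, r)
        err = [xi - pi for xi, pi in zip(x, pred)]   # prediction error
        grad = matvec(Ut, err)                       # error fed back up
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r

# Recover a known latent cause from the observation it generates.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_r = [0.5, -1.5]
x = matvec(U, true_r)
r_hat = predictive_coding(x, U)
```

After enough iterations the inferred latent matches the cause that generated the observation, which is the behaviour deep predictive-coding networks scale up with learned, hierarchical generative mappings.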
Iraqi Journal for Electrical and Electronic Engineering/Al-maǧallaẗ al-ʻirāqiyyaẗ al-handasaẗ al-kahrabāʼiyyaẗ wa-al-ilikttrūniyyaẗ, 2024
Video prediction methods have progressed quickly, especially after the revolution in deep learning. Prediction architectures based on pixel generation produce blurry forecasts, but they are preferred in many applications because the model operates on frames only and does not need supporting information such as segmentation or flow maps, which would make obtaining a suitable dataset very difficult. In this approach, we present a novel end-to-end video forecasting framework that predicts the dynamic relationship between pixels in time and space. A 3D CNN encoder is used to estimate the dynamic motion, while the decoder reconstructs the next frame, aided by a 3D CNN ConvLSTM2D placed in the skip connection. This novel use of the skip connection plays an important role in reducing blur in the predicted frames and preserving spatial and dynamic information, which increases the accuracy of the whole model. KITTI and Cityscapes are used for training and Caltech for inference. The proposed framework achieves better quality with PSNR = 33.14, MSE = 0.00101, SSIM = 0.924, and a small number of parameters (2.3 M).
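Several abstracts on this page report prediction quality as PSNR alongside MSE. For reference, this is the standard relationship between the two (the exact dB value also depends on the pixel-range convention, which these abstracts do not state):

```python
import math

def psnr(mse, max_val=1.0):
    # Peak signal-to-noise ratio (in dB) from mean squared error,
    # for pixel values normalised to [0, max_val].
    return 10.0 * math.log10((max_val ** 2) / mse)

# Example: an MSE of 0.01 on [0, 1] pixels corresponds to 20 dB.
quality_db = psnr(0.01)
```

Higher PSNR means lower reconstruction error; each factor-of-10 reduction in MSE adds 10 dB.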
International Journal of Recent Contributions from Engineering, Science & IT (iJES)
Deep neural networks are becoming central in several areas of computer vision. While there have been many studies on the classification of images and videos, future frame prediction is still a rarely investigated approach, even though some applications could make good use of knowledge about the next frame of an image sequence in pixel space. Examples include video compression and autonomous agents in robotics that have to act in natural environments. Learning how to forecast the future of an image sequence requires the system to understand and efficiently encode the content and dynamics for a certain period. It is viewed as a promising avenue from which even supervised tasks could benefit, since labeled video data is limited and hard to obtain. Therefore, this work gives an overview of scientific advances covering future frame prediction and proposes a recurrent network model which utilizes recent techniques from deep learning research. The presented architecture is b...
2020
From Video Classification to Video Prediction: Deep Learning Approaches to Video Modelling by Hehe Fan Intelligent systems need the ability to understand not only the static scenes around them but also the dynamic changes in the environment. In order to understand the dynamic changes, intelligent systems are expected to model spatiotemporal sequences, i.e., videos. Learning video representations and predicting future movements constitute two fundamental missions of video modelling. This dissertation presents two works on video classification. In the first work, based on video shot representations, which are extracted by convolutional neural networks, a selective multi-instance learning method is proposed to automatically detect whether an event of interest happens in temporally untrimmed videos which usually consist of multiple video shots. This work can be seen as a binary video classification problem. The second work focuses on efficiency, which aims to reduce the computational co...
HAL (Le Centre pour la Communication Scientifique Directe), 2021
The use of recurrent neural networks has achieved impressive results in various applications such as video prediction, which has become a promising direction of scientific research. In this paper, we introduce a novel algorithm for video prediction called the "Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM) Algorithm" that outperforms state-of-the-art approaches. Robust-ST-ConvLSTM is a memory-flow algorithm based on a higher-order ConvLSTM. The memory flow holds the spatiotemporal information to optimize and control the prediction abilities of the ConvLSTM cell. Our approach is developed in the specific context of predicting future frames from historical observations, and we experimentally validate the proposed algorithm on two spatiotemporal datasets: a moving variant of the MNIST dataset of handwritten digits, and KTH, a human motion dataset.
Image Analysis and Processing - ICIAP 2017, 2017
There is an inherent need for autonomous cars, drones, and other robots to have a notion of how their environment behaves and to anticipate changes in the near future. In this work, we focus on anticipating future appearance given the current frame of a video. Existing work focuses on either predicting the future appearance as the next frame of a video, or predicting future motion as optical flow or motion trajectories starting from a single video frame. This work stretches the ability of CNNs (Convolutional Neural Networks) to predict an anticipation of appearance at an arbitrarily given future time, not necessarily the next video frame. We condition our predicted future appearance on a continuous time variable that allows us to anticipate future frames at a given temporal distance, directly from the input video frame. We show that CNNs can learn an intrinsic representation of typical appearance changes over time and successfully generate realistic predictions at a deliberate time difference in the near future.
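One common way to condition a CNN on a continuous time variable, as this abstract describes, is to broadcast the scalar t into an extra input channel alongside the frame. This is our assumption for illustration; the paper's exact conditioning mechanism may differ.

```python
def time_conditioned_input(frame, t):
    # frame: list of C channels, each an H x W grid (list of row lists).
    # t: scalar temporal distance at which to anticipate appearance.
    h = len(frame[0])
    w = len(frame[0][0])
    t_channel = [[t] * w for _ in range(h)]  # constant plane filled with t
    return frame + [t_channel]               # shape becomes (C + 1, H, W)

# A dummy 3-channel 4x4 frame, conditioned on t = 0.75.
rgb = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
conditioned = time_conditioned_input(rgb, t=0.75)
```

Because t enters as an ordinary input channel, the same network can be queried at any temporal distance without retraining separate per-step models.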
Lecture notes in networks and systems, 2022
Recently, video prediction algorithms based on neural networks have become a promising research direction. Therefore, a new recurrent video prediction algorithm called "Robust Spatiotemporal Convolutional Long Short-Term Memory" (Robust-ST-ConvLSTM) is proposed in this paper. Robust-ST-ConvLSTM introduces a new internal mechanism that efficiently regulates the flow of spatiotemporal information from video signals based on a higher-order Convolutional LSTM. The spatiotemporal information is carried through the entire network to optimize and control the prediction potential of the ConvLSTM cell. In traditional ConvLSTM units, the cell state, which carries relevant information throughout the processing of the input sequence, is updated using only one previous hidden state, which holds information on the previous data unit already seen by the network. Our Robust-ST-ConvLSTM unit instead relies on N previous hidden states, which provide temporal context for the motion in video scenes, when updating the cell state. Experimental results further suggest that the proposed architecture can significantly improve on state-of-the-art video prediction methods on two challenging datasets: the standard Moving MNIST dataset and the commonly used KTH human motion dataset.
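The difference from a standard LSTM update described above is that the gates see a context built from N previous hidden states rather than just the last one. The scalar sketch below illustrates only that idea; the shared gate weights, the simple averaging of the history, and the absence of convolutions are our simplifications, not the paper's design.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def higher_order_lstm_step(x, hidden_history, c_prev, w_x=1.0, w_h=1.0):
    # hidden_history: the N most recent hidden states h_{t-1} .. h_{t-N},
    # aggregated here by a plain average to form the temporal context.
    h_ctx = sum(hidden_history) / len(hidden_history)
    z = w_x * x + w_h * h_ctx
    i = sigmoid(z)        # input gate
    f = sigmoid(z)        # forget gate
    g = math.tanh(z)      # candidate cell content
    o = sigmoid(z)        # output gate
    c = f * c_prev + i * g          # cell state sees the N-state context
    h = o * math.tanh(c)
    return h, c

# One step with a history of N = 3 hidden states.
h, c = higher_order_lstm_step(x=0.5, hidden_history=[0.1, 0.2, 0.3], c_prev=0.0)
```

A real ConvLSTM replaces the scalars with feature maps and the products with convolutions, and would learn separate weights per gate and per history position.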
2020 IEEE International Conference on Image Processing (ICIP)
Based on a previously published neural network-based video intra prediction approach, this paper proposes and evaluates several extensions of both the training process as well as the network architectures. In particular, the influence of coding artifacts in the training samples as well as the effect of using different approximations of the residual coding costs as loss functions are investigated. In addition, the architecture is optimized and extended by final deconvolutional layers. Combined with the use of network pruning, it was not only possible to increase the achieved compression gain in comparison to the previous work, but also to decrease the needed number of floating point operations per pixel by more than 72% at the same time.
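The abstract credits part of its FLOP reduction to network pruning. A generic sketch of the most common variant, global magnitude pruning, is shown below; the paper's actual pruning criterion is not specified here, so this is illustrative only.

```python
def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude fraction `sparsity` of the weights;
    # the surviving weights keep their original values.
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    if k == 0:
        return list(weights)
    threshold = flat[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune half of a small weight list: the two smallest magnitudes vanish.
pruned = magnitude_prune([0.05, -0.8, 0.3, -0.02], sparsity=0.5)
```

Zeroed weights can be skipped at inference time, which is how pruning translates into fewer floating point operations per pixel.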