Key-shots Based Video Summarization By
Applying Self-Attention Mechanism
PROJECT SYNOPSIS
BACHELOR OF ENGINEERING
Computer Engineering
SUBMITTED BY
Girish Mulmule - 41081
Gaurav Gavhane - 41079
Shivani Kale - 41080
Bhagyashree Vichare - 41082
Yogesh Kadam - 41083
Under the guidance of
Prof. Pradeep Patil
Department of Computer Engineering
P. E. S. Modern College of Engineering,
Pune.
2020-2021
Contents
1 Title
2 Domain
3 Keywords
4 Team
5 Literature Review
6 Objective
7 Problem Statement
8 Scope
9 Brief Description
10 Technical Details
11 Probable Date of Completion
12 References
List of Figures
1 Architecture Diagram
1 Title
Key-shots Based Video Summarization By Applying Self-Attention Mechanism
2 Domain
Machine Learning
3 Keywords
Bi-LSTM (Bi-Directional Long Short Term Memory),
TVSum (Title Based Video Summarization),
AVS (Attentive Video Summarization),
RNN (Recurrent Neural Networks),
OVP (Open Video Project),
SDLC (Software Development Life Cycle),
GAN (Generative Adversarial Network).
4 Team
Group Id: 20
Team Members:
1. Girish Mulmule - 41081
2. Gaurav Gavhane - 41079
3. Shivani Kale - 41080
4. Bhagyashree Vichare - 41082
5. Yogesh Kadam - 41083
5 Literature Review
Based on Zhong Ji, Kailin Xiong, Yanwei Pang, and Xuelong Li, “Video summarization with
attention-based encoder-decoder networks”, Tianjin University, Xi’an Institute of Optics and
Precision Mechanics, CoRR abs/1708.09545, 2018. [1]
Video is inundating Internet social platforms: more than 300 hours of video are uploaded to
YouTube every minute, and browsing these videos is extremely time consuming. It has therefore
become increasingly important to browse, manage, and retrieve videos efficiently. An ideal video
summary provides users the maximum information about the target video in the shortest time. It is
also useful for many other practical applications, such as video indexing, video retrieval, and
event detection. Its main goal is to produce a compact yet comprehensive summary that enables an
efficient browsing experience. The paper proposes a novel video summarization framework named
Attentive encoder-decoder networks for Video Summarization (AVS), in which the encoder uses a
Bidirectional Long Short-Term Memory (BiLSTM) network to encode the contextual information
among the input video frames. For the decoder, two attention-based LSTM networks are explored,
using additive and multiplicative attention scoring functions respectively. The results demonstrate
the superiority of the proposed AVS-based approaches over state-of-the-art approaches, with
remarkable improvements.
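To make the two decoder variants concrete, below is a minimal PyTorch sketch of the two attention
scoring schemes the paper names, additive (Bahdanau-style) and multiplicative (Luong-style). The
class names, dimensions, and interfaces here are illustrative assumptions for this synopsis, not the
authors' exact implementation.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    # Additive (Bahdanau-style) scoring: score_t = v^T tanh(W_h h_t + W_s s).
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_outputs, dec_state):
        # enc_outputs: (T, enc_dim) per-frame encodings; dec_state: (dec_dim,).
        scores = self.v(torch.tanh(self.W_h(enc_outputs) + self.W_s(dec_state)))
        return torch.softmax(scores.squeeze(-1), dim=0)  # weights over T frames

class MultiplicativeAttention(nn.Module):
    # Multiplicative (Luong-style) scoring: score_t = h_t^T W s.
    def __init__(self, enc_dim, dec_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, enc_dim, bias=False)

    def forward(self, enc_outputs, dec_state):
        scores = enc_outputs @ self.W(dec_state)  # (T,)
        return torch.softmax(scores, dim=0)

# Example: attend over 60 encoded frames with a 256-d decoder state.
weights = AdditiveAttention(512, 256, 128)(torch.randn(60, 512), torch.randn(256))

In both cases the weights sum to one over the frames, so the decoder's context vector is simply the
weighted sum of the encoder outputs.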
6 Objective
1. Digital videos are nowadays becoming more common in various fields such as education and
entertainment, owing to increased computational power and electronic storage capacity.
2. With the growing size of video collections, a technology is needed to browse through videos
effectively and efficiently without losing their important content.
3. Producing a short summary of a long video without compromising on these points is what
video summarization promises.
7 Problem Statement
To generate a short summary of the content of a longer video by selecting and presenting only
the most informative or essential highlights for potential users, by implementing key-shot based
video summarization using an attention-based mechanism.
8 Scope
1. To create a tool that generates a non-redundant, coherent summary, improving the efficiency
of browsing multimedia documents while reducing redundancy.
2. To achieve better accuracy and better efficiency than existing extractive methods.
9 Brief Description
Figure 1: Architecture Diagram
1. After the video is input, two forms of output are possible. The first is selected key-frames,
where the summarization result is a subset of isolated frames. The second is interval-based
key-shots, where the summary is a set of short intervals along the timeline. Instead of binary
selected/not-selected information, certain datasets provide frame-level importance scores
computed from human annotations; these scores represent the likelihood of each frame being
selected as part of the summary. Our models can make use of all three types of annotation as
learning signals: binary key-frame labels, binary sub-shot labels, and frame-level importance
scores. The selected key-frames are then separated from the unwanted footage and passed
through the encoder, as shown in Figure 1 (a sketch of key-shot selection from frame-level
scores appears after this list).
2. The encoder uses LSTM (Long Short-Term Memory) units. LSTMs are a special kind of
recurrent neural network that are adept at modelling long-range dependencies. At the core of an
LSTM are memory cells c which encode, at every time step, the knowledge of the inputs that
have been observed up to that step. The model is composed of two LSTM layers: one models
the video sequence in the forward direction and the other in the backward direction (see the
encoder sketch after this list).
3. In a common encoder-decoder framework, an encoder converts the input sequence X =
x1, x2, ..., xT into a representation vector V = v1, v2, ..., vT. The architecture of the encoder
depends on the input of the specific application. For instance, in image captioning, a
Convolutional Neural Network (CNN) is a good choice; in machine translation, it is natural
to use an RNN as the encoder, since the input is a variable-length sequence of symbols. When
applied to video summarization, the LSTM is the most suitable choice, since the contextual
information around a specific frame is necessary for generating a video summary. A human
relies on high-level semantic understanding of the video content: usually only after viewing
the whole sequence can she or he decide which frame or shot should be selected for the
summary. For example, when summarizing a basketball game video, only a key ball that
affects the course of the game should be selected for the summary. Since there are many
goals in a basketball game, it is necessary to consider the scenes before and after a goal to
determine whether it is a key ball. Software, however, is not as capable as the human brain
and must rely on various coherence factors to make such a selection.
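As a concrete illustration of step 1, the following sketch turns frame-level importance scores into
an interval-based key-shot summary. It assumes shot boundaries are already available (for example
from a separate shot segmentation step) and follows the common benchmark convention of a 0/1
knapsack under a 15% summary-length budget; the function name and parameters are hypothetical.

import numpy as np

def select_key_shots(frame_scores, shot_bounds, budget_ratio=0.15):
    # frame_scores: (T,) frame-level importance scores.
    # shot_bounds: list of (start, end) frame-index pairs, one per shot.
    # budget_ratio: fraction of the video length the summary may occupy.
    frame_scores = np.asarray(frame_scores)
    budget = int(budget_ratio * len(frame_scores))
    lengths = [end - start for start, end in shot_bounds]
    values = [frame_scores[start:end].mean() for start, end in shot_bounds]

    # Classic 0/1 knapsack over shots: maximize total importance
    # subject to the summary-length budget.
    n = len(shot_bounds)
    dp = np.zeros((n + 1, budget + 1))
    for i in range(1, n + 1):
        for w in range(budget + 1):
            dp[i][w] = dp[i - 1][w]
            if lengths[i - 1] <= w:
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - lengths[i - 1]] + values[i - 1])

    # Backtrack to recover which shots were chosen.
    chosen, w = [], budget
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            chosen.append(i - 1)
            w -= lengths[i - 1]
    return sorted(chosen)

# Example: pick shots from a 100-frame video split into three shots.
print(select_key_shots(np.random.rand(100), [(0, 40), (40, 70), (70, 100)]))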
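And for step 2, a minimal PyTorch sketch of the bidirectional LSTM encoder described above,
assuming hypothetical 1024-dimensional per-frame CNN features and a hidden size of 256:

import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    # Encodes a sequence of per-frame features in both temporal directions.
    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, frame_feats):
        # frame_feats: (batch, T, feat_dim), one feature vector per frame.
        # outputs: (batch, T, 2 * hidden_dim) -- the forward and backward
        # hidden states are concatenated, so each frame's encoding carries
        # context from both the past and the future of the video.
        outputs, _ = self.lstm(frame_feats)
        return outputs

# Example: encode a hypothetical 60-frame clip.
context = BiLSTMEncoder()(torch.randn(1, 60, 1024))  # shape: (1, 60, 512)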
10 Technical Details
Platform
1. Ubuntu 18.04
Software Specification
1. CUDA: 9.0.176
2. cuDNN: 7.1.2
3. Python: 3.5.2
4. PyTorch: 0.4.1
5. NumPy: 1.16.1
6. JSON: 2.0.9
11 Probable Date of Completion
12 References
[1] Zhong Ji, Kailin Xiong, Yanwei Pang, and Xuelong Li, “Video summarization with
attention-based encoder-decoder networks”, Tianjin University, Xi’an Institute of Optics and
Precision Mechanics, CoRR abs/1708.09545, 2018.
[2] Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman, “Video summarization with long
short-term memory”, in Proceedings of the European Conference on Computer Vision,
pp. 766–782, 2016.
[3] Behrooz Mahasseni, Michael Lam, and Sinisa Todorovic, “Unsupervised video summarization
with adversarial LSTM networks”, in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1–10, 2017.
[4] Michael Gygli, Helmut Grabner, and Luc Van Gool, “Video summarization by learning
submodular mixtures of objectives”, in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 3090–3098, 2015.
[5] Naveed Ejaz, Irfan Mehmood, and Sung Wook Baik, “Efficient visual attention based
framework for extracting key frames from videos”, Signal Processing: Image Communication,
vol. 28, no. 1, pp. 34–44, 2013.
[6] Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman, “Summary transfer: Exemplar-based
subset selection for video summarization”, in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1059–1067, 2016.