Digital Signal Processing Laboratory (EEE-316)
Image Captioning Using
CNN & LSTM
Uday Kamal Hasib Amin Rajib Al-Sabah Abhishek Shushil
Id : 1406041 Id : 1406045 Id : 1406035 Id : 1406034
Bangladesh University of Engineering and Technology (BUET)
Presentation Outline
● Problem Statement
● Basic building blocks for the network
- CNN
- Transfer Learning
- RNN
- LSTM
● How do we wire them together?
● Code
● Other places this can be implemented
● Interaction & Questions
Problem Overview
Overall Model:
Building Blocks for the Network:
CNN
A convolution layer is a feature detector that automatically learns to filter out irrelevant
information from the input using convolution kernels.
Pooling layers compute the max or average value of a particular feature over a region of
the input data (downsampling the input images). Pooling also helps detect objects in
unusual positions and reduces memory usage.
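The two operations above can be sketched in plain NumPy. This is a toy example, not the network's actual layers: the vertical-edge kernel and the half-dark image are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(feat, size=2):
    """Non-overlapping max pooling: keeps the strongest response per region."""
    h, w = feat.shape[0] // size, feat.shape[1] // size
    return feat[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# a vertical-edge kernel applied to a toy image: left half dark, right half bright
img = np.zeros((6, 6)); img[:, 3:] = 1.0
edge = np.array([[-1., 1.]])           # responds where intensity jumps
fmap = conv2d(img, edge)               # shape (6, 5), fires on the edge column
pooled = max_pool(fmap)                # shape (3, 2), edge response survives pooling
print(fmap.shape, pooled.shape)        # → (6, 5) (3, 2)
```

Note how the edge response survives pooling even though the feature map shrinks — this is the "detect objects in unusual positions, reduce memory" point from the slide.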
Building Blocks for the Network:
Transfer Learning
Building Blocks for the Network:
Inception V3
Building Blocks for the Network:
RNN
● As humans, we understand context
● We do not reset our understanding from scratch every time
● Our thoughts have persistence
● Traditional NNs like CNNs have no persistence
● Speech recognition, language modeling, and translation
all require this persistence
RNNs are general computers that can learn algorithms mapping input
sequences to output sequences (flexibly sized vectors). The output
vector's contents are influenced by the entire history of inputs.
Building Blocks for the Network:
LSTM
The LSTM units give the network memory cells with read, write, and reset
operations. During training, the network learns when it should remember data
and when it should throw it away.
Building Blocks for the Network:
LSTM
C_t is the cell state, which flows through the
entire chain.
Building Blocks for the Network:
LSTM
Forget Gate:
Concatenate
Building Blocks for the Network:
LSTM
Input Gate Layer
New contribution to cell state
Classic neuron
Building Blocks for the Network:
LSTM
Update Cell State (memory):
Building Blocks for the Network:
LSTM
Output Gate Layer
Output to next layer
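The four gate slides above correspond to the standard LSTM update equations. A minimal NumPy sketch of one cell step, with toy sizes and random weights standing in for trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
D, H = 4, 3                                   # toy input and hidden sizes
# one weight matrix per gate, each acting on [h_{t-1}, x_t] concatenated
Wf, Wi, Wc, Wo = (rng.normal(size=(D + H, H)) * 0.5 for _ in range(4))
bf = bi = bc = bo = np.zeros(H)

def lstm_step(h_prev, c_prev, x):
    z = np.concatenate([h_prev, x])           # concatenate (forget-gate slide)
    f = sigmoid(z @ Wf + bf)                  # forget gate: what to erase
    i = sigmoid(z @ Wi + bi)                  # input gate layer: what to write
    c_tilde = np.tanh(z @ Wc + bc)            # new contribution to cell state
    c = f * c_prev + i * c_tilde              # update cell state (memory)
    o = sigmoid(z @ Wo + bo)                  # output gate layer
    h = o * np.tanh(c)                        # output to next layer / next step
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):             # run a length-5 sequence
    h, c = lstm_step(h, c, x)
print(h, c)
```

The cell state c is only touched through the multiplicative f and i gates, which is what lets the network learn when to remember and when to forget.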
Building Blocks for the Network:
Word Embedding
Embeddings turn textual data (words, sentences, paragraphs) into
high-dimensional vector representations and group semantically similar
data together in a vector space. This lets a computer detect
similarities mathematically.
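A toy illustration of the idea. The 3-dimensional vectors below are made up for the example (real models learn hundreds of dimensions); cosine similarity is one common way to measure closeness in the vector space:

```python
import numpy as np

# hypothetical hand-picked embeddings; real ones are learned during training
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    # similarity of directions: near 1.0 = semantically close, near 0 = unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))   # high: semantically similar words
print(cosine(emb["king"], emb["apple"]))   # low: semantically distant words
```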
Final Model:
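How the pieces wire together can be sketched end to end: CNN image features initialise the decoder state, which then emits caption words greedily. This sketch uses random weights, a made-up 5-word vocabulary, and a vanilla-RNN step standing in for the LSTM, so it illustrates only the data flow, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<start>", "<end>", "a", "boy", "plays"]   # hypothetical tiny vocabulary
V, E, H, F = len(vocab), 8, 16, 10  # vocab, embedding, hidden, image-feature dims

# random weights standing in for trained parameters
We = rng.normal(size=(V, E))        # word embedding table
Wi = rng.normal(size=(F, H))        # image features -> initial hidden state
Wx = rng.normal(size=(E, H))        # embedding -> hidden
Wh = rng.normal(size=(H, H))        # hidden -> hidden (recurrence)
Wo = rng.normal(size=(H, V))        # hidden -> vocabulary scores

def step(h, tok):
    # one decoder step: embed the previous word, update state, pick next word
    h = np.tanh(We[tok] @ Wx + h @ Wh)
    return h, int((h @ Wo).argmax())            # greedy decoding

feat = rng.normal(size=F)           # stands in for the CNN's image feature vector
h = np.tanh(feat @ Wi)              # image conditions the decoder's initial state
tok, caption = 0, []                # start from the <start> token
for _ in range(5):                  # cap the caption length
    h, tok = step(h, tok)
    if vocab[tok] == "<end>":
        break
    caption.append(vocab[tok])
print(" ".join(caption))
```

In the real model the decoder step is an LSTM and the weights come from training on Flickr8k, but the loop — image features in, one word out per step until `<end>` — is the same.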
Training Data:
Flickr8k Dataset:
The dataset contains 8,000 different images, each with 5 human-labelled
captions.
The image is given 5 different captions:
1) A boy runs as others play on a home-made slip and slide.
2) Children in swimming clothes in a field.
3) Little kids are playing outside with a water hose and are sliding down a water slide.
4) Several children are playing outside with a wet tarp on the ground.
5) Several children playing on a homemade water slide.
Training History:
Model’s Performance on Test Data:
Model’s Performance on Real Data:
Generated captions:
● Three people are on a boat in the water
● Three people pose for a picture together
● One man is sitting at a table in front of a restaurant
● A soccer player prepares to kick the ball
● A group of kids play in the water
● A boy hits the ball at a baseball game.
Application:
● Visual-to-text systems for blind people
● Search engines that search medical records via content-based captions
● Auto-tagging of imaging data
● Automatic video tagging and summary generation