
CSC2535: 2013 Advanced Machine Learning

Lecture 8b: Image retrieval using multilayer neural networks

Geoffrey Hinton
Overview
• An efficient way to train a multilayer neural network to
extract a low-dimensional representation.
• Document retrieval (published work with Russ Salakhutdinov)
– How to model a bag of words with an RBM
– How to learn binary codes
– Semantic hashing: retrieval in no time
• Image retrieval (published work with Alex Krizhevsky)
– How good are 256-bit codes for retrieval of small
color images?
– Ways to use the speed of semantic hashing for much
higher-quality image retrieval (work in progress).
Deep Autoencoders
(with Ruslan Salakhutdinov)

• They always looked like a really nice way to do non-linear dimensionality reduction:
  – But it is very difficult to optimize deep autoencoders using backpropagation.
• We now have a much better way to optimize them:
  – First train a stack of 4 RBM's.
  – Then "unroll" them.
  – Then fine-tune with backprop.

[Diagram: encoder 28x28 → 1000 neurons (W1) → 500 neurons (W2) → 250 neurons (W3) → 30 code units (W4); the decoder mirrors this back to 28x28 using the transposed weights W4ᵀ … W1ᵀ.]
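
A minimal sketch of this pretrain/unroll/fine-tune recipe, assuming PyTorch and a simplified CD-1 step; the function names, learning rates and epoch counts are illustrative, not from the lecture:

```python
# Sketch only: greedy pretraining of a 784-1000-500-250-30 stack of RBMs,
# "unrolling" into a deep autoencoder, then fine-tuning with backprop.
import torch
import torch.nn as nn

sizes = [784, 1000, 500, 250, 30]

def train_rbm(data, n_hidden, epochs=5, lr=0.01):
    """Simplified one-step contrastive divergence for a binary RBM."""
    n_visible = data.shape[1]
    W = 0.01 * torch.randn(n_visible, n_hidden)
    b_v, b_h = torch.zeros(n_visible), torch.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = torch.sigmoid(data @ W + b_h)          # hidden given data
        h0 = torch.bernoulli(p_h0)
        v1 = torch.sigmoid(h0 @ W.t() + b_v)          # reconstruction
        p_h1 = torch.sigmoid(v1 @ W + b_h)
        W += lr * (data.t() @ p_h0 - v1.t() @ p_h1) / len(data)
        b_v += lr * (data - v1).mean(0)
        b_h += lr * (p_h0 - p_h1).mean(0)
    return W, b_v, b_h

# Greedy layer-wise pretraining: each RBM's hidden probabilities become
# the "data" for the next RBM in the stack.
images = torch.rand(256, sizes[0])                    # stand-in for 28x28 digits
rbms, layer_input = [], images
for n_hid in sizes[1:]:
    W, b_v, b_h = train_rbm(layer_input, n_hid)
    rbms.append((W, b_v, b_h))
    layer_input = torch.sigmoid(layer_input @ W + b_h)

# Unroll: the encoder uses W1..W4, the decoder uses their transposes.
enc, dec = [], []
for W, b_v, b_h in rbms:
    lin = nn.Linear(W.shape[0], W.shape[1])
    lin.weight.data, lin.bias.data = W.t().clone(), b_h.clone()
    enc += [lin, nn.Sigmoid()]
for W, b_v, b_h in reversed(rbms):
    lin = nn.Linear(W.shape[1], W.shape[0])
    lin.weight.data, lin.bias.data = W.clone(), b_v.clone()
    dec += [lin, nn.Sigmoid()]
autoencoder = nn.Sequential(*enc, *dec)

# Fine-tune the whole unrolled net with backprop on reconstruction error.
opt = torch.optim.SGD(autoencoder.parameters(), lr=0.1)
for _ in range(5):
    loss = nn.functional.binary_cross_entropy(autoencoder(images), images)
    opt.zero_grad(); loss.backward(); opt.step()
```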
A comparison of methods for compressing digit images to 30 real numbers

[Figure: rows of digit images showing the real data and reconstructions from a 30-D deep autoencoder, 30-D logistic PCA, and 30-D PCA.]
Compressing a document count vector to 2 numbers

• We train the autoencoder to reproduce its input vector as its output.
• This forces it to compress as much information as possible into the 2 real numbers in the central bottleneck.
• These 2 numbers are then a good way to visualize documents.
• We need a special type of RBM to model counts.

[Diagram: 2000 word counts → 500 neurons → 250 neurons → 2 real-valued units → 250 neurons → 500 neurons → 2000 reconstructed counts (output vector).]
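
A minimal sketch of this architecture, assuming PyTorch; the layer sizes follow the diagram, while the activation choices and class name are illustrative:

```python
# Sketch of the 2000-500-250-2-250-500-2000 document autoencoder; the 2-D
# bottleneck activities are what gets plotted, one point per document.
import torch.nn as nn

class DocAutoencoder(nn.Module):
    def __init__(self, n_words=2000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_words, 500), nn.Sigmoid(),
            nn.Linear(500, 250), nn.Sigmoid(),
            nn.Linear(250, 2),               # 2 real-valued code units
        )
        self.decoder = nn.Sequential(
            nn.Linear(2, 250), nn.Sigmoid(),
            nn.Linear(250, 500), nn.Sigmoid(),
            nn.Linear(500, n_words),         # 2000 reconstructed counts
        )

    def forward(self, counts):
        code = self.encoder(counts)          # use `code` for visualization
        return self.decoder(code), code
```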
First compress all documents to 2 numbers using a type of PCA. Then use different colors for different document categories.

[Figure: 2-D PCA codes colored by document category; the categories are not well separated ("Yuk!").]
First compress all documents to 2 numbers. Then use different colors for different document categories.

[Figure: the corresponding 2-D codes produced by the deep autoencoder, colored by document category.]
The replicated softmax model: How to modify an RBM to model word count vectors

• Modification 1: Keep the binary hidden units but use "softmax" visible units that represent 1-of-N.
• Modification 2: Make each hidden unit use the same weights for all the visible softmax units.
• Modification 3: Use as many softmax visible units as there are non-stop words in the document.
  – So it's actually a family of different-sized RBMs that share weights. It's not a single generative model.
• Modification 4: Multiply each hidden bias by the number of words in the document (not done in our earlier work).
• The replicated softmax model is much better at modeling bags of words than LDA topic models (NIPS 2009).
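
A minimal sketch of one CD-1 update following the four modifications above, for a single document; this assumes PyTorch, and the function name, variable names and learning rate are illustrative rather than the lecture's code:

```python
# Replicated softmax RBM: CD-1 for one document represented as word counts.
import torch

def replicated_softmax_cd1(counts, W, b_vis, b_hid, lr=1e-3):
    """counts: (vocab,) word counts. W: (vocab, n_hidden) shared weights."""
    D = int(counts.sum())                                    # words in the document
    p_h0 = torch.sigmoid(counts @ W + D * b_hid)             # hidden biases scaled by D
    h0 = torch.bernoulli(p_h0)
    probs = torch.softmax(h0 @ W.t() + b_vis, dim=0)         # one shared softmax
    v1 = torch.distributions.Multinomial(D, probs).sample()  # redraw D words
    p_h1 = torch.sigmoid(v1 @ W + D * b_hid)
    # CD-1: positive statistics from the data, negative from the reconstruction.
    W += lr * (torch.outer(counts, p_h0) - torch.outer(v1, p_h1))
    b_vis += lr * (counts - v1)
    b_hid += lr * D * (p_h0 - p_h1)
    return W, b_vis, b_hid
```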
The replicated softmax model

[Diagram: all the models in this family have 5 hidden units; the model shown is for 8-word documents.]
Finding real-valued codes for retrieval

• Train an auto-encoder using 10 real-valued units in the code layer.
• Compare with Latent Semantic Analysis that uses PCA on the transformed count vector.
• Non-linear codes are much better.

[Diagram: 2000 word counts → 500 neurons → 250 neurons → 10 code units → 250 neurons → 500 neurons → 2000 reconstructed counts.]
Retrieval performance on 400,000 Reuters business news stories

[Figure: retrieval performance curves (not reproduced here).]
Finding binary codes for documents

• Train an auto-encoder using 30 logistic units for the code layer.
• During the fine-tuning stage, add noise to the inputs to the code units.
  – The "noise" vector for each training case is fixed. So we still get a deterministic gradient.
  – The noise forces their activities to become bimodal in order to resist the effects of the noise.
  – Then we simply threshold the activities of the 30 code units to get a binary code.

[Diagram: 2000 word counts → 500 neurons → 250 neurons → 30 logistic code units (+ noise) → 250 neurons → 500 neurons → 2000 reconstructed counts.]
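
A sketch of this noise trick for the code layer, assuming PyTorch; the class name, `noise_std` value and other parameters are illustrative, not values given in the lecture:

```python
import torch
import torch.nn as nn

class NoisyCodeLayer(nn.Module):
    """30 logistic code units whose inputs get a fixed noise vector per case."""
    def __init__(self, n_in=250, n_code=30, n_cases=10000, noise_std=4.0):
        super().__init__()
        self.linear = nn.Linear(n_in, n_code)
        # One fixed noise vector per training case, so the gradient stays
        # deterministic even though noise is added during fine-tuning.
        self.register_buffer("noise", noise_std * torch.randn(n_cases, n_code))

    def forward(self, x, case_ids=None):
        pre = self.linear(x)
        if self.training and case_ids is not None:
            pre = pre + self.noise[case_ids]   # pushes activities toward 0 or 1
        return torch.sigmoid(pre)

    def binary_code(self, x):
        # After fine-tuning, simply threshold the noise-free activities.
        return (torch.sigmoid(self.linear(x)) > 0.5).int()
```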
Using a deep autoencoder as a hash-function for finding approximate matches

[Diagram: the autoencoder acts as a hash function mapping a document to a memory address; nearby addresses hold similar documents ("supermarket search").]
Another view of semantic hashing

• Fast retrieval methods typically work by intersecting stored lists that are associated with cues extracted from the query.
• Computers have special hardware that can intersect 32 very long lists in one instruction.
  – Each bit in a 32-bit binary code specifies a list of half the addresses in the memory.
• Semantic hashing uses machine learning to map the retrieval problem onto the type of list intersection the computer is good at.
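
A toy illustration of the lookup in plain Python; the helper names are mine, and a real system would also probe nearby addresses:

```python
from collections import defaultdict

def build_semantic_hash_table(doc_codes):
    """doc_codes: iterable of (doc_id, 32-bit integer code) pairs."""
    table = defaultdict(list)
    for doc_id, code in doc_codes:
        table[code].append(doc_id)     # the code is used as a memory address
    return table

def lookup(table, query_code):
    # One address lookup replaces explicitly intersecting 32 posting lists.
    return table.get(query_code, [])
```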
How good is a shortlist found this way?

• Russ has only implemented it for a million documents with 20-bit codes --- but what could possibly go wrong?
  – A 20-D hypercube allows us to capture enough of the similarity structure of our document set.
• The shortlist found using binary codes actually improves the precision-recall curves of TF-IDF.
  – Locality sensitive hashing (the fastest other method) is much slower and has worse precision-recall curves.
Semantic hashing for image retrieval

• Currently, image retrieval is typically done by using the captions. Why not use the images too?
  – Pixels are not like words: individual pixels do not tell us much about the content.
  – Extracting object classes from images is hard.
• Maybe we should extract a real-valued vector that has information about the content?
  – Matching real-valued vectors in a big database is slow and requires a lot of storage.
• Short binary codes are easy to store and match.
A two-stage method

• First, use semantic hashing with 30-bit binary codes to get a long "shortlist" of promising images.
• Then use 256-bit binary codes to do a serial search for good matches (a sketch of this serial search follows below).
  – This only requires a few words of storage per image and the serial search can be done using fast bit-operations.
• But how good are the 256-bit binary codes?
  – Do they find images that we think are similar?
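
A sketch of the serial second stage in plain Python; the helper names are mine, and `int.bit_count` needs Python 3.10+:

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """256-bit codes packed into Python ints; XOR then popcount."""
    return (code_a ^ code_b).bit_count()

def rerank_shortlist(query_code, shortlist, k=20):
    """shortlist: {image_id: 256-bit code}. Return the k closest image ids."""
    return sorted(shortlist,
                  key=lambda i: hamming_distance(query_code, shortlist[i]))[:k]
```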
Some depressing competition

• Inspired by the speed of semantic hashing, Weiss, Fergus and Torralba (NIPS 2008) used a very fast spectral method to assign binary codes to images.
  – This eliminates the long learning times required by deep autoencoders.
• They claimed that their spectral method gave better retrieval results than training a deep auto-encoder using RBM's.
  – But they could not get RBM's to work well for extracting features from RGB pixels so they started from 384 GIST features.
  – This is too much dimensionality reduction too soon.
A comparison of deep auto-encoders and the spectral method using 256-bit codes (Alex Krizhevsky)

• Train auto-encoders "properly":
  – Use Gaussian visible units with fixed variance. Do not add noise to the reconstructions (a sketch follows below).
  – Use a cluster machine or a big GPU board.
  – Use a lot of hidden units in the early layers.
• Then compare with the spectral method.
  – The spectral method has no free parameters.
• Also compare with Euclidean match in pixel space.
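
A sketch of a CD-1 step with Gaussian visible units of fixed (unit) variance, where the reconstruction uses the deterministic mean rather than adding noise; this is an assumed implementation, not Krizhevsky's code:

```python
import torch

def gaussian_rbm_cd1(v0, W, b_vis, b_hid, lr=1e-4):
    """v0: (batch, n_visible) real-valued inputs, assumed unit variance."""
    p_h0 = torch.sigmoid(v0 @ W + b_hid)
    h0 = torch.bernoulli(p_h0)
    v1 = h0 @ W.t() + b_vis              # linear mean; no noise added here
    p_h1 = torch.sigmoid(v1 @ W + b_hid)
    n = v0.shape[0]
    W += lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / n
    b_vis += lr * (v0 - v1).mean(0)
    b_hid += lr * (p_h0 - p_h1).mean(0)
    return W, b_vis, b_hid
```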
Krizhevsky's deep autoencoder

• The encoder has about 67,000,000 parameters.
• It takes a few days to train on two million images (GTX 285 GPU).
• There is no theory to justify this architecture.

[Diagram: a 32x32 color image (three 1024-pixel channels) is encoded through layers of 8192, 4096, 2048, 1024 and 512 units to a 256-bit binary code.]
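
A sketch of the encoder implied by the layer sizes above, assuming PyTorch; the slide gives only the sizes, so the activation functions and helper name are my guesses:

```python
import torch
import torch.nn as nn

# 32x32 colour image (3 x 1024 = 3072 inputs) -> 256-bit binary code.
encoder = nn.Sequential(
    nn.Linear(3 * 1024, 8192), nn.Sigmoid(),
    nn.Linear(8192, 4096), nn.Sigmoid(),
    nn.Linear(4096, 2048), nn.Sigmoid(),
    nn.Linear(2048, 1024), nn.Sigmoid(),
    nn.Linear(1024, 512), nn.Sigmoid(),
    nn.Linear(512, 256), nn.Sigmoid(),     # 256 logistic code units
)

def encode_to_bits(images: torch.Tensor) -> torch.Tensor:
    """images: (batch, 3072) pixels -> (batch, 256) 0/1 code."""
    return (encoder(images) > 0.5).int()
```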
The next step

• Implement the semantic hashing stage for images.
• Check that a long shortlist still contains many good matches.
  – It works OK for documents, but they are very different from images.
  – Losing some recall may be OK. People don't miss what they don't know about.
An obvious extension

• Use a multimedia auto-encoder that represents captions and images in a single code.
  – The captions should help it extract more meaningful image features such as "contains an animal" or "indoor image".
• RBM's already work much better than standard LDA topic models for modeling bags of words.
  – So the multimedia auto-encoder should be:
    + a win (for images)
    + a win (for captions)
    + a win (for the interaction during training)
A less obvious extension

• Semantic hashing gives incredibly fast retrieval but it's hard to go much beyond 32 bits.
• We can afford to use semantic hashing several times with variations of the query and merge the shortlists.
  – It's easy to enumerate the Hamming ball around a query image address in ascending address order, so merging is linear time (a sketch follows below).
• Apply many transformations to the query image to get transformation-independent retrieval.
  – Image translations are an obvious candidate.
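
A minimal sketch of the Hamming-ball enumeration in plain Python; the helper name and default radius are illustrative:

```python
from itertools import combinations

def hamming_ball(address: int, n_bits: int = 32, radius: int = 2):
    """All addresses within `radius` bit-flips of `address`, in ascending order."""
    neighbours = set()
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = address
            for b in bits:
                flipped ^= 1 << b      # flip one chosen bit
            neighbours.add(flipped)
    return sorted(neighbours)          # ascending order makes merging linear
```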
Summary

• Restricted Boltzmann Machines provide an efficient way to learn a layer of features without any supervision.
  – Many layers of representation can be learned by treating the hidden states of one RBM as the data for the next.
• This allows us to learn very deep nets that extract short binary codes for unlabeled images or documents.
  – Using 32-bit codes as addresses allows us to get approximate matches at the speed of hashing.
• Semantic hashing is fast enough to allow many retrieval cycles for a single query image.
  – So we can try multiple transformations of the query.
A more interesting extension

• Computer vision uses images of uniform resolution.
  – Multi-resolution images still keep all the high-resolution pixels.
• Even on 32x32 images, people use a lot of eye movements to attend to different parts of the image.
  – Human vision copes with big translations by moving the fixation point.
  – It only samples a tiny fraction of the image at high resolution. The "post-retinal" image has resolution that falls off rapidly outside the fovea.
  – With fewer "neurons", intelligent sampling becomes even more important.
How to perceive a big picture with a small brain

• Even a human brain cannot afford high resolution everywhere.
  – By limiting the input we make it possible to use many layers of dense features intelligently.
• For fine discrimination that requires high resolution in several different places, we must integrate over several fixations.

[Figure: a much better "retina".]
A more human metric for image similarity

• Two images are similar if fixating at point X in one image and point Y in the other image gives similar post-retinal images.
• So use semantic hashing on post-retinal images.
  – The address space is used for post-retinal images and each address points to the whole image that the post-retinal image came from.
  – So we can accumulate similarity over multiple fixations.
• The whole-image addresses found after each fixation have to be sorted to allow merging (a sketch follows below).
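
A toy sketch of accumulating similarity over fixations in my own Python; a Counter stands in for the sorted merge of whole-image addresses described above, and the names are illustrative:

```python
from collections import Counter

def multi_fixation_retrieval(fixation_codes, post_retinal_table, top_k=10):
    """fixation_codes: codes of the query's post-retinal patches.
    post_retinal_table: maps a code to the whole-image ids it came from."""
    votes = Counter()
    for code in fixation_codes:
        votes.update(post_retinal_table.get(code, []))
    # Images matched by many fixations accumulate the most votes.
    return [image_id for image_id, _ in votes.most_common(top_k)]
```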
Starting from a better input representation

• First learn a good model for object recognition that can deal with multiple objects in the same image.
• Then use the outputs of the last hidden layer as the inputs to a deep autoencoder.
• This should work really well.
  – Euclidean distance on the activities in the last hidden layer already works extremely well.
[Figure: query (cue) images and their Euclidean nearest neighbors found using the 4096 activities in the last hidden layer.]