Machine Learning and AI via Brain simulations
Andrew Ng
Stanford University & Google
Thanks to:
Stanford: Adam Coates, Quoc Le, Honglak Lee, Andrew Saxe, Andrew Maas, Chris Manning, Jiquan Ngiam, Richard Socher, Will Zou
Google: Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Marc'Aurelio Ranzato, Paul Tucker, Kay Le
This talk
The idea of deep learning. Using brain simulations, we hope to:
- Make learning algorithms much better and easier to use.
- Make revolutionary advances in machine learning and AI.
This vision is not mine alone; it is shared with many researchers, e.g., Samy Bengio, Yoshua Bengio, Tom Dean, Jeff Dean, Nando de Freitas, Jeff Hawkins, Geoff Hinton, Quoc Le, Yann LeCun, Honglak Lee, Tommy Poggio, Ruslan Salakhutdinov, Josh Tenenbaum, Kai Yu, Jason Weston, and others.
I believe this is our best shot at progress towards real AI.
What do we want computers to do with our data?
Images/video
Label: Motorcycle Suggest tags Image search
Audio
Speech recognition Music classification Speaker identification
Text
Web search Anti-spam Machine translation
Computer vision is hard!
[Figure: many images of motorcycles under widely varying poses, backgrounds, and lighting, each labeled "Motorcycle"]
What do we want computers to do with our data?
Images/video
Label: Motorcycle Suggest tags Image search
Audio
Speech recognition Speaker identification Music classification
Text
Web search Anti-spam Machine translation
Machine learning performs well on many of these problems, but it is a lot of work. What is it about machine learning that makes it so hard to use?
Machine learning for image classification
Motorcycle
This talk: Develop ideas using images and audio. Ideas apply to other problems (e.g., text) too.
Why is this hard?
You see this:
But the camera sees this:
Machine learning and feature representations

[Figure: a raw image is fed, pixel by pixel (pixel 1, pixel 2, ...), into a learning algorithm; plotted in raw-pixel space (pixel 1 vs. pixel 2), the Motorbikes and Non-Motorbikes examples are not separable]
What we want
[Figure: raw image → feature representation (e.g., does it have handlebars? wheels?) → learning algorithm; plotted in feature space (Handlebars vs. Wheels), the Motorbikes and Non-Motorbikes examples become separable]
Computing features in computer vision
But we don't have a handlebars detector. So, researchers try to hand-design features that capture various statistical properties of the image.

[Figure: find edges at four orientations → sum up the edge strength in each quadrant → final feature vector, e.g., [0.1, 0.7, 0.4, 0.6, 0.1, 0.4, 0.5, 0.5]]
Feature representations
Input → Feature representation → Learning algorithm
How is computer perception done?
Images/video: Image → Vision features → Detection
Audio: Audio → Audio features → Speaker ID
Text: Text → Text features → Text classification, machine translation, information retrieval, ...
Feature representations
Input → Feature representation → Learning algorithm
Computer vision features
SIFT
Spin image
HoG
RIFT
Textons
GLOH
Audio features
Spectrogram
MFCC
Flux
ZCR
Rolloff
NLP features
Parser features
Named entity recognition
Stemming
Part of speech
Ontologies (WordNet)
Anaphora

Coming up with features is difficult, time-consuming, and requires expert knowledge. When working on applications of learning, we spend a lot of time tuning the features.
Feature representations
Input → Feature representation → Learning algorithm
The one learning algorithm hypothesis
Auditory Cortex
Auditory cortex learns to see
[Roe et al., 1992]
The one learning algorithm hypothesis
Somatosensory Cortex
Somatosensory cortex learns to see
[Metin & Frost, 1989]
Sensor representations in the brain
Seeing with your tongue
Human echolocation (sonar)
Haptic belt: Direction sense
Implanting a 3rd eye
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
On two approaches to computer perception
The adult visual system computes an incredibly complicated function of the input. We can try to directly implement most of this incredibly complicated function (hand-engineer features). Can we learn this function instead?
A trained learning algorithm (e.g., neural network, boosting, decision tree, SVM, ...) is very complex. But the learning algorithm itself is usually very simple. The complexity of the trained algorithm comes from the data, not the algorithm.
Learning input representations
Find a better way to represent images than pixels.
Learning input representations
Find a better way to represent audio.
Feature learning problem
Given a 14x14 image patch x, we can represent it using 196 real numbers (pixel intensities, e.g., 255, 98, 93, 87, 89, 91, 48, ...).

Problem: Can we learn a better feature vector to represent this?
Self-taught learning (Unsupervised Feature Learning)
Unlabeled images Testing: What is this?
Motorcycles
Not motorcycles
First stage of visual processing: V1
V1 is the first stage of visual processing in the brain. Neurons in V1 are typically modeled as edge detectors:

[Figure: Gabor-like receptive fields of two model V1 neurons (Neuron #1 and Neuron #2)]
Feature Learning via Sparse Coding
Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).

Input: images x(1), x(2), ..., x(m) (each in R^{n×n})

Learn: a dictionary of bases f1, f2, ..., fk (also in R^{n×n}), so that each input x can be approximately decomposed as

x ≈ Σ_{j=1}^{k} a_j f_j,  s.t. the a_j are mostly zero ("sparse")
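As a concrete illustration, here is a minimal sketch of learning such a dictionary with an L1 sparsity penalty, using scikit-learn's dictionary learner as a stand-in for the formulation above (the random patch data, dictionary size, and penalty weight are assumptions):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Stand-in for m unlabeled 14x14 image patches, flattened to 196-vectors.
patches = np.random.randn(500, 196)

learner = DictionaryLearning(n_components=64, alpha=1.0, max_iter=50)
codes = learner.fit_transform(patches)   # a^(i): mostly-zero coefficients
bases = learner.components_              # f_1, ..., f_64: the learned dictionary
# Each patch is approximated as codes[i] @ bases, with few nonzero a_j.
```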
Sparse coding illustration

[Figure: natural images, and the 64 learned bases (f1, ..., f64), which look like oriented edges]
Test example:

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

[a1, ..., a64] = [0, 0, ..., 0, 0.8, 0, ..., 0, 0.3, 0, ..., 0, 0.5, 0]

This is a more succinct, higher-level feature representation.
More examples

x ≈ 0.6 * f15 + 0.8 * f28 + 0.4 * f37 → represent as [a15 = 0.6, a28 = 0.8, a37 = 0.4]
x ≈ 1.3 * f5 + 0.9 * f18 + 0.3 * f29 → represent as [a5 = 1.3, a18 = 0.9, a29 = 0.3]

The method invents edge detection: it automatically learns to represent an image in terms of the edges that appear in it. This gives a more succinct, higher-level representation than the raw pixels, and is quantitatively similar to primary visual cortex (area V1) in the brain.
Sparse coding applied to audio
Image shows 20 basis functions learned from unlabeled audio.
[Evan Smith & Mike Lewicki, 2006]
Sparse coding applied to touch data
Collect touch data using a glove, following the distribution of grasps used by animals in the wild.

[Figure: grasps used by animals; Macfarlane & Graziano, 2009]

Example learned representations: [Figure: sparse autoencoder and sparse RBM sample bases; histograms of log(excitatory/inhibitory area) for the biological data vs. the learned model distribution; PDF comparison, p = 0.5872]

[Andrew Saxe]
Learning feature hierarchies
[Figure: input image pixels (x1, x2, x3, x4) → sparse coding layer (edges; cf. V1) → higher layer (a1, a2, a3: combinations of edges; cf. V2)]

[Technical details: sparse autoencoder, or a sparse version of Hinton's DBN.]
[Lee, Ranganath & Ng, 2007]
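A greedy layer-wise sketch of how such a hierarchy can be built with sparse autoencoders; this is a simplified stand-in for the cited models, and the layer sizes, learning rate, sparsity target, and stand-in data are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(X, n_hidden, lr=0.1, rho=0.05, beta=3.0, steps=500):
    """Train one sparse autoencoder layer (tied weights); return the encoder."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.standard_normal((n, n_hidden)) * 0.01
    b1, b2 = np.zeros(n_hidden), np.zeros(n)
    for _ in range(steps):
        H = sigmoid(X @ W + b1)            # hidden activations
        Xhat = H @ W.T + b2                # linear reconstruction
        rho_hat = H.mean(axis=0)           # average activation of each unit
        dXhat = (Xhat - X) / m             # gradient of mean squared error
        # add gradient of the KL sparsity penalty on rho_hat
        dH = dXhat @ W + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
        dZ = dH * H * (1.0 - H)
        W -= lr * (X.T @ dZ + dXhat.T @ H)
        b1 -= lr * dZ.sum(axis=0)
        b2 -= lr * dXhat.sum(axis=0)
    return W, b1

# Greedy layer-wise stacking: layer-1 features (edges) feed layer 2
# (combinations of edges). X_pixels is a stand-in patch matrix.
X_pixels = np.random.rand(1000, 196)
W1, c1 = train_layer(X_pixels, 64)
W2, c2 = train_layer(sigmoid(X_pixels @ W1 + c1), 32)
```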
Learning feature hierarchies
[Figure: input image → model V1 → higher layer (model V2?) → higher layer (model V3?)]

[Technical details: sparse autoencoder, or a sparse version of Hinton's DBN.]
[Lee, Ranganath & Ng, 2007]
Hierarchical sparse coding (sparse DBN): trained on face images

[Figure: learned feature hierarchy: pixels → edges → object parts (combinations of edges) → object models]

Training set: aligned images of faces.
[Honglak Lee]
Hierarchical sparse coding (sparse DBN): features learned from training on different object classes

[Figure: learned features for Faces, Cars, Elephants, and Chairs]
[Honglak Lee]
Machine learning applications
Video Activity recognition (Hollywood 2 benchmark)
Method                                                    Accuracy
Hessian + ESURF [Willems et al., 2008]                    38%
Harris3D + HOG/HOF [Laptev et al., 2003, 2004]            45%
Cuboids + HOG/HOF [Dollar et al., 2005; Laptev, 2004]     46%
Hessian + HOG/HOF [Laptev, 2004; Willems et al., 2008]    46%
Dense + HOG/HOF [Laptev, 2004]                            47%
Cuboids + HOG3D [Klaser, 2008; Dollar et al., 2005]       46%
Unsupervised feature learning (our method)                52%

Unsupervised feature learning significantly improves on the previous state of the art.
[Le, Zhou & Ng, 2011]
Sparse coding on audio (speech)
Spectrogram ≈ 0.9 * f36 + 0.7 * f42 + 0.2 * f63
Dictionary of bases fi learned for speech
Many bases seem to correspond to phonemes.
[Honglak Lee]
Hierarchical sparse coding (sparse DBN) for audio

[Figure: spectrogram input and the learned hierarchical features]
[Honglak Lee]
Phoneme Classification (TIMIT benchmark)
Method                                        Accuracy
Clarkson & Moreno (1999)                      77.6%
Gunawardana et al. (2005)                     78.3%
Sung et al. (2007)                            78.5%
Petrov et al. (2007)                          78.6%
Sha & Saul (2006)                             78.9%
Yu et al. (2006)                              79.2%
Unsupervised feature learning (our method)    80.3%

Unsupervised feature learning significantly improves on the previous state of the art.
[Lee et al., 2009]
State-of-the-art Unsupervised feature learning
Images
  CIFAR object classification:   prior art (Ciresan et al., 2011) 80.5%   |  Stanford feature learning 82.0%
  NORB object classification:    prior art (Scherer et al., 2010) 94.4%   |  Stanford feature learning 95.0%
  Galaxy

Video
  Hollywood2 classification:     prior art (Laptev et al., 2004) 48%      |  Stanford feature learning 53%
  YouTube:                       prior art (Liu et al., 2009) 71.2%       |  Stanford feature learning 75.8%
  KTH:                           prior art (Wang et al., 2010) 92.1%      |  Stanford feature learning 93.9%
  UCF:                           prior art (Wang et al., 2010) 85.6%      |  Stanford feature learning 86.5%

Text/NLP
  Paraphrase detection:          prior art (Das & Smith, 2009) 76.1%      |  Stanford feature learning 76.4%
  Sentiment (MR/MPQA data):      prior art (Nakagawa et al., 2010) 77.3%  |  Stanford feature learning 77.7%

Multimodal (audio/video)
  AVLetters lip reading:         prior art (Zhao et al., 2009) 58.9%      |  Stanford feature learning 65.8%

Other unsupervised feature learning records: pedestrian detection (Yann LeCun), speech recognition (Geoff Hinton), PASCAL VOC object classification (Kai Yu).
Technical challenge: Scaling up
Supervised Learning
Choices of learning algorithm: memory-based, Winnow, Perceptron, Naïve Bayes, SVM, ... What matters the most?

[Figure: accuracy vs. training set size (millions); Banko & Brill, 2001]

"It's not who has the best algorithm that wins. It's who has the most data."
Scaling and classification accuracy (CIFAR-10)

[Figure: test accuracy vs. number of learned features]

A large number of features is critical. The specific learning algorithm is important, but algorithms that can scale to many features have a big advantage.
[Adam Coates]
Attempts to scale up
Significant effort has been spent on algorithmic tricks to make algorithms run faster:
Efficient sparse coding [LeCun, Ng, Yu]
Efficient posterior inference [Bengio, Hinton]
Convolutional networks [Bengio, de Freitas, LeCun, Lee, Ng]
Tiled networks [Hinton, Ng]
Randomized/fast parameter search [DiCarlo, Ng]
Massive data synthesis [LeCun, Schmidhuber]
Massive embedding models [Bengio, Collobert, Hinton, Weston]
Fast decoder algorithms [LeCun, Lee, Ng, Yu]
GPU, FPGA, and ASIC implementations [Dean, LeCun, Ng, Olukotun]
Scaling up: Discovering object classes
[Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Greg Corrado, Matthieu Devin, Kai Chen, Jeff Dean]
Training procedure
What features can we learn if we train a massive model on a massive amount of data? Can we learn a "grandmother cell"?
- Train on 10 million images (YouTube).
- 1,000 machines (16,000 cores) for 1 week.
- 1.15 billion parameters.
- Test on novel images.

Training set: YouTube. Test set: FITW + ImageNet.
Face neuron
Top Stimuli from the test set
Optimal stimulus by numerical optimization
[Figure: histogram of the neuron's feature responses on random distractors vs. faces]

Invariance properties

[Figure: feature response vs. vertical shift (±15 pixels), horizontal shift (±15 pixels), 3D rotation angle (up to 90°), and scale factor (up to 1.6x)]
Cat neuron
Top Stimuli from the test set
Optimal stimulus by numerical optimization
Cat face neuron
[Figure: histogram of feature responses on random distractors vs. cat faces]

Visualization: top stimuli from the test set; optimal stimulus by numerical optimization

Pedestrian neuron

[Figure: histogram of feature responses on random distractors vs. pedestrians]
Weaknesses & Criticisms
"You're learning everything. It's better to encode prior knowledge about the structure of images (or audio, or text)."
A: Wasn't there a similar machine learning vs. linguists debate in NLP ~20 years ago?

"Unsupervised feature learning cannot currently do X," where X is:
Go beyond Gabor (1-layer) features. Work on temporal data (video). Learn hierarchical representations (compositional semantics). Get state-of-the-art results in activity recognition. Get state-of-the-art results on image classification. Get state-of-the-art results on object detection. Learn variable-size representations.
A: Many of these were once true, but are not anymore (they were not fundamental weaknesses). There's still work to be done, though!

"We don't understand the learned features."
A: True. Though many vision/audio/etc. features also suffer from this (e.g., concatenations/combinations of different features).
Conclusion
Unsupervised Feature Learning Summary
Deep learning and self-taught learning: let's learn our features rather than manually design them. Discover the fundamental computational principles that underlie perception? Sparse coding and its deep versions have been very successful on vision and audio tasks. Other variants exist for learning recursive representations. To get this to work for yourself, see the online tutorial: [Link]

[Figure: labeled Car and Motorcycle examples, plus unlabeled images]
Advanced topics + Research philosophy
Learning Recursive Representations
Feature representations of words
Imagine taking each word and computing an n-dimensional feature vector for it. [Distributional representations; or Bengio et al., 2003; Collobert & Weston, 2008.] A 2-d embedding example is shown below, but in practice we use ~100-d embeddings.
[Figure: 2-d embedding with Monday at (2, 4), Tuesday at (2.1, 3.3), Britain at (9, 2), and France at (9.5, 1.5); one-hot input vectors such as [0 0 0 0 1 0 0 0] (Monday) and [0 1 0 0 0 0 0 0] (Britain) are mapped to these embeddings]

Representation of "On Monday, Britain ...": the vector sequence [8 5], [2 4], [9 2], ...
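A minimal sketch of such an embedding table (the vocabulary, dimension, and random initialization are assumptions; in the cited work the table is learned from data):

```python
import numpy as np

vocab = {"on": 0, "monday": 1, "tuesday": 2, "britain": 3, "france": 4}
n = 100                                           # embedding dimension
rng = np.random.default_rng(0)
E = rng.standard_normal((len(vocab), n)) * 0.01   # one learned row per word

def embed(sentence):
    """Map a sentence to its sequence of word vectors."""
    words = sentence.lower().replace(",", "").split()
    return np.stack([E[vocab[w]] for w in words])

X = embed("On Monday, Britain")   # shape (3, 100): one row per word
```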
A generic hierarchy on text doesn't make sense

[Figure: a fixed binary tree over "The cat sat on the mat.", with a feature vector for each word, e.g., The = [9 1], cat = [5 3], sat = [7 1], on = [8 5], the = [9 1], mat. = [4 3]]

A node would have to represent the sentence fragment "cat sat on." That doesn't make sense.
What we want (illustration)
[Figure: the parse tree of "The cat sat on the mat." (S, VP, PP, NP nodes over the same word vectors); this node's job is to represent "on the mat."]
What we want (illustration)
[Figure: the same parse tree with a learned phrase vector at each node, e.g., S = [5 4], VP = [7 3], PP = [8 3], NP = [5 2] and [3 3]; this node's job is to represent "on the mat."]
What we want (illustration)
[Figure: in the 2-d embedding, phrase vectors land near related words: "The day after my birthday" near Monday and Tuesday, and "The country of my birth" near Britain and France]
Learning recursive representations
This node's job is to represent "on the mat."

[Figure: a tree fragment over "The cat on the mat."; the node [8 3], computed from its children [8 5] and [3 3], represents "on the mat."]
Learning recursive representations
Basic computational unit: a neural network that takes two candidate children's representations as input and outputs:
- Whether we should merge the two nodes.
- The semantic representation if the two nodes are merged.

[Figure: the network reads "on" = [8 5] and "the mat." = [3 3], and outputs "Yes" plus the merged representation [8 3] for "on the mat."]
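A minimal numpy sketch of this unit (the dimensions, tanh nonlinearity, and initialization are assumptions; the actual model is trained end-to-end):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                          # node representation size
W = rng.standard_normal((n, 2 * n)) * 0.01       # composition weights
b = np.zeros(n)
w_score = rng.standard_normal(n) * 0.01          # merge-scoring weights

def merge(c1, c2):
    """Score merging two children and compute the merged representation."""
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)  # parent: same size as a child
    score = float(w_score @ p)                     # should these two merge?
    return score, p
```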
Parsing a sentence
[Figure: the network scores every adjacent pair in "The cat sat on the mat."; "The cat" (Yes → [5 2]) and "the mat." (Yes → [3 3]) merge, while the other pairs score No]
Parsing a sentence
[Figure: in the next round, "on" merges with "the mat." to form [8 3]; the remaining pairs score No]
Parsing a sentence
[Figure: the remaining merges build [7 3] for the verb phrase and finally [5 4] for the whole sentence]
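Using the merge unit sketched earlier, the bottom-up procedure these slides walk through looks roughly like this greedy loop (a simplification; the published system scores and trains merges with a structured objective):

```python
import numpy as np

def greedy_parse(word_vectors):
    """Repeatedly merge the best-scoring adjacent pair (using merge() from the
    sketch above); return the root vector for the whole sentence."""
    nodes = list(word_vectors)
    while len(nodes) > 1:
        scored = [merge(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(scored)), key=lambda i: scored[i][0])
        nodes[best:best + 2] = [scored[best][1]]   # replace the pair by its parent
    return nodes[0]

# e.g., six stand-in word vectors for "The cat sat on the mat."
words = [np.random.default_rng(i).standard_normal(100) for i in range(6)]
root = greedy_parse(words)
```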
Finding Similar Sentences

Each sentence has a feature vector representation. Pick a sentence ("center sentence") and list the nearest-neighbor sentences, i.e., those with the most similar feature vectors. They are often either semantically or syntactically similar. (Digits are all mapped to 2.)

Similarity: Bad news
Center sentence: Both took further hits yesterday
Nearest neighbors: 1. We're in for a lot of turbulence ... 2. BSN currently has 2.2 million common shares outstanding 3. This is panic buying 4. We have a couple or three tough weeks coming

Similarity: Something said
Center sentence: I had calls all night long from the States, he said
Nearest neighbors: 1. Our intent is to promote the best alternative, he says 2. We have sufficient cash flow to handle that, he said 3. Currently, average pay for machinists is 22.22 an hour, Boeing said 4. Profit from trading for its own account dropped, the securities firm said

Similarity: Gains and good news
Center sentence: Fujisawa gained 22 to 2,222
Nearest neighbors: 1. Mochida advanced 22 to 2,222 2. Commerzbank gained 2 to 222.2 3. Paris loved her at first sight 4. Profits improved across Hess's businesses

Similarity: Unknown words
Center sentence: Columbia , S.C
Nearest neighbors: 1. Greenville , Miss

Similarity: Declining to comment = not disclosing
Center sentence: Hess declined to comment
Nearest neighbors: 1. PaineWebber declined to comment 2. Phoenix declined to comment 3. Campeau declined to comment 4. Coastal wouldn't disclose the terms

Similarity: Large changes in sales or revenue
Center sentence: Sales grew almost 2 % to 222.2 million from 222.2 million
Nearest neighbors: 1. Sales surged 22 % to 222.22 billion yen from 222.22 billion 2. Revenue fell 2 % to 2.22 billion from 2.22 billion 3. Sales rose more than 2 % to 22.2 million from 22.2 million 4. Volume was 222.2 million shares , more than triple recent levels

Similarity: Negation of different types
Center sentence: There's nothing unusual about business groups pushing for more government spending
Nearest neighbors: 1. We don't think at this point anything needs to be said 2. It therefore makes no sense for each market to adopt different circuit breakers 3. You can't say the same with black and white 4. I don't think anyone left the place UNK UNK

Similarity: People in bad situations
Center sentence: We were lucky
Nearest neighbors: 1. It was chaotic 2. We were wrong 3. People had died
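A sketch of the nearest-neighbor lookup behind these tables, assuming each sentence has already been mapped to a feature vector by the parser above (cosine similarity is an assumption; any reasonable distance would do):

```python
import numpy as np

def nearest_sentences(center, sentences, vectors, k=4):
    """Return the k sentences whose feature vectors are closest to the center's."""
    V = np.asarray(vectors, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # normalize: cosine similarity
    sims = V @ V[center]
    order = [i for i in np.argsort(-sims) if i != center]
    return [sentences[i] for i in order[:k]]
```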
Application: Paraphrase Detection
Task: Decide whether or not two sentences are paraphrases of each other. (MSR Paraphrase Corpus)
Method                                                         F1
Baseline                                                       79.9
Rus et al. (2008)                                              80.5
Mihalcea et al. (2006)                                         81.3
Islam et al. (2007)                                            81.3
Qiu et al. (2006)                                              81.6
Fernando & Stevenson (2008) (WordNet-based features)           82.4
Das et al. (2009)                                              82.7
Wan et al. (2006) (many features: POS, parsing, BLEU, etc.)    83.0
Stanford Feature Learning                                      83.4
Parsing sentences and parsing images
A small crowd quietly enters the historic church.
Each node in the hierarchy has a feature vector representation.
Nearest neighbor examples for image patches
Each node (e.g., set of merged superpixels) in the hierarchy has a feature vector. Select a node (center patch) and list nearest neighbor nodes. I.e., what image patches/superpixels get mapped to similar features?
Selected patch
Nearest Neighbors
Multi-class segmentation (Stanford background dataset)
Method                                               Accuracy
Pixel CRF (Gould et al., ICCV 2009)                  74.3
Classifier on superpixel features                    75.9
Region-based energy (Gould et al., ICCV 2009)        76.4
Local labelling (Tighe & Lazebnik, ECCV 2010)        76.9
Superpixel MRF (Tighe & Lazebnik, ECCV 2010)         77.5
Simultaneous MRF (Tighe & Lazebnik, ECCV 2010)       77.5
Stanford Feature learning (our method)               78.1
Multi-class Segmentation MSRC dataset: 21 Classes
Method                                                         Accuracy
TextonBoost (Shotton et al., ECCV 2006)                        72.2
Framework over mean-shift patches (Yang et al., CVPR 2007)     75.1
Pixel CRF (Gould et al., ICCV 2009)                            75.3
Region-based energy (Gould et al., IJCV 2008)                  76.5
Stanford Feature learning (our method)                         76.7
Analysis of feature learning algorithms
Adam Coates, Honglak Lee
Supervised Learning
Choices of learning algorithm: memory-based, Winnow, Perceptron, Naïve Bayes, SVM, ... What matters the most?

[Figure: accuracy vs. training set size; Banko & Brill, 2001]

"It's not who has the best algorithm that wins. It's who has the most data."
Unsupervised Feature Learning

Many choices in feature learning: the algorithm (sparse coding, RBM, autoencoder, etc.), pre-processing steps (whitening), the number of features learned, and various hyperparameters.

What matters the most?
Unsupervised feature learning
Most algorithms learn Gabor-like edge detectors.
[Figure: Gabor-like filters learned by a sparse autoencoder]
Unsupervised feature learning
Weights learned with and without whitening:

[Figure: filters learned with vs. without whitening, for a sparse autoencoder, a sparse RBM, K-means, and a Gaussian mixture model]
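For reference, a minimal sketch of the whitening step compared above (ZCA whitening; the regularizer eps is an assumption):

```python
import numpy as np

def zca_whiten(X, eps=1e-2):
    """X: (num_patches, num_pixels). Decorrelate pixels and equalize variance."""
    X = X - X.mean(axis=0)                    # zero-mean each pixel
    C = np.cov(X, rowvar=False)               # pixel-pixel covariance
    U, S, _ = np.linalg.svd(C)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # ZCA transform
    return X @ W
```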
Scaling and classification accuracy (CIFAR-10)

[Figure: test accuracy vs. number of learned features, for several feature learning algorithms]
Results on CIFAR-10 and NORB (old result)
K-means achieves state of the art. Scalable, fast, and almost parameter-free, K-means does surprisingly well.

CIFAR-10                                  Test accuracy
Raw pixels                                37.3%
RBM with back-propagation                 64.8%
3-Way Factored RBM (3 layers)             65.3%
Mean-covariance RBM (3 layers)            71.0%
Improved Local Coordinate Coding          74.5%
Convolutional RBM                         78.9%
Sparse auto-encoder                       73.4%
Sparse RBM                                72.4%
K-means (Hard)                            68.6%
K-means (Triangle, 1600 features)         77.9%
K-means (Triangle, 4000 features)         79.6%

NORB                                      Test accuracy (error)
Convolutional Neural Networks             93.4% (6.6%)
Deep Boltzmann Machines                   92.8% (7.2%)
Deep Belief Networks                      95.0% (5.0%)
Jarrett et al., 2009                      94.4% (5.6%)
Sparse auto-encoder                       96.9% (3.1%)
Sparse RBM                                96.2% (3.8%)
K-means (Hard)                            96.9% (3.1%)
K-means (Triangle)                        97.0% (3.0%)
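A sketch of the "triangle" K-means feature mapping referenced in the table, applied after learning centroids with ordinary K-means on whitened patches (array shapes are assumptions):

```python
import numpy as np

def triangle_features(X, centroids):
    """X: (m, d) whitened patches; centroids: (k, d). Returns (m, k) features."""
    # Euclidean distance from every patch to every centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    mu = dists.mean(axis=1, keepdims=True)    # mean distance for each patch
    # "Triangle" activation: non-zero only for closer-than-average centroids.
    return np.maximum(0.0, mu - dists)
```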
Tiled Convolutional Neural Networks

Quoc Le, Jiquan Ngiam
Learning Invariances

We want to learn invariant features. Convolutional networks use weight tying to:
- Reduce the number of weights that need to be learned, which allows scaling to larger images/models.
- Hard-code translation invariance, which makes it harder to learn more complex types of invariance.
Goal: preserve the computational scaling advantage of convolutional nets, but learn more complex invariances.
Fully Connected Topographic ICA
[Figure: input → simple units (square) → pooling units (sqrt), fully connected]
Doesn't scale to large images.
Fully Connected Topographic ICA

[Figure: the same network, with the simple-unit weights orthogonalized]
Doesn't scale to large images.
Local Receptive Fields
[Figure: each simple unit now connects only to a local patch of the input (local receptive fields)]
Convolutional Neural Networks (Weight Tying)

[Figure: local receptive fields with weights tied (shared) across all locations]
Tiled Networks (Partial Weight Tying)
[Figure: weights are tied only between units whose receptive fields are a tile apart (tile size k = 2), with multiple maps (l = 3) and local orthogonalization]

Local pooling can capture complex invariances (not just translation), while the total number of parameters remains small.
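A 1-D sketch of the tiling idea: units whose positions differ by a multiple of the tile size k share weights, so k = 1 recovers a convolutional net and a large k unties the weights entirely (the shapes and stand-in data are assumptions):

```python
import numpy as np

def tiled_responses(x, filters):
    """x: (n,) 1-D input; filters: (k, w), one filter per tile offset."""
    k, w = filters.shape
    out = np.empty(len(x) - w + 1)
    for i in range(len(out)):
        out[i] = filters[i % k] @ x[i:i + w]   # the weight set repeats every k units
    return out

x = np.random.default_rng(0).standard_normal(32)
filters = np.random.default_rng(1).standard_normal((2, 8))   # tile size k = 2
responses = tiled_responses(x, filters)
```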
NORB and CIFAR-10 results
Algorithm                                                  NORB accuracy
Deep Tiled CNNs [this work]                                96.1%
CNNs [Huang & LeCun, 2006]                                 94.1%
3D Deep Belief Networks [Nair & Hinton, 2009]              93.5%
Deep Boltzmann Machines [Salakhutdinov & Hinton, 2009]     92.8%
TICA [Hyvarinen et al., 2001]                              89.6%
SVMs                                                       88.4%

Algorithm                                                  CIFAR-10 accuracy
Improved LCC [Yu et al., 2010]                             74.5%
Deep Tiled CNNs [this work]                                73.1%
LCC [Yu et al., 2010]                                      72.3%
mcRBMs [Ranzato & Hinton, 2010]                            71.0%
Best of all RBMs [Krizhevsky, 2009]                        64.8%
TICA [Hyvarinen et al., 2001]                              56.1%
Summary/Big ideas
Large-scale brain simulations as a revisiting of the big AI dream.

Deep learning has had two big ideas:
- Learning multiple layers of representation.
- Learning features from unlabeled data.

It has worked well so far in two regimes (confusing to outsiders):
- Lots of labeled data: "train the heck out of the network."
- Unsupervised feature learning / self-taught learning.

Scalability is important.

Detailed tutorial: [Link]