Backpropagation
Designing, Visualizing and Understanding Deep Neural Networks
CS W182/282A
Instructor: Sergey Levine
UC Berkeley
Neural networks
Drawing computation graphs
what expression does this compute?
equivalently, what program does this correspond to?
this is an MSE loss with a linear regression model
neural networks are computation graphs
if we design generic tools for computation graphs, we can train many kinds of neural networks
Drawing computation graphs
what expression does this compute?
equivalently, what program does this correspond to?
a simpler way to draw the same thing: a dot product
this is an MSE loss with a linear regression model
neural networks are computation graphs
if we design generic tools for computation graphs, we can train many kinds of neural networks
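To make the "computation graph = program" correspondence concrete, here is a minimal sketch (my own example, not the slide's notation) that evaluates the linear-regression-plus-MSE graph one node at a time; the variable names are illustrative.

```python
import numpy as np

# Each node of the computation graph is one line of the program.
x = np.array([1.0, 2.0])        # input vector
W = np.array([[0.5, -0.3]])     # weights of the linear regression model
y = np.array([1.5])             # regression target

z = W @ x                       # node 1: linear model, z = W x
e = z - y                       # node 2: residual
loss = 0.5 * np.sum(e ** 2)     # node 3: MSE loss
```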
Logistic regression
remember this is a vector!
let's draw the computation graph for logistic regression with the negative log-likelihood loss
what does this produce? a "one-hot" vector, e.g., [1, 0] or [0, 1]
Logistic regression
a simpler way to draw the same thing (the weights form a matrix)
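As an illustration of this graph (a sketch under my own naming, for a 2-class problem): a weight matrix, a softmax, and the negative log-likelihood against a one-hot label.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / np.sum(e)

x = np.array([1.0, 2.0])         # input vector
W = np.random.randn(2, 2) * 0.1  # weight matrix: one row per class
y = np.array([1.0, 0.0])         # one-hot label: [1, 0] or [0, 1]

z = W @ x                        # linear layer
p = softmax(z)                   # class probabilities
loss = -np.sum(y * np.log(p))    # negative log-likelihood loss
```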
Drawing it even more concisely
Notice that we have two types of variables: inputs/intermediate values and parameters
the parameters usually affect one specific operation
(though there is often parameter sharing, e.g., conv nets – more on this later)
also called a fully connected layer
Neural network diagrams
(simplified) computation graph diagram vs. neural network diagram
[diagram: linear layer (2x1 -> 2x1) -> softmax -> cross-entropy loss]
often we don't draw the parameters, because every layer has parameters
often we don't draw the cross-entropy loss, because cross-entropy always follows softmax
simplified drawing: linear layer -> softmax (2x1 -> 2x1)
Logistic regression with features
which layer
Learning the features
which feature
Problem: how do we represent the learned features? (= rows of the weight matrix)
Idea: what if each feature is a (binary) logistic regression output?
per-element sigmoid
not the same as softmax
each feature is independent
Let’s draw this!
[diagram: input (2x1) -> linear layer (3x2 weights) -> sigmoid (3x1) -> linear layer -> softmax -> cross-entropy loss]
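A minimal forward-pass sketch of this graph (dimensions and names are illustrative, not the lecture's notation): a linear layer, a per-element sigmoid producing independent features, a second linear layer, softmax, and the cross-entropy loss.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

x  = np.array([1.0, -0.5])        # input (2x1)
W1 = np.random.randn(3, 2) * 0.1  # first linear layer (3x2)
W2 = np.random.randn(2, 3) * 0.1  # second linear layer
y  = np.array([0.0, 1.0])         # one-hot label

a   = W1 @ x                      # linear layer
phi = sigmoid(a)                  # per-element sigmoid: each feature is independent
z   = W2 @ phi                    # second linear layer
p   = softmax(z)                  # softmax (unlike sigmoid, normalizes across classes)
loss = -np.sum(y * np.log(p))     # cross-entropy loss
```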
Simpler drawing
[diagram: linear layer -> sigmoid -> linear layer -> softmax -> cross-entropy loss (2x1 -> 3x1 -> 3x1)]
a simpler way to draw the same thing: drop the explicit loss node (linear layer -> sigmoid -> linear layer -> softmax, 2x1 -> 3x1 -> 2x1)
even simpler: merge each linear layer with its non-linearity into a single "sigmoid layer" / "softmax layer" box (2x1 -> 3x1 -> 2x1)
Doing it multiple times
[diagram: sigmoid layer -> sigmoid layer -> sigmoid layer -> linear layer -> softmax; activations 2x1 -> 3x1 -> 3x1 -> 3x1 -> 2x1, weight matrices 3x2, 3x3, 3x3]
Activation functions
we don’t have to use a sigmoid!
a wide range of non-linear functions will work; these are called activation functions (we'll discuss specific choices later)
why non-linear? multiple linear layers = one linear layer
enough layers = we can represent anything (so long as they're nonlinear)
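A few common activation functions, sketched in NumPy (illustrative choices; specific recommendations come later in the course), plus the reason non-linearity matters.

```python
import numpy as np

# Common elementwise activation functions.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    return np.tanh(a)

def relu(a):
    return np.maximum(0.0, a)

# Why non-linear? Without an activation, stacked linear layers collapse:
# W2 @ (W1 @ x) == (W2 @ W1) @ x, which is just one linear layer.
```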
Demo time!
Source: [Link]
Aside: what’s so neural about it?
biological neuron:
dendrites receive signals from other neurons
the neuron "decides" whether to fire based on incoming signals
the axon transmits the signal to downstream neurons
artificial "neuron" (also referred to as a "unit"):
sums up signals (upstream activations) from upstream neurons
"decides" how much to fire based on incoming signals (the activation function)
activations are transmitted to downstream units
Training neural networks
What do we need?
1. Define your model class
[diagram: sigmoid layer -> sigmoid layer -> sigmoid layer -> linear layer -> softmax (2x1 -> 3x1 -> 3x1 -> 3x1 -> 2x1)]
2. Define your loss function: negative log-likelihood, just like before
3. Pick your optimizer: stochastic gradient descent (what do we need to run it?)
4. Run it on a big GPU
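Putting the four steps together, here is a hedged sketch of the training loop for the simplest case (logistic regression), since its gradient is easy to write down; the dataset and names are made up for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# 1. model class: a single linear layer + softmax (logistic regression)
W = np.zeros((2, 2))

# toy dataset (illustrative)
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ys = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # one-hot labels

# 2. loss: negative log-likelihood   3. optimizer: stochastic gradient descent
lr = 0.1
for step in range(100):
    i = np.random.randint(len(xs))       # sample one example (the "stochastic" part)
    x, y = xs[i], ys[i]
    p = softmax(W @ x)                   # forward pass
    grad_W = np.outer(p - y, x)          # dL/dW for softmax + cross-entropy
    W -= lr * grad_W                     # SGD update
# 4. for real networks, run this on a big GPU with many more steps
```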
Aside: chain rule
High-dimensional chain rule
Row or column? The convention used in this lecture differs from the one in some textbooks.
Just two different conventions!
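For reference, a sketch of the high-dimensional chain rule in the Jacobian (numerator-layout) convention; the other convention simply transposes each factor and reverses the order.

```latex
% for y = f(x) and z = g(y):
\frac{\partial z}{\partial x}
  = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x},
\qquad
\left(\frac{\partial y}{\partial x}\right)_{ij}
  = \frac{\partial y_i}{\partial x_j}
```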
Chain rule for neural networks
A neural network is just a composition of functions
So we can use chain rule to compute gradients!
[diagram: the two-layer network from before (linear -> sigmoid -> linear -> softmax -> cross-entropy loss)]
Does it work?
We can calculate each of these Jacobians!
Example:
Why might this be a bad idea? (multiplying full Jacobians left to right builds large intermediate matrices)
Doing it more efficiently
Idea: start on the right
this is always true because the loss is scalar-valued!
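A quick numerical sketch (my own example) of why starting on the right is cheaper: because the loss is a scalar, its gradient with respect to the network output is a single vector, so multiplying from the right uses only vector-matrix products instead of matrix-matrix products.

```python
import numpy as np

n = 1000
J1 = np.random.randn(n, n)        # Jacobian of an earlier layer
J2 = np.random.randn(n, n)        # Jacobian of a later layer
dL = np.random.randn(1, n)        # gradient of the scalar loss w.r.t. the output (a row vector)

grad_slow = dL @ (J2 @ J1)        # left-to-right: an n x n matrix-matrix product, O(n^3)
grad_fast = (dL @ J2) @ J1        # right-to-left: two vector-matrix products, O(n^2)

assert np.allclose(grad_slow, grad_fast)
```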
The backpropagation algorithm
"Classic" version
[diagram: sigmoid layer -> sigmoid layer -> sigmoid layer -> linear layer -> softmax (input 2x1)]
Let’s walk through it…
[diagram: linear layer -> sigmoid -> linear layer -> softmax -> cross-entropy loss (2x1 -> 3x1 -> 3x1)]
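To make the walkthrough concrete, here is a hedged end-to-end sketch of backpropagation on this exact graph (linear layer, sigmoid, linear layer, softmax, cross-entropy); dimensions and variable names are illustrative, and a finite-difference check is included as a sanity test.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

x  = np.array([1.0, -0.5])          # input
y  = np.array([0.0, 1.0])           # one-hot label
W1 = np.random.randn(3, 2) * 0.1
W2 = np.random.randn(2, 3) * 0.1

# forward pass: cache every intermediate value
a    = W1 @ x                       # linear layer
phi  = sigmoid(a)                   # sigmoid layer
z    = W2 @ phi                     # linear layer
p    = softmax(z)
loss = -np.sum(y * np.log(p))       # cross-entropy loss

# backward pass: walk the graph from the loss back to the parameters
dz   = p - y                        # dL/dz for softmax + cross-entropy
dW2  = np.outer(dz, phi)            # dL/dW2
dphi = W2.T @ dz                    # dL/dphi, passed back through the linear layer
da   = dphi * phi * (1.0 - phi)     # dL/da: sigmoid derivative is phi * (1 - phi)
dW1  = np.outer(da, x)              # dL/dW1

# sanity check: compare one entry of dW1 against a finite difference
eps = 1e-5
W1p = W1.copy(); W1p[0, 0] += eps
loss_p = -np.sum(y * np.log(softmax(W2 @ sigmoid(W1p @ x))))
print(dW1[0, 0], (loss_p - loss) / eps)   # the two numbers should roughly agree
```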
Practical implementation
Neural network architecture details
Some things we should figure out:
How many layers?
How big are the layers?
What type of activation function?
Bias terms
additional parameters in each linear layer
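A small sketch of what the bias adds (illustrative names): each linear layer gets one extra parameter vector, added after the matrix multiply.

```python
import numpy as np

# Linear layer with a bias term: z = W x + b
W = np.random.randn(3, 2) * 0.1   # weight matrix
b = np.zeros(3)                   # bias: the additional parameter vector
x = np.array([1.0, -0.5])

z = W @ x + b
```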
What else do we need for backprop?
Backpropagation recipes: linear layer
(just to simplify notation!)
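A hedged sketch of the linear-layer recipe in code (my notation): given the upstream gradient delta = dL/dz for z = W x + b, the layer produces gradients for its own parameters and passes a gradient back to its input.

```python
import numpy as np

def linear_backward(W, x, delta):
    """Backward recipe for z = W x + b, given delta = dL/dz."""
    dW = np.outer(delta, x)   # dL/dW: outer product of upstream gradient and input
    db = delta                # dL/db: the bias gradient is the upstream gradient
    dx = W.T @ delta          # dL/dx: passed back to the previous layer
    return dW, db, dx
```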
Backpropagation recipes: sigmoid
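A hedged sketch of the sigmoid recipe (my notation): the sigmoid has no parameters, so the recipe only transforms the upstream gradient using sigmoid'(a) = phi (1 - phi), applied elementwise.

```python
import numpy as np

def sigmoid_backward(phi, delta):
    """Backward recipe for phi = sigmoid(a), given delta = dL/dphi."""
    return delta * phi * (1.0 - phi)   # dL/da (elementwise)
```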
Backpropagation recipes: ReLU
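A hedged sketch of the ReLU recipe (my notation): the derivative of max(0, a) is 1 where a > 0 and 0 elsewhere, so the recipe simply masks the upstream gradient.

```python
import numpy as np

def relu_backward(a, delta):
    """Backward recipe for r = max(0, a), given delta = dL/dr."""
    return delta * (a > 0)   # dL/da: zero out gradients where the unit was inactive
```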
Summary
backpropagation: run a forward pass caching every intermediate value, then apply each layer's backward recipe from the loss back to the parameters.