Backpropagation – Gradients w.r.t. Output Units
Consider an ANN structure.
Some observations about this network:
1) It is a classification ANN
2) There are 3 layers = 2 hidden layers + 1 output layer.
3) There are L=3 total layers in this network.
4) There are L-1=2 Hidden Layers in this network.
5) $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the input to the network.
6) $y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}$ is the output of this network ($y_i$ = probability of belonging to the $i$-th class).
7) There are n inputs
8) There are n neurons in each hidden layer.
9) There are k nodes in the output layer.
10) The activation function used at each hidden-layer neuron is $g(x)$.
11) Weights and Biases are the learnable parameters
12) $\theta$ is the vector containing the weight matrix and bias vector: $\theta = \begin{bmatrix} W \\ B \end{bmatrix}$
13) $w^L_{nm}$ = weight of the link connecting the $n$-th node of the $(L-1)$-th layer with the $m$-th node of the $L$-th layer.
14) $b^L_m$ = bias added to the $m$-th node of the $L$-th layer.
15) $a^L_i$ = pre-activation of the $i$-th node of the $L$-th layer.
16) $h^L_i$ = activation of the $i$-th node of the $L$-th layer.
17) $y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}$ is the vector of probabilities of each class predicted by the ANN (the $Q$ distribution).
18) $\hat{y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_k \end{bmatrix}$ is the vector of actual probabilities of each class (the $P$ distribution).
19) $y_i = \dfrac{e^{a^3_i}}{\sum_{j=1}^{k} e^{a^3_j}} = \text{softmax}(a^3_i)$
20) In general, if there are $L$ layers ($L-1$ of them hidden), then
$$y_i = \frac{e^{a^L_i}}{\sum_{j=1}^{k} e^{a^L_j}} = \text{softmax}(a^L_i)$$
(a small code sketch of the softmax follows this list).
21) Loss function used = cross-entropy loss:
$$L(\theta) = \sum_{j=1}^{k} P_j \log\left(\frac{1}{Q_j}\right)$$
Here $P$ is the actual probability distribution over all $k$ classes for the $i$-th input of the dataset, and $Q$ is the predicted probability distribution over all $k$ classes for the $i$-th input of the dataset.
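As a quick illustration of the softmax in items 19–20, here is a minimal NumPy sketch (the max-subtraction stability trick and the example values are my own, not from the notes):

```python
import numpy as np

def softmax(a):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(a - np.max(a))
    return e / e.sum()

a3 = np.array([2.0, 1.0, 0.1])  # example pre-activations of the output layer
y = softmax(a3)                 # predicted class probabilities (Q distribution)
print(y)                        # approximately [0.659 0.242 0.099]
print(y.sum())                  # 1.0
```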
Some pre-calculations
1) The actual probability distribution over the $k$ classes for a given input vector $x_i$ has $P(X = l) = 1$ for the correct class $l$, and $P(X \neq l) = 0$ for all other classes.
2) This means
$$L(\theta) = P_1 \log\left(\frac{1}{Q_1}\right) + P_2 \log\left(\frac{1}{Q_2}\right) + \dots + P_l \log\left(\frac{1}{Q_l}\right) + \dots + P_k \log\left(\frac{1}{Q_k}\right)$$
$$= 1 \cdot \log\left(\frac{1}{Q_l}\right) \quad \text{(every term except the } l\text{-th vanishes, since } P_{j \neq l} = 0\text{)}$$
Hence $L(\theta) = \log\left(\dfrac{1}{Q_l}\right)$
3) But $Q_l$ is the predicted probability of the $l$-th class by the ANN:
$$Q_l = y_l = \frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}$$
Hence $L(\theta) = \log\left(\dfrac{1}{y_l}\right) = -\log(y_l)$, where $l$ = true class label (a numerical sketch of this collapse follows these pre-calculations).
4) And previously we saw that to backpropagate the loss and update the weights and biases we need to find the gradient of $L(\theta)$ w.r.t. $\theta$.
But $\nabla_\theta L(\theta)$ would contain gradients with respect to intermediate members of the network. For example,
$$\nabla_\theta L(\theta) = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial W} \\ \dfrac{\partial L(\theta)}{\partial B} \end{bmatrix}, \qquad \frac{\partial L(\theta)}{\partial W} = \frac{\partial L(\theta)}{\partial y} \cdot \frac{\partial y}{\partial W}$$
OK, before going ahead, we first need to find $\dfrac{\partial L(\theta)}{\partial y}$.
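A small numerical sketch of pre-calculations 2) and 3): with a one-hot $P$ distribution, the full cross-entropy sum collapses to $-\log(y_l)$ (variable names and values are illustrative):

```python
import numpy as np

y = np.array([0.2, 0.7, 0.1])     # predicted Q distribution (k = 3 classes)
l = 1                             # index of the true class
P = np.zeros_like(y); P[l] = 1.0  # one-hot actual P distribution

full_sum = np.sum(P * np.log(1.0 / y))  # L(theta) = sum_j P_j log(1/Q_j)
collapsed = -np.log(y[l])               # L(theta) = -log(y_l)
print(full_sum, collapsed)              # both ~0.3567
```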
Gradient With Respect to Output Units
$$\frac{\partial}{\partial y} L(\theta) = \frac{\partial}{\partial y}\left(-\log(y_l)\right)$$
If $y_i$ is the $i$-th element of the output vector $y$,
$$\frac{\partial}{\partial y_i} L(\theta) = \begin{cases} \dfrac{-1}{y_l}, & \text{if } i = l \\ 0, & \text{otherwise} \end{cases}$$
More compactly this can be written as
$$\frac{\partial}{\partial y_i} L(\theta) = \frac{-\mathbb{1}_{(i=l)}}{y_l}$$
This uses the indicator (Kronecker delta) function $\mathbb{1}_{(i=l)}$, which is equal to 1 when $i = l$ and 0 otherwise.
So,
$$\frac{\partial}{\partial y}\left(-\log(y_l)\right) = \frac{\partial}{\partial \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}}\left(-\log(y_l)\right) = \begin{bmatrix} \dfrac{\partial}{\partial y_1}\left(-\log(y_l)\right) \\ \dfrac{\partial}{\partial y_2}\left(-\log(y_l)\right) \\ \vdots \\ \dfrac{\partial}{\partial y_k}\left(-\log(y_l)\right) \end{bmatrix}$$
$$\frac{\partial}{\partial y}\left(-\log(y_l)\right) = \begin{bmatrix} \dfrac{-\mathbb{1}_{(l=1)}}{y_l} \\ \dfrac{-\mathbb{1}_{(l=2)}}{y_l} \\ \dfrac{-\mathbb{1}_{(l=3)}}{y_l} \\ \vdots \\ \dfrac{-\mathbb{1}_{(l=k)}}{y_l} \end{bmatrix}$$
This means that for a given input vector $x$ there can be only one non-zero element in $\dfrac{\partial}{\partial y}\left(-\log(y_l)\right)$ (the $l$-th one), and all others will be 0.
Factoring out $\dfrac{-1}{y_l}$ leaves a vector with a single 1 and zeros elsewhere; this kind of vector is called a "one-hot vector".
It can be denoted as
$$\frac{\partial}{\partial y}\left(-\log(y_l)\right) = \frac{-1}{y_l} e(l)$$
Hence
$$\frac{\partial}{\partial y} L(\theta) = \frac{-1}{y_l} e(l)$$
where $e(l)$ is a $k$-dimensional one-hot vector whose $l$-th element is 1, $l$ being the true class label.
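A minimal sketch to sanity-check this result, comparing the closed form $\frac{-1}{y_l} e(l)$ with a finite-difference gradient (the helper names are mine):

```python
import numpy as np

y = np.array([0.2, 0.7, 0.1])  # predicted probabilities
l, k = 1, y.size               # true class label, number of classes

e_l = np.zeros(k); e_l[l] = 1.0  # one-hot vector e(l)
grad = (-1.0 / y[l]) * e_l       # closed form: dL/dy = (-1/y_l) e(l)

# finite-difference check, component by component
eps = 1e-6
num = np.zeros(k)
for i in range(k):
    yp, ym = y.copy(), y.copy()
    yp[i] += eps
    ym[i] -= eps
    num[i] = (-np.log(yp[l]) + np.log(ym[l])) / (2 * eps)
print(grad)  # [ 0.     -1.4286  0.    ]
print(num)   # matches up to ~1e-9
```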
Till now we have backpropagated up to the dark green part: $\dfrac{\partial}{\partial y} L(\theta) = \dfrac{-1}{y_l} e(l)$
Now let's move to the light green part and calculate
$$\frac{\partial}{\partial a^L} L(\theta) = \nabla_{a^L} L(\theta)$$
$a^L$ is the vector containing the pre-activations at the $L$-th layer.
$$\frac{\partial}{\partial a^L_i} L(\theta) = \frac{\partial}{\partial a^L_i}\left(-\log(y_l)\right) = \frac{\partial\left(-\log(y_l)\right)}{\partial y_l} \cdot \frac{\partial y_l}{\partial a^L_i} = -\frac{1}{y_l} \cdot \frac{\partial y_l}{\partial a^L_i}$$
But is $y_l$ dependent on $a^L_i$? Yes, as
$$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \dfrac{e^{a^L_1}}{\sum_{j=1}^{k} e^{a^L_j}} \\ \dfrac{e^{a^L_2}}{\sum_{j=1}^{k} e^{a^L_j}} \\ \vdots \\ \dfrac{e^{a^L_k}}{\sum_{j=1}^{k} e^{a^L_j}} \end{bmatrix} = \begin{bmatrix} \text{softmax}(a^L_1) \\ \text{softmax}(a^L_2) \\ \text{softmax}(a^L_3) \\ \vdots \\ \text{softmax}(a^L_k) \end{bmatrix}$$
This means
$$y_i = \text{softmax}(a^L_i) = \frac{e^{a^L_i}}{e^{a^L_1} + e^{a^L_2} + \dots + e^{a^L_k}}$$
Each element of $y$ involves each element of $a^L$.
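A tiny illustrative sketch of this coupling: perturbing a single pre-activation changes every entry of $y$, because each output shares the same denominator (the example values are assumptions for demonstration):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])
a_perturbed = a.copy()
a_perturbed[0] += 0.5        # change only the first pre-activation

print(softmax(a))            # approximately [0.659 0.242 0.099]
print(softmax(a_perturbed))  # every entry changes, not just the first
```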
And
$$y_l = \text{softmax}(a^L_l) = \frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}$$
So,
$$\frac{\partial}{\partial a^L_i} L(\theta) = \frac{-1}{y_l} \cdot \frac{\partial y_l}{\partial a^L_i}, \qquad \frac{\partial y_l}{\partial a^L_i} = \frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right)$$
This is of the form
$$\frac{\partial}{\partial x}\left(\frac{g(x)}{h(x)}\right) = \frac{\dfrac{\partial g(x)}{\partial x} h(x) - g(x) \dfrac{\partial h(x)}{\partial x}}{\left(h(x)\right)^2} = \left(\frac{\partial g(x)}{\partial x} \cdot \frac{1}{h(x)}\right) - \left(\frac{g(x)}{\left(h(x)\right)^2} \cdot \frac{\partial h(x)}{\partial x}\right)$$
Here,
$e^{a^L_l}$ plays the role of $g$: $g(a^L_i) = e^{a^L_l}$ (its derivative w.r.t. $a^L_i$ is non-zero only when $i = l$);
$\sum_{j=1}^{k} e^{a^L_j} = e^{a^L_1} + e^{a^L_2} + \dots + e^{a^L_k}$ plays the role of $h$: $h(a^L_i)$ (it contains the term with $j = i$);
and $a^L_i = x$.
Hence,
$$\frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right) = \frac{\partial}{\partial x}\left(\frac{g(x)}{h(x)}\right) = \left(\frac{\partial g(x)}{\partial x} \cdot \frac{1}{h(x)}\right) - \left(\frac{g(x)}{\left(h(x)\right)^2} \cdot \frac{\partial h(x)}{\partial x}\right)$$
$$\frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right) = \left(\frac{\partial e^{a^L_l}}{\partial a^L_i} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}}\right) - \left(\frac{e^{a^L_l}}{\left(\sum_{j=1}^{k} e^{a^L_j}\right)^2} \cdot \frac{\partial \sum_{j=1}^{k} e^{a^L_j}}{\partial a^L_i}\right)$$
$$= \left(\frac{\partial e^{a^L_l}}{\partial a^L_i} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}}\right) - \left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}} \cdot \frac{\partial \sum_{j=1}^{k} e^{a^L_j}}{\partial a^L_i}\right)$$
$$= \left(\frac{\partial e^{a^L_l}}{\partial a^L_i} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}}\right) - \left(\text{softmax}(a^L_l) \cdot \frac{e^{a^L_i}}{\sum_{j=1}^{k} e^{a^L_j}}\right)$$
using $\dfrac{\partial}{\partial a^L_i} \sum_{j=1}^{k} e^{a^L_j} = e^{a^L_i}$, so that
$$= \left(\frac{\partial e^{a^L_l}}{\partial a^L_i} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}}\right) - \left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right)$$
$$\left(\frac{\partial e^{a^L_l}}{\partial a^L_i} \cdot \frac{1}{\sum_{j=1}^{k} e^{a^L_j}}\right) - \left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right) = \begin{cases} \dfrac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}} - \left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right), & \text{if } l = i \\ -\left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right), & \text{otherwise} \end{cases}$$
since $\dfrac{\partial e^{a^L_l}}{\partial a^L_i} = e^{a^L_l}$ if $i = l$ and 0 otherwise.
$$\frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right) = \begin{cases} \text{softmax}(a^L_l) - \left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right), & \text{if } l = i \\ -\left(\text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)\right), & \text{otherwise} \end{cases}$$
Which can be written more compactly as
$$\frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right) = \mathbb{1}_{(l=i)}\,\text{softmax}(a^L_l) - \text{softmax}(a^L_l) \cdot \text{softmax}(a^L_i)$$
But $y_l = \text{softmax}(a^L_l)$ and $y_i = \text{softmax}(a^L_i)$.
Therefore
$$\frac{\partial y_l}{\partial a^L_i} = \frac{\partial}{\partial a^L_i}\left(\frac{e^{a^L_l}}{\sum_{j=1}^{k} e^{a^L_j}}\right) = \mathbb{1}_{(l=i)}\, y_l - y_l \cdot y_i$$
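This is exactly the softmax Jacobian, $J_{li} = \mathbb{1}_{(l=i)}\, y_l - y_l\, y_i$. A short sketch that builds it with NumPy and checks one column against finite differences (the diag/outer construction is my own phrasing of the same formula):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])
s = softmax(a)

# J[l, i] = 1(l == i) * y_l - y_l * y_i
J = np.diag(s) - np.outer(s, s)

# finite-difference check of the column i = 0
eps = 1e-6
ap, am = a.copy(), a.copy()
ap[0] += eps
am[0] -= eps
col0 = (softmax(ap) - softmax(am)) / (2 * eps)
print(np.max(np.abs(J[:, 0] - col0)))  # ~1e-10, the two forms agree
```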
And
$$\frac{\partial}{\partial a^L_i} L(\theta) = \frac{-1}{y_l} \cdot \left[\mathbb{1}_{(l=i)}\, y_l - y_l \cdot y_i\right]$$
Finally,
$$\frac{\partial}{\partial a^L_i} L(\theta) = -\left(\mathbb{1}_{(l=i)} - y_i\right)$$
We can now write the gradient with respect to the vector $a^L$:
$$\nabla_{a^L} L(\theta) = \begin{bmatrix} \dfrac{\partial L(\theta)}{\partial a^L_1} \\ \dfrac{\partial L(\theta)}{\partial a^L_2} \\ \dfrac{\partial L(\theta)}{\partial a^L_3} \\ \vdots \\ \dfrac{\partial L(\theta)}{\partial a^L_k} \end{bmatrix} = \begin{bmatrix} -\left(\mathbb{1}_{(l=1)} - y_1\right) \\ -\left(\mathbb{1}_{(l=2)} - y_2\right) \\ -\left(\mathbb{1}_{(l=3)} - y_3\right) \\ \vdots \\ -\left(\mathbb{1}_{(l=k)} - y_k\right) \end{bmatrix}$$
Hence,
$$\nabla_{a^L} L(\theta) = -(e(l) - y)$$
where $e(l)$ is a $k$-dimensional one-hot vector whose $l$-th element is 1 ($l$ being the true class label), and $y$ is the vector containing the predicted probability of the input belonging to each of the $k$ classes.
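Putting the whole derivation together, $\nabla_{a^L} L(\theta) = -(e(l) - y) = y - e(l)$. A minimal end-to-end check against a finite-difference gradient of $-\log(\text{softmax}(a^L)_l)$ (names and values are illustrative):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])  # output-layer pre-activations a^L
l = 1                          # true class label
k = a.size

e_l = np.zeros(k); e_l[l] = 1.0
grad = softmax(a) - e_l        # closed form: -(e(l) - y) = y - e(l)

# finite-difference gradient of L(theta) = -log(softmax(a)_l)
eps = 1e-6
num = np.zeros(k)
for i in range(k):
    ap, am = a.copy(), a.copy()
    ap[i] += eps
    am[i] -= eps
    num[i] = (-np.log(softmax(ap)[l]) + np.log(softmax(am)[l])) / (2 * eps)
print(grad)  # approximately [ 0.659 -0.758  0.099]
print(num)   # matches the closed form
```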