Deep Learning for Engineering Students
Introduction to Deep Learning — Lecture 2
The Perceptron

[Diagram: inputs $x_1, x_2, \ldots, x_m$ with weights $w_1, w_2, \ldots, w_m$, a weighted sum $\Sigma$, a non-linear activation function $\int$, and output $\hat{y}$]
Input → Weights → Sum → Non-linearity → Output

$\hat{y} = g\left(\sum_{i=1}^{m} x_i w_i\right)$
Adding a bias term $b$:

$\hat{y} = g\left(b + \sum_{i=1}^{m} x_i w_i\right)$

In matrix form: $\hat{y} = g(XW + b)$
Common activation functions and their derivatives (a NumPy sketch follows below):

Sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$, with $\dot{g}(z) = g(z)\left(1 - g(z)\right)$

Tanh: $g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$, with $\dot{g}(z) = 1 - g(z)^2$

ReLU: $g(z) = \max(0, z)$, with $\dot{g}(z) = \begin{cases} 1, & z > 0 \\ 0, & \text{otherwise} \end{cases}$
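As a quick check of these formulas, here is a minimal NumPy sketch (the function names are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))        # g(z) = 1 / (1 + e^-z)

def tanh(z):
    return np.tanh(z)                  # g(z) = (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0, z)            # g(z) = max(0, z)

# Derivatives, matching the formulas above
def d_sigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))

def d_tanh(z):
    return 1 - np.tanh(z) ** 2

def d_relu(z):
    return (z > 0).astype(float)       # 1 if z > 0, 0 otherwise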
Example: the Perceptron in action

Take $w_0 = 1$ and $W = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$, so that

$\hat{y} = g(XW + w_0)$

In this case, we have $\hat{y} = g(1 + 3x_1 - 2x_2)$

The argument $1 + 3x_1 - 2x_2 = 0$ is just a line in 2D: inputs on one side of the line give a positive weighted sum, inputs on the other side give a negative one. For example, for $X = \begin{bmatrix} -1 & 2 \end{bmatrix}$, the weighted sum is $1 + 3(-1) - 2(2) = -6$, so $\hat{y} = g(-6)$.
Simplified notation: write $z$ for the weighted sum, so

$z = w_0 + \sum_{i=1}^{m} x_i w_i, \qquad \hat{y} = g(z)$
Multi-output perceptron: each output node $j$ has its own weights,

$z_j = \sum_{i=1}^{m} x_i w_{ij} + w_{0j}, \qquad \hat{y}_j = g(z_j)$

Because all inputs are densely connected to all outputs, these layers are called Dense layers:

import tensorflow as tf
layer = tf.keras.layers.Dense(units=2)
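For intuition, the same computation without TensorFlow is one matrix product plus a bias. A sketch with assumed sizes ($m$ inputs, 2 outputs; the variable names are illustrative):

import numpy as np

m = 3                                  # number of inputs (assumed)
W = np.random.randn(m, 2)              # one weight column per output node
b = np.zeros((1, 2))                   # one bias per output node

x = np.random.randn(1, m)              # a single input row vector
z = x @ W + b                          # z_j = sum_i x_i w_ij + w_0j
y_hat = 1 / (1 + np.exp(-z))           # apply the activation g, here sigmoid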
Single hidden layer: hidden node $z_2^{(1)}$, for example, is computed as

$z_2^{(1)} = \sum_{i=1}^{m} x_i w_{i2}^{(1)} + w_{02}^{(1)} = x_1 w_{12}^{(1)} + x_2 w_{22}^{(1)} + \cdots + x_m w_{m2}^{(1)} + w_{02}^{(1)}$
[Diagram: a deep network with input $\boldsymbol{X}$ (input size), hidden weights $\boldsymbol{W_H}$ (input size × hidden size), hidden layer $\boldsymbol{H}$ (hidden size), output weights $\boldsymbol{W_O}$ (hidden size × output size), and output $\boldsymbol{O}$ (output size)]
Example

[Diagram: inputs $x_1, x_2$ plus a constant input 1, weights $w_1, w_2$ and bias weight $w_0$, sigmoid activation $\int$]

With bias weight $w_0 = 12$ and input weights $w_1 = -4$, $w_2 = -1$, the decision boundary is the line

$-4x_1 - x_2 + 12 = 0$

With $w_0 = 3$, $w_1 = -\tfrac{1}{5}$, $w_2 = -1$, the decision boundary is

$-\tfrac{1}{5}x_1 - x_2 + 3 = 0$
Example: forward pass for input $X = \begin{bmatrix} 2 & 2 \end{bmatrix}$

Hidden node 1 (weights $-4, -1$, bias $12$): $z_1 = -4(2) - 1(2) + 12 = 2$, so $g(z_1) \approx 0.88$
Hidden node 2 (weights $-\tfrac{1}{5}, -1$, bias $3$): $z_2 = -\tfrac{1}{5}(2) - 1(2) + 3 = 0.6$, so $g(z_2) \approx 0.64$
Output node (weights $1.5, 1$, bias $0.5$): $\hat{y} = g\left(g(z_1) \cdot 1.5 + g(z_2) \cdot 1 + 0.5\right) \approx g(2.46) \approx 0.92$

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([2, 2])
a1 = sigmoid(X @ np.array([-4, -1]) + 12)      # 0.88
a2 = sigmoid(X @ np.array([-1/5, -1]) + 3)     # 0.64
y_hat = sigmoid(1.5 * a1 + 1 * a2 + 0.5)       # 0.92
Example: forward pass for input $X = \begin{bmatrix} 4 & 6 \end{bmatrix}$

$z_1 = -4(4) - 1(6) + 12 = -10$, so $g(z_1) \approx 4.54 \times 10^{-5}$
$z_2 = -\tfrac{1}{5}(4) - 1(6) + 3 = -3.8$, so $g(z_2) \approx 0.02$
$\hat{y} = g\left(g(z_1) \cdot 1.5 + g(z_2) \cdot 1 + 0.5\right) \approx g(0.52) \approx 0.62$
Example: forward pass for input $X = \begin{bmatrix} 0 & 6 \end{bmatrix}$

$z_1 = -4(0) - 1(6) + 12 = 6$, so $g(z_1) \approx 1.0$
$z_2 = -\tfrac{1}{5}(0) - 1(6) + 3 = -3$, so $g(z_2) \approx 0.05$
$\hat{y} = g\left(g(z_1) \cdot 1.5 + g(z_2) \cdot 1 + 0.5\right) \approx g(2.05) \approx 0.89$
Exercise 1

Create a notebook with Google Colab (do not use TensorFlow) to compute the outputs of this neural network with the given matrices:

$\boldsymbol{w}^{(1)} = \begin{bmatrix} 0.6 & 0.9 & 0.7 & -0.1 \\ 2.6 & 0.5 & -0.1 & 0.7 \\ -0.7 & 1.4 & 0.3 & -1.2 \end{bmatrix} \qquad \boldsymbol{w}^{(2)} = \begin{bmatrix} 0.2 & -0.5 \\ -0.1 & 1.6 \\ -0.5 & 1.4 \\ -0.5 & -0.1 \end{bmatrix}$

Inputs: $x_1 = 2$, $x_2 = 3$, plus a constant bias input of 1
Hidden layer (nodes $z_1 \ldots z_4$): sigmoid
Output layer (outputs $\hat{y}_1, \hat{y}_2$): linear, $g(z) = z$
TensorFlow: stacking Dense layers builds a deep network.

import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n1),
    tf.keras.layers.Dense(n2),
    ...
    tf.keras.layers.Dense(2)
])

For a hidden layer $l$ between layers $l-1$ and $l+1$, node $j$ computes

$z_j^{(l)} = \sum_{i=1}^{n^{(l-1)}} g\left(z_i^{(l-1)}\right) w_{ij}^{(l)} + w_{0j}^{(l)}$
Scalar form: with activations $a_i^{(l-1)} = g\left(z_i^{(l-1)}\right)$, each node $j$ in layer $l$ computes

$z_j^{(l)} = \sum_{i=1}^{d^{(l-1)}} a_i^{(l-1)} w_{ij}^{(l)} + w_{0j}^{(l)}$
Example: build a model to estimate a test score based on sleep and study hours

Example     Sleep hours   Study hours   Target output
Student 1   0.3           1             75/100
Student 2   0.5           0.2           82/100
Student 3   1             0.4           93/100
Student X   8             3             ?

Two features: sleep hours ($x_1$) and study hours ($x_2$).

$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix} = \begin{bmatrix} 0.3 & 1 \\ 0.5 & 0.2 \\ 1 & 0.4 \end{bmatrix}, \qquad y = \begin{bmatrix} 0.75 \\ 0.82 \\ 0.93 \end{bmatrix}$

import numpy as np
X = np.array([[0.3, 1],
              [0.5, 0.2],
              [1, 0.4]])
y = np.array([[0.75],
              [0.82],
              [0.93]])
print(X.shape)   # (3, 2)
print(y.shape)   # (3, 1)
Example: build a model to estimate a test score based on sleep and study hours

Network: inputs (sleep hours $x_1$, study hours $x_2$) → hidden layer ($z^{(1)}, a^{(1)}$, 3 nodes) → output ($z^{(2)}$, $\hat{y}$)

$\boldsymbol{X} \in \mathbb{R}^{N \times 2}$
$\boldsymbol{W}^{(1)} \in \mathbb{R}^{2 \times 3}, \quad \boldsymbol{b}^{(1)} \in \mathbb{R}^{1 \times 3}$
$\boldsymbol{W}^{(2)} \in \mathbb{R}^{3 \times 1}, \quad b^{(2)} \in \mathbb{R}$

Forward propagation:
$\boldsymbol{z}^{(1)} = \boldsymbol{X}\boldsymbol{W}^{(1)} + \boldsymbol{b}^{(1)} \in \mathbb{R}^{N \times 3}$
$\boldsymbol{a}^{(1)} = g\left(\boldsymbol{z}^{(1)}\right) \in \mathbb{R}^{N \times 3}$
$\boldsymbol{z}^{(2)} = \boldsymbol{a}^{(1)}\boldsymbol{W}^{(2)} + b^{(2)} \in \mathbb{R}^{N \times 1}$
$\boldsymbol{a}^{(2)} = \hat{\boldsymbol{y}} = g\left(\boldsymbol{z}^{(2)}\right) \in \mathbb{R}^{N \times 1}$
Example: build a model to estimate a test score based on sleep and study hours

Code Symbol   Math Symbol   Definition                                Dimensions
X             $X$           Input data, each row is an example        (NumExamples, inputLayerSize)
y             $y$           Target data                               (NumExamples, outputLayerSize)
W1            $W^{(1)}$     Layer 1 weights                           (inputLayerSize, hiddenLayerSize)
b1            $b^{(1)}$     Layer 1 bias                              (1, hiddenLayerSize)
W2            $W^{(2)}$     Layer 2 weights                           (hiddenLayerSize, outputLayerSize)
b2            $b^{(2)}$     Layer 2 bias                              (1, outputLayerSize)
z1            $z^{(1)}$     Layer 1 weighted input (pre-activation)   (NumExamples, hiddenLayerSize)
a1            $a^{(1)}$     Layer 1 output (activation)               (NumExamples, hiddenLayerSize)
z2            $z^{(2)}$     Layer 2 weighted input (pre-activation)   (NumExamples, outputLayerSize)
a2            $a^{(2)}$     Layer 2 output (activation)               (NumExamples, outputLayerSize)
Example: build a model to estimate a test score based on sleep and study hours

import numpy as np

class NeuralNetwork:
    def __init__(self):
        self.inputLayerSize = 2
        self.hiddenLayerSize = 3
        self.outputLayerSize = 1
        self.W1 = np.random.randn(self.inputLayerSize, self.hiddenLayerSize)
        self.b1 = np.random.randn(1, self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize, self.outputLayerSize)
        self.b2 = np.random.randn(1, self.outputLayerSize)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def forwardPropagation(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1        # hidden layer weighted input
        self.a1 = self.sigmoid(self.z1)               # hidden layer activation
        self.z2 = np.dot(self.a1, self.W2) + self.b2  # output layer weighted input
        y_hat = self.sigmoid(self.z2)                 # output layer activation
        return y_hat
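A minimal usage sketch; since the weights are random, the printed values change between runs:

nn = NeuralNetwork()
y_hat = nn.forwardPropagation(X)   # X is the (3, 2) array defined earlier
print(y_hat)                       # shape (3, 1), each value in (0, 1)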
Example: build a model to estimate a test score based on sleep and study hours

With random weights, the forward pass produces an output such as:

[[0.5063543 ]
 [0.03394059]
 [0.06788117]]

These untrained predictions are far from the targets y = [0.75, 0.82, 0.93]; training must adjust the weights to close that gap.
Exercise 2

Build a model to estimate the house price based on the number of rooms, area, and floor:
Hidden layer: 4 nodes – Activation: sigmoid
Output layer: 1 node – Activation: linear
Weights: random
Compute the forward propagation for a single cycle to find the output.

Example   Rooms   Area   Floor   Price (k)
1         2       100    1       75
2         1       60     2       60
3         3       120    1       90
4         2       75     2       80
Loss Function

The loss function quantifies how much the model prediction $\hat{y} = f(x_i; w)$ deviates from the ground truth $y_i$ for one particular example (between predicted and actual values in a machine learning model); the lower the loss, the better the model fits that example.

For example, we very often use the squared error as the loss function in regression problems:

$\mathcal{L}_i^2 = \left(f(x_i; w) - y_i\right)^2$

Estimate of test score based on sleep and study hours:

y_hat = [[0.62404926],     y = [[0.75],
         [0.58412263],          [0.82],
         [0.57624213]]          [0.93]]

Example   $\hat{y}$   $y$    $\mathcal{L}^2$
1         0.624       0.75   0.015876
2         0.584       0.82   0.055696
3         0.576       0.93   0.125316
Another loss function we often use for regression is the absolute loss, also called the $\mathcal{L}_1$ loss:

$\mathcal{L}_i^1 = \left| f(x_i; w) - y_i \right|$

For instance, let's say that our model predicts a flat's price (in thousands of dollars) based on the number of rooms, area ($m^2$), floor, and the neighborhood in the city (A or B). Suppose its prediction for $x = [4, 70, 1, A]$ is USD 110k. If the actual selling price is USD 105k, then the absolute loss is:

$\mathcal{L}^1 = |\hat{y} - y| = |110 - 105| = 5$
The Huber loss combines both: quadratic for small errors, linear for large ones,

$\mathcal{L}_\delta\left(f(x_i;w) - y_i\right) = \begin{cases} \frac{1}{2}\left(f(x_i;w) - y_i\right)^2 & \text{if } \left|f(x_i;w) - y_i\right| < \delta \\ \delta\left(\left|f(x_i;w) - y_i\right| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$
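A direct NumPy translation of the piecewise definition, as a sketch (names are illustrative):

import numpy as np

def huber_loss(y_hat, y, delta=1.0):
    error = y_hat - y
    small = np.abs(error) < delta                    # quadratic region
    quadratic = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.where(small, quadratic, linear)

print(huber_loss(np.array([110.0]), np.array([105.0])))  # [4.5]: linear region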
Cross-Entropy Loss

The target variables are in binary format, and there are only two classes. If the probability of class 1 is $f(x_i;w)$, then that of class 2 is $1 - f(x_i;w)$. The cross-entropy loss for the actual label $y$ (which only takes the values 0 or 1) and the predicted probability is defined as:

$\mathcal{L} = -\left[ y \log f(x_i;w) + (1-y) \log\left(1 - f(x_i;w)\right) \right]$

If $y = 0$, then $\mathcal{L} = -\log\left(1 - f(x_i;w)\right)$; if $y = 1$, then $\mathcal{L} = -\log f(x_i;w)$.
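The same formula in NumPy, as a sketch (the small eps clip that guards against log(0) is an added safeguard, not from the slide):

import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)     # avoid log(0); added safeguard
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_cross_entropy(1, 0.9))  # ~0.105: confident and correct -> small loss
print(binary_cross_entropy(1, 0.1))  # ~2.303: confident and wrong -> large loss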
Cost Function

The cost function measures the model's error on a group of examples, whereas the loss function deals with a single data instance. So, if $\mathcal{L}$ is our loss function, then we calculate the cost function by aggregating the loss over the training, validation, or test data $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$. For example, we can compute the cost as the mean loss:

$J(\boldsymbol{W}) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f(x_i; w), y_i\right)$
Cost Function

Mean Squared Error (MSE):

$J(\boldsymbol{W}) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_i^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{\boldsymbol{y}}_i - \boldsymbol{y}_i \right)^2$

Mean Absolute Error (MAE):

$J(\boldsymbol{W}) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_i^1 = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{\boldsymbol{y}}_i - \boldsymbol{y}_i \right|$
Cost Function

Example: compute the MSE cost function. Let's say that we have the data on four flats and that our model predicted the sale prices as follows:

$x_i$   Rooms   Area   Floor   Neighborhood   $y_i$   $\hat{y}_i$
$x_1$   4       70     1       A              105     104.5
$x_2$   2       50     2       A              83      91
$x_3$   1       30     5       B              50      65.3
$x_4$   5       90     2       A              200     114

Solution:

$J = \frac{(104.5 - 105)^2 + (91 - 83)^2 + (65.3 - 50)^2 + (114 - 200)^2}{4} = \frac{0.5^2 + 8^2 + 15.3^2 + 86^2}{4} = 1923.585$
Cost Function

import tensorflow as tf
actual_values = tf.constant([105, 83, 50, 200], dtype=tf.float32)
predicted_values = tf.constant([104.5, 91, 65.3, 114], dtype=tf.float32)
mse = tf.keras.losses.mse(actual_values, predicted_values)
print(mse.numpy())   # 1923.585: one scalar, the mean over all 4 values

VS

import tensorflow as tf
import numpy as np
actual_values = np.array([[105], [83], [50], [200]])
predicted_values = np.array([[104.5], [91], [65.3], [114]])
mse = tf.keras.losses.mse(actual_values, predicted_values)
print(mse.numpy())   # [0.25 64. 234.09 7396.]: one value per row

tf.keras.losses.mse averages over the last axis, so the tensor shape determines whether you get a single cost or a per-example loss.
Loss Optimization

We want the weights that minimize the cost:

$\boldsymbol{W}^* = \operatorname{argmin}_{\boldsymbol{W}} J(\boldsymbol{W}) = \operatorname{argmin}_{\boldsymbol{W}} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f(\boldsymbol{x}_i; \boldsymbol{W}), \boldsymbol{y}_i\right)$

▪ Update weights $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial J(\boldsymbol{W})}{\partial \boldsymbol{W}}$ (where $\eta$ is the learning rate)
Gradient Descent

▪ Update weights $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial J(\boldsymbol{W})}{\partial \boldsymbol{W}}$

For a scalar parameter the same rule reads:

$x_{new} = x_{cur} - \eta \frac{\partial f(x)}{\partial x}$
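A sketch of this update rule on a simple function, $f(x) = (x - 3)^2$ (an illustrative choice, not from the slide):

# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3)
eta = 0.1        # learning rate
x = 0.0          # initial guess

for step in range(100):
    grad = 2 * (x - 3)    # df/dx at the current point
    x = x - eta * grad    # x_new = x_cur - eta * df/dx

print(x)  # approaches the minimum at x = 3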
Chain Rule

If $z$ depends on $y$, which depends on $x$:

$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x}$

[Diagram: $x \rightarrow y \rightarrow z$]

When $x$ influences $z$ through several intermediate variables $y_1, \ldots, y_n$ (multiple paths), the contributions add up:

$\frac{\partial z}{\partial x} = \sum_{i=1}^{n} \frac{\partial z}{\partial y_i} \cdot \frac{\partial y_i}{\partial x}$
Gradients with respect to the weights and biases of layer $l$ (with $i = 1 \ldots d^{(l-1)}$, $j = 1 \ldots d^{(l)}$):

$\frac{\partial J}{\partial w_{ij}^{(l)}} = \frac{\partial J}{\partial z_j^{(l)}} \cdot \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}, \qquad \frac{\partial J}{\partial b_j^{(l)}} = \frac{\partial J}{\partial z_j^{(l)}} \cdot \frac{\partial z_j^{(l)}}{\partial b_j^{(l)}} = \frac{\partial J}{\partial z_j^{(l)}}$

It should be noted that the current layer error and current layer input are

$e_j^{(l)} \triangleq \frac{\partial J}{\partial z_j^{(l)}}, \qquad \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}} = a_i^{(l-1)}$

Then $\frac{\partial J}{\partial w_{ij}^{(l)}} = e_j^{(l)} \, a_i^{(l-1)}$
Current layer error for multi-path (node $i$ of layer $l-1$ feeds all $d^{(l)}$ nodes of layer $l$):

$e_i^{(l-1)} = \left( \sum_{j=1}^{d^{(l)}} e_j^{(l)} \, w_{ij}^{(l)} \right) \frac{\partial g}{\partial z_i^{(l-1)}}$
Example

[Diagram: a 1–2–2–1 network. Input $x$ feeds two ReLU units in the first hidden layer ($z_1, a_1$ with weight $w_1$, bias $b_1$; $z_2, a_2$ with weight $w_2$, bias $b_2$). These feed two ReLU units in the second hidden layer ($z_3, a_3$ with weights $w_3, w_5$, bias $b_3$; $z_4, a_4$ with weights $w_4, w_6$, bias $b_4$). A linear output unit ($z_5$ with weights $w_7, w_8$, bias $b_5$) produces $\hat{y}$. RL = ReLU, LN = linear.]
Example

Loss function: $L = (\hat{y} - y)^2$

Derivative of the loss function: $\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y)$

Forward propagation:
$z_1 = x w_1 + b_1 \qquad z_3 = a_1 w_3 + a_2 w_5 + b_3 \qquad z_5 = a_3 w_7 + a_4 w_8 + b_5$
$z_2 = x w_2 + b_2 \qquad z_4 = a_1 w_4 + a_2 w_6 + b_4 \qquad \hat{y} = z_5$
$a_1 = g(z_1) \qquad\;\; a_3 = g(z_3)$
$a_2 = g(z_2) \qquad\;\; a_4 = g(z_4)$
Example — Output layer

Back propagation requires all of the gradients

$\frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \frac{\partial L}{\partial w_3}, \frac{\partial L}{\partial w_4}, \frac{\partial L}{\partial w_5}, \frac{\partial L}{\partial w_6}, \frac{\partial L}{\partial w_7}, \frac{\partial L}{\partial w_8}$ and $\frac{\partial L}{\partial b_1}, \frac{\partial L}{\partial b_2}, \frac{\partial L}{\partial b_3}, \frac{\partial L}{\partial b_4}, \frac{\partial L}{\partial b_5}$

Start with the output layer: $\frac{\partial L}{\partial w_7}, \frac{\partial L}{\partial w_8}, \frac{\partial L}{\partial b_5}$. It can be seen that $z_5$ is a function of $w_7$ and $w_8$, so applying the chain rule we obtain:

$\frac{\partial L}{\partial w_7} = \frac{\partial L}{\partial z_5} \cdot \frac{\partial z_5}{\partial w_7}$
Example — Output layer: $\frac{\partial L}{\partial w_7} = \frac{\partial L}{\partial z_5} \cdot \frac{\partial z_5}{\partial w_7}$

Output layer error: $e^{(L)} \triangleq \frac{\partial L}{\partial z_5} \equiv \frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y)$, since $\hat{y} = z_5$

Current layer input w.r.t. $w_7$: $\frac{\partial z_5}{\partial w_7} = a_3$, recalling $z_5 = a_3 w_7 + a_4 w_8 + b_5$

Therefore: $\frac{\partial L}{\partial w_7} = e^{(L)} \cdot a_3$
Example — Output layer: $\frac{\partial L}{\partial w_8} = \frac{\partial L}{\partial z_5} \cdot \frac{\partial z_5}{\partial w_8}$

With the same output layer error $e^{(L)} = 2(\hat{y} - y)$ and current layer input $\frac{\partial z_5}{\partial w_8} = a_4$:

$\frac{\partial L}{\partial w_8} = e^{(L)} \cdot a_4$
Example — Output layer: $\frac{\partial L}{\partial b_5} = \frac{\partial L}{\partial z_5} \cdot \frac{\partial z_5}{\partial b_5}$

Since $\frac{\partial z_5}{\partial b_5} = 1$:

$\frac{\partial L}{\partial b_5} = e^{(L)}$
Example — 2nd hidden layer

Current layer error:

$e_3^{(2)} \triangleq \frac{\partial L}{\partial z_3} = \frac{\partial L}{\partial a_3} \cdot \frac{\partial a_3}{\partial z_3} = \frac{\partial L}{\partial z_5} \cdot \frac{\partial z_5}{\partial a_3} \cdot \frac{\partial a_3}{\partial z_3} = e^{(L)} \cdot w_7 \cdot \frac{\partial g(z_3)}{\partial z_3}$

Next layer error ($e^{(L)}$) × next layer weight ($w_7$, from $z_5 = a_3 w_7 + a_4 w_8 + b_5$) × derivative of the activation function.

Similarly, $e_4^{(2)} = e^{(L)} \cdot w_8 \cdot \frac{\partial g(z_4)}{\partial z_4}$

Recall that:
$z_5 = a_3 w_7 + a_4 w_8 + b_5$
$z_3 = a_1 w_3 + a_2 w_5 + b_3$
$z_4 = a_1 w_4 + a_2 w_6 + b_4$
Example — 1st hidden layer

$a_1$ influences the loss through two paths ($z_3$ and $z_4$), so the contributions add up:

$\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial z_5} \frac{\partial z_5}{\partial a_3} \frac{\partial a_3}{\partial z_3} \frac{\partial z_3}{\partial a_1} + \frac{\partial L}{\partial z_5} \frac{\partial z_5}{\partial a_4} \frac{\partial a_4}{\partial z_4} \frac{\partial z_4}{\partial a_1} = e_3^{(2)} w_3 + e_4^{(2)} w_4$

where $e_3^{(2)} \triangleq \frac{\partial L}{\partial z_3}$ and $e_4^{(2)} \triangleq \frac{\partial L}{\partial z_4}$ are the current layer errors.

Matrix calculus: for $\boldsymbol{z} = \boldsymbol{A}\boldsymbol{x}$,

$\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} = \boldsymbol{A}^T$
If $\boldsymbol{w}$ is a function of $\boldsymbol{z}$, which is a function of $\boldsymbol{y}$, which is a function of $\boldsymbol{x}$, we can obtain this chain rule:

$\frac{\partial \boldsymbol{w}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{w}}{\partial \boldsymbol{z}}$

For more understanding, refer to matrix calculus documentation.
Vectorized back propagation:

Step 1: Execute forward propagation and store each layer output $\boldsymbol{a}^{(l)} \in \mathbb{R}^{N \times d^{(l)}}$

Step 2: Compute the derivative of the loss function $\frac{\partial J}{\partial \boldsymbol{a}^{(L)}} = \frac{\partial J}{\partial \hat{\boldsymbol{y}}} \in \mathbb{R}^{N \times d^{(L)}}$

Step 3: Compute the derivative of the activation function $\frac{\partial \boldsymbol{a}^{(L)}}{\partial \boldsymbol{z}^{(L)}} = \frac{\partial \hat{\boldsymbol{y}}}{\partial \boldsymbol{z}^{(L)}} \in \mathbb{R}^{N \times d^{(L)}}$

Step 4: Compute the output layer error $\boldsymbol{e}^{(L)} \triangleq \frac{\partial J}{\partial \boldsymbol{z}^{(L)}} \in \mathbb{R}^{N \times d^{(L)}}$, the Hadamard product $\frac{\partial J}{\partial \hat{\boldsymbol{y}}} \odot \frac{\partial \hat{\boldsymbol{y}}}{\partial \boldsymbol{z}^{(L)}}$

Step 5: Compute the gradient of the weights $\frac{\partial J}{\partial \boldsymbol{W}^{(L)}} = \left(\boldsymbol{a}^{(L-1)}\right)^T \boldsymbol{e}^{(L)} \in \mathbb{R}^{d^{(L-1)} \times d^{(L)}}$

Step 6: Compute the gradient of the bias $\frac{\partial J}{\partial \boldsymbol{b}^{(L)}} = \sum_{i=1}^{N} \frac{\partial J}{\partial \hat{\boldsymbol{y}}} \odot \frac{\partial \hat{\boldsymbol{y}}}{\partial \boldsymbol{z}^{(L)}} \in \mathbb{R}^{1 \times d^{(L)}}$ (sum over the $N$ examples)

Step 7: Repeat steps 5–6 for all other layers, e.g. $\frac{\partial J}{\partial \boldsymbol{b}^{(l)}} = \sum_{i=1}^{N} \frac{\partial J}{\partial \boldsymbol{a}^{(l)}} \odot \frac{\partial \boldsymbol{a}^{(l)}}{\partial \boldsymbol{z}^{(l)}} \in \mathbb{R}^{1 \times d^{(l)}}$

Update: $\boldsymbol{W}^{(l)} \leftarrow \boldsymbol{W}^{(l)} - \eta \frac{\partial J}{\partial \boldsymbol{W}^{(l)}}$ and $\boldsymbol{b}^{(l)} \leftarrow \boldsymbol{b}^{(l)} - \eta \frac{\partial J}{\partial \boldsymbol{b}^{(l)}}$ for $l = 1 \ldots L$
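A minimal sketch of steps 2–6 for a single output layer, assuming a linear activation and squared-error loss (the function and variable names are illustrative, not from the slides):

import numpy as np

def backprop_output_layer(a_prev, W, b, y):
    z = a_prev @ W + b                    # forward: z^(L) = a^(L-1) W + b
    y_hat = z                             # linear activation: a^(L) = z^(L)
    dJ_dy_hat = 2 * (y_hat - y)           # Step 2: dJ/da^(L)
    dy_hat_dz = np.ones_like(z)           # Step 3: da^(L)/dz^(L) = 1 for linear g
    e = dJ_dy_hat * dy_hat_dz             # Step 4: Hadamard product -> e^(L)
    dJ_dW = a_prev.T @ e                  # Step 5: gradient of the weights
    dJ_db = e.sum(axis=0, keepdims=True)  # Step 6: gradient of the bias (sum over N)
    return dJ_dW, dJ_db, e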
Example: build a model to estimate a test score based on sleep and study hours

Same network as before, now with the loss $L = (\hat{y} - y)^2$:

$\boldsymbol{X} \in \mathbb{R}^{N \times 2}, \quad \boldsymbol{W}^{(1)} \in \mathbb{R}^{2 \times 3}, \quad \boldsymbol{b}^{(1)} \in \mathbb{R}^{1 \times 3}, \quad \boldsymbol{W}^{(2)} \in \mathbb{R}^{3 \times 1}, \quad b^{(2)} \in \mathbb{R}$

$\boldsymbol{z}^{(1)} = \boldsymbol{X}\boldsymbol{W}^{(1)} + \boldsymbol{b}^{(1)} \in \mathbb{R}^{N \times 3}$
$\boldsymbol{a}^{(1)} = g_1\left(\boldsymbol{z}^{(1)}\right) \in \mathbb{R}^{N \times 3}$
$\boldsymbol{z}^{(2)} = \boldsymbol{a}^{(1)}\boldsymbol{W}^{(2)} + b^{(2)} \in \mathbb{R}^{N \times 1}$
$\boldsymbol{a}^{(2)} = \hat{\boldsymbol{y}} = g_2\left(\boldsymbol{z}^{(2)}\right) \in \mathbb{R}^{N \times 1}$

($g_1$: sigmoid in the hidden layer; $g_2$: linear in the output layer)
Example: build a model to estimate a test score based on sleep and study hours

import numpy as np

# Network dimensions
num_input_units = 2
num_hidden_units = 3
num_output_units = 1

# Input features (sleep hours, study hours)
X = np.array([[0.3, 1],
              [0.5, 0.2],
              [1, 0.4]])

# Target outputs (normalized test scores)
y = np.array([[0.75],
              [0.82],
              [0.93]])

# Initialize random weights and biases
W_input_hidden = np.random.rand(num_input_units, num_hidden_units)
b_hidden = np.random.rand(1, num_hidden_units)
W_hidden_output = np.random.rand(num_hidden_units, num_output_units)
b_output = np.random.rand(1, num_output_units)

# Learning rate
learning_rate = 0.1
Example: build a model to estimate a test score based on sleep and study hours

The back propagation equations implemented in the loop (output layer error first, then the hidden layer):

$\frac{\partial J}{\partial \hat{y}} = \frac{\partial J}{\partial a^{(2)}} = 2(\hat{y} - y)$
$\frac{\partial J}{\partial z^{(2)}} = \frac{\partial J}{\partial a^{(2)}}$ (linear output activation)
$\frac{\partial J}{\partial \boldsymbol{W}^{(2)}} = \left(\boldsymbol{a}^{(1)}\right)^T \frac{\partial J}{\partial z^{(2)}}$
$\frac{\partial J}{\partial b^{(2)}} = \sum_{i=1}^{N} \frac{\partial J}{\partial z^{(2)}}$
$\frac{\partial J}{\partial \boldsymbol{a}^{(1)}} = \frac{\partial J}{\partial z^{(2)}} \left(\boldsymbol{W}^{(2)}\right)^T$
$\frac{\partial J}{\partial \boldsymbol{z}^{(1)}} = \frac{\partial J}{\partial \boldsymbol{a}^{(1)}} \odot \boldsymbol{a}^{(1)} \odot \left(1 - \boldsymbol{a}^{(1)}\right)$ (sigmoid derivative)
$\frac{\partial J}{\partial \boldsymbol{W}^{(1)}} = \boldsymbol{X}^T \frac{\partial J}{\partial \boldsymbol{z}^{(1)}}$

for iteration in range(1, 1001):
    # Forward propagation
    z_hidden = np.dot(X, W_input_hidden) + b_hidden
    a_hidden = 1 / (1 + np.exp(-z_hidden))               # sigmoid
    z_output = np.dot(a_hidden, W_hidden_output) + b_output
    a_output = z_output                                  # linear activation for regression

    # Calculate cost (MSE)
    cost = np.mean((a_output - y) ** 2)

    # Backpropagation
    dJ_da_output = 2 * (a_output - y)
    dJ_dz_output = dJ_da_output                          # linear activation derivative
    dJ_dW_hidden_output = a_hidden.T.dot(dJ_dz_output)
    dJ_db_output = np.sum(dJ_dz_output, axis=0, keepdims=True)
    dJ_da_hidden = dJ_dz_output.dot(W_hidden_output.T)
    dJ_dz_hidden = dJ_da_hidden * a_hidden * (1 - a_hidden)  # sigmoid derivative
    dJ_dW_input_hidden = X.T.dot(dJ_dz_hidden)
    dJ_db_hidden = np.sum(dJ_dz_hidden, axis=0, keepdims=True)

    # Update weights and biases
    W_hidden_output -= learning_rate * dJ_dW_hidden_output
    b_output -= learning_rate * dJ_db_output
    W_input_hidden -= learning_rate * dJ_dW_input_hidden
    b_hidden -= learning_rate * dJ_db_hidden
    # Log cost every 100 iterations
    if iteration % 100 == 0:
        print(f"Iteration {iteration}, Cost: {cost:.4f}")

print("Training complete!")
Exercise 3

Build a model to estimate the house price based on the number of rooms, area, and floor:
Hidden layer: 4 nodes – Activation: sigmoid
Output layer: 1 node – Activation: linear          (Do not use TensorFlow)
Weights: random – Cost function: MSE
Build the model and train for 100 iterations; plot the cost and the weights.

Example   Rooms   Area   Floor   Price (k)
1         2       100    1       75
2         1       60     2       60
3         3       120    1       90
4         2       75     2       80
Epochs vs Iterations

▪ An epoch is one full pass of the algorithm over the entire dataset. For example, if we set epochs = 10, the algorithm will scan the entire dataset 10 times.
▪ An iteration is one parameter update, i.e. one batch passed through the algorithm. If the dataset is split into 5 batches per epoch, there will be 5 iterations per epoch (see the sketch below).
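A quick arithmetic sketch with illustrative numbers (not from the slides):

# Illustrative numbers: 1000 examples, batch size 200, 5 epochs
num_examples = 1000
batch_size = 200
epochs = 5
iterations_per_epoch = num_examples // batch_size   # 5 updates per epoch
total_iterations = iterations_per_epoch * epochs    # 25 updates in total
print(iterations_per_epoch, total_iterations)       # 5 25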
Learning Rate

▪ A small learning rate converges slowly and can get stuck in false local minima.
▪ A large learning rate can overshoot the minimum and become unstable.
Batch Gradient Descent

▪ Batch Gradient Descent involves calculations over the full training set at each step.

Algorithm
1. Initialize weights randomly
2. Loop until convergence:
3.     Compute the gradients $\frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}, \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
4.     Update weights and bias: $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}$, $\boldsymbol{b} \leftarrow \boldsymbol{b} - \eta \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
Stochastic Gradient Descent (SGD)

Algorithm
1. Initialize weights randomly
2. Loop until convergence:
3.     Shuffle the training dataset, pick a single data point $i$
4.     Iterate over each training example
5.         Compute the gradient of the cost function for that example: $\frac{\partial J_i(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}, \frac{\partial J_i(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
6.         Update weights and bias: $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial J_i(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}$, $\boldsymbol{b} \leftarrow \boldsymbol{b} - \eta \frac{\partial J_i(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
Mini-batch Gradient Descent

▪ Parameters are updated after computing the gradient of the error with respect to a subset of the training set.

Algorithm (a NumPy sketch follows below)
1. Initialize weights randomly
2. Loop until convergence:
3.     Pick a batch of examples to train on
4.     Compute the gradient of the cost function over the batch: $\frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}, \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
5.     Update weights and bias: $\boldsymbol{W} \leftarrow \boldsymbol{W} - \eta \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{W}}$, $\boldsymbol{b} \leftarrow \boldsymbol{b} - \eta \frac{\partial J(\boldsymbol{W},\boldsymbol{b})}{\partial \boldsymbol{b}}$
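A self-contained sketch of the mini-batch loop on a toy linear-regression problem (the data and the numbers are illustrative, not from the slides):

import numpy as np

# Toy data: y = X w_true, with w_true = [1, 2, 3]
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([[1.0], [2.0], [3.0]])

w = np.zeros((3, 1))
eta, batch_size = 0.1, 20

for epoch in range(200):
    indices = rng.permutation(len(X))                   # shuffle each epoch
    for start in range(0, len(X), batch_size):          # 100/20 = 5 iterations/epoch
        batch = indices[start:start + batch_size]       # pick a batch of examples
        X_b, y_b = X[batch], y[batch]
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(X_b)   # MSE gradient on the batch
        w -= eta * grad                                 # update after every batch

print(w.ravel())  # approaches [1. 2. 3.]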
Adagrad scales the learning rate of each parameter by the accumulated squared gradients $G_t$:

$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t + \epsilon}} \odot g_t \qquad \text{where } g_t \triangleq \frac{\partial J}{\partial \theta_t}$
RMSProp replaces the accumulated sum with an exponentially decaying average of squared gradients $E[g^2]_t$:

$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t$
Adam tracks both a first and a second moment of the gradient:

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$

where $m_t$ is the first moment and $v_t$ is the second moment. After bias correction

$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$

the parameters are updated as

$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t$
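A sketch of these updates for a single parameter, again on the illustrative objective $f(\theta) = (\theta - 3)^2$ and with the usual default hyper-parameters ($\beta_1 = 0.9$, $\beta_2 = 0.999$); none of these numbers are from the slides:

import numpy as np

theta, m, v = 0.0, 0.0, 0.0
eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = 2 * (theta - 3)                       # gradient of f(theta) = (theta - 3)^2
    m = beta1 * m + (1 - beta1) * g           # first moment
    v = beta2 * v + (1 - beta2) * g ** 2      # second moment
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # approaches the minimum at theta = 3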
Optimizer

[Figure]
Exercise 4

Use the requirements described in Ex. 3. Change the learning rate and plot the cost function corresponding to each value of the learning rate.
Problem of Underfitting

▪ A statistical model is said to underfit when it cannot capture the underlying trend of the data.
▪ The model does not make accurate predictions on testing data.
▪ This case is known as high bias.

Reasons:
• The size of the training dataset is not large enough.
• The model is too simple.
• The training data is not cleaned and contains noise.
Problem of Overfitting

▪ A statistical model is said to overfit when it performs well on training data but poorly on testing data.
▪ This case is known as high variance.

Reasons:
• The size of the training dataset is not large enough.
• The model is too complex.
Dropout Regularization

During training, dropout randomly sets each activation to zero with the given probability (here 0.5), which prevents the network from relying on any single node:

tf.keras.layers.Dropout(rate=0.5)
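A sketch of where the layer sits in a Keras model (the layer sizes are illustrative); Keras applies dropout during training and switches it off automatically at inference:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(rate=0.5),   # randomly zero 50% of activations during training
    tf.keras.layers.Dense(1)
])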
Exercise 5

Build a model to estimate the house price based on the number of rooms, area, and floor:
Hidden layer: 4 nodes – Activation: sigmoid
Output layer: 1 node                               (Use TensorFlow >> Keras API)
Weights: random – Cost function: MSE
Build the model and train for 100 iterations. Compute the estimated price for the 5th example.

Example   Rooms   Area   Floor   Price (k)
1         2       100    1       75
2         1       60     2       60
3         3       120    1       90
4         2       75     2       80
5         1       45     1       ?
Exercise 6

Predict the compressive strength of concrete manufactured according to various recipes. The dataset is taken from BKEL.
Build a model with two hidden layers of 32 nodes each and an output layer with a single node. The hidden layers use the ReLU activation function.