Deep Learning
Vazgen Mikayelyan
October 20, 2020
Activation functions
1. Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
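A minimal NumPy sketch of this formula (the helper name `sigmoid` is our own):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))
```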
2. Tanh: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
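A sketch of the same formula; we defer to the built-in `np.tanh`, which computes this ratio in a numerically stable way:

```python
import numpy as np

def tanh(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)); output lies in (-1, 1)
    return np.tanh(x)
```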
3. Rectified linear unit: $\mathrm{ReLU}(x) = \max(0, x)$
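A one-line NumPy sketch:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)
```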
4. Leaky ReLU: $LR(x) = \begin{cases} 0.01x, & \text{for } x < 0 \\ x, & \text{for } x \ge 0 \end{cases}$
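A NumPy sketch of the piecewise definition above:

```python
import numpy as np

def leaky_relu(x):
    # 0.01 * x for x < 0, x for x >= 0; the small negative slope keeps
    # gradients from vanishing entirely for negative inputs
    return np.where(x < 0, 0.01 * x, x)
```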
5. Parametric ReLU: $PR(x) = \begin{cases} ax, & \text{for } x < 0 \\ x, & \text{for } x \ge 0 \end{cases}$
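The same sketch with the slope as a parameter; in practice $a$ is learned together with the network weights (which is what makes it "parametric"), but here it is just an argument:

```python
import numpy as np

def prelu(x, a):
    # a * x for x < 0, x for x >= 0; a is the (learnable) negative slope
    return np.where(x < 0, a * x, x)
```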
6. Exponential linear unit: $\mathrm{ELU}(x) = \begin{cases} a\left(e^x - 1\right), & \text{for } x < 0 \\ x, & \text{for } x \ge 0 \end{cases}$
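A NumPy sketch, assuming $a$ is a fixed hyperparameter with default 1:

```python
import numpy as np

def elu(x, a=1.0):
    # a * (e^x - 1) for x < 0, x for x >= 0
    # np.where evaluates both branches, so clamp x at 0 inside exp
    # to avoid overflow for large positive inputs
    return np.where(x < 0, a * (np.exp(np.minimum(x, 0.0)) - 1.0), x)
```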
7. SoftPlus: $SP(x) = \log(1 + e^x)$
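A NumPy sketch (here, as usual, $\log$ is the natural logarithm):

```python
import numpy as np

def softplus(x):
    # SP(x) = log(1 + e^x), a smooth approximation of ReLU
    return np.log1p(np.exp(x))  # log1p(t) = log(1 + t), accurate for small t
```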
8. Softmax: $S(x_1, x_2, \ldots, x_n) = \left( \frac{e^{x_1}}{\sum_{i=1}^{n} e^{x_i}}, \frac{e^{x_2}}{\sum_{i=1}^{n} e^{x_i}}, \ldots, \frac{e^{x_n}}{\sum_{i=1}^{n} e^{x_i}} \right)$
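A NumPy sketch; subtracting the maximum before exponentiating is a standard stability trick and does not change the result, since the shift cancels in the ratio:

```python
import numpy as np

def softmax(x):
    # S(x)_j = e^{x_j} / sum_i e^{x_i}
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

print(softmax(np.array([1.0, 2.0, 3.0])))  # components are positive and sum to 1
```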
Questions
1. Why do we need activation functions?
2. Should activation functions be defined per layer or per neuron?
Outline
1. Gradient Descent
2. Linear and Logistic Regressions
Gradient Descent
Let $f : \mathbb{R}^k \to \mathbb{R}$ be a convex function whose global minimum we want to find. Gradient descent is based on the fact that the direction in which $f$ decreases fastest is the direction opposite to its gradient, so starting from an arbitrary point $x_0 \in \mathbb{R}^k$ we iterate
$$x_{n+1} = x_n - \alpha \nabla f(x_n),$$
where $\alpha > 0$ is the step size (learning rate).
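A minimal sketch of the iteration, assuming a fixed step size and a caller-supplied gradient `grad_f` (both names are ours):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, steps=1000):
    # Iterate x_{n+1} = x_n - alpha * grad_f(x_n) from an arbitrary start x0
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad_f(x)
    return x

# Example: f(x) = ||x||^2 has gradient 2x and its global minimum at the origin
print(gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0]))  # ~ [0, 0]
```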
Linear Regression
Let $(x_i, y_i)_{i=1}^{n}$, $x_i \in \mathbb{R}^k$, $y_i \in \mathbb{R}$, be our training data. Consider the function
$$f(x) = f\left(x^1, x^2, \ldots, x^k\right) = w^1 x^1 + w^2 x^2 + \ldots + w^k x^k + b = w^T x + b.$$
Our aim is to find parameters $b, w^1, w^2, \ldots, w^k$ such that $f(x_i) \approx y_i$ for $i = 1, \ldots, n$. As our loss function we choose the mean squared error (the squared $L^2$ distance, averaged over the data):
$$\frac{1}{n} \sum_{l=1}^{n} \left(f(x_l) - y_l\right)^2.$$
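A minimal end-to-end sketch on synthetic data (the true weights and all names here are ours), fitting $w$ and $b$ by gradient descent on this loss:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # n = 100 samples, k = 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0     # synthetic targets with known w, b

w, b, alpha = np.zeros(3), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y                      # residuals f(x_l) - y_l
    w -= alpha * 2.0 / len(y) * (X.T @ err)  # gradient of the loss w.r.t. w
    b -= alpha * 2.0 * err.mean()            # gradient of the loss w.r.t. b

print(w, b)  # approaches (2, -1, 0.5) and 3
```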
Questions
1. Should we minimize the loss function using gradient descent?
2. Can you represent this model as a neural network?