Activation Functions in Neural Networks
Recall Linearly/Non-Linearly Separable Data
y = mx + c
Linearly separable: the 2 classes can be separated by a straight line.
Non-linearly separable: the 2 classes can be separated by a curve or a more complex function than a straight line.
Why do we need activations?
• Our real-world data is non-linear and cannot be separated by a straight line. We wish to learn much more complex functions to be able to predict/classify the data we are working with.
Therefore, we need an activation function f(x) to make our neural network more powerful and enable it to learn complex, complicated data and represent non-linear, arbitrary functional mappings between inputs and outputs, in addition to stacking multiple layers (MLPs). An activation function is a non-linear function which takes a linear scalar z1 as its input and maps it to another numerical value y1.
Activation Function
[Diagram: inputs x1, x2, x3 are combined by linear weighted sums into z1 and z2, which are then passed through the non-linear activation f(x) to produce y1 and y2.]
Sigmoid/Logistic Activation
Any input will be squashed to a value between 0 and 1:
f(x) = 1 / (1 + e^(−x))
Ex:
x = 2: f(2) = 1 / (1 + e^(−2)) ≈ 0.88080
x = −1: f(−1) = 1 / (1 + e^(1)) ≈ 0.26894
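A minimal NumPy sketch (assumed, not from the slides) that reproduces these values:

import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(2))    # ~0.88080
print(sigmoid(-1))   # ~0.26894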
Sigmoid Function Derivative
f′(x) = f(x) · (1 − f(x)), with a maximum value of 0.25 at x = 0
The problem in Sigmoid
The blue curve is the sigmoid; the orange curve is its derivative. This is the cause of vanishing gradients in feedforward networks: the f′ terms all output values << 1, and when we multiply many numbers << 1 together, our gradient gets killed.
Let's even take the best case: the maximum of the sigmoid derivative is only about 0.25. With many layers there will still be a vanishing gradient problem. With only 4 layers of ~0.2-valued derivatives we already have a product of 0.2^4 = 0.0016. BUT, in practice we have very deep architectures! Most of the layers would die!
0.23 × 0.23 × 0.23 × 0.23 ≈ 0.0028
0.1 × 0.2 × 0.15 × 0.18 = 0.00054
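A small illustrative sketch (the per-layer derivative values are assumed, matching the arithmetic above) of how the product of many small f′ terms collapses with depth:

import numpy as np

derivs = [0.23, 0.23, 0.23, 0.23]   # hypothetical per-layer sigmoid derivatives, all << 1
print(np.prod(derivs))              # ~0.0028

derivs = [0.1, 0.2, 0.15, 0.18]
print(np.prod(derivs))              # 0.00054

print(0.25 ** 20)                   # ~9e-13: even the best-case derivative dies over 20 layers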
Gradients Vanish Easily
Tanh Activation
f(x) = (1 − e^(−2x)) / (1 + e^(−2x))
Examples:
f(0) = 0
f(1) = 0.761
f(-0.5) = -0.462
f(1.2) = 0.833
Derivative of Tanh
d/dx tanh(x) = 1 − tanh²(x)
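A hedged NumPy sketch (assumed helper names) of tanh and its derivative, reproducing the examples above:

import numpy as np

def tanh(x):
    # same as np.tanh(x), written via the slide's formula
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2

print(tanh(1.0))             # ~0.761
print(tanh(-0.5))            # ~-0.462
print(tanh_derivative(0.0))  # 1.0, the derivative's maximum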
ReLU Activation
f(x) = max(0, x)
If the input is negative → the output is zero
If the input is positive → the output stays the same
Ex: f(6) = 6, f(0) = 0, f(−3) = 0, f(2) = 2
Derivative of ReLU
One small problem: when the input is negative, the output is 0 and the gradient will die.
Example: When initializing from the normal distribution
N(0,1), half of the values are negative. Activating with ReLU
means setting half of the values to 0.
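A quick NumPy sketch (illustrative, with an arbitrary seed) showing that ReLU zeroes roughly half of the N(0, 1) activations:

import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # samples from N(0, 1)
a = relu(z)
print(np.mean(a == 0))            # ~0.5: these units receive zero gradient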
Leaky ReLU and PReLU
f(x) = x if x > 0, else α·x
If α = 0.01 (fixed) → Leaky ReLU
If α is a learnable parameter → PReLU
PReLU solves the dying ReLU problem (ReLU suffers from dead neurons that output 0 and whose gradients become zero).
PReLU derivative: f′(x) = 1 if x > 0, else α
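A minimal sketch (assumed function names) of Leaky ReLU / PReLU and the derivative described above:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # fixed alpha = 0.01 -> Leaky ReLU; learnable alpha -> PReLU
    return np.where(x > 0, x, alpha * x)

def prelu_derivative(x, alpha):
    # 1 for positive inputs, alpha otherwise
    return np.where(x > 0, 1.0, alpha)

print(leaky_relu(np.array([6.0, -3.0, 2.0])))  # ≈ [6, -0.03, 2]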
Exponential Linear Units (ELUs)
f(x) = x if x > 0, else α·(e^x − 1)
For negative values, we have an exponential curve rather than a flat line.
ELU derivative: f′(x) = 1 if x > 0, else α·e^x
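A minimal sketch (assumed names, default α = 1) of ELU and its derivative:

import numpy as np

def elu(x, alpha=1.0):
    # exponential curve for negative inputs instead of ReLU's flat line
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def elu_derivative(x, alpha=1.0):
    # 1 for positive inputs, alpha * exp(x) otherwise
    return np.where(x > 0, 1.0, alpha * np.exp(x))

print(elu(np.array([2.0, -1.0, -5.0])))  # ≈ [2, -0.632, -0.993]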
Comparison
GLU (Gated Linear Units)
Paper: Language Modeling with Gated Convolutional Networks
To activate a layer of dimension d, we make it output double its dimension (2×d) and then split it into 2 halves. The first half acts as the original layer output, and the second half acts as a gating layer. The gating layer is followed by a sigmoid activation function, which squashes each value to the range 0 to 1. The gate values are then element-wise multiplied with the first half.
(X·W1 + b1) ⊗ sigmoid(X·W2 + b2)
[Diagram, example with d = 2: the layer outputs 2×d values; the second half is passed through the sigmoid σ and gates the first half via element-wise multiplication ⊗, giving the activated d-dimensional output.]
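A hedged NumPy sketch of the GLU formula above (shapes and weight names are illustrative; real implementations usually use one projection to 2×d and split it):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, W1, b1, W2, b2):
    # (X W1 + b1) gated element-wise by sigmoid(X W2 + b2)
    return (x @ W1 + b1) * sigmoid(x @ W2 + b2)

rng = np.random.default_rng(0)
d_in, d = 8, 2                        # d = 2 as in the example
x = rng.standard_normal((4, d_in))    # a batch of 4 input vectors
W1 = rng.standard_normal((d_in, d))
W2 = rng.standard_normal((d_in, d))
b1 = b2 = np.zeros(d)
print(glu(x, W1, b1, W2, b2).shape)   # (4, 2): the activated output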
Swish
f(x) = x · sigmoid(x): multiplication of the input x with the sigmoid of x.
Swish is a smooth function: it does not abruptly change direction like ReLU does near x = 0. Rather, it smoothly bends from 0 towards negative values and then upwards again.
This also means Swish is non-monotonic: it does not always move in one direction, unlike ReLU and the other activation functions above.
Why is it better than ReLU?
• Sparsity: very negative values are effectively zeroed out.
• For very large values, the outputs do not saturate to a maximum value.
• Small negative values are zeroed out in ReLU; however, those negative values may still be relevant for capturing patterns underlying the data, and Swish preserves them.
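A one-line NumPy sketch (assumed name) of Swish, showing the smooth, non-monotonic dip for small negative inputs:

import numpy as np

def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-5.0, -0.5, 0.0, 5.0])))
# ≈ [-0.0335, -0.1888, 0, 4.9665]: large negatives ~0, small negatives preserved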
Softplus
f(x) = (1/β) · log(1 + e^(βx))
If β = 1: f(x) = log(1 + e^x)
[Plot: SoftPlus compared to ReLU]
Softplus derivative: the sigmoid function!
f′(x) = 1 / (1 + e^(−x))
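A minimal sketch (assumed names) of Softplus and its sigmoid derivative:

import numpy as np

def softplus(x, beta=1.0):
    # a smooth approximation of ReLU
    return np.log(1 + np.exp(beta * x)) / beta

def softplus_derivative(x):
    # the sigmoid function (for beta = 1)
    return 1.0 / (1.0 + np.exp(-x))

print(softplus(np.array([-3.0, 0.0, 3.0])))  # ≈ [0.0486, 0.6931, 3.0486]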
Mish Activation
A modified, gated form of the Softplus activation function:
Mish(x) = x · tanh(softplus(x))
• Unbounded above → no saturation.
• Small negative values are not zeroed out, which allows better gradient flow vs. a hard zero bound as in ReLU.
• Smooth, unlike ReLU, whose derivative is discontinuous at zero; this helps effective optimization and generalization.
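A minimal sketch (assumed name) of Mish built from the pieces above:

import numpy as np

def mish(x):
    # x * tanh(softplus(x)); smooth and non-zero for small negative inputs
    return x * np.tanh(np.log1p(np.exp(x)))

print(mish(np.array([-3.0, 0.0, 3.0])))  # ≈ [-0.146, 0, 2.987]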
Mish Compared to other activation functions
Results of Mish
Mish activation function: 2.15% increase over ReLU!