Activation Functions
• Activation functions are a crucial component of
artificial neural networks, used to introduce non-
linearity into the model.
• They determine whether a neuron should be activated
(fire) or not, based on the weighted sum of inputs.
• There are several types of activation functions, each
with its own characteristics and use cases.
• To put it in simple terms, an artificial neuron calculates
the ‘weighted sum’ of its inputs and adds a bias; this
quantity is called the net input (see the sketch below).
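A minimal sketch of this computation (the input, weight, and bias values below are arbitrary example numbers, not taken from any particular network):
import numpy as np

# Minimal sketch: a single artificial neuron computes the weighted sum of its
# inputs plus a bias (the net input). x, w and b are arbitrary example values.
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.7, -0.2])   # weights
b = 0.1                          # bias

net_input = np.dot(w, x) + b     # can be any value from -inf to +inf
print("net input:", net_input)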
• Now the value of the net input can be anything from -inf to
+inf.
• The neuron on its own has no way to bound this value, and
thus cannot decide the firing pattern.
• This is why the activation function is an important part of an
artificial neural network.
• It decides whether a neuron should be activated or not, and in
doing so it bounds the value of the net input.
The activation function is a non-linear transformation that we
apply to the net input before sending it to the next layer of
neurons or producing the final output.
Types of Activation Functions
• Several different types of activation functions are
used in Machine Learning.
• Some of them are explained next.
Step Function
• The step function is one of the simplest kinds of
activation functions.
• Here we consider a threshold value: if the net input
(say y) is greater than (or equal to) the threshold,
the neuron is activated; otherwise it is not.
Mathematically,
f(y) = 1 for y >= threshold, f(y) = 0 otherwise
Graphically, the step function stays at 0 below the threshold
and jumps to 1 at the threshold.
Sigmoid Function
• Sigmoid function is a widely used activation
function. It is defined as:
f(x) = 1 / (1 + e^(-x))
• This is a smooth function and is continuously
differentiable.
• The biggest advantage that it has over the step and
linear functions is that it is smooth and non-linear.
• This is an incredibly cool feature of the sigmoid
function.
• This essentially means that when multiple neurons use
the sigmoid function as their activation function, the
output of the network is non-linear as well (a small
sketch follows below).
• The function ranges from 0 to 1 and has an S shape.
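A small sketch of this point (the weight matrices are random example values): stacking two linear layers without an activation collapses to a single linear map, while inserting a sigmoid between them does not.
import numpy as np

# Illustrative sketch: two linear layers with no activation are equivalent to
# one linear layer; adding a sigmoid in between breaks that equivalence.
# The weight matrices are random example values.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

linear_stack = W2 @ (W1 @ x)          # two linear layers ...
collapsed = (W2 @ W1) @ x             # ... equal a single linear layer
nonlinear = W2 @ sigmoid(W1 @ x)      # no single matrix can reproduce this

print(np.allclose(linear_stack, collapsed))   # True
print(nonlinear)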
ReLU
• The ReLU function is the Rectified Linear Unit.
• It is the most widely used activation function.
It is defined as:
f(x) = max(0, x)
• The main advantage of using the ReLU
function over other activation functions is that
it does not activate all the neurons at the
same time.
• This means that if the input is negative, it is converted
to zero and the neuron is not activated, as the short
sketch below illustrates.
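A short sketch of this behaviour (the pre-activations are random example values for 10 hypothetical neurons): negative net inputs are zeroed, so only a fraction of the neurons fire.
import numpy as np

# Illustrative sketch: ReLU zeroes out negative pre-activations, so only part
# of a layer is active for a given input. The pre-activations are random
# example values.
rng = np.random.default_rng(0)
pre_activations = rng.normal(size=10)
activations = np.maximum(0, pre_activations)

print("pre-activations:", np.round(pre_activations, 2))
print("activations:    ", np.round(activations, 2))
print("fraction active:", np.mean(activations > 0))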
Leaky ReLU
• The Leaky ReLU function is an improved version of the
ReLU function.
• Instead of defining the function as 0 for x less than 0,
we define it as a small linear component of x. It can be
defined as:
f(x) = x for x >= 0, f(x) = 0.01 * x for x < 0
Parametric Rectified Linear Unit (PReLU)
• Similar to Leaky ReLU, but the slope a of the negative
part is learned during training rather than being a fixed
constant (a minimal sketch follows below).
• This provides flexibility to adapt the slope of the
negative part of the activation function.
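A minimal sketch of what "learned during training" means here (not a full training loop; the inputs, upstream gradient, learning rate and initial slope are assumed example values): the gradient of PReLU with respect to a is 0 for x >= 0 and x for x < 0, so a can be updated by gradient descent like any other parameter.
import numpy as np

# Minimal sketch of a learnable PReLU slope `a` (not a full training loop).
# Inputs, upstream gradient, learning rate and initial `a` are example values.
def prelu(x, a):
    return np.where(x >= 0, x, a * x)

def prelu_grad_a(x):
    # d f / d a = 0 for x >= 0, x for x < 0
    return np.where(x >= 0, 0.0, x)

a = 0.25
x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
upstream_grad = np.ones_like(x)           # assume dLoss/dOutput = 1

grad_a = np.sum(upstream_grad * prelu_grad_a(x))
a -= 0.01 * grad_a                        # one gradient-descent step on `a`
print("updated a:", a)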
Exponential Linear Unit (ELU)
• Formula:
f(x) = x for x >= 0,
f(x) = a * (e^x - 1) for x < 0 (where a is a positive
constant)
• Output range: (-a, ∞)
• Similar to Leaky ReLU, but smooth for negative inputs,
and its negative outputs push the mean activation closer
to zero.
• It helps address the vanishing gradient problem.
Scaled Exponential Linear Unit (SELU)
• Formula: similar to ELU, but with specific values for
alpha and an extra scale factor (plus a matching scaling
of the weights):
f(x) = scale * x for x >= 0,
f(x) = scale * alpha * (e^x - 1) for x < 0
(where alpha ≈ 1.67326 and scale ≈ 1.0507)
• Designed to have self-normalizing properties
and improve training stability in deep
networks.
Hyperbolic Tangent (Tanh)
• Formula: f(x) = (e^(2x) - 1) / (e^(2x) + 1)
• Output range: (-1, 1)
• Similar to the sigmoid, but centered at 0.
• Being zero-centered helps mitigate the vanishing
gradient problem.
• Used in hidden layers of neural networks,
especially when outputs are normalized.
Swish
• Formula: f(x) = x / (1 + e^(-x)), i.e., x * sigmoid(x)
• Designed to be smoother than ReLU and has
shown promise in some applications.
Implementation
• Step Function
import numpy as np
import matplotlib.pyplot as plt

def step_function(x):
    return np.where(x >= 0, 1, 0)

x = np.linspace(-5, 5, 100)
y = step_function(x)

plt.plot(x, y)
plt.title("Step Function")
plt.grid()
plt.show()
Sigmoid Function
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 100)
y = sigmoid(x)

plt.plot(x, y)
plt.title("Sigmoid Function")
plt.grid()
plt.show()
Hyperbolic Tangent (Tanh)
import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    return np.tanh(x)

x = np.linspace(-5, 5, 100)
y = tanh(x)

plt.plot(x, y)
plt.title("Hyperbolic Tangent (Tanh)")
plt.grid()
plt.show()
Rectified Linear Unit (ReLU)
import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

x = np.linspace(-5, 5, 100)
y = relu(x)

plt.plot(x, y)
plt.title("Rectified Linear Unit (ReLU)")
plt.grid()
plt.show()
Leaky Rectified Linear Unit (Leaky ReLU)
import numpy as np
import matplotlib.pyplot as plt

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

x = np.linspace(-5, 5, 100)
y = leaky_relu(x)

plt.plot(x, y)
plt.title("Leaky Rectified Linear Unit (Leaky ReLU)")
plt.grid()
plt.show()
Parametric Rectified Linear Unit (PReLU)
import numpy as np
import matplotlib.pyplot as plt

def prelu(x, a=0.01):
    return np.where(x >= 0, x, a * x)

x = np.linspace(-5, 5, 100)
y = prelu(x)

plt.plot(x, y)
plt.title("Parametric Rectified Linear Unit (PReLU)")
plt.grid()
plt.show()
Exponential Linear Unit (ELU)
import numpy as np
import matplotlib.pyplot as plt

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-5, 5, 100)
y = elu(x)

plt.plot(x, y)
plt.title("Exponential Linear Unit (ELU)")
plt.grid()
plt.show()
Exponential Linear Unit (ELU)
• In this program, we define the ELU activation
function using the formula
elu(x, alpha) = x for x >= 0,
and elu(x, alpha) = alpha * (exp(x) - 1) for x < 0.
• You can adjust the alpha parameter to control
the slope of the negative part of the curve (see
the usage sketch below).
• The code then creates a range of x values,
computes the corresponding y values using
the ELU function, and plots the result.
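As a usage example of the alpha parameter mentioned above (the particular alpha values chosen here are arbitrary), the same elu function can be plotted for several alphas:
import numpy as np
import matplotlib.pyplot as plt

# Plot ELU for a few alpha values to see how alpha controls the saturation
# level of the negative part. The alpha values are arbitrary examples.
def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-5, 5, 100)
for alpha in (0.5, 1.0, 2.0):
    plt.plot(x, elu(x, alpha), label=f"alpha={alpha}")
plt.title("ELU for different alpha values")
plt.legend()
plt.grid()
plt.show()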
Scaled Exponential Linear Unit (SELU)
• The Scaled Exponential Linear Unit (SELU) is a
self-normalizing activation function that can
maintain mean activations close to 0 and
standard deviations close to 1 during training.
• Here's a Python implementation of the SELU
activation function:
Scaled Exponential Linear Unit (SELU)
import numpy as np
import matplotlib.pyplot as plt

def selu(x, alpha=1.67326, scale=1.0507):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-5, 5, 100)
y = selu(x)

plt.plot(x, y)
plt.title("Scaled Exponential Linear Unit (SELU)")
plt.grid()
plt.show()
Scaled Exponential Linear Unit (SELU)
• In this implementation, we define the SELU
activation function using the formula
selu(x, alpha, scale) = scale * x for x >= 0,
and selu(x, alpha, scale) = scale * (alpha * (exp(x) - 1)) for
x < 0.
• The alpha and scale parameters are specific values
that are part of the SELU definition.
• The code then creates a range of x values,
computes the corresponding y values using the
SELU function, and plots the result.
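A rough sketch of the self-normalizing behaviour described above (the layer width and depth are arbitrary choices; the 1/fan_in weight variance corresponds to the "scaling of weights" mentioned earlier): passing roughly standard-normal inputs through several SELU layers keeps the mean near 0 and the standard deviation near 1.
import numpy as np

# Rough illustration of SELU's self-normalizing behaviour. Layer width, depth
# and the LeCun-style 1/fan_in weight variance are example choices.
def selu(x, alpha=1.67326, scale=1.0507):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

rng = np.random.default_rng(0)
h = rng.normal(size=(1000, 256))              # batch of standard-normal inputs

for layer in range(10):
    w = rng.normal(0.0, np.sqrt(1.0 / 256), size=(256, 256))
    h = selu(h @ w)
    print(f"layer {layer + 1}: mean = {h.mean():+.3f}, std = {h.std():.3f}")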
Swish Function
• The Swish activation function is defined as:
swish(x) = x / (1 + exp(-x))
Swish Function
import numpy as np
import matplotlib.pyplot as plt

def swish(x):
    return x / (1 + np.exp(-x))

x = np.linspace(-5, 5, 100)
y = swish(x)

plt.plot(x, y)
plt.title("Swish Activation Function")
plt.grid()
plt.show()
Swish Function
• In this code, we define the Swish function
using the formula provided.
• We create a range of x values, calculate the
corresponding y values using the Swish
function, and plot the curve.
• The Swish function is known for being smooth
and continuous, allowing it to be a viable
choice as an activation function in neural
networks.
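To wrap up, a short comparison sketch (the set of functions and the plotting range are arbitrary choices) that plots several of the activations discussed above on one figure:
import numpy as np
import matplotlib.pyplot as plt

# Plot several of the activation functions discussed above on one figure.
# Definitions are repeated here so the snippet is self-contained.
activations = {
    "Sigmoid": lambda x: 1 / (1 + np.exp(-x)),
    "Tanh": np.tanh,
    "ReLU": lambda x: np.maximum(0, x),
    "Leaky ReLU": lambda x: np.where(x >= 0, x, 0.01 * x),
    "ELU": lambda x: np.where(x >= 0, x, np.exp(x) - 1),
    "Swish": lambda x: x / (1 + np.exp(-x)),
}

x = np.linspace(-5, 5, 200)
for name, f in activations.items():
    plt.plot(x, f(x), label=name)
plt.title("Comparison of activation functions")
plt.legend()
plt.grid()
plt.show()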