Softmax function
• Its purpose is to convert a real-valued array into probabilities (values in the range 0 to 1), rather than just introduce a nonlinearity.
• It differs from the logistic function in that it does not operate element-wise on a vector; rather, softmax applies to the entire vector at once (see the sketch below).
The softmax function
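As a minimal sketch (assuming NumPy; the function names here are illustrative, not taken from the slides), the following shows softmax consuming a whole vector, in contrast with the element-wise logistic function:

import numpy as np

def softmax(z):
    # Map a real-valued vector to a probability vector that sums to 1.
    # Subtracting the max does not change the result (softmax is
    # shift-invariant) but keeps the exponentials numerically stable.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def logistic(z):
    # Element-wise logistic (sigmoid): each entry is squashed independently.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([3.0, 1.0, -3.0])
print(softmax(z))    # sums to 1: a probability distribution
print(logistic(z))   # does not sum to 1 in general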
The use of Softmax
• Softmax layer as the output layer
[Figure: an ordinary output layer maps z1, z2, z3 to outputs y1, y2, y3. In general, these outputs can be any value and may not be easy to interpret.]
Softmax
• Softmax layer as the output layer
• Probability: 1 > y_i > 0 and \sum_i y_i = 1
[Figure: Softmax layer. Each output is computed as
    y_i = \frac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}
Example: z_1 = 3, z_2 = 1, z_3 = -3 gives e^{z_1} ≈ 20, e^{z_2} ≈ 2.7, e^{z_3} ≈ 0.05, hence y_1 ≈ 0.88, y_2 ≈ 0.12, y_3 ≈ 0.]
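To check the numbers in the example above, here is a short sketch (NumPy is assumed; the slide does not prescribe any library):

import numpy as np

z = np.array([3.0, 1.0, -3.0])
e = np.exp(z)          # ≈ [20.09, 2.72, 0.05]
y = e / e.sum()        # ≈ [0.88, 0.12, 0.00]
print(e.round(2))
print(y.round(2))
print(y.sum())         # 1, up to floating-point rounding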
Softmax for multi-class classification
• Softmax pushes the largest component of the vector towards 1 while pushing all the other components towards zero. Also, the outputs always sum to 1, regardless of the sum of the components of the input vector. Thus, the output of the softmax function can be interpreted as a probability distribution.
• A common application is to use softmax in the output layer for a classification problem. The output vector has a component corresponding to each target class, and the softmax output is interpreted as the probability of the input belonging to the corresponding class.
• Softmax combines excellently with the cross-entropy loss (this will be given as an assignment problem); see the sketch below.
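A minimal sketch of the softmax/cross-entropy pairing (the function name and the log-sum-exp formulation are illustrative assumptions, not taken from the slides):

import numpy as np

def cross_entropy_with_softmax(z, target):
    # Cross-entropy loss of softmax(z) against a target class index.
    # Working with log-softmax (z - logsumexp(z)) avoids forming the
    # probabilities explicitly and is numerically stable.
    z = z - np.max(z)                         # softmax is shift-invariant
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -log_probs[target]

z = np.array([3.0, 1.0, -3.0])
print(cross_entropy_with_softmax(z, target=0))  # small loss: class 0 already dominates
print(cross_entropy_with_softmax(z, target=2))  # large loss: class 2 has probability ≈ 0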