Gradient Descent: Vanishing & Exploding Gradients
By
LOGESHWARI P
(CB.EN.P2BME23009)
GRADIENT DESCENT
Gradient descent is an optimization algorithm that is commonly used to train machine learning models and neural networks. It iteratively updates the model parameters in the direction of the negative gradient of the loss function.
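As a rough illustration (not from the slides), here is a minimal sketch of a single-parameter gradient descent loop; the quadratic loss, learning rate, and step count are illustrative assumptions:

```python
# Minimal gradient descent sketch on an assumed 1-D quadratic loss L(w) = (w - 3)^2.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # dL/dw

w = 0.0    # initial parameter (illustrative)
lr = 0.1   # learning rate / step size (illustrative)
for step in range(50):
    w -= lr * grad(w)  # update rule: w <- w - lr * dL/dw

print(w, loss(w))  # w approaches the minimum at w = 3, loss approaches 0
```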
Chain rule in backpropagation
Vanishing gradient
Exploding gradient
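To see how the chain rule leads to vanishing and exploding gradients, a small sketch with assumed per-layer derivative values (illustrative, not from the slides): the gradient reaching the early layers is a product of one factor per layer, so factors below 1 shrink it toward zero while factors above 1 blow it up.

```python
# The chain rule multiplies one derivative factor per layer.
depth = 50

# The sigmoid derivative is at most 0.25, so a deep product of such factors vanishes.
vanishing = 0.25 ** depth
# A per-layer factor slightly above 1 (1.5 is an illustrative value) explodes instead.
exploding = 1.5 ** depth

print(f"product of 0.25 over {depth} layers: {vanishing:.3e}")  # ~8e-31 (vanishing)
print(f"product of 1.5  over {depth} layers: {exploding:.3e}")  # ~6e+08 (exploding)
```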
Softmax and sigmoid are both activation functions commonly used in machine learning for different purposes. Let's compare them in terms of their characteristics and use cases:

1. **Function Form:**
   - **Softmax:** It is used for multi-class classification problems. The softmax function takes a vector of arbitrary real-valued scores and squashes them to a probability distribution over multiple classes. The output is a vector of probabilities that sum to 1.
   - **Sigmoid:** It is used for binary classification problems. The sigmoid function takes a real-valued input and squashes it to the range [0, 1]. It's commonly used to produce the probability of belonging to a particular class.

2. **Output Range:**
   - **Softmax:** Produces a probability distribution over multiple classes, with each element in the range (0, 1). The sum of all elements in the output vector is 1.
   - **Sigmoid:** Produces an output in the range (0, 1) and is suitable for binary classification problems. It can be interpreted as the probability of belonging to the positive class.

3. **Application:**
   - **Softmax:** Typically used in the output layer of a neural network for multi-class classification problems. It's especially useful when there are more than two classes.
   - **Sigmoid:** Commonly used in binary classification problems. It's also used in the hidden layers of neural networks to model non-linear relationships in the data.

4. **Independence:**
   - **Softmax:** The probabilities sum to 1, and the output for one class is dependent on the scores of other classes.
   - **Sigmoid:** Each sigmoid output is independent of the others. It's applied element-wise to each output node.

5. **Numerical Stability:**
   - **Softmax:** The softmax function involves exponentiation, and in practice, it can be sensitive to large input values, potentially leading to numerical instability issues (see the sketch after this comparison).
   - **Sigmoid:** Generally more numerically stable compared to softmax.

6. **Derivative:**
   - **Softmax:** The derivative of the softmax function involves multiple terms, and it's often used in conjunction with the cross-entropy loss during backpropagation in classification tasks.
   - **Sigmoid:** The derivative of the sigmoid function has a simple and interpretable form, making it computationally efficient during backpropagation.

In summary, softmax is suitable for multi-class classification tasks, while sigmoid is commonly used in binary classification problems. The choice between them depends on the nature of the task and the number of classes involved.
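A minimal sketch of both activations, assuming NumPy; the max-subtraction inside softmax is the standard way to handle the numerical-stability issue noted in point 5:

```python
import numpy as np

def sigmoid(x):
    # Applied element-wise; each output is independent of the others (point 4).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(scores):
    # Subtracting the max leaves the result unchanged mathematically,
    # but avoids overflow when exponentiating large scores (point 5).
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)  # probabilities sum to 1 (point 1)

scores = np.array([2.0, 1.0, 0.1])  # illustrative class scores
print(softmax(scores))   # e.g. [0.659 0.242 0.099], sums to 1
print(sigmoid(scores))   # each value in (0, 1), independent of the others
```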