PowerPoint Lecture: Gradient Descent in Deep Learning
Slide 1: Title Slide
Gradient Descent in Deep Learning
Subtitle: The Engine Behind Model Training
Presenter: [Ahsan Ullah]
Institution: [Your Institution]
Date: [Presentation Date]
Slide 2: Objectives
Understand what Gradient Descent is
Learn how it works in neural networks
Explore different types of Gradient Descent
Analyze performance and optimization strategies
Slide 3: Introduction to Optimization
Optimization is the process of finding the model parameters (weights) that minimize the loss
function.
Slide 4: What is Gradient Descent?
Definition: Gradient Descent is an optimization algorithm used to minimize the loss function by
iteratively moving towards the minimum value.
Goal: Adjust weights to minimize the error (loss).
Slide 5: The Gradient
The gradient is a vector of partial derivatives.
It points in the direction of the greatest rate of increase of a function.
To minimize the function, move in the opposite direction of the gradient.
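A minimal Python sketch, assuming NumPy and a hypothetical function f(x, y) = x**2 + y**2, showing that the gradient points toward steepest increase, so we step the other way:
import numpy as np

# Hypothetical function f(x, y) = x**2 + y**2; its gradient is (2x, 2y).
def grad_f(x, y):
    return np.array([2 * x, 2 * y])

g = grad_f(1.0, 2.0)
print(g)    # [2. 4.]  direction of steepest increase at (1, 2)
print(-g)   # [-2. -4.] direction to move in order to decrease f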
Slide 6: Mathematical Formulation
Weight Update Rule: w ← w − η · ∇L(w)
Where:
w: weights
η (eta): learning rate
∇L(w): gradient of the loss function with respect to the weights
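A minimal NumPy sketch of this update rule on a toy quadratic loss (the loss function and values are illustrative, not from the slides):
import numpy as np

# Toy quadratic loss L(w) = sum((w - 3)**2); its gradient is 2 * (w - 3).
def grad_L(w):
    return 2.0 * (w - 3.0)

w = np.array([0.0, 10.0])     # initial weights (arbitrary)
eta = 0.1                     # learning rate
for step in range(100):
    w = w - eta * grad_L(w)   # w <- w - eta * grad L(w)

print(w)                      # approaches [3. 3.], the minimizer of L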
Slide 7: Learning Rate (η)
Controls how big a step is taken in the direction of the negative gradient
Too large: may overshoot the minimum
Too small: slow convergence, or the optimizer may get stuck in local minima
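A small sketch of these failure modes on a toy loss L(w) = w**2 (the step sizes are illustrative):
def run(eta, steps=20):
    # Gradient descent on L(w) = w**2 (gradient is 2w), starting from w = 5.
    w = 5.0
    for _ in range(steps):
        w -= eta * 2.0 * w
    return w

print(run(0.1))    # reasonable step size: w ends close to the minimum at 0
print(run(0.001))  # too small: w barely moves in 20 steps
print(run(1.1))    # too large: |w| grows each step and the update diverges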
Slide 8: Visual Representation
Show a graph of a convex loss function with steps of gradient descent moving toward the
minimum
Slide 9: Types of Gradient Descent
1. Batch Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini-Batch Gradient Descent
Slide 10: Batch Gradient Descent
Uses the entire training dataset to compute the gradient
Pros: Stable convergence
Cons: Slow and memory-intensive for large datasets
Slide 11: Stochastic Gradient Descent (SGD)
Updates weights using one training example at a time
Pros: Fast and can escape local minima
Cons: Noisy updates, less stable convergence
Slide 12: Mini-Batch Gradient Descent
Uses small batches of data (e.g., 32, 64) for each update
Pros: Balances the speed of SGD with the stable gradient estimates of batch gradient descent
Common in deep learning
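A NumPy sketch covering all three variants on a toy linear regression (the data is synthetic): batch_size = 1 recovers SGD, batch_size = len(X) recovers batch gradient descent, and anything in between is mini-batch.
import numpy as np

# Mini-batch gradient descent for a toy linear regression y = X @ w.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
eta, batch_size = 0.05, 32
for epoch in range(20):
    perm = rng.permutation(len(X))                                    # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        grad = 2.0 / len(idx) * X[idx].T @ (X[idx] @ w - y[idx])      # MSE gradient on the batch
        w -= eta * grad                                               # update from this mini-batch only

print(w)   # approaches [1.0, -2.0, 0.5]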
Slide 13: Gradient Descent in Neural Networks
Used during backpropagation to update weights
Repeats for each epoch to gradually reduce loss
Works with different loss functions depending on task (e.g., MSE, Cross-Entropy)
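A sketch of a single training step with TensorFlow 2.x / Keras (the model, optimizer, and loss below are illustrative choices): backpropagation computes the gradients and the optimizer applies the weight update.
import tensorflow as tf

# Illustrative model and optimizer; backpropagation happens inside GradientTape.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)            # backpropagation
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # gradient descent step
    return loss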
Slide 14: Challenges of Gradient Descent
Local Minima and Saddle Points
Vanishing and Exploding Gradients
Choosing a suitable learning rate (addressed by learning rate scheduling)
Slide 15: Advanced Optimization Techniques
Momentum
RMSProp
Adam Optimizer
Nesterov Accelerated Gradient
Slide 16: Momentum
Adds a fraction of the previous update to the current one
Helps accelerate convergence and dampen oscillations
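A minimal sketch of one common momentum formulation (β = 0.9 is an illustrative default):
# v <- beta * v + grad;  w <- w - eta * v
def momentum_step(w, v, grad, eta=0.01, beta=0.9):
    v = beta * v + grad     # keep a fraction of the previous update
    w = w - eta * v         # step along the accumulated velocity
    return w, v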
Slide 17: Adam Optimizer
Combines Momentum and RMSProp
Adaptive learning rates for each parameter
Popular choice in deep learning tasks
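A simplified sketch of the Adam update for one parameter vector (standard default hyperparameters shown; t is the step count starting at 1):
import numpy as np

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad             # momentum-style first moment
    v = beta2 * v + (1 - beta2) * grad**2          # RMSProp-style second moment
    m_hat = m / (1 - beta1**t)                     # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter adaptive step
    return w, m, v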
Slide 18: Gradient Clipping
Restricts the magnitude of gradients
Helps prevent exploding gradients
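In Keras this can be set directly on the optimizer via clipnorm or clipvalue (the thresholds below are illustrative):
from keras.optimizers import SGD

# Rescale each gradient so its norm does not exceed 1.0
optimizer = SGD(learning_rate=0.01, clipnorm=1.0)

# Or clip each gradient component to the range [-0.5, 0.5]
optimizer = SGD(learning_rate=0.01, clipvalue=0.5)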
Slide 19: Learning Rate Scheduling
Adjusts learning rate during training
Common strategies: Step Decay, Exponential Decay, Reduce on Plateau
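A sketch using Keras callbacks (the factor, patience, and step-decay schedule are illustrative values):
from keras.callbacks import ReduceLROnPlateau, LearningRateScheduler

# Reduce the learning rate 10x when the validation loss stops improving for 5 epochs.
reduce_on_plateau = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5)

# Step decay: halve the learning rate every 10 epochs.
def step_decay(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

step_schedule = LearningRateScheduler(step_decay)

# model.fit(..., callbacks=[reduce_on_plateau, step_schedule])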
Slide 20: Example Code (Keras - Adam)
from keras.optimizers import Adam

# 'model' is assumed to be a Keras model defined earlier.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Slide 21: Practical Tips
Always normalize input data (see the sketch after this list)
Monitor loss and accuracy plots
Use validation set to tune hyperparameters
Try multiple optimizers
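For the normalization tip, a brief sketch using scikit-learn's StandardScaler (the arrays are placeholders; fit the scaler on training data only):
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.random.rand(100, 4)                 # placeholder training features
x_val = np.random.rand(20, 4)                    # placeholder validation features

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)   # learn mean/std from training data
x_val_scaled = scaler.transform(x_val)           # reuse the same statistics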
Slide 22: Summary
Gradient Descent is central to training neural networks
Different variants offer trade-offs in speed and accuracy
Proper tuning is essential for model performance