# Gradient Descent Lab: Methods and Empirical Behavior

This repository implements gradient-based optimization methods across three classic problem families:

- 2D quadratic objective (smooth, strongly convex)
- 1D nonconvex objective (multi-start behavior and local minima)
- least-squares linear regression (Diabetes dataset)

Implemented methods:

- fixed-step Gradient Descent (GD)
- theorem-based stepsize choice via the smoothness constant (1/L)
- SciPy baseline solver (BFGS)
- exact line search for least squares
- backtracking Armijo line search

## Install

```bash
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

pip install -r requirements.txt
```

## Reproduce all figures

```bash
python scripts/make_all.py
```

## Where figures are saved

Figures are saved to `figures/` via `src/gdlab/config.py`:

```python
figures_DIR = PROJECT_ROOT / "figures"
```

The image links below assume figures live in `figures/`.


## Main math formulas

### Gradient Descent (GD)

For a differentiable objective $f:\mathbb{R}^d\to\mathbb{R}$, GD iterates as:

$$ x_{k+1} = x_k - \alpha_k \nabla f(x_k). $$

Optimality gap (used in many plots):

$$ \mathrm{gap}_k := f(x_k) - f^\star, \qquad f^\star = \min_x f(x). $$
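
As a minimal sketch (not the repository's exact implementation), the fixed-step iteration and the optimality gap can be written as:

```python
import numpy as np

def gradient_descent(grad, x0, alpha, iters):
    """Fixed-step GD: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(iters):
        x = x - alpha * grad(x)
        history.append(x.copy())
    return x, history

# Toy example: f(x) = ||x||^2 with grad f(x) = 2x, so f* = 0 at x* = 0.
x_final, history = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0],
                                    alpha=0.25, iters=50)
gap = float(x_final @ x_final)  # optimality gap f(x_k) - f*
```

Recording the iterate history makes it easy to plot trajectories and per-iteration optimality gaps, as the figures below do.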


## Objective 1: 2D quadratic

### Definition

For a parameter $\beta \in \mathbb{R}$,

$$ f(x,y) = x^2 + y^2 + \beta xy + x + 2y. $$

### Gradient and Hessian

$$ \nabla f(x,y)= \begin{pmatrix} 2x + \beta y + 1\\ 2y + \beta x + 2 \end{pmatrix}, \qquad \nabla^2 f = \begin{pmatrix} 2 & \beta\\ \beta & 2 \end{pmatrix}. $$

### Minimizer

The minimizer satisfies $\nabla f(x^\star)=0$, equivalently:

$$ \nabla^2 f \, x^\star + \begin{pmatrix}1\\2\end{pmatrix}=0 \quad\Rightarrow\quad x^\star = -(\nabla^2 f)^{-1}\begin{pmatrix}1\\2\end{pmatrix}. $$

### Smoothness constant $L$ and theorem step size

For this quadratic, $\nabla f$ is $L$-Lipschitz with

$$ L = \lambda_{\max}(\nabla^2 f). $$

A standard safe fixed step size is

$$ \alpha = \frac{1}{L}. $$
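
For concreteness, the closed-form quantities above can be computed with NumPy ($\beta = 1$ here is just an example value):

```python
import numpy as np

beta = 1.0                                # example parameter value
H = np.array([[2.0, beta], [beta, 2.0]])  # Hessian of f
b = np.array([1.0, 2.0])                  # linear term (1, 2)

x_star = -np.linalg.solve(H, b)           # minimizer: solves H x* + b = 0
L = np.linalg.eigvalsh(H).max()           # smoothness constant lambda_max(H)
alpha = 1.0 / L                           # theorem step size

grad_at_min = H @ x_star + b              # should vanish at the minimizer
```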


## Objective 2: 1D nonconvex

### Definition

$$ f(x) = \left(x^2 + x(1+\sin x) + 2\right)\exp\left(-\frac{\lvert x\rvert}{3}\right). $$

This objective has multiple local minima, so local optimizers can converge to different solutions depending on initialization.
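
A multi-start experiment along these lines can be sketched as follows (plain GD with a central-difference gradient; the step size, iteration count, and start grid are illustrative, not the repository's settings):

```python
import numpy as np

def f(x):
    return (x**2 + x * (1.0 + np.sin(x)) + 2.0) * np.exp(-np.abs(x) / 3.0)

def num_grad(x, h=1e-6):
    # Central-difference derivative (the objective is nonsmooth only at x = 0)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def local_min_from(x0, alpha=0.05, iters=2000):
    x = x0
    for _ in range(iters):
        x = x - alpha * num_grad(x)
    return x

starts = np.linspace(-10.0, 10.0, 21)
found = [local_min_from(s) for s in starts]
# Keep only runs that reached a stationary point; dedupe by rounding
converged = [m for m in found if abs(num_grad(m)) < 1e-4]
minima = sorted({round(m, 1) for m in converged})
```

Different starts end in different basins, which is exactly the multi-start behavior the histograms below summarize.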


## Objective 3: least squares (Diabetes dataset)

### Definition

Given $A \in \mathbb{R}^{m\times n}$ and $y\in\mathbb{R}^m$,

$$ f(x) = \frac{1}{m}\lVert Ax-y\rVert_2^2. $$

### Gradient and Hessian

$$ \nabla f(x) = \frac{2}{m}A^\top(Ax-y), \qquad \nabla^2 f(x) = \frac{2}{m}A^\top A. $$

### Smoothness $L$ and strong convexity $\mu$

$$ L = \lambda_{\max}\left(\frac{2}{m}A^\top A\right), \qquad \mu = \lambda_{\min}\left(\frac{2}{m}A^\top A\right). $$

### Exact line search (least squares)

Along the steepest-descent direction $d_k=-\nabla f(x_k)$,

$$ \alpha_k = \arg\min_{\alpha\ge 0} f(x_k+\alpha d_k). $$

For least squares (quadratic), this has a closed form:

$$ \alpha_k = \frac{\lVert \nabla f(x_k)\rVert_2^2}{\nabla f(x_k)^\top \nabla^2 f \, \nabla f(x_k)}, \qquad \nabla^2 f=\frac{2}{m}A^\top A. $$
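
A sketch of the closed-form step on a synthetic least-squares problem (random data standing in for the Diabetes design matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.standard_normal((m, n))   # synthetic stand-in for the Diabetes matrix
y = rng.standard_normal(m)

H = (2.0 / m) * A.T @ A           # constant Hessian of f(x) = ||Ax - y||^2 / m

def grad(x):
    return (2.0 / m) * A.T @ (A @ x - y)

x = np.zeros(n)
for _ in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    alpha = (g @ g) / (g @ H @ g)  # closed-form exact line search step
    x = x - alpha * g

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)  # normal-equations reference
```

Because the objective is quadratic, each exact step minimizes $f$ along the search ray, and the iterates converge to the least-squares solution.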

### Backtracking Armijo line search

Starting from $\alpha_0$, shrink $\alpha \leftarrow \beta\alpha$ until:

$$ f(x_k + \alpha d_k) \le f(x_k) + c_1\alpha\, \nabla f(x_k)^\top d_k, \qquad d_k=-\nabla f(x_k), \qquad 0<c_1<1, \; 0<\beta<1. $$
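
A minimal backtracking sketch (the $c_1$ and $\beta$ values are common illustrative defaults, not necessarily the repository's):

```python
import numpy as np

def backtracking_alpha(f, grad_f, x, alpha0=1.0, beta=0.5, c1=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    g = grad_f(x)
    d = -g                                   # steepest-descent direction
    alpha = alpha0
    while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d):
        alpha *= beta
    return alpha

# Toy strongly convex quadratic with minimizer at (-0.5, 0)
f = lambda v: float(v @ v + v[0])
grad_f = lambda v: 2.0 * v + np.array([1.0, 0.0])

x = np.array([2.0, -3.0])
for _ in range(50):
    x = x - backtracking_alpha(f, grad_f, x) * grad_f(x)
```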


## Results

### Quadratic (2D)

1. Objective surface
2. Objective contours
3. Contour plot with a single reference point
4. Gradient Descent trajectory (α = 0.5)
5. Gradient Descent trajectory (α = 1.0)
6. Gradient Descent trajectory (α = 1/L)
7. Optimality gap (α = 0.5 vs α = 1/L)
8. SciPy solver iterates on contour plot
9. Optimality gaps: SciPy vs GD

### Nonconvex (1D)

1. Multi-start histogram summary (minimizers and objective values)
2. Objective function with detected local minima

### Least Squares (Diabetes dataset)

1. Fixed-stepsize GD: optimality gaps
2. Fixed steps vs exact line search
3. Fixed steps vs exact line search vs backtracking
4. Stepsizes over iterations (exact vs backtracking)


## Tests

```bash
pytest -q
```

The test suite includes:

- finite-difference gradient checks for the quadratic
- descent sanity checks
- Armijo condition verification for backtracking
- decrease check for exact line search on least squares
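
The finite-difference check can be illustrated as follows (a sketch against the 2D quadratic's analytic gradient, with β = 1 as an example value):

```python
import numpy as np

def finite_diff_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

beta = 1.0
f = lambda v: v[0]**2 + v[1]**2 + beta * v[0] * v[1] + v[0] + 2 * v[1]
grad = lambda v: np.array([2 * v[0] + beta * v[1] + 1,
                           2 * v[1] + beta * v[0] + 2])

x = np.array([0.3, -0.7])
err = np.linalg.norm(finite_diff_grad(f, x) - grad(x))
```

A small `err` confirms the analytic gradient matches the numerical one up to discretization and roundoff error.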

## Citation

If you use this repository as a reference, please cite via CITATION.cff.


## License

MIT (see LICENSE).
