0% found this document useful (0 votes)
46 views3 pages

Optimization For Machine Learning - Lecture Notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views3 pages

Optimization For Machine Learning - Lecture Notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Optimization for Machine Learning

Lecture Notes CS-439, Spring 2025

Bernd Gärtner, ETH


Martin Jaggi, EPFL

February 17, 2025


Contents

1 Theory of Convex Functions 2–38

2 Gradient Descent 38–61

3 Projected and Proximal Gradient Descent 61–77

4 Subgradient Descent 77–88

5 Stochastic Gradient Descent 88–96

6 Nonconvex functions 96–115

7 Newton’s Method 115–127

8 Quasi-Newton Methods 127–145

9 Coordinate Descent 145–160

10 The Frank-Wolfe Algorithm 160–180

1
Chapter 1

Theory of Convex Functions

Contents
1.1 Mathematical Background . . . . . . . . . . . . . . . . . . . . 4
1.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 The Cauchy-Schwarz inequality . . . . . . . . . . . . 4
1.1.3 The spectral norm . . . . . . . . . . . . . . . . . . . . 6
1.1.4 The mean value theorem . . . . . . . . . . . . . . . . . 7
1.1.5 The fundamental theorem of calculus . . . . . . . . . 7
1.1.6 Differentiability . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 The mean value inequality . . . . . . . . . . . . . . . 10
1.3 Convex functions . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 First-order characterization of convexity . . . . . . . 16
1.3.2 Second-order characterization of convexity . . . . . . 19
1.3.3 Operations that preserve convexity . . . . . . . . . . 21
1.4 Minimizing convex functions . . . . . . . . . . . . . . . . . . 21
1.4.1 Strictly convex functions . . . . . . . . . . . . . . . . . 23
1.4.2 Example: Least squares . . . . . . . . . . . . . . . . . 24
1.4.3 Constrained Minimization . . . . . . . . . . . . . . . . 25
1.5 Existence of a minimizer . . . . . . . . . . . . . . . . . . . . . 26
1.5.1 Sublevel sets and the Weierstrass Theorem . . . . . . 27
1.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.1 Handwritten digit recognition . . . . . . . . . . . . . 28
1.6.2 Master’s Admission . . . . . . . . . . . . . . . . . . . 29

You might also like