Plots comparing momentum-based GD and Nesterov’s AGD

In this previous post, I introduced momentum-based gradient descent (GD) as well as Nesterov’s accelerated gradient descent (AGD). In this post, I show a few plots of how the paths that GD, GD with momentum and AGD can differ in the simple case of minimizing a 2-dimensional quadratic form. The code for generating this plot is available at this colab. Feel free to make a copy and play around with the parameters, see what types of paths you can get!

In this first plot, we are trying to minimize the 2-dimensional function

\begin{aligned} f(x, y) = \frac{1}{2}  \begin{pmatrix} x & y \end{pmatrix}^\top \begin{pmatrix} 1 & 0.5 \\ 0.8 & 20 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 & 1\end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \end{aligned}

starting at the point (x, y) = (15, 1.5). We compare vanilla GD against GD with momentum 0.2, 0.5 and 0.8. For all 4 methods, we set the step size to 0.02 and take 30 steps. In the plot below, the bigger empty circles indicate the final value of each method after 30 steps.

The plot shows that for larger values of momentum, the algorithm tends to move further in each step. Large momentum values can cause the algorithm to oscillate/move toward the minimum is a less direct way (compare the teal line to the orange and blue ones), but in the case above momentum with 0.8 actually got closest to the minimum despite its winding route.

In the second plot, we are trying to maximize the same objective function from the same initial point as the first plot, but we instead set the step size to a larger value, 0.1, and take just 10 steps. With this setting, we see vanilla GD oscillating wildly. Small amounts of momentum (0.2 or 0.5 in this case) can help to dampen the oscillations, resulting in a more direct path to the minimum.

In this final plot, we want to minimize

\begin{aligned} f(x, y) = \frac{1}{2}  \begin{pmatrix} x & y \end{pmatrix}^\top \begin{pmatrix} 1 & 0.5 \\ 2 & 25 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2 & 1\end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \end{aligned}

starting at the point (x, y) = (30, 10). Here, we compare vanilla GD, GD with momentum 0.7, and Nesterov’s AGD with momentum 0.7. We set the step size to 0.03 for all methods and take 10 steps for each method. We see that for this particular setting, AGD oscillates less than GD with momentum, and is able to move more quickly to the minimum than vanilla GD.

These comparisons are not universal, in the sense that these algorithms can take very different paths depending on the objective function, starting point, as well as algorithm parameters like step size and momentum coefficient. It’s worth duplicating the colab and playing around with the parameters yourself to see what happens under different settings. Once again, the code for creating these plots is available here.

Python idioms for reading/writing from files/pickles

Writing to and reading from text files and pickle files is a very common operation, yet I often find myself forgetting the syntax for doing so in Python. This post contains code snippets which I can copy and use whenever I forget them.

To write to a file:

# write to file string by string
# NOTE: This will overwrite 'filename.txt' if
# it already exists!
with open('filename.txt', 'w') as f:
  f.write("text to file")
  f.write("more text to file")

# write a list of strings to file all at once
with open('filename.txt', 'w') as f:
  f.writelines(list_of_lines)

To read from a file:

# get entire file as a single string
with open('filename.txt', 'r') as f:
    contents = f.read()

# read a single line
# (Note: If we run this code again, we will get the
# first line again, not the second line.)
with open('filename.txt', 'r') as f:
    first_line = f.readline()

# read entire file with each line being an element of a list
with open('filename.txt', 'r') as f:
    contents = f.readlines()

To write to a pickle file:

# assume that `object` is the object we want to pickle
import pickle

with open('filename.pkl', 'wb') as handle:
    pickle.dump(object, handle)

To read a pickle file:

import pickle

with open('filename.pkl', 'rb') as handle:
    object = pickle.load(handle)

By using the with keyword, we avoid having to explicitly close the files.

Learn more about pickles here.

Getting matplotlib’s default colors

When you make plots with Python’s matplotlib package, the package chooses some default colors for you. In the code below, we draw the lines y = x^2 + i for i = 1, \dots, 11.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 2.0, 0.01)

for i in range(1, 12):
    y = x**2 + i
    plt.plot(x, y)

plt.show()

Notice that the first and the 11th line have the same color. That’s because matplotlib only has 10 default colors. We can use the code snippet below to get the hex codes for these 10 colors:

print(plt.rcParams['axes.prop_cycle'].by_key()['color'])
# ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

The code snippet below is an example of how to pick out specific default colors. For example, if we wanted all the lines to be green (the 3rd color from the bottom):

default_color = plt.rcParams['axes.prop_cycle'].by_key()['color']
for i in range(1, 12):
    y = x**2 + i
    plt.plot(x, y, color = default_color[2])

plt.show()

Update (2023-05-27): As commenter jared f points out, there are several other ways to define colors and get the default matplotlib colors. This reference describes all the different ways to do so.

References:

  1. Stack Overflow. “Get default line colour cycle.