NeuralNetworks.ipynb
Imports for Python libraries
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn
from torch import optim
Set up the mini-batch size
#@title Batch Size
mini_batch_size = 64  #@param {type: "integer"}
Download the dataset, pre-process, and divide into mini-batches
### Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,)),])
### Download and load the training data
trainset = datasets.MNIST('MNIST_data/', download=True, train=True, transform=transform)
valset = datasets.MNIST('MNIST_data/', download=True, train=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=mini_batch_size, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=mini_batch_size, shuffle=True)
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 105981542.43it/s]
Extracting MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz to MNIST_data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 45488431.78it/s]
Extracting MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 27091076.95it/s]
Extracting MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz to MNIST_data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 19761959.30it/s]
Extracting MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw
<class 'torch.Tensor'>
torch.Size([64, 1, 28, 28])
torch.Size([64])
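Before building the network, it helps to check what the transform and loader produced. The short cell below is an optional sanity check (not part of the original notebook): Normalize((0.5,), (0.5,)) maps pixel values from [0, 1] to roughly [-1, 1], and each 1x28x28 image flattens into the 784-long vector the first Linear layer will expect.
# Optional sanity check on the mini-batch produced above
print(images.min().item(), images.max().item())  # roughly -1.0 and 1.0 after normalization
flat = images.view(images.shape[0], -1)          # flatten 1x28x28 images into length-784 vectors
print(flat.shape)                                # torch.Size([64, 784])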
Explore the processed data
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r'); # Change the index of images[] to get different numbers
figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')
Set up the neural network
# Please change the runtime to GPU if you'd like to have some speed-up on Colab
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
### Layer details for the neural network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
### Build a feed-forward network
model = nn.Sequential(
nn.Linear(input_size, hidden_sizes[0]), # Fully Connected Layer
nn.ReLU(), # Activation
nn.Linear(hidden_sizes[0], hidden_sizes[1]), # Fully Connected Layer
nn.ReLU(), # Activation
nn.Linear(hidden_sizes[1], output_size), # Fully Connected Layer
nn.LogSoftmax(dim=1) # (Log) Softmax Layer: Output a probability distribution and apply log
)
print(model)
model.to(device)
Sequential(
(0): Linear(in_features=784, out_features=128, bias=True)
(1): ReLU()
(2): Linear(in_features=128, out_features=64, bias=True)
(3): ReLU()
(4): Linear(in_features=64, out_features=10, bias=True)
(5): LogSoftmax(dim=1)
)
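To make the printed architecture concrete, the optional cell below (not in the original notebook) counts the trainable parameters: 784·128 + 128 in the first layer, 128·64 + 64 in the second, and 64·10 + 10 in the output layer, 109,386 in total.
# Optional: count the trainable parameters of the model defined above
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_params)  # 100480 + 8256 + 650 = 109386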
Set up the optimizer
#@title Optimizer
lr = 0.003  #@param {type: "number"}
optimizer = optim.Adam(model.parameters(), lr=lr) # Feel free to try out other optimizers as you see fit!
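Since the comment above invites trying other optimizers, here is a hedged example of what that swap could look like: plain SGD with momentum, with an assumed, untuned learning rate. It is left commented out so the rest of the notebook still uses Adam.
# Hypothetical alternative optimizer (uncomment to try; lr=0.01 is an assumed starting point, not a tuned value)
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)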
Set up the loss function to optimize over
time0 = time()
epochs = 15
criterion = nn.NLLLoss() # Negative log likelihood loss function is used
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1).to(device)
logps = model(images) # The model outputs the log-probability of the image belonging to each of the 10 classes
loss = criterion(logps, labels.to(device))
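For reference, pairing LogSoftmax with NLLLoss is equivalent to applying CrossEntropyLoss to the raw logits. The optional check below (not part of the original notebook) slices off the final LogSoftmax layer of the Sequential model and confirms the two losses agree on this batch.
# Optional equivalence check: CrossEntropyLoss(logits) == NLLLoss(LogSoftmax(logits))
logits = model[:-1](images)  # run every layer except the final LogSoftmax
ce = nn.CrossEntropyLoss()(logits, labels.to(device))
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels.to(device))
print(ce.item(), nll.item())  # the two values match up to floating-point error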
Train the neural network
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into 784-long vectors
        images = images.view(images.shape[0], -1).to(device)
        labels = labels.to(device)
        # Training pass
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        # Backpropagation: compute the gradient of the loss w.r.t. the model parameters
        loss.backward()
        # Update the weights
        optimizer.step()
        running_loss += loss.item()
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))
print("\nTraining Time (in minutes) =", (time()-time0)/60)
Epoch 0 - Training loss: 0.33197643931534115
Epoch 1 - Training loss: 0.17194996493012665
Epoch 2 - Training loss: 0.13555304701628684
Epoch 3 - Training loss: 0.12093326378204643
Epoch 4 - Training loss: 0.11512036151975008
Epoch 5 - Training loss: 0.10048197690712245
Epoch 6 - Training loss: 0.09697565546870898
Epoch 7 - Training loss: 0.09047750682820246
Epoch 8 - Training loss: 0.08670042812633616
Epoch 9 - Training loss: 0.08094053777956914
Epoch 10 - Training loss: 0.0775009308762199
Epoch 11 - Training loss: 0.07600253954538946
Epoch 12 - Training loss: 0.07148050718777217
Epoch 13 - Training loss: 0.06898530548134857
Epoch 14 - Training loss: 0.067057527230009
Training Time (in minutes) = 4.103502643108368
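After training, it can be handy to persist the learned weights so the evaluation cells can be re-run without retraining. This is an optional addition, not part of the original notebook; the file name mnist_mlp.pt is an arbitrary, assumed choice.
# Optional: save and reload the trained weights ('mnist_mlp.pt' is an assumed file name)
torch.save(model.state_dict(), 'mnist_mlp.pt')
model.load_state_dict(torch.load('mnist_mlp.pt', map_location=device))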
Evaluate the trained neural network
correct_count, all_count = 0, 0
for images, labels in valloader:
    labels = labels.to(device)
    for i in range(len(labels)):
        img = images[i].view(1, 784).to(device)
        # Forward pass only during evaluation
        with torch.no_grad():
            logps = model(img)
        # The network outputs log-probabilities; take the exponential to get probabilities
        ps = torch.exp(logps)
        probab = list(ps.cpu().numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.cpu().numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1
print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
Number Of Images Tested = 10000
Model Accuracy = 0.9689
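The loop above scores one image at a time. An equivalent, batched version (an optional rewrite, not part of the original notebook) takes the argmax over each mini-batch and runs noticeably faster.
# Optional batched evaluation: same accuracy computation without the per-image loop
correct, total = 0, 0
with torch.no_grad():
    for images, labels in valloader:
        logps = model(images.view(images.shape[0], -1).to(device))
        preds = logps.argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.shape[0]
print("Model Accuracy =", correct / total)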
Predict using the trained neural network
def view_classify(img, ps):
    """Function for viewing an image and its predicted class probabilities."""
    ps = ps.data.numpy().squeeze()
    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()
images, labels = next(iter(valloader))
img = images[0].view(1, 784).to(device)
# Turn off gradients
with torch.no_grad():
    logps = model(img)
# The network outputs log-probabilities; take the exponential to get probabilities
ps = torch.exp(logps)
probab = list(ps.cpu().numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
view_classify(img.cpu().view(1, 28, 28), ps.cpu())
Predicted Digit = 3
GradientDescent.ipynb
Data Generation
%matplotlib inline
import numpy as np
import numpy.linalg as la
import matplotlib.pyplot as plt
dim_theta = 10
data_num = 1000
scale = .1
theta_true = np.ones((dim_theta,1))
print('True theta:', theta_true.reshape(-1))
A = np.random.uniform(low=-1.0, high=1.0, size=(data_num,dim_theta))
y_data = A @ theta_true + np.random.normal(loc=0.0, scale=scale, size=(data_num, 1))
A_test = np.random.uniform(low=-1.0, high=1.0, size=(50, dim_theta))
y_test = A_test @ theta_true + np.random.normal(loc=0.0, scale=scale, size=(50, 1))
True theta: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
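Since y_data = A @ theta_true plus Gaussian noise with standard deviation scale = 0.1, even the true parameters cannot push the expected squared error below roughly scale**2 = 0.01; that baseline is useful when reading the loss values later. An optional check, not part of the original notebook:
# Optional baseline: squared error of the true theta on the test set (expected value ~ scale**2 = 0.01)
print(np.mean((A_test @ theta_true - y_test)**2))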
Solving exactly for the minimizer of the mean squared loss (least squares for Ax = b)
'''
Hints:
1. See the least squares solution to Ax = b (when it is covered in lecture).
2. Use NumPy's linear algebra functions to solve for x in Ax = b; the linear algebra
   module is already imported with ```import numpy.linalg as la```.
3. Use the defined variable A in Ax = b. Use y_data as b. Use theta_pred as x.
'''
theta_pred, residuals, rank, sing_vals = la.lstsq(A, y_data, rcond=None)  # Analytical least-squares solution
print('Empirical theta', theta_pred.reshape(-1))
Empirical theta [1.00388576 0.99857043 1.00131909 0.99940448 0.99480262 1.00234668
0.99791933 1.0073558 0.99768973 0.9945276 ]
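la.lstsq solves the least-squares problem min_theta ||A theta - y||^2, which (since A here has full column rank) is also the solution of the normal equations (A^T A) theta = A^T y. A small optional cross-check, not part of the original notebook:
# Optional cross-check via the normal equations (assumes A has full column rank)
theta_normal = la.solve(A.T @ A, A.T @ y_data)
print(np.allclose(theta_normal, theta_pred))  # True up to numerical precision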
Noisy loss and gradient function for the SGD variants
batch_size = 1
max_iter = 1000
lr = 0.001
theta_init = np.random.random((10,1)) * 0.1
def noisy_val_grad(theta_hat, data_, label_, deg_=2.):
    gradient = np.zeros_like(theta_hat)
    loss = 0
    for i in range(data_.shape[0]):
        x_ = data_[i, :].reshape(-1, 1)
        y_ = label_[i, 0]
        err = np.sum(x_ * theta_hat) - y_
        '''
        Hints:
        1. Find the gradient and loss for each data point x_.
        2. For grad, you need err, deg_, and x_.
        3. For l, you need err and deg_ only.
        4. Check out the writeup for more hints.
        '''
        grad = deg_ * np.sign(err) * np.abs(err)**(deg_ - 1) * x_  # Analytical gradient of |err|**deg_ w.r.t. theta_hat
        l = np.abs(err)**deg_  # Per-sample loss |err|**deg_
        loss += l / data_.shape[0]
        gradient += grad / data_.shape[0]
    return loss, gradient
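A quick way to gain confidence in the analytical gradient is a finite-difference check on a few points; this optional cell (not part of the original notebook) compares the two for deg_ = 2, where the loss reduces to ordinary squared error.
# Optional finite-difference check of noisy_val_grad (deg_ = 2, i.e. squared error)
theta_test = np.random.random((dim_theta, 1))
_, grad_analytic = noisy_val_grad(theta_test, A[:5, :], y_data[:5, :], deg_=2.)
eps = 1e-6
grad_numeric = np.zeros_like(theta_test)
for j in range(dim_theta):
    step = np.zeros_like(theta_test)
    step[j, 0] = eps
    loss_plus, _ = noisy_val_grad(theta_test + step, A[:5, :], y_data[:5, :], deg_=2.)
    loss_minus, _ = noisy_val_grad(theta_test - step, A[:5, :], y_data[:5, :], deg_=2.)
    grad_numeric[j, 0] = (loss_plus - loss_minus) / (2 * eps)
print(np.max(np.abs(grad_numeric - grad_analytic)))  # should be close to zero (~1e-6 or smaller)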
Running SGD Variants
#@title Parameters
deg_ = 5  #@param {type: "number"}
num_rep = 10  #@param {type: "integer"}
max_iter = 1000  #@param {type: "integer"}
test_exp_interval = 50  #@param {type: "integer"}
grad_artificial_normal_noise_scale = 0.  #@param {type: "number"}

fig, ax = plt.subplots(figsize=(10, 10))
best_vals = {}
for method_idx, method in enumerate(['adam', 'sgd', 'adagrad']):
    test_loss_mat = []
    train_loss_mat = []
    for replicate in range(num_rep):
        if replicate % 20 == 0:
            print(method, replicate)
        if method == 'adam':
            beta_1 = 0.9
            beta_2 = 0.999
            m = 0  # Initialize the first-moment estimate
            v = 0  # Initialize the second-moment estimate
            epsilon = 1e-8
        if method == 'adagrad':
            epsilon = 1e-8
            squared_sum = 0  # Initialize the accumulated squared gradient
        theta_hat = theta_init.copy()
        test_loss_list = []
        train_loss_list = []
        for t in range(max_iter):
            idx = np.random.choice(data_num, batch_size)  # Sample a mini-batch of indices
            train_loss, gradient = noisy_val_grad(theta_hat, A[idx,:], y_data[idx,:], deg_=deg_)
            artificial_grad_noise = np.random.randn(10, 1) * grad_artificial_normal_noise_scale \
                + np.sign(np.random.random((10, 1)) - 0.5) * grad_artificial_normal_noise_scale
            gradient = gradient + artificial_grad_noise
            train_loss_list.append(train_loss)
            if t % test_exp_interval == 0:
                test_loss, _ = noisy_val_grad(theta_hat, A_test[:,:], y_test[:,:], deg_=deg_)
                test_loss_list.append(test_loss)
            if method == 'adam':
                # Adam: exponential moving averages of the gradient and its square,
                # bias-corrected, then an adaptive update
                m = beta_1 * m + (1 - beta_1) * gradient
                v = beta_2 * v + (1 - beta_2) * gradient**2
                m_hat = m / (1 - beta_1**(t+1))
                v_hat = v / (1 - beta_2**(t+1))
                theta_hat = theta_hat - lr * m_hat / (np.sqrt(v_hat) + epsilon)
            elif method == 'adagrad':
                # Adagrad: scale the step by the root of the accumulated squared gradients
                squared_sum = squared_sum + gradient * gradient
                theta_hat = theta_hat - lr * gradient / np.sqrt(squared_sum + epsilon)
            elif method == 'sgd':
                theta_hat = theta_hat - lr * gradient
        test_loss_mat.append(test_loss_list)
        train_loss_mat.append(train_loss_list)
    print(method, 'done')
    x_axis = np.arange(max_iter)[::test_exp_interval]
    print(theta_hat)
    print('test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training')
    print('The elements of test_loss_np are the test loss values computed in each replicate and training stage.')
    test_loss_np = np.array(test_loss_mat)
    '''
    Hints:
    1. Use test_loss_np in np.mean() with axis = 0
    '''
    test_loss_mean = np.mean(test_loss_np, axis=0)  # Mean test loss across replicates
    '''
    Hints:
    1. Use test_loss_np in np.std() with axis = 0
    2. Divide by np.sqrt() using num_rep as a parameter
    '''
    test_loss_se = np.std(test_loss_np, axis=0) / np.sqrt(num_rep)  # Standard error of the mean test loss
    plt.errorbar(x_axis, test_loss_mean, yerr=2.5*test_loss_se, label=method)
    best_vals[method] = min(test_loss_mean)
best_vals = { k: int(v * 1000) / 1000. for k,v in best_vals.items() }  # Truncate each best value to three decimals for the title
plt.title(f'Test Loss \n(objective degree: {deg_}, best values: {best_vals})')
plt.ylabel('Test Loss')
plt.legend()
plt.xlabel('Updates')
adam 0
adam done
[[0.14885809]
[0.19352583]
[0.26809462]
[0.17285942]
[0.15590926]
[0.22953613]
[0.20367141]
[0.15498251]
[0.17638002]
[0.20228849]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.
sgd 0
sgd done
[[0.81581469]
[0.86732407]
[0.92465587]
[0.82485162]
[0.90093489]
[0.91063236]
[0.83412241]
[0.7657022 ]
[0.97693746]
[0.65311502]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.
adagrad 0
adagrad done
[[0.03318229]
[0.04045313]
[0.10975987]
[0.08051692]
[0.02471577]
[0.08987534]
[0.0789562 ]
[0.02577295]
[0.05225122]
[0.09454105]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.