361 Project Code

1. The document describes downloading and preprocessing the MNIST dataset to create training and validation datasets in mini-batches.
2. A neural network model with two hidden layers is defined and trained on MNIST for 15 epochs using the negative log likelihood loss and the Adam optimizer.
3. The trained model achieves 96.89% accuracy on the validation set and is used to predict digits in images from the validation set.


Copy of NeuralNetworks.ipynb - Colaboratory

Imports for Python libraries

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn
from torch import optim

Set up the mini-batch size

Batch Size

#@title Batch Size
mini_batch_size = 64 #@param {type: "integer"}

Download the dataset, pre-process, and divide into mini-batches

### Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,)),])

### Download and load the training data
trainset = datasets.MNIST('MNIST_data/', download=True, train=True, transform=transform)
valset = datasets.MNIST('MNIST_data/', download=True, train=False, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=mini_batch_size, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=mini_batch_size, shuffle=True)
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 105981542.43it/s]
Extracting MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz to MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 45488431.78it/s]
Extracting MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 27091076.95it/s]
Extracting MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz to MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 19761959.30it/s]
Extracting MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz to MNIST_data/MNIST/raw

<class 'torch.Tensor'>
torch.Size([64, 1, 28, 28])
torch.Size([64])
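
A quick sanity check (a sketch, not part of the original notebook): each mini-batch holds 64 single-channel 28x28 images, which the network below consumes as flattened 784-long vectors, and Normalize((0.5,), (0.5,)) rescales pixel values from [0, 1] to roughly [-1, 1].

flat = images.view(images.shape[0], -1)
print(flat.shape)                                # torch.Size([64, 784])
print(images.min().item(), images.max().item())  # approximately -1.0 and 1.0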

Explore the processed data

plt.imshow(images[0].numpy().squeeze(), cmap='gray_r'); # Change the index of images[] to get different numbers


figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')

Set up the neural network

# Please change the runtime to GPU if you'd like to have some speed-up on Colab
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Layer details for the neural network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

### Build a feed-forward network
model = nn.Sequential(
    nn.Linear(input_size, hidden_sizes[0]), # Fully Connected Layer
    nn.ReLU(), # Activation
    nn.Linear(hidden_sizes[0], hidden_sizes[1]), # Fully Connected Layer
    nn.ReLU(), # Activation
    nn.Linear(hidden_sizes[1], output_size), # Fully Connected Layer
    nn.LogSoftmax(dim=1) # (Log) Softmax Layer: Output a probability distribution and apply log
)
print(model)

model.to(device)

Sequential(
(0): Linear(in_features=784, out_features=128, bias=True)
(1): ReLU()
(2): Linear(in_features=128, out_features=64, bias=True)
(3): ReLU()
(4): Linear(in_features=64, out_features=10, bias=True)
(5): LogSoftmax(dim=1)
)
Sequential(
(0): Linear(in_features=784, out_features=128, bias=True)
(1): ReLU()
(2): Linear(in_features=128, out_features=64, bias=True)
(3): ReLU()
(4): Linear(in_features=64, out_features=10, bias=True)
(5): LogSoftmax(dim=1)
)
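
The architecture is printed twice above because both print(model) and the cell's last expression, model.to(device), display it. As a quick check (a sketch, not in the original notebook), the three fully connected layers contribute 784*128 + 128 + 128*64 + 64 + 64*10 + 10 = 109,386 trainable parameters:

num_params = sum(p.numel() for p in model.parameters())
print(num_params)  # 109386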

Set up the optimization model

Optimizer

#@title Optimizer
lr = 0.003 #@param {type: "number"}
optimizer = optim.Adam(model.parameters(), lr=lr) # Feel free to try out other optimizers as you see fit!

Set up the loss function to optimize over

time0 = time()
epochs = 15
criterion = nn.NLLLoss() # Negative log likelihood loss function is used
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1).to(device)

logps = model(images) # Model spits out the log probability of image belonging to different classes
loss = criterion(logps, labels.to(device))
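
Since the model already ends in LogSoftmax, NLLLoss on its output is mathematically the same as CrossEntropyLoss applied to the raw pre-softmax scores. A small equivalence check (a sketch, not part of the original notebook; it assumes slicing the Sequential with model[:-1] to drop the final LogSoftmax layer):

logits = model[:-1](images)                       # raw scores, no LogSoftmax
ce = nn.CrossEntropyLoss()(logits, labels.to(device))
print(loss.item(), ce.item())                     # the two values should match closely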

Train the neural network

for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1).to(device)
        labels = labels.to(device)

        # Training pass
        optimizer.zero_grad()

        output = model(images).to(device)
        loss = criterion(output, labels)

        # backpropagation: calculate the gradient of the loss function w.r.t model parameters
        loss.backward()

        # And optimizes its weights here
        optimizer.step()

        running_loss += loss.item()
    else:
        # The else clause of a for loop runs after the loop completes, i.e. once per epoch
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))
print("\nTraining Time (in minutes) =", (time()-time0)/60)

Epoch 0 - Training loss: 0.33197643931534115
Epoch 1 - Training loss: 0.17194996493012665
Epoch 2 - Training loss: 0.13555304701628684
Epoch 3 - Training loss: 0.12093326378204643
Epoch 4 - Training loss: 0.11512036151975008
Epoch 5 - Training loss: 0.10048197690712245
Epoch 6 - Training loss: 0.09697565546870898
Epoch 7 - Training loss: 0.09047750682820246
Epoch 8 - Training loss: 0.08670042812633616
Epoch 9 - Training loss: 0.08094053777956914
Epoch 10 - Training loss: 0.0775009308762199
Epoch 11 - Training loss: 0.07600253954538946
Epoch 12 - Training loss: 0.07148050718777217
Epoch 13 - Training loss: 0.06898530548134857
Epoch 14 - Training loss: 0.067057527230009

Training Time (in minutes) = 4.103502643108368

Evaluate the trained neural network

correct_count, all_count = 0, 0
for images, labels in valloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784).to(device)
        labels = labels.to(device)
        # Forward pass only during evaluation
        with torch.no_grad():
            logps = model(img)

        # The network outputs log-probabilities; take the exponential to get probabilities
        ps = torch.exp(logps)
        probab = list(ps.cpu().numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.cpu().numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))

Number Of Images Tested = 10000

Model Accuracy = 0.9689
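
The per-image loop above is easy to read but slow. A vectorized alternative (a sketch, not part of the original notebook) computes predictions for a whole mini-batch at once with argmax:

correct, total = 0, 0
with torch.no_grad():
    for images, labels in valloader:
        images = images.view(images.shape[0], -1).to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.shape[0]
print("Model Accuracy =", correct / total)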

Predict using the trained neural network

def view_classify(img, ps):
    """ Function for viewing an image and it's predicted classes."""
    ps = ps.data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))

    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()

images, labels = next(iter(valloader))

img = images[0].view(1, 784).to(device)
# Turn off gradients
with torch.no_grad():
    logps = model(img)

# The network outputs log-probabilities; take the exponential to get probabilities
ps = torch.exp(logps)
probab = list(ps.cpu().numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
view_classify(img.cpu().view(1, 28, 28), ps.cpu())

Predicted Digit = 3

Copy of GradientDescent.ipynb - Colaboratory

Data Generation

%matplotlib inline
import numpy as np
import numpy.linalg as la
import matplotlib.pyplot as plt

dim_theta = 10
data_num = 1000
scale = .1

theta_true = np.ones((dim_theta,1))
print('True theta:', theta_true.reshape(-1))

A = np.random.uniform(low=-1.0, high=1.0, size=(data_num,dim_theta))
y_data = A @ theta_true + np.random.normal(loc=0.0, scale=scale, size=(data_num, 1))

A_test = np.random.uniform(low=-1.0, high=1.0, size=(50, dim_theta))
y_test = A_test @ theta_true + np.random.normal(loc=0.0, scale=scale, size=(50, 1))

True theta: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
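
In other words, the cell above generates data from the linear model $y_i = a_i^\top \theta^* + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, 0.1^2)$ and $\theta^*$ the all-ones vector; the goal of the cells below is to recover $\theta^*$ from $(A, y)$.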

Solving for the exact mean squared loss (solving Ax = b)

# print('Not implemented.')

'''
Hints:
1. See the least squares solution to Ax = b (when it is covered in lecture).

2. Use NumPy's linear algebra routines to solve for x in Ax = b.
In fact, the linear algebra module is already imported with ```import numpy.linalg as la```.

3. Use the defined variable A in Ax = b. Use y_data as b. Use theta_pred as x.
'''
theta_pred, residuals, rank, sing_vals = la.lstsq(A, y_data, rcond=None) # Least-squares solution to A theta = y_data

print('Empirical theta', theta_pred.reshape(-1))

Empirical theta [1.00388576 0.99857043 1.00131909 0.99940448 0.99480262 1.00234668
 0.99791933 1.0073558  0.99768973 0.9945276 ]
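
As a cross-check (a sketch, not in the original notebook), the least-squares solution also satisfies the normal equations $(A^\top A)\,\theta = A^\top y$, so solving them directly should reproduce theta_pred:

theta_ne = la.solve(A.T @ A, A.T @ y_data)
print('Normal-equation theta', theta_ne.reshape(-1))  # should match theta_pred above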

SGD Variants Noisy Function

batch_size = 1
max_iter = 1000
lr = 0.001
theta_init = np.random.random((10,1)) * 0.1

def noisy_val_grad(theta_hat, data_, label_, deg_=2.):
    gradient = np.zeros_like(theta_hat)
    loss = 0
    
    for i in range(data_.shape[0]):
        x_ = data_[i, :].reshape(-1,1)
        y_ = label_[i, 0]
        err = np.sum(x_ * theta_hat) - y_
        
        # print('Not implemented.')

        '''
        Hints:
        1. Find the gradient and loss for each data point x_.
        2. For grad, you need err, deg_, and x_.
        3. For l, you need err and deg_ only.
        4. Check out the writeup for more hints.
        '''


        grad = deg_ * np.sign(err) * np.abs(err)**(deg_ - 1) * x_ # TODO: Implement the analytical gradient
        l = np.abs(err)**deg_ # TODO: Implement the loss function
        
        loss += l / data_.shape[0]
        gradient += grad / data_.shape[0]
        
    return loss, gradient
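
For reference, the quantities the TODO lines implement (following the hints and the writeup) are, for a single data point with error $e = x^\top \hat\theta - y$ and degree $d$,

$$\ell(e) = |e|^{d}, \qquad \nabla_{\hat\theta}\,\ell = d\,\operatorname{sign}(e)\,|e|^{d-1}\,x,$$

and the function returns both averaged over the mini-batch.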

Running SGD Variants

Parameters

#@title Parameters
deg_ = 5 #@param {type: "number"}
num_rep = 10 #@param {type: "integer"}
max_iter = 1000 #@param {type: "integer"}
fig, ax = plt.subplots(figsize=(10,10))
best_vals = {}
test_exp_interval = 50 #@param {type: "integer"}
grad_artificial_normal_noise_scale = 0. #@param {type: "number"}

for method_idx, method in enumerate(['adam', 'sgd', 'adagrad']):
    test_loss_mat = []
    train_loss_mat = []

    
    for replicate in range(num_rep):
        if replicate % 20 == 0:
            print(method, replicate)
            
        if method == 'adam':
            # print('Adam Not implemented.')
            beta_1 = 0.9
            beta_2 = 0.999
            m = 0 # TODO: Initialize parameters
            v = 0
            epsilon = 1e-8

        if method == 'adagrad':
            # print('Adagrad Not implemented.')
            epsilon = 1e-8 # TODO: Initialize parameters
            squared_sum = 0
            
        theta_hat = theta_init.copy()
        test_loss_list = []
        train_loss_list = []

        for t in range(max_iter):
            idx = np.random.choice(data_num, batch_size) # Sample a mini-batch of indices
            train_loss, gradient = noisy_val_grad(theta_hat, A[idx,:], y_data[idx,:], deg_=deg_)
            # NOTE: this line was cut off in the printed notebook; the closing term is a
            # reconstruction (random-sign perturbation scaled by the same noise parameter)
            artificial_grad_noise = (np.random.randn(10, 1) * grad_artificial_normal_noise_scale
                                     + np.sign(np.random.random((10, 1)) - 0.5) * grad_artificial_normal_noise_scale)
            gradient = gradient + artificial_grad_noise
            train_loss_list.append(train_loss)
            
            if t % test_exp_interval == 0:
                test_loss, _ = noisy_val_grad(theta_hat, A_test[:,:], y_test[:,:], deg_=deg_)
                test_loss_list.append(test_loss)                
            
            if method == 'adam':
                # print('Adam Not implemented.') # TODO: Implement Adam
                m = beta_1 * m + (1 - beta_1) * gradient
                v = beta_2 * v + (1 - beta_2) * gradient**2
                m_hat = m / (1 - (beta_1)**(t+1))
                v_hat = v / (1 - (beta_2)**(t+1))
                theta_hat = theta_hat - lr * m_hat / (np.sqrt(v_hat) + epsilon)
            
            elif method == 'adagrad':
                # print('Adagrad Not implemented.')
                squared_sum = squared_sum + (gradient * gradient) # TODO: Implement Adagrad
                theta_hat = theta_hat - lr * (1 / np.sqrt(squared_sum + epsilon)) * gradient
            
            elif method == 'sgd':
                theta_hat = theta_hat - lr * gradient
        
        test_loss_mat.append(test_loss_list)
        train_loss_mat.append(train_loss_list)
        
    print(method, 'done')
    x_axis = np.arange(max_iter)[::test_exp_interval]

    print(theta_hat)
    
    print('test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training')
    print('The elements of test_loss_np are the test loss values computed in each replicate and training stage.')
    test_loss_np = np.array(test_loss_mat)
    
    # print('Not implemented.')
    '''
    Hints:
    1. Use test_loss_np in np.mean() with axis = 0
    '''
    test_loss_mean = np.mean(test_loss_np, axis=0) # TODO: Calculate the mean test loss

    '''
    Hints:
    1. Use test_loss_np in np.std() with axis = 0 
    2. Divide by np.sqrt() using num_rep as a parameter
    '''
    test_loss_se = np.std(test_loss_np, axis=0) / np.sqrt(num_rep) # TODO: Calculate the standard error for test loss

    plt.errorbar(x_axis, test_loss_mean, yerr=2.5*test_loss_se, label=method)
    best_vals[method] = min(test_loss_mean)

    best_vals = { k: int(v * 1000) / 1000. for k, v in best_vals.items() } # Truncate to three decimal places for display
    plt.title(f'Test Loss \n(objective degree: {deg_},  best values: {best_vals})')
    plt.ylabel('Test Loss')

    plt.legend()
    plt.xlabel('Updates')  

adam 0
adam done
[[0.14885809]
[0.19352583]
[0.26809462]
[0.17285942]
[0.15590926]
[0.22953613]
[0.20367141]
[0.15498251]
[0.17638002]
[0.20228849]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.
sgd 0
sgd done
[[0.81581469]
[0.86732407]
[0.92465587]
[0.82485162]
[0.90093489]
[0.91063236]
[0.83412241]
[0.7657022 ]
[0.97693746]
[0.65311502]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.
adagrad 0
adagrad done
[[0.03318229]
[0.04045313]
[0.10975987]
[0.08051692]
[0.02471577]
[0.08987534]
[0.0789562 ]
[0.02577295]
[0.05225122]
[0.09454105]]
test_loss_np is a 2d array with num_rep rows and each column denotes a specific update stage in training
The elements of test_loss_np are the test loss values computed in each replicate and training stage.
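
For reference, the update rules the loop above implements, with learning rate $\eta$ (lr) and mini-batch gradient $g_t$:

$$\text{SGD:}\quad \theta \leftarrow \theta - \eta\, g_t$$

$$\text{Adagrad:}\quad s \leftarrow s + g_t^2, \qquad \theta \leftarrow \theta - \eta\, \frac{g_t}{\sqrt{s + \epsilon}}$$

$$\text{Adam:}\quad m \leftarrow \beta_1 m + (1-\beta_1) g_t, \quad v \leftarrow \beta_2 v + (1-\beta_2) g_t^2, \quad \hat m = \frac{m}{1-\beta_1^{t+1}}, \quad \hat v = \frac{v}{1-\beta_2^{t+1}}, \quad \theta \leftarrow \theta - \eta\, \frac{\hat m}{\sqrt{\hat v} + \epsilon}$$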
