Lab Work: Linear Regression via Gradient
Descent, and Visualizing Loss Surfaces
06/09/2024
Objective:
▶ Implement simple linear regression via the gradient descent algorithm.
▶ Visualize the line of best fit changing with each iteration of
gradient descent.
▶ Track and plot the loss (Mean Squared Error aka MSE) over
iterations.
▶ Visualize the 3D loss surface and the gradient descent path.
▶ Finally, run experiments on a real dataset.
▶ Upload your final Python code here:
https://docs.google.com/forms/d/
12R3DtRFeKahmLu5DgHBE53ptRha1qCGxSiXJCNbD59A/edit
▶ It is recommended to write the code as a plain Python script rather than in a notebook.
Task 1: Data Generation
Step 1: Generate Data
import numpy as np
import matplotlib.pyplot as plt

# Generate data (y = 3x + 4 + noise)
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
Step 2: Initialize Parameters
m = -20                # Initial slope
b = 20                 # Initial intercept
learning_rate = _____  # Define learning rate
iterations = _____     # Define number of iterations
n = len(X)
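The blanks above are yours to fill in. Purely as a hedged reference point (these are example values, not required ones), one combination that behaves well on this synthetic data is a small learning rate run for on the order of a thousand steps:

learning_rate = 0.05   # example value only; try values in roughly 0.001-0.1 and compare
iterations = 1000      # example value only; enough steps to approach the minimum here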
Task 2: Prediction and Loss Functions
Step 3: Prediction Function
def predict(X, m, b):
    # Complete the function to return predictions
    return _____
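One minimal way to complete this blank (the model is just a straight line, so the prediction is slope times input plus intercept):

def predict(X, m, b):
    # Linear model: y_hat = m * x + b
    return m * X + b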
Step 4: Compute Loss (MSE)
def compute_loss(X, y, m, b):
    y_pred = predict(X, m, b)
    # Complete the loss function (Hint: MSE formula)
    return _____
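Recall that MSE = (1/n) * sum((y_pred - y)^2). A possible completion, using NumPy's mean to average the squared errors:

def compute_loss(X, y, m, b):
    y_pred = predict(X, m, b)
    # Mean squared error between predictions and targets
    return np.mean((y_pred - y) ** 2)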
Task 3: Gradient Descent Step Function
Step 5: Implement Gradient Descent Step
def step_gradient(X, y, m, b, learning_rate):
    y_pred = predict(X, m, b)
    error = y_pred - y
    m_gradient = (2 / n) * np.sum(X * error)
    b_gradient = (2 / n) * np.sum(error)
    # Update m and b based on gradients and learning rate
    m = _____  # Complete this step
    b = _____  # Complete this step
    return m, b
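The two blanks are the gradient descent update itself: each parameter moves a small step opposite to its gradient, scaled by the learning rate. One way to fill them in:

m = m - learning_rate * m_gradient   # step the slope downhill
b = b - learning_rate * b_gradient   # step the intercept downhill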
Task 4: Visualization Setup
Step 6: Set up Figure and Axes
from matplotlib.animation import FuncAnimation
from mpl_toolkits.mplot3d import Axes3D

# Create grid of (m, b) values for the loss surface
m_vals = np.linspace(-40, 40, 50)
b_vals = np.linspace(-40, 40, 50)
M, B = np.meshgrid(m_vals, b_vals)
loss_surface = np.zeros(M.shape)

# Compute loss surface
for i in range(len(m_vals)):
    for j in range(len(b_vals)):
        loss_surface[i, j] = compute_loss(X, y, M[i, j], B[i, j])

# Set up figure with subplots
fig = plt.figure(figsize=(18, 6))

# Subplot 1: Data points and prediction line
ax1 = fig.add_subplot(131)
ax1.set_xlim(np.min(X), np.max(X))
ax1.set_ylim(np.min(y), np.max(y))
ax1.scatter(X, y, color='blue', label='Data points')
line, = ax1.plot(X, predict(X, m, b), 'r-', label='Prediction')
Task 4: Loss Plot and 3D Surface
Step 6 (continued): Subplots for Loss Function and 3D
Surface
# Subplot 2: Loss function over iterations
ax2 = fig.add_subplot(132)
ax2.set_xlim(1, iterations + 1)
ax2.set_ylim(0, 50)  # Start with an arbitrary range for loss values
loss_line, = ax2.plot([], [], 'g-', label='Loss (MSE)')
ax2.set_xlabel('Iteration')
ax2.set_ylabel('Loss (MSE)')
ax2.set_title('Loss Function during Gradient Descent')
ax2.legend()

# Subplot 3: 3D loss surface plot
ax3 = fig.add_subplot(133, projection='3d')
ax3.set_xlim(-40, 40)
ax3.set_ylim(-40, 40)
ax3.plot_surface(M, B, loss_surface, cmap='viridis', alpha=0.6, rstride=1, cstride=1)
path_line, = ax3.plot([], [], [], 'r-', marker='o')  # Path line for gradient descent
ax3.set_xlabel('Slope (m)')
ax3.set_ylabel('Intercept (b)')
ax3.set_zlabel('Loss (MSE)')
ax3.set_title('Loss Surface and Gradient Descent Path')
Task 5: Gradient Descent Updates
Step 7: Update Function for Animation
# Histories of the loss and (m, b) values, filled in during the animation
loss_history = []
mb_history = []

def update(frame):
    global m, b
    steps_per_frame = 10  # Number of gradient descent steps per frame
    for _ in range(steps_per_frame):
        if len(loss_history) >= iterations:
            break
        m, b = step_gradient(X, y, m, b, learning_rate)
        current_loss = compute_loss(X, y, m, b)
        loss_history.append(current_loss)
        mb_history.append((m, b))

    # Update the line in the first subplot
    line.set_ydata(predict(X, m, b))

    # Update the loss function plot
    loss_line.set_data(range(len(loss_history)), loss_history)
    ax2.set_ylim(0, max(loss_history) + 5)

    # Update the 3D loss surface plot
    m_path, b_path = zip(*mb_history)
    path_line.set_data(m_path, b_path)
    path_line.set_3d_properties(loss_history)
    return line, loss_line, path_line
Task 6: Run the Animation
Step 8: Initialize and Run the Animation
# Initialize the loss plot and 3D plot with empty data
def init():
    loss_line.set_data([], [])
    path_line.set_data([], [])
    path_line.set_3d_properties([])
    return line, loss_line, path_line

# Create the animation
frames = iterations // 10
interval = 5
ani = FuncAnimation(fig, update, frames=frames, init_func=init, interval=interval, blit=True)

plt.tight_layout()
plt.show()
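If you want to keep the animation as a file rather than only viewing it in a window, matplotlib can export it. A small optional sketch (assumes the Pillow package is installed for the GIF writer; the filename is just an example):

# Optional: save the animation as a GIF (requires Pillow)
ani.save('gradient_descent.gif', writer='pillow', fps=30)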
Questions
▶ What happens to the loss (MSE) as the number of iterations
increases?
▶ How does changing the learning rate affect the convergence of
gradient descent?
▶ If we had more parameters because the dataset had more features
than in this problem, how would you visualize the loss surface?
▶ Attempt the same problem using the BOSTON-HOUSING
dataset. Download the data from https://github.com/
selva86/datasets/blob/master/BostonHousing.csv
▶ Use the feature RM (average number of rooms per dwelling) from the
dataset to fit MEDV (median house price).
▶ Use 80% of the data for training and 20% for testing the fit
(a starter sketch follows the column descriptions below).
Boston dataset (I just learnt that it's a racist dataset, tbh!)
▶ CRIM - per capita crime rate by town
▶ ZN - proportion of residential land zoned for lots over 25,000
sq.ft.
▶ INDUS - proportion of non-retail business acres per town.
▶ CHAS - Charles River dummy variable (1 if tract bounds river;
0 otherwise)
▶ NOX - nitric oxides concentration (parts per 10 million)
▶ RM - average number of rooms per dwelling
▶ AGE - proportion of owner-occupied units built prior to 1940
▶ DIS - weighted distances to five Boston employment centres
▶ RAD - index of accessibility to radial highways
▶ TAX - full-value property-tax rate per $10,000
▶ PTRATIO - pupil-teacher ratio by town
▶ B - 1000(Bk - 0.63)² where Bk is the proportion of blacks by
town
▶ LSTAT - % lower status of the population
▶ MEDV - Median value of owner-occupied homes in $1000’s
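For the BOSTON-HOUSING task above, here is a minimal starter sketch, not a full solution. It assumes pandas is available, that the CSV was saved locally as BostonHousing.csv, and that the column names match the file (check df.columns and adjust the case of 'rm'/'medv' if needed):

import numpy as np
import pandas as pd

# Load the downloaded CSV (path/filename assumed; adjust to where you saved it)
df = pd.read_csv('BostonHousing.csv')

# Feature RM (average rooms) and target MEDV (median house price);
# column-name case is an assumption -- verify against df.columns
X = df['rm'].values.reshape(-1, 1)
y = df['medv'].values.reshape(-1, 1)

# 80/20 train/test split with a fixed seed for reproducibility
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]

# Reuse the gradient descent loop from the synthetic example on (X_train, y_train),
# then report the MSE of the fitted line on (X_test, y_test).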