02 - Multiple Regression
In this module, you will extend the data structures and previously developed routines
to support multiple features. Several routines are updated, which makes the module
appear lengthy, but each change is a minor adjustment to a previous routine, so it is
quick to review.

Outline
1.1 Goals
1.2 Tools
1.3 Notation
2 Problem Statement
2.1 Matrix X containing our examples
2.2 Parameter vector w, b
3 Model Prediction With Multiple Variables
3.1 Single Prediction element by element
3.2 Single Prediction, vector
4 Compute Cost With Multiple Variables
5 Gradient Descent With Multiple Variables
5.1 Compute Gradient with Multiple Variables
5.2 Gradient Descent With Multiple Variables

1.1 Goals
Extend our regression model routines to support multiple features
Extend data structures to support multiple features
Rewrite prediction, cost and gradient routines to support multiple features
Utilize NumPy np.dot to vectorize their implementations for speed and
simplicity

1.2 Tools
In this lab, we will make use of:
NumPy, a popular library for scientific computing
Matplotlib, a popular library for plotting data
In [1]: import copy, math
import numpy as np
import matplotlib.pyplot as plt


plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2) # reduced display precision on numpy arrays

1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple
features.
| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i$th training example | `X[i]`, `y[i]` |
| m | number of training examples | `m` |
| n | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | The result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ | `f_wb` |

2 Problem Statement
You will use the motivating example of housing price prediction. The training dataset
contains three examples with four features (size, bedrooms, floors, and age), shown
in the table below. Note that size is in sqft rather than 1000 sqft. This causes an
issue, which you will solve in the next notebook!

| Size (sqft) | Number of Bedrooms | Number of Floors | Age of Home | Price (1000s dollars) |
|-------------|--------------------|------------------|-------------|-----------------------|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 852  | 2 | 1 | 35 | 178 |

You will build a linear regression model using these values so you can then predict
the price for other houses: for example, a house with 1200 sqft, 3 bedrooms, 1 floor,
that is 40 years old.
Run the following code cell to create your X_train and y_train variables.
In [3]: X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

2.1 Matrix X containing our examples



Similar to the table above, examples are stored in a NumPy matrix X_train . Each
row of the matrix represents one example. When you have $m$ training examples ($m$ is
three in our example), and there are $n$ features (four in our example), $\mathbf{X}$
is a matrix with dimensions ($m$, $n$) (m rows, n columns).

$$\mathbf{X} = \begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \cdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1}
\end{pmatrix}$$

notation:
- $\mathbf{x}^{(i)}$ is a vector containing example i: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x^{(i)}_j$ is element j in example i. The superscript in parentheses indicates the example number while the subscript represents an element.

Display the input data.


In [4]: # data is stored in numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type:{type(X_train)})")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)})")
print(y_train)

X Shape: (3, 4), X Type:<class 'numpy.ndarray'>)


[[2104 5 1 45]
[1416 3 2 40]
[ 852 2 1 35]]
y Shape: (3,), y Type:<class 'numpy.ndarray'>)
[460 232 178]
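To connect the notation to the code, the short check below (not part of the original lab; indices chosen only for illustration) shows that `X_train[i]` corresponds to the vector $\mathbf{x}^{(i)}$ and `X_train[i, j]` to the scalar $x^{(i)}_j$.

```python
# illustrative only: map the math notation onto NumPy indexing
print(f"x^(1), the second training example: {X_train[1]}")     # expected: [1416    3    2   40]
print(f"x^(1)_0, its size in sqft:          {X_train[1, 0]}")   # expected: 1416
```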

2.2 Parameter vector w, b


$\mathbf{w}$ is a vector with $n$ elements.
- Each element contains the parameter associated with one feature.
- in our dataset, n is 4.
- notionally, we draw this as a column vector

$$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \cdots \\ w_{n-1} \end{pmatrix}$$

$b$ is a scalar parameter.

For demonstration, $\mathbf{w}$ and $b$ will be loaded with some initial selected values that are
near the optimal. $\mathbf{w}$ is a 1-D NumPy vector.


In [6]: b_init = 785.1811367994083


w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

w_init shape: (4,), b_init type: <class 'float'>

3 Model Prediction With Multiple Variables


The model's prediction with multiple variables is given by the linear model:
$$f_{\mathbf{w},b}(\mathbf{x}) = w_0 x_0 + w_1 x_1 + ... + w_{n-1} x_{n-1} + b \tag{1}$$

or in vector notation:

$$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2}$$

where $\cdot$ is a vector dot product.


To demonstrate the dot product, we will implement prediction using (1) and (2).

3.1 Single Prediction element by element


Our previous prediction multiplied one feature value by one parameter and added a
bias parameter. A direct extension of our previous implementation of prediction to
multiple features would be to implement (1) above using a loop over each element,
performing the multiply with its parameter and then adding the bias parameter at the
end.
In [7]: def predict_single_loop(x, w, b):
"""
single predict using linear regression

Args:
x (ndarray): Shape (n,) example with multiple features
w (ndarray): Shape (n,) model parameters
b (scalar): model parameter

Returns:
p (scalar): prediction
"""
n = x.shape[0]
p = 0
for i in range(n):
p_i = x[i] * w[i]
p = p + p_i
p = p + b
return p

In [8]: # get a row from our training data


x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")
# make a prediction


f_wb = predict_single_loop(x_vec, w_init, b_init)


print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104 5 1 45]


f_wb shape (), prediction: 459.9999976194083

Note the shape of x_vec . It is a 1-D NumPy vector with 4 elements, (4,). The result,
f_wb is a scalar.

3.2 Single Prediction, vector


Equation (1) above can be implemented using the dot product as in (2) above, so we
can make use of vector operations to speed up predictions.
Recall from the Python/NumPy lab that NumPy np.dot() can be used to
perform a vector dot product.
In [9]: def predict(x, w, b):
"""
single predict using linear regression
Args:
x (ndarray): Shape (n,) example with multiple features
w (ndarray): Shape (n,) model parameters
b (scalar): model parameter

Returns:
p (scalar): prediction
"""
p = np.dot(x, w) + b
return p

In [10]: # get a row from our training data


x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104 5 1 45]


f_wb shape (), prediction: 459.99999761940825

The results and shapes are the same as the previous version which used looping.
Going forward, np.dot will be used for these operations. The prediction is now a
single statement. Most routines will implement it directly rather than calling a
separate predict routine.
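To make the speed claim concrete, here is a small benchmark sketch (not part of the original lab) that times the loop-based and `np.dot`-based predictions on a large random vector; the array size and timings are illustrative and will vary by machine.

```python
import time

# build a large random example purely for timing (illustrative values)
rng = np.random.default_rng(1)
x_big = rng.random(1_000_000)
w_big = rng.random(1_000_000)
b_big = 1.5

tic = time.time()
p_loop = predict_single_loop(x_big, w_big, b_big)   # element-by-element loop
t_loop = time.time() - tic

tic = time.time()
p_vec = predict(x_big, w_big, b_big)                # vectorized np.dot
t_vec = time.time() - tic

print(f"loop: {t_loop*1000:.1f} ms, vectorized: {t_vec*1000:.1f} ms")
print(f"predictions agree: {np.isclose(p_loop, p_vec)}")
```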

4 Compute Cost With Multiple Variables


The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:

$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 \tag{3}$$

where:

$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4}$$

Below is an implementation of equations (3) and (4). Note that this uses a standard
pattern for this course where a for loop over all m examples is used.
Exercise 1 - Compute Cost - Non-vectorized
Implement the compute_cost_nonvectorized() function, below, according to the
specifications below, including the input parameters and return value (cost). This
function should not make use of any vectorization.
In [11]: def compute_cost_nonvectorized(X, y, w, b):
"""
compute cost
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter

Returns:
cost (scalar): cost
"""
# WRITE CODE HERE

m = X.shape[0]
total = 0.0

# ensure 1-D shapes for indexing


w = w.reshape(-1)
y = y.reshape(-1)

for i in range(m):
f_wb_i = np.dot(X[i], w) + b # prediction for example i
err_i = f_wb_i - y[i] # residual
total += err_i ** 2 # squared error

cost = total / (2 * m)
return cost

In [12]: # Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost_nonvectorized(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904880036537e-12

Expected Result: Cost at optimal w : 1.5578904045996674e-12


Exercise 2 - Compute Cost - Vectorized

Implement the compute_cost_vectorized() function, below, according to the


specifications below, including the input parameters and return value (cost). This
function should have a vectorization-based implementation.
In [13]: def compute_cost_vectorized(X, y, w, b):
"""
compute cost
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter

Returns:
cost (scalar): cost
"""
# WRITE CODE HERE

m = X.shape[0]
y = y.reshape(-1) # ensure 1-D
preds = X @ w + b # (m,)
errors = preds - y # (m,)
cost = (errors @ errors) / (2 * m) # scalar
return cost

In [14]: # Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost_vectorized(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904880036537e-12

Expected Result: Cost at optimal w : 1.5578904045996674e-12
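As a quick consistency check (not part of the original lab), the two cost implementations above should agree to within floating-point tolerance on the training data:

```python
# sanity check: non-vectorized and vectorized costs should match closely
c_loop = compute_cost_nonvectorized(X_train, y_train, w_init, b_init)
c_vec  = compute_cost_vectorized(X_train, y_train, w_init, b_init)
print(f"non-vectorized: {c_loop}, vectorized: {c_vec}, close: {np.isclose(c_loop, c_vec)}")
```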

5 Gradient Descent With Multiple Variables


Gradient descent for multiple variables:

$$\begin{align*}
\text{repeat until convergence:} \; \lbrace & \\
\quad w_j &= w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0..n-1 \tag{5} \\
\quad b &= b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
\rbrace &
\end{align*}$$

where, n is the number of features, and the parameters $w_j$, $b$ are updated simultaneously, and where

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x^{(i)}_j \tag{6}$$

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \tag{7}$$

- m is the number of training examples in the data set
- $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value

5.1 Compute Gradient with Multiple Variables


An implementation for calculating equations (6) and (7) is below. There are many
ways to implement this. In this version, there is an
- outer loop over all m examples.
    - $\frac{\partial J(\mathbf{w},b)}{\partial b}$ for the example can be computed directly and accumulated
    - in a second loop over all n features:
        - $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$ is computed for each $w_j$.
Exercise 3 - Compute Gradient - Non-vectorized


Implement the compute_gradient_nonvectorized() function, below, according to the
specifications below, including the input parameters and return values (the gradients
dj_db and dj_dw). This function should not make use of any vectorization.
In [22]: def compute_gradient_nonvectorized(X, y, w, b):
"""
Computes the gradient for linear regression
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter

Returns:
dj_db (scalar): The gradient of the cost w.r.t. the parameter
dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameter
"""
# WRITE CODE HERE

m, n = X.shape
y = np.asarray(y).reshape(-1)
w = np.asarray(w).reshape(-1)
if w.size != n:                      # handle a scalar w accidentally passed in
w = np.full(n, float(w))

dj_db = 0.0
dj_dw = np.zeros(n, dtype=float)

# outer loop over all examples


for i in range(m):
# prediction for example i (no vectorization)
f_wb_i = b
for j in range(n):
f_wb_i += w[j] * X[i, j]

err_i = f_wb_i - y[i]

dj_db += err_i
# inner loop over all features
for j in range(n):
dj_dw[j] += err_i * X[i, j]

dj_db /= m
for j in range(n):
dj_dw[j] /= m

return dj_db, dj_dw

In [23]: #Compute and display gradient


tmp_dj_db, tmp_dj_dw = compute_gradient_nonvectorized(X_train, y_train, w
print(f'dj_dw at initial w,b: {tmp_dj_dw}')
print(f'dj_db at initial w,b: {tmp_dj_db}')

dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]


dj_db at initial w,b: -1.6739251880911372e-06

Expected Result:
dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
dj_db at initial w,b: -1.6739251122999121e-06
Exercise 4 - Compute Gradient - Vectorized
Implement the compute_gradient_vectorized() function, below, according to the
specifications below, including the input parameters and return values (the gradients
dj_db and dj_dw). This function should have a vectorization-based implementation.
In [24]: def compute_gradient_vectorized(X, y, w, b):
"""
Computes the gradient for linear regression
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter

Returns:
dj_db (scalar): The gradient of the cost w.r.t. the parameter
dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameter
"""

# WRITE CODE HERE


m = X.shape[0]
y = np.asarray(y).reshape(-1)
w = np.asarray(w).reshape(-1)

errors = X @ w + b - y # (m,)


dj_db = errors.sum() / m # scalar


dj_dw = (X.T @ errors) / m # (n,)

return float(dj_db), dj_dw

In [25]: #Compute and display gradient


tmp_dj_db, tmp_dj_dw = compute_gradient_vectorized(X_train, y_train, w_in
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: {tmp_dj_dw}')

dj_db at initial w,b: -1.673925169143331e-06


dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]

Expected Result:
dj_db at initial w,b: -1.6739251122999121e-06
dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
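Similarly, a quick check (not part of the original lab) that the non-vectorized and vectorized gradient implementations agree:

```python
# sanity check: both gradient implementations should produce the same values
db_loop, dw_loop = compute_gradient_nonvectorized(X_train, y_train, w_init, b_init)
db_vec,  dw_vec  = compute_gradient_vectorized(X_train, y_train, w_init, b_init)
print(f"dj_db agree: {np.isclose(db_loop, db_vec)}")
print(f"dj_dw agree: {np.allclose(dw_loop, dw_vec)}")
```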

5.2 Gradient Descent With Multiple Variables


The routine below implements equation (5) above.
In [26]: def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
"""
Performs batch gradient descent to learn w and b. Updates w and b by
num_iters gradient steps with learning rate alpha

Args:
X (ndarray (m,n)) : Data, m examples with n features
y (ndarray (m,)) : target values
w_in (ndarray (n,)) : initial model parameters
b_in (scalar) : initial model parameter
cost_function : function to compute cost
gradient_function : function to compute the gradient
alpha (float) : Learning rate
num_iters (int) : number of iterations to run gradient descent

Returns:
w (ndarray (n,)) : Updated values of parameters
b (scalar) : Updated value of parameter
"""

# An array to store cost J and w's at each iteration, primarily for graphing later


J_history = []
w = copy.deepcopy(w_in) #avoid modifying global w within function
b = b_in

for i in range(num_iters):

# Calculate the gradient and update the parameters


dj_db,dj_dw = gradient_function(X, y, w, b) ##None

# Update Parameters using w, b, alpha and gradient


w = w - alpha * dj_dw ##None
b = b - alpha * dj_db ##None


# Save cost J at each iteration


if i<100000: # prevent resource exhaustion
J_history.append( cost_function(X, y, w, b))

# Print cost at intervals 10 times, or every iteration if num_iters < 10


if i% math.ceil(num_iters / 10) == 0:
print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f} ")

return w, b, J_history #return final w,b and J history for graphing

In the next cell you will test the implementation.


In [27]: # initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                            compute_cost_vectorized, compute_gradient_vectorized,
                                            alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, tar

Iteration 0: Cost 2529.46


Iteration 100: Cost 695.99
Iteration 200: Cost 694.92
Iteration 300: Cost 693.86
Iteration 400: Cost 692.81
Iteration 500: Cost 691.77
Iteration 600: Cost 690.73
Iteration 700: Cost 689.71
Iteration 800: Cost 688.70
Iteration 900: Cost 687.69
b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178

Expected Result:
b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
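As an illustration tying back to the problem statement (not part of the original lab), the learned parameters can be used to price the example house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old; given that the cost is still far from its minimum, this estimate should not be taken seriously yet.

```python
# illustrative only: predict the price of the example house from the problem statement
x_house = np.array([1200, 3, 1, 40])
price = np.dot(x_house, w_final) + b_final
print(f"predicted price for the example house: {price:0.2f} (thousands of dollars)")
```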
In [28]: # plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration"); ax2.set_title("Cost vs. iteration (
ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step')
plt.show()


These results are not inspiring! Cost is still declining and our predictions are not very
accurate. The next module will explore how to improve on this.
