02 Multiple Regression Gradient Vectorized
02 - Multiple Regression
In this module, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated, so the module appears lengthy, but each change is a minor adjustment to a previous routine, making it quick to review.
Outline
1.1 Goals
1.2 Tools
1.3 Notation
2 Problem Statement
2.1 Matrix X containing our examples
2.2 Parameter vector w, b
3 Model Prediction With Multiple Variables
3.1 Single Prediction element by element
3.2 Single Prediction, vector
4 Compute Cost With Multiple Variables
5 Gradient Descent With Multiple Variables
5.1 Compute Gradient with Multiple Variables
5.2 Gradient Descent With Multiple Variables
1.1 Goals
Extend our regression model routines to support multiple features
Extend data structures to support multiple features
Rewrite prediction, cost and gradient routines to support multiple features
Utilize NumPy np.dot to vectorize their implementations for speed and
simplicity
1.2 Tools
In this lab, we will make use of:
NumPy, a popular library for scientific computing
Matplotlib, a popular library for plotting data
In [1]: import copy, math
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays
1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple
features.
| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i$th training example | `X[i]`, `y[i]` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ | model prediction for example $i$ | `f_wb` |
2 Problem Statement
You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age) shown in the table below. Note that size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next notebook!
| Size (sqft) | Number of Bedrooms | Number of Floors | Age of Home | Price (1000s dollars) |
|------------:|-------------------:|-----------------:|------------:|----------------------:|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 852  | 2 | 1 | 35 | 178 |
You will build a linear regression model using these values so you can then predict the price for other houses, for example, a house with 1200 sqft, 3 bedrooms, 1 floor, that is 40 years old.
Run the following code cell to create your X_train and y_train variables.
In [3]: X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
2.1 Matrix X containing our examples
Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ($m$ is three in our example) and $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$): $m$ rows, $n$ columns.

notation:
$\mathbf{x}^{(i)}$ is the vector containing example $i$: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
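For instance (an illustrative check, not an original cell of the lab), the shape of the matrix and a single example row can be inspected directly:

In [ ]: print(f"X_train shape: {X_train.shape}")  # (3, 4): m=3 examples, n=4 features
print(f"first example x^(0): {X_train[0]}")      # one row, shape (4,)
print(f"y_train shape: {y_train.shape}")         # (3,)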
2.2 Parameter vector w, b
$\mathbf{w}$ is a vector with $n$ elements, each element containing the parameter associated with one feature:

$$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \cdots \\ w_{n-1} \end{pmatrix}$$

$b$ is a scalar parameter.
For demonstration, $\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal.

3 Model Prediction With Multiple Variables
The model's prediction with multiple variables is given by the linear model:

$$f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b \tag{1}$$

or in vector notation:

$$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2}$$

where $\cdot$ is a vector dot product.
To demonstrate the dot product, we will implement prediction using (1) and (2).

3.1 Single Prediction element by element
Extending the single-feature prediction to multiple features, we loop over each feature, multiply it by its parameter, accumulate the sum, and add the bias at the end.

In [ ]: def predict_single_loop(x, w, b):
    """
    single prediction using looping
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar): model parameter
    Returns:
      p (scalar): prediction
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]
        p = p + p_i
    p = p + b
    return p
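As a minimal sketch (assuming `x_vec` is the first row of `X_train`, which matches the (4,) shape noted below, and that `w_init` and `b_init` hold the preloaded parameters), the loop version can be exercised like this:

In [ ]: # assumption: use the first training example as the input vector
x_vec = X_train[0, :]
# make a prediction element by element with the loop version
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")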
Note the shape of x_vec. It is a 1-D NumPy vector with 4 elements, (4,). The result, f_wb, is a scalar.

3.2 Single Prediction, vector
The same prediction can be computed in a single statement using np.dot, which performs the dot product of equation (2).

In [ ]: def predict(x, w, b):
    """
    single prediction using vectorization
    Returns:
      p (scalar): prediction
    """
    p = np.dot(x, w) + b
    return p
# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")
The results and shapes are the same as the previous version which used looping.
Going forward, np.dot will be used for these operations. The prediction is now a
single statement. Most routines will implement it directly rather than calling a
separate predict routine.
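To see why the two versions agree, here is a quick standalone check (the arrays below are arbitrary illustration values, not from the lab): the element-by-element sum and np.dot compute the same quantity.

In [ ]: a = np.array([1.0, 2.0, 3.0])
c = np.array([4.0, 5.0, 6.0])
manual = sum(a[i] * c[i] for i in range(a.shape[0]))  # element-by-element accumulation
vector = np.dot(a, c)                                 # vectorized dot product
print(manual, vector)                                 # both print 32.0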
4 Compute Cost With Multiple Variables
The equation for the cost function with multiple variables, $J(\mathbf{w},b)$, is:

$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 \tag{3}$$

where:

$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4}$$
Below is an implementation of equations (3) and (4). Note that this uses a standard
pattern for this course where a for loop over all m examples is used.
Exercise 1 - Compute Cost - Non-vectorized
Implement the compute_cost_nonvectorized() function below, according to the specifications, including the input parameters and return value (cost). This function should not make use of any vectorization.
In [11]: def compute_cost_nonvectorized(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    # WRITE CODE HERE
    m = X.shape[0]
    total = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b   # prediction for example i
        err_i = f_wb_i - y[i]          # residual
        total += err_i ** 2            # squared error
    cost = total / (2 * m)
    return cost
In [12]: # Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost_nonvectorized(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')
Exercise 2 - Compute Cost - Vectorized
Implement the compute_cost_vectorized() function below, according to the specifications, including the input parameters and return value (cost). This function should have a vectorization-based implementation.

In [13]: def compute_cost_vectorized(X, y, w, b):
    """
    compute cost using vectorization
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    # WRITE CODE HERE
    m = X.shape[0]
    y = y.reshape(-1)                   # ensure 1-D
    preds = X @ w + b                   # (m,)
    errors = preds - y                  # (m,)
    cost = (errors @ errors) / (2 * m)  # scalar
    return cost
In [14]: # Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost_vectorized(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')
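As a quick sanity check (not part of the original exercises, and assuming w_init and b_init are the preloaded parameters used above), the two cost implementations can be compared; they should agree to floating-point precision.

In [ ]: c_loop = compute_cost_nonvectorized(X_train, y_train, w_init, b_init)
c_vec = compute_cost_vectorized(X_train, y_train, w_init, b_init)
print(np.isclose(c_loop, c_vec))  # expect True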
5 Gradient Descent With Multiple Variables
Gradient descent for multiple variables repeats the following updates until convergence:

$$w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \qquad \text{for } j = 0 \ldots n{-}1 \tag{5}$$

$$b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}$$

and where

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x^{(i)}_j \tag{6}$$

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) \tag{7}$$

where $m$ is the number of training examples, $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, and $y^{(i)}$ is the target value.
5.1 Compute Gradient with Multiple Variables

Exercise 3 - Compute Gradient - Non-vectorized
Implement the compute_gradient_nonvectorized() function below, according to the specifications, including the input parameters and return values (dj_db, dj_dw). This function should not make use of any vectorization.

In [ ]: def compute_gradient_nonvectorized(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w
    """
    # WRITE CODE HERE
    m, n = X.shape
    y = np.asarray(y).reshape(-1)
    w = np.asarray(w).reshape(-1)
    if w.size != n:                  # handle a scalar w accidentally passed in
        w = np.full(n, float(w))
    dj_db = 0.0
    dj_dw = np.zeros(n, dtype=float)
    for i in range(m):
        # prediction for example i (no vectorization)
        f_wb_i = b
        for j in range(n):
            f_wb_i += w[j] * X[i, j]
        err_i = f_wb_i - y[i]        # residual for example i
        dj_db += err_i
        # inner loop over all features
        for j in range(n):
            dj_dw[j] += err_i * X[i, j]
    dj_db /= m
    for j in range(n):
        dj_dw[j] /= m
    return dj_db, dj_dw
Expected Result:
dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
dj_db at initial w,b: -1.6739251122999121e-06
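One way to gain confidence in equations (6) and (7) is a numerical check (an illustrative sketch, not part of the original exercises; it assumes w_init and b_init as above): perturb a single parameter slightly and compare the resulting change in cost against the analytic gradient.

In [ ]: eps = 1e-4
dj_db, dj_dw = compute_gradient_nonvectorized(X_train, y_train, w_init, b_init)
# finite-difference estimate of dJ/dw_0
w_plus = np.array(w_init, dtype=float);  w_plus[0] += eps
w_minus = np.array(w_init, dtype=float); w_minus[0] -= eps
approx = (compute_cost_nonvectorized(X_train, y_train, w_plus, b_init)
          - compute_cost_nonvectorized(X_train, y_train, w_minus, b_init)) / (2 * eps)
print(f"analytic dJ/dw_0: {dj_dw[0]:.3e}, finite difference: {approx:.3e}")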
Exercise 4 - Compute Gradient - Vectorized
Implement the compute_gradient_vectorized() function below, according to the specifications, including the input parameters and return values (dj_db, dj_dw). This function should have a vectorization-based implementation.
In [24]: def compute_gradient_vectorized(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w
    """
    # WRITE CODE HERE
    m = X.shape[0]
    errors = X @ w + b - y     # (m,) residuals
    dj_dw = X.T @ errors / m   # (n,) gradient w.r.t. w
    dj_db = errors.sum() / m   # scalar gradient w.r.t. b
    return dj_db, dj_dw
Expected Result:
dj_db at initial w,b: -1.6739251122999121e-06
dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
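As with the cost, the loop and vectorized gradient implementations should produce matching results (again a sketch assuming w_init and b_init as above):

In [ ]: dj_db_l, dj_dw_l = compute_gradient_nonvectorized(X_train, y_train, w_init, b_init)
dj_db_v, dj_dw_v = compute_gradient_vectorized(X_train, y_train, w_init, b_init)
print(np.isclose(dj_db_l, dj_db_v), np.allclose(dj_dw_l, dj_dw_v))  # expect: True True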
5.2 Gradient Descent With Multiple Variables
The routine below implements the update equation (5) above, calling the gradient and cost functions passed in as arguments.

In [ ]: def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b, taking num_iters steps with learning rate alpha
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
      J_hist (list)    : History of cost values, for plotting
    """
    J_hist = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's array
    b = b_in
    for i in range(num_iters):
        # compute the gradient and take one step
        dj_db, dj_dw = gradient_function(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # record cost at each iteration for plotting
        J_hist.append(cost_function(X, y, w, b))
    return w, b, J_hist
Expected Result:
b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
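To close the loop on the motivating example, the learned parameters can be used to price the 1200 sqft, 3 bedroom, 1 floor, 40 year old house from the problem statement. This sketch assumes the gradient descent cell above stored its results in variables named w_final and b_final; those names are placeholders, not from the lab.

In [ ]: x_house = np.array([1200, 3, 1, 40])
price = np.dot(x_house, w_final) + b_final  # prediction in 1000s of dollars
print(f"predicted price: ${price*1000:0.0f}")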
In [28]: # plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step')
plt.show()
These results are not inspiring! Cost is still declining and our predictions are not very
accurate. The next module will explore how to improve on this.