Linear Regression
Q.M. Phan & N.H. Luong
University of Information Technology - Vietnam National University Ho Chi Minh City
October 7, 2022
New Packages
numpy → very frequently used in ML (Python)
Link: [Link]
>>> import numpy as np
matplotlib → for visualization
Link: [Link]
>>> import matplotlib.pyplot as plt
Generate A Regression Problem
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=500, n_features=1,
                           n_informative=1, noise=25, random_state=42)
Data Visualization
>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.xlabel('X')
plt.ylabel('y')
plt.show()
Recall (Linear Regression)
Figure: The general concept of Linear Regression
Minimizing cost function with gradient descent
Cost function (Squared Error):

    J(w) = \frac{1}{2} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (1)

Update the weights:

    w_{t+1} := w_t + \Delta w    (2)
    \Delta w = -\eta \nabla J(w)    (3)
    \frac{\partial J}{\partial w_j} = -\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (4)
    \Delta w_j = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (5)
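As a quick numerical check of Eqs. (1)-(5), the cost and one batch update can be evaluated directly with NumPy. A minimal sketch; the toy arrays X, y, w and the value eta = 0.01 are illustrative, not from the slides:

import numpy as np

# Toy data (illustrative): 4 samples, 1 feature.
X = np.array([[0.5], [1.0], [1.5], [2.0]])
y = np.array([1.0, 2.1, 2.9, 4.2])
w = np.zeros(1 + X.shape[1])          # w[0]: intercept, w[1:]: coefficients
eta = 0.01

y_hat = np.dot(X, w[1:]) + w[0]       # model output
diff = y - y_hat                      # y^(i) - yhat^(i)
cost = 0.5 * np.sum(diff ** 2)        # Eq. (1)

# Eq. (5): one batch gradient-descent step
w[0]  += eta * np.sum(diff)           # intercept (x_0 = 1)
w[1:] += eta * np.dot(X.T, diff)      # coefficients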
Minimizing cost function with gradient descent (cont.)
Pseudocode of the Training Process
Algorithm 1 Gradient Descent
1: Initialize the weights, w
2: while the stopping criterion is not satisfied do
3:     Compute the output values, ŷ
4:     Compute the difference between y and ŷ
5:     Update the weights:
6:         Update the intercept
7:         Update the coefficients
8: end while
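The implementation on the following slides stops after a fixed number of iterations (max_iter). A tolerance-based version of the stopping criterion in line 2 could look like the sketch below; the toy data, the tol threshold, and the variable names are illustrative assumptions, not part of the slides:

import numpy as np

# Sketch of Algorithm 1 with a cost-improvement stopping criterion.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = np.zeros(1 + X.shape[1])
eta, tol, prev_cost = 0.01, 1e-8, np.inf

while True:
    y_hat = np.dot(X, w[1:]) + w[0]       # line 3: compute the output values
    diff = y - y_hat                      # line 4: difference between y and y_hat
    w[0] += eta * np.sum(diff)            # line 6: update the intercept
    w[1:] += eta * np.dot(X.T, diff)      # line 7: update the coefficients
    cost = 0.5 * np.sum(diff ** 2)
    if abs(prev_cost - cost) < tol:       # line 2: stop when the cost no longer improves
        break
    prev_cost = cost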
Components
Hyperparameters
  eta (float): the initial learning rate
  max_iter (int): the maximum number of iterations
  random_state (int)
Parameters
  w (list/array): the weight values
  costs (list/array): the list containing the cost values over iterations
Methods
  fit(X, y)
  predict(X)
Implement (code from scratch)
class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=42):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None            # w[0]: intercept, w[1:]: coefficients
        self.costs = []

    def predict(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]
’fit’ method
def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)        # update the intercept
        for j in range(X.shape[1]):                 # j = 0, 1, ..., X.shape[1] - 1
            delta = 0.0
            for i in range(X.shape[0]):             # i = 0, 1, ..., X.shape[0] - 1
                delta += self.eta * diff[i] * X[i][j]
            self.w[j + 1] += delta                  # update the j-th coefficient
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)
’fit’ method (2)
def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)          # update the intercept
        self.w[1:] += self.eta * np.dot(X.T, diff)    # update all coefficients at once
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)
Train Model
Gradient Descent
>>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20,
                                 random_state=42)
reg_GD.fit(X, y)
Visualize the trend in the cost values (Gradient Descent)
>>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Gradient Descent')
plt.show()
Visualize on Data
>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Weight values
>>> w_GD = reg_GD.w
>>> w_GD
[-0.9794002, 63.18592509]
Implement (package)
Stochastic Gradient Descent
from sklearn.linear_model import SGDRegressor
Hyperparameters
  eta0
  max_iter
  random_state
Parameters
  intercept_
  coef_
Methods
  fit(X, y)
  predict(X)
Implement (package) (cont.)
Normal Equation
from sklearn.linear_model import LinearRegression
Parameters
  intercept_
  coef_
Methods
  fit(X, y)
  predict(X)
Differences
Gradient Descent
    w := w + \Delta w
    \Delta w_j = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}

Stochastic Gradient Descent
    w := w + \Delta w
    \Delta w_j = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (one sample i at a time)

Normal Equation
    w = (X^T X)^{-1} X^T y
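For reference, the Normal Equation can also be evaluated directly with NumPy. A minimal sketch, assuming X and y are the arrays produced by make_regression above; a column of ones is prepended so the intercept is learned as the first weight (the names X_b and w_ne are illustrative):

import numpy as np

# w = (X^T X)^{-1} X^T y, with a bias column for the intercept.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])
w_ne = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)   # solve() instead of an explicit inverse
# w_ne[0] is the intercept, w_ne[1:] are the coefficients.

np.linalg.lstsq(X_b, y, rcond=None) solves the same least-squares problem and behaves better numerically when X_b.T @ X_b is ill-conditioned.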
Practice (cont.)
Stochastic Gradient Descent
>>> from sklearn.linear_model import SGDRegressor
>>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20,
                           random_state=42, learning_rate='constant')
reg_SGD.fit(X, y)
Normal Equation
>>> from sklearn.linear_model import LinearRegression
>>> reg_NE = LinearRegression()
reg_NE.fit(X, y)
Weight Values Comparisons
Gradient Descent (ours)
>>> w_GD = reg_GD.w
>>> w_GD
[-0.9794002, 63.18592509]
Stochastic Gradient Descent
>>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
>>> w_SGD
[-1.02681553, 63.08630288]
Normal Equation
>>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
>>> w_NE
[-0.97941333, 63.18605572]
Visualize on Data (all)
>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
plt.plot(X, reg_SGD.predict(X), color='black', lw=4, label='Stochastic Gradient Descent')
plt.plot(X, reg_NE.predict(X), color='orange', lw=2, label='Normal Equation')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Performance Evaluation
Mean Absolute Error (MAE)

    \mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_i \left| y^{(i)} - \hat{y}^{(i)} \right|    (6)

Mean Squared Error (MSE)

    \mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (7)

R-Squared (R^2)

    R^2(y, \hat{y}) = 1 - \frac{\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_i \left( y^{(i)} - \bar{y} \right)^2}    (8)
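Equations (6)-(8) can be cross-checked against the sklearn.metrics functions used on the next slide. A minimal NumPy sketch, assuming y and y_pred are 1-D arrays of equal length:

import numpy as np

def mae(y, y_pred):
    return np.mean(np.abs(y - y_pred))        # Eq. (6)

def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)         # Eq. (7)

def r2(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot              # Eq. (8)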
Performance Evaluation
>>> from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score as R2
>>> y_pred_GD = reg_GD.predict(X)
>>> y_pred_SGD = reg_SGD.predict(X)
>>> y_pred_NE = reg_NE.predict(X)
Performance Evaluation (cont.)
Mean Absolute Error
>>> print('MAE of GD:', round(MAE(y, y_pred_GD), 6))
print('MAE of SGD:', round(MAE(y, y_pred_SGD), 6))
print('MAE of NE:', round(MAE(y, y_pred_NE), 6))
Mean Squared Error
>>> print('MSE of GD:', round(MSE(y, y_pred_GD), 6))
print('MSE of SGD:', round(MSE(y, y_pred_SGD), 6))
print('MSE of NE:', round(MSE(y, y_pred_NE), 6))
R2 score
>>> print('R2 of GD:', round(R2(y, y_pred_GD), 6))
print('R2 of SGD:', round(R2(y, y_pred_SGD), 6))
print('R2 of NE:', round(R2(y, y_pred_NE), 6))
Run Gradient Descent with lr = 0.005
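A sketch of this experiment with the scratch class above (the object name reg_GD2 and the plot are illustrative). Because the gradient is summed over all 500 samples, the larger learning rate can overshoot the minimum, so the cost may grow across epochs instead of shrinking:

>>> reg_GD2 = LinearRegression_GD(eta=0.005, max_iter=20, random_state=42)
reg_GD2.fit(X, y)
>>> plt.plot(range(1, len(reg_GD2.costs) + 1), reg_GD2.costs)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Gradient Descent (eta = 0.005)')
plt.show()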
Polynomial Regression
Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]
>>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
                  480.0, 586.0])[:, np.newaxis]
y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
              391.2, 390.8])
>>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Visualize data
Experiment with Linear Regression
>>> from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
Experiment with Linear Regression (cont.)
Experiment with Polynomial Regression
Syntax
from sklearn.preprocessing import PolynomialFeatures

>>> from sklearn.preprocessing import PolynomialFeatures
quadratic = PolynomialFeatures(degree=2)
X_quad = quadratic.fit_transform(X)
pr = LinearRegression()
pr.fit(X_quad, y)
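To see what the degree-2 transform does, a single (arbitrary) sample x = 3 expands into the bias, linear, and quadratic terms [1, x, x^2]:

>>> quadratic.fit_transform(np.array([[3.0]]))
array([[1., 3., 9.]])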
Experiment with Polynomial Regression (cont.)
>>> X_test = np.arange(250, 600, 10)[:, np.newaxis]
>>> y_pred_linear = lr.predict(X_test)
y_pred_quad = pr.predict(quadratic.fit_transform(X_test))
>>> plt.scatter(X, y, label='Training points')
plt.xlabel('X')
plt.ylabel('y')
plt.plot(X_test, y_pred_linear, label='Linear fit', c='black')
plt.plot(X_test, y_pred_quad, label='Quadratic fit', c='orange')
plt.legend()
plt.show()
Practice
Dataset: 'Boston Housing' ([Link]) (14 attributes: 13 independent variables + 1 target variable)
File: boston [Link]
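A possible starting point for the exercise; this is only a sketch, and the file name boston_housing.csv, the target column MEDV, and the 70/30 split are assumptions, not given on the slide:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv('boston_housing.csv')       # hypothetical file name
X = df.drop(columns=['MEDV']).values         # 13 independent variables
y = df['MEDV'].values                        # 1 target variable (hypothetical column name)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

print('MAE:', mean_absolute_error(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
print('R2 :', r2_score(y_test, y_pred))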