
Linear Regression

Q.M. Phan & N.H. Luong
University of Information Technology
Vietnam National University Ho Chi Minh City
October 7, 2022
New Packages

numpy → very frequently used in ML (Python)

Link: [Link]

>>> import numpy as np

matplotlib → for visualization

Link: [Link]

>>> import matplotlib.pyplot as plt


Generate A Regression Problem

>>> from sklearn.datasets import make_regression

>>> X, y = make_regression(n_samples=500, n_features=1,
                           n_informative=1, noise=25, random_state=42)
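Here X is a (500, 1) array holding the single feature and y is a length-500 target vector, so the task is to fit a straight line through the noisy samples.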

Data Visualization

>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
    plt.xlabel('X')
    plt.ylabel('y')
    plt.show()

Recall (Linear Regression)

Figure: The general concept of Linear Regression

Minimizing cost function with gradient descent

Cost function (Squared Error):



    J(w) = \frac{1}{2} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (1)

Update the weights:

    w_{t+1} := w_t + \Delta w    (2)

    \Delta w = -\eta \nabla J(w)    (3)

    \Delta w_0 = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)    (4)

    \Delta w_j = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (5)

Minimizing cost function with gradient descent (cont.)
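As a brief sketch of the step from (3) to (4) and (5): differentiating the cost with respect to each weight gives

    \frac{\partial J}{\partial w_j} = -\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}

so the update is \Delta w_j = -\eta \, \partial J / \partial w_j = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}, with the convention x_0^{(i)} = 1 recovering the intercept update in equation (4).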

Pseudocode of the Training Process

Algorithm 1 Gradient Descent


1: Initialize the weights, w
2: while Stopping Criteria is not satisfied do
3:     Compute the output value, ŷ
4:     Compute the difference between y and ŷ
5:     Update the weights:
6:         Update the intercept
7:         Update the coefficients
8: end while

Components

Hyperparameters
eta (float): the initial learning rate
max_iter (int): the maximum number of iterations
random_state (int)

Parameters
w (list/array): the weight values
costs (list/array): the list containing the cost values over iterations

Methods
fit(X, y)
predict(X)

Implement (code from scratch)

class LinearRegression_GD:

    def __init__(self, eta=0.001, max_iter=20, random_state=42):
        self.eta = eta                    # learning rate
        self.max_iter = max_iter          # number of training iterations
        self.random_state = random_state  # seed for weight initialization
        self.w = None                     # weights; w[0] is the intercept
        self.costs = []                   # cost value recorded at each iteration

    def predict(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

’fit’ method

def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    # initialize the weights with small random values
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)      # update the intercept
        for j in range(X.shape[1]):               # j = 0, 1, ..., X.shape[1] - 1
            delta = 0.0
            for i in range(X.shape[0]):           # i = 0, 1, ..., X.shape[0] - 1
                delta += self.eta * diff[i] * X[i][j]
            self.w[j + 1] += delta                # update the j-th coefficient
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)

’fit’ method (2)

def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)        # update the intercept
        self.w[1:] += self.eta * np.dot(X.T, diff)  # update all coefficients at once
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)
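The two nested loops of the previous version collapse into the single np.dot(X.T, diff) call, which computes the same per-feature sums in one vectorized operation.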

Train Model

Gradient Descent
>>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=42)
    reg_GD.fit(X, y)

Visualize the trend in the cost values (Gradient Descent)

>>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
    plt.xlabel('Epochs')
    plt.ylabel('Cost')
    plt.title('Gradient Descent')
    plt.show()

Visualize on Data

>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
    plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.legend()
    plt.show()

Weight values

>>> w_GD = reg_GD.w
>>> w_GD
[-0.9794002, 63.18592509]
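Since the data has a single feature, w_GD[0] is the fitted intercept and w_GD[1] is the slope of the regression line.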

Implement (package)

Stochastic Gradient Descent


from sklearn.linear_model import SGDRegressor

Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Implement (package) (cont.)

Normal Equation
from sklearn.linear_model import LinearRegression

Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Differences

Gradient Descent

    w := w + \Delta w
    \Delta w = \eta \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}

Stochastic Gradient Descent

    w := w + \Delta w
    \Delta w = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}

Normal Equation

    w = (X^T X)^{-1} X^T y
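As a minimal sketch (not from the original slides), the Normal Equation can also be evaluated directly with numpy, assuming X and y are the arrays generated earlier and a column of ones is prepended for the intercept:

>>> Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column of ones
>>> w_closed = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y  # w = (X^T X)^{-1} X^T y

In practice np.linalg.pinv or np.linalg.solve is preferred over the explicit inverse for numerical stability.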

Practice (cont.)

Stochastic Gradient Descent


>>> from sklearn.linear_model import SGDRegressor
>>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20,
                           random_state=42, learning_rate='constant')
    reg_SGD.fit(X, y)

Normal Equation

>>> from sklearn.linear_model import LinearRegression

>>> reg_NE = LinearRegression()
    reg_NE.fit(X, y)

Weight Values Comparisons

Gradient Descent (ours)

>>> w_GD = reg_GD.w
>>> w_GD
[-0.9794002, 63.18592509]

Stochastic Gradient Descent

>>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
>>> w_SGD
[-1.02681553, 63.08630288]

Normal Equation

>>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
>>> w_NE
[-0.97941333, 63.18605572]
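All three methods arrive at nearly the same intercept and slope; the small remaining differences come from the limited number of iterations and, for SGD, the per-sample updates.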
Visualize on Data (all)

>>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
    plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
    plt.plot(X, reg_SGD.predict(X), color='black', lw=4, label='Stochastic Gradient Descent')
    plt.plot(X, reg_NE.predict(X), color='orange', lw=2, label='Normal Equation')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.legend()
    plt.show()

Performance Evaluation

Mean Absolute Error (MAE)

    MAE(y, \hat{y}) = \frac{1}{n} \sum_i \left| y^{(i)} - \hat{y}^{(i)} \right|    (6)

Mean Squared Error (MSE)

    MSE(y, \hat{y}) = \frac{1}{n} \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (7)

R-Squared (R^2)

    R^2(y, \hat{y}) = 1 - \frac{\sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_i \left( y^{(i)} - \bar{y} \right)^2}    (8)
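As a minimal sketch (not part of the original slides), these metrics can also be computed directly with numpy, assuming y and a prediction vector y_pred of the same length:

>>> mae = np.mean(np.abs(y - y_pred))     # equation (6)
>>> mse = np.mean((y - y_pred) ** 2)      # equation (7)
>>> r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)   # equation (8)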

Performance Evaluation

>>> from sklearn.metrics import mean_absolute_error as MAE
    from sklearn.metrics import mean_squared_error as MSE
    from sklearn.metrics import r2_score as R2

>>> y_pred_GD = reg_GD.predict(X)

>>> y_pred_SGD = reg_SGD.predict(X)

>>> y_pred_NE = reg_NE.predict(X)

Performance Evaluation (cont.)

Mean Absolute Error

>>> print('MAE of GD:', round(MAE(y, y_pred_GD), 6))
    print('MAE of SGD:', round(MAE(y, y_pred_SGD), 6))
    print('MAE of NE:', round(MAE(y, y_pred_NE), 6))

Mean Squared Error

>>> print('MSE of GD:', round(MSE(y, y_pred_GD), 6))
    print('MSE of SGD:', round(MSE(y, y_pred_SGD), 6))
    print('MSE of NE:', round(MSE(y, y_pred_NE), 6))

R2 score

>>> print('R2 of GD:', round(R2(y, y_pred_GD), 6))
    print('R2 of SGD:', round(R2(y, y_pred_SGD), 6))
    print('R2 of NE:', round(R2(y, y_pred_NE), 6))

Run Gradient Descent with lr = 0.005
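A sketch of the run this slide refers to, assuming the same data and the LinearRegression_GD class defined above, with only the learning rate changed (reg_GD2 is a name chosen here for illustration):

>>> reg_GD2 = LinearRegression_GD(eta=0.005, max_iter=20, random_state=42)
    reg_GD2.fit(X, y)
    plt.plot(range(1, len(reg_GD2.costs) + 1), reg_GD2.costs)
    plt.xlabel('Epochs')
    plt.ylabel('Cost')
    plt.title('Gradient Descent (eta = 0.005)')
    plt.show()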

Polynomial Regression

Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]

>>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
                  480.0, 586.0])[:, np.newaxis]
    y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
                  391.2, 390.8])

>>> plt.scatter(X, y, label='Training points')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.legend()
    plt.show()

Visualize data

Experiment with Linear Regression

>>> from sklearn.linear_model import LinearRegression

    lr = LinearRegression()
    lr.fit(X, y)



Experiment with Linear Regression (cont.)

Experiment with Polynomial Regression

Syntax
from sklearn.preprocessing import PolynomialFeatures

>>> from sklearn.preprocessing import PolynomialFeatures

    quadratic = PolynomialFeatures(degree=2)
    X_quad = quadratic.fit_transform(X)
    pr = LinearRegression()
    pr.fit(X_quad, y)
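With degree=2, PolynomialFeatures expands each sample x into the features [1, x, x^2], and LinearRegression is then fit on these expanded features, so the model remains linear in its weights.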

Experiment with Polynomial Regression (cont.)

>>> X_test = np.arange(250, 600, 10)[:, np.newaxis]

>>> y_pred_linear = lr.predict(X_test)
    y_pred_quad = pr.predict(quadratic.fit_transform(X_test))

>>> plt.scatter(X, y, label='Training points')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.plot(X_test, y_pred_linear, label='Linear fit', c='black')
    plt.plot(X_test, y_pred_quad, label='Quadratic fit', c='orange')
    plt.legend()
    plt.show()

Practice

Dataset: 'Boston Housing' ([Link]): 14 attributes (13 independent variables + 1 target variable)

File: boston [Link]
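A minimal starting sketch, assuming the file is a CSV with the target in a column named 'MEDV' (both the file name and the column name below are placeholders to adjust to the actual data):

>>> import pandas as pd
    df = pd.read_csv('boston_housing.csv')     # hypothetical file name
    X_b = df.drop(columns=['MEDV']).values     # 13 independent variables
    y_b = df['MEDV'].values                    # target variable
    reg = LinearRegression()
    reg.fit(X_b, y_b)
    print('R2:', R2(y_b, reg.predict(X_b)))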

