Linear Regression

Anshu Pandey

What is Linear Regression?

• A supervised learning algorithm that learns from a set of training samples.

• It estimates the relationship between a dependent variable (target/label) and one or more independent variables (predictors).
Linear Regression
• Univariate Linear Regression
• Multivariate Linear Regression
• Polynomial Linear Regression
Univariate Linear Regression
• During training, the regression line progressively fits the data better.
Housing Prices Prediction

Area (sq ft)   Price (INR)
1200           20,00,000
1800           42,00,000
3200           44,00,000
3800           25,00,000
4200           62,00,000

[Scatter plot: price in lakh (INR) vs. area in 1000 sq ft]
Housing Prices Prediction
y: dependent variable, criterion variable, or regressand (here, Price).
x: independent variable, predictor variable, or regressor (here, Area).

[Same table and scatter plot as above, with Area as x and Price as y]
Housing Prices Prediction: Linear Regression in One Variable

[Same data as above, now shown with a regression line fitted through the scatter plot]
Variables affecting Regression Equation

[Figure: housing-price scatter plot with fitted regression line, illustrating the variables of the regression equation]
Regression Equation:
h(x) = θ0 + θ1 · x

Parameters: θ0 (intercept) and θ1 (slope).

Cost Function (mean squared error):
J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (h(x(i)) - y(i))²

Goal: choose θ0 and θ1 to minimize J(θ0, θ1).
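The cost function in a minimal NumPy sketch (the function name and the toy data, taken from the housing slides above, are illustrative):

import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1): mean squared error over m training samples
    m = len(y)
    predictions = theta0 + theta1 * x        # h(x) for every sample
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([1.2, 1.8, 3.2, 3.8, 4.2])     # area in 1000 sq ft
y = np.array([20, 42, 44, 25, 62])          # price in lakh INR
print(cost(0.0, 10.0, x, y))                # cost of the candidate line h(x) = 10x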
Gradient Descent Algorithm

Repeat until convergence:
θj := θj - α · ∂J(θ0, θ1)/∂θj   (update j = 0 and j = 1 simultaneously)
where α is the learning rate.
Gradient Descent for the Linear Regression Model

Substituting the derivatives of J gives the per-iteration updates:
θ0 := θ0 - α · (1/m) · Σ (h(x(i)) - y(i))
θ1 := θ1 - α · (1/m) · Σ (h(x(i)) - y(i)) · x(i)
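A minimal NumPy sketch of these updates (batch gradient descent; the learning rate, iteration count, and toy housing data are illustrative):

import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=1000):
    # Fit h(x) = theta0 + theta1*x by repeatedly applying the updates above
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y        # h(x_i) - y_i for every sample
        theta0 -= alpha * np.sum(error) / m      # simultaneous update...
        theta1 -= alpha * np.sum(error * x) / m  # ...of both parameters
    return theta0, theta1

x = np.array([1.2, 1.8, 3.2, 3.8, 4.2])   # area in 1000 sq ft
y = np.array([20, 42, 44, 25, 62])        # price in lakh INR
theta0, theta1 = gradient_descent(x, y)
print("h(x) = %.2f + %.2f x" % (theta0, theta1))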
Univariate Linear Regression: Process Visualization
[Figure: the regression line improving over gradient-descent iterations]
Objective of Linear Regression
• Establish whether there is a relationship between two variables.
  Examples: housing prices and house area; number of hours of study and marks obtained; income and spending.
• Predict new possible values.
  Examples: predicting house prices in a particular month from house area; predicting likely marks from hours studied; forecasting sales for the next 3 months.
LINEAR REGRESSION USE CASES

Real Estate: model residential home prices as a function of the home's living area, bathrooms, number of bedrooms, and lot size.

Medicine: analyze the effect of a proposed radiation treatment on reducing tumor sizes, based on patient attributes such as age or weight.

Demand Forecasting: predict demand for goods and services; for example, restaurant chains can predict the quantity of food needed depending on the weather.

Marketing: predict a company's sales based on the previous month's sales and the company's stock prices.
Programming with Python

Simple Linear Regression


Import the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Import dataset

dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
Train test split

from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)
Simple Linear Regression

from sklearn import linear_model

alg = linear_model.LinearRegression()
alg.fit(xtrain, ytrain)
Predicting the test results

ypred = alg.predict(xtest)
Visualizing the training results
plt.scatter(xtrain, ytrain, color='g')
plt.plot(xtrain, alg.predict(xtrain), 'r')
plt.title("Training set")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()
Visualizing the test results
plt.scatter(xtest, ytest, color='g')
plt.plot(xtest, alg.predict(xtest), 'r')
plt.title("Test set")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()
Test Score (R² on test data)
accuracy = alg.score(xtest, ytest)   # for a regressor, score returns R², not classification accuracy
print(accuracy)
Coefficient and intercept value
# print the coefficient (slope)
print(alg.coef_)
# print the intercept value
print(alg.intercept_)
Performance Analysis
from sklearn.metrics import mean_squared_error, r2_score
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(ytest, ypred))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(ytest, ypred))
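RMSE, the square root of MSE, is a common companion metric because it is in the same units as the target; a minimal sketch reusing the import above:

import numpy as np
rmse = np.sqrt(mean_squared_error(ytest, ypred))
print("Root mean squared error: %.2f" % rmse)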

Programming with Python

Multivariate Linear Regression


One Hot Encoding
When some inputs are categories (e.g. gender) rather than numbers (e.g. age), we need to represent the category values as numbers so they can be used in our linear regression equations.
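Before the scikit-learn route on the later slides, a minimal pandas sketch of the same idea (pd.get_dummies creates one 0/1 column per category; the toy values follow the table on the next slide):

import pandas as pd

df = pd.DataFrame({'Salary': [192451, 118450, 258254],
                   'State': ['New York', 'California', 'California']})
print(pd.get_dummies(df, columns=['State']))   # adds State_California and State_New York columns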

Dummy Variables

Salary    Credit Score   Age   State        |  New York   California
192,451   485            42    New York     |  1          0
118,450   754            35    California   |  0          1
258,254   658            28    California   |  0          1
200,123   755            48    New York     |  1          0
152,485   654            52    California   |  0          1
Encoding Categorical Data
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
labelencoder = LabelEncoder()
# considering X is the dataset from the slide above
# 3 is the column index of State
X[:, 3] = labelencoder.fit_transform(X[:, 3])
# Note: categorical_features was deprecated and later removed from scikit-learn;
# see the ColumnTransformer sketch below for the modern equivalent
onehotencoder = OneHotEncoder(categorical_features=[3])
X = onehotencoder.fit_transform(X).toarray()
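A minimal sketch of the modern equivalent in recent scikit-learn releases, assuming as above that column 3 of X holds the state:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(
    [('state', OneHotEncoder(), [3])],   # one-hot encode column 3 only
    remainder='passthrough')             # pass the other columns through unchanged
X = ct.fit_transform(X)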
Avoiding the Dummy Variable Trap
• X = X[:, 1:]

• NOTE: if a categorical variable produces n dummy variables, remove one of them to avoid the dummy variable trap (perfect multicollinearity). The linear regression implementations in R and Python take care of this, but there is no harm in removing it ourselves.
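With pandas the trap can be avoided at encoding time; a minimal sketch using the drop_first flag:

import pandas as pd

df = pd.DataFrame({'State': ['New York', 'California', 'California']})
# drop_first=True emits n-1 dummy columns per variable, avoiding the trap
print(pd.get_dummies(df, columns=['State'], drop_first=True))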
Feature Scaling

Standardization: z = (x - μ) / σ (rescales each feature to zero mean and unit variance).
Normalization: x' = (x - min) / (max - min) (rescales each feature to the [0, 1] range).
Standard Scale using sklearn
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X)
# StandardScaler expects a 2-D array, so reshape the 1-D target first
y_std = sc_y.fit_transform(y.reshape(-1, 1))
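The slides above cover standardization; for the normalization branch, a minimal sketch with sklearn's MinMaxScaler, assuming the same feature matrix X:

from sklearn.preprocessing import MinMaxScaler
mm = MinMaxScaler()
X_norm = mm.fit_transform(X)   # each feature rescaled to the [0, 1] range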

Boston housing data
In [1]: boston = pd.read_csv('[Link]')
In [2]: print(boston.head())
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296.0
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242.0
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242.0
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222.0
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222.0

   PTRATIO       B  LSTAT  MEDV
0     15.3  396.90   4.98  24.0
1     17.8  396.90   9.14  21.6
2     17.8  392.83   4.03  34.7
3     18.7  394.63   2.94  33.4
4     18.7  396.90   5.33  36.2
Creating feature and target arrays
In [3]: X = boston.drop('MEDV', axis=1).values
In [4]: y = boston['MEDV'].values
Predicting house value from a single feature
In [5]: X_rooms = X[:, 5]
In [6]: type(X_rooms), type(y)
Out[6]: (numpy.ndarray, numpy.ndarray)
In [7]: y = y.reshape(-1, 1)
In [8]: X_rooms = X_rooms.reshape(-1, 1)
Plotting house value vs. number of rooms
In [9]: plt.scatter(X_rooms, y)
In [10]: plt.ylabel('Value of house /1000 ($)')
In [11]: plt.xlabel('Number of rooms')
In [12]: plt.show()
[Figure: scatter plot of house value vs. number of rooms]
Fitting a regression model
In [13]: from numpy import linspace
In [14]: from sklearn import linear_model
In [15]: alg = linear_model.LinearRegression()
In [16]: alg.fit(X_rooms, y)
In [17]: k = linspace(min(X_rooms), max(X_rooms)).reshape(-1, 1)
In [18]: plt.scatter(X_rooms, y, color='blue')
In [19]: plt.plot(k, alg.predict(k), 'b', linewidth=3)
In [20]: plt.show()
[Figure: fitted regression line over the house value vs. rooms scatter plot]
Linear regression on all features
In [1]: from sklearn.model_selection import train_test_split
In [2]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In [3]: alg2 = linear_model.LinearRegression()
In [4]: alg2.fit(X_train, y_train)
In [5]: y_pred = alg2.predict(X_test)
In [6]: alg2.score(X_test, y_test)
Out[6]: 0.71122600574849526
Cross Validation (5 folds)

Split 1: Fold 1 = test set, Folds 2-5 = training → Metric 1
Split 2: Fold 2 = test set, remaining folds = training → Metric 2
Split 3: Fold 3 = test set, remaining folds = training → Metric 3
Split 4: Fold 4 = test set, remaining folds = training → Metric 4
Split 5: Fold 5 = test set, remaining folds = training → Metric 5
Cross-validation and model performance

• 5 folds = 5-fold CV
• 10 folds = 10-fold CV
• k folds = k-fold CV
• More folds = more computationally expensive
Cross-validation in scikit-learn
In [1]: from sklearn.model_selection import cross_val_score
In [2]: alg = linear_model.LinearRegression()
In [3]: cv_results = cross_val_score(alg, X, y, cv=5)
In [4]: print(cv_results)
[ 0.63919994  0.71386698  0.58702344  0.07923081 -0.25294154]
In [5]: np.mean(cv_results)
Out[5]: 0.35327592439587058
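Changing the number of folds is just a parameter change (10-fold costs roughly twice as much as 5-fold); a minimal sketch with the same estimator:

In [6]: cv_results_10 = cross_val_score(alg, X, y, cv=10)
In [7]: print(np.mean(cv_results_10))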

Overfitting & Generalisation
• As we train our model to fit the training data more and more closely, it may become worse at handling new test data we feed to it later.
• This is known as "over-fitting" and results in increased generalization error.

• Large coefficients often lead to overfitting.

• Penalizing large coefficients: Regularization.
How do we minimize generalization error?
• Collect as much sample data as possible.
• Use a random subset of our sample data for training.
• Use the remaining sample data to test how well our model copes with data it was not trained on.
L1 Regularisation (Lasso)
(Least Absolute Shrinkage and Selection Operator)

• Having a large number of samples (n) relative to the number of dimensions (d) increases the quality of our model.
• One way to reduce the effective number of dimensions is to keep those that contribute most to the signal and ignore those that mostly act as noise.
• L1 regularization achieves this by adding a penalty that drives the weights of the noise dimensions to 0.
• L1 regularisation thus encourages a sparse weight vector in which few weights are non-zero and many are zero.
L1 Regularisation (Lasso)
• The LASSO adds the absolute values of the weights to the least-squares cost function: J(w) = Σᵢ (yᵢ - ŷᵢ)² + λ · Σⱼ |wⱼ|
• Depending on the regularization strength, certain weights become exactly zero, which also makes the LASSO useful as a supervised feature-selection technique.
• A limitation of the LASSO is that it selects at most n variables when the number of features m exceeds the number of samples n.
Lasso regression in scikit-learn
In [1]: from sklearn.linear_model import Lasso
In [2]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In [3]: lasso = Lasso(alpha=0.1, normalize=True)  # normalize was removed in scikit-learn 1.2; scale features beforehand instead
In [4]: lasso.fit(X_train, y_train)
In [5]: lasso_pred = lasso.predict(X_test)
In [6]: lasso.score(X_test, y_test)
Out[6]: 0.59502295353285506
L2 Regularisation (Ridge)
• Another way to reduce the complexity of our model and prevent overfitting to outliers is L2 regularization, also known as ridge regression.
• In L2 regularization we introduce an additional term to the cost function that penalizes large weights, shrinking them toward zero.
L2 Regularisation (Ridge)
• Ridge regression is an L2-penalized model in which we simply add the squared sum of the weights to our least-squares cost function: J(w) = Σᵢ (yᵢ - ŷᵢ)² + λ · Σⱼ wⱼ²
• Increasing the value of the hyperparameter λ increases the regularization strength and shrinks the weights of our model.
Ridge regression in scikit-learn
In [1]: from sklearn.linear_model import Ridge
In [2]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In [3]: ridge = Ridge(alpha=0.1, normalize=True)  # normalize was removed in scikit-learn 1.2; scale features beforehand instead
In [4]: ridge.fit(X_train, y_train)
In [5]: ridge_pred = ridge.predict(X_test)
In [6]: ridge.score(X_test, y_test)
Out[6]: 0.69969382751273179
L1 & L2 Regularisation (Elastic Net)
• L1 regularisation minimises the impact of dimensions that have low weights and are thus largely "noise".
• L2 regularisation minimises the impact of outliers in our training data.
• L1 & L2 regularisation can be used together; the combination is referred to as Elastic Net regularisation.
• Unlike ridge, which has a closed-form solution, the L1 penalty is not differentiable at zero, so we cannot solve for w directly; Lasso and Elastic Net weights are found by iterative optimisation such as coordinate descent. A minimal code sketch follows.
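The deck includes no Elastic Net code; a minimal scikit-learn sketch under the same train/test split used on the Lasso and Ridge slides (the alpha and l1_ratio values are illustrative):

from sklearn.linear_model import ElasticNet

# l1_ratio blends the penalties: 1.0 = pure Lasso, 0.0 = pure Ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)
print(enet.score(X_test, y_test))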
Lasso regression for feature selection

• Can be used to select important features of a dataset

• Shrinks the coefficients of less important features to exactly 0.

Lasso regression for feature selection
In [1]: from sklearn.linear_model import Lasso
In [2]: names = boston.drop('MEDV', axis=1).columns
In [3]: lasso = Lasso(alpha=0.1)
In [4]: lasso_coef = lasso.fit(X, y).coef_
In [5]: plt.plot(range(len(names)), lasso_coef)
In [6]: plt.xticks(range(len(names)), names, rotation=60)
In [7]: plt.ylabel('Coefficients')
In [8]: plt.show()
[Figure: Lasso coefficient values plotted for each feature]
Practice Datasets

[Link]

[Link]

[Link]

[Link]

For more information or to set up an appointment, please contact us today.
jointact@[Link]

