Experiment 1: Linear regression
Objectives:
1- Comprehend the concept of linear regression using the least square
method.
2- Understand the code in Python used for linear regression.
3- Run real-world applications in Python.
Introduction:
Least Square Method
The least squares method is the process of finding the regression line, or best-fitted
line, for a data set, described by an equation. The method works by minimizing the
sum of the squares of the residuals, the deviations of the data points from the curve
or line, so that the trend of the outcomes is determined quantitatively. Curve fitting
of this kind arises in regression analysis, and fitting equations to derive the curve
by minimizing squared residuals is the least squares method.
Least Square Method Definition
The least-squares method is a statistical method used to find the line of best fit, of
the form y = mx + b, for the given data. The curve of the equation is called the
regression line. Our main objective in this method is to make the sum of the squares
of the errors as small as possible; this is the reason the method is called the
least-squares method. It is often used in data fitting, where the best-fit result
minimizes the sum of squared errors, the differences between the observed values and
the corresponding fitted values. The sum of squared errors quantifies the variation
in the observed data. For example, given 4 data points, this method produces the
following graph.
Figure 1: A least-squares line fitted to four data points.
The two basic categories of least-squares problems are ordinary (linear) least
squares and nonlinear least squares.
Limitations for Least Square Method
Even though the least-squares method is considered the best method to find the
line of best fit, it has a few limitations. They are:
• This method exhibits only the relationship between the two variables.
All other causes and effects are not taken into consideration.
• This method is unreliable when data is not evenly distributed.
• This method is very sensitive to outliers, which can skew the results of
the least-squares analysis, as the short demonstration below shows.
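To see this outlier sensitivity concretely, below is a minimal sketch using NumPy's np.polyfit (its degree-1 fit is an ordinary least-squares line; this demonstration is an illustrative addition, not part of the experiment itself):
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])      # perfectly linear data: y = 2x
print(np.polyfit(x, y, 1))          # [slope, intercept] ≈ [2.0, 0.0]
y_outlier = y.copy()
y_outlier[4] = 30                   # corrupt a single point
print(np.polyfit(x, y_outlier, 1))  # ≈ [6.0, -8.0]: one outlier shifts the whole line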
Least Square Method Graph
Look at the graph below: the straight line shows the potential relationship
between the independent variable and the dependent variable. The ultimate goal
of this method is to minimize the difference between the observed responses and
the responses predicted by the regression line; smaller residuals mean a
better-fitting model. The fit is obtained by minimizing the residual of each data
point from the line. Residuals can be measured vertically or perpendicular to the
line: vertical residuals are mostly used for polynomial and hyperplane problems,
while perpendicular residuals are used in general curve fitting, as seen in the
image below.
Figure 2: Vertical and perpendicular residuals from a fitted line.
Least Square Method Formula
The least-squares method finds the curve that best fits a set of observations with a
minimum sum of squared residuals or errors. Let us assume that the given data points
are (x1, y1), (x2, y2), (x3, y3), …, (xn, yn), in which all x's are independent
variables and all y's are dependent ones. This method is used to find
a line of the form y = mx + b, where y and x are variables, m is the slope,
and b is the y-intercept. The formulas to calculate the slope m and the intercept b
are:
m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
b = (∑y − m∑x) / n
Here, n is the number of data points.
Following are the steps to calculate the least-squares fit using the above formulas.
• Step 1: Draw a table with 4 columns where the first two columns are for the x
and y points.
• Step 2: In the next two columns, find xy and x².
• Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
• Step 4: Find the value of the slope m using the above formula.
• Step 5: Calculate the value of b using the above formula.
• Step 6: Substitute the values of m and b into the equation y = mx + b.
Let us look at an example to understand this better.
Example: Let's say we have data as shown below.
x    1    2    3    4    5
y    2    5    3    8    7
Solution: We will follow the steps above to find the fitted line.
x         y         xy        x²
1         2         2         1
2         5         10        4
3         3         9         9
4         8         32        16
5         7         35        25
∑x = 15   ∑y = 25   ∑xy = 88  ∑x² = 55
Find the value of m by using the formula:
m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
m = [(5×88) − (15×25)] / [(5×55) − (15)²]
m = (440 − 375) / (275 − 225)
m = 65/50 = 1.3
Find the value of b by using the formula:
b = (∑y − m∑x) / n
b = (25 − 1.3×15) / 5
b = (25 − 19.5) / 5
b = 5.5/5 = 1.1
So, the required least-squares equation is y = mx + b = 1.3x + 1.1.
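These hand calculations are easy to verify with a few lines of Python (a quick check; np.polyfit is NumPy's built-in least-squares polynomial fit):
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 5, 3, 8, 7])
n = len(x)
# Closed-form formulas from above
m = (n*np.sum(x*y) - np.sum(x)*np.sum(y)) / (n*np.sum(x**2) - np.sum(x)**2)
b = (np.sum(y) - m*np.sum(x)) / n
print(m, b)                 # 1.3 1.1
# Cross-check with NumPy's built-in degree-1 fit
print(np.polyfit(x, y, 1))  # [1.3 1.1]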
Important Notes
• The least-squares method is used to predict the behavior of the
dependent variable with respect to the independent variable.
• The sum of the squares of the errors measures the unexplained variation of
the observed data around the fitted line.
• The main aim of the least-squares method is to minimize the sum of the
squared errors.
Implementing the least-squares method using Python:
# Linear Regression implementation using NumPy
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 2, 4, 4, 6, 6])
plt.scatter(x, y)
n = len(x)
xy = np.sum(x * y)       # ∑xy
sumx = np.sum(x)         # ∑x
sumy = np.sum(y)         # ∑y
sq_sumx = np.sum(x * x)  # ∑x²
# Note: in this snippet b is the slope and a is the intercept (y = a + b*x)
b = (n * xy - sumx * sumy) / (n * sq_sumx - sumx**2)
print('b = ', b)
a = (sumy - b * sumx) / n
print('a = ', a)
print('The linear regression equation is \ny =', a, '+', b, 'x')
slope = b
intercept = a
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.plot(x, mymodel)
plt.show()
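The same fit can also be cross-checked with scipy.stats.linregress (a sketch, assuming SciPy is installed; linregress additionally reports the correlation coefficient and p-value):
from scipy import stats
result = stats.linregress(x, y)        # x and y from the snippet above
print(result.slope, result.intercept)  # ≈ 0.914 and 0.8, matching b and a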
Implementing regression on a real-world data set:
This example models the relation between Salary and Years of Experience.
Equation: y = mx + c
This is the simple linear regression equation, where c is the constant (y-intercept)
and m is the slope; it describes the relationship between x (the independent
variable) and y (the dependent variable). The coefficient m can be positive or
negative, and it is the change in the dependent variable for every 1-unit change in
the independent variable; for example, if m = 5000, each additional year of
experience raises the predicted salary by 5000. In statistical notation the same
equation is written y = β0 + β1x, where β0 (the y-intercept) and β1 (the slope) are
the coefficients, estimated so that the predicted values match the actual values as
closely as possible.
Implement Simple Linear Regression in Python
In this example, we will use salary data on the experience of
employees. The dataset has two columns: YearsExperience and Salary.
Step 1: Import the required Python packages
We need Pandas for data manipulation, NumPy for numerical calculations,
and Matplotlib and Seaborn for visualizations. The scikit-learn (sklearn) library
is used for the machine learning operations.
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Step 2: Load the dataset
Download the Salary_Data.csv dataset, upload it to your notebook, and read it into
a pandas dataframe.
# Get dataset
df_sal = pd.read_csv('/content/Salary_Data.csv')
df_sal.head()
Step 3: Data analysis
Now that we have our data ready, let's analyze it and understand its trend in detail.
To do that, we first describe the data:
# Describe data
df_sal.describe()
Here, we can see that Salary ranges from 37731 to 122391, with a median of 65237.
We can also inspect how the data is distributed visually using the Seaborn distplot:
# Data distribution
plt.title('Salary Distribution Plot')
sns.distplot(df_sal['Salary'])
plt.show()
A distplot, or distribution plot, shows the variation in the data distribution.
It represents the data by combining a kernel density line with a histogram.
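In recent Seaborn releases, distplot is deprecated; if your installed version no longer provides it (an assumption about your environment), an equivalent plot is:
# Histogram with a kernel density estimate (modern replacement for distplot)
plt.title('Salary Distribution Plot')
sns.histplot(df_sal['Salary'], kde=True)
plt.show()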
Then we check the relationship between Salary and Experience:
# Relationship between Salary and Experience
plt.scatter(df_sal['YearsExperience'], df_sal['Salary'], color='lightcoral')
plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.box(False)
plt.show()
It is now clearly visible that our data varies linearly: an individual
receives more Salary as they gain Experience.
Step 4: Split the dataset into dependent/independent variables
Experience (X) is the independent variable
Salary (y) is dependent on experience
# Splitting variables
X = df_sal.iloc[:, :1] # independent
y = df_sal.iloc[:, 1:] # dependent
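Note that iloc[:, :1] keeps X two-dimensional (a DataFrame rather than a Series); scikit-learn estimators expect the feature matrix to have shape (n_samples, n_features).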
Step 5: Split data into Train/Test sets
Further, split your data into training (80%) and test (20%) sets
using train_test_split
# Splitting dataset into test/train
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size = 0.2, random_state = 0)
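Setting random_state = 0 fixes the shuffle seed, so the same 80/20 split is reproduced on every run.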
Step 6: Train the regression model
Pass the X_train and y_train data to regressor.fit to
train the model on our training data.
# Regressor model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
Step 7: Predict the result
Here comes the interesting part: we are all set to predict any value
of y (Salary) from X (Experience) with the trained model,
using regressor.predict.
# Prediction result
y_pred_test = regressor.predict(X_test)    # predicted values for X_test
y_pred_train = regressor.predict(X_train)  # predicted values for X_train
Step 8: Plot the training and test results
It's time to check our predicted results by plotting graphs.
• Plot training set data vs predictions
First, we plot the training set data (X_train, y_train) together with the line
through X_train and the predicted values of y_train (regressor.predict(X_train)).
# Prediction on training set
plt.scatter(X_train, y_train, color='lightcoral', label='X_train/y_train')
plt.plot(X_train, y_pred_train, color='firebrick', label='X_train/y_pred_train')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title='Sal/Exp', loc='best', facecolor='white')
plt.box(False)
plt.show()
• Plot test set data vs predictions
Secondly, we plot the test set data (X_test, y_test) together with the same
regression line obtained from the training data (X_train against
regressor.predict(X_train)).
# Prediction on test set
plt.scatter(X_test, y_test, color='lightcoral', label='X_test/y_test')
plt.plot(X_train, y_pred_train, color='firebrick', label='X_train/y_pred_train')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title='Sal/Exp', loc='best', facecolor='white')
plt.box(False)
plt.show()
We can see that, in both plots, the regression line fits the train and test data.
You can also plot the results with the predicted values of y_test
(regressor.predict(X_test)), but the regression line would remain the same, as
it is generated from the same linear equation fitted to the same
training data.
If you remember from the beginning of this experiment, we discussed the linear
equation y = mx + c; we can also get c (the y-intercept)
and m (the slope/coefficient) from the regressor model.
# Regressor coefficients and intercept
print(f'Coefficient: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')
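Beyond the coefficients, a quick way to quantify the fit is the R² score via regressor.score (a small addition to the walkthrough above; values closer to 1 indicate a better fit):
# R-squared of the trained model on the held-out test set
print(f'R^2 on test data: {regressor.score(X_test, y_test)}')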
The fully implemented code:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Get dataset (adjust the path to your local copy)
df_sal = pd.read_csv('D:/datasets/Salary_Data.csv')
df_sal.head()
# Describe data
df_sal.describe()
# Data distribution
plt.title('Salary Distribution Plot')
sns.distplot(df_sal['Salary'])
plt.show()
# Relationship between Salary and Experience
plt.scatter(df_sal['YearsExperience'], df_sal['Salary'], color = 'lightcoral')
plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.box(False)
plt.show()
# Splitting variables
X = df_sal.iloc[:, :1] # independent
y = df_sal.iloc[:, 1:] # dependent
# Splitting dataset into test/train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Regressor model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Prediction result
y_pred_test = regressor.predict(X_test) # predicted value of y_test
y_pred_train = regressor.predict(X_train) # predicted value of y_train
# Prediction on training set
plt.scatter(X_train, y_train, color='lightcoral', label='X_train/y_train')
plt.plot(X_train, y_pred_train, color='firebrick', label='X_train/y_pred_train')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title='Sal/Exp', loc='best', facecolor='white')
plt.box(False)
plt.show()
# Prediction on test set
plt.scatter(X_test, y_test, color='lightcoral', label='X_test/y_test')
plt.plot(X_train, y_pred_train, color='firebrick', label='X_train/y_pred_train')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(title='Sal/Exp', loc='best', facecolor='white')
plt.box(False)
plt.show()
# Regressor coefficients and intercept
print(f'Coefficient: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')
TASK:
Implement linear regression on one of the datasets below:
1. Cancer linear regression
2. CDC data: nutrition, physical activity, obesity
3. Fish market dataset for regression
4. Medical insurance costs
5. New York Stock Exchange dataset
6. OLS regression challenge
7. Real estate price prediction
8. Red wine quality
9. Vehicle dataset from CarDekho
10. WHO statistics on life expectancy