Module 2
Linear Regression
Logistic Regression
Classification
8/12/2023 1
Machine Learning Algorithms
Supervised Machine Learning
1. Regression
2. Classification
Unsupervised Machine Learning
1. Clustering
2. Association
Reinforcement Learning
Supervised Learning
Regression
Regression determines the relationship(s) between two
or more variables.
Given the following values of X and Y, what is the value
of Y when X = 5?
Set 1: (1,1), (2,2), (4,4), (50,50), (20,20)
Set 2: (1,1), (2,4), (4,16), (100,10000), (20,400)
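The question above can be checked mechanically. The sketch below is only an illustration: the two candidate rules (y = x and y = x^2) are assumptions made here, not something stated on the slide. It tests which rule reproduces every pair in each list and then evaluates that rule at X = 5.

```python
# The two lists of (X, Y) pairs from the slide.
first_list = [(1, 1), (2, 2), (4, 4), (50, 50), (20, 20)]
second_list = [(1, 1), (2, 4), (4, 16), (100, 10000), (20, 400)]

# Candidate relationships to try (assumed for this sketch).
candidates = {
    "y = x": lambda x: x,
    "y = x^2": lambda x: x ** 2,
}

def matching_rule(points):
    """Return the name of the first candidate that reproduces every point."""
    for name, rule in candidates.items():
        if all(rule(x) == y for x, y in points):
            return name
    return None

for label, points in (("first list", first_list), ("second list", second_list)):
    name = matching_rule(points)
    print(label, "->", name, "-> Y at X = 5:", candidates[name](5))
```

Under these assumptions the first list follows y = x (so Y = 5) and the second follows y = x^2 (so Y = 25).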
Regression
Regression analysis is a predictive modeling technique
that investigates the relationship between a
dependent (target) variable and one or more
independent (explanatory/predictor) variables.
Simple Linear Regression
Multiple Linear Regression
Simple Linear Regression
Finding the relationship between two variables
The idea is to find the line that best fits the data
Basic Terminologies
-Independent variable
-Dependent variable
-Regression line
-Error
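A small sketch may help tie the four terms together. The data points and the two candidate lines below are hypothetical, chosen only for illustration. The error of a point is its actual y minus the y predicted by the line, and "best fit" is taken here in the usual least-squares sense: the line with the smallest sum of squared errors.

```python
# Hypothetical data: x is the independent variable, y the dependent variable.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def errors(slope, intercept):
    """Error (residual) of each point: actual y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_squared_errors(slope, intercept):
    """Total squared error of a candidate regression line y = slope * x + intercept."""
    return sum(e ** 2 for e in errors(slope, intercept))

# Compare two candidate regression lines on the same data.
sse_a = sum_squared_errors(2.0, 0.0)  # candidate line y = 2x
sse_b = sum_squared_errors(1.0, 2.0)  # candidate line y = x + 2
print("SSE for y = 2x:    ", round(sse_a, 3))
print("SSE for y = x + 2: ", round(sse_b, 3))
```

On this made-up data the line y = 2x has the smaller total squared error, so it is the better fit of the two candidates.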
Simple Linear Regression
Problem Statement
Last year, five randomly selected students took a math
aptitude test before they began their statistics course.
The Statistics Department has three questions.
What linear regression equation best predicts statistics
performance, based on math aptitude scores?
If a student made an 80 on the aptitude test, what
grade would we expect her to make in statistics?
Apply linear regression to the following data and predict Y for X = 10
Sr no   TT1 (X)   TT2 (Y)
1       25        16
2       24        23
3       15        16
4       14        20
5       20        12
6       18        13
7       22        23
8       14        16
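One way to work this exercise is the closed-form least-squares fit, sketched below. The function name `fit_line` is introduced here for illustration. The slope is the sum of the products of the deviations of X and Y from their means, divided by the sum of squared deviations of X; the intercept then follows from the two means.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (Sxy) and sum of squared x deviations (Sxx).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# TT1 (X) and TT2 (Y) columns from the table above.
tt1 = [25, 24, 15, 14, 20, 18, 22, 14]
tt2 = [16, 23, 16, 20, 12, 13, 23, 16]

slope, intercept = fit_line(tt1, tt2)
prediction = slope * 10 + intercept
print(f"y = {slope:.4f} * x + {intercept:.4f}")
print(f"Predicted Y for X = 10: {prediction:.2f}")
```

With this data the fit comes out at roughly y = 0.25x + 12.56, which gives Y of about 15.09 at X = 10. Note that X = 10 lies outside the observed range of X (14 to 25), so the prediction is an extrapolation.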
Problem Statement
The rent of a property is related to its area. Given the
area in square feet and the rent in dollars, find the
relationship between area and rent using linear
regression. Also predict the rent for a property with an
area of 790 square feet.
Sr No   Area (sq ft)   Rent ($)
1       340            500
2       1080           1700
3       640            1100
4       880            800
5       990            1400
6       510            500
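The rent problem can be sketched with the summation form of the least-squares formulas, which is convenient for hand calculation because it only needs the column totals. The variable names are chosen here for illustration.

```python
# Area (sq ft) and rent ($) columns from the table above.
areas = [340, 1080, 640, 880, 990, 510]
rents = [500, 1700, 1100, 800, 1400, 500]

n = len(areas)
sum_x = sum(areas)
sum_y = sum(rents)
sum_xy = sum(x * y for x, y in zip(areas, rents))
sum_x2 = sum(x * x for x in areas)

# Least-squares estimates written in terms of the column totals.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

predicted_rent = slope * 790 + intercept
print(f"rent = {slope:.4f} * area + {intercept:.4f}")
print(f"Predicted rent for 790 sq ft: {predicted_rent:.2f}")
```

For this table the slope is about 1.46 dollars per square foot and the intercept about -82.03, so the predicted rent for 790 square feet is roughly 1073 dollars.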
Problem Statement
I want to buy a home with an area of 1000 sq ft. How much
will it cost me?
y = mx + c
x = Area (sq ft)
y = Total Cost (Rs)
For x = 1000, what is the value of y?
Linear Regression
Dataset: Real Estate
Total records: 92
Columns: 2
Fit a line
Area = 1000 -> TotalCost = ?
Implementation
Simple Linear Regression
Multiple Regression
Model Evaluation
MAE (Mean Absolute Error)
MSE (Mean Squared Error)
RMSE (Root Mean Squared Error)
MAE is the mean of the absolute differences between
actual and predicted values
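The three evaluation metrics can be computed directly from their standard definitions, as in the sketch below. The actual and predicted values are hypothetical numbers made up for this example. MSE averages the squared differences, and RMSE is the square root of MSE, which brings the error back to the units of the target.

```python
import math

# Hypothetical actual and predicted values, for illustration only.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

differences = [a - p for a, p in zip(actual, predicted)]
n = len(differences)

mae = sum(abs(d) for d in differences) / n   # mean absolute error
mse = sum(d ** 2 for d in differences) / n   # mean squared error
rmse = math.sqrt(mse)                        # root mean squared error

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

For these numbers MAE is 1.0, MSE is 1.5, and RMSE is about 1.225. Because MSE squares each difference, the single error of 2 contributes more to MSE and RMSE than to MAE.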
Model evaluation
Confusion Matrix
Accuracy
Recall
Precision
F-Score
Specificity
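The five measures above are all derived from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts and assumes that "F-Score" refers to the F1 score (the harmonic mean of precision and recall); the slide does not say which variant is meant.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp = 40  # true positives:  actual positive, predicted positive
fp = 10  # false positives: actual negative, predicted positive
fn = 5   # false negatives: actual positive, predicted negative
tn = 45  # true negatives:  actual negative, predicted negative

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
recall = tp / (tp + fn)                      # share of actual positives that were found
precision = tp / (tp + fp)                   # share of predicted positives that are correct
specificity = tn / (tn + fp)                 # share of actual negatives that were found
f1_score = 2 * precision * recall / (precision + recall)  # assumed meaning of F-Score

for name, value in [("Accuracy", accuracy), ("Recall", recall),
                    ("Precision", precision), ("F1", f1_score),
                    ("Specificity", specificity)]:
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.85, recall about 0.889, precision 0.8, F1 about 0.842, and specificity about 0.818.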
Logistic Regression
It is a statistical classification model
Deals with categorical dependent variables
Could be binary (dichotomous)
Could be multinomial
Takes both continuous and discrete input data
Define the variables
Plot the labelled data
Draw a regression line
Find the best fit using maximum likelihood estimation (MLE)
Define the variables
Independent variable:
Count of spam words
Most common spam words:
-Buy
-Get paid
-Guarantee
-Winner
-Unlimited
Dependent variable:
Probability of the email being spam (1 = spam, 0 = not spam)
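The independent variable can be computed with a simple counter, sketched below. The function name `count_spam_words` and the matching rules (case-insensitive, whole words or phrases only) are assumptions made for this example; the slide only lists the words.

```python
import re

# Spam words and phrases listed on the slide, lower-cased for matching.
SPAM_WORDS = ["buy", "get paid", "guarantee", "winner", "unlimited"]

def count_spam_words(email_text):
    """Count whole-word, case-insensitive occurrences of the listed spam words."""
    text = email_text.lower()
    total = 0
    for phrase in SPAM_WORDS:
        # \b anchors keep "buy" from matching inside a longer word such as "buyer".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, text))
    return total

print(count_spam_words("Buy now and GET PAID! Winner, winner."))
print(count_spam_words("Meeting moved to noon tomorrow."))
```

The first call counts four matches ("buy", "get paid", and "winner" twice) and the second counts none. This count is the x value fed to the logistic regression model.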
We need to find the regression curve that best fits the data
That would be our logistic curve
But how do we find the best fit?
How?
Convert the Y axis from 1 and 0 to log odds
Draw a random regression line
Use the sigmoid function to convert log odds to
probability
From this, find the probability, individual likelihood,
and log-likelihood
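The steps above can be sketched in a few lines. Everything numeric here is hypothetical: the spam-word counts, the labels, and the two candidate lines are made up for illustration. Each candidate line gives the log odds for a point, the sigmoid turns the log odds into a probability, the individual likelihood is the probability assigned to the observed label, and the log-likelihood sums the logs of those likelihoods. MLE prefers the line with the larger log-likelihood.

```python
import math

def sigmoid(log_odds):
    """Convert log odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def log_likelihood(intercept, slope, xs, labels):
    """Sum of log individual likelihoods for the line log_odds = intercept + slope * x."""
    total = 0.0
    for x, label in zip(xs, labels):
        p_spam = sigmoid(intercept + slope * x)
        # Individual likelihood: probability the model assigns to the observed label.
        likelihood = p_spam if label == 1 else 1.0 - p_spam
        total += math.log(likelihood)
    return total

# Hypothetical labelled data: spam-word count per email and its label (1 = spam).
spam_word_counts = [0, 1, 2, 4, 5, 7]
is_spam = [0, 0, 0, 1, 1, 1]

# Two candidate lines in log-odds space (hypothetical coefficients).
ll_flat = log_likelihood(0.0, 0.0, spam_word_counts, is_spam)
ll_sloped = log_likelihood(-3.0, 1.0, spam_word_counts, is_spam)
print(f"log-likelihood, flat line:   {ll_flat:.4f}")
print(f"log-likelihood, sloped line: {ll_sloped:.4f}")
```

The flat line predicts 0.5 for every email, so its log-likelihood is 6 times ln(0.5), about -4.16. The sloped line scores higher (closer to zero), so it is the better fit of the two. A full MLE procedure would keep adjusting the intercept and slope until the log-likelihood stops improving.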
What are the terms?
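The individual likelihood of a data point is the probability
the model assigns to its observed label: p for a spam
email (1) and 1 - p for a non-spam email (0)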
A = 0.084 and B = -0.207