Chapter 8 Logistic Regression

Logistic Regression is a binary classification algorithm used for predicting dichotomous outcomes, such as determining if an email is spam or if a loan application will be approved. It models the probability of an event occurring using the logistic function, which transforms probabilities to odds and then takes the logarithm to create a model that is unbounded. The parameters of the model are estimated using maximum likelihood estimation, which maximizes the probability of the observed data.

Logistic Regression

“Machine Learning”
by Anuradha Srinivasaraghavan & Vincy Joseph
Copyright © 2019 Wiley India Pvt. Ltd. All rights reserved.
• In linear regression, the response Y is continuous.
• If the output is discrete, it is a classification problem, e.g. predicting whether a person is male or female based on height.
• Logistic Regression is a binary classification algorithm used when the response variable is dichotomous (1 or 0).
• Output: a random variable Yi that takes values 1 and 0 with probabilities pi and 1 − pi, respectively.
• Let p denote the probability that Y = 1 when X = x.
• For a linear model to describe p, the model for the probability would be

p = Pr(Y = 1 | X = x) = β0 + β1x

• Since p is a probability, it must lie between 0 and 1.
• The linear function is unbounded, and hence cannot be used to model a probability.
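The boundedness problem can be seen numerically. The sketch below (with made-up coefficients β0 = −2, β1 = 0.05, chosen purely for illustration) evaluates the linear model over a range of x; its output escapes the [0, 1] interval on both sides:

```python
# Illustration: a linear "probability" model is unbounded.
# The coefficients below are made up for demonstration only.
b0, b1 = -2.0, 0.05

for x in [0, 20, 40, 60, 80, 100]:
    p = b0 + b1 * x  # "probability" under the linear model
    print(x, p)      # values fall below 0 and rise above 1
```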
• Spam Detection: predicting whether an email is spam or not
• Credit Card Fraud: predicting whether a given credit card transaction is fraudulent or not
• Health: predicting whether a given mass of tissue is benign or malignant
• Marketing: predicting whether a given user will buy an insurance product or not
• Banking: predicting whether a customer will default on a loan
• The odds of an event is the ratio of the expected number of times that the event will occur to the expected number of times it will not occur.
• If p is the probability of an event and O is the odds of the event, then

O = p / (1 − p)

• Unlike probabilities, there is no upper bound on the odds.
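A quick sketch of the probability-to-odds mapping (plain Python, assuming nothing beyond the formula above):

```python
def odds(p):
    """Odds of an event with probability p: O = p / (1 - p)."""
    return p / (1.0 - p)

# Probabilities are bounded by 1, but the odds grow without bound as p -> 1.
print(odds(0.5))   # 1.0 (even odds)
print(odds(0.8))   # ~4.0
print(odds(0.99))  # ~99
```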
• Transforming the probability to odds removes the upper bound.
• If we then take the logarithm of the odds, we also remove the lower bound.
• Thus, we get the logistic (logit) model as

ln(p / (1 − p)) = β0 + β1x

• Solving for p gives back a value between 0 and 1:

p = e^(β0 + β1x) / (1 + e^(β0 + β1x))

• This function is the Logistic Regression Function.
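A minimal sketch of the logistic regression function, which maps the unbounded linear term back to a probability; the coefficient values passed in below are assumptions for illustration only:

```python
import math

def logistic(x, b0, b1):
    """Logistic regression function: p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Unlike the linear model, the output always lies strictly between 0 and 1.
for x in [-100, 0, 100]:
    print(x, logistic(x, b0=0.0, b1=0.1))
```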
• In linear regression we used the method of least squares to estimate the regression coefficients.
• In logistic regression we use another approach, called maximum likelihood estimation.
• The maximum likelihood estimate of a parameter is the value that maximizes the probability of the observed data.
• Let us consider the example of predicting whether a home loan application will be approved based on the credit score of the applicant.

• For a continuous independent variable, the odds ratio is the factor by which the odds change when the variable increases by one unit:

OR = odds(x + 1) / odds(x) = e^β1

• For the loan example, Odds Ratio = 3.337/3.289 = 1.01
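For a continuous predictor, the odds ratio for a one-unit increase works out to e^β1, the same at every x. The sketch below checks this numerically; the coefficients are made up for illustration and are not the ones behind the 3.337/3.289 figures above:

```python
import math

def odds(x, b0, b1):
    """Odds of the event at predictor value x under the logistic model:
    p / (1 - p) = e^(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

# Hypothetical coefficients for illustration only.
b0, b1 = -5.0, 0.0145

# The ratio of odds at x+1 to odds at x is the same at every x: e^b1.
for x in [600, 700, 800]:
    print(x, odds(x + 1, b0, b1) / odds(x, b0, b1))  # ~e^0.0145 ~ 1.0146
```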
• Maximum likelihood estimation is used to estimate the regression coefficients.
• The maximum likelihood estimate is the value that maximizes the probability of the observed data.
• The likelihood function is the probability that the observed values of the dependent variable may be predicted from the observed values of the independent variables.
• The likelihood varies from 0 to 1.
• It is easier to work with the logarithm of the likelihood function, known as the log-likelihood.
• In logistic regression, we observe a binary outcome.
• Suppose that in a population each individual has the same probability p that an event occurs.
• For a sample of size n, Yi = 1 indicates that the event occurs for the ith subject; otherwise Yi = 0.
• The observed data are Y1, . . . , Yn and X1, . . . , Xn.
• The joint probability of the data (the likelihood) is given by

L = ∏ pi^Yi (1 − pi)^(1 − Yi)

• The natural logarithm of the likelihood is

ln L = Σ [Yi ln pi + (1 − Yi) ln(1 − pi)]

• in which

pi = e^(α + βxi) / (1 + e^(α + βxi))
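The log-likelihood above can be computed directly. This sketch evaluates it on a tiny made-up data set (the values and coefficients are assumptions for illustration):

```python
import math

def log_likelihood(xs, ys, a, b):
    """ln L = sum_i [ y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i) ],
    with p_i = e^(a + b*x_i) / (1 + e^(a + b*x_i))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = math.exp(a + b * x) / (1.0 + math.exp(a + b * x))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Tiny made-up sample: the event tends to occur at larger x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]

# A slope that matches the data pattern yields a higher log-likelihood
# than coefficients of zero (which assign p = 0.5 to every observation).
print(log_likelihood(xs, ys, a=-2.5, b=1.0))
print(log_likelihood(xs, ys, a=0.0, b=0.0))  # 4*ln(0.5) ~ -2.77
```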
• Estimating the parameters α and β is done by taking the first derivatives of the log-likelihood and solving them for α and β.
• Since these equations have no closed-form solution, iterative computing is used.
• An arbitrary value for the coefficients (usually 0) is chosen first.
• The log-likelihood is then computed and the variation of the coefficient values is observed.
• Iteration is repeated until L is maximized.
• The results are the maximum likelihood estimates of α and β.
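The slides do not prescribe a particular update rule for the iteration; one simple choice is gradient ascent on the log-likelihood. In the sketch below, the data set, learning rate, and iteration count are all assumptions for illustration:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up data set (for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

# Start the coefficients at 0 (as the slides suggest) and repeatedly step
# in the direction of the log-likelihood gradient:
#   d(lnL)/d(alpha) = sum(y_i - p_i)
#   d(lnL)/d(beta)  = sum((y_i - p_i) * x_i)
a, b = 0.0, 0.0
lr = 0.01                      # small step size keeps the ascent stable
for _ in range(50_000):
    grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
    grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
    a += lr * grad_a
    b += lr * grad_b

# The fitted slope is positive: the event becomes more likely as x grows,
# and the predicted probabilities rise with x, roughly tracking the labels.
print(a, b)
print([round(sigmoid(a + b * x), 2) for x in xs])
```

In practice the maximization is usually done with Newton-type methods (iteratively reweighted least squares), which converge in far fewer steps; plain gradient ascent is shown here only because it makes the "compute, observe, reiterate" loop explicit.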
