
Logistic Regression

1
Probabilistic classification
• Most real-life prediction scenarios with discrete
outputs, such as Yes/No, are probabilistic:
• Will you carry an umbrella if it is raining?
• Will you carry an umbrella if it is sunny?
• Will you carry an umbrella if it drizzles?
• Logistic Regression gives the probability of an
event occurring, given historical data used to train
and test the model

2
[Figure: predicted probability of carrying an umbrella (Y1 axis, prediction) plotted against the probability of rain (X axis), with the historical Carry-Umbrella YES/NO observations shown on a second axis (Y2).]


Decide a Threshold to Classify: Decision Boundary

[Figure: the same plot with candidate thresholds P(Y=1|X) = 0.3, 0.5 and 0.8 drawn as horizontal decision boundaries between the NO and YES regions; historical data on the Y2 axis, prediction on the Y1 axis.]
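As a minimal sketch of this thresholding step (the example probability 0.6 and the three candidate thresholds are illustrative, taken from the figure rather than prescribed by the slides):

```python
def classify(prob_yes, threshold=0.5):
    """Turn a predicted probability P(Y=1|X) into a YES/NO decision."""
    return "YES" if prob_yes >= threshold else "NO"

# Trying each candidate threshold from the figure on one prediction
for threshold in (0.3, 0.5, 0.8):
    print(threshold, classify(0.6, threshold))
```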


Logistic Regression

• Model: P(y=1 | x, w) = 1 / (1 + exp(-(w0 + Σi wi xi)))
• w are the adjustable weight parameters
• The function σ(z) = 1 / (1 + exp(-z)) is the Sigmoid function

5
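A minimal Python sketch of this model; the one-feature example and the weights w = [6.0], w0 = -3.0 are illustrative, not from the slides:

```python
import math

def sigmoid(z):
    """Squash any real number z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, w0):
    """P(y=1 | x, w) = sigmoid(w0 + sum_i w_i * x_i)."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))

# Illustrative one-feature example (probability of rain as the single input)
print(predict_proba(x=[0.9], w=[6.0], w0=-3.0))  # ~0.92: likely to carry an umbrella
```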
A Comparison

Linear Regression:
• Predicted value not limited between 0 and 1
• Predicted and actual outputs have the same units
• Constant slope = w1
• Used for regression

Logistic Regression:
• Predicted output is a probability (between 0 and 1)
• Predicted output is unitless
• Slope varies from 0 to a maximum at the centre
• Used typically for classification
Midpoint & Slope

7
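A quick check of the midpoint-and-slope claim, using the standard sigmoid derivative:

```latex
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr),
\qquad
\max_z \sigma'(z) = \sigma'(0) = \tfrac{1}{4},
```

so the slope is steepest at the midpoint z = 0, where σ(z) = 1/2, and flattens toward zero in both tails.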
Performance Tallies

                 Actual +ve    Actual -ve
Predicted +ve    True +ve      False +ve
Predicted -ve    False -ve     True -ve

8
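A small sketch of how these four tallies are counted from 0/1 labels and predictions (the five-example input is illustrative):

```python
def confusion_tallies(y_true, y_pred):
    """Count the four cells of the 2x2 confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

print(confusion_tallies([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```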
Log odds or Logit
• Assume there are two classes, y = 0 and y = 1, and
  P(y=1 | x, w) = 1 / (1 + exp(-(w0 + Σi wi xi)))

• Odds: P(y=1 | x) / P(y=0 | x)

• Log Odds: ln [ P(y=1 | x) / P(y=0 | x) ] = w0 + Σi wi xi

• That is, the log odds of class 1 w.r.t. class 0 is a linear function
of x

9
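The one-step algebra behind that claim, writing z = w0 + Σi wi xi:

```latex
\frac{P(y=1 \mid x)}{P(y=0 \mid x)}
  = \frac{1/(1+e^{-z})}{\,e^{-z}/(1+e^{-z})\,}
  = e^{z}
\quad\Longrightarrow\quad
\ln \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = z = w_0 + \sum_i w_i x_i .
```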
Model Fitting
Let p1 be P(y=1|x,w)

Sequence n: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Actual Data y: 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1
Prediction p: p1 p1 p1 1-p1 1-p1 1-p1 p1 p1 p1 p1 p1 1-p1 1-p1 1-p1 1-p1 p1 p1 p1

10
Likelihood of a match?
Let p1 be P(y=1|x,w)

Sequence n: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Actual Data y: 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1
Prediction p: p1 p1 p1 1-p1 1-p1 1-p1 p1 p1 p1 p1 p1 1-p1 1-p1 1-p1 1-p1 p1 p1 p1

Likelihood of a match? Note yn can be either 1 or 0, so each
prediction in the row above can be written as p1^yn (1 - p1)^(1 - yn):

  L = Πn p1^yn (1 - p1)^(1 - yn)

Log Likelihood:

  ln L = Σn [ yn ln p1 + (1 - yn) ln(1 - p1) ]

11
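A quick numerical sketch of this likelihood for the 18-point sequence above; the grid scan, and the observation that the maximizer is 11/18 (the fraction of 1s), follow from this simplified single-probability setup rather than from the slides:

```python
import math

y = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]  # actual data from the slide

def log_likelihood(p1, y):
    """ln L = sum_n [ y_n ln p1 + (1 - y_n) ln(1 - p1) ]."""
    return sum(yn * math.log(p1) + (1 - yn) * math.log(1 - p1) for yn in y)

# Scan candidate values of p1: the maximum lands at ~0.61, i.e. 11/18,
# the empirical fraction of 1s in the sequence.
best = max((p / 100 for p in range(1, 100)), key=lambda p1: log_likelihood(p1, y))
print(best, log_likelihood(best, y))
```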
Training
• Maximum Likelihood Estimation (MLE):

  w_MLE = argmax_w Πl P(yl | xl, w)

• Note:
  • Here xl and yl are pre-determined from the training data.
  • The intercept w0 and the coefficients wi are calculated so as to
    maximize this probability.
• So, how many w should we try out?

12
Computing the Log-Likelihood

• We can re-express the log of the conditional likelihood as:

  l(w) = Σl [ yl (w0 + Σi wi xil) - ln(1 + exp(w0 + Σi wi xil)) ]

• Need to maximize l(w)

13
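The intermediate algebra, starting from the Bernoulli log-likelihood with p_l = P(y_l = 1 | x_l, w) and z_l = w_0 + Σ_i w_i x_i^l:

```latex
l(w) = \sum_l \bigl[ y_l \ln p_l + (1 - y_l)\ln(1 - p_l) \bigr]
     = \sum_l \Bigl[ y_l \ln \frac{p_l}{1 - p_l} + \ln(1 - p_l) \Bigr]
     = \sum_l \bigl[ y_l z_l - \ln(1 + e^{z_l}) \bigr],
```

using the logit identity ln(p_l / (1 - p_l)) = z_l and 1 - p_l = 1 / (1 + e^{z_l}).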
Fitting LogR by Gradient Ascent
• Unfortunately, there is no closed form solution to maximizing l(w)
with respect to w. Therefore, one common approach is to use
gradient ascent
• The i-th component of the gradient vector has the form

  ∂l(w)/∂wi = Σl xil ( yl - P(yl = 1 | xl, w) )

  where xil is the i-th feature of the l-th training example
14
Fitting LogR by Gradient Ascent
• Use standard gradient ascent to optimize w. Begin
with initial weights = zero, then repeatedly apply the update

  wi ← wi + η Σl xil ( yl - P(yl = 1 | xl, w) )

  where η is a small constant step size (the learning rate)

15
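A compact sketch of this training loop; the step size, iteration count, and toy data are illustrative choices, not from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    """Gradient ascent on the conditional log-likelihood l(w).

    X: list of feature vectors, y: list of 0/1 labels.
    Returns (w0, w): intercept and coefficients, initialized to zero.
    """
    d = len(X[0])
    w0, w = 0.0, [0.0] * d
    for _ in range(n_iters):
        # error_l = y_l - P(y_l = 1 | x_l, w) for each training example
        errors = [yl - sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, xl)))
                  for xl, yl in zip(X, y)]
        # Ascent step: w_i <- w_i + eta * sum_l x_i^l * error_l
        w0 += eta * sum(errors)
        for i in range(d):
            w[i] += eta * sum(xl[i] * e for xl, e in zip(X, errors))
    return w0, w

# Toy 1-D example: label is 1 when the feature is large
X = [[0.1], [0.4], [0.5], [0.6], [0.9]]
y = [0, 0, 1, 1, 1]
print(fit_logistic(X, y))
```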
Regularization in Logistic Regression
• Overfitting the training data is a problem that can arise
in Logistic Regression, especially when the data has very
high dimensions and is sparse.

• One approach to reducing overfitting is regularization,
in which we create a modified “penalized log likelihood
function,” which penalizes large values of w:

  w ← argmax_w Σl ln P(yl | xl, w) - (λ/2) Σi wi²

16
Regularization in Logistic Regression
• The derivative of this penalized log likelihood function is similar to our
earlier derivative, with one additional penalty term:

  ∂/∂wi [ l(w) - (λ/2) Σi wi² ] = Σl xil ( yl - P(yl = 1 | xl, w) ) - λ wi

• which gives us the modified gradient ascent rule

  wi ← wi + η Σl xil ( yl - P(yl = 1 | xl, w) ) - η λ wi
17
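The change to the earlier training sketch is one extra shrinkage term per weight; λ (lam here) is an illustrative hyperparameter, and leaving the intercept unpenalized is a common convention not discussed on the slides. This reuses sigmoid from the earlier sketch:

```python
def fit_logistic_l2(X, y, eta=0.1, lam=0.1, n_iters=1000):
    """Gradient ascent on the L2-penalized objective l(w) - (lam/2) * ||w||^2."""
    d = len(X[0])
    w0, w = 0.0, [0.0] * d
    for _ in range(n_iters):
        errors = [yl - sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, xl)))
                  for xl, yl in zip(X, y)]
        w0 += eta * sum(errors)  # intercept conventionally left unpenalized
        for i in range(d):
            # The extra "- lam * w[i]" term shrinks large weights toward zero
            w[i] += eta * (sum(xl[i] * e for xl, e in zip(X, errors)) - lam * w[i])
    return w0, w
```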
Summary of Logistic Regression
• Learns the Conditional Probability Distribution P(y|x)
• Local Search.
• Begins with initial weight vector.
• Modifies it iteratively to maximize an objective function.
• The objective function is the conditional log likelihood of the data – so the
algorithm seeks the probability distribution P(y|x) that is most likely given the
data.

18
What you should know LogR
• In general, Naïve Bayes (NB) and Logistic Regression (LR) make different assumptions
• NB: features are independent given the class -> an assumption on P(X|Y)
• LR: assumes a functional form of P(Y|X), with no assumption on P(X|Y)
• LogR can be used as a linear classifier
• the decision rule is a hyperplane
• LogR is optimized by maximizing the conditional likelihood
• no closed-form solution
• the objective is concave -> gradient ascent finds the global optimum

19
