
CS 60050

Machine Learning

Classification: Logistic Regression

Some slides taken from course materials of Andrew Ng


Classification

Email: Spam / Not Spam?


Online Transactions: Fraudulent / Genuine?
Tumor: Malignant / Benign?

y ∈ {0, 1}
0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)

[Plot: Malignant? (1 = Yes, 0 = No) vs. Tumor Size, with training examples marked along the tumor-size axis]

Can we solve the problem using linear regression? E.g., fit a straight line h𝛩(x) = 𝛩ᵀx and define a threshold at 0.5.

Threshold classifier output at 0.5:
If h𝛩(x) ≥ 0.5, predict "y = 1"
If h𝛩(x) < 0.5, predict "y = 0"

This fails when a new point is added (e.g., a tumor much larger than the rest): the refitted line shifts, and the 0.5 threshold misclassifies examples it previously got right.
Classification: y = 0 or 1

Another drawback of using linear regression for this problem: h𝛩(x) can be > 1 or < 0.

What we need: a hypothesis with 0 ≤ h𝛩(x) ≤ 1

Logistic Regression: 0 ≤ h𝛩(x) ≤ 1
Logistic Regression Model

Want 0 ≤ h𝛩(x) ≤ 1.

h𝛩(x) = g(𝛩ᵀx), where g(z) = 1 / (1 + e⁻ᶻ)

g is called the sigmoid function (or logistic function); it squashes any real number into the interval (0, 1), approaching 0 as z → −∞ and 1 as z → +∞.

A useful property: the derivative is easy to compute at any point, since g′(z) = g(z)(1 − g(z)).
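A minimal NumPy sketch of the sigmoid and the derivative property above (illustrative code, not from the slides):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # The useful property: g'(z) = g(z) * (1 - g(z))
    g = sigmoid(z)
    return g * (1.0 - g)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25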
Interpretation of Hypothesis Output

h𝛩(x) = estimated probability that y = 1 on input x

h𝛩(x) = P(y = 1 | x; 𝛩), the "probability that y = 1, given x, parameterized by 𝛩"

Example: If x = (x0, x1) = (1, tumorSize) and h𝛩(x) = 0.7:
Tell the patient that there is a 70% chance of the tumor being malignant.
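A quick numeric illustration of reading the output (the parameter values here are made up purely for the example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-20.0, 1.0])   # hypothetical parameter values
x = np.array([1.0, 20.85])       # x0 = 1 (intercept term), x1 = tumor size
p = sigmoid(theta @ x)           # estimated P(y = 1 | x; theta)
print(f"{p:.2f}")                # 0.70 -> "70% chance the tumor is malignant"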

Logistic regression: h𝛩(x) = g(𝛩ᵀx)

Suppose we predict "y = 1" if h𝛩(x) ≥ 0.5, which holds exactly when 𝛩ᵀx ≥ 0,
and predict "y = 0" if h𝛩(x) < 0.5, which holds exactly when 𝛩ᵀx < 0.

(This works because g(z) ≥ 0.5 if and only if z ≥ 0.)
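In code, the thresholding reduces to checking the sign of 𝛩ᵀx, since the sigmoid never has to be evaluated (a sketch, not from the slides):

import numpy as np

def predict(theta, X):
    # Predict y = 1 iff h(x) = g(theta^T x) >= 0.5, i.e. iff theta^T x >= 0.
    # X rows are feature vectors with a leading 1 for the intercept.
    return (X @ theta >= 0).astype(int)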
Separating two classes of points
• We are attempting to separate two given sets / classes of points
• Separate two regions of the feature space
• Concept of Decision Boundary
• Finding a good decision boundary => learn appropriate values for the parameters 𝛩
Decision Boundary

[Plot: two classes of points in the (x1, x2) plane, with axis ticks at 1, 2, 3, separated by the line x1 + x2 = 3]

Example: h𝛩(x) = g(𝛩0 + 𝛩1·x1 + 𝛩2·x2); with 𝛩 = (−3, 1, 1) (the values used in Ng's example), predict "y = 1" if −3 + x1 + x2 ≥ 0, i.e., if x1 + x2 ≥ 3. The line x1 + x2 = 3 is the decision boundary.

How to get the parameter values – will be discussed soon.
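A sketch of this boundary in code, assuming the example values 𝛩 = (−3, 1, 1) above:

import numpy as np

theta = np.array([-3.0, 1.0, 1.0])   # assumed example values: boundary x1 + x2 = 3

def predict(theta, X):
    # X rows are [1, x1, x2]; predict y = 1 when theta^T x >= 0
    return (X @ theta >= 0).astype(int)

X = np.array([[1.0, 1.0, 1.0],   # x1 + x2 = 2 -> below the line -> y = 0
              [1.0, 3.0, 2.0]])  # x1 + x2 = 5 -> above the line -> y = 1
print(predict(theta, X))         # [0 1]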
Non-linear decision boundaries

[Plot: two classes of points in the (x1, x2) plane, with axis ticks at −1 and 1, separated by the unit circle]

We can learn more complex decision boundaries where the hypothesis function contains higher-order terms (remember polynomial regression).
Example: h𝛩(x) = g(𝛩0 + 𝛩1·x1 + 𝛩2·x2 + 𝛩3·x1² + 𝛩4·x2²); with 𝛩 = (−1, 0, 0, 1, 1) (the values used in Ng's example), predict "y = 1" if −1 + x1² + x2² ≥ 0, i.e., if x1² + x2² ≥ 1. The decision boundary is the unit circle x1² + x2² = 1.

How to get the parameter values – will be discussed soon.
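The same idea with the polynomial features, assuming the example values 𝛩 = (−1, 0, 0, 1, 1):

import numpy as np

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # assumed example values

def predict_circle(theta, x1, x2):
    # Features: [1, x1, x2, x1^2, x2^2]; predict y = 1 when theta^T x >= 0,
    # i.e. when x1^2 + x2^2 >= 1 (on or outside the unit circle)
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return int(features @ theta >= 0)

print(predict_circle(theta, 0.5, 0.5))  # 0 (inside the circle)
print(predict_circle(theta, 1.0, 1.0))  # 1 (outside the circle)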
Cost function for Logistic Regression

How to get the parameter values?

Training set: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)} — m examples, with y ∈ {0, 1}

h𝛩(x) = 1 / (1 + e^(−𝛩ᵀx))

How to choose the parameters 𝛩?
Cost function

Linear regression used the squared-error cost function:

J(𝛩) = (1/m) Σᵢ₌₁..ₘ (1/2) (h𝛩(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

However, this cost function is non-convex for the hypothesis of logistic regression: with the sigmoid inside the square, J(𝛩) has many local optima, so gradient descent is not guaranteed to reach the global minimum.
Logistic regression cost function

Cost(h𝛩(x), y) = −log(h𝛩(x))     if y = 1
Cost(h𝛩(x), y) = −log(1 − h𝛩(x)) if y = 0

Intuition: if y = 1 and h𝛩(x) → 1, the cost → 0; but if y = 1 and h𝛩(x) → 0, the cost → ∞, so a confident wrong prediction is penalized very heavily (and symmetrically for y = 0).

Since y is always 0 or 1, the two cases can be combined into a single expression:

Cost(h𝛩(x), y) = −y log(h𝛩(x)) − (1 − y) log(1 − h𝛩(x))

J(𝛩) = (1/m) Σᵢ₌₁..ₘ Cost(h𝛩(x⁽ⁱ⁾), y⁽ⁱ⁾)

This cost function is convex.

To fit parameters 𝛩: min𝛩 J(𝛩)
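A NumPy sketch of J(𝛩) as defined above (the clipping epsilon is an implementation detail added here to avoid log(0), not part of the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    # J(theta) = (1/m) * sum( -y*log(h) - (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1.0 - eps)  # guard against log(0)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m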
Gradient Descent

Want min𝛩 J(𝛩):
Repeat {
  𝛩ⱼ := 𝛩ⱼ − α · ∂J(𝛩)/∂𝛩ⱼ
}
(simultaneously update all 𝛩ⱼ)

Working out the partial derivative gives:
Repeat {
  𝛩ⱼ := 𝛩ⱼ − (α/m) Σᵢ₌₁..ₘ (h𝛩(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
}

The algorithm looks identical to linear regression, but the hypothesis function h𝛩(x) is different for logistic regression.
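A minimal gradient-descent loop for the update above (a sketch; the learning rate, iteration count, and the toy data are arbitrary choices for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    # X: (m, n) design matrix with a leading column of ones; y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m   # (1/m) * sum over i of (h(x_i) - y_i) * x_ij
        theta -= alpha * grad      # simultaneous update of all theta_j
    return theta

# Toy usage: 1D inputs with an intercept column
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta).round(2))  # estimated probabilities increase with x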
Thus we can use gradient descent to learn the parameter values, and hence compute h𝛩(x) for a new input x.

To make a prediction given a new x, output:

h𝛩(x) = 1 / (1 + e^(−𝛩ᵀx)) = estimated probability that y = 1 on input x
How to use the estimated probability?
• Refraining from classifying unless confident (see the sketch after this list)
• Ranking items
• Multi-class classification
Multi-class classification: one-vs-all

Examples of multiclass classification:
News article tagging: Politics, Sports, Movies, Religion, …
Medical diagnosis: Not ill, Cold, Flu, Fever
Weather: Sunny, Cloudy, Rain, Snow
[Plots: binary classification (two classes) vs. multi-class classification (three classes) of points in the (x1, x2) plane]
One-vs-all (one-vs-rest):

[Plots: the three-class data set is split into three binary problems, one per class, where that class is treated as positive and the other two as negative]

Class 1: h𝛩⁽¹⁾(x)
Class 2: h𝛩⁽²⁾(x)
Class 3: h𝛩⁽³⁾(x)

h𝛩⁽ⁱ⁾(x) = P(y = i | x; 𝛩), for i = 1, 2, 3
One-vs-all

Train a logistic regression classifier h𝛩⁽ⁱ⁾(x) for each class i to predict the probability that y = i.

On a new input x, to make a prediction, pick the class i that maximizes h𝛩⁽ⁱ⁾(x).
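A one-vs-all sketch that reuses a binary trainer such as the gradient_descent function above (the helper names are assumptions for illustration, not a definitive implementation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_train(X, y, classes, train_binary):
    # Train one binary classifier per class: class c vs. the rest.
    # train_binary(X, y_binary) -> theta, e.g. the gradient_descent sketch above.
    return {c: train_binary(X, (y == c).astype(float)) for c in classes}

def one_vs_all_predict(X, thetas):
    # For each input, pick the class i that maximizes h_i(x)
    classes = list(thetas)
    scores = np.column_stack([sigmoid(X @ thetas[c]) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]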
Advanced Optimization algorithms (not part of this course)

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages of the latter three:
- No need to manually pick the learning rate α
- Often converge faster than gradient descent

Disadvantages:
- More complex
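As an aside, such algorithms are readily available in libraries; a sketch using SciPy's BFGS with the cost function from earlier (no learning rate needs to be chosen; the toy data are made up):

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / len(y)

# Toy, non-separable data so the optimum is finite
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
res = minimize(cost, x0=np.zeros(X.shape[1]), args=(X, y), method="BFGS")
print(res.x)  # fitted parameters, found without hand-tuning a step size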
