Logistic Regression
“Classification”
Dr. Mohanad Ali Deif
Difference between classification, clustering, and regression
Difference between classification and regression
The function used to represent our hypothesis in classification is the sigmoid function applied to hθ(x):
Linear regression hypothesis:   hθ(x) = θ0 + θ1x1 + θ2x2
Logistic regression hypothesis: hθ(x) = g(θ0 + θ1x1 + θ2x2)
sigmoid function
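As a quick illustration, here is a minimal Python sketch of this hypothesis (the θ values and the sample point are made-up examples, not from the slides):

import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x1, x2):
    # hθ(x) = g(θ0 + θ1*x1 + θ2*x2)
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

theta = [6.0, -2.0, 0.0]    # hypothetical parameters
print(h(theta, 3.0, 0.0))   # z = 0, so g(z) = 0.5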
Cost function
Numerical Example
Data (x: tumor size, y ∈ {0, 1}):

x   y
3   1
2   1
1   1
5   0
4   0
6   0

Linear classifier: h(x) = -2x + 6, with z = h(x)

Estimated probability: g(z) = 1 / (1 + e^(-z)) = p(y = 1 | x; θ)

x   z    e^(-z)            1 + e^(-z)   g(z)     y
3    0   e^0 = 1           2.0000       0.5000   1
2    2   e^(-2) ≈ 0.1353   1.1353       0.8808   1
1    4   e^(-4) ≈ 0.0183   1.0183       0.9820   1
5   -4   e^4 ≈ 54.5982     55.5982      0.0180   0
4   -2   e^2 ≈ 7.3891      8.3891       0.1192   0
6   -6   e^6 ≈ 403.4288    404.4288     0.0025   0

When z ≥ 0, then g(z) ≥ 0.5 (predict y = 1); when z < 0, then g(z) < 0.5 (predict y = 0).
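The table can be reproduced with a short Python script (a minimal sketch; only the model h(x) = -2x + 6 and the six data points from the slide are used):

import math

def g(z):
    # sigmoid: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

data = [(3, 1), (2, 1), (1, 1), (5, 0), (4, 0), (6, 0)]   # (x, y) pairs

for x, y in data:
    z = -2 * x + 6                                        # z = h(x)
    print(f"x={x}  z={z:+d}  e^(-z)={math.exp(-z):9.4f}  g(z)={g(z):.4f}  y={y}")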
How to minimize the logistic regression cost function
In linear regression, to find optimal values for the θs:

1. Choose a cost function: j(θ) = (1/2) Σ_{i=1}^{m} (h(x^(i)) - y^(i))^2
   where h(x) = θ^T x is a linear function, so j(θ) is convex.
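As a quick sketch, this squared-error cost in Python (NumPy arrays X and y are assumed to hold the training examples and labels):

import numpy as np

def squared_error_cost(theta, X, y):
    # j(θ) = (1/2) Σ (h(x^(i)) - y^(i))^2, with h(x) = θ^T x
    h = X @ theta
    return 0.5 * np.sum((h - y) ** 2)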
▪ What if we choose the same cost function for logistic regression?
  Now the hypothesis is no longer linear:
  g(z) = 1 / (1 + e^(-z)),  where z = h(x) = θ^T x
  Plugging the sigmoid into the squared-error cost makes j(θ) non-convex, so gradient descent can get stuck in local minima.
▪ Can we find a convex cost function?
j(θ) = cost(h(x), y)

cost(h(x), y) = -log(h(x))        if y = 1
                -log(1 - h(x))    if y = 0

This choice makes the cost function convex.
Logistic Regression Cost Function
cost(h(x), y) = -log(h(x))        if y = 1
                -log(1 - h(x))    if y = 0

Combining the two cases into a single expression:

j(θ) = -y log(h(x)) - (1 - y) log(1 - h(x))

Averaging over all m training examples:

j(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log(h(x^(i))) + (1 - y^(i)) log(1 - h(x^(i))) ]
Binary Cross-Entropy Cost Function
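A minimal sketch of this cost in Python (NumPy assumed; h holds the sigmoid outputs g(z) for each example):

import numpy as np

def binary_cross_entropy(h, y):
    # j(θ) = -(1/m) Σ [ y log(h) + (1 - y) log(1 - h) ]
    m = len(y)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# g(z) values and labels from the numerical example above
h = np.array([0.5, 0.8808, 0.9820, 0.0180, 0.1192, 0.0025])
y = np.array([1, 1, 1, 0, 0, 0])
print(binary_cross_entropy(h, y))   # low cost: predictions agree with the labels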
How to minimize the logistic regression cost function
Gradient Descent:

θ_j := θ_j - γ (d/dθ_j) J(θ)

The derivative of the cost function is:

(d/dθ_j) j(θ) = (1/m) Σ_{i=1}^{m} (g(z^(i)) - y^(i)) · x_j^(i)

so the update rule becomes:

θ_j := θ_j - γ (1/m) Σ_{i=1}^{m} (g(z^(i)) - y^(i)) · x_j^(i)
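Under these formulas, one gradient-descent step can be sketched in Python (the vectorized form computes all θ_j updates simultaneously; X, y, and the learning rate γ are assumed inputs):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, gamma):
    # θ_j := θ_j - γ (1/m) Σ (g(z^(i)) - y^(i)) x_j^(i), for all j at once
    m = len(y)
    g = sigmoid(X @ theta)      # g(z^(i)) for every example
    grad = X.T @ (g - y) / m    # vector of partial derivatives d/dθ_j
    return theta - gamma * grad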
Cross Entropy Cost Function Derivative (Optional)
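Filling in the optional derivation (a sketch added here, not reproduced from the slides): differentiate j(θ) and apply the sigmoid identity g'(z) = g(z)(1 - g(z)) together with dz/dθ_j = x_j:

\frac{d}{d\theta_j} j(\theta)
  = -\frac{1}{m} \sum_{i=1}^{m}
      \left[ \frac{y^{(i)}}{g(z^{(i)})} - \frac{1 - y^{(i)}}{1 - g(z^{(i)})} \right]
      g(z^{(i)}) \left( 1 - g(z^{(i)}) \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( g(z^{(i)}) - y^{(i)} \right) x_j^{(i)}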
Logistic Regression Steps Summary
Step 1: h(x) = θ^T x, with z = h(x). Learning model (linear classifier).

Step 2: g(z) = 1 / (1 + e^(-z)). Converts a real value into one that can be interpreted as a probability.

Step 3: j(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log(g(z^(i))) + (1 - y^(i)) log(1 - g(z^(i))) ]. Binary Cross-Entropy Cost Function.

Step 4: Minimize j(θ) using an optimization algorithm, e.g. Gradient Descent: θ_j := θ_j - γ (d/dθ_j) J(θ)
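Putting the four steps together, a minimal end-to-end sketch in Python on the tumor-size data (the learning rate γ = 0.1 and the 5000 iterations are illustrative choices, not values from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Data from the numerical example (x: tumor size, y ∈ {0, 1})
x = np.array([3.0, 2.0, 1.0, 5.0, 4.0, 6.0])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
X = np.column_stack([np.ones_like(x), x])      # prepend a column of 1s for θ0

theta = np.zeros(2)
gamma = 0.1

for _ in range(5000):
    g = sigmoid(X @ theta)                       # Steps 1-2: linear model + sigmoid
    theta -= gamma * (X.T @ (g - y)) / len(y)    # Step 4: gradient descent on the Step 3 cost

print(theta)                                     # θ1 < 0: decision boundary falls between x = 3 and x = 4
print((sigmoid(X @ theta) >= 0.5).astype(int))   # predicted labels match y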
Multi-class Classification
Open Discussion