LOGISTIC REGRESSION
Likelihood Vs Probability
Probability vs Statistics
• In probability theory we consider some underlying process which has
some randomness or uncertainty modeled by random variables, and
we figure out what happens.
• In statistics we observe something that has happened, and try to
figure out what underlying process would explain those observations.
• The likelihood function is a fundamental concept in statistical inference.
• It indicates how likely a particular population is to produce an
observed sample.
• Probability refers to the chance of an outcome under fixed parameters, while likelihood measures how plausible parameter values are given observed data.
Likelihood Vs Probability
• Probability is simply how likely something is to happen.
• The occurrence of a discrete value y_k is expressed by the probability P(y_k).
• The distribution over all possible values of a discrete random variable y is expressed as a probability distribution.
• We assume that there is some a priori probability (or simply prior) P(y_k) that the next feature vector belongs to class k.
• P(x | y_k) is called the class likelihood and is the conditional probability that a pattern belonging to class y_k has the associated observation value x.
• The class that maximizes P(x | y_k) is called the Maximum Likelihood (ML) class.
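For instance, a minimal sketch of picking the ML class, assuming hypothetical 1-D Gaussian class-conditional densities (the means and standard deviations below are made up for illustration):

from scipy.stats import norm

# Hypothetical class-conditional densities P(x | y_k), modeled as 1-D Gaussians
class_likelihoods = {
    0: norm(loc=160, scale=5).pdf,   # class 0
    1: norm(loc=175, scale=6).pdf,   # class 1
}

def ml_class(x):
    # ML class: the k that maximizes the class likelihood P(x | y_k)
    return max(class_likelihoods, key=lambda k: class_likelihoods[k](x))

print(ml_class(171))   # prints 1: the class-1 density is higher at x = 171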
Likelihood Vs Probability
• Probability is computed from fixed, known parameters, while likelihood is evaluated from observed data.
• P(data; μ, σ) means “the probability density of observing the data with model parameters μ and σ”. It is worth noting that we can generalise this to any number of parameters and any distribution.
• On the other hand, L(μ, σ; data) means “the likelihood of the parameters μ and σ taking certain values given that we have observed a bunch of data.”
• Numerically the two are equal, L(μ, σ; data) = P(data; μ, σ), but despite this the likelihood and the probability density are fundamentally asking different questions — one is asking about the data and the other is asking about the parameter values.
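A small illustration of this symmetry, assuming a normal model (the numbers are arbitrary):

from scipy.stats import norm

x = 172.0                # one observed data point
mu, sigma = 170.0, 3.5   # candidate parameter values

# Same number, two readings:
density    = norm(mu, sigma).pdf(x)   # P(data; mu, sigma): fix parameters, ask about the data
likelihood = norm(mu, sigma).pdf(x)   # L(mu, sigma; data): fix the data, ask about the parameters

print(density, likelihood)   # identical values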
Example of Probability
• Consider a dataset containing the heights of the people of a particular country. Let’s say the mean of the data is 170 cm and the standard deviation is 3.5 cm.
• When a probability has to be calculated for any situation using this dataset, the dataset features are held constant, i.e. the mean and standard deviation of the dataset are fixed and not altered.
• Let’s say the probability of height > 170 cm has to be calculated for a random record in the dataset. That is P(height > 170; μ = 170, σ = 3.5), as sketched below:
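A minimal sketch of that calculation, assuming heights are normally distributed with the stated parameters:

from scipy.stats import norm

mu, sigma = 170.0, 3.5   # fixed characteristics of the distribution

# P(height > 170; mu = 170, sigma = 3.5): survival function = 1 - CDF
p = norm(mu, sigma).sf(170)
print(p)   # 0.5, since 170 cm is the mean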
• While calculating probability, the feature value can be varied, but the characteristics (mean and standard deviation) of the data distribution cannot be altered.
Example of Likelihood
• Likelihood calculation involves finding the best distribution, or the best characteristics of the data, given a particular feature value or situation.
• Consider exactly the same dataset example as provided above for probability; if the likelihood for height > 170 cm has to be calculated, it will be done using the information sketched below:
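A minimal sketch of that search, assuming a normal model and a small hypothetical grid of candidate parameters; here the observed event stays fixed while (μ, σ) vary:

from scipy.stats import norm

# Candidate (mean, std) pairs -- the dataset characteristics are now varied
candidates = [(168.0, 3.5), (170.0, 3.5), (172.0, 3.5), (172.0, 2.0)]

# L(mu, sigma; height > 170): probability of the observed event under each candidate
best = max(candidates, key=lambda p: norm(*p).sf(170))
print(best)   # (172.0, 2.0): the parameters that make "height > 170 cm" most likely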
• In the calculation of the likelihood, the conditional probability equation flips compared to the equation in the probability calculation.
• Here, the dataset features are varied, i.e. the mean and standard deviation of the dataset are varied to get the maximum likelihood for height > 170 cm.
• Likelihood, in very simple terms, means increasing the chances of a particular situation occurring by varying the characteristics of the dataset distribution.
Logistic Regression Implementation
Hypothesis function
• In logistic regression, we apply the sigmoid activation function to the hypothesis function of linear regression.
• So the resultant hypothesis function for logistic regression is given below:
h( x ) = sigmoid( wx + b )
Here, w is the weight vector.
x is the feature vector.
b is the bias.
sigmoid( z ) = 1 / ( 1 + e^( -z ) )
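A minimal NumPy sketch of this hypothesis (the function names are ours):

import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def h(x, w, b):
    # h(x) = sigmoid(wx + b), with w and x as vectors and b a scalar bias
    return sigmoid(np.dot(w, x) + b)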
Cost function
• The cost function of linear regression (mean squared error) can’t be used in logistic regression because it is a non-convex function of the weights.
• Optimization algorithms such as gradient descent are only guaranteed to converge to the global minimum for convex functions.
• So, the simplified cost function we use is:
J = - y log( h(x) ) - ( 1 - y ) log( 1 - h(x) ) (derived in the last class)
here, y is the real target value
h( x ) = sigmoid( wx + b )
For y = 0, J = - log( 1 - h(x) )
and for y = 1, J = - log( h(x) )
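A minimal sketch of this cost averaged over a batch of examples (the clipping against log(0) is our addition, not from the slides):

import numpy as np

def cost(y, y_hat, eps=1e-12):
    # J = -y*log(h(x)) - (1 - y)*log(1 - h(x)), averaged over the batch
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0)
    return np.mean(-y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat))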
Gradient Descent Calculation
repeat until convergence {
    tmp_i = w_i - alpha * dw_i
    w_i = tmp_i
}
where alpha is the learning rate.
• The chain rule is used to calculate the gradients such as dw_i.
• Here, a = sigmoid( z ) and z = wx + b; applying the chain rule to J gives dz = a - y, dw_i = x_i * dz, and db = dz.
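A minimal vectorized training-loop sketch under these update rules (array shapes and the fixed iteration count are illustrative):

import numpy as np

def train(X, y, alpha=0.1, iterations=1000):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iterations):
        z = X @ w + b                   # z = wx + b
        a = 1.0 / (1.0 + np.exp(-z))    # a = sigmoid(z)
        dz = a - y                      # chain rule: dJ/dz = a - y
        dw = X.T @ dz / m               # dw_i averaged over the batch
        db = dz.mean()                  # db averaged over the batch
        w -= alpha * dw                 # w_i = w_i - alpha * dw_i
        b -= alpha * db
    return w, b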
Next?
• Update weights in an iterative process
• After completing all iterations, calculate the hypothesis function h( x )
Threshold classifier output h( x ) at 0.5:
If h( x ) ≥ 0.5, predict “y = 1”
If h( x ) < 0.5, predict “y = 0”
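The same thresholding as a short sketch, reusing the shapes from the training loop above:

import numpy as np

def predict(X, w, b, threshold=0.5):
    # Predict y = 1 where h(x) >= 0.5, otherwise y = 0
    a = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (a >= threshold).astype(int)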
Logistic Regression Numerical
Example 1
• Some samples of two classes of articles, Technical (1) and Non-technical (0), are given.
• Each sample has two features:
• Time, which represents the average time required to read an article, in hours,
• Sentences, representing the number of sentences in an article.
• First, we need to train our logistic regression model.
Test sample: Time = 1.9, Sentences = 3.1, Class = ?
Example 1
• Training involves finding the optimal values of the coefficients B0, B1, and B2.
• While training, we find some values of the coefficients in one step and use those coefficients in the next step to optimize their values.
• We continue to do this until we get consistent accuracy from the model.
Example 1
• After 20 iterations, we get:
B0 = -0.1068913
B1 = 0.41444855
B2 = -0.2486209
• Thus, the decision boundary is given as:
Z = B0 + B1*X1 + B2*X2
Z = -0.1068913 + 0.41444855*Time - 0.2486209*Sentences
Example 1
• For X1 = 1.9 and X2 = 3.1, we get:
Z = -0.1068913 + 0.41444855*1.9 - 0.2486209*3.1
Z = -0.0901638
• Now, we use the sigmoid function to find the probability and thus predict the class of the given sample:
y = sigmoid( Z ) = 1 / ( 1 + e^( 0.0901638 ) ) ≈ 0.477
• Since y = 0.477 is less than 0.5, we can safely classify the given sample as class Non-technical (0).
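A quick check of this arithmetic in code, plugging in the trained coefficients:

import math

b0, b1, b2 = -0.1068913, 0.41444855, -0.2486209
time, sentences = 1.9, 3.1

z = b0 + b1 * time + b2 * sentences
y = 1.0 / (1.0 + math.exp(-z))     # sigmoid

print(round(z, 7), round(y, 3))    # -0.0901638 0.477 -> class 0 (Non-technical)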
Examples 2 & 3
• Can be seen at the following links:
• https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
• https://courses.lumenlearning.com/introstats1/chapter/introduction-to-logistic-regression/