Logistic Regression Course: Machine Learning
Machine Learning (VR17)
IV B.Tech – I Semester
UNIT-1
Lecture:3
Topic: Logistic Regression
COURSE INSTRUCTOR:
Dr.R.UMamaheswari
Assoc. Prof. & HoD, ECM
Department of Electronics and Computer Engineering Slide No. 1
Regression:
Regression analysis is a statistical method used to model
the relationship between a dependent (target) variable and
one or more independent (predictor) variables.
It helps us understand how the value of the dependent
variable changes with one independent variable when the
other independent variables are held fixed.
It is mainly used for prediction, forecasting, time series
modeling, and determining cause-and-effect relationships
between variables.
It predicts continuous/real values such as temperature,
age, salary, price, etc.
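As a minimal sketch of predicting a continuous value, the example below fits a straight line by ordinary least squares. The data (years of experience against salary in thousands) and the helper name `fit_line` are made up for illustration only.

```python
# Hypothetical toy data: years of experience (x) and salary in thousands (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [30.0, 35.0, 40.0, 45.0, 50.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 6.0  # continuous prediction for an unseen x
print(b0, b1, predicted)
```

The prediction is a real number rather than a class label, which is what distinguishes this kind of regression from classification.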
Terminologies Related to the Regression Analysis:
Dependent Variable: The main factor in regression analysis that we
want to predict or understand is called the dependent variable. It is also
called the target variable.
Independent Variable: The factors that affect the dependent variable,
or that are used to predict its value, are called independent variables,
also known as predictors.
Outliers: An outlier is an observation with a very low or very high value
compared with the other observed values. Outliers can distort the
result, so they should be identified and handled before fitting the model.
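To illustrate how a single outlier can distort a result, the sketch below compares the mean and the median of a small made-up sample with and without one extreme value. The numbers are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical observations, then the same data with one extreme value added.
values = [10, 12, 11, 13, 12]
values_with_outlier = values + [100]

mean_without = statistics.mean(values)
mean_with = statistics.mean(values_with_outlier)
median_without = statistics.median(values)
median_with = statistics.median(values_with_outlier)

# The mean shifts strongly toward the outlier; the median barely moves.
print(mean_without, mean_with)
print(median_without, median_with)
```

A fitted regression line is pulled toward extreme points in a similar way as the mean is here.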
Why do we use Regression Analysis?
Regression estimates the relationship between the target and the
independent variables.
It is used to find the trends in data.
It helps to predict real/continuous values.
By performing regression, we can identify the most important factor,
the least important factor, and how each factor affects the target.
Types of Regression:
Linear Regression
Logistic Regression
Polynomial Regression
Support Vector Regression
Decision Tree Regression
Random Forest Regression
Ridge Regression
Lasso Regression
Logistic Regression:
Logistic regression is the appropriate regression analysis to conduct
when the dependent variable is dichotomous (binary).
Like all regression analyses, the logistic regression is a predictive
analysis.
The output of a logistic regression model is a probability, so it always
lies between 0 and 1.
Logistic regression is used to describe data and to explain the
relationship between one dependent binary variable and one or more
nominal, ordinal, interval or ratio-level independent variables.
Mathematically, we can represent logistic regression as:
y = 1 / (1 + e^(-(b0 + b1x1 + b2x2 + … + bnxn)))
or, equivalently, log(y / (1 - y)) = b0 + b1x1 + b2x2 + … + bnxn
y lies between 0 and 1 and represents the probability that the
event happens.
y can be 0.1, 0.2, 0.3, …, which is why we set a threshold value
(such as 0.5) to categorize the output as 0 or 1.
x1, x2, x3, …, xn are the independent variables that are used to
calculate the dependent variable.
b0 is the intercept of the line (it gives an additional degree of
freedom).
b1, b2, b3, …, bn are the coefficients of x1, x2, x3, …, xn.
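The computation described above can be sketched as follows, assuming the standard logistic form y = 1 / (1 + e^(-(b0 + b1x1 + … + bnxn))). The function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def predict_probability(xs, b0, bs):
    """Return y, the probability of the event, for inputs x1..xn."""
    z = b0 + sum(b * x for b, x in zip(bs, xs))  # weighted sum plus intercept
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Turn a probability into a 0/1 label using a threshold."""
    return 1 if probability >= threshold else 0


# Hypothetical intercept and coefficients, not fitted to any real data.
b0 = -1.0
bs = [0.5, 0.25]

p = predict_probability([2.0, 4.0], b0, bs)  # z = -1 + 1 + 1 = 1
label = classify(p)
print(p, label)
```

The threshold of 0.5 is a common default; a different value can be chosen when one kind of misclassification is more costly than the other.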
Logistic regression is based on the concept of maximum likelihood
estimation: the coefficients are chosen so that the observed data are as
probable as possible under the model.
In logistic regression, we pass the weighted sum of inputs through an
activation function that maps any value to a value between 0 and 1.
This activation function is known as the sigmoid function, and the curve
obtained is called the sigmoid curve or S-curve.
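One way to picture maximum likelihood estimation is as a search for the weight and intercept that make the observed labels most probable. The sketch below uses plain gradient ascent on the log-likelihood for a single input variable; the data, learning rate, and step count are assumptions for illustration, and practical libraries typically use more refined optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(xs, ys, w, b):
    """Log of the probability of the observed labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Gradient ascent on the log-likelihood, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs))
        grad_b = sum(errors)
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Hypothetical data: hours studied (x) and pass = 1 / fail = 0.
# The classes overlap, so the maximum likelihood estimate is finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit(xs, ys)
print(w, b, log_likelihood(xs, ys, w, b))
```

The fitted parameters give a higher log-likelihood than the starting point, meaning the observed labels are more probable under the fitted model.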
Types of Questions Binary Logistic Regression Can Answer:
How does the probability of getting lung cancer (yes vs. no) change for
every additional pound a person is overweight and for every pack of
cigarettes smoked per day?
Do body weight, calorie intake, fat intake, and age have an influence on the
probability of having a heart attack (yes vs. no)?
Logistic Regression — Detailed Overview:
Logistic Regression was used in the biological sciences in the early
twentieth century.
It was then used in many social science applications. Logistic
Regression is used when the dependent variable(target) is categorical.
Example:
To predict whether an email is spam (1) or not (0)
To predict whether a tumor is malignant (1) or not (0)
Consider a scenario where we need to classify whether an email
is spam or not.
If we use linear regression for this problem, we need to set
a threshold on which the classification is based.
The same applies to the tumor example: if the actual class is
malignant, the predicted continuous value is 0.4, and the threshold
is 0.5, the data point will be classified as not malignant, which
can have serious consequences in practice.
From this example, it can be inferred that linear regression is
not suitable for classification problems. The output of linear
regression is unbounded, and this brings logistic regression into
the picture: its output always lies strictly between 0 and 1.
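The difference in output range can be checked with a small sketch. The weight and intercept below are hypothetical; the point is only that a linear output can leave the interval from 0 to 1 while the logistic output cannot.

```python
import math


def linear_output(x, w=0.2, b=0.1):
    """Unbounded linear prediction (w and b are hypothetical values)."""
    return w * x + b


def logistic_output(x, w=0.2, b=0.1):
    """The same linear score passed through the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


inputs = [-10.0, 0.0, 2.0, 10.0]
linear_values = [linear_output(x) for x in inputs]
logistic_values = [logistic_output(x) for x in inputs]

for x, lin, log_val in zip(inputs, linear_values, logistic_values):
    print(x, lin, log_val)
```

For the extreme inputs the linear values fall below 0 or above 1, so they cannot be read as probabilities, while the logistic values can.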
Simple Logistic Regression:
The sigmoid function, also called the logistic function, gives an ‘S’ shaped
curve that takes any real-valued number and maps it to a value between
0 and 1.
Model
Output = 0 or 1
Hypothesis => Z = WX + B
hΘ(x) = sigmoid (Z)
Sigmoid Function: sigmoid(Z) = 1 / (1 + e^(-Z))
If ‘Z’ goes to positive infinity,
Y(predicted) approaches 1,
and if ‘Z’ goes to
negative infinity, Y(predicted)
approaches 0.
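The limiting behaviour of the sigmoid can be checked numerically. The sketch below assumes the standard definition sigmoid(Z) = 1 / (1 + e^(-Z)) and a scalar weight and input; it is written in a numerically stable form so that very large negative Z does not cause an overflow. The weight, intercept, and input values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, arranged so math.exp never receives a large positive value."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def hypothesis(x, w, b):
    """h(x) = sigmoid(Z) with Z = w * x + b, for a single input variable."""
    return sigmoid(w * x + b)


for z in (-50.0, -5.0, 0.0, 5.0, 50.0):
    print(z, sigmoid(z))

# Hypothetical weight and intercept, only to show the hypothesis in use.
print(hypothesis(2.0, w=1.5, b=-1.0))
```

Large positive Z gives values close to 1, large negative Z gives values close to 0, and Z = 0 gives exactly 0.5.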
Types of Logistic Regression:
1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example:
Spam or Not Spam
2. Multinomial Logistic Regression
Three or more categories without ordering. Example: Predicting which
food is preferred more (Veg, Non-Veg, Vegan)
3. Ordinal Logistic Regression
Three or more categories with ordering. Example: Movie rating from 1 to
5
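For the multinomial case, a common formulation (an assumption here, since the slides do not name one) replaces the sigmoid with the softmax function, which turns one linear score per category into probabilities that sum to 1. The scores below are hypothetical.

```python
import math


def softmax(scores):
    """Convert one score per class into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Veg", "Non-Veg", "Vegan"]
scores = [2.0, 1.0, 0.1]  # hypothetical linear scores, one per class

probabilities = softmax(scores)
predicted = classes[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

With only two categories this reduces to the sigmoid of the difference between the two scores, which is the binary case.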
Real World Applications:
Example 1:
Researchers want to know how exercise and weight affect the probability
of having a heart attack, so they fit a logistic regression model.
The response variable in the model will be heart attack and it has two
potential outcomes:
A heart attack occurs.
A heart attack does not occur.
The results of the model will tell researchers how changes in
exercise and weight affect the probability that a given individual has a heart
attack.
Example 2:
Researchers want to know how GPA, ACT score, and number of AP classes
taken affect the probability of being accepted into a university.
The response variable in the model will be “acceptance” and it has two
potential outcomes:
A student gets accepted.
A student does not get accepted.
The results of the model will tell researchers how changes in GPA,
ACT score, and number of AP classes taken affect the probability that a
given individual gets accepted into the university.
Example 3:
A company wants to know how transaction amount and credit score affect
the probability that a transaction is fraudulent.
The response variable in the model will be “fraudulent” and it has two
potential outcomes:
The transaction is fraudulent.
The transaction is not fraudulent.
The results of the model will tell the company how changes in
transaction amount and credit score affect the probability of a given
transaction being fraudulent.
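How a fitted model answers such questions can be sketched with made-up coefficients. In the example below the intercept and the two coefficients (per unit of transaction amount and per credit score point) are hypothetical, not estimates from real data. The exponential of a coefficient gives the factor by which the odds change per unit increase of that variable.

```python
import math


def probability(b0, coefs, features):
    """Probability of the positive class for one observation."""
    z = b0 + sum(c * f for c, f in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical fitted values: intercept, then coefficients for
# transaction amount and credit score.
b0 = -2.0
coefs = [0.004, -0.005]

base = probability(b0, coefs, [500.0, 600.0])
higher_amount = probability(b0, coefs, [1000.0, 600.0])

# Factor by which the odds of fraud change for 100 more units of amount.
odds_ratio_per_100 = math.exp(coefs[0] * 100)

print(base, higher_amount, odds_ratio_per_100)
```

With these assumed coefficients, a larger amount raises the probability of fraud and a higher credit score lowers it. The change in probability depends on the starting point, while the change in odds is a constant factor.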
Difference Between Linear and Logistic Regression:
Linear Regression | Logistic Regression
Used to predict a continuous dependent variable from a given set of independent variables. | Used to predict a categorical dependent variable from a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
We predict the value of a continuous variable. | We predict the value of a categorical variable.
We find the best-fit line, from which we can predict the output. | We find the S-curve, with which we can classify the samples.
The least squares method is used to estimate the model parameters. | Maximum likelihood estimation is used to estimate the model parameters.
The output of linear regression must be a continuous value, such as price or age. | The output of logistic regression must be a categorical value, such as 0 or 1, or Yes or No.
The relationship between the dependent variable and the independent variables must be linear. | A linear relationship between the dependent and independent variables is not required; the model is linear in the log-odds instead.
There may be collinearity between the independent variables, although it makes the coefficient estimates less stable. | There should be little or no collinearity between the independent variables.