Attachment LogR

Uploaded by Katam Ashok

Scenario 1:

We have a dataset containing the performance of students (percentage scored) in an exam. The data also contains the number of hours each student spent on self study per day and the number of classes they attended (out of a maximum of 60) before taking the exam.

StudentID   hours_Per_Day   Classes_attended   Percentage
1001        3               50                 70.5
1002        3.25            49                 75.9
1003        2.75            55                 67.9
1004        2               42                 66.3
1005        3.15            51                 70.4
1006        4               60                 80.5
1007        4.5             58                 89.75
1008        2.5             47                 67.67
1009        3.5             53                 72.7
1010        4               55                 76.5

We want to build a model that predicts the percentage scored by a student from the hours of self study and the number of classes attended.
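As a quick illustration, the Scenario 1 model can be fitted by ordinary least squares. The sketch below uses only NumPy; the library choice and the example student are assumptions for illustration, not part of the original notes:

```python
import numpy as np

# Scenario 1 data: hours of self study per day, classes attended, percentage scored
hours   = np.array([3, 3.25, 2.75, 2, 3.15, 4, 4.5, 2.5, 3.5, 4])
classes = np.array([50, 49, 55, 42, 51, 60, 58, 47, 53, 55])
pct     = np.array([70.5, 75.9, 67.9, 66.3, 70.4, 80.5, 89.75, 67.67, 72.7, 76.5])

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones_like(hours), hours, classes])
beta, *_ = np.linalg.lstsq(X, pct, rcond=None)

# Predicted percentage for a hypothetical student: 3.5 hours/day, 52 classes
pred = beta @ np.array([1.0, 3.5, 52.0])
```

With only ten observations this is a toy fit, but it is the same multiple linear regression the notes list below.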
Scenario 2:
We have another dataset, which contains the hours of self study and the number of classes attended by each student, together with whether the student passed or failed the exam.

StudentID   hours_Per_Day   Classes_attended   Passed
1001        3               39                 No
1002        3.25            40                 Yes
1003        2.75            29                 No
1004        2               31                 No
1005        3.15            55                 Yes
1006        4               60                 Yes
1007        4.5             58                 Yes
1008        2               41                 No
1009        3.5             49                 Yes
1010        4               58                 Yes

We are interested in building a predictive model that takes the hours of self study and the number of classes attended into consideration and predicts whether a student will pass or fail the exam.
Comparing the scenarios:

                                      Scenario 1                    Scenario 2
Dependent attribute (outcome)         Percentage scored             Passed or not
Independent attributes (predictors)   hours/day, classes attended   hours/day, classes attended
Nature of dependent attribute         Quantitative                  Categorical
Nature of independent attributes      Both quantitative             Both quantitative

In both scenarios, we build the model from existing labelled data, i.e. training data. Hence this kind of machine learning falls under supervised machine learning.

Nature of outcome attribute   Algorithm to be used
Quantitative                  Regression
Categorical                   Classification

Regression is used to predict a quantity, e.g. the percentage scored in Scenario 1.

Classification is used to predict a category or class, e.g. passed or not in Scenario 2.


Regression:

• Simple Linear Regression


• Multiple Linear Regression

Classification:

• Logistic Regression
• Decision Tree
• SVM
• KNN
• Random Forest
Logistic Regression:
Logistic regression is a classification technique in which the probability (or odds) of the response taking a particular value is modeled as a function of the values taken by the predictors.

The predictors may be quantitative, categorical, or a mix of both.

Based on the nature of the response variable, logistic regression can be classified as

• Binary logistic regression


• Multinomial logistic regression

If the response variable is binary in nature, i.e. it has exactly 2 categories or labels, we use binary logistic regression.

If the response variable has more than 2 categories or labels, we use multinomial logistic regression.
Binary Logistic Regression

Assumptions:

• Observations are independent of each other

• The distribution of the outcome variable is binomial

• The dependent attribute need not be normally distributed

• It assumes a linear relationship between the logit of the response and the predictors
Binomial Distribution
If an experiment having exactly 2 outcomes (success or failure) is carried out n times, and X is a random variable denoting the number of successes in the n trials, then X has a binomial distribution provided:

• All n trials are carried out in the same manner.

• The n trials are independent of each other.

• The probability of success, denoted p, is the same for all trials.


Example: Binomial Distribution

X = number of heads when a fair coin is tossed 5 times.

Here all the observations are independent, and the probability of getting a head is the same for every toss, i.e. 50%. Hence X can be treated as a random variable having a binomial distribution with n = 5 and p = 0.5.

P(X=0) = C(5,0) ∗ 0.5^0 ∗ 0.5^5 = 0.03125
P(X=1) = C(5,1) ∗ 0.5^1 ∗ 0.5^4 = 0.15625
P(X=2) = C(5,2) ∗ 0.5^2 ∗ 0.5^3 = 0.3125
P(X=3) = C(5,3) ∗ 0.5^3 ∗ 0.5^2 = 0.3125
P(X=4) = C(5,4) ∗ 0.5^4 ∗ 0.5^1 = 0.15625
P(X=5) = C(5,5) ∗ 0.5^5 ∗ 0.5^0 = 0.03125

[Figure: bar chart of the distribution of X, with P(X=k) on the y-axis (0 to 0.35) for X = 0, ..., 5.]
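The six probabilities above can be checked in a few lines of Python; `binom_pmf` is a helper name written here purely for illustration:

```python
from math import comb

def binom_pmf(k, n=5, p=0.5):
    # P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

probs = [binom_pmf(k) for k in range(6)]
# probs[2] reproduces P(X=2) = 0.3125, and the six probabilities sum to 1
```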
Logit:
The logistic regression model assumes that the natural logarithm of the odds of the response variable varies linearly with the predictors:

logit(Pᵢ) = log( Pᵢ / (1 − Pᵢ) ) = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₙXₙ

Taking the exponential of both sides and solving for Pᵢ, we get

Pᵢ = 1 / (1 + e^−(β₀ + β₁X₁ + β₂X₂ + ⋯ + βₙXₙ))

The above equation represents the logistic regression model. For given values of the predictors Xᵢ, the probability of the event occurring is calculated using this equation.
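Putting the pieces together, a binary logistic regression for Scenario 2 can be sketched with scikit-learn (the library choice and the query student are assumptions, not part of the original notes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Scenario 2 data: [hours per day, classes attended] and pass (1) / fail (0)
X = np.array([[3, 39], [3.25, 40], [2.75, 29], [2, 31], [3.15, 55],
              [4, 60], [4.5, 58], [2, 41], [3.5, 49], [4, 58]])
y = np.array([0, 1, 0, 0, 1, 1, 1, 0, 1, 1])

model = LogisticRegression()  # fits the beta coefficients of the logit equation
model.fit(X, y)

# P(pass) for a hypothetical student: 3 hours/day, 45 classes attended
p_pass = model.predict_proba([[3, 45]])[0, 1]
```

Internally, `predict_proba` evaluates the Pᵢ = 1 / (1 + e^−(β₀ + β₁X₁ + β₂X₂)) equation above using the fitted coefficients.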
