Experiment No 2
Title: Logistic Regression Analysis in R
Theory
Logistic regression analysis studies the association between a categorical dependent
variable and a set of independent (explanatory) variables. The name logistic regression is
used when the dependent variable has only two values, such as 0 and 1 or Yes and No. The
name multinomial logistic regression is usually reserved for the case when the dependent
variable has three or more unique values, such as Married, Single, Divorced, or Widowed.
Although the type of data used for the dependent variable is different from that of multiple
regression, the practical use of the procedure is similar.
Logistic regression competes with discriminant analysis as a method for analyzing
categorical-response variables. Many statisticians feel that logistic regression is more
versatile and better suited for modelling most situations than is discriminant analysis. This
is because logistic regression does not assume that the independent variables are normally
distributed, as discriminant analysis does.
We know that a linear model assumes that the response variable is normally distributed. We
have an equation of the form Yi = β0 + β1Xi + εi, where we predict the value of Y for some value
of X. We know this is linear because each unit change in X changes Y by a fixed
magnitude β1. The error term εi is also assumed to be normally distributed, and because that
error is added to each predicted value, Y itself ends up normally distributed: for each value
of X we get a value of Y, and those values follow a normal distribution. This is all fine when
Y can range from -∞ to +∞, but if the response must be TRUE or FALSE, 0 or 1, Yes or No,
then it does not follow a normal distribution. All we have are counts of 0s and 1s, which are
only useful for finding probabilities; for example, if you have five 0s and fifteen 1s, then
getting a 0 has probability 0.25 and getting a 1 has probability 0.75. But how can we use
those probabilities to fit a smooth (non-linear) curve that passes as close as possible to all
the points, given that those points are either 0 or 1?
To do that, keep in mind that the probability can only be between 0 and 1, so when you
try to fit a line to those points it cannot be a straight line but rather an S-shaped curve.
Fig. Linear vs Logistic
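The figure is not reproduced here. A small R sketch along the following lines (using simulated data, which is an assumption made purely for illustration) produces a similar comparison of a straight-line fit and the S-shaped logistic curve:
set.seed(1)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(200, size = 1, prob = plogis(2 * x))          # simulated binary outcomes
lin  <- lm(y ~ x)                                          # ordinary (linear) fit
logi <- glm(y ~ x, family = "binomial")                    # logistic fit
plot(x, y, pch = 16, col = "grey", ylab = "P(Y = 1)")
abline(lin, col = "red")                                   # straight line, can leave [0, 1]
lines(x, predict(logi, type = "response"), col = "blue")   # S-shaped curve, stays in [0, 1]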
If you have a greater number of 1s the S-curve will be skewed upwards, and if you
have a greater number of 0s it will be skewed downwards. Note that the midpoint of the curve
(probability 0.5 on the Y-axis) corresponds to half of the total count lying to its left and half
to its right, but that will not always be the case.
Now the question arises: how do we map binary information of 1s and 0s onto a
regression model, which uses continuous variables? We do this mapping because we want
the model to give us the probability of the desired outcome being true. The steps below
describe how the mapping works. Keep in mind that the main premise of logistic regression
is still a typical regression model, with a few methodical changes.
Now, to find the probability of the desired outcome, two conditions must always hold.
1. The probability cannot be negative, so we apply the exponential function to our
normal regression model, which is what turns it into logistic regression.
2. The probability can never be greater than 1, so we divide the outcome by something
bigger than itself.
And based on those two conditions, the formula for logistic regression unfolds as follows:
1. The regression formula gives us Y using Yi = β0 + β1X + εi.
2. We apply the exponential so the result cannot be negative, giving
P = e^(β0 + β1X + εi).
3. We divide that quantity by something bigger than itself so that it stays below one,
giving P = e^(β0 + β1X + εi) / (e^(β0 + β1X + εi) + 1).
4. With a little algebra, the formula in step 3 can be rewritten as
log(p/(1-p)) = β0 + β1X + εi.
5. p/(1-p) is called the odds: the probability of the desired outcome being true divided
by the probability of the desired outcome not being true. The function log(p/(1-p))
is called the logit, and its value is the log odds.
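As a quick numerical check of steps 3 and 4 (the value of eta below is arbitrary, chosen only for illustration):
eta <- 1.2                            # an arbitrary value of β0 + β1X
p <- exp(eta) / (exp(eta) + 1)        # step 3: P lies between 0 and 1 (about 0.77)
log(p / (1 - p))                      # step 4: recovers eta, the log odds (1.2)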
Working
When you count the total number of 1s and 0s you can calculate the value of log(p/(1-p))
quite easily, and we know that this value equals β0 + β1X + εi. You can then put that value
into the formula P = e^(β0 + β1X + εi) / (e^(β0 + β1X + εi) + 1) and obtain P, which is the
probability of the outcome being TRUE given the parameters.
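For example, using the counts mentioned earlier (five 0s and fifteen 1s), the log odds and the recovered probability can be computed directly in R:
n0 <- 5; n1 <- 15                     # counts of 0s and 1s
p <- n1 / (n0 + n1)                   # 0.75
log_odds <- log(p / (1 - p))          # about 1.099
exp(log_odds) / (exp(log_odds) + 1)   # recovers 0.75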
From a different perspective, suppose your regression formula is already available, with the
intercept and slope given to you; then you would just plug in the value of X to predict Y.
In logistic regression it does not work quite that way: instead you put your X value into the
formula P = e^(β0 + β1X + εi) / (e^(β0 + β1X + εi) + 1) and read off the result. If the value
is above 0.5 then it points towards the desired outcome (that is, 1), and if it is below 0.5
then it points towards the not-desired outcome (that is, 0).
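A short sketch with made-up coefficients (the intercept and slope below are assumptions, not values estimated from any data) illustrates this mapping and the 0.5 cut-off:
b0 <- -2; b1 <- 0.8                                        # hypothetical intercept and slope
x_new <- 3.5                                               # a new value of X
p <- exp(b0 + b1 * x_new) / (exp(b0 + b1 * x_new) + 1)     # about 0.69
ifelse(p > 0.5, "towards 1 (desired)", "towards 0 (not desired)")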
Dataset and Task
The data come from UCLA and contain four variables: admit, GRE score, GPA, and the
rank of the student's undergraduate school. Our aim is to build a model that predicts the
probability of a student being admitted, given their profile.
R Program
df <- read.csv("D:/VinodData/Year 2019 20/Practicals/ML/2/binary.csv")   # load the admissions data
str(df)                                   # structure of the data frame
sum(is.na(df))                            # check for missing values
summary(df)                               # summary statistics of each variable
xtabs(~ admit + rank, data = df)          # cross-tabulate admit against rank
df$rank <- as.factor(df$rank)             # treat rank as a categorical variable
logit <- glm(admit ~ gre + gpa + rank, data = df, family = "binomial")   # fit the logistic model
summary(logit)                            # coefficients, deviance and significance
x <- data.frame(gre = 790, gpa = 3.8, rank = as.factor(1))   # profile of a new student
p <- predict(logit, x)                    # default scale is the link (log odds); use type = "response" for the probability
Output of the program and analysis
> df <- read.csv("D:/VinodData/Year 2019 20/Practicals/ML/2/binary.csv")
> str(df)
'data.frame': 400 obs. of 4 variables:
$ admit: int 0 1 1 1 0 1 1 0 1 0 ...
$ gre : int 380 660 800 640 520 760 560 400 540 700 ...
$ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
$ rank : int 3 3 1 4 4 2 1 2 3 2 ...
We see that the variables are either integer or numeric.
> sum(is.na(df))
[1] 0
There are no missing values.
> summary(df)
admit gre gpa rank
Min. :0.0000 Min. :220.0 Min. :2.260 Min. :1.000
1st Qu.:0.0000 1st Qu.:520.0 1st Qu.:3.130 1st Qu.:2.000
Median :0.0000 Median :580.0 Median :3.395 Median :2.000
Mean :0.3175 Mean :587.7 Mean :3.390 Mean :2.485
3rd Qu.:1.0000 3rd Qu.:660.0 3rd Qu.:3.670 3rd Qu.:3.000
Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
We can see that there are more rejections than acceptances, since the mean of the admit
variable (0.3175) is less than 0.5.
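The same thing can be checked directly from the counts of 0s and 1s:
table(df$admit)        # counts of rejects (0) and admits (1)
mean(df$admit)         # proportion admitted, 0.3175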
> xtabs(~ admit +rank ,data=df)
rank
admit 1 2 3 4
0 28 97 93 55
1 33 54 28 12
We do this to check whether the admits are distributed well enough across the categories of
rank. If, say, one rank had only five admit or reject cases, it would not be worthwhile to
include that rank in the analysis.
> df$rank <- as.factor(df$rank)
> logit <- glm(admit ~ gre+gpa+rank,data=df,family="binomial")
> summary(logit)
Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial",
data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6268 -0.8662 -0.6388 1.1490 2.0790
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979 1.139951 -3.500 0.000465 ***
gre 0.002264 0.001094 2.070 0.038465 *
gpa 0.804038 0.331819 2.423 0.015388 *
rank2 -0.675443 0.316490 -2.134 0.032829 *
rank3 -1.340204 0.345306 -3.881 0.000104 ***
rank4 -1.551464 0.417832 -3.713 0.000205 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.52 on 394 degrees of freedom
AIC: 470.52
Number of Fisher Scoring iterations: 4
Now we fit our logit model, but before that we convert the rank variable from integer to factor.
1- Each one-unit increase in gre increases the log odds of admission by 0.002, and its
p-value indicates that it is significant in determining admission.
2- Each one-unit increase in gpa increases the log odds of admission by 0.80, and its p-value
indicates that it is significant in determining admission.
3- The interpretation of rank is different from the others: each rank coefficient is measured
relative to rank-1, the reference level. Attending a rank-2 college instead of a rank-1
college decreases the log odds of admission by 0.675, and attending a rank-3 college
instead of a rank-1 college decreases it by 1.340.
4- The difference between the null deviance and the residual deviance tells us whether the
model is a good fit: the greater the difference, the better the model. The null deviance is
the deviance when the equation contains only the intercept and no predictors, while the
residual deviance is the deviance when all the variables are taken into account. It makes
sense to consider the model good when that difference is large enough.
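One way to make that comparison concrete (a small addition using base R, not part of the original program) is a chi-square test on the drop in deviance:
# drop in deviance: 499.98 - 458.52 = 41.46 on 399 - 394 = 5 degrees of freedom
with(logit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
# the very small p-value indicates the model with predictors fits much better than the intercept-only model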
> x <- data.frame(gre=790,gpa=3.8,rank=as.factor(1))
> p<- predict(logit,x)
> p
1
0.85426
Let's say a student has a profile with 790 in GRE, a 3.8 GPA, and a rank-1 college, and we
want to predict that student's chances of being admitted. Note that the value printed above,
0.85426, is on the link scale, i.e. it is the log odds, because predict() for a glm returns the
linear predictor by default; converting it gives a probability of admission of about 0.70.
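A small follow-up (not part of the original run) converts that log odds value to the probability scale:
predict(logit, x, type = "response")   # probability of admission directly, about 0.70
plogis(0.85426)                        # same conversion applied to the printed log odds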