Business
Analytics
OUR ANALYSIS- USING LOGISTIC
REGRESSION
Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome.
The outcome is measured with a dichotomous variable (one with only two possible outcomes).
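As a minimal sketch of the idea, the logistic function turns a linear combination of the independent variables into a probability between 0 and 1. The function names and the toy intercept and coefficient below are illustrative assumptions, not values from this analysis.

```python
import math

def logistic(z: float) -> float:
    """Map a linear score z (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, features):
    """Probability of the 'yes' outcome for one observation."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return logistic(z)

# Toy values (hypothetical): the linear score is -1.0 + 0.5 * 2.0 = 0
p = predict_probability(-1.0, [0.5], [2.0])
print(p)  # a linear score of 0 corresponds to a probability of 0.5
```

A probability at or above a chosen cut-off (commonly 0.5) is then read as "yes", anything below as "no".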
Works on the maximum likelihood concept.
Coefficients are chosen to fit the regression model such that the predicted probability p(xi) of the outcome (here, winning an award) for each individual observation is as close as possible to that observation's observed outcome.
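The maximum likelihood idea can be illustrated with a small sketch: start from zero coefficients and repeatedly nudge them in the direction that raises the log-likelihood of the observed outcomes. This is plain gradient ascent on made-up toy data with one predictor, written in Python for illustration. It is not the fitting routine used in the analysis (which was run in R), and all names and numbers are assumptions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log-probabilities the model assigns to the observed outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, rate=0.01, steps=5000):
    """Gradient ascent on the log-likelihood for intercept b0 and slope b1."""
    b0 = b1 = 0.0
    for _ in range(steps):
        residuals = [y - logistic(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += rate * sum(residuals)
        b1 += rate * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1

# Made-up toy data (not from the movie dataset); outcomes overlap,
# so the maximum likelihood estimate is finite.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

b0, b1 = fit(xs, ys)
start = log_likelihood(0.0, 0.0, xs, ys)
end = log_likelihood(b0, b1, xs, ys)
print(start, end)  # the fitted coefficients give a higher log-likelihood
```

Statistical packages typically use faster methods than this, but the objective being maximized is the same.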
INTRODUCTION
In the era of Web 2.0, consumers
share their ratings or comments easily
with other people after watching a
movie
To measure the quality of a film, viewers can:
- Rely on critics (more time consuming)
- Use their own instincts (simple and often a good indicator, but unreliable)
There should be a better way to tell whether a movie will win awards without relying on critics or our own instincts.
This study develops an award prediction model that uses logistic regression.
Question
Will a movie win an award, given factors such as the number of reviews, rating count, award nominations, etc.?
DATASET
1650 movies
Independent variables: IMDb Rating, Rating Count, Number of Nominations, Number of Photos, Number of News Articles, Number of User Reviews, Number of Genres
Dependent variable: Award win (Yes/No), on a dichotomous scale
Regression technique: Logistic Regression
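One way to picture a row of this dataset is as a record with seven numeric predictors and a Yes/No outcome that is encoded as 1/0 before modelling. The sketch below is in Python for illustration only. The class, field names, and the example values are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass

@dataclass
class Movie:
    # Field names are illustrative; they mirror the variables listed above.
    imdb_rating: float
    rating_count: int
    nr_of_nominations: int
    nr_of_photos: int
    nr_of_news_articles: int
    nr_of_user_reviews: int
    nr_of_genres: int
    award_win: str  # "Yes" or "No"

def to_features(movie: Movie) -> list:
    """Independent variables as a numeric vector."""
    return [
        movie.imdb_rating, movie.rating_count, movie.nr_of_nominations,
        movie.nr_of_photos, movie.nr_of_news_articles,
        movie.nr_of_user_reviews, movie.nr_of_genres,
    ]

def to_label(movie: Movie) -> int:
    """Dichotomous dependent variable: 1 for an award win, 0 otherwise."""
    return 1 if movie.award_win == "Yes" else 0

# Made-up example row, not taken from the 1650 movies.
example = Movie(7.8, 120000, 3, 25, 400, 350, 2, "Yes")
print(to_features(example), to_label(example))
```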
METHODOLOGY
1. Data collection and streamlining
2. Running logistic regression
3. Analyzing results
4. Prediction
5. Accuracy check
ANALYSIS
1. Divided the data into training and test data
2. On the training data: logistic regression analysis using R
3. Used the regression coefficients to predict the values on the test data, then checked the accuracy of the model
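The split into training and test data can be sketched as a shuffled partition of the rows. This is an illustrative Python sketch, not the R code used for the analysis. The 30% test fraction and the seed are assumptions, since the slides do not state how the data was divided.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for the 1650 movies: row indices only.
rows = list(range(1650))
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is fitted on `train` only; `test` is held back so that the accuracy check uses movies the model has not seen.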
STEP 1: Collecting the data set
STEP 2: Logistic regression on training data
STEP 3: Prediction on test data
ACCURACY OF THE MODEL
USING ONLY SIGNIFICANT VARIABLES
ACCURACY OF THE MODEL
ERROR 0.043378995
ACCURACY 0.956621005
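Accuracy is the share of test movies classified correctly, and error is its complement, which is why the two figures above sum to 1. The Python sketch below shows the calculation. The 0.5 cut-off and the counts are assumptions: 19 misclassifications out of 438 test movies happen to reproduce the reported figures, but the slides do not state the test-set size.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into 1 (win) / 0 (no win) labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy(predicted, actual):
    """Fraction of predictions that match the observed outcomes."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical counts: 438 test movies, 419 classified correctly, 19 not.
actual = [1] * 438
predicted = [1] * 419 + [0] * 19

acc = accuracy(predicted, actual)
err = 1 - acc
print(round(acc, 9), round(err, 9))
```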
PREDICTED PROBABILITY OF WINNING USING LOGISTIC REGRESSION
[Figure: Classification. x-axis: IMDb Rating (0 to 10); y-axis: Probability (0 to 1)]
PREDICTED PROBABILITY OF WINNING USING LOGISTIC REGRESSION
[Figure: Classification. x-axis: Number of Nominations (0 to 140); y-axis: Probability (0 to 1)]
INTERPRETATION- SIGNIFICANT
VARIABLES
Significant variables:
- IMDb rating
- Number of nominations
Insignificant variables:
- Rating count
- Number of photos
- Number of news articles
- Number of user reviews
- Number of genres
Full model (the linear expression gives the log-odds of winning, logit(p(x)) = ln(p(x) / (1 - p(x)))):
logit(p(x)) = -9.008 + (1.091*imdbRating) + (-0.000003268*ratingCount) + (1.810*nrOfNomination) + (-0.001*nrOfPhotos) + (-0.000049*nrOfNewsArticles) + (0.002*nrOfUserReviews) + (-0.066*nrOfGenre)
Model using only the significant variables:
logit(p(x)) = -8.9192 + (1.076*imdbRating) + (1.8920*nrOfNomination)
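To show how the two-variable model would be applied, the sketch below plugs a movie's IMDb rating and number of nominations into the reduced equation and converts the result to a probability. It assumes the linear expression is the log-odds, as is standard for logistic regression. The function name and the example movie are hypothetical, and the code is Python rather than the R used in the analysis.

```python
import math

# Coefficients of the reduced model as reported above.
INTERCEPT = -8.9192
COEF_IMDB_RATING = 1.076
COEF_NOMINATIONS = 1.8920

def win_probability(imdb_rating: float, nr_of_nominations: int) -> float:
    """Predicted probability of an award win under the reduced model."""
    log_odds = (INTERCEPT
                + COEF_IMDB_RATING * imdb_rating
                + COEF_NOMINATIONS * nr_of_nominations)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Made-up example: a movie rated 8.0 with 2 nominations.
print(win_probability(8.0, 2))
# The same rating with no nominations gives a lower probability.
print(win_probability(8.0, 0))
```

Because both coefficients are positive, a higher rating or more nominations always raises the predicted probability.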
APPLICATIONS
Movies are an integral part of modern culture, generating billions of dollars each year.
The film industry serves as a deeply expressive artistic medium and also has a significant impact on the global economy.
This model can be used to predict whether a movie will be critically acclaimed as an award-winning movie.
LIMITATIONS
Many factors that may influence award wins are either impossible to quantify or difficult to obtain reliably:
- Profitability of a movie's release period
- Popularity of a movie's content
- Strength of a movie's marketing strategies
- Advertising budget of a movie
CONCLUSION
Logistic regression was a suitable technique for this problem because the outcome (award win: yes/no) is dichotomous.
Two independent variables carry most of the weight: IMDb rating and number of nominations.
Changes in IMDb rating and number of nominations significantly affect the probability of a movie winning an award.
High accuracy (0.957) and low error (0.043) indicate good model strength.
Thank You!