Telecom

Customer Churn Prediction Modelling

R Venkataraman

24th May 2020

PGP BABI
Group 5

Index

Project Description & Objective
Project Report
References
Telecom Customer Churn Prediction Assessment

Project Description

Customer churn is a pressing problem for telecom companies. In this project, we simulate
one such case using data on postpaid customers under contract. The data covers customer
usage behavior, contract details and payment details, and it also indicates which
customers cancelled their service. Based on this historical data, we need to build a
model that predicts whether a customer will cancel their service in the future.

Project Objective

• EDA - Basic data summary, Univariate, Bivariate analysis, graphs

• EDA - Check for Outliers and missing values and check the summary of the dataset
• EDA - Check for Multicollinearity - Plot the graph based on Multicollinearity & treat it.
• EDA - Summarize the insights you get from EDA
• Applying Logistic Regression
• Interpret Logistic Regression
• Applying KNN Model
• Interpret KNN Model
• Applying Naive Bayes Model
• Interpret Naive Bayes Model
• Confusion matrix interpretation for all models
• Interpretation of other Model Performance Measures for logistic <KS, AUC, GINI>
• Remarks on Model validation exercise <Which model performed the best>
• Actionable Insights and Recommendations


Project Report

EDA - Basic data summary, Univariate, Bivariate analysis, graphs

After loading the necessary R libraries and setting the working directory, the dataset is
loaded into R. An initial glimpse of the data follows:

The dataset has 10 independent variables and 1 dependent variable (‘Churn’). It has 3333
rows, which are later split into training and test datasets for model building.

Data Description:

Variable          Description
Churn             1 if customer cancelled service, 0 if not
AccountWeeks      number of weeks customer has had active account
ContractRenewal   1 if customer recently renewed contract, 0 if not
DataPlan          1 if customer has data plan, 0 if not
DataUsage         gigabytes of monthly data usage
CustServCalls     number of calls into customer service
DayMins           average daytime minutes per month
DayCalls          average number of daytime calls
MonthlyCharge     average monthly bill
OverageFee        largest overage fee in last 12 months
RoamMins          average number of roaming minutes

Initial summary of the data


Data structure just after loading

Missing values were checked, and the dataset has none.
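A check of this kind can be sketched in base R as follows. The data frame `toy` and its values are made up for illustration and are not the project data; the report does not show which call was actually used.

```r
# Toy data frame standing in for the real dataset (values are illustrative only)
toy <- data.frame(
  Churn     = c(0, 1, 0, 0),
  DayMins   = c(265.1, NA, 243.4, 299.4),
  DataUsage = c(2.7, 3.7, 0.0, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE only when the whole data frame is free of missing values
no_missing <- !anyNA(toy)
print(no_missing)
```

On the real dataset, every per-column count would be zero if there are no missing values.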

The binary variables were then converted from numeric to factor:

mycell$Churn=as.factor(mycell$Churn)
mycell$ContractRenewal=as.factor(mycell$ContractRenewal)
mycell$DataPlan=as.factor(mycell$DataPlan)

Final summary of the dataset


Univariate Analysis

14% of the customers (483) have cancelled service, while 86% (2850) have continued.

Customer details by account weeks:

Weeks        #Customers   %
<25 weeks    94           3
25-49        232          7
50-74        531          16
75-99        770          23
100-150      1350         41
>150         356          11
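A table like this can be produced with `cut()`. The sketch below uses made-up account ages, and the bin edges are an assumption read off the ranges in the table; the report does not show the code that was used.

```r
# Illustrative account ages in weeks (not the project data)
account_weeks <- c(10, 30, 60, 80, 99, 100, 120, 150, 151, 200)

# Assumed bin edges; right = FALSE makes each bin include its lower edge
# and exclude its upper edge, e.g. [25, 50)
breaks <- c(0, 25, 50, 75, 100, 151, Inf)
labels <- c("<25", "25-49", "50-74", "75-99", "100-150", ">150")
bins <- cut(account_weeks, breaks = breaks, labels = labels, right = FALSE)

# Count per bin and share of all customers, rounded to whole percent
counts <- table(bins)
percent <- round(100 * prop.table(counts))
print(cbind(Customers = counts, Percent = percent))
```

Because the bins are disjoint and cover the whole range, the counts add up to the number of customers.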

90% of the customers have renewed their service recently.


72% of the customers do not have a data plan, while 28% have one.

Data usage: mean = 0.81, std. dev. = 1.2727, outliers: 11
Service calls: mean = 1.56, std. dev. = 1.3154, outliers: 267
Daytime usage: mean = 179.8, std. dev. = 54.467, outliers: 25
Day calls: mean = 100.4, std. dev. = 20.069, outliers: 23
Monthly charges: mean = 56.31, std. dev. = 16.426, outliers: 34
Overage fee: mean = 10.05, std. dev. = 2.535, outliers: 24
Roaming minutes: mean = 10.24, std. dev. = 2.791, outliers: 46

Outliers were not treated in this exercise, as treatment was not explicitly required.
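The report does not say how the outliers were counted. One common convention, assumed here, is the boxplot rule, which flags points lying more than 1.5 times the interquartile range beyond the hinges. A minimal sketch on made-up values:

```r
# Illustrative values (not the project data); 40 is an obvious outlier
x <- c(8, 9, 10, 10, 11, 11, 12, 12, 13, 40)

# boxplot.stats() returns the points lying beyond 1.5 * IQR from the hinges
outliers <- boxplot.stats(x)$out
n_outliers <- length(outliers)

print(outliers)
print(n_outliers)
```

A different rule (for example, a z-score cutoff) would generally give different counts.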

Bivariate Analysis

We will analyze how each independent variable relates to the dependent variable
(customer churn).

Accountweeks Vs Customer Churn

No major trend is visible.


Contract Renewal Vs Churn

Customers who have not renewed their contract churn with a probability of 42%.
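A conditional churn rate of this kind is a row-wise proportion of a two-way table. The sketch below uses made-up values, not the project data, so its numbers do not match the 42% above.

```r
# Illustrative data (not the project data): 1 = yes, 0 = no
toy <- data.frame(
  ContractRenewal = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  Churn           = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Cross-tabulate, then convert each row to proportions (margin = 1)
counts <- table(ContractRenewal = toy$ContractRenewal, Churn = toy$Churn)
churn_rate <- prop.table(counts, margin = 1)
print(round(churn_rate, 2))
```

The entry in row `0`, column `1` is the share of non-renewing customers who churned.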

Dataplan Vs Churn

Customers without a data plan have a higher probability of churn than those with one.

Datausage Vs Churn

The probability of churn is high when data usage is very low or zero.

Customer service calls Vs Churn

The churn percentage is higher when the number of calls is between 3 and 7.

Daytime usage Vs Churn

As daytime usage rises above 120, the probability of churn also increases.

Daycalls Vs Churn

The probability of churn is higher when day calls are between 66 and 132.

Monthly charge Vs Churn

The probability of churn is higher when the monthly charge is between 33 and 78.

Overage Fee Vs Churn

The probability of churn is higher when the overage fee is between 7.28 and 14.6.

Roaming Vs Churn

The probability of churn is higher when roaming minutes are between 8 and 14.

Multicollinearity:

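library(psych)     # cor.plot()
library(corrplot)  # corrplot()
# mycellcor is the correlation matrix of the numeric columns (its construction is not shown in the report)
cor.plot(mycellcor)
corrplot(mycellcor,method="number")
corrplot(mycellcor,method="ellipse")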

Multicollinearity exists between DataUsage and DataPlan, and between MonthlyCharge and
both DataPlan and DataUsage. It will be treated after confirming the VIF values in the
logistic regression using vif_func(). The resulting variable selection is reused for the
logistic regression, KNN and Naïve Bayes models by dropping the affected fields.
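The report does not show how `mycellcor` was built. A plausible construction, assumed here, is `cor()` applied to the numeric columns. The sketch below uses simulated columns (not the project data) in which the monthly charge is deliberately derived from the other two, so that a strong correlation appears.

```r
# Illustrative numeric columns (simulated, not the project data)
set.seed(1)
data_usage     <- runif(50, 0, 5)
day_mins       <- rnorm(50, mean = 180, sd = 50)
# Monthly charge is built from the other two so that it correlates with them
monthly_charge <- 20 + 10 * data_usage + 0.1 * day_mins + rnorm(50, sd = 1)

toy <- data.frame(DataUsage = data_usage, DayMins = day_mins,
                  MonthlyCharge = monthly_charge)

# Pearson correlation matrix of the numeric columns
toy_cor <- cor(toy)
print(round(toy_cor, 2))
```

Large off-diagonal entries in such a matrix are the first hint of multicollinearity, which the VIF check then confirms.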


Model Building

A logistic regression model is built first.

The training and test datasets are created with a 70%/30% split.

library(caTools)  # sample.split()
set.seed(101)
# sample.split() expects the label vector, so the split is stratified on Churn
mysplit=sample.split(mycell$Churn,SplitRatio = 0.7)
mycell_train=subset(mycell[,c(-12)],mysplit==TRUE)
mycell_test=subset(mycell[,c(-12)],mysplit==FALSE)
str(mycell_train)
str(mycell_test)

A first-draft logistic regression is run on the training dataset with all columns
included, and the corresponding VIF values are checked to address the
multicollinearity.

library(car)  # vif()
myglm1=glm(Churn~.,data=mycell_train,family="binomial")
summary(myglm1)
vif(myglm1)


Vif output:

As the correlation plot suggested, the DataPlan, DataUsage, DayMins and MonthlyCharge
variables have high VIF values.
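For reference, the textbook VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand on the built-in `mtcars` data, used only as a stand-in; `manual_vif` is a hypothetical helper, not the `vif()` or `vif_func()` used in this report, and the values `vif()` reports for a logistic model may differ somewhat.

```r
# Manual VIF for numeric predictors: regress each predictor on the others
manual_vif <- function(predictors) {
  sapply(names(predictors), function(name) {
    others <- setdiff(names(predictors), name)
    fit <- lm(reformulate(others, response = name), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Built-in mtcars data, used only as a stand-in for the churn predictors
vifs <- manual_vif(mtcars[, c("disp", "hp", "wt", "drat")])
print(round(vifs, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; the threshold of 5 used below is a common rule of thumb.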

We will use stepwise reduction with vif_func to remove variables:

vif_func=dget("vif_func.R")
myvif=vif_func(in_frame=mycell_train,thresh=5,trace=TRUE)

Based on the above output, we will drop MonthlyCharge and DataUsage from the regression.


A new regression is built without these two variables to remove the multicollinearity effect:

myglm2=glm(Churn~. -MonthlyCharge -DataUsage,data=mycell_train,family="binomial")
summary(myglm2)
vif(myglm2)

The VIF values are checked again and are within the accepted range.

The variables AccountWeeks and DayCalls are not significant in the regression and can be
dropped from the equation.

The final regression is built on the training set without these two variables:

myglm3=glm(Churn~. -MonthlyCharge -DataUsage -AccountWeeks -DayCalls,data=mycell_train,family="binomial")
summary(myglm3)
vif(myglm3)


Summary of the final regression in the training dataset:

VIF output

We will plot the prediction to check the threshold on the training dataset.


Based on the plot above, we will use a cutoff of 0.16 for prediction.

pred2.churn=ifelse(pred.test>0.16,1,0)

Confusion matrix on the training data set:

As the confusion matrix measures look well balanced after the threshold adjustment, we
apply the model to the test dataset with the same threshold of 0.16.
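The thresholding step and the resulting confusion matrix can be sketched in base R as follows. The probabilities and labels are made up, and treating class 1 (churn) as the positive class is an assumption; the report's figures may use the opposite convention.

```r
# Illustrative predicted probabilities and true labels (not the project data)
prob   <- c(0.05, 0.10, 0.18, 0.30, 0.12, 0.60, 0.15, 0.40)
actual <- c(0,    0,    0,    1,    1,    1,    0,    1)

cutoff <- 0.16
predicted <- ifelse(prob > cutoff, 1, 0)

# Rows are predictions, columns are actual classes
cm <- table(Predicted = factor(predicted, levels = c(0, 1)),
            Actual    = factor(actual,    levels = c(0, 1)))
print(cm)

# Class 1 (churn) is treated as the positive class here
accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Lowering the cutoff flags more customers as churners, which raises sensitivity for the churn class at the cost of specificity.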

ROC Plot:

AUC : 81.25%

GINI: 62.5%
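The Gini coefficient follows from the AUC as GINI = 2 × AUC − 1, which matches the figures above (2 × 0.8125 − 1 = 0.625). The sketch below computes both on made-up scores using the rank formulation of the AUC; `auc_from_scores` is a hypothetical helper, and the report does not show which package produced its AUC.

```r
# AUC via the rank (Mann-Whitney) formulation
auc_from_scores <- function(score, label) {
  r <- rank(score)                      # average ranks handle ties
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Illustrative scores and labels (not the project data)
score <- c(0.05, 0.10, 0.18, 0.30, 0.12, 0.60, 0.15, 0.40)
label <- c(0,    0,    0,    1,    1,    1,    0,    1)

auc  <- auc_from_scores(score, label)
gini <- 2 * auc - 1
print(c(AUC = auc, GINI = gini))
```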


Confusion Matrix

KS Plots

KS Value: 0.5268
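The KS statistic is the largest gap between the cumulative score distributions of churners and non-churners. A minimal sketch of that definition on made-up scores follows; `ks_statistic` is a hypothetical helper, not the function used to produce the value above.

```r
# KS statistic: largest gap between the cumulative score distributions
# of the two classes, taken over all observed score thresholds
ks_statistic <- function(score, label) {
  thresholds <- sort(unique(score))
  cdf_pos <- sapply(thresholds, function(t) mean(score[label == 1] <= t))
  cdf_neg <- sapply(thresholds, function(t) mean(score[label == 0] <= t))
  max(abs(cdf_pos - cdf_neg))
}

# Illustrative scores and labels (not the project data)
score <- c(0.05, 0.10, 0.18, 0.30, 0.12, 0.60, 0.15, 0.40)
label <- c(0,    0,    0,    1,    1,    1,    0,    1)

ks <- ks_statistic(score, label)
print(ks)
```

A KS of 0 means the model does not separate the classes at all, and a KS of 1 means perfect separation.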


Summary of the final logistic regression model

Odds Ratio

Interpretation of Logistic regression model / Odds ratio:

Independent variables with high positive z values (CustServCalls, DayMins, OverageFee and
RoamMins) are highly significant and increase the likelihood of churn.

The odds ratios show a negative relationship for the ContractRenewal and DataPlan
variables and a positive relationship for CustServCalls, DayMins, OverageFee and RoamMins.
For the variables with a positive relationship, each unit increase multiplies the odds
of churn by the factor shown in the odds ratio (OR) table.
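Odds ratios are obtained by exponentiating the fitted coefficients, which are on the log-odds scale. The sketch below uses the built-in `mtcars` data as a stand-in, since the churn model itself cannot be reproduced here; the variable names are those of `mtcars`, not of this project.

```r
# Built-in mtcars data as a stand-in: model the odds of a manual gearbox (am = 1)
fit <- glm(am ~ hp + wt, data = mtcars, family = "binomial")

# Coefficients are on the log-odds scale; exponentiating gives odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 4))
```

An odds ratio above 1 indicates a positive relationship with the outcome and one below 1 a negative relationship. For example, an odds ratio of 1.5 means each one-unit increase multiplies the odds by 1.5.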

Variable importance:

The model has an accuracy of 82.6%, with a sensitivity of 87% and a specificity of 60%.
Area under the curve: 81.25%; GINI: 62.5%.
KS score of the model: 53%.


KNN Model

Before creating the model, the dataset is normalized with min-max scaling.

norm = function(x) { (x- min(x))/(max(x) - min(x)) }

# Scale only the predictors; column 1 (Churn) is kept unscaled as the label
mycell_orig.data = as.data.frame(lapply(mycell_orig[,-1], norm))
mycell_norm.data = cbind(Churn = mycell_orig[,1], mycell_orig.data)

The training and test split is then made from the normalized dataset.

The best value of K is found using the train function. We use the same independent
variables that qualified in the logistic regression (after accounting for
multicollinearity and significance).

This returns a best fit of K=5, which we use for model building.
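The report does not show the `train` call. As a simpler stand-in, and not the resampling procedure that `train` performs, the sketch below picks K by comparing accuracy on a single held-out validation set. It uses `knn()` from the `class` package on the built-in `iris` data, so the selected K has no bearing on the K=5 found above.

```r
library(class)  # knn()

set.seed(101)
# Built-in iris data as a stand-in; the four numeric columns are the predictors
idx   <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[idx, ]
valid <- iris[-idx, ]

# Validation accuracy for each candidate k
candidate_k <- c(1, 3, 5, 7, 9)
accuracy <- sapply(candidate_k, function(k) {
  pred <- knn(train[, 1:4], valid[, 1:4], cl = train$Species, k = k)
  mean(pred == valid$Species)
})
names(accuracy) <- candidate_k

# which.max() keeps the smallest k when several are tied for best
best_k <- candidate_k[which.max(accuracy)]
print(accuracy)
print(best_k)
```

Odd values of K are used to reduce ties in the majority vote.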

library(class)  # knn()
mypred1 = knn(mycellknn_train[,c(3,4,6,7,10,11)], mycellknn_test[,c(3,4,6,7,10,11)], mycellknn_train$Churn, k = 5)

Confusion matrix


ROC Plots

AUC : 73.73%

GINI : 47.45%

KS Plots:

KS Statistic : 47.45


Interpretation of KNN Model

Variable importance of the model can be found using varImp as below

Variable importance:

The model has an accuracy of 91%, with a sensitivity of 99% and a specificity of 48%.
AUC = 73.73%, GINI = 47.45% and KS = 47.45%.

With a small value of K, the classification is more sensitive to noise and tends to
overfit, whereas with a large K the decision boundary is over-smoothed and the model
tends to underfit.

Naïve Bayes Model

We use the same training and test split as for the logistic regression.

Plotting the prediction on the training dataset.


After a few iterations, a threshold of 0.15 was found to give the best results.

nb_train.churn=ifelse(nbpred1[,2]>.15,1,0)

ROC & AUC on the training dataset

Confusion matrix

AUC = 87%

As the model performance measures are good on the training set, we apply the model to the
test dataset.


Prediction plot: threshold of > 0.15

ROC Plot

Confusion Matrix

AUC = 85.69%
GINI = 71.38%


We will calculate the KS statistic.

KS = 67.23%

Interpretation of Naïve Bayes model:

This model has an accuracy of 86%, with sensitivity and specificity of 90% and 60%, respectively.
AUC = 86%, GINI = 71% and KS = 67%.

Model performance measures chart:

Measure              Logistic Regression   KNN Model   Naïve Bayes
Confusion matrix
  Accuracy           80%                   91%         85%
  Sensitivity        82%                   99%         86%
  Specificity        67%                   48%         81%
  Balanced Accuracy  74%                   74%         84%
AUC                  81%                   74%         86%
GINI                 63%                   47%         71%
KS                   53%                   47%         67%
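Balanced accuracy is the mean of sensitivity and specificity, so it can be checked directly from the sensitivity and specificity rows of the chart. A minimal sketch, with `balanced_accuracy` as a hypothetical helper:

```r
# Balanced accuracy is the mean of sensitivity and specificity
balanced_accuracy <- function(sensitivity, specificity) {
  (sensitivity + specificity) / 2
}

# Sensitivity and specificity as reported in the chart
ba <- c(
  Logistic   = balanced_accuracy(0.82, 0.67),
  KNN        = balanced_accuracy(0.99, 0.48),
  NaiveBayes = balanced_accuracy(0.86, 0.81)
)
print(round(ba, 3))
```

Unlike plain accuracy, this measure is not inflated by the 86% majority class, which is why a model with high accuracy but low specificity scores much lower on it.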


On the confusion matrix measures, KNN scores highest on accuracy and sensitivity but
lowest on specificity, while NB is the most balanced overall, closely followed by
logistic regression.

On the other measures (AUC, GINI and KS), NB outscores both logistic regression and KNN.

Result: NB is the best model for this dataset based on the above measures.

Actionable insights & Recommendations:

Based on the variable importance measures and the bivariate analysis, it is evident that:

Customers with higher daytime usage have a higher probability of churning, perhaps
because they are looking for faster or better service. They should be identified and
offered free additional data plans or other features.

Customers who make more calls to customer service have a higher probability of churning.
The company can look into better service models, self-service portals and apps, and
incident analysis on the calls to pre-empt customers with solutions.

Customers who have not renewed their contract recently have a higher probability of
churning. The company can look into attractive new packages, longer-duration contracts
and similar offers to lock in customers.


References:

Great Learning Videos & Course Materials

CRAN package documentation
