Machine Learning Project Documentation
Project Title
Customer Churn Prediction using Machine Learning
Table of Contents
1. Introduction
2. Problem Statement
3. Dataset Description
4. Tools and Technologies
5. Methodology
6. Model Development
7. Evaluation Metrics
8. Results
9. Conclusion
10. Future Work
11. References
Introduction
This project predicts customer churn at a telecom company by training machine learning models on
historical customer data.
Problem Statement
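Retaining existing customers is a priority for the company. The project treats churn as a binary
classification task: it predicts whether a customer is likely to leave, so that at-risk customers
can be targeted with proactive engagement strategies.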
Dataset Description
Source: Kaggle - Telco Customer Churn
Records: 7,043
Features: Customer demographics, service plans, billing info
Target Variable: Churn (Yes/No)
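As an illustration of preparing the target variable, the sketch below converts Churn (Yes/No) to 1/0 and coerces a billing column to numeric. It assumes the column names TotalCharges and Churn from the public Kaggle CSV and that some TotalCharges entries may be blank; neither detail is stated in this document, so verify both against your copy of the data. The function name prepare_churn_frame and the three sample rows are hypothetical.

```python
import pandas as pd


def prepare_churn_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with numeric TotalCharges and a 0/1 Churn target."""
    df = raw.copy()
    # Assumption: TotalCharges may arrive as text with blank entries,
    # so non-numeric values are coerced to NaN and those rows dropped.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna(subset=["TotalCharges"])
    # Encode the target: Yes -> 1 (churned), No -> 0 (retained).
    df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
    return df.reset_index(drop=True)


# Made-up rows standing in for the real file.
sample = pd.DataFrame(
    {
        "tenure": [1, 0, 34],
        "TotalCharges": ["29.85", " ", "1889.5"],
        "Churn": ["No", "No", "Yes"],
    }
)
clean = prepare_churn_frame(sample)
print(clean)
```

With the real dataset, the same function could be applied to the frame returned by pd.read_csv on the downloaded file.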
Tools and Technologies
Language: Python
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost
IDE: Jupyter Notebook / VS Code
Methodology
1. Data Cleaning
2. Exploratory Data Analysis (EDA)
3. Feature Engineering
4. Model Selection
5. Model Training
6. Model Evaluation
7. Deployment (optional)
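Steps 3 to 6 can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual code: the feature names (tenure, MonthlyCharges, Contract), the twelve toy rows, and the choice of scaling plus one-hot encoding are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the cleaned dataset; every value here is made up.
df = pd.DataFrame(
    {
        "tenure": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13, 58, 49],
        "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10,
                           29.75, 104.80, 56.15, 49.95, 100.35, 103.70],
        "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                     "Month-to-month", "Month-to-month", "Month-to-month",
                     "Month-to-month", "One year", "Month-to-month",
                     "One year", "Two year"],
        "Churn": ["No", "No", "Yes", "No", "Yes", "No",
                  "No", "Yes", "No", "Yes", "No", "Yes"],
    }
)

X = df.drop(columns="Churn")
y = (df["Churn"] == "Yes").astype(int)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature engineering: scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ]
)

# Model selection and training: one candidate model behind the preprocessing.
model = Pipeline(
    [
        ("preprocess", preprocess),
        ("classifier", LogisticRegression(max_iter=1000)),
    ]
)
model.fit(X_train, y_train)

# Model evaluation starts from predictions on the held-out rows.
predictions = model.predict(X_test)
print(predictions)
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training rows only, which avoids leaking test-set statistics into training.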
Model Development
Models used:
- Logistic Regression
- Random Forest
- XGBoost
Hyperparameters were tuned with scikit-learn's GridSearchCV, which evaluates every combination in
a parameter grid using cross-validation.
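The sketch below shows one way such a grid search might look. The parameter grid, the F1 scoring choice, the 3-fold setting, and the synthetic data from make_classification are assumptions for illustration; the grids actually used in the project are not recorded here. A random forest is shown because it ships with scikit-learn, and any estimator following the scikit-learn interface can be passed in its place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced data standing in for the churn features.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: 2 x 2 = 4 candidate settings, each scored by 3-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean cross-validated F1:", search.best_score_)
# By default the best setting is refitted on all training data,
# so the search object can score the held-out set directly.
print("Held-out F1:", search.score(X_test, y_test))
```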
Evaluation Metrics
- Accuracy
- Precision, Recall
- F1 Score
- ROC curve and ROC-AUC (area under the ROC curve)
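To make these metrics concrete, the following sketch computes accuracy, precision, recall, and ROC-AUC from scratch on eight made-up predictions. The labels, scores, 0.5 threshold, and helper names are illustrative assumptions; in practice the equivalent functions in sklearn.metrics would normally be used.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all predictions that are right
precision = tp / (tp + fp)           # share of predicted churners who churned
recall = tp / (tp + fn)              # share of actual churners who were caught
auc = roc_auc(y_true, scores)

print(accuracy, precision, recall, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and summarises ranking quality across all thresholds.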
Results
Of the three models, XGBoost performed best, with:
- Accuracy: 82%
- Precision: 79%
- Recall: 76%
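The F1 score is listed among the evaluation metrics but is not reported above. Because F1 is the harmonic mean of precision and recall, it can be estimated from the two reported figures. The sketch assumes both figures refer to the churn (positive) class and are rounded, so the result is approximate rather than a measured value.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported XGBoost figures: precision 79%, recall 76%.
f1 = f1_from_precision_recall(0.79, 0.76)
print(f"Estimated F1: {f1:.3f}")
```

This puts the implied F1 score at roughly 77%.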
Conclusion
XGBoost, the best of the three models tested, predicted customer churn with 82% accuracy, 79%
precision, and 76% recall. Predictions at this level can help target customer retention efforts.
Future Work
- Integrate with a real-time dashboard
- Evaluate deep learning models to see whether they improve accuracy
- Collect more customer behavior data
References
- Kaggle Telco Dataset: https://www.kaggle.com/blastchar/telco-customer-churn
- Scikit-learn Documentation: https://scikit-learn.org/stable/