Predicting Mobile Data Usage

This project explores the application of machine learning algorithms to predict daily mobile data usage based on user behavior and smartphone characteristics, utilizing a dataset of 700 records. Three models were trained: Linear Regression, Random Forest, and Gradient Boosting, with Gradient Boosting achieving the best performance. Key features influencing data usage were identified as Battery Drain and App Usage Time, and recommendations for further model improvement were provided.

Uploaded by

Loweh Fonyuy


Table of Contents

PREFACE
RESEARCH
Introduction
Project Overview
Dataset
Approach
Problem Definition
Algorithms Used
a. Linear Regression
b. Random Forest Regressor
c. Gradient Boosting Regressor
Exploratory Data Analysis (EDA)
Dataset Overview
Target Variable Distribution
Target Variable Distribution
Preprocessing & Feature Engineering
Steps
Train-Test Split
Model Training & Evaluation
Models Used
Training Code
Model initialization and fitting
Evaluation Metrics
Making a prediction on new data
Saving the Model for Future Use
Feature Importance
Conclusion
What We Can Do to Improve the Model Further
Final Thoughts

PREFACE

This project investigates how different machine learning algorithms can be applied to predict
daily mobile data usage based on user behavior and smartphone characteristics. With the
growing need for efficient data plan management and usage forecasting, this project seeks to
demonstrate the practicality of predictive models like Linear Regression, Random Forest, and
Gradient Boosting in estimating mobile data consumption. The goal is to provide insight into
how such models can assist telecom companies, device manufacturers, and end-users in
understanding and managing mobile data consumption patterns.

RESEARCH

We utilized the Smartphone Usage and Behavioral Dataset sourced from Kaggle. This dataset
contains 700 records and 11 features, including App Usage Time, Screen On Time, Battery
Drain, Number of Installed Apps, Age, Gender, Device Model, and Operating System. The target
variable, Daily Data Usage (in MB), is continuous.

To begin, we conducted exploratory data analysis (EDA) using tools such as Pandas, Seaborn,
and Matplotlib. This helped us understand the distribution of data, detect outliers, and examine
relationships between features and the target variable. We observed a right-skewed distribution
for data usage, indicating that most users consume moderate amounts of data while a few
consume very high amounts.

Following the EDA, we preprocessed the data through One-Hot Encoding for categorical
variables and Standard Scaling for numeric variables. The dataset was then split into training and
testing subsets in an 80:20 ratio.

We trained three machine learning models (Linear Regression, Random Forest Regressor, and
Gradient Boosting Regressor) and compared their performance using two evaluation metrics:
Mean Absolute Error (MAE) and Root Mean Squared Log Error (RMSLE). Gradient Boosting
yielded the best results in our experiments, suggesting its suitability for this kind of regression
task. Furthermore, we examined feature importance, revealing that Battery Drain and App Usage
Time were the most influential predictors of daily mobile data consumption.

Introduction
Objective: Predict daily mobile data usage (MB/day) based on user behavior and device
characteristics.

Project Overview

We'll analyze a dataset containing information about mobile device usage and user behavior to
predict daily data consumption. The dataset includes features like app usage time, screen time,
battery drain, number of apps installed, and demographic information.

Dataset:

● 700 rows, 11 features (e.g., App Usage Time, Screen On Time, Battery Drain, Age, Gender).
● Target Variable: Data Usage (MB/day) (continuous).
● Source: Smartphone Usage and Behavioral Dataset

Approach:

1. Exploratory Data Analysis (EDA)

2. Preprocessing & Feature Engineering
3. Model Training (3 Algorithms)
4. Evaluation & Comparison

Problem Definition

We're trying to predict how much mobile data (in MB) a user will consume per day based on
their device characteristics and usage patterns. This is a regression problem since we're
predicting a continuous numerical value.

Figure 1: Image shows a code snippet that shows the project dependencies.

Algorithms Used

a. Linear Regression

● Type: Simple and interpretable linear model.
● How it works: It finds the best-fitting straight line through the data by
minimizing the difference between predicted and actual values (using least
squares).
● Use case: Good for baseline models and when relationships between features and
the target are mostly linear.
● Equation
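
For reference, the standard form of a multiple linear regression model, together with the least-squares objective it minimizes, can be written as below. The symbols used here (p features x_j, coefficients β_j, n training rows) are notation chosen for illustration and are not taken from the report's original figure.

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
\min_{\beta_0, \dots, \beta_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```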

b. Random Forest Regressor

● Type: Ensemble model (uses multiple decision trees).
● How it works: Builds many decision trees on random subsets of the data and
averages their predictions to reduce overfitting and improve accuracy.
● Strength: Handles non-linear relationships well and does not depend on feature scaling;
in scikit-learn, categorical variables must still be encoded numerically before training.

● Key concept: Bagging – training each tree on a different random sample of the
data.
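
The bagging idea can be sketched by hand in a few lines. This is a minimal illustration on synthetic data (the data, tree count, and seeds are assumptions, not the report's setup): each tree is trained on a bootstrap sample and the predictions are averaged, which is roughly what scikit-learn's RandomForestRegressor automates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic, non-linear data used only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) * 50 + X[:, 1] ** 2 + rng.normal(0, 5, size=300)

# Manual bagging: each tree sees a bootstrap sample (rows drawn with replacement).
trees = []
for seed in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=seed).fit(X[idx], y[idx]))

# The ensemble prediction is the average of the individual tree predictions.
bagged_pred = np.mean([tree.predict(X) for tree in trees], axis=0)

# RandomForestRegressor wraps the same idea (and can also subsample features
# at each split through its max_features parameter).
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)
forest_pred = forest.predict(X)

print("Manual bagging MAE:", np.mean(np.abs(bagged_pred - y)))
print("Random forest MAE: ", np.mean(np.abs(forest_pred - y)))
```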

c. Gradient Boosting Regressor

● Type: Ensemble model using boosting.
● How it works: Builds decision trees sequentially, where each new tree learns
from the errors of the previous ones.
● Strength: Highly accurate, great for capturing complex patterns in data.
● Key concept: Boosting, i.e., correcting the previous models' mistakes step by step to
improve performance.
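
To make the sequential error-correction idea concrete, here is a minimal hand-rolled sketch of boosting with squared error on synthetic data. The learning rate, tree depth, and number of rounds are illustrative assumptions; GradientBoostingRegressor implements a more complete version of the same loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data used only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) * 50 + rng.normal(0, 5, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant prediction
errors = []

for _ in range(50):
    residuals = y - prediction  # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # small corrective step
    errors.append(np.mean((y - prediction) ** 2))

print("Training MSE after round 1: ", round(errors[0], 2))
print("Training MSE after round 50:", round(errors[-1], 2))
```

Each round fits a small tree to the current residuals, so the training error shrinks as rounds accumulate.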

Exploratory Data Analysis (EDA)

Dataset Overview

Figure 2: Image shows a code snippet that prints the first 5 samples in our dataset.

Figure 3: Image shows a code snippet that displays the information about the dataset.

Here we have imported the data and printed the first 5 rows. To import the data we used Pandas.

Key Observations:
● Mixed data types (numeric + categorical).
● No missing values (the DataFrame's info() output confirms all columns are complete).
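
The checks behind these observations might look like the sketch below. The rows are made up and the column names are assumptions about the Kaggle file, so treat this as an illustration of the pandas calls rather than the report's actual code.

```python
import pandas as pd

# Made-up rows standing in for the real CSV; column names are assumed.
df = pd.DataFrame({
    "Device Model": ["Google Pixel 5", "iPhone 12", "Xiaomi Mi 11"],
    "Operating System": ["Android", "iOS", "Android"],
    "App Usage Time (min/day)": [393, 268, 154],
    "Battery Drain (mAh/day)": [1872, 1331, 761],
    "Data Usage (MB/day)": [1122, 944, 322],
})

print(df.head())  # first rows (up to 5)
df.info()         # dtypes and non-null counts per column

# Count missing values per column; all zeros means the data is complete.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```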

Target Variable Distribution

Figure 4: Image shows a distribution plot of the target variable.

Interpretation:
● Right-skewed distribution.
● Most users consume 300 to 1000 MB/day, with outliers (>2000 MB).
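
Right skew can also be checked numerically. The sketch below uses a synthetic log-normal sample as a stand-in for the real target column (the distribution and its parameters are assumptions); a positive skewness and a mean above the median both point to a long right tail.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the target column; not the real data.
rng = np.random.default_rng(42)
usage = pd.Series(rng.lognormal(mean=6.5, sigma=0.6, size=700),
                  name="Data Usage (MB/day)")

skewness = usage.skew()                          # > 0 indicates right skew
share_in_band = usage.between(300, 1000).mean()  # fraction within 300-1000 MB

print("Skewness:", round(skewness, 2))
print("Mean vs. median:", round(usage.mean(), 1), round(usage.median(), 1))
print("Share between 300 and 1000 MB/day:", round(share_in_band, 2))
```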

Target Variable Distribution

Next, we draw a boxplot of Data Usage for each User Behavior Class.

Figure 5: Image shows a plot of user behavior class against their data usage.
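
A boxplot summarizes the quartiles of each group, and the same numbers can be read off with a groupby. The rows and column names below are illustrative assumptions.

```python
import pandas as pd

# Made-up rows; column names are assumed.
df = pd.DataFrame({
    "User Behavior Class": [1, 1, 2, 2, 3, 3],
    "Data Usage (MB/day)": [120, 180, 400, 460, 800, 900],
})

# Quartiles per class: the values a boxplot draws as the box and median line.
summary = (
    df.groupby("User Behavior Class")["Data Usage (MB/day)"]
    .describe()[["25%", "50%", "75%"]]
)
print(summary)

# With seaborn installed, the plot itself could be drawn with:
# sns.boxplot(data=df, x="User Behavior Class", y="Data Usage (MB/day)")
```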

Preprocessing & Feature Engineering

Steps

1. Categorical Encoding:

○ OneHotEncoder for Device Model, Operating System, Gender.
2. Numerical Scaling:
○ StandardScaler for Battery Drain, Screen On Time, etc.

Figure 6: Image shows the feature extraction and preprocessing of categorical data.
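
One common way to combine the two steps is scikit-learn's ColumnTransformer. The sketch below is an assumption about how the preprocessing could be wired up, using made-up rows and assumed column names; the report's own code may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows; column names are assumed.
df = pd.DataFrame({
    "Device Model": ["Google Pixel 5", "iPhone 12", "Xiaomi Mi 11", "iPhone 12"],
    "Operating System": ["Android", "iOS", "Android", "iOS"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Battery Drain (mAh/day)": [1872, 1331, 761, 1500],
    "Screen On Time (hours/day)": [6.4, 4.7, 4.0, 5.1],
})

categorical = ["Device Model", "Operating System", "Gender"]
numeric = ["Battery Drain (mAh/day)", "Screen On Time (hours/day)"]

preprocessor = ColumnTransformer(
    [
        # One binary column per category; unseen categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        # Zero mean and unit variance for each numeric column.
        ("num", StandardScaler(), numeric),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)  # 3 + 2 + 2 one-hot columns plus 2 scaled numeric columns
```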

Train-Test Split

80% Train, 20% Test (train_test_split).

Figure 7: Image shows how the dataset has been split into training and test sets.
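
A minimal sketch of the split, using placeholder arrays with the same number of rows as the dataset. The random_state value is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and target with 700 rows, matching the dataset size.
X = np.arange(700 * 3).reshape(700, 3)
y = np.arange(700)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

With 700 records, this leaves 560 rows for training and 140 for testing.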

Model Training & Evaluation

Models Used

Training Code

Model initialization and fitting.

Figure 8: Image shows model initialization and fitting.
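
Initializing and fitting the three models could look like the sketch below. It runs on synthetic data, and the hyperparameters shown are assumptions, since the report does not state the settings it used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 4))
y = 800 + 300 * X[:, 0] + 150 * X[:, 1] + rng.normal(0, 50, size=700)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three model families compared in the report (settings are assumed).
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "R^2 on test set:", round(model.score(X_test, y_test), 3))
```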

Evaluation Metrics

Model Comparison:

Linear Regression Performance:

Mean Absolute Error: 117.04
Root Mean Squared Log Error: 0.2014

Random Forest Performance:

Mean Absolute Error: 114.74
Root Mean Squared Log Error: 0.2023

Gradient Boosting Performance:

Mean Absolute Error: 113.64
Root Mean Squared Log Error: 0.1980

Interpretation:
● Gradient Boosting performs best (lowest MAE and RMSLE), although the margins over the other two models are small.
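
Both metrics can be computed with scikit-learn. The sketch below uses made-up true and predicted values; RMSLE is taken as the square root of mean_squared_log_error, which works on log(1 + value) and therefore requires non-negative inputs.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_log_error

# Made-up values in MB/day, for illustration only.
y_true = np.array([300.0, 500.0, 1000.0, 2000.0])
y_pred = np.array([350.0, 450.0, 900.0, 2300.0])

# MAE: average absolute error, in the same units as the target (MB/day).
mae = mean_absolute_error(y_true, y_pred)

# RMSLE: root of the mean squared difference of log(1 + value);
# it penalizes relative errors rather than absolute ones.
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))

print("MAE:", mae)
print("RMSLE:", round(rmsle, 4))
```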

Making a prediction on new data

Figure 9: Image shows sample predictions made with the model.
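
Predicting for a new user requires applying the same preprocessing as at training time. One way to guarantee that, sketched here on made-up rows with assumed column names, is to wrap the preprocessing and the model in a single Pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows; column names are assumed.
train = pd.DataFrame({
    "Operating System": ["Android", "iOS", "Android", "iOS", "Android", "iOS"],
    "App Usage Time (min/day)": [60, 120, 240, 300, 420, 540],
    "Battery Drain (mAh/day)": [400, 800, 1400, 1800, 2400, 2900],
    "Data Usage (MB/day)": [150, 320, 650, 900, 1500, 2100],
})
target = "Data Usage (MB/day)"

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Operating System"]),
        ("num", StandardScaler(),
         ["App Usage Time (min/day)", "Battery Drain (mAh/day)"]),
    ])),
    ("model", GradientBoostingRegressor(random_state=42)),
])
pipeline.fit(train.drop(columns=target), train[target])

# A new, unseen user described with the same feature columns.
new_user = pd.DataFrame([{
    "Operating System": "Android",
    "App Usage Time (min/day)": 200,
    "Battery Drain (mAh/day)": 1200,
}])
predicted_mb = pipeline.predict(new_user)[0]
print("Predicted data usage (MB/day):", round(predicted_mb, 1))
```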

Saving the Model for Future Use

Save the best model so we can deploy it for use.

Figure 10: Image shows saving the model for future use.
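
A common way to persist a scikit-learn model is joblib. This is a sketch on synthetic data; the file name is hypothetical and the report's actual saving method is not shown in the text.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data and a small model, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 0.1, size=100)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Hypothetical file name, written to the system temp directory.
model_path = os.path.join(tempfile.gettempdir(), "data_usage_model.joblib")
joblib.dump(model, model_path)

# Reload later (for example in a serving script) and reuse for prediction.
restored = joblib.load(model_path)
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print("Restored model matches original:", same_predictions)
```

If preprocessing is part of the workflow, saving the whole pipeline rather than the bare model keeps encoding and scaling consistent at prediction time.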

Feature Importance
After training the model, we want to see which features contributed most to its predictions.

Figure 11: Image shows the most important features that we considered during the training.

Top Features:
1. Battery Drain contributed about 45%
2. App Usage Time contributed about 28%
3. Number of Apps Installed contributed about 19%

N.B. Features that contributed little to the model, such as Operating System, Device Model, and Gender, can be
dropped if more data is collected, since their contribution to the model is not significant.
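
For tree ensembles in scikit-learn, such percentages typically come from the fitted model's feature_importances_ attribute. The sketch below uses synthetic data with assumed column names, so its numbers will not match the report's.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic features; the target depends on the first two columns only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Battery Drain (mAh/day)": rng.uniform(300, 3000, 500),
    "App Usage Time (min/day)": rng.uniform(30, 600, 500),
    "Age": rng.integers(18, 60, 500),
})
y = (0.4 * X["Battery Drain (mAh/day)"]
     + 0.8 * X["App Usage Time (min/day)"]
     + rng.normal(0, 30, 500))

model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Importances are normalized to sum to 1, so they read as shares.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```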

Conclusion
In this project, we successfully built a predictive system to estimate daily data usage (MB/day)
based on user and device behavior. We used three machine learning models.

What We Can Do to Improve the Model Further

Here are practical steps for improvement:

1. Feature Engineering
2. Data Cleaning & Outlier Handling
3. Model Optimization
● Try more advanced models like: XGBoost, LightGBM, CatBoost
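
As one possible starting point for the model optimization step, the sketch below tunes a GradientBoostingRegressor with a cross-validated grid search. The synthetic data and the parameter grid are illustrative assumptions, not recommendations derived from the report's results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 800 + 300 * X[:, 0] + 150 * X[:, 1] ** 2 + rng.normal(0, 50, size=200)

# Small illustrative grid; real searches usually cover wider ranges.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, so MAE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_mae = -search.best_score_
print("Best parameters:", best_params)
print("Cross-validated MAE:", round(best_mae, 2))
```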

Final Thoughts
This project shows how user behavior data can be leveraged to predict mobile data usage, which
could be useful for:
● Telecom companies optimizing data plans,
● Device manufacturers understanding usage patterns,
● End users tracking and managing data consumption.
