FRTemplate Software
Project Supervisor
Warda Fiaz
Submitted By
In our opinion, it is satisfactory and up to the mark and therefore fulfills the
requirements of BS in Computer Sciences.
Warda Fiaz
Supervisor,
Software Projects & Research Section,
Department of Computer Sciences
Virtual University of Pakistan
___________________
(Signature)
___________________
(Signature)
Accepted By:
_____________
(For office use)
EXORDIUM
DIABETES PREDICTION USING CLASSIFICATION METHOD 1
In the name of Allah, the Compassionate, the
Merciful.
This project aims to demonstrate the feasibility of utilizing machine learning for
diabetes prediction. While not intended as a diagnostic tool, the model can act as a
preliminary screening mechanism, potentially aiding patients and healthcare
professionals in early detection.
CHAPTER NO. 1
GATHERING & ANALYZING INFO
1.1 INTRODUCTION
1.2 PURPOSE
1.3 SCOPE
CHAPTER NO. 2
DESIGNING THE PROJECT
2.1 INTRODUCTION
2.2 PURPOSE
2.3 SCOPE
CHAPTER NO.3
DEVELOPMENT
3.1 DEVELOPMENT PLAN (ARCHITECTURE DIAGRAM)
1.1. Introduction:
1.2. Purpose:
Think of it this way: the data serves as the training material for the model. Just as a
student wouldn't perform well on an exam without proper learning materials, a
machine learning model wouldn't achieve good performance without high-quality
data and a clear understanding of its characteristics.
In machine learning projects, data gathering and analysis are fundamental steps
that lay the foundation for a successful model.
Data Gathering:
The initial step involves collecting data that relates to the problem you're
trying to solve. In our case, for the diabetes prediction model, we'll gather data on
patients that might contain factors potentially linked to diabetes. This data could
include demographics, lifestyle habits, blood test results, and other relevant
medical information.
The goal is to acquire a comprehensive dataset that encompasses a wide range of
patient profiles. This variety ensures the model encounters varied scenarios and can
generalize what it learns effectively.
Data Analysis:
Once gathered, the data needs to be thoroughly examined to understand its
structure, content, and possible limitations. This involves identifying data
types, checking for missing values, and analyzing statistical properties. Through
techniques like Exploratory Data Analysis (EDA), we can reveal patterns, trends,
and relationships within the data. This analysis helps us understand how different
factors might be linked to the presence or absence of diabetes. The analysis often
reveals the need for data preprocessing steps such as cleaning (handling missing
values or inconsistencies), normalization (scaling data to a common range), or
feature engineering (creating new features from existing ones). This ensures the
data is suitable for the chosen machine learning algorithms.
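As a minimal sketch of the cleaning and normalization steps described above, the following uses only the Python standard library. The sample rows, the column order (glucose, BMI), and the helper names `count_missing`, `fill_with_mean`, and `min_max_scale` are assumptions made for illustration; they are not taken from the project code. Mean imputation is only one possible way to handle missing values.

```python
# Hypothetical rows: (glucose, bmi). None marks a missing value.
rows = [
    (148.0, 33.6),
    (85.0, None),
    (183.0, 23.3),
    (None, 28.1),
]


def count_missing(rows, col):
    """Count rows whose value in column `col` is missing."""
    return sum(1 for r in rows if r[col] is None)


def fill_with_mean(rows, col):
    """Cleaning: return column `col` with missing values replaced by the column mean."""
    present = [r[col] for r in rows if r[col] is not None]
    mean = sum(present) / len(present)
    return [r[col] if r[col] is not None else mean for r in rows]


def min_max_scale(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


glucose = fill_with_mean(rows, 0)        # cleaned glucose column
glucose_scaled = min_max_scale(glucose)  # same column scaled to [0, 1]
print(count_missing(rows, 0), glucose_scaled)
```

Scaling every feature to a common range keeps a large-valued column such as glucose from dominating a small-valued one during training.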
1.3. Scope:
● Glucose level
● Body Mass Index (BMI)
● Blood Pressure (BP)
● Age
● Family history etc. of the patient.
v. Training:
● Develop a module to train a model on the dataset using a
Neural Network machine learning algorithm,
ensuring efficient and effective learning.
vi. Testing:
● Create test data and apply it to the trained
models for evaluation.
● Display the results of the trained model on the test
data.
viii. Results:
● Generate a confusion matrix to assess the model’s
performance metrics, including:
⮚ Accuracy
⮚ Precision
⮚ Recall
⮚ F1 Score
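The four metrics listed above can all be derived from the confusion matrix counts. The sketch below shows one way to compute them in plain Python, assuming binary labels where 1 means diabetic and 0 means non-diabetic. The function names and the sample label lists are hypothetical and only serve the illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
print(counts, scores)
```

For a screening use case, recall deserves particular attention, because a false negative means a diabetic patient is missed.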
iv. Security:
● Ensure the security of the user’s data as it passes from the UI to the model.
v. Reliability:
● Design the system to be reliable and accurate, with few
errors, and to provide consistently accurate predictions.
vi. Documentation:
● Provide comprehensive user guide documentation.
vii. Cross Platform:
● Jupyter (or another IDE) runs on different operating systems,
so the system has no OS dependence.
F23020A5E2
learning algorithm.
03 | EDA Analysis | Admin selects "EDA Analysis" action. | Admin explores and visualizes the dataset, observing summary statistics, trends, patterns, and insights through Exploratory Data Analysis (EDA). | Admin skips the visualization step. | Dataset is successfully imported and available for analysis. Data must be wrangled (clean). | Data is visually displayed with relevant insights. | Dataset is empty or not preprocessed.
04 | Testing | Admin selects "Testing" action. | Admin applies the test data on the previously trained model for evaluation, observing and interpreting the results of the model's predictions on the test dataset. | Admin skips the testing step. | Model has been successfully trained using the training data. | Test results are displayed and interpreted. | Model is not trained, or insufficient test data is available.
05 | Training | Admin selects "Training" action. | Admin initiates the training process, allowing the system to train the dataset using a Neural Network machine learning algorithm. | Admin cancels the training process. | Dataset is successfully pre-processed and available for training. | Model is successfully trained using Neural Network algorithm. | Dataset is not pre-processed, or insufficient data for training is present.
… | … | … action. | … and F1 Score in the trained model. | … results. | … tested. | … displayed. | … generating results is present.
08 | Accuracy | Admin selects "Accuracy" action. | Admin identifies and showcases which machine learning model achieves the highest accuracy for predicting diabetes. | Admin skips the checking accuracy. | Results of different models are successfully generated. | Model with highest accuracy is highlighted. | Results of different models are not available or inconclusive.
This chapter dives into the technical aspects of designing the diabetes prediction
model. It outlines the methodologies, tools, and representations used to bring the model
to life.
2.1 Introduction:
Early detection of diabetes is critical for effective management and avoiding
complications. This chapter focuses on designing a machine learning model
capable of predicting the likelihood of diabetes in patients. Think of it as a
roadmap outlining the technical choices and steps involved in building this
model. Throughout this chapter, we'll explore:
2.2 Purpose:
This chapter serves as the testing ground for our diabetes prediction model.
Here, we'll assess the model's effectiveness in achieving its core objective:
predicting the likelihood of diabetes in patients. By carefully evaluating its
performance, we can gauge its potential value as a tool for early detection.
The design document also helps testers during the testing phase. Testers
use it to understand the expected behavior and functionality of the
system. It helps in creating test cases, ensuring comprehensive test
coverage, and validating that the implemented features align with the
project requirements.
2.3 Scope
The design of the diabetes prediction model will focus on creating a robust and
informative system for early diabetes detection, with the following key
considerations:
Functionalities:
Algorithms:
Data Considerations:
Limitations:
Architectural Diagram:
Sequence Diagrams:
1. Dataset Description:
The dataset contains information related to diabetes patients.
Each row represents a specific patient, and each column represents a
different attribute or feature associated with that patient.
2. Column Descriptions:
preg: Number of times pregnant.
plas: Plasma glucose concentration (measured in mg/dL).
pres: Blood pressure (measured in mm Hg).
skin: Skin thickness (measured in mm).
test: Insulin test value (measured in μU/mL).
mass: Body mass index (BMI).
pedi: Diabetes pedigree function (a genetic risk score).
age: Age of the patient.
class: Indicates whether the patient has diabetes (1) or not (0).
3. Relationships:
Each row corresponds to a specific patient, and the values in the
columns represent the patient’s characteristics.
For example, the first row might represent a patient who has been
pregnant 6 times (preg=6), has a plasma glucose concentration of 148
mg/dL (plas=148), blood pressure of 72 mm Hg (pres=72), etc.
The “class” column indicates whether the patient has diabetes (1) or not
(0).
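To make the row-to-patient mapping concrete, the sketch below parses one row into a typed record. It assumes the data is stored as comma-separated values in the column order listed above, which the text does not state. The `PatientRecord` class and `parse_row` helper are hypothetical names. The first three values (preg=6, plas=148, pres=72) come from the example above; the remaining values in the sample row are illustrative placeholders.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    preg: int       # number of times pregnant
    plas: float     # plasma glucose concentration (mg/dL)
    pres: float     # blood pressure (mm Hg)
    skin: float     # skin thickness (mm)
    test: float     # insulin test value (μU/mL)
    mass: float     # body mass index (BMI)
    pedi: float     # diabetes pedigree function
    age: int        # age of the patient
    diabetic: bool  # the "class" column; renamed because `class` is a Python keyword


def parse_row(line):
    """Parse one comma-separated row into a PatientRecord (assumed file layout)."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 9:
        raise ValueError(f"expected 9 columns, got {len(parts)}")
    return PatientRecord(
        preg=int(parts[0]),
        plas=float(parts[1]),
        pres=float(parts[2]),
        skin=float(parts[3]),
        test=float(parts[4]),
        mass=float(parts[5]),
        pedi=float(parts[6]),
        age=int(parts[7]),
        diabetic=(parts[8] == "1"),
    )


# preg, plas, pres follow the example in the text; the other values are placeholders.
record = parse_row("6,148,72,35,0,33.6,0.627,50,1")
print(record)
```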
4. Summary Statistics:
The summary statistics (count, mean, standard deviation, etc.) provide
an overview of the distribution of each attribute across all patients in the
dataset.
These statistics help us understand the central tendency and variability
of the data.
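The summary statistics mentioned above can be computed per column with Python's built-in `statistics` module. The following is a small sketch under stated assumptions: the `summarize` helper and the sample `plas` values are hypothetical, and the standard deviation shown is the sample standard deviation (dividing by n - 1).

```python
import statistics


def summarize(values):
    """Return count, mean, sample standard deviation, min, and max of a column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1); needs at least 2 values
        "min": min(values),
        "max": max(values),
    }


# Hypothetical plasma glucose readings for five patients.
plas = [148, 85, 183, 89, 137]
summary = summarize(plas)
print(summary)
```

The mean describes the central tendency of the column, while the standard deviation and the min/max range describe its variability.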
Neural Network Model 0.75 0.82 0.79 0.67 0.84 0.73 0.76
SVM Model 0.81 0.82 0.79 0.80 0.81 0.81 0.80
Decision Tree Model 0.77 0.78 0.75 0.77 0.76 0.77 0.76
Logistic Regression Model 0.79 0.80 0.78 0.80 0.78 0.80 0.78
Working:
The Neural Network model performs well and predicts diabetes reliably
on the dataset. It was trained on the dataset and then tested on it as
well.
Results:
Working:
The SVM model performs well and provides accurate results on
the dataset.
Results:
The SVM model achieves an accuracy of 81% on the dataset. The results of
the model are as follows:
Working:
The Logistic Regression model was trained on the dataset successfully
and provides accurate results.
Results:
The Logistic Regression model achieves an accuracy of 74%, and its
results are as follows:
Understand the purpose of the project, its goals, and the scope.
Define the problem statement and the desired outcomes.
Identify any constraints or limitations.
5. Development Phase: