Diabetes Prediction with Machine Learning

The document is a synopsis of a project titled 'Diabetes Prediction Using Machine Learning' submitted by Ruchi Sharma for a Master's degree. It outlines the prevalence of diabetes, the significance of early detection, and the application of various machine learning algorithms to predict diabetes using the PIMA Indian Diabetes Database. The project aims to assist healthcare professionals in diagnosing diabetes early, with findings indicating that the Random Forest Classifier produced the highest accuracy of 75.97% among the tested algorithms.

A SYNOPSIS ON

DIABETES PREDICTION USING MACHINE LEARNING

Submitted in partial fulfilment of the requirement for the award of the degree of

MASTER OF COMPUTER APPLICATIONS

Submitted by:

Ruchi Sharma
University Roll No: 1102853

Under the Guidance of

Mr. Harendra Singh Negi
Assistant Professor

Department of Computer Applications

Graphic Era (Deemed to be University)
Dehradun, Uttarakhand
September-2023

CANDIDATE’S DECLARATION
I hereby certify that the work which is being presented in the Synopsis entitled “Diabetes
Prediction Using Machine Learning” in partial fulfillment of the requirements for the
award of the Degree of Master of Computer Applications in the Department of Computer
Applications of the Graphic Era (Deemed to be University), Dehradun shall be carried out by
the undersigned under the supervision of Mr. Harendra Singh Negi, Assistant Professor,
Department of Computer Applications, Graphic Era (Deemed to be University), Dehradun.

Name: Ruchi Sharma        University Roll No: 1102853        Signature:

The above-mentioned student shall be working under the supervision of the undersigned on
the synopsis entitled “Diabetes Prediction Using Machine Learning”.

Signature Signature
Supervisor Head of the Department

Internal Evaluation (By DPRC Committee)

Status of the Synopsis: Accepted / Rejected

Any Comments:

Name of the Committee Members: Signature with Date

1.
2.

Table of Contents

Abstract

Chapter 1 Introduction and Problem Statement
1.1 Introduction
1.2 Types of Diabetes
1.3 Symptoms of Diabetes
1.4 Causes of Diabetes
1.5 Applications of Machine Learning
1.6 Problem Statement
1.7 Dataset Used
1.8 Tools and Technology Used

Chapter 2 Background / Literature Survey
2.1 Introduction
2.2 Significance of Diabetes Prediction
2.3 Evolution of ML in Diabetes Prediction
2.4 Overview of Prior Research
2.5 Research Gap

Chapter 3 Literature Review

Chapter 4 Problem Statement and Methodology
4.1 Problem Statement
4.2 Methodology
4.3 Dataset Used
4.4 Evaluation Metrics
4.5 Expected Outcomes
4.6 Objectives

Chapter 5 Result Analysis
Description
5.1 Evaluation
5.2 Results

Chapter 6 Conclusion
6.1 Scope of Future Work

References

Abstract
Diabetes is an illness that causes health problems all around the world. According to the
International Diabetes Federation (IDF), 382 million people in the world have diabetes, and by
2035 this number is projected to grow to 592 million. Diabetes increases the risk of chronic
problems such as heart disease and chronic kidney disease (CKD). If the disease is caught
early, people can live long, healthy lives. Prediction in the medical field is difficult, but it
can ultimately help doctors make timely, data-based decisions about a patient's health.
Machine learning techniques offer a way to address this problem. This project aims to create a
model that reliably predicts diabetes in patients. Machine learning models trained on
appropriate data can help diagnose diabetes at an early stage. Preprocessing techniques such
as standardization have also been used to increase accuracy.

To detect diabetes at an early stage, this project applies machine learning classification
algorithms, namely K-Nearest Neighbours, Support Vector Machine (SVM), Decision Tree,
Random Forest, and Logistic Regression. The Pima Indian Diabetes Database (PIDD) is used in
the experiment. Its purpose is to determine whether a patient has diabetes using the
diagnostic measurements included in the dataset. Measures such as Precision, Accuracy,
Specificity, and F1 Score are computed over the classified instances using the confusion
matrix.

The accuracies of the algorithms were compared, and the project concludes that the Random
Forest Classifier produced the best results, with an accuracy of 75.97%. Using machine
learning methods, this project aims to assist doctors and physicians in the early detection
of diabetes.

Chapter 1

Introduction and Problem Statement

Diabetes is an illness that causes health problems all around the world. According to the
IDF, 382 million people worldwide have diabetes, and by 2035 this number is projected to
grow to 592 million. Diabetes is a disease caused by increased blood glucose levels. High
serum glucose can cause symptoms such as frequent urination, increased thirst and increased
hunger. Diabetes is a leading cause of blindness, kidney failure, amputation due to diabetic
foot, congestive heart failure (CHF) and stroke. When we eat, food is converted into glucose.
In response to the rise in glucose, the pancreas secretes the hormone insulin. Insulin helps
transport glucose into our cells, where it is used as energy. In diabetes, however, this
well-organised mechanism fails. Type 1 and type 2 diabetes are the most common types of the
disease, but there are others, such as gestational diabetes and less common forms like MODY
(maturity-onset diabetes of the young) and type 1.5 diabetes. Machine learning is a branch of
data science that studies how machines can learn from experience, and it opens up many new
possibilities.

1.1 Introduction

Diabetes is a pathology that affects the beta cells of the pancreas, in which the body cannot
produce enough insulin or becomes resistant to it [1]. Insulin is mainly responsible for
keeping blood glucose levels in check. Factors influencing diabetes mellitus (DM) include
obesity, lack of exercise, high blood pressure and bad cholesterol levels. It causes many
problems, of which increased urination is the most common [2]. It causes damage to the skin,
nerves and eyes and, if not treated early, can lead to eye diseases, renal failure and
diabetic retinopathy with a bad prognosis. According to the IDF (International Diabetes
Federation), 537 million people worldwide have diabetes [3]. According to 2019 statistics,
approximately 7.1 million people in Bangladesh are affected by this disease [2].

According to the World Health Organization (WHO), diabetes affects 8.5% of people over the
age of 18 and causes 1.6 million deaths worldwide (World Health Organization, 2021).
Although premature deaths from diabetes decreased between 2000 and 2010 in many developing
countries, the statistics increased between 2010 and 2016. Chronic respiratory diseases and
diabetes have killed more than 18% of the world's population and have become a public health
problem.

Artificial intelligence and machine learning technologies provide tools that help clinicians
understand the disease and reduce their workload accordingly.

1.2 Types of Diabetes

Type 1 Diabetes (T1D)

Type 1 DM, also known as juvenile diabetes and formerly known as insulin-dependent diabetes,
is an autoimmune condition in which the body attacks and destroys the beta cells of the
pancreas. This results in negligible insulin production. Insulin is essential for transporting
glucose into the cells of the body to be used for energy. Without sufficient insulin, glucose
levels spike dangerously high.
• Causes: The exact cause of Type 1 diabetes is unknown, but it is believed to involve a
combination of genetic factors and environmental insults, such as exposure to certain
viruses.
• Symptoms: Common symptoms include frequent urination, excessive thirst, muscle wasting
and low BMI, lethargy, and blurred vision.
• Management: Individuals with Type 1 diabetes require lifelong subcutaneous (SC) insulin
therapy, either through injections or an insulin pump, as well as regular monitoring of
blood glucose, a balanced diet, and regular physical activity for a better quality of life.
Type 2 Diabetes (T2D)
Type 2 DM is the commonest form of diabetes, accounting for around 90% of all cases. It
typically develops in adults over the age of 45, although the age boundary is not well
defined and it may occur in young individuals, including children and adolescents. Type 2 DM
arises from insulin resistance, due to which the pancreas secretes more insulin; over the
long term this leads to pancreatic beta cell failure and lower levels of insulin in the body.
• Causes: Type 2 DM has a strong association with genetic factors and lifestyle choices,
such as poor diet, sedentary habits, obesity, and aging.
• Symptoms: Symptoms are similar to those of Type 1 diabetes, including increased
thirst, frequent urination, lethargy, and diminished vision. Additionally, some
individuals may experience delayed healing of wounds and frequent infections due to low
immunity.
• Management: Management strategies include lifestyle modifications (healthy diet,
regular exercise, and weight loss), oral hypoglycemic agents, and sometimes insulin
therapy in the long run. Regular blood glucose monitoring is also essential.
Gestational Diabetes (GDM)
Gestational diabetes is associated with pregnancy and usually disappears after delivery.
However, it increases the risk of Type 2 DM later in life for both the mother and offspring.
Gestational diabetes is characterized by high blood glucose levels that develop during
pregnancy in women who did not have diabetes in the past.
• Causes: The exact cause of gestational diabetes is unknown, but hormonal changes
during pregnancy play a role in the development of insulin resistance. Risk factors
such as obesity, a history of gestational diabetes in previous pregnancies, and a family
history of diabetes contribute significantly.
• Symptoms: Gestational diabetes often does not cause noticeable symptoms, but it can
be detected through routine screening tests during pregnancy.
• Management: Management includes lifestyle changes such as healthy eating and
regular physical activity. Some women may need insulin therapy if lifestyle changes
are not sufficient to control blood sugar levels. Regular monitoring of blood glucose
levels is essential to ensure they remain within a healthy range.
1.3 Symptoms of Diabetes
Diabetes can present with a range of symptoms that vary in intensity and onset depending
on the type and severity of the disease. Understanding these symptoms is crucial for early
diagnosis and effective management.

1. Frequent Urination (Polyuria): One of the hallmark symptoms of diabetes is the
need to urinate frequently. This occurs because excess glucose in the blood spills into the
urine, drawing more water with it, which leads to increased urine output. Individuals may
notice they need to use the bathroom more often, especially at night (nocturia).
2. Increased Thirst (Polydipsia): As a result of frequent urination, the body loses more
water, leading to dehydration. This triggers excessive thirst. People with diabetes often
feel the need to drink more fluids to compensate for the water loss, but despite drinking
more, the thirst persists.
3. Increased Hunger (Polyphagia): Despite having high blood sugar levels, the body's
cells are unable to use glucose effectively for energy due to lack of insulin or insulin
resistance. This lack of energy triggers a feeling of extreme hunger. Even after eating,
individuals may not feel satiated.
4. Fatigue: A common symptom across all types of diabetes is persistent fatigue. The
body's inability to use glucose for energy efficiently means that cells are starved of
energy, leading to constant tiredness and lethargy, even with adequate rest.
5. Blurred Vision: High blood sugar levels can cause the lenses of the eyes to swell,
leading to changes in vision. Blurred vision may come and go depending on blood sugar
levels and can affect one or both eyes.
6. Delayed Healing of Wounds: High blood sugar impairs blood flow and affects the
body's ability to heal wounds. This results in cuts, sores, and bruises taking longer to heal.
Diabetics are also more prone to infections, particularly skin infections.
7. Significant unexplained Weight Loss: In Type 1 diabetes, the body starts breaking
down fat and muscle for energy because it can't use glucose properly. This can lead to
rapid and unexplained weight loss despite normal or increased food intake. In Type 2
diabetes, weight loss might occur less frequently but can still be a sign of uncontrolled
diabetes.
8. Recurrent Infections: People with diabetes are more susceptible to infections due to
high blood sugar levels weakening the immune system. Common infections include
urinary tract infections, yeast infections, and skin infections.
9. Numbness or Tingling in Hands or Feet - Paresthesia (Peripheral Neuropathy):
Chronic high blood sugar levels can damage nerves, particularly in the extremities. This
can cause symptoms like numbness, tingling, burning, or pain in the hands and feet.
Peripheral neuropathy is a serious complication that needs to be addressed to prevent
further nerve damage.

1.4 Causes of Diabetes

Diabetes is a multifaceted disease with various underlying causes. The primary types of
diabetes—Type 1, Type 2, and gestational diabetes—each have distinct causes, though some
factors may overlap. Understanding these causes can help in managing and potentially
preventing the disease.
Type 1 Diabetes
Type 1 diabetes is an autoimmune condition. It occurs when the body’s immune system
mistakenly attacks and destroys the insulin-producing beta cells in the pancreas. The precise
cause of this autoimmune response is not completely understood, but several factors are
believed to contribute:
1. Genetic Factors: Certain genes increase the risk of developing Type 1 diabetes. Family
history plays a significant role, and individuals with close relatives who have Type 1 diabetes
are at a higher risk.
2. Environmental Triggers: Environmental factors, such as exposure to certain viruses,
might trigger the autoimmune response in genetically predisposed individuals. Possible viral
triggers include the Coxsackievirus, mumps, and rubella.
3. Autoimmune Reactions: In Type 1 diabetes, the immune system’s T cells attack the
insulin-producing beta cells. The exact mechanism behind this autoimmune reaction is still
being researched, but it involves a complex interaction of genetic and environmental factors.
Type 2 Diabetes
Type 2 diabetes is primarily associated with insulin resistance, where the body’s cells do not
respond effectively to insulin, and with inadequate insulin production over time. Several
factors contribute to the development of Type 2 diabetes:
1. Lifestyle Factors:
• Obesity: Excess body fat, particularly around the abdomen, increases the body’s resistance
to insulin.
• Physical Inactivity: A sedentary lifestyle contributes to insulin resistance and weight gain,
increasing the risk of Type 2 diabetes.
• Unhealthy Diet: Diets high in processed foods, sugars, and unhealthy fats can lead to weight
gain and insulin resistance.
2. Genetic Factors: A family history of Type 2 diabetes significantly increases the risk.
Certain genes related to glucose metabolism and insulin production can predispose
individuals to the disease.
3. Age: The risk of Type 2 diabetes increases with age, particularly after the age of 45.
However, the incidence is also rising among younger populations, including children and
adolescents, due to increasing rates of obesity and inactivity.
4. Ethnicity: Certain ethnic groups, including African Americans, Hispanics, Native
Americans, and Asian Americans, have a higher prevalence of Type 2 diabetes, suggesting a
genetic predisposition in these populations.
5. Metabolic Syndrome: A cluster of conditions—high blood pressure, high blood sugar,
abnormal cholesterol levels, and excess abdominal fat—collectively known as metabolic
syndrome, significantly raises the risk of Type 2 diabetes.
Gestational Diabetes
Gestational diabetes occurs during pregnancy and typically resolves after childbirth, but it
increases the risk of developing Type 2 diabetes later in life. The causes of gestational
diabetes include:
1. Hormonal Changes: During pregnancy, the placenta produces hormones that help the
baby grow. Some of these hormones can make the mother’s cells more resistant to insulin. As
the pregnancy progresses, the placenta enlarges and produces more of these hormones,
increasing insulin resistance.
2. Insulin Demand: As insulin resistance increases, the pancreas tries to compensate by
producing more insulin. If the pancreas cannot keep up with the increased demand, blood
sugar levels rise, leading to gestational diabetes.
3. Genetic and Lifestyle Factors: Similar to Type 2 diabetes, genetic predisposition and
lifestyle factors like obesity and physical inactivity also play a role in the development of
gestational diabetes.

1.5 Applications of Machine Learning

Machine learning holds significant promise in the medical field, offering opportunities to
improve diagnostics, treatment planning, disease management, and patient outcomes.
Continued advancements in machine learning algorithms, coupled with the availability of
large-scale medical data, have the potential to revolutionize healthcare and usher in a new era
of personalized and effective medical interventions.

Disease Diagnosis and Detection: Machine learning algorithms can analyze large amounts of
medical data, including patient records, medical images, and genetic information, to assist in
disease diagnosis and detection. By learning patterns and relationships in data, machine
learning models can identify subtle signs or indicators of diseases, enabling earlier and more
accurate diagnoses.

Personalized Treatment Planning: Machine learning algorithms can analyze patient data, such
as medical history, genetic information, and treatment outcomes, to create personalized
treatment plans. By considering individual patient characteristics, machine learning models
can assist in selecting the most effective treatments and predicting potential adverse
reactions, ultimately leading to improved patient outcomes.

With ongoing advancements in machine learning algorithms and the availability of extensive
medical data, the potential for further transformative applications in the medical field is
substantial. Machine learning, when used in conjunction with human expertise, has the power
to revolutionize healthcare, improving diagnostics, treatment planning, disease management,
and patient outcomes.

Machine learning (ML) plays a crucial role in predicting diabetes by analyzing vast amounts
of medical and lifestyle data to identify individuals at risk. This application leverages various
algorithms and data sources to provide accurate and early predictions, which are essential for
effective prevention and management of diabetes.

1.6 Problem Statement

Three types of errors can occur in the current diagnostic method:

• False negative: the patient is actually diabetic, but the test results show that the person
does not have diabetes.
• False positive: the patient is not actually diabetic, but the test report shows that he or
she is diabetic.
• Unclassifiable: the system cannot determine the patient's status. This happens when there is
not enough information in the previous data, and the patient is recorded as an unknown type.

Such misdiagnosis can lead to inappropriate treatment or to treatment being withheld when it
is needed. To prevent this or reduce its severity, machine learning algorithms need to be
developed that provide accurate results and reduce the manual effort involved.
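
The first two error types above correspond to the off-diagonal cells of a confusion matrix. As a minimal sketch, assuming labels are encoded as 1 (diabetic) and 0 (non-diabetic), the four counts can be tallied as follows. The function name and the sample labels are illustrative and are not taken from the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels: 1 = diabetic, 0 = non-diabetic."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1          # diabetic, correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1          # false positive: healthy patient flagged as diabetic
        elif actual == 0 and predicted == 0:
            tn += 1          # healthy, correctly cleared
        else:
            fn += 1          # false negative: diabetic patient missed
    return tp, fp, tn, fn


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```

In a screening setting the false negative count is usually the one to watch most closely, since a missed diabetic patient receives no follow-up.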
We use various classification and association methods to predict diabetes. Machine learning
is a method of training a computer to learn from data. Many machine learning methods extract
information by building classification and association models from the collected data in
order to provide useful results.
These models make predictions based on variables such as a person's health history and
lifestyle. They examine many samples of people with and without diabetes to make better
predictions. For example, they might focus on how much sugar someone eats or how much
exercise they get. In this way, they can give early warning to people who are at risk of
developing diabetes so that they can take better care of themselves.

1.7 Dataset Used

The Pima Indian dataset is an open-source dataset that is publicly available for machine
learning work and is used in conjunction with private datasets in this project [4]. It
contains data on 768 patients, of whom 268 are diabetic.
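
The class balance implied by these two figures can be checked with a few lines of arithmetic. This is only a sketch built from the counts quoted above; the variable names are illustrative.

```python
total_patients = 768   # figure quoted in the text
diabetic = 268         # figure quoted in the text
non_diabetic = total_patients - diabetic

diabetic_pct = 100 * diabetic / total_patients
non_diabetic_pct = 100 * non_diabetic / total_patients

print(f"Diabetic: {diabetic_pct:.1f}%  Non-diabetic: {non_diabetic_pct:.1f}%")
```

The dataset is therefore moderately imbalanced: a trivial model that always predicts "non-diabetic" would already be right for roughly 65% of the records, which is a useful baseline to keep in mind when reading accuracy figures.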

Figure 1.1 Percentage of people having diabetes in the Pima Indian dataset

1.8 Tools and Technology:

1.8.1 Configuration
i. Processor: Intel® Core i3-10100
ii. RAM: 8GB
iii. Storage: 500GB
iv. Graphic: Intel UHD 630 Graphics
v. OS: Windows 10
1.8.2 Technology
i. Python
ii. Scikit-learn
iii. Pandas
iv. Pickle
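
Pickle is Python's built-in serialization module and is commonly used to save a trained model so that it can be reloaded later without retraining. The sketch below shows the round trip. To keep it runnable with the standard library alone, a hypothetical dictionary of fitted parameters stands in for the trained estimator; the keys and values are invented for illustration and are not the project's model.

```python
import pickle

# Hypothetical stand-in for a trained model: a dict of fitted parameters.
trained_model = {
    "algorithm": "logistic_regression",
    "coefficients": [0.8, -0.2, 1.5],
    "intercept": -0.4,
}

# pickle.dumps returns bytes; pickle.dump(obj, file) would write to a file instead.
serialized = pickle.dumps(trained_model)

# pickle.loads rebuilds an equal, independent object from those bytes.
restored_model = pickle.loads(serialized)

print(restored_model == trained_model)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.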

Chapter 2

Literature Survey

2.1 Introduction

The accurate and timely detection of diabetes plays a critical role in effective diagnosis,
treatment planning, and patient outcomes. Over the years, extensive research has been
conducted in this field, leveraging advancements in machine learning algorithms and
techniques. These studies have paved the way for the development of sophisticated and
reliable approaches for diabetes detection.
Research efforts are now using advanced technologies, primarily machine learning (ML), to
improve health outcomes. Diabetes is a chronic and common disease that has been the subject
of many studies that use machine learning to better manage the disease and to predict its
onset and progression. This section reviews some of the important existing work in this
field.

2.2 Significance of Diabetes Detection

Diabetes presents significant challenges due to its complexity and potential for severe health
consequences. Early and accurate detection of diabetes is crucial for ensuring timely medical
interventions, improved patient prognosis, and enhanced quality of life. Traditional
diagnostic methods, while valuable, often rely on manual interpretation and can be subjective
and time-consuming. Therefore, the integration of machine learning techniques in diabetes
detection has gained significant attention, promising automated and efficient solutions that
can assist medical professionals in making informed decisions.
Machine learning algorithms can analyze large datasets to identify patterns and risk factors
associated with diabetes, leading to more accurate and timely diagnoses. These techniques
help in overcoming the limitations of traditional methods by providing consistent, data-driven
insights that enhance diagnostic precision. The application of machine learning in diabetes
detection not only improves early diagnosis but also facilitates personalized treatment plans,
proactive management, and better patient outcomes.

2.3 Evolution of Machine Learning in Diabetes Detection
Machine learning algorithms, with their capacity to learn from data patterns and make
predictions, have transformed the field of medical diagnostics, particularly in the detection
and management of diabetes. These algorithms enhance the accuracy, speed, and objectivity
of diabetes detection, providing healthcare professionals with valuable insights. In recent
years, the application of machine learning techniques, such as neural networks, decision trees,
and ensemble models, has yielded promising results in diabetes prediction and diagnosis.
Machine learning models can analyze vast datasets, including electronic health records,
genetic information, and lifestyle factors, to identify individuals at risk of developing
diabetes. By leveraging large-scale data, these models detect subtle patterns that might be
missed by traditional methods, enabling earlier and more precise diagnosis.
The integration of machine learning in diabetes detection marks a significant advancement in
healthcare, offering automated and efficient solutions that support medical professionals in
making informed decisions. As these technologies continue to evolve, they hold the promise
of further enhancing the early detection and management of diabetes, ultimately leading to
better patient outcomes and quality of life.
2.4 Overview of Prior Research
Yashoda et al. [5] created a diabetes patient database by collecting data from a hospital
repository; it contained 200 cases with nine attributes. The attributes fall into two groups:
blood tests and urine tests. The data were classified using WEKA and evaluated with 10-fold
cross-validation, which is effective on small datasets, so that the results could be compared.
The Naive Bayes, J48, REP Tree and Random Tree algorithms were used, and J48 gave the best
result with 60.2% accuracy.
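
To make the 10-fold cross-validation procedure concrete, the following sketch generates the train/test index splits for a 200-case dataset such as the one described above. It is a generic illustration, not the WEKA implementation used in the study; the function name is hypothetical, and the records are not shuffled here, whereas in practice they usually are shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=200, k=10))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

A model would be trained and scored once per fold, and the ten scores averaged, so that every case contributes to both training and evaluation.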

Zou et al. (2018) [6] predicted diabetes using decision trees, random forests, and neural
networks. The data were collected from physical examinations in Luzhou, China. Principal
component analysis (PCA) was used to reduce the dimensionality of the data. They selected
several machine learning methods and conducted independent tests to verify the validity of
the approach.

K. Vijiya Kumar et al. [7] proposed a method to predict diabetes using the random forest
algorithm; the model aims to predict early-stage diabetes in patients with greater accuracy.
The proposed model provided the best results for diabetes prediction, and the results showed
that the prediction method could predict diabetes efficiently, effectively and, more
importantly, instantly.

Deeraj Shetty et al. [8] proposed diabetes prediction using data mining. They assembled an
Intelligent Diabetes Disease Prediction System that analyses diabetes using a database of
diabetes patients. In this system, they propose applying algorithms such as Bayesian
classification and KNN (K-Nearest Neighbor) to the patient database, analysing various
attributes for the prediction of diabetes.

Tejas N. [9] proposed predicting diabetes with three machine learning methods: SVM,
logistic regression, and ANN. This study provides a useful tool for rapid diagnosis of
diabetes.

Aishwarya et al. [10] aimed to discover solutions for diabetes diagnosis by analysing
decision trees obtained from the data and by analysis using the Naive Bayes algorithm. The
research sought a faster way to recognize the disease, which would help treat patients in a
timely manner. With a 70:30 split, the J48 algorithm achieved an accuracy of 74.8%, while
Naive Bayes achieved an accuracy of 79.5%.
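
A 70:30 split means that 70% of the records are used for training and the remaining 30% are held out for testing. The sketch below shows one simple way to produce such a split. The function name, the fixed seed, and the use of 768 records (the size of the Pima dataset described in Section 1.7) are assumptions for illustration; the cited study's dataset size is not stated here.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded so the split is reproducible
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(768)
print(len(train_idx), len(test_idx))
```

Unlike cross-validation, a single hold-out split evaluates the model on only one subset, so the reported accuracy can vary noticeably with the random seed.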

Gupta et al. [11] aimed to calculate the accuracy, sensitivity, and specificity of various
classification methods implemented in WEKA and to compare and analyze the results. They
also compared the performance of the same classifiers, under the same parameters, in other
tools, including RapidMiner and MATLAB. They used the JRIP, Jgraft and BayesNet algorithms.
The results showed that Jgraft had the highest accuracy of 81.3%, with a sensitivity of
59.7% and a specificity of 81.4%. It was concluded that WEKA works better than MATLAB and
RapidMiner.
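
For reference, the three measures compared above have the following standard definitions in terms of confusion-matrix counts. The symbols TP, TN, FP and FN (true positives, true negatives, false positives, false negatives) are notation introduced here and are not taken from the cited study.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```

Read this way, a sensitivity of 59.7% means that roughly four in ten diabetic cases were missed, which accuracy alone does not reveal.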

Lee et al. [12] focused on applying a decision tree algorithm, CART, to diabetes medical
records after applying a filter to the data. The authors highlight the class imbalance
problem and argue that it needs to be addressed before applying an algorithm in order to
achieve better accuracy. Class imbalance often occurs in datasets with a binary class
variable, that is, one with two possible outcomes. If the imbalance is identified at the
preprocessing stage, it can be handled easily, which helps improve the accuracy of the
prediction model.

In their recent study [13], Mohan and Jain employed the SVM algorithm to analyze and
predict diabetes using the Pima Indian Diabetes Dataset. They experimented with four
distinct types of kernels: linear, polynomial, RBF, and sigmoid, to perform the predictions on
a machine learning platform. The accuracies achieved with these different kernels varied,
ranging from 0.69 to 0.82. Notably, the SVM method utilizing the radial basis function (RBF)
kernel achieved the highest accuracy, reaching 0.82.
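
The four kernel types named above differ only in how they measure similarity between two feature vectors. The sketch below writes out their commonly used textbook forms. The parameter names (gamma, coef0, degree) and their default values are assumptions chosen for illustration and are not the settings used in the cited study.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # Depends only on the squared Euclidean distance between x and y.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


x = [1.0, 2.0]   # made-up feature vectors
y = [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

The RBF kernel equals 1 for identical points and decays towards 0 as points move apart, which lets an SVM draw non-linear decision boundaries.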

Olisah et al. [14] carried out diabetes mellitus prediction using advanced feature selection and
various machine learning models. They utilized two open-source datasets: the Pima Indian
and LMCH Iraqi databases. To handle missing samples, a polynomial regression-based
preprocessing technique was applied. Hyperparameter tuning was conducted for the random
forest, decision tree, and deep neural network (DNN) models. The optimized DNN technique
achieved the highest accuracy, with scores of 0.972 for the Pima dataset and 0.973 for the
LMCH dataset.

Ramesh et al. [15] developed an automated remote system for predicting diabetes using the
Pima Indian dataset. They applied various data preprocessing methods, including feature
scaling, feature selection, and SMOTE. The SVM with an RBF kernel achieved the highest
accuracy of 83.2%. This machine learning framework was integrated into an Android
application.
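
Feature scaling, one of the preprocessing steps mentioned above, is often done by standardization: each feature is shifted to zero mean and rescaled to unit standard deviation. The following is a generic sketch of that idea, not the pipeline of the cited study; the function name and the sample glucose values are made up.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


glucose = [85.0, 110.0, 140.0, 165.0]   # made-up readings
scaled = standardize(glucose)
print([round(z, 3) for z in scaled])
```

Distance-based and margin-based classifiers such as KNN and SVM are sensitive to feature ranges, which is why scaling is typically applied before training them. The mean and standard deviation should be computed on the training data only and then reused for the test data.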

Jyotismita et al. [16] developed a mechanism for detecting and analyzing diabetes using six
facets: dataset, processing methods, feature extraction, machine learning identification, and
classification and diagnosis of diabetes mellitus (DM), addressing the limitations of
classification. They compared various supervised, unsupervised, and clustering techniques.
Each dataset presented unique challenges, highlighting the need for significant improvements
to enhance the efficiency of detecting different diabetic conditions.

Branimir et al. [17] introduced a system aimed at addressing two main challenges: the
heterogeneity of previous techniques and the lack of transparency in feature selection.
Utilizing the PRISMA methodology, they conducted a comparison of 18 different models,
including tree-based algorithms. The study concluded that KNN and SVM are predominantly
used for prediction.

Nur et al. [18] concentrated primarily on data preprocessing, which involved removing
missing values, balancing the dataset, assessing feature importance, and performing data
augmentation. They used Random Forest (RF) and Logistic Regression (LR) for
classification. The results showed a 20% increase in precision and a 24% increase in recall
compared to data that had not undergone preprocessing.

Safial et al. [19] proposed a strategy for diagnosing diabetes using a deep learning (DL)
network, employing 5-fold and 10-fold cross-validation for training. Utilizing the Pima
Indians dataset, they achieved a prediction accuracy of 98.35% with 10-fold cross-validation.

Bavkar et al. [20] developed a pipeline model utilizing deep learning (DL) techniques to
predict diabetes. The model includes data augmentation with a variational autoencoder
(VAE), feature augmentation with a sparse autoencoder (SAE), and classification with a
convolutional neural network (CNN). Using the Pima dataset from the UCI Repository, they
achieved an accuracy of 92.31% by training the CNN classifier in conjunction with SAE for
feature augmentation, compared to a well-balanced dataset.

Goyal and his team [21] developed a smart home health monitoring system to detect diabetes.
The authors also used the Pima Indian dataset for their research. They used formal judgments
to estimate blood pressure, and SVM, KNN, and decision trees to predict diabetes.
Among these models, SVM outperformed the other classification algorithms, with 75% accuracy.

2.5 Research Gap

• Limited consideration of overfitting: While a researcher in one of the studies compared
various algorithms for diabetes prediction and concluded that XGBoost performed the best,
the issue of overfitting was not taken into account. Overfitting is a common challenge in
machine learning models, and its impact on the classification accuracy and
generalizability of the algorithms should be further explored.
• Further research could focus on comparing and optimizing different classification
algorithms to improve the accuracy and reliability of diabetes prediction.
• Limited analysis of multi-class diabetes classification: While many studies
focus on binary classification of diabetes (diabetic vs. non-diabetic), there is a
significant gap in the literature concerning the comprehensive evaluation and
comparison of algorithms for multi-class diabetes classification. Diabetes presents
in various forms, such as Type 1, Type 2, and gestational diabetes, each requiring
different management strategies.
• Investigating the effectiveness of different machine learning approaches in
classifying various types of diabetes: Exploring diverse machine learning
approaches to accurately classify the types of diabetes could significantly enhance
diagnostic tools.

Chapter 3
Objectives
The objectives of the proposed work are as follows:

1. To develop a machine learning-based model for estimating an individual’s risk of
developing diabetes, utilizing a comprehensive set of risk factors including age,
glucose, blood pressure, and lifestyle choices, among others.
2. To employ machine learning algorithms (Decision Tree, Random Forest, Support
Vector Machine, Logistic Regression, K-Nearest Neighbors) to detect diabetes, and
to evaluate the performance of these classifiers in terms of accuracy.
3. To reduce overall healthcare costs by identifying low-risk individuals and reducing
reliance on invasive procedures and routine diagnostic tests.
4. To contribute to the overall reduction of diabetes and its treatment burden by improving
the capability of predictive health technology. Through early prediction and risk
assessment, the work aims to reduce the incidence of diabetes complications,
improve the quality of life of high-risk groups, and reduce medical costs for the
management and treatment of diabetes.
5. To improve the effectiveness of the diabetes screening process by incorporating machine
learning predictions. More accurate risk estimates allow resources to be targeted at
individuals at significant risk while sparing lower-risk individuals unnecessary
procedures, thereby reducing unnecessary testing and associated healthcare costs.

Chapter 4

Problem Statement and Methodology

The prediction and classification of diabetes are crucial in the field of medical diagnosis and
treatment. Traditional methods for diabetes prediction often rely on manual interpretation,
which can be time-consuming, subjective, and prone to errors. The integration of machine
learning techniques has shown great potential in automating and enhancing the accuracy of
diabetes prediction using medical datasets. However, the selection of appropriate machine
learning algorithms, along with the consideration of various factors such as dataset
characteristics and feature extraction methods, is essential to achieve optimal performance
and accuracy in this prediction task.
This chapter presents the problem statement and methodology employed in this research to
address the challenges associated with diabetes prediction using machine learning algorithms.
The primary objective of this study is to conduct a comparative analysis of different machine
learning algorithms for accurately classifying individuals as diabetic or non-diabetic. By
evaluating the performance of these algorithms and investigating the impact of various
factors, such as dataset characteristics and feature extraction methods, we aim to contribute to
the development of automated and efficient diabetes prediction systems.
4.1 Problem Statement
Accurate and timely prediction of diabetes is of utmost importance in the field of medical
diagnosis and treatment. Traditional diagnostic methods for diabetes prediction often rely on
manual interpretation and can be time-consuming and subjective. Therefore, the integration
of machine learning techniques in diabetes prediction has gained significant attention,
promising automated and efficient solutions that can assist medical professionals in making
informed decisions. However, there is a need to evaluate and compare different machine
learning algorithms to determine their effectiveness in accurately classifying individuals as
diabetic or non-diabetic. Furthermore, the impact of dataset characteristics, feature extraction
methods, and algorithmic parameters on prediction performance needs to be explored.

The objective of this project is to conduct a comparative study of machine learning
algorithms for diabetes prediction using the Pima Indian Diabetes Dataset from Kaggle. The
study aims to address the following research questions:

1. Which machine learning algorithms demonstrate high accuracy in classifying individuals as
diabetic or non-diabetic?
2. How do different feature extraction methods influence the prediction performance?
3. What is the impact of dataset characteristics on prediction performance?

4.2 Methodology
In the methodology, the data preprocessing steps are described, including the normalization
process and the preparation of the dataset for machine learning algorithms. By scaling and
standardizing the features, consistent input for subsequent processing and training is enabled.

4.2.1 Dataset Used

For this research, the Pima Indian Diabetes Dataset from Kaggle was used. The dataset
comprises medical records of female patients, including information such as glucose levels,
blood pressure, BMI, and age. The dataset contains 768 records, with each record classified
as either diabetic or non-diabetic. The dataset was divided into two parts: a training set and a
testing set, with an appropriate split to ensure the model's generalizability.
In this study, we preprocess the data by handling missing values, performing feature scaling,
and applying data augmentation techniques to address class imbalance. The performance of
various machine learning algorithms, such as SVM, k-NN, Random Forest, Decision
Tree, and Logistic Regression, will be evaluated and compared.
By conducting this comparative analysis, we aim to identify the most effective machine
learning algorithms and methodologies for predicting diabetes, contributing to the
advancement of automated and accurate diabetes diagnosis systems.

S. No   Attributes
1       Pregnancy
2       Glucose
3       Blood Pressure
4       Skin Thickness
5       Insulin
6       BMI (Body Mass Index)
7       Age
8       Diabetes Pedigree Function

4.2.2 Data preprocessing

Prior to model training, the dataset underwent preprocessing. All features were normalized
through feature scaling so that attributes measured in different units and ranges (for
example, glucose level and age) contribute on a comparable scale. This step
aimed to ensure that the input data for the machine learning algorithms was appropriately
prepared for training.
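
As a minimal sketch of the feature scaling described above, the snippet below standardizes each feature to zero mean and unit variance with scikit-learn's StandardScaler. The choice of z-score standardization (rather than, say, min-max scaling) is an assumption, since the text does not name the exact method, and the numbers are made-up illustrative values, not records from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for three features (glucose, blood pressure, BMI).
# They are illustrative only and are not taken from the dataset.
X_train = np.array([[120.0, 70.0, 30.0],
                    [90.0, 60.0, 25.0],
                    [150.0, 80.0, 35.0],
                    [100.0, 75.0, 28.0]])
X_test = np.array([[130.0, 65.0, 40.0]])

scaler = StandardScaler()
# Fit on the training data only, so no test-set statistics leak into training.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training mean and standard deviation for the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # each column is approximately 0
print(X_train_scaled.std(axis=0))   # each column is approximately 1
```

Fitting the scaler on the training portion alone keeps the test set unseen until evaluation, which matters when test accuracy is used to judge generalizability.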

4.2.3 Machine Learning Algorithms

To achieve accurate classification of diabetes, several machine learning algorithms were
evaluated. The selected algorithms were:
Support Vector Machine (SVM)
Decision Tree
Random Forest
Logistic Regression
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM): Support Vector Machine (SVM) is a supervised machine
learning algorithm widely used for classification tasks. It works by finding the hyperplane
that best separates the data points into different classes. The goal is to maximize the margin
between the hyperplane and the nearest data points of each class, making the classification
robust to noise and outliers.
Decision Tree: Decision Tree is a popular classification algorithm that works by recursively
splitting the dataset based on the feature that provides the most information gain at each step.
Each node in the tree represents a decision based on a feature, and the branches represent the
possible outcomes. Decision trees are easy to interpret and visualize, making them useful for
understanding the decision-making process.
Random Forest: Random Forest is an ensemble learning method that builds multiple
decision trees during training and outputs the class that is the mode of the classes of the
individual trees. It improves upon the decision tree algorithm by reducing overfitting and
increasing the accuracy of the model. Random forests are robust to noise and outliers and can
handle large datasets with ease.
Logistic Regression: Logistic Regression is a widely used classification algorithm that
models the probability of a binary outcome based on one or more predictor variables. It
estimates the probability that a given instance belongs to a particular class using a logistic
function, which maps any real-valued input into the range [0, 1]. Logistic Regression is
computationally efficient, easy to interpret, and works well for linearly separable data.
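
To make the logistic function concrete, here is a small sketch of how a linear score is turned into a class probability. The weights and bias are hypothetical numbers chosen for illustration, not coefficients fitted in this project.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real z into the range [0, 1]."""
    # Two algebraically equivalent forms keep math.exp from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one instance."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two already-scaled features (not fitted values).
weights = [1.5, 0.8]
bias = -0.2

p = predict_probability([0.4, -1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is the conventional default threshold
print(f"p = {p:.3f}, predicted class = {label}")
```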
K-Nearest Neighbors (KNN): K-Nearest Neighbors (KNN) is a simple and intuitive
classification algorithm that works by assigning a class label to an instance based on the
majority class among its k nearest neighbors in the feature space. KNN does not make any
assumptions about the underlying data distribution and can handle non-linear decision
boundaries. However, it can be computationally expensive, especially for large datasets, and
requires careful selection of the value of k.
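
The majority-vote rule can be sketched in a few lines of plain Python. The two-dimensional points and the choice of Euclidean distance with k = 3 are illustrative assumptions; a library implementation would normally be used in practice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups (0 = non-diabetic, 1 = diabetic).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1.0, 0.9)))  # → 0
print(knn_predict(train_X, train_y, (3.0, 3.0)))  # → 1
```

Every prediction computes the distance to all training points, which is the computational cost the paragraph above refers to.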
These algorithms were chosen based on their suitability for handling classification tasks and
their potential effectiveness in predicting diabetes from the Pima Indian Diabetes Dataset.
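
The comparison of the five classifiers could be set up along the following lines with scikit-learn. This is a sketch under stated assumptions, not the project's actual code: a synthetic dataset with 768 rows and 8 features stands in for the real data, and the 80/20 stratified split, the random seeds, and the mostly default hyperparameters are all assumptions. The printed accuracies therefore say nothing about the results reported in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the dataset (768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=0)

# Assumed 80/20 split; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Distance- and margin-based models are wrapped with a scaler;
# tree-based models do not need feature scaling.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.4f}")
```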

4.2.4 Evaluation Metrics

Various evaluation metrics were employed to assess the performance of each machine
learning algorithm, including accuracy, precision, recall, and F1 score. These metrics were
calculated on the testing set to evaluate the algorithms' classification accuracy and overall
performance.

4.2.5 Expected Outcomes

Through this project, the aim is to identify the most accurate and effective machine learning
algorithm for predicting diabetes. The comparative study of different algorithms, along with
the analysis of dataset characteristics, feature extraction methods, and algorithmic
parameters, will provide valuable insights into the factors influencing diabetes prediction.
The outcomes of this research can contribute to the development of automated and efficient
diabetes prediction systems, aiding in early detection and intervention.

4.3 Objectives
The objective of this research is to conduct a comprehensive comparative study of machine
learning-based diabetes prediction using various clinical and physiological datasets. The
primary aim is to identify the most accurate and effective machine learning algorithm for
predicting diabetes. By evaluating and analyzing the performance of different algorithms,
considering various factors such as dataset characteristics and feature extraction methods, we
seek to contribute to the development of automated and efficient diabetes prediction systems.

• To review and analyze existing literature on diabetes prediction methods: A thorough
literature review will be conducted to gain a comprehensive understanding of the
techniques, algorithms, and methodologies employed in diabetes prediction using
machine learning. This analysis will help identify the gaps and limitations in the
current approaches, guiding the direction of our research.

• To acquire and preprocess a representative dataset of clinical and physiological
parameters: A suitable dataset containing significant data points of clinical and
physiological parameters associated with diabetes will be obtained. The dataset will
be carefully curated and divided into training and testing sets to ensure proper
evaluation of the algorithms. Preprocessing steps, including normalization and data
cleaning, will be applied to standardize the data and make it suitable for machine
learning algorithms.

• To evaluate the performance of different machine learning algorithms: Several
commonly used machine learning algorithms, including Support Vector Machines
(SVM), k-Nearest Neighbors (KNN), Decision Trees, Logistic Regression, and
Random Forest, will be implemented and evaluated for diabetes prediction. The
algorithms will be trained and tested using the curated dataset, and their performance
will be assessed using various evaluation metrics such as accuracy, precision, recall,
and F1 score.

• To analyze the impact of dataset characteristics and feature extraction methods: The
dataset characteristics, including the distribution of diabetic and non-diabetic
instances, as well as the feature extraction methods employed, will be analyzed to
understand their influence on the performance of the machine learning algorithms.
The aim is to identify the key factors that contribute to accurate and reliable diabetes
prediction and provide insights into improving the effectiveness of the algorithms.

• To recommend the most accurate and effective machine learning algorithm for
diabetes prediction: Based on the comparative analysis and evaluation results, a
recommendation will be made regarding the most accurate and effective machine
learning algorithm for diabetes prediction. The recommendation will consider factors
such as classification accuracy, robustness, computational efficiency, and suitability
for real-world clinical applications.

Chapter 5

Result Analysis
In this project, a variety of tools and Python libraries were instrumental in conducting the
experiment and reaching conclusions. The data collection phase commenced with the
acquisition of diabetes-related datasets, followed by meticulous labeling to classify instances
as either indicative or non-indicative of diabetes. This classification was denoted by assigning
a label of 1 to instances indicating diabetes and 0 to those not indicating the condition.
Following training, confusion matrices were generated to quantitatively assess the models'
performance.
To evaluate the effectiveness of each algorithm in predicting diabetes, a range of performance
metrics were employed. These included Recall (R), F1-score (F1), and Accuracy (A). By
extracting true positive (TP), true negative (TN), false positive (FP), and false negative
(FN) values from the confusion matrices, these metrics were calculated and used to
determine the most effective algorithm for diabetes prediction.
Several tools were employed to initiate and complete the study on diabetes prediction using
machine learning techniques. Data were collected from a Kaggle dataset and processed for
the training and testing of models. Tools provided by various Python libraries such as
Matplotlib, scikit-learn, and pandas supported the analysis in this research. The workflow began
with the collection of the dataset, followed by the labeling process to classify the data into
two categories: non-diabetic instances labeled as 0 and diabetic instances labeled as 1.

The data were then subjected to a pre-processing phase. All features were normalized to
ensure uniformity in scale and to enhance the performance of the machine learning models.
The processed data was then used to train various predictive models.

Once the models were trained, a confusion matrix was generated for each model to quantify
its performance. Performance metrics such as Recall (R), F1-score (F1), Accuracy (A), and
Precision (P) were utilized to analyze the efficacy of the algorithms employed in this experiment.
These metrics helped determine which algorithm offered the best performance in predicting
diabetes.

The confusion matrices provided true positive (TP), true negative (TN), false positive (FP),
and false negative (FN) values, which were essential for computing the aforementioned
performance parameters. This comprehensive evaluation approach allowed for a robust
analysis of the machine learning algorithms' capabilities in diabetes prediction, guiding the
selection of the most effective model for practical applications in healthcare.
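
As an illustration of how the four counts feed the performance parameters, the sketch below tallies TP, TN, FP, and FN from a pair of label lists and derives the metrics from them. The ten labels are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical true labels and model predictions (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diabetic, predicted diabetic
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-diabetic, predicted non-diabetic
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-diabetic, predicted diabetic
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diabetic, predicted non-diabetic

# These divisions assume at least one predicted and one actual positive case.
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```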

Fig 5.1 Confusion Matrix And Classification Report Of SVM

Fig 5.2 Confusion Matrix And Classification Report Of Decision Tree

Fig 5.3 Confusion Matrix And Classification Report Of Random Forest

Fig 5.4 Confusion Matrix And Classification Report Of Logistic Regression

Fig 5.5 Confusion Matrix And Classification Report Of KNN

5.1 Evaluation

This is the last step of the prediction model. Here, we use various metrics, such as
classification accuracy, the confusion matrix, precision, recall, and F1 score, to evaluate
the prediction results.
1. Accuracy: It is the ratio of the number of correct predictions to the total number of input
samples. It is given as:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions Made}} \tag{5.1}$$
2. Confusion Matrix: It gives us a matrix as output and describes the complete
performance of the model.
Where, TP: True Positive
FP: False Positive
FN: False Negative
TN: True Negative

3. Precision: Precision is defined as the ratio of correctly classified positive samples
(True Positive) to the total number of samples classified as positive (either correctly or
incorrectly).
4. Recall: The recall is calculated as the ratio of the number of positive samples
correctly classified as positive to the total number of positive samples. The recall
measures the model's ability to detect positive samples.
5. F1 Score : The F1 score is calculated as the harmonic mean of precision and recall.
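
Written in terms of the confusion-matrix counts TP, TN, FP, and FN defined above, these metrics take the following standard forms. The first line restates Equation (5.1) using the same counts; the equation layout itself is an addition for reference, not taken from the original figures.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align}
```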

Fig 5.6 Accuracy Score

• The Decision Tree algorithm achieved an accuracy of 0.6623, indicating that it
correctly classified approximately 66.23% of the total cases. It demonstrated a recall
of 0.80, which means it identified 80% of the actual diabetes-positive cases correctly.
Its F1-score of 0.75, the harmonic mean of precision and recall, represents a
balanced performance between the two.
• Similarly, the Logistic Regression algorithm attained an accuracy of 0.7403, indicating
an approximately 74.03% overall correct classification rate. It achieved a recall of 0.88,
signifying its ability to identify 88% of the diabetes-positive cases correctly. The
F1-score of 0.81 suggests a well-balanced performance between precision and recall.
• The Random Forest algorithm demonstrated an accuracy of 0.7597, indicating a
75.97% overall correct classification rate. It achieved a recall of 0.86, implying its
ability to correctly identify 86% of the diabetes-positive cases. The F1-score of 0.82
indicates a balanced performance between precision and recall.
• The KNN algorithm demonstrated an accuracy of 0.7143, indicating a 71.43%
overall correct classification rate. It achieved a recall of 0.88, implying its ability to
correctly identify 88% of the diabetes-positive cases. The F1-score of 0.80 indicates a
balanced performance between precision and recall.
• The SVM algorithm demonstrated an accuracy of 0.7468, indicating a 74.68%
overall correct classification rate. It achieved a recall of 0.89, implying its ability to
correctly identify 89% of the diabetes-positive cases. The F1-score of 0.82 indicates a
balanced performance between precision and recall.

5.1.3 Results
This study proceeded in several steps. The proposed scheme uses different classification
and ensemble methods and is implemented in Python. These are standard machine learning
techniques used to obtain the best accuracy from the data. In this study, the Random Forest
classifier achieved better results than the other classifiers. Overall, established machine
learning techniques were used to make the predictions.

Fig 5.7 : Feature Importance Plot for Random Forest

Fig 5.7 presents the features that played an important role in the predictions of the Random
Forest algorithm. The importance attributed to each feature for diabetes prediction is plotted,
with the X-axis representing the importance of each feature and the Y-axis the feature names.
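
A plot of this kind can be produced roughly as follows. This is a sketch, not the code behind Fig 5.7: the data are synthetic, the eight columns merely borrow the attribute names from the dataset, and the use of scikit-learn's impurity-based feature_importances_ is an assumption about how the importances were obtained. The resulting bar lengths are therefore meaningless for diabetes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display a window
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["Pregnancy", "Glucose", "Blood Pressure", "Skin Thickness",
                 "Insulin", "BMI", "Age", "Diabetes Pedigree Function"]

# Synthetic stand-in data: eight columns that only borrow the attribute names.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # impurity-based; values sum to 1

# Sort so that the most important feature ends up at the top of the chart.
order = importances.argsort()
fig, ax = plt.subplots()
ax.barh([feature_names[i] for i in order], importances[order])
ax.set_xlabel("Importance")
ax.set_ylabel("Feature")
ax.set_title("Random Forest feature importance (synthetic data)")
fig.tight_layout()
# fig.savefig("feature_importance.png")  # hypothetical output file name
```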

Figure 5.8 Heatmap of correlation analysis
A heatmap is a graphical representation of data where values in a matrix are represented as
colors. It is particularly useful for visualizing the relationships between two variables across
multiple data points. In a heatmap, each cell in the matrix is assigned a color based on its
value, creating a visual representation of the data's patterns and correlations.
Heatmaps are often used in data analysis and visualization to:

• Identify Patterns: Heatmaps help identify patterns and trends in large datasets by
visually highlighting areas of high or low values.

• Spot Correlations: Heatmaps can reveal correlations between variables by showing
how they vary together across different data points.

• Compare Data: Heatmaps allow for easy comparison of multiple variables or
categories by displaying them side by side in a color-coded format.

• Highlight Anomalies: Heatmaps can highlight outliers or anomalies in the data by
displaying them as distinct color patterns.

Overall, heatmaps provide a powerful visual tool for understanding complex data and
extracting meaningful insights from it.
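
A correlation heatmap like the one described can be drawn with pandas and Matplotlib roughly as shown below. The data frame is randomly generated for illustration (the Insulin column is deliberately constructed to correlate with Glucose), so the colors do not reproduce Figure 5.8; the color map and layout are likewise assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data, not the actual dataset.
rng = np.random.default_rng(0)
glucose = rng.normal(120, 30, size=200)
df = pd.DataFrame({
    "Glucose": glucose,
    "Insulin": 0.5 * glucose + rng.normal(0, 20, size=200),  # built to correlate
    "BMI": rng.normal(32, 6, size=200),
    "Age": rng.normal(35, 10, size=200),
})

corr = df.corr()  # pairwise Pearson correlation matrix

fig, ax = plt.subplots()
# Each cell is colored by its correlation value on a fixed -1..1 scale.
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Fixing the color scale to the full range from -1 to 1 keeps colors comparable between plots, so a pale cell always means a weak correlation.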

Figure 5.9 Scatter Plot

Chapter 6

Conclusion
Diabetes can reduce both life expectancy and quality of life. In the long term, early
diagnosis of this disease can reduce the risk of many complications. In this
project, automatic prediction of diabetes is proposed using various machine learning methods.
Machine learning techniques can support doctors in diagnosing and treating diabetes, and
improving classification accuracy makes such models more useful in practice. The
performance analysis is carried out in terms of accuracy across all the
classification techniques: Decision Tree, SVM, Random Forest, Logistic
Regression, and K-Nearest Neighbors.
In this project, several machine learning algorithms were applied to the task of diabetes
prediction using the Pima Indian Diabetes dataset. The evaluated algorithms included
Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision
Tree, and Random Forest. The primary objective was to compare their performance and
identify the most effective algorithm for accurately predicting diabetes cases.
The results obtained from the experiments provided valuable insights into the performance of
these algorithms. Confusion matrices were analyzed to derive performance metrics such as
accuracy (A), recall (R), F1-score, and precision. Based on the accuracy metric, the Random
Forest algorithm achieved the highest accuracy score of 0.7597, closely followed by the SVM
with an accuracy score of 0.7468. The Logistic Regression and KNN algorithms achieved
accuracy scores of 0.7403 and 0.7143, respectively. The Decision Tree
algorithm achieved the lowest accuracy score of 0.6623.
It's worth noting that the effectiveness of these algorithms can be influenced by various
factors such as the specific characteristics of the dataset, the preprocessing techniques
applied, and the parameter configurations utilized. Further exploration and optimization may
be necessary to maximize their performance. Nevertheless, this research provides valuable
insights into the capabilities of different machine learning algorithms for diabetes prediction
using the Pima Indian Diabetes dataset. These findings can contribute significantly to the
advancement of more accurate and reliable diagnostic tools for diabetes prediction, ultimately
leading to improved healthcare outcomes for individuals affected by the condition.

6.1 Scope of Future Work
The project presented here offers valuable insights into the performance of machine learning
algorithms for diabetes prediction using clinical and physiological data. However, there
remain several opportunities for future exploration and enhancement to further improve the
accuracy and effectiveness of predictive models. This section outlines potential directions for
future work and areas of focus within the scope of diabetes prediction projects.

• Integration of Additional Features:
Incorporating additional features beyond clinical and physiological data, such as
lifestyle factors, dietary habits, and genetic markers, could enhance the predictive
power of models. By integrating a broader range of relevant features, models may
capture more nuanced patterns and dependencies, leading to improved accuracy in
diabetes prediction. Future research could explore methods for feature engineering
and fusion to integrate diverse data sources effectively.

• Expansion of Training Data:
Increasing the size and diversity of the training dataset is essential for improving the
generalization and robustness of predictive models. Collaborations with healthcare
institutions and data-sharing initiatives could facilitate the acquisition of larger and
more representative datasets. Additionally, efforts to collect longitudinal data to
capture temporal trends and changes in patients' health status could further enhance
the predictive capabilities of models.

• Noise Reduction and Preprocessing Techniques:
Enhancing the quality of input data through advanced noise reduction and
preprocessing techniques is crucial for improving model performance. Future research
could explore innovative methods for handling missing data, outlier detection, and
data normalization to ensure the reliability and consistency of input features.
Additionally, techniques for addressing data imbalance and class skewness may be
investigated to mitigate biases in model predictions.

• Algorithmic Modifications:
While the algorithms employed in this project have shown promising results, there is
potential for further optimization and customization. Future research could focus on
algorithmic modifications tailored specifically for diabetes prediction, such as
parameter tuning, ensemble learning, and algorithm selection. Exploring novel
algorithmic architectures and incorporating domain-specific knowledge into models
may also lead to performance improvements.

• Deep Learning Approaches:
Deep learning techniques, particularly deep neural networks, have demonstrated
significant potential in various healthcare applications, including disease prediction.
Investigating the application of deep learning architectures, such as recurrent neural
networks (RNNs), convolutional neural networks (CNNs), and transformer models, to
diabetes prediction tasks could yield improved predictive performance. Furthermore,
techniques for model interpretability and uncertainty quantification in deep learning
models may be explored to enhance trust and transparency in predictions.

• Validation and Clinical Trials:
Conducting rigorous validation studies and clinical trials is essential to assess the
clinical relevance and utility of predictive models in real-world healthcare settings.
Collaborations with healthcare professionals and institutions are critical for designing
and executing validation studies, evaluating model performance against established
clinical standards, and assessing the impact of model predictions on patient outcomes.
Additionally, incorporating feedback from clinicians and end-users throughout the
development process can ensure the practical applicability and acceptance of
predictive models in clinical practice.

In conclusion, while the project lays the foundation for diabetes prediction using machine
learning techniques, there are numerous avenues for future research and improvement. By
integrating additional features, expanding the training dataset, improving preprocessing
techniques, modifying algorithms, exploring deep learning approaches, and conducting
validation studies, the accuracy and reliability of diabetes prediction models can be
significantly enhanced. These advancements have the potential to contribute to the
development of more accurate diagnostic tools and personalized healthcare interventions for
individuals at risk of diabetes.

References
[1] Kharroubi, A.T. , Darwish, H.M. : Diabetes mellitus: The epidemic of the century. World
J. Diabetes 6(6), 850–867 (2015)
[2] Papatheodorou, K. , Banach, M. , Edmonds, M. , Papanas, N. , Papazoglou, D.
Complications of diabetes. J. Diabetes Res. 2015, 1–6 (2015)
[3] Atlas, G. : Diabetes. International Diabetes Federation. 10th ed., IDF Diabetes Atlas
[4] Smith, J.W. , Everhart, J.E. , Dickson, W.C. , Knowler, W.C. , Johannes, R.S. : Using the ADAP
learning algorithm to forecast the onset of diabetes mellitus. In: Annual Symposium on
Computer Applications in Medical Care pp. 261–265 (1988)
[5] Aljumah, A.A., Ahamad, M.G., Siddiqui, M.K., 2013. Application of data mining:
Diabetes health care in young and old patients. Journal of King Saud University -
Computer and Information Sciences 25, 127–136. doi:10.1016/j.jksuci.2012.10.003.
[6] Zou, Q., Qu, K., et al., 2018. Predicting diabetes mellitus with machine learning techniques.
[7] K.VijiyaKumar, B.Lavanya, I.Nirmala, S.Sofia Caroline, "Random Forest Algorithm for
the Prediction of Diabetes ".Proceeding of International Conference on Systems
Computation Automation and Networking, 2019.
[8] Deeraj Shetty, Kishor Rit, Sohail Shaikh, Nikita Patil, "Diabetes Disease Prediction Using
Data Mining ".International Conference on Innovations in Information, Embedded and
Communication Systems (ICIIECS), 2017.
[9] Tejas N. Joshi, Prof. Pramila M. Chawan, "Diabetes Prediction Using Machine Learning
Techniques".Int. Journal of Engineering Research and Application, Vol. 8, Issue 1,(Part-
II) January 2018, pp.-09-13
[10] Arora, R., Suman, 2012. Comparative Analysis of Classification Algorithms on
Different Datasets using WEKA. International Journal of Computer Applications 54, 21–25.
doi:10.5120/8626-2492.
[11] Bamnote, M.P., G.R., 2014. Design of Classifier for Detection of Diabetes Mellitus
Using Genetic Programming. Advances in Intelligent Systems and Computing
[12] Choubey, D.K., Paul, S., Kumar, S., Kumar, S., 2017. Classification of Pima indian
diabetes dataset using naive bayes with genetic algorithm as an attribute selection, in:
Communication and Computing Systems: Proceedings of the International Conference
on Communication and Computing System (ICCCS 2016), pp. 451– 455

[13] Mohan, N. , Jain, V. : Performance analysis of support vector machine in diabetes
prediction. In: International Conference on Electronics, Communication and Aerospace
Technology, pp. 1–3 (2020)
[14] Olisah, C.C. , Smith, L. , Smith, M. : Diabetes mellitus prediction and diagnosis from a
data preprocessing and machine learning perspective. Comput. Methods Programs
Biomed. 220, 1–12 (2022)
[15] Ramesh, J. , Aburukba, R. , Sagahyroon, A. : A remote healthcare monitoring
framework for diabetes prediction using machine learning. Healthcare Technol. Lett (2021)
[16] J. Chaki, S. T. Ganesh, S. K. Cidham, and S. Ananda Theertan, “Machine learning and
artificial intelligence based diabetes mellitus detection and self-management: a systematic
review,” Journal of King Saud University - Computer and Information Sciences, vol. 34

[17] B. Ljubic, A. A. Hai, M. Stanojevic et al., “Predicting complications of diabetes mellitus
using advanced machine learning algorithms,” Journal of the American Medical Informatics
Association, vol. 27, no. 9, pp. 1343–1351, 2020.
[18] A. Nur Ghaniaviyanto Ramadhan and R. Ade, “Preprocessing handling to enhance
detection of type 2 diabetes mellitus based on random forest,” International Journal of
Advanced Computer Science and Applications (IJACSA), vol. 12, 2021.
[19] S. Islam Ayon and M. Milon Islam, “Diabetes prediction: a deep learning
approach,” International Journal of Information Engineering and Electronic Business, vol.
11, no. 2, pp. 21–27, 2019.
[20] V. C. Bavkar and A. A. Shinde, “Machine learning algorithms for
diabetes prediction and neural network method for blood glucose
measurement,” Indian Journal of Science and Technology, vol. 14, 2021.
[21] Chatrati, S.P., Hossain, G., Goyal, A., et al.: Smart home health monitoring system
for predicting type 2 diabetes and hypertension. J. King Saud Univ. Comput. Inf. Sci. 34(3),
862–870 (2020)

