Disease Detection Using ML
1. Abstract
2. Keyword
3. Introduction
4. Machine Learning & AI
4.1 Machine learning types
4.2 Machine learning & A.I. History
4.3 Domain/ Application
5. Role of Artificial Intelligence in Health
5.1 Medical history on A.I.
6. Related work
7. Different types of machine learning technique, i.e. different types of Machine Learning Algorithm
8. Important Keywords
9. Discussion
10. Conclusion
11. References
Cardiovascular disease (CVD) is another disease in which ML plays a vital role in
detection. Its causes include high blood pressure, diabetes, heavy smoking,
overthinking or hypertension, variation in BMI (Body Mass Index), obesity, inactivity,
high cholesterol, an unhealthy diet, excessive alcohol consumption and many others.
Heart problems and related signs include arrhythmia, atherosclerosis, heart defects,
CAD (coronary artery disease), heart infection, heart attack, heart failure, stroke,
severe chest pain, an excessive heart rate, heart pressure, frustration, fbs, heart
block and many others. Machine learning can detect CVD from an individual's record of
symptoms, characteristics and attributes such as sex, height, weight, cholesterol and
glucose levels, blood tests and blood pressure (systolic and diastolic).
Breast cancer is the most frequently diagnosed cancer and a major cause of the rising
mortality rate among women. Because manual diagnosis takes a great deal of time and few
diagnostic systems are available, it is important to develop an automatic diagnosis
system for the early detection of breast cancer. Data mining techniques contribute a lot
to the development of such a system. To classify benign and malignant tumours and to
detect breast cancer early, we have used machine learning classification techniques, in
which the machine learns from past data and can then predict the category of a new input.
Besides breast cancer, blood cancer, skin cancer and lung cancer can also be detected by
machine learning techniques.
For the detection and prediction of any type of disease it is important to test blood.
Hence blood disease detection is also important. Different types of machine learning
classifier help to detect blood disorders. Apart from these diseases, AI or machine
learning models can also detect diabetes and plant diseases.
Different types of DDS (disease detection system) have been built in which information
is entered into an Android app. In a DDS, a real-time database is used by pre-trained
machine learning algorithms. Here the datasets are deployed in Firebase.
AI is basically when a machine mimics the cognitive functions of humans by following programmed
rules or protocols. AI instructs a machine how to behave in a given situation. The difference is
that a machine never gets tired of doing work, and artificially it can be more intelligent than a
human. A machine can apply more problem-solving approaches than a single human. Basically, we want
systems and software that mimic human behaviour. AI is accomplished by studying how the human
brain thinks, learns, decides and works while solving a problem, so that a machine can solve
problems in the same way. The outcomes of such study are used as the basis for developing
intelligent systems and smart software.
Figure- 4
➢ We split our data into training and test sets.
➢ Using the training set, we train our computer.
➢ Then it is necessary to create a machine learning model.
➢ To check the performance of that model and obtain its accuracy, it is important to
test it on the test data.
➢ We then pass the test data to the model and observe the accuracy.
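The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the toy data (a single blood-pressure feature with a 0/1 label), the 25% test fraction and the midpoint-threshold "model" are all hypothetical and are not taken from any study reviewed here.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train_rows):
    """'Train' a one-feature model: the midpoint between the two class means."""
    pos = [x for x, y in train_rows if y == 1]
    neg = [x for x, y in train_rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Predict class 1 when the feature value exceeds the learned threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, rows):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return correct / len(rows)

# Hypothetical toy data: (systolic blood pressure, label), 1 = disease present.
data = [(110 + i, 0) for i in range(20)] + [(150 + i, 1) for i in range(20)]

train, test = train_test_split(data)      # step 1: split the data
model = fit_threshold(train)              # steps 2-3: train and obtain a model
test_accuracy = accuracy(model, test)     # steps 4-5: evaluate on unseen data
print(f"threshold={model:.1f}, test accuracy={test_accuracy:.2f}")
```

The important design point is that the accuracy is measured only on rows the model never saw during training.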
Let us assume the input features are {x1, x2, …, xM} and y is the target feature. For each
training example these values are known. Now assume a new example is given in which only
the input features are known and there is no target value. We then need to predict the
unknown value of the target feature. If the value of the target feature is discrete
Figure- 5 The properties required when data are split into two classes
In contrast to supervised learning, we have no predefined class labels. On the basis of
the data provided, we attempt to construct our own classes, and we make every effort to
ensure that each class produced has a high intra-class similarity and a low inter-class
similarity. This method is useful for grouping the input data based on its statistical
characteristics.
Reinforcement learning is a subset of Machine Learning in which the learning system
observes its surroundings and learns the optimal behaviour by attempting to maximise some
notion of cumulative reward.
Agents monitor their surroundings, choose and carry out certain actions, and are
rewarded in turn (or penalised in certain cases).
Over time, the agent develops a strategy or policy (a choice of behaviours) that
maximises its rewards.
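As a hedged illustration of this reward-driven loop, the sketch below uses tabular Q-learning, one common reinforcement learning algorithm chosen here purely as an example. The five-state corridor environment, the reward of 1 at the right end, and the values of ALPHA, GAMMA and the episode count are all hypothetical assumptions.

```python
import random

N_STATES = 5           # states 0..4; reaching state 4 gives the reward
ACTIONS = (-1, 1)      # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Learn action values from experience gathered with random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)  # the agent explores by acting at random
            nxt, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

q_table = train()
# Greedy policy: in each non-goal state, pick the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

The learned policy is simply the action that the agent has found to yield the highest cumulative (discounted) reward in each state.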
✓ Machine learning can also detect chronic diseases, liver disorders, heart diseases,
hepatitis and Parkinson's disease. The symptoms differ from one disease to another.
Machine learning uses these symptoms as input and detects the particular
disease early. Different types of disease symptoms, patient records,
lab measurements, pathological tests, blood tests, DNA tests and
previous reports are used as inputs to machine learning for diagnosing a disease,
and as a result the disease is detected. With the help of these records various
datasets have been created, and from them future patients can be treated early.
✓ ML is useful not only for medical history but also for image recognition,
object detection, robot control, natural language processing, speech
recognition, fraud detection and many other domains and applications of
machine learning and AI.
6. Related work-
Machine learning has a wide range of applications, and with its continuous improvement it
is now used all over the world. It offers high computational power, including on large
datasets, and it provides many essential resources for various kinds of data analysis.
Many research areas are related to the topic of disease detection using ML.
Heart disease can be detected or diagnosed using a hybrid machine learning model.
(Khourdifi and Bahaj et al.) predicted heart disease using different types of machine
learning model. They applied several optimizations, including particle swarm optimization
(PSO) combined with ant colony optimization (ACO). There are several studies on heart
disease detection. In one study, (Chidambaram T et al.) predicted heart disease using
Naïve Bayes, AI network, Support vector machine classifier (SVM), Random forest
classifier and a simple regression method. They obtained a 98.83% prediction result with
the Decision tree classifier. The paper compared different types of machine learning
classifier, and Decision tree gave the best result. In another study, (Fahad Kamal
Alsheref et al.) used machine learning classification algorithms such as Support vector
machine (SVM), K-Nearest Neighbour (KNN), Regression analysis and Decision tree for the
detection of blood diseases. (Naresh Kumar et al.) focus on automated disease diagnosis.
They selected three harmful diseases: coronavirus, heart disease and diabetes. An Android
app is used in which these three diseases can be detected by entering data into the app,
using a real-time dataset.
Machine learning models are also used for handling coronavirus. Different machine
learning techniques help to determine or predict the number of deaths, the number of
recoveries, the number of affected people and the number of active cases. In China,
machine learning techniques were used to predict when COVID would end. A recently
published paper (Suparna Biswas et al.) is titled A Hybrid Model based on mBA-ANFIS for
COVID-19 confirmed case prediction and Forecast. Another recently published study is
Mathematical modelling for decision making of lockdown during covid19 by SIR model
(Suparna Biswas et al.). ES, LR, LASSO and SVM (Rustam et al, 2020) have been used to
estimate the number of future affected patients. The dataset was collected from
GitHub (Wissel et al, 2020).
It is known that cancer is a terrible disease in today's life, and Pakistan is one of the
countries where breast cancer occurs, with more than 83,000 cases reported. Early
detection is therefore important and is the most effective approach. Machine learning
helps with early detection and gives good outcomes. It requires an effective procedure
for discriminating benign tumours from malignant ones. Researchers have done a lot of
work on related topics. [Senturk & Kara, 2014] proposed a model for the early diagnosis
of breast cancer in patients. They used seven different types of machine learning
algorithm for prediction, with a breast cancer dataset collected from the UCI machine
learning repository. During the prediction process they used RapidMiner 5.0, a data
mining tool, to apply data mining techniques with the chosen algorithms.
For diabetes detection, (Al-Zebari & Sengur, 2019) used different types of machine
learning technique such as Decision tree (DT), Logistic Regression (LR), DA, SVM, k-NN
and ensemble techniques. They used MATLAB for this purpose, with 10-fold
cross-validation. The classification accuracy obtained by each individual classifier was
compared. Among Decision tree (DT), Logistic Regression (LR), DA, SVM, k-NN and ensemble,
the best accuracy, 77.9%, was given by the LR method, and the worst accuracy, 65.5%, was
given by the Gaussian SVM technique. Another related work, on brain tumours
(Cinarer & Emiroglu, 2019), classified MR brain image characteristics using tumour
classification techniques. Machine learning classifiers such as RF (Random Forest), KNN
(K-Nearest Neighbour), LDA and SVM were used to classify MR brain images into n/a,
multicentric, multifocal and gliomatosis. Among these classifiers, SVM gave the highest
precision.
Thyroid disease is another common disease today, and it has many research areas.
Using an SVM classifier and a UCI machine learning dataset, (Kousarrizi et al., 2012)
obtained 98.62% classification accuracy. They used two datasets, one collected from the
UCI machine learning repository and another from Imam Khomeini hospital, which was
collected by the IDL (Intelligent Device Laboratory) of K.N. Toosi University of Technology.
Parkinson's disease was diagnosed by (Huriharan et al., 2014) using a neural network and
SVM (Support Vector Machine), with a 100% classification precision result. (Naql et al,
2020) detected lung cancer using DL, with a dataset taken from LIDC-IDRI (Menge et al,
2018). They provide an automated disease detection analyser as well as a classification
to support radiologists' diagnosis. Using SVM, (Liu et al., 2020) diagnosed brain stroke
in 1157 patients and obtained 83.3% accuracy. A liver disease diagnosis approach
(Durai et al.,) used a UCI machine learning dataset with J48, SVM and NB models and
obtained 95.04% accuracy. The objective of this research was to achieve a higher
prediction score for liver disease detection. Using different cancer datasets,
(Zebedee et al, 2018) detected cancer with a convolutional neural network (CNN) based on
gene expression, with a classification accuracy of 100%. Another research work
differentiates the characteristics of Alzheimer's disease: (Kulkarni and Bairagi, 2017)
proposed an SVM model for identifying characteristics for the diagnosis of this type of
disease, and 96% accuracy was obtained by this model.
7. Different types of machine learning technique, i.e. different types of Machine Learning
Algorithm
Machine learning scientists have developed many algorithms for detecting disease, for
early diagnosis and for prediction, instead of manual detection. Many machine learning
techniques, such as Random forest classifier, K-Nearest-Neighbour, Support vector
machine, Gaussian naïve Bayes (based on Bayes' theorem), Decision tree classifier, Linear
Regression, Artificial Neural Network, Ensemble learning classifier, Logistic Regression,
K-Means clustering, C-Means clustering, Principal component analysis, Anomaly detection
algorithm, and the J48, C4.5 and C5.0 models, are used for early disease detection based
on clinical data. Machine learning can be used to distinguish cancerous cells from
non-cancerous cells. It can identify pneumonia. ML can detect the early growth of a
tumour in the brain as well as malignant and benign growths in the breast and lungs.
Table 2
7.1 Support Vector Machines (SVM)- Vladimir Vapnik and Alexey Chervonenkis introduced the
original support vector machine in the 1960s. (Bernhard Boser et al., 1992) applied the kernel
trick to MMH (Maximum-Margin Hyperplanes). (Corinna Cortes et al., 1993) proposed the soft
margin concept. SVM analyses data for classification and regression purposes. The main purpose
of SVM is to generate a decision boundary that separates the data points into two classes. This
decision boundary is called a hyperplane. To fix the hyperplane, SVM selects the points that
lie nearest to the boundary. These points are called support vectors. For data that are not
linearly separable, the kernel trick is used. In the kernel trick, inputs in a low-dimensional
space are transformed into a higher-dimensional space, which means that non-separable data are
converted into separable data. The polynomial kernel, Gaussian radial basis function, Gaussian
kernel, Laplace RBF kernel, hyperbolic tangent kernel and ANOVA radial basis kernel are all
examples of SVM kernels.
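The idea of lifting data into a higher-dimensional space can be illustrated with a small sketch. It is not a full SVM: the XOR-style points, the explicit feature map (x1, x2, x1*x2) and the default kernel parameters (degree, c, gamma) are assumptions chosen only for illustration. In practice a kernel function computes the similarity in the lifted space without building the lifted points explicitly.

```python
import math

def polynomial_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel: (a . b + c) ** degree."""
    return (sum(x * y for x, y in zip(a, b)) + c) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def feature_map(point):
    """Lift a 2-D point to 3-D by appending the product x1 * x2."""
    x1, x2 = point
    return (x1, x2, x1 * x2)

# XOR-style data: no straight line in 2-D separates the two classes.
points = [(-1, -1), (1, 1), (-1, 1), (1, -1)]
labels = [1, 1, -1, -1]

# In the lifted 3-D space the plane z = 0 separates the classes.
lifted = [feature_map(p) for p in points]
predicted = [1 if z > 0 else -1 for _, _, z in lifted]
print(predicted == labels)
```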
Figure 4
7.2 Logistic Regression- Logistic regression is an extension of linear regression.
It assigns a probabilistic value to each prediction. The logistic function is used to
map the predicted values into probabilities, and the output is produced through the
logistic function (Kousarrizi et al., 2012). Logistic regression uses the concept of
a threshold value. Based on the threshold value, the output is either 0 or 1
(Kousarrizi et al., 2012, Abdulazeez, 2018). The logistic function is
$\mathrm{logistic}(\eta) = \frac{1}{1 + e^{-\eta}}$.
The odds concept in logistic regression makes computation easier:
$\mathrm{odds} = \frac{p}{1 - p}$,
where p = probability that an event occurs and
(1 - p) = probability that the event will not occur.
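The logistic function, the threshold rule and the odds can be checked numerically with a short sketch. The threshold of 0.5 and the sample inputs are illustrative assumptions, not values from the cited works.

```python
import math

def logistic(eta):
    """Map any real value eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def classify(eta, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if logistic(eta) >= threshold else 0

for eta in (-2.0, 0.0, 2.0):
    p = logistic(eta)
    print(f"eta={eta:+.1f}  p={p:.3f}  odds={odds(p):.3f}  class={classify(eta)}")
```

Taking the logarithm of the odds recovers eta, which is why the linear part of the model can be read as the log-odds.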
7.3 Linear Regression- Linear regression in machine learning models a linear
relationship between a dependent variable and one or more independent variables.
This model works well for regression but fails for classification. Linear regression
can be represented as y = a + bx + c, where y = target variable (i.e. dependent
variable), x = predictor variable (independent variable), a = intercept, b = slope
and c = random error. Here x and y come from the training dataset.
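As a minimal sketch, the intercept a and slope b can be estimated by ordinary least squares for a single predictor. The five data points are hypothetical and lie exactly on y = 1 + 2x, so the random error c is zero in this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: return (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance
    a = mean_y - b * mean_x
    return a, b

# Hypothetical training data generated from y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
prediction = a + b * 6   # predict the target for a new x
print(a, b, prediction)
```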
7.5 DTs- The decision tree classifier is based on a tree structure and represents a non-linear
function. It is composed of a root node, internal nodes and leaf nodes. A decision (test) node
decides which branch to follow, and a leaf node defines the example's class. The tree can be
built with a greedy approach, which helps to minimize the depth of the tree. It is a
non-parametric machine learning classifier (Al-Zebari et al., 2019). The attribute with the
highest information gain provides the best prediction result and is selected as the root node.
When the Gini index is used, the attribute selected for a split is the one with the lowest Gini
index and the highest reduction in impurity. Smaller DTs are always preferred.
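The two split criteria mentioned above can be computed directly. The sketch below uses the standard definitions of the Gini index, entropy and information gain; the labels "sick" and "healthy" and the two candidate splits are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["sick"] * 4 + ["healthy"] * 4
perfect_split = [["sick"] * 4, ["healthy"] * 4]
useless_split = [["sick", "sick", "healthy", "healthy"],
                 ["sick", "sick", "healthy", "healthy"]]

print(gini(parent), information_gain(parent, perfect_split),
      information_gain(parent, useless_split))
```

A greedy tree builder would choose the attribute that produces the first split, because it has the higher information gain and leaves both children pure.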
7.6 SVR- SVR is Support Vector Regression. SVR is an SVM that supports linear regression
as well as non-linear regression. SVR uses the same principle as SVM. But the SVR
7.8 Ensemble learning classifier- This is a machine learning algorithm that constructs a
set of machine learning classifiers and classifies a new data point based on a vote over their
predictions. An ensemble is a composition of multiple classifiers, and it is more reliable
than a single classifier. Basically, Bayesian averaging is the primary ensemble technique.
Figure- 7
In max voting, predictions are made by voting, and the majority vote is taken as the final
result. In averaging, the average of the predictions is taken as the final prediction. In the
weighted averaging technique, the different machine learning models are assigned various
weights that denote the importance of each model in the final prediction.
Apart from these, the stacking classifier is also an ensemble learning classifier. The stacking
classifier is used for prediction (on the test data), where several models are used to build a
new model. Bagging is bootstrap aggregating. In bagging, the dataset is bootstrapped for
making a decision: multiple training datasets (bags) are created by random sampling with
replacement. And each of bags a single machine
XGBM
Light GBM
CatBoost
Table- 3
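The max voting, averaging, weighted averaging and bootstrap sampling schemes described in this section can be sketched as below. The model predictions, probabilities and weights are made-up illustrative values, and the helper names are hypothetical.

```python
import random
from collections import Counter

def max_voting(predictions):
    """Return the class label predicted by the majority of the models."""
    return Counter(predictions).most_common(1)[0][0]

def averaging(probabilities):
    """Return the plain mean of the models' predicted probabilities."""
    return sum(probabilities) / len(probabilities)

def weighted_averaging(probabilities, weights):
    """Return the mean of the probabilities, weighted by model importance."""
    total = sum(p * w for p, w in zip(probabilities, weights))
    return total / sum(weights)

def bootstrap_sample(rows, rng):
    """Create one 'bag': sample len(rows) items with replacement."""
    return [rng.choice(rows) for _ in rows]

# Hypothetical outputs of three models for one patient.
votes = ["malignant", "benign", "malignant"]
probs = [0.9, 0.6, 0.3]

print(max_voting(votes))
print(averaging(probs))
print(weighted_averaging(probs, weights=[3, 2, 1]))
print(bootstrap_sample(list(range(10)), random.Random(0)))
```

In bagging, one model would be trained on each bag and their outputs combined with one of the voting or averaging rules above.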
7.9 K- Means & K- Medoids- Both are clustering algorithms. The main goal of these clustering
algorithms is to minimize the sum of squared distances between the data points and the centre
of the cluster to which each data point is assigned. The number of clusters is known in
advance ('K' = number of clusters). The objective is to compute the optimal centroids. In
K-Medoids, a medoid is a particular data point of the cluster. Dissimilarities are measured
between the medoid and the other data points of the same cluster, and the objective is to
minimize this dissimilarity.
Apart from these, DBSCAN (Density-based spatial clustering of applications with noise),
Agglomerative clustering and Fuzzy C-Means clustering also belong to the unsupervised
machine learning algorithms.
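A minimal K-Means sketch is given below, assuming K = 2, six made-up two-dimensional points and initial centroids picked by hand; real implementations usually choose the initial centroids randomly and stop when the assignments no longer change.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
```

K-Medoids follows the same loop but restricts each centre to be one of the actual data points of its cluster.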
7.10 Deep Learning- Deep learning is a machine learning method that can also learn in an
unsupervised way. It can work with structured or unstructured data. The concept is the
same as in ML and AI: the learning algorithm is built by mimicking the human brain. It is
implemented through the concept of the neural network. A neural network mimics the
biological neuron, and these neurons are the brain cells.
In an ANN there are several inputs (x1, x2, …, xn), as in a BNN, and some
randomly chosen weights (w1, w2, …, wn). The inputs are provided to the
processing element (like the cell body in a BNN). Here the inputs are
multiplied by their weights and summed up, i.e. S = x1*w1 + x2*w2 + … + xn*wn.
This sum S is passed to a transfer function F(S), which then goes through an
activation function. In the simplest case the activation function is just a
threshold value, and the final output of the neuron depends on it: the output
is 1 if the result is greater than the threshold value, i.e. the neuron fires;
otherwise the output is 0, i.e. the neuron does not fire. There are many
activation functions, such as the sigmoid function and the step function. In
this way an ANN works by mimicking a BNN. If the actual output and the desired
output of an ANN are not the same, there is some error. The backpropagation
algorithm tries to minimize that error. The process continues until the actual
output matches the desired output. With the help of backpropagation the
weights are adjusted. This process is called learning.
Deep networks are simply neural networks with multiple hidden layers.
In this kind of learning the nodes (neurons) are interconnected.
The name "deep" comes from the presence of many hidden layers.
This is not possible in classical machine learning.
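The weighted sum, threshold and error-driven weight adjustment can be sketched for a single neuron. Note the assumptions: this uses the simple perceptron update rule as a stand-in for backpropagation (which is needed only when hidden layers are present), the weights start at zero rather than at random values, and the AND-gate data and learning rate are illustrative.

```python
def neuron_output(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum S exceeds the threshold."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > threshold else 0

def train_neuron(samples, learning_rate=1.0, epochs=20):
    """Adjust weights and threshold whenever actual and desired output differ."""
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - neuron_output(inputs, weights, threshold)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            threshold -= learning_rate * error
    return weights, threshold

# Hypothetical training data: the logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, threshold = train_neuron(and_gate)
outputs = [neuron_output(x, weights, threshold) for x, _ in and_gate]
print(weights, threshold, outputs)
```

Training stops changing the weights once every actual output equals the desired output, which is the learning process described above in its simplest form.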
8. Important Keywords-
Accuracy-
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision-
$\text{Precision} = \frac{TP}{TP + FP}$
Recall-
$\text{Recall} = \frac{TP}{TP + FN}$
F1-Measure-
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
where TP = True Positive, TN = True Negative, FP = False Positive and
FN = False Negative.
True Positive (TP)- The targeted (actual) result and the obtained
(predicted) result are both true.
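These measures can be computed directly from the four confusion-matrix counts. The counts in the sketch below (40 TP, 45 TN, 5 FP, 10 FN) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```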
For heart disease detection they used 70K data points. Relevant features such as
gender, age, height, weight, glucose, smoking, alcohol, cholesterol, systolic
blood pressure and diastolic blood pressure were used for heart disease
detection. Among these features, systolic blood pressure, diastolic blood
pressure, age and cholesterol are the major factors behind heart disease
according to their heat map.
For diabetes detection a total of 768 data points were used. The relevant
features for prediction are pregnancies, basal metabolic rate, age and blood
pressure, and according to their heat map these features are the factors behind
diabetes.
Logistic regression was used, and all datasets were stored on Firebase.
With their proposed model they successfully detect disease within a few seconds
by asking some questions. In their work, they split their dataset into 3 portions:
training (75%), testing (25%) & validation (10%). The accuracy & F-measure of the
proposed model are given below-
So, the prediction result is comparatively good. In future we will try to feed more
data into the Android app and to deploy a Firebase dataset in the same way.
❖ Parkinson's disease
Parkinson's disease is another disease, for which no specific test has been
developed. It occurs when nerve cells in the brain die. It has many
symptoms, such as tremor, changes in speech and writing, loss of
movement or slower movement, constipation and muscle weakness. Early
detection is important. Various researchers have worked on it with machine
learning and artificial intelligence models. Machine learning can detect
this disease within a few seconds. (V. Ulagamuthalvi et al., 2020)
identified Parkinson's disease using an XGBoost classifier & Logistic
Regression for classification. They used a UCI dataset consisting of many
audio signals. A min-max scaler was used for pre-processing the data. The
XGBoost classifier achieved 96% accuracy, whereas Logistic Regression gave
only 79% accuracy. So, the XGBoost classifier works well for Parkinson's
disease.
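Min-max scaling, the pre-processing step mentioned above, rescales each feature to the range 0 to 1. The sketch below shows the usual formula, (value - min) / (max - min); the sample values are hypothetical and are not taken from the UCI dataset.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature column (for example, a voice frequency measure).
scaled = min_max_scale([120.0, 150.0, 180.0])
print(scaled)
```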
Apart from this disease, several other diseases are detected by ML/AI models, which
are given below-
Disease | Researchers | Technique & Result
Chronic Kidney Disease prediction | S. Revathy et al., 2019 | They used SVM, RFC and DTs for prediction. After training and pre-processing the data, DT gave 94.16% accuracy: among 120 instances, 113 instances were correctly classified.
10. Conclusion- Machine learning is capable of handling large amounts of data, and
accuracy, precision and recall depend on the quality of the dataset. Many machine
learning techniques are helpful for automatic disease detection, and each algorithm
follows a different procedure for detection. Experiments with machine learning models
have been conducted for different diseases such as COVID-19, heart disease and
different types of cancer. Many diseases have therefore been discussed here with the
aim of reducing the risk they pose. I have discussed cancers, chronic kidney disease,
hepatitis, thyroid, liver and blood diseases, COVID-19, diabetes, heart disease, etc.
I have explored and reviewed what various researchers did on DDS (Disease detection
system).
Different researchers obtained different, and often excellent, results. It is
necessary that all diseases be detected within a limited time. But I have observed
that various ML models gave worse results because of high-dimensional data, in which
case early detection is not possible. I have observed that (Chithambaram T et al.,
2019) obtained a score of 63.4% for heart disease detection using the K-NN classifier,
which is not very good (the features were not very relevant). The dataset was huge,
and KNN also took a large amount of time to process it. From this review I understood
that most of the ML classifiers chosen by the researchers worked well and gave good
results. Decision tree gave the best prediction result for heart disease detection,
98.83%, which was a good accuracy compared with the other models. For diabetes
detection the prediction accuracy was 77.9%, which is moderate, so more work is needed
on it. We will try to build an ensemble model and a deep learning model on the
diabetes dataset to obtain a better classification result. SVM worked well on the UCI
machine learning breast cancer dataset, with an accuracy of 96.40%. SVM gave 98.62% on
thyroid disease detection, which is very high. For Parkinson's diagnosis, a 100%
classification precision result was given by SVM. SVM gave 83.3% accuracy for brain
stroke diagnosis. J48 models gave 95.04% accuracy for liver disease diagnosis. For
Alzheimer's disease one researcher obtained 100% classification accuracy, and another
obtained 96% accuracy using SVM. So, in a nutshell, we can conclude that SVM is a good
classification algorithm that showed high accuracy compared with the others. It is
also noticeable that SVM is used frequently by researchers, and that 10-fold
cross-validation has been widely used with the classifiers. But for Parkinson's
disease detection FBANN gave a significantly better prediction result than SVM, which
provided good specificity only, at 100%. Every ML algorithm has some good sides and
some bad sides.
11. References
[1] (Nareen O.M.Salim & Adnan Mohsin Abdulazeez, 2021). Human Diseases
Detection Based On Machine Learning Algorithms: A Review.
[2] (Meherwar Fatima & Maruf Pasha, 2017). Survey of Machine Learning for Disease
Diagnostic.
[3] (Shaik Razia, Swathi Prathyusha, Vamsi Krishna, & Sathya Sumana, 2018). A
review on disease diagnosis using machine learning techniques.