Advanced Computing
Deepak Garg · Kit Wong · Jagannathan Sarangapani · Suneet Kumar Gupta (Eds.)
Advanced Computing
10th International Conference, IACC 2020
Panaji, Goa, India, December 5–6, 2020
Revised Selected Papers, Part I
Communications in Computer and Information Science 1367
Editors

Deepak Garg
Bennett University
Greater Noida, Uttar Pradesh, India

Kit Wong
University College London
London, UK

Jagannathan Sarangapani
Missouri University of Science and Technology
Rolla, MO, USA

Suneet Kumar Gupta
Bennett University
Greater Noida, Uttar Pradesh, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
The 10th International Advanced Computing Conference (IACC 2020) was organized
with the objective of bringing together researchers, developers, and practitioners from
academia and industry working in the area of advanced computing. The conference
consisted of keynote lectures, tutorials, workshops, and oral presentations on all aspects
of advanced computing. It was organized specifically to help the computer industry to
derive benefits from the advances of next-generation computer and communication
technology. Researchers invited to speak presented the latest developments and tech-
nical solutions in the areas of High Performance Computing, Advances in Commu-
nication and Networks, Advanced Algorithms, Image and Multimedia Processing,
Databases, Machine Learning, Deep Learning, Data Science, and Computing in
Education.
IACC promotes fundamental and applied research which can help enhance the
quality of life. The conference was held during December 5–6, 2020, making it an
ideal platform for people to share views and experiences in futuristic research
techniques in various related areas.
The conference has a track record of acceptance rates from 20% to 25% over the last
10 years. More than 10 IEEE/ACM Fellows hold key positions on the conference
committee, giving it a quality edge. Over the last 10 years the conference’s citation score
has consistently increased, moving it into the top 10% of cited conferences globally.
This has been possible due to strict adherence to quality parameters for review and
acceptance rate without any exception, which allows us to make some of the best
research available through this platform.
Honorary Co-chairs
Sundaraja Sitharama Iyengar Florida International University, USA
Sartaj Sahni University of Florida, USA
Jagannathan Sarangapani Missouri University of Science and Technology, USA
General Co-chairs
Deepak Garg Bennett University, India
Ajay Gupta Western Michigan University, USA
M. A. Maluk Mohamed M.A.M. College of Engineering and Technology, India
Program Co-chairs
Kit Wong University College London, UK
George Ghinea Brunel University London, UK
Carol Smidts Ohio State University, USA
Ram D. Sriram National Institute of Standards and Technology, USA
Kamisetty R. Rao University of Texas at Arlington, USA
Sanjay Madria Missouri University of Science and Technology, USA
Oge Marques Florida Atlantic University, USA
Vijay Kumar University of Missouri-Kansas City, USA
Publication Co-chair
Suneet K. Gupta Bennett University, India
MaskNet: Detecting Different Kinds of Face Mask for Indian Ethnicity . . . . . 492
Abhinav Gola, Sonia Panesar, Aradhna Sharma,
Gayathri Ananthakrishnan, Gaurav Singal,
and Debajyoti Mukhopadhyay
Novel Design Approach for Optimal Execution Plan and Strategy
for Query Execution . . . . . 308
Rajendra D. Gawali and Subhash K. Shinde
Kit Wong received the BEng, MPhil, and PhD degrees, all in
Electrical and Electronic Engineering, from the Hong Kong
University of Science and Technology, Hong Kong, in 1996,
1998, and 2001, respectively. Since August 2006, he has been
with University College London. Prof. Wong is a Fellow of
the IEEE and the IET. He is an Area Editor for IEEE
Transactions on Wireless Communications, and a Senior
Editor for IEEE Communications Letters and IEEE Wireless
Communications Letters.
1 Introduction
Brain waves are electrical brain impulses. The behavior, emotions and thoughts of an
individual within our brains are communicated between neurons. Brainwaves are pro-
duced by synchronized electric pulses from neuron masses that communicate with each
other. Brainwaves happen at different frequencies [1]. Some are fast, and others are
slow. Such EEG (Electroencephalogram) Bands are generally called delta, theta, alpha,
and beta and gamma and are measured in cycles per second or hertz(Hz). Irregularity
in these waves results in several problems ranging from irregular sleeping patterns to
several neural diseases like epilepsy. An EEG (Electroencephalography) can be used to
identify possible issues relevant to the irregularity in brainwaves [2].
Electroencephalography (EEG) is an electrophysiological monitoring method to
record the activity of the brain [3]. It is a non-invasive method in which electrodes are
placed along the scalp. During an EEG, wired electrodes are attached to the head; the
electrodes detect the brain waves, the EEG machine amplifies them, and the wave
pattern is recorded on screen or paper [4]. It is most commonly used for evaluating the
type and origin of seizures.
Epilepsy is a neurological disorder in which brain activity becomes abnormal, result-
ing in unusual sensations, loss of consciousness, seizures, and unusual behavior [5].
The two main types of seizures are focal and generalized. Focal seizures start at a
particular part of the brain and are named after their point of origin. Generalized
seizures are those in which the brain misfires, resulting in muscle spasms and blackouts.
Using new and emerging technologies like deep learning, this research paper aims
to make a directional change in the field of medical science. Deep learning can be used
as a prime tool in diagnosis, the most important phase in medical science. Epilepsy
being one of the most complicated diseases, it needs accurate detection facilities.
Recorded EEG signals are analyzed by neurophysicians and related specialists [6]. This
detection and diagnosis method depends solely on human judgment, is susceptible to
human error, and is very time-consuming. Using deep learning algorithms, an automated
alternative is found that is faster and less error-prone, thereby increasing the patient’s
quality of life [7].
A detailed study of a CNN (Convolutional Neural Network) for epileptic seizure detec-
tion is presented in this research paper. The network performance is tested using four
approaches: two validation methods (a 70–30 split and 10-fold cross-validation) com-
bined with two databases (binary and multiclass). The results are presented using
confusion matrices as well as accuracy-loss graphs. The overall performance of our
model and the results obtained from this study demonstrate the superiority of our
CNN-based deep learning technique in effectively detecting epileptic seizures.
2 Related Work
As deep learning is one of the most emerging and advanced technology now, there are
numerous effective studies done in order to effectively incorporate it in different means
of life. Like any other, epilepsy detection using deep learning have already undergone
different assessment by scholars. Here some of the state-of-the-art work is described
which were done earlier.
Sirwan Tofiq Jaafar and Mokhtar Mohammadi [8] described how deep neural networks
that learn directly from the data can be useful. In almost all machine learning applica-
tions, this approach has been hugely successful. They created a new framework which
also learns directly from the data, without extracting a set of handcrafted features. The
EEG signal is segmented into 4 segments and used to train a long short-term memory
network. The trained model is used to discriminate seizures from background EEG.
The Freiburg EEG dataset is used, and approximately 97.75% accuracy is achieved.
Vikrant Doma and Martin Pirouz [9] conducted an in-depth analysis of EEG dataset
epochs and carried out a comparative study of multiple machine learning techniques
such as SVM, K-nearest neighbors, LDA, and decision trees. The accuracy was between
55% and 75%.
The study carried out by Mi Li, Hongpei Xu, Xingwang Liu, and Shengfu Liu [10]
used various EEG channel combinations and classified the emotional states into two
dimensions, valence and arousal. Entropy and energy were then measured as features
for K-nearest neighbor classification. The accuracy ranged from 89% to 95%.
Jong-Seob Yun and Jin Heon Kim [11] used the DEAP dataset to classify emotions
by modeling artificial neural network, k-NN, and SVM models, selecting EEG training
data based on the valence and arousal values calculated using the SAM (Self-Assessment
Manikin) process. An accuracy of 60–70% was shown.
Ramy Hussein, Hamid Palangi, Rabab Ward, and Z. Jane Wang [12] used an LSTM
network for classification in their model. SoftMax functions were also found helpful
in their research, and the approach was found robust to noise in real-life situations.
3 Methodology
In this section proposed system architecture of detecting epilepsy using CNN and the
architecture of proposed CNN is described next.
Data is the basis for any machine learning based classification problem. Data collection
is a crucial task, as the data gathered affects the model used for the classification problem.
Initially, input epilepsy data is taken. After the data is collected, it is preprocessed,
as real-world data is noisy, incomplete, and inconsistent; preprocessing resolves such
issues. The data is then divided into training and testing sets, where the training set is
larger than the testing set. An appropriate model that suits our dataset and requirements
is then selected. In our project, a CNN model is used for classifying the data into two
groups. The model makes predictions in two classes: class 1 for patients suffering from
epileptic seizures and class 0 for patients not suffering.
The Convolutional Neural Network (CNN) is a class of deep neural network most widely
used for working with 2D image data, although it can also work with 1D and 3D data.
CNNs were inspired by biological processes, and their architecture is similar to the
connections between neurons in the human brain. The name CNN refers to the network
using a mathematical operation called convolution.
A CNN usually consists of an input layer, an output layer, and multiple hidden layers;
sometimes only one hidden layer is present. Typically, the hidden layers of a CNN
consist of a series of convolutional layers that convolve with a multiplication or other
dot product (Fig. 1).
A CNN model has the ability to learn filters. These filters are usually smaller in size
than the input, and a dot product is applied between the filter-sized patch of the input
and the filter, then summed to get a value. Sometimes the size of the output differs from
the input; to retain the size and make them equal, padding is applied. Specifically, it is
possible to inspect and visualize the two-dimensional filters learned by the model to
discover the types of features that the model can detect, and it is possible to inspect the
activation maps produced by convolutional layers to understand precisely what features
were detected for a given input. Compared to other classification algorithms, the
preprocessing required for a CNN is much lower.
A neural network is a method of information processing influenced by the way infor-
mation is processed by the biological neural network of the human brain. The main
goal is to build a system that performs specific computational tasks faster than con-
ventional systems. These tasks include identification and classification of patterns,
approximation, optimization, and clustering of data. Such a network includes a huge
number of highly interconnected processing units that integrate to solve a particular
problem. Messages passing through the network can influence the configuration of the
ANN, as the neural network changes or learns depending on the input/output.
Figure 2 shows the architecture of our CNN model. The CNN model consists of an
input layer, an output layer, and 4 hidden layers, which include a dense layer. The
dataset is fed to the input layer, and filters are applied to produce outputs which are
fed as input to the next layer.
Max pooling and dropout are applied on the 1D convolutional layers to avoid over-
fitting of the data and to reduce the computational cost. Max pooling takes the
maximum value from the previous layer as the neuron in the next layer, and dropout
reduces the number of neurons active in the convolution to reduce computation cost.
The ReLU activation function is used for the convolutional layers, and the SoftMax
function is used on the output layer. The output layer classifies the data into seizure
or non-seizure.
4 Experimental Results
This section contains the description of the dataset used, the implementation of the
proposed CNN model, and a discussion and analysis of the results obtained, as follows:
4.1 Dataset
Figure 3 shows the dataset, which is hosted by the UCI machine learning repository [13].
It is a preprocessed dataset with a sampling rate of 173.61 Hz. It includes 11,500 rows
and 179 attributes, with the last attribute representing the class. The dataset includes
recordings of 500 people’s brain activity, where each data point represents the EEG
recording value at a given time point. Each of the 500 individuals was recorded for
23.6 s, giving 4097 data points per recording. These 4097 data points were divided into
23 chunks, each containing 178 data points, with each data point representing the EEG
value at a particular time point. The dataset therefore has 23 * 500 = 11,500 records
(rows), and each record contains 178 data points. The last column holds the label y,
with values 1, 2, 3, 4, 5. For classification, the dataset is converted into a binary-class
problem of epilepsy versus non-epilepsy.
The values of y are given in the 179th column of the input vector, and the explana-
tory variables are referred to as X1, X2, …, X178. The label y represents the EEG
recordings of people taken in different states.
Classes 2 to 5 are records of people not having epileptic seizures, and class 1 is for
people suffering from epileptic seizures. In this research paper we implemented both a
binary classification and a multiclass classification; for the binary case, class 1 is taken
as epileptic and the remaining classes are merged into a class 0, which is considered
non-epileptic.
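A short sketch of this relabeling step; the file name `epileptic_seizure_data.csv` is hypothetical, but the column layout (X1…X178 plus a label y in {1,…,5}) follows the dataset description above.

```python
import numpy as np
import pandas as pd

# Hypothetical file name; the UCI set is distributed as a CSV with
# columns X1..X178 and a label column y in {1,...,5}.
df = pd.read_csv("epileptic_seizure_data.csv")

X = df[[f"X{i}" for i in range(1, 179)]].values.reshape(-1, 178, 1)
y = df["y"].values

# Class 1 = epileptic seizure; classes 2-5 are merged into class 0 (non-epileptic).
y_binary = np.where(y == 1, 1, 0)
```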
Our proposed CNN model is a 1-dimensional fully connected sequential model with
an input layer, an output layer, and 4 hidden layers including one dense layer. The
model was implemented using two approaches: splitting the dataset in a 70–30 ratio,
and the 10-fold cross-validation method. For the implementation of the CNN algorithm,
the Keras API was used to develop the CNN model with input size 178 × 1.
The dataset was divided into training and testing sets in the ratio 70:30. The input is fed
into the CNN architecture, a sequence of convolutional and pooling layers. Max pooling
and dropout are applied on the convolutional layers to avoid overfitting and to reduce the
computational cost. Padding was applied for each layer, and a stride of 2 was used.
SoftMax and ReLU were used as the activation functions and Adam as the optimizer.
The CNN model was compiled with the loss function “categorical cross-entropy” and
the evaluation metric “accuracy”. The CNN model was trained with a batch size of 16
for 200 epochs. When validated with the test set, an accuracy of 97.19% was obtained.
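The following Keras sketch illustrates a model of this kind; the input size (178 × 1), activations, optimizer, loss, batch size, and epoch count follow the paper, while the filter counts and kernel sizes are illustrative assumptions, since the paper does not list them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense

def build_cnn(n_classes=2):
    model = Sequential([
        # Filter counts and kernel sizes are illustrative assumptions;
        # the paper specifies 4 hidden layers including one dense layer,
        # with stride 2, padding, ReLU, max pooling, and dropout.
        Conv1D(32, kernel_size=3, strides=2, padding="same",
               activation="relu", input_shape=(178, 1)),
        MaxPooling1D(pool_size=2),
        Dropout(0.2),
        Conv1D(64, kernel_size=3, strides=2, padding="same", activation="relu"),
        MaxPooling1D(pool_size=2),
        Dropout(0.2),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training setup from the paper: 70-30 split, batch size 16, 200 epochs.
# model.fit(X_train, y_train, batch_size=16, epochs=200,
#           validation_data=(X_test, y_test))
```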
For the 10-fold cross-validation model, KFold was imported from the
sklearn.model_selection package and the number of folds was set to 10. The same
model was trained within each of the 10 folds, with batch size 20 and 200 epochs;
accuracy was calculated for each fold and the mean accuracy over all folds was taken,
giving an accuracy of 98.32%. Ten-fold cross-validation provides better accuracy
estimates for both testing and training, and it is also beneficial when the dataset is small.
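A minimal sketch of this procedure with scikit-learn's KFold, assuming the `X`, `y_binary`, and `build_cnn` names from the earlier sketches; the fold count, batch size, and epoch count follow the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.utils import to_categorical

Y = to_categorical(y_binary)            # one-hot labels from the earlier sketch
kf = KFold(n_splits=10, shuffle=True, random_state=42)
fold_accuracies = []

for train_idx, test_idx in kf.split(X):
    model = build_cnn(n_classes=2)      # fresh model per fold (build_cnn defined above)
    model.fit(X[train_idx], Y[train_idx], batch_size=20, epochs=200, verbose=0)
    _, acc = model.evaluate(X[test_idx], Y[test_idx], verbose=0)
    fold_accuracies.append(acc)

print("Mean accuracy over 10 folds:", np.mean(fold_accuracies))
```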
To ensure that the results were valid and generalizable to predictions on new data, the
detection was further tested on a different, multiclass dataset. For the new dataset, both
a 70–30 training/testing split and 10-fold cross-validation were performed. With the
70–30 split, an accuracy of 77% was obtained; with 10-fold cross-validation, the
obtained accuracy was 90.2%.
Convolutional Neural Networks are computationally efficient in terms of memory
and time because of parameter sharing. They tend to perform better than regular neural
networks. However, CNN has high computational cost and training is slow if you don’t
have a good GPU. In addition, they demand large training data in order to make accurate
classifications.
Table 1 shows the hyper-parameters used for the CNN model. Extensive hyper-parameter
tuning was carried out while finalizing the network parameters. In the CNN architecture,
Conv1D layers are used because they are most suitable for time-series data. Both max
pooling and average pooling were tried, but max pooling gave better results, as expected
from the literature. Other parameters such as the number of epochs, batch size, optimizer,
loss function, activation functions, and learning rate were finalized using grid search.
The epoch count finalized for the CNN architecture is 200, with a batch size of 16. The
models were trained on various train-test splits such as 80–20 and 75–25, and K-fold cross-
validation with 10 folds was also used to find the most appropriate accuracy metric.
The loss function used for updating the weights during back-propagation is categorical
cross-entropy, and the optimizer used is Adam. The activation functions for the last
layers are SoftMax and ReLU.
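The paper reports using grid search for these choices; a minimal manual grid-search loop in the same spirit is sketched below. The exact grid values and the `X_train`/`y_train`/`X_val`/`y_val` names are assumptions.

```python
from itertools import product

# Illustrative grid over a few of the tuned hyper-parameters; the paper's
# exact grid is not given, so these values are assumptions.
param_grid = {
    "batch_size": [16, 20, 32],
    "epochs": [100, 200],
    "optimizer": ["adam", "rmsprop"],
}

best_acc, best_params = 0.0, None
for batch_size, epochs, optimizer in product(*param_grid.values()):
    model = build_cnn(n_classes=2)          # build_cnn from the earlier sketch
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])     # recompile with the candidate optimizer
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc = acc
        best_params = dict(zip(param_grid, (batch_size, epochs, optimizer)))

print(best_params, best_acc)
```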
We collected the multiclass dataset to predict epileptic seizures, preprocessed the data
to fill in missing values, and performed binary classification on the data to predict
whether a patient has an epileptic seizure or not. The proposed CNN model is used as
the classifier. The data was split into training and testing sets in a 70–30 ratio and
evaluated with 10-fold cross-validation to check the performance of our model in
classifying epileptic versus non-epileptic data. The performance was evaluated using
different metrics such as accuracy, recall, precision, and F1 score.
Fig. 4. (a) Loss vs. epoch performance graph of the proposed CNN model for the 70–30
validation technique. (b) Accuracy vs. epoch graph of the proposed CNN model for the 70–30
validation technique.
Fig. 5. (a) Loss vs. epoch performance graph of the proposed CNN model for the 10-fold
cross-validation technique. (b) Accuracy vs. epoch graph of the proposed CNN model for
10-fold cross-validation.
Figure 5 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is 10-fold cross-validation. From the figure it is
clear that our model minimizes the loss down to 0.02. The results show that the CNN
achieved high accuracy and low loss; the achieved accuracy is up to 98.32% after
200 epochs of the testing phase for the 10-fold cross-validation. Also, the validation
loss deviates from the training loss, but the deviation is not too large, which indicates
our model is not over-fitted; nor do the curves overlap, which indicates the model is
not under-fitted either.
A confusion matrix is a table that compares the actual values to the predicted values,
thereby evaluating the performance of the classifier on test data for which the true
values are known. Figures 6 and 7 show that our model is able to classify the classes
correctly. The matrices show high TP and TN values compared to low FP and FN
values, so it can be stated that our model predicts samples correctly with high accuracy.
Fig. 6. Confusion matrix of the proposed CNN model for 70–30 validation technique
Fig. 7. Confusion matrix of the proposed CNN model for 10-fold cross validation technique
A binary classification was done to predict epileptic seizures. When the dataset was
divided into training and testing sets with a 70–30 ratio, the accuracy obtained was
97.72%. When 10-fold cross-validation was applied, the model obtained a slightly
better accuracy of 98.32%, as shown in Table 2. Therefore, cross-validation of the
dataset has given better accuracy. Table 2 also reports other performance metrics:
precision, recall, and F1 score.
The proposed CNN model achieved recall of 97.65% and 99.71% for the 70–30 and
10-fold cross-validation data partition methods, respectively. In terms of precision,
our model achieved 96.64% and 91.21% for the 70–30 and 10-fold cross-validation
data partition methods, respectively.
Since the dataset was highly unbalanced, with more samples of non-epileptic data,
the F1-score is also calculated for the proposed CNN model. For the 70–30 and 10-fold
cross-validation data partition methods, the F1-scores obtained are 97.14% and 95.27%,
which shows that our proposed CNN-based epilepsy classification model can handle
and accurately classify an unbalanced dataset too, as shown in Table 2.
Table 2. Performance of the proposed CNN model on the binary dataset

Validation technique       Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
70–30 split                97.72          96.64           97.65        97.14
10-fold cross validation   98.32          91.21           99.71        95.27
Fig. 8. (a) Accuracy vs. epoch graph of the proposed CNN model for the 70–30 validation
technique on the multiclass dataset. (b) Loss vs. epoch performance graph of the proposed
CNN model for the 70–30 validation technique on the multiclass dataset.
Figure 8 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is a 70–30 data split. The categorical cross-entropy
function is used to calculate the loss, and accuracy is used as our metric. From the
figure it is clear that our model minimizes the loss down to 0.5. The results show that
our model achieved high accuracy and low loss, which means our model has low FP
and FN values. It achieved an average accuracy of 78% over 200 epochs.
Figure 9 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is 10-fold cross-validation. From the figure it is
clear that our model minimizes the loss down to 0.2. The results show that our CNN
model achieved high accuracy and low loss, which means our model has low FP and
FN values. The proposed model achieved an accuracy of 89.40% after 200 epochs of
the testing phase for the 10-fold cross-validation. Also, the validation loss deviates
from the training loss, but the deviation is not too large, which indicates our model is
not overfitted; nor do the curves overlap, which indicates the model is not underfitted
either.
Fig. 9. (a) Accuracy vs. epoch graph of the proposed CNN model for 10-fold cross-validation
on the multiclass dataset. (b) Loss vs. epoch performance graph of the proposed CNN model
for 10-fold cross-validation on the multiclass dataset.
A confusion matrix is a table that compares the actual values to the predicted values,
thereby evaluating the performance of the classifier on test data for which the true
values are known. Figures 10 and 11 show that our model is able to classify the classes
correctly. The matrices show high TP and TN values compared to low FP and FN
values, so we can say that our model predicts samples correctly with good accuracy.
Table 3 presents the values of different performance metrics, namely precision, recall,
and F1 score, for the 4 classes. In the dataset, class 0 is Sad, class 1 is Amusement,
class 2 is Disgust, and class 3 is Fear. The overall accuracy for the multiclass dataset
using the 70–30 data split is 78.9%, with a loss of 0.5. For 10-fold cross-validation the
overall accuracy is 89.40%, with a loss of 0.3. Therefore, 10-fold cross-validation has
given better results in terms of accuracy, loss, and the other performance metrics.
Fig. 10. Confusion matrix of the proposed CNN model for 70–30 validation technique having
multiclass data set
Fig. 11. Confusion matrix of the proposed CNN model for the 10-fold cross validation technique
having multiclass data set
The proposed CNN model achieved its highest recall for multiclass classification of
0.94 in class 0 for the 70–30 partition and 0.85 in class 3 for the 10-fold cross-validation
partition method. In terms of precision for multiclass classification, our model achieved
its highest values of 0.93 and 0.94 in class 2 for the 70–30 and 10-fold cross-validation
partition methods, respectively.
Since the dataset was highly unbalanced, with more samples of non-epileptic data,
the F1-score is also calculated for the proposed CNN model. For the 70–30 and 10-fold
cross-validation partition methods, the highest F1-scores obtained are 0.70 and 0.80,
both in class 0, which shows that our proposed CNN-based classification model can
handle and accurately classify an unbalanced dataset too, as shown in Table 3.
Acknowledgment. This research work was performed under the nationwide initiative
leadingin[Link] and at Bennett University, India, which supported us with lab facilities
and equipment during the experiments.
References
1. Bhardwaj, A., et al.: An analysis of integration of hill climbing in crossover and mutation
operation for EEG signal classification. In: Proceedings of the 2015 Annual Conference on
Genetic and Evolutionary Computation (2015)
2. Acharya, D., Goel, S., Bhardwaj, H., Sakalle, A., Bhardwaj, A.: A long short term memory
deep learning network for the classification of negative emotions using EEG signals. In: 2020
International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, pp. 1–8 (2020).
[Link]
3. Bhardwaj, H., et al.: Classification of electroencephalogram signal for the detection of epilepsy
using innovative genetic programming. Expert Syst. 36(1), e12338 (2019)
4. Acharya, D., et al.: An enhanced fitness function to recognize unbalanced human emotions
data. Expert Syst. Appl. 166, 114011 (2020)
5. Acharya, U.R., et al.: Application of entropies for automated diagnosis of epilepsy using EEG
signals: a review. Knowl.-Based Syst. 88, 85–96 (2015)
6. Acharya, D., et al.: Emotion recognition using fourier transform and genetic programming.
Appl. Acoust. 164, 107260 (2020)
7. Acharya, D., et al.: A novel fitness function in genetic programming to handle unbalanced
emotion recognition data. Pattern Recogn. Lett. 133, 272–279 (2020)
8. Jaafar, S.T., Mohammadi, M.: Epileptic seizure detection using deep learning approach.
UHD J. Sci. Technol. 3(41), 41–50 (2019). [Link]
9. Doma, V., Pirouz, M.: A comparative analysis of machine learning methods for emotion
recognition using EEG and peripheral physiological signals. J. Big Data 7(1), 1–21 (2020).
[Link]
10. Li, M., Xu, H., Liu, X., Liu, S.: Emotion recognition from multichannel EEG signals using
k-nearest neighbour classification. Technol. Health Care 26(S1), 509–519 (2018). https://
doi.org/10.3233/THC-174836
11. Yun, J.-S., Kim, J.H.: A study on training data selection method for EEG emotion analysis
using machine learning algorithm. Int. J. Adv. Sci. Technol. 119, 79–88 (2018). https://
doi.org/10.14257/ijast.2018.119.07
12. Hussein, R., Palangi, H., Ward, R., Wang, Z.J.: Epileptic seizure detection: a deep learning
approach (2018). arXiv:1803.09848
13. Andrzejak, R.G., Lehnertz, K., Rieke, C., Mormann, F., David, P., Elger, C.E.: Indications
of nonlinear deterministic and finite dimensional structures in time series of brain electrical
activity: dependence on recording region and brain state. Phys. Rev. E 64, 061907 (2001)
Residual Dense U-Net for Segmentation
of Lung CT Images Infected
with Covid-19
1 Introduction
The novel coronavirus is believed to have originated from bats [1], with wet markets
considered its primary source. The high transmission rate of the novel COVID-19 is so
threatening that it has forced humankind to shelter through long lockdown periods. It
created an ever-increasing demand for clinical treatment, forcing medical workers to
work round the clock, risking their own lives to help the infected. It is observed that a
COVID-19 positive person will infect roughly three new susceptible individuals (the
reproductive number [2] averages 3.28), and the number increases further if precautions
are not taken. Symptoms in patients infected with Covid-19 vary from person to person
based on immune response, with some patients remaining asymptomatic [3], but the
common ones are fever, cough, fatigue, and breathing problems. It was reported [4]
that 44% of the patients from China suffered from fever in the beginning, whereas 89%
of them developed a fever while in hospital [5]. It was also revealed later that the
patients had varying symptoms like cough (68%), fatigue (38%), sputum production
(34%), and shortness of breath (19%), and some who already suffered from other
illnesses were more vulnerable to the impact of COVID-19. Not every community has
sufficient infrastructure for dealing with outbreaks like this, so there is a need to do
whatever we can to control the spread.
A standard procedure is recommended by the World Health Organization (W.H.O.)
to test for the presence of pathogens in a suspected host, known as real-time fluorescence
RT-PCR [6]. In this procedure, an oropharyngeal or a nasopharyngeal swab is used to
collect a specimen from the suspected person to determine the nucleic acid in the
sputum [7]. Still, due to its high false negative rate, resampling of the suspected person
is suggested by W.H.O. Computed Tomography (CT scan) imaging is one of the good
options for the diagnosis of the SARS-CoV-2 virus [8]. With the demand for finding a
vaccine for COVID-19 (SARS-CoV-2), many laboratories and pharmaceutical industries
are working to design vaccines based on immune response, targeting specific epitopes
at binding sites. But apart from these classic and important procedures and research
efforts, it was discovered that subjects infected with COVID-19 develop abnormalities
such as bilateral and unilateral pneumonia involving the lower lobes, pleural thickening,
pleural effusion, and lymphadenopathy, which experts then analyze for such charac-
teristic features for diagnosis. Computer Aided Diagnosis (CAD) tools, based on
applications of machine learning algorithms, help in better diagnosis from CT
scans [9]. Moreover, CT scans have an improved false negative rate compared to
RT-PCR. Several studies have exploited deep learning architectures for various
applications in medical imaging, viz. lesion segmentation, object/cell detection, tissue
segmentation, image registration, anatomy localization, etc. The Dice similarity
coefficient is widely used to validate the segmentation of white matter lesions in MRIs
and CT scans [10]. In a recent work, Chen et al. [11] proposed a residual attention
U-Net for automatic quantification of lung infection in Covid-19 cases. They used
aggregated residual transformation (ResNeXt) blocks on the encoder side, followed by
soft attention focused on the relative position of features on the decoder side, in a
U-Net-like architecture evaluated for multi-class segmentation on Covid-19 data from
the Italian Society of Medical and Interventional Radiology.
2.1 Dataset
Medical scans and data are usually private as they contain the information
of patients making it hard to access publicly. But due to the rapid spread of
Covid-19 many researches and organizations have released datasets which can
be accessed publicly for CAD development. This research is based on two pub-
licly available datasets described below.
COVID-CT. This CT- Scans based Covid-19 dataset [17]1 consists of 349 CT
images containing clinical findings of Covid-19 and numerous Normal patients’
slices.
1 [Link]
Fig. 1. Two different masks, consolidation and pleural effusion, for a Covid-19 patient;
multi-class segmentation of these was the prime task, from the CTSeg dataset [18].
In this section the components of the proposed model, viz. dense residual blocks,
U-Net, and residual connections, are described at length.
Residual Blocks. Residual blocks [19] are a special case of highway networks
without any gates in their skip connections. Essentially, residual blocks allow the
flow of memory from initial layers to last layers, avoiding the training of some
parameters for our output segmentation. Despite the absence of gates in their skip
connections, residual networks perform as well as any other highway network in
practice.
Residual blocks ease the training of a few layers: the skip connection provides an
identity function, so the model only has to learn the residual F(x), which is easier
to learn than the full mapping H(x), as shown in Fig. 2. We deployed several
residual blocks on the encoder and decoder parts to avoid vanishing gradients
during training.
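A minimal Keras sketch of such a residual block; the (3×3), (3×3), (1×1) kernel sizes mirror the ResNet layer rows in Table 1, while the use of batch normalization is an assumption.

```python
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def residual_block(x, filters):
    """A canonical residual block (Fig. 2): the skip connection adds the
    input x back onto the transformed branch F(x), so the layers only need
    to learn the residual F(x) = H(x) - x."""
    shortcut = x
    y = Conv2D(filters, 3, padding="same")(x)
    y = BatchNormalization()(y)            # batch norm here is an assumption
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)
    y = BatchNormalization()(y)
    # 1x1 convolution on the shortcut if the channel counts differ.
    if shortcut.shape[-1] != filters:
        shortcut = Conv2D(filters, 1, padding="same")(shortcut)
    y = Add()([y, shortcut])               # H(x) = F(x) + x
    return Activation("relu")(y)
```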
U-Net. The U-Net architecture is designed mainly for segmentation of biomedical
images. The encoder part comprises several fully connected network (FCN) stages [20]
to extract spatial features from the subject; similarly, the decoder is equipped with a
series of convolution and up-sampling layers, with skip connections between the two
to retain the features from each encoder level. However, the receptive field of U-Net
is very small, and it does not have enough capability to distinguish such trivial
differences.
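For illustration, a toy two-level U-Net in Keras showing the encoder-decoder structure and skip connections described above; the depth and filter counts are deliberately small and are not the paper's configuration.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, UpSampling2D,
                                     Concatenate)

def tiny_unet(input_shape=(512, 512, 1), n_classes=1):
    """A minimal two-level U-Net: encoder convolutions with max pooling,
    decoder with up-sampling, and skip connections between matching levels."""
    inp = Input(input_shape)
    e1 = Conv2D(32, 3, padding="same", activation="relu")(inp)    # encoder level 1
    p1 = MaxPooling2D(2)(e1)
    e2 = Conv2D(64, 3, padding="same", activation="relu")(p1)     # encoder level 2
    p2 = MaxPooling2D(2)(e2)
    b = Conv2D(128, 3, padding="same", activation="relu")(p2)     # bottleneck
    u2 = UpSampling2D(2)(b)
    d2 = Conv2D(64, 3, padding="same", activation="relu")(Concatenate()([u2, e2]))
    u1 = UpSampling2D(2)(d2)
    d1 = Conv2D(32, 3, padding="same", activation="relu")(Concatenate()([u1, e1]))
    out = Conv2D(n_classes, 1, activation="sigmoid")(d1)          # segmentation mask
    return Model(inp, out)
```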
2 [Link]
Fig. 2. Canonical form of a ResNet block. A skip connection allows reusing activations
from a previous layer until the current layer learns its weights, hence avoiding vanishing
gradients during early back-propagation.
Fig. 4. Residual dense block consisting of densely connected layers and local residual
learning: the R_d feature maps are produced by concatenating the feature maps of the
densely connected layers [B_{d,1}, B_{d,2}, B_{d,3}, B_{d,4}, B_{d,5}] with R_{d-1},
leading to a contiguous memory (CM) mechanism that improves the information flow.
3 Proposed Model
The proposed model generates the segmentation map from spatial and hierarchical
features extracted from all convolution layers in the encoder. A full description of
the model layers is provided in Table 1, along with the hyper-parameters used during
the training process in Table 2.
Fig. 5. 3-RrDB network consisting of RDB blocks, used at a later stage for the encoder
stem of the U-Net. Information flow from the input is processed through global residual
learning by concatenating the feature maps produced through local residual learning of
the 3-RrDB blocks with the feature maps produced by the encoder of the U-Net. The
extracted feature maps from the encoder branch are passed through the 3-RrDB network
blocks and concatenated with the encoder feature maps to give rise to global residual
pooling.
R_g = R_0 + [R_{d,i}]    (1)
The feature maps obtained through the 3-RrDB network are fed into the decoder
part of the RrDB-U-Net. A skip connection is added from each filter level of the
encoder straight to the decoder at every interval, in order to obtain more precise
locations. The traditional CNNs used in the decoder often have a limited receptive
field, which creates a shallow feature map of the encoder output. The dense blocks
provide a continuous memory mechanism that preserves both the low-dimensional
and the high-dimensional features of the encoder output, as shown in Eqs. (2) to (8).
X → C1 → X1                      (2)
(X, X1) → C2 → X2                (3)
(X, X1, X2) → C3 → X3            (4)
(X, X1, X2, X3) → C4 → X4        (5)
(X, X1, X2, X3, X4) → C5 → X5    (6)
X5 = X5 * α                      (7)
X = X + X5                       (8)
where X denotes the input to the decoder layer, C1 through C5 are the first to fifth
convolution layers, and α is a constant. The lower output channel counts of
(X1, X2, X3, X4, X5) ensure that the continuous memory mechanism of the dense
blocks stays intact. At each level of the dense blocks, only the necessary higher- and
lower-dimensional features are extracted and propagated to the decoder layers to
allow better generation of the mask.
Fig. 7. Proposed residual dense U-Net with residual connection and 3-RrDB network.
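A sketch of a dense block following Eqs. (2)-(8) in Keras; the growth size of 32, output channel count of 512, and scaling constant α = 0.4 are read off Table 1, but the exact wiring is our interpretation.

```python
from tensorflow.keras.layers import Conv2D, LeakyReLU, Concatenate, Add, Lambda

def residual_dense_block(x, growth=32, channels=512, alpha=0.4):
    """Dense block per Eqs. (2)-(8): each convolution receives the
    concatenation of the block input and all earlier feature maps; the final
    output is scaled by alpha and added back to the input (local residual
    learning). The block input x must already have `channels` feature maps."""
    feats = [x]
    for _ in range(4):   # C1..C4 each produce a growth-sized feature map
        inp = Concatenate()(feats) if len(feats) > 1 else x
        y = Conv2D(growth, 3, padding="same")(inp)
        y = LeakyReLU(0.25)(y)
        feats.append(y)
    y = Conv2D(channels, 3, padding="same")(Concatenate()(feats))  # C5, Eq. (6)
    y = LeakyReLU(0.25)(y)
    y = Lambda(lambda t: t * alpha)(y)   # X5 = X5 * alpha, Eq. (7)
    return Add()([x, y])                 # X = X + X5, Eq. (8)
```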
Extraction of quality information is one of the tough tasks that needs to be addressed
before designing any model, due to the noise (low signal-to-noise ratio, SNR) present
in CT scans during acquisition. This may result in poor performance of deep convolu-
tional networks. To address this issue, RrDB blocks were included in the U-Net. This
improves the flow of information, leading to a dense fusion of features along with deep
supervision, acting as a catalyst for learning fine features from and around the region
of interest, as the deep model has a strong representation capacity to capture semantic
information.
Table 1. Dimension description of each layer incorporated within the proposed convolution
model. The table is laid out in two side-by-side panels, layers 1–40 (left) and layers 41–80
(right); the columns of each panel are: number of layer, type of layer, output features,
output size, and kernel size.
1 Input Layer 1 512*512 NA 41 Convolution a8 32 32*32 3*3
2 ResNet Layer R1 32 512*512 (3*3), (3*3), (1*1) 42 Leaky Relu l 9 32 32*32 Alpha = 0.25
3 Convolution C1 32 512*512 3*3 43 Concatenate c8 640 32*32 NA
4 Maxpool M1 32 256*256 2*2 44 Convolution a9 512 32*32 3*3
5 ResNet Layer R2 64 256*256 (3*3), (3*3), (1*1) 45 Leaky Relu l 10 512 32*32 Alpha = 0.25
6 Convolution C2 64 256*256 3*3 46 Lambda 2 512 32*32 x * 0.4
7 Maxpool M2 64 128*128 2*2 47 Add 2 512 32*32 NA
8 ResNet Layer R3 128 128*128 (3*3), (3*3), (1*1) 48 Convolution a10 32 32*32 3*3
9 Convolution C3 128 128*128 3*3 49 Leaky Relu l 11 32 32*32 Alpha = 0.25
10 Maxpool M3 128 64*64 2*2 50 Concatenate c9 544 32*32 NA
11 ResNet Layer R4 256 64*64 (3*3), (3*3), (1*1) 51 Convolution a11 32 32*32 3*3
12 Convolution C4 256 64*64 3*3 52 Leaky Relu l 12 32 32*32 Alpha = 0.25
13 Maxpool M4 256 32*32 2*2 53 Concatenate c10 576 32*32 NA
14 Convolution C5 512 32*32 3*3 54 Convolution a12 32 32*32 3*3
15 Convolution C6 512 32*32 3*3 55 Leaky Relu l 13 32 32*32 Alpha = 0.25
16 Convolution a1 32 32*32 3*3 56 Concatenate c11 604 32*32 NA
17 Leaky Relu l 1 32 32*32 Alpha = 0.25 57 Convolution a13 32 32*32 3*3
18 Concatenate c1 544 32*32 NA 58 Leaky Relu l 14 32 32*32 Alpha = 0.25
19 Convolution a2 32 32*32 3*3 59 Concatenate c14 640 32*32 NA
20 Leaky Relu l 2 32 32*32 Alpha = 0.25 60 Convolution a14 512 32*32 3*3
21 Concatenate c2 576 32*32 NA 61 Leaky Relu 15 512 32*32 Alpha = 0.25
22 Convolution a3 32 32*32 3*3 62 Lambda 3 512 32*32 x * 0.4
23 Leaky Relu l 3 32 32*32 Alpha = 0.25 63 Add 3 512 32*32 NA
24 Concatenate c3 608 32*32 NA 64 Lambda 4 512 32*32 x * 0.2
25 Convolution a4 32 32*32 3*3 65 Add 4 512 32*32 NA
26 Leaky Relu l 4 32 32*32 Alpha = 0.25 66 DropOut 1 512 32*32 NA
27 Concatenate c4 640 32*32 NA 67 Up Sampling 1 512 64*64 2*2
28 Convolution a5 512 32*32 3*3 68 Convolution C7 256 64*64 3*3
29 Leaky Relu l 5 512 32*32 Alpha = 0.25 69 Convolution C8 256 64*64 3*3
30 Lambda 1 512 32*32 x * 0.4 70 Up Sampling 2 256 128*128 2*2
31 Add 1 512 32*32 NA 71 Convolution C9 128 128*128 3*3
32 Convolution a6 32 32*32 3*3 72 Convolution C10 128 128*128 3*3
33 Leaky Relu l 6 32 32*32 Alpha = 0.25 73 Up Sampling 3 128 256*256 2*2
34 Concatenate c5 544 32*32 NA 74 Convolution C11 64 256*256 3*3
35 Convolution a6 32 32*32 3*3 75 Convolution C12 64 256*256 3*3
36 Leaky Relu l 7 32 32*32 Alpha = 0.25 76 Up Sampling 4 64 512*512 2*2
37 Concatenate c6 576 32*32 NA 77 Convolution C13 32 512*512 3*3
38 Convolution a7 32 32*32 3*3 78 Convolution C14 32 512*512 3*3
39 Leaky Relu l 8 32 32*32 Alpha = 0.25 79 Convolution C15 32 512*512 3*3
40 Concatenate c7 604 32*32 NA 80 Output Segmented Mask 1 512*512 NA
In contrast to the common observation that deeper models are hard to train, the
proposed model trained easily and delivered better performance.
Table 2. List of hyperparameters used for training the proposed network for COVID-19
CT scan segmentation

Hyperparameter         Value
Epochs                 150
Batch size             20
Activation functions   Softmax, leaky ReLU, sigmoid [30, 31]
Optimizer              Adam [29]
Loss                   Categorical cross-entropy
Learning rate          0.001
Performance metrics    Dice coefficient, accuracy
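Since Table 2 lists the Dice coefficient as a performance metric, a standard Keras implementation of it is sketched below (a common formulation, not necessarily the authors' exact code).

```python
import tensorflow.keras.backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice similarity coefficient: 2*|A intersect B| / (|A| + |B|), with a
    smoothing term to avoid division by zero on empty masks."""
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

# Usage when compiling a segmentation model:
# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               metrics=["accuracy", dice_coefficient])
```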
Preprocessing was applied to prevent noise and black-frame issues in the raw data.
The total of 838 images was split into a training set (60%), a validation set (20%),
and a test set (20%). The experiment was performed for 150 epochs on an 8th Gen
Intel Core i5-9300H (2.4 GHz, up to 4.1 GHz, 8 MB cache, 4 cores) with an NVIDIA
GeForce GTX 1050 (3 GB) GPU.
Fig. 9. Plot between training and validation data confirms that no over-fitting or
under-fitting takes place and the model converges at around 30–40 epochs.
A model with very high variance would fit the training data perfectly but give poor
performance on the test/validation set.
Fig. 10. Results of the proposed architecture. (A) (i) Lungs affected by COVID-19,
(ii) labelled consolidation, and (iii) the generated segmented mask (in green). (B) Similar
to the above: (i) CT scan of human lungs, (ii) labelled pleural effusion, and (iii) its
generated mask (in blue). (C) Finally, cases where both consolidation and pleural effusion
were identified ((i), (ii), (iii)) and (iv) the segmented masks in green and blue for the two
labels, respectively. (Color figure online)
5 Conclusion
CT imaging is used for screening Covid-19 patients and for analyzing the severity
of the disease. Deep learning has played an important role in Computer Aided
Diagnosis. In this work, we explored the use of a Residual Dense U-Net for seg-
mentation of lung CT images infected with Covid-19. The proposed approach can
accurately and efficiently identify regions of interest within CT images of patients
infected with Covid-19. As current clinical tests take relatively long, this approach
of incorporating RrDB blocks in the standard encoder-decoder structure of U-Net
improves the quality of segmentations and serves as a useful component in COVID-19
analysis and testing through CT images. A superior performance was observed, with
a dice coefficient of 97.6%.
References
1. Zhou, P., et al.: A pneumonia outbreak associated with a new coronavirus of probable
bat origin. Nature 579, 270–273 (2020). https://doi.org/10.1038/s41586-020-2012-7
2. Liu, Y., Gayle, A.A., Wilder-Smith, A., Rocklöv, J.: The reproductive number of
COVID-19 is higher compared to SARS coronavirus. J. Travel Med. 27 (2020).
[Link]
3. Gao, Z., et al.: A systematic review of asymptomatic infections with COVID-19.
J. Microbiol. Immunol. Infect. (2020). [Link]
4. Huang, C., et al.: Clinical features of patients infected with 2019 novel coronavirus
in Wuhan, China. Lancet 395, 497–506 (2020). https://doi.org/10.1016/S0140-
6736(20)30183-5
5. Guan, W.J., et al.: Clinical characteristics of coronavirus disease 2019 in China
(2020). [Link]
6. Ai, T., et al.: Correlation of chest CT and RT-PCR testing for coronavirus disease
2019 (COVID-19) in China: a report of 1014 cases. Radiology 296 (2020). https://
doi.org/10.1148/radiol.2020200642
7. Di Gennaro, F., et al.: Coronavirus diseases (COVID-19) current status and future
perspectives: a narrative review. Int. J. Environ. Res. Public Health 17, 2690
(2020). [Link]
8. Yang, W., Yan, F.: Patients with RT-PCR-confirmed COVID-19 and normal chest
CT. Radiology. 295 (2020). [Link]
9. Lee, E., Ng, M.Y., Khong, P.: COVID-19 pneumonia: what has CT taught
us? Lancet Infect. Dis. 20, 384–385 (2020). https://doi.org/10.1016/S1473-
3099(20)30134-1
10. Zijdenbos, A., Dawant, B., Margolin, R., Palmer, A.: Morphometric analysis of
white matter lesions in MR images. IEEE Trans. Med. Imaging 13, 716–724 (1994).
[Link]
11. Chen, X., Yao, L., Zhang, Y.: Residual attention U-Net for automated multi-class
segmentation of COVID-19 chest CT images (2020). arXiv:2004.05645
12. Shan, F., et al.: Lung infection quantification of Covid-19 in CT images with deep
learning (2020). arXiv:2003.04655
13. Wu, Y.H., et al.: JCS: An explainable Covid-19 diagnosis system by classification
and segmentation (2020). arXiv:2004.07054
14. Zhou, T., Canu, S., Ruan, S.: An automatic Covid-19 CT segmentation network
using spatial and channel attention mechanism (2020). arXiv:2004.06673
15. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomed-
ical image segmentation (2015). arXiv:1505.04597
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition
(2015). arXiv:1512.03385
17. Zhao, J., Zhang, Y., He, X., Xie, P.: COVID-CT-dataset: a CT scan dataset about
Covid-19 (2020). arXiv:2003.13865
18. Jenssen, H.B.: Covid-19 CT-segmentation (dataset). [Link]com/covid19/.
Accessed 13 April 2020
19. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image
super-resolution. In: IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 2472–2481 (2018). [Link]
20. Basha, S.H.S., Dubey, S.R., Pulabaigari, V., Mukherjee, S.: Impact of fully con-
nected layers on performance of convolutional neural networks for image classifi-
cation. Neurocomputing (2019). [Link]
21. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-
level performance on ImageNet classification (2015). arXiv:1502.01852
22. Freeman, T.G.: The Mathematics of Medical Imaging: A Beginner’s Guide.
Springer Undergraduate Texts in Mathematics and Technology. Springer, Heidel-
berg (2010)
23. O’Shea, K., Nash, R.: An introduction to convolutional neural networks (2015).
arXiv:1511.08458
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
25. Wang, X., et al.: ESRGAN: enhanced super resolution generative adversarial net-
works (2018). arXiv:1809.00219
26. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift (2015). arXiv:1502.03167
27. Salman, S., Liu, X.: Overfitting mechanism and avoidance in deep neural net-
works (2019). arXiv:1901.06566
28. Shamir, R.R., Duchin, Y., Kim, J., Sapiro, G., Harel, N.: Continuous dice coeffi-
cient: a method for evaluating probabilistic segmentations. medRxiv and bioRxiv
(2018). [Link]
29. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014).
arXiv:1412.6980
30. Maas, A. L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural net-
work acoustic models. In: International Conference on Machine Learning (2013)
31. Nwankpa, C., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions:
comparison of trends in practice and research for deep learning (2018).
arXiv:1811.03378
Leveraging Deep Learning and IoT
for Monitoring COVID19 Safety Guidelines
Within College Campus
1 Introduction
Coronavirus 2019, since the day it originated in Wuhan city of Hubei Province of China
in December 2019, was declared a pandemic on March 11, 2020. Globally, 14.6 Million
confirmed cases had been reported with 610110 death cases by July 21, 2020. India
registered its first COVID19 case of a student returned from Wuhan, China, in the state
of Kerala, on January 30, 2020. Following this, numerous incidents were reported from
different states of the country, mainly from travelers returning from abroad, and then
local transmission led to widespread COVID19.
The graph depicts the severity of this pandemic and the rate at which it is spreading.
The trajectory for all the affected countries started when 100 confirmed cases were
reported within that country. This helps us in realizing how quickly the number of
confirmed cases has grown worldwide. India recorded its 1 million cases on July 17,
2020 (Fig. 1).
COVID-19 displays clinical symptoms varying from an asymptomatic state to multiple
organ dysfunction syndrome and acute respiratory distress syndrome. According to a
recent study released by the World Health Organization based on laboratory-confirmed
cases, the majority showed clinical characteristics such as fever, the most common
symptom, in 87.9%, dry cough in 67.7%, fatigue in 38.1%, and sputum production in
33.4%. Fewer cases had symptoms like sore throat (13.9%), headache (13.6%), myalgia
(14.8%), and breathlessness (18.6%), while symptoms such as nausea (5.04%), nasal
congestion (4.8%), hemoptysis (0.9%), diarrhea (3.7%), and conjunctival congestion
(0.8%) were seen rarely [3].
At its inception, Coronavirus research was linked with human exposure to suspected
animal species; the sudden outburst and quick spread have shifted the direction of
research to transmission through human contact. The study of COVID-19 cases has
confirmed that the Coronavirus is principally transmitted among humans through the
spread of respiratory droplets via coughing and sneezing [4]. Respiratory droplets can
cover a distance of up to 6 feet (1.8 m). Thus, any human being coming in close contact
with an infected person is at high risk of being exposed to these virus traces and can
contract the Coronavirus. Touch in any form, direct or indirect, with infected surfaces
has been acknowledged as one of the likely routes of Coronavirus spread. There is
evidence that the coronavirus can live on metal and plastic surfaces for three days, on
cardboard for up to 24 h, and on copper for nearly 4 h [5].
As the world struggles with the COVID-19 pandemic, it is essential to follow useful
preventive guidelines to reduce the probability of becoming another fatality. Every
individual and group must adhere to the practices given below; if these practices are
strictly followed, the world may soon see a flattened Coronavirus curve. Curve
flattening means lowering the transmission of the Coronavirus to a level where
available healthcare arrangements can adequately manage the effect of the disease.
1. Hands must be washed often using an alcohol-based sanitizer, or thoroughly with
soap and water at regular intervals when away from home.
2. Practice social distancing: maintain a distance of at least 1 m from others.
3. Make sure you don’t touch your eyes, nose, or mouth with bare hands.
4. Spraying disinfectant on regularly touched surfaces is essential.
5. Try staying at home unless it’s an emergency. Pregnant women and elderly people
with any health conditions should avoid social interactions.
6. One should not sneeze or cough in the open; cover your face with a cloth or use
the elbow pit.
7. One must always wear a mask when surrounded by people. However, care should
be taken while disposing of used masks [6].
With the rate at which COVID19 is spreading across the world, the globe is facing
falling economies and increasing casualties. Regrettably, the human race is still under
persistent threat of contracting the infection, with the situation getting worse every day.
However, researchers worldwide are coming up with technological approaches to deal
with the Coronavirus pandemic’s impacts. These technologies include AI, IoT,
Blockchain, and the upcoming 5G telecommunication networks, which have been at
the forefront [7]. As per the CDC and the WHO, cutting-edge technologies will play
an important role in the fight against the Coronavirus pandemic [8].
In this paper, we focus on the post-lockdown scenario in which schools and colleges
reopen and pending examinations are held. This reopening will lead to considerable
human movement and gathering on campuses. We propose a model in which
precautionary measures are automated with the help of technology, alerting the
administration on any lapse in adequate precautionary measures or on finding symptoms
such as high body temperature in a person entering the facility. The highlights of our
research are the following:
Today, at a time of severe crisis, screening of potential risk bearers is crucial, and it
must be done without human interaction; hence this process must be automated so that
a person can be identified uniquely and preventive measures can be taken if they are
considered a risk.
Machine learning and deep learning models have been used to detect various kinds
of objects and even faces. A wide range of applications use object detection techniques,
yet no model uniquely identifies a person and checks whether a mask is present at the
same time. In the current scenario, there is a need for such a model, so that we can
identify every person by their unique features and thus automate the facemask detection
process along with identity verification. Just detecting whether a person is wearing a
facemask is not enough: according to the World Health Organization, one of the
primary symptoms of COVID-19 is a rise in body temperature. If the fever patterns
of a person can be monitored, it will be easy to take preventive measures and break the
chain of spread.
Due to advancements in the field of IoT, we are surrounded by various types of
sensors. Infrared thermal sensors are the best way to scan and detect body temperature:
scanning is fast, measuring body temperature with an accuracy of ±0.5 °C, and
processing is fast enough for these sensors to detect body temperatures even in larger
groups of people. Another reliable method of scanning for high temperature is thermal
imaging cameras, which work by rendering infrared radiation as visible light. Each
college/university has a well-defined database of students studying in its facility, and
the database can be accessed from any programming language. If the model runs on
the same server where the database resides, computation and processing time will be
much lower. Migrating from a relational database to a NoSQL database will make the
application scalable and make it easy to store data by date for pattern checking.
Database access will also make it easy for the admin to find out who the potential risk
bearers are.
2 Literature Review
Various techniques exist for face detection with varying levels of accuracy and com-
putation speed. The major deciding factor in determining the technique was a balance
between accuracy and performance as the operation is run on a Raspberry Pi 3B+. Results
from the paper “A comparison of CNN-based face and head detectors for real-time video
surveillance applications” suggest that, although CNNs can accomplish a high level of
precision in comparison to old-style detectors, they require high computational resources
which are a constraint for several practical real-time applications [9]. The method of face
recognition developed by P. Viola and M. Jones has appropriate accuracy for the purpose
and can be run on a Raspberry Pi 3B+.
2.4 IoT
We based the embedded system design on systems already in use, since hardware
design was not the primary objective of the paper. A fusion of the methods of
temperature sensor interfacing [26] and the Pi camera library [27] was used to capture
an image of the user’s face and simultaneously record the user’s temperature.
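A minimal sketch of this capture step on a Raspberry Pi, assuming the `picamera` library and a PyMLX90614-style driver for the MLX90614 sensor; the I2C address 0x5A is the sensor's common default.

```python
from smbus2 import SMBus
from mlx90614 import MLX90614      # PyMLX90614-style driver; assumed interface
from picamera import PiCamera

bus = SMBus(1)                     # I2C bus 1 on the Raspberry Pi
sensor = MLX90614(bus, address=0x5A)
camera = PiCamera()

def capture_reading(path="face.jpg"):
    """Capture an image and record the body temperature at the same instant."""
    camera.capture(path)
    temperature = sensor.get_object_1()   # object (body) temperature in deg C
    return path, temperature

img, temp = capture_reading()
print(f"Saved {img}, measured {temp:.1f} C")
```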
4 Algorithm
The main motivation behind the development of the framework in Fig. 2 was to build
a robust system that does not require heavy computing resources or high cost, while at
the same time not compromising on accuracy. We therefore propose an architecture
that is cost-effective and ensures that all the safety protocols are followed by tracking
every individual entering the college.
When the students and the staff enter the college, they are required to go through the
following process:
1. Their image is captured by the camera using the face detection model, and their temperature is read by the MLX90614 infrared temperature sensor once a face is detected.
2. The image and temperature data are sent to the central server, where the details of each student and staff member are stored.
3. The machine learning models are applied to the captured image to identify the
student and check if a mask is present or not.
4. For face recognition, we have used OpenFace.
Figure 3 shows the working of the system for a single individual when he/she approaches
the entry point of the college. The process is repeated continuously in a loop for all the
individuals entering the college.
5 Software Design
Initially, the input is provided in the form of a captured image. The image is sent to the application server as soon as a face is detected. At the application server, its features are extracted. The features are compared with the stored features for the face recognition part, and the same features are also passed to the face mask classifier model to determine whether the student is wearing a mask. If the student's face matches and the other parameters, namely body temperature and mask detection, are within permissible limits, the student is permitted to enter the campus. If the student's face is unrecognized, or any parameter such as the face mask or body temperature is out of bounds, an alert service notifies the security personnel sitting at a safe social distance from the entry/exit point. The software stack, ranging from the Python programming language to human-computer interaction through the face detection, mask detection, and face recognition models, with Firebase for the user interface, has been analyzed in detail.
Face detection is performed to find the trigger to capture an image from the camera and
simultaneously record the temperature at that instant. The entire procedure takes place
on a Raspberry Pi 3B+. This necessitates an object detection algorithm that is robust and can run in real time without using too much processing power, since processing power
is a limited resource on this platform. The limitations and demands of the algorithm are
satisfied with the Viola-Jones Object Detection framework. When implemented on 384
× 288-pixel images, faces are detected at 15 frames per second on a 700 MHz Intel
Pentium III, which is an x86 processor from 1999 [28]. The performance of the system
and its accuracy suit the application perfectly.
The algorithm has four stages:
– Haar features: used to match human faces, since all faces share some common characteristics, e.g. the upper cheeks are lighter than the eyes and the eyes are darker than the nose bridge.
– Integral image: rectangle features are quick to compute using an intermediate representation of the image known as an integral image, which lets any rectangular sum be computed with four array references [28]. The integral image method thus reduces the number of calculations and saves a lot of time.
– AdaBoost training: selects the most discriminative features and trains weak classifiers on them.
– Cascading classifiers: classifiers work in a sequence, with simpler classifiers first in line, rejecting the majority of sub-windows before more complex classifiers are even necessary. This results in low false-positive rates. The detection process resembles a degenerate decision tree and is referred to as 'cascading classifiers' [28].
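As a rough illustration, the sketch below runs OpenCV's bundled frontal-face Haar cascade (an implementation of this framework) on a camera frame; the cascade file name and the scaleFactor/minNeighbors values are illustrative assumptions, not the settings used in this work.

```python
# A minimal sketch of Viola-Jones face detection with OpenCV's bundled
# Haar cascade; parameters below are illustrative, not the authors' settings.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # Pi camera exposed as a video device
ret, frame = cap.read()
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade detection rate against speed,
    # which matters on a Raspberry Pi 3B+.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_roi = frame[y:y + h, x:x + w]   # trigger image capture here
cap.release()
```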
For facial recognition, we use OpenFace [29], an open-source deep learning face recognition model. It is based on the FaceNet paper [30] by Google researchers. OpenFace is implemented in Python and Torch, allowing the system to run smoothly on a CPU as well as with GPU acceleration. The recognition pipeline proceeds as follows:
1. Pre-trained models from libraries such as OpenCV [37] or dlib are used to detect faces.
2. The detected faces are then fed into the neural network.
3. A deep neural network embeds each face on a 128-dimensional unit hypersphere. The embedding is a generic representation of any face and, unlike other representations, has a useful property: a larger distance between two face embeddings means the faces are likely not of the same person. This makes clustering, similarity detection, and classification tasks simpler than with other face recognition approaches, where the Euclidean distance between features is not meaningful.
4. Any preferred clustering or classification method can then be applied to the embeddings to complete the recognition task.
Working
We use the pre-trained model to compare the embedding vectors of the images stored in the file system with the embedding vector of the image captured by the webcam, as illustrated in Fig. 6.
All the images stored in the file system are converted to a dictionary with names as keys and embedding vectors as values. When handling an image, face detection is first performed to find bounding boxes around faces; we use the same face detection code that runs on the Raspberry Pi to extract the face Region of Interest from the captured image. Before passing the image to the neural network, it is resized to 96 × 96 pixels, since the deep neural network expects a fixed (96 × 96) input image size. When the image is fed into the model, we produce the 128-dimensional embedding vector for the unknown image with the help of the pre-trained model. At the same time, we load the stored embedding vectors for the known dataset. To compare two images for similarity, we compute the distance between their embeddings. This can be done by computing either the Euclidean (L2)
distance or the cosine distance between the 128-dimensional vectors. If the distance is less than a threshold (which is a hyperparameter), the faces in the two images belong to the same person; otherwise, they are two different people.
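The following is a minimal sketch of this comparison step, where `embed` stands in for the OpenFace model's forward pass, `known` for the dictionary built from the file system, and the threshold value is a placeholder hyperparameter.

```python
# A hedged sketch of the embedding comparison described above; `embed` and
# `known` are assumptions standing in for the model and the stored dictionary.
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(face_img, known, embed, threshold=0.75):
    """Return the matching name, or None if every distance exceeds the
    threshold (a hyperparameter, as noted above)."""
    vec = embed(face_img)                  # 128-d embedding of the capture
    name, best = None, threshold
    for person, stored_vec in known.items():
        d = euclidean(vec, stored_vec)
        if d < best:
            name, best = person, d
    return name
```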
Once the face ROI is extracted, we feed it into our face mask classifier model to obtain predictions for that ROI. Finally, we determine the class label based on the probability scores returned by the mask classifier and assign the corresponding class name, "with_mask" or "without_mask", to the captured image of the student.
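A hedged sketch of this classification step is given below; the saved-model path, the 224 × 224 input size, and the class ordering are assumptions for illustration, not the exact configuration used in the paper.

```python
# Illustrative only: a MobileNetV2-based mask classifier loaded from a saved
# Keras model; the file name, input size, and label order are assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("mask_classifier.h5")      # hypothetical path

def classify_mask(face_roi):
    img = cv2.resize(face_roi, (224, 224)).astype("float32") / 255.0
    probs = model.predict(np.expand_dims(img, axis=0))[0]
    labels = ["with_mask", "without_mask"]    # class order is an assumption
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])
```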
5.4 Firebase
Firebase Firestore is a horizontally scaling NoSQL cloud database service provided by Google. Being serverless, Firestore can be integrated easily with any platform, and because its services run in the cloud, they are available from anywhere. The Firebase cloud messaging service provides a way to send notifications to the admin about a potential carrier of the virus. Since Firestore scales horizontally, the database is highly scalable; if new functionality is required at any point, it can be integrated in the next versions of our database, so the scope of the project can grow.
Firebase is used as follows:
1. First, the captured image is transferred to the central server along with the temperature.
2. Face recognition algorithms then predict whether the user is wearing a mask and assign an identity to the captured image.
3. The complete data packet is checked for any vulnerabilities or null values.
4. If the checks pass, the data is stored in Firebase Firestore under the current date. If the temperature reading is above normal or the student is not wearing a mask, the admin/security personnel are notified.
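The sketch below illustrates steps 3 and 4 with the firebase_admin SDK; the collection and field names, the service-account path, and the alert threshold are illustrative assumptions.

```python
# A minimal sketch with the firebase_admin SDK; collection/field names and
# the alert threshold are illustrative, not taken from the paper.
from datetime import date
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("service-account.json")   # hypothetical key file
firebase_admin.initialize_app(cred)
db = firestore.client()

def log_entry(student_id, name, temperature, mask):
    record = {"name": name, "temperature": temperature, "mask": mask,
              "timestamp": firestore.SERVER_TIMESTAMP}
    # One document per day, one sub-document per student, as described above.
    db.collection("entries").document(str(date.today())) \
      .collection("students").document(student_id).set(record)
    if temperature > 37.5 or mask == "without_mask":   # threshold assumed
        db.collection("alerts").add(record)            # picked up by admin UI
```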
6 Hardware Design
We use the Raspberry Pi 3B+ as a platform to capture user images and temperature
readings. The Raspberry Pi 3B+ has an ARMv8 64-bit SoC with Wi-Fi and Bluetooth
support. Gigabit Ethernet is also supported over the USB 2.0 connection [40]. This allows
the Raspberry Pi to perform basic face detection and communicate with the central server
effectively. The camera used is the Raspberry Pi Camera v2, which interfaces over the
Camera Serial Interface (CSI) port of the Raspberry Pi 3B+ [40]. It supports many video
resolutions and has libraries to access the camera feed [41]. The MLX90614 [42] (3.3 V) infrared temperature sensor is used to measure user temperature. The sensor interfaces over the i2c hardware bus through the i2c_bcm2708 kernel module and the libi2c library [26]. The camera and temperature sensor have to be adjusted so that the field of view of the sensor is aligned with the centre of the camera frame (Fig. 8).
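As a brief illustration of the sensor interface, the following sketch reads the MLX90614 object-temperature register (0x07, in units of 0.02 K) over i2c with the Python smbus module; the default device address 0x5A is assumed.

```python
# A hedged sketch of reading the MLX90614 over the i2c bus with smbus;
# register 0x07 holds the object temperature in units of 0.02 K.
import smbus

BUS = 1            # i2c-1 on the Raspberry Pi 3B+
ADDR = 0x5A        # default MLX90614 address
OBJ_TEMP_REG = 0x07

def read_object_temp_c():
    bus = smbus.SMBus(BUS)
    raw = bus.read_word_data(ADDR, OBJ_TEMP_REG)   # 16-bit little-endian
    bus.close()
    return raw * 0.02 - 273.15                     # Kelvin -> Celsius
```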
7 Results
The main motivation behind developing such a framework was to build a robust system that does not require heavy computing resources and, at the same time, does not compromise on accuracy. The models we used had to be computationally efficient and deployable to embedded systems (Raspberry Pi, Google Coral, etc.). This is the very reason we used the OpenFace [29] model for facial recognition, transfer learning on the MobileNetV2 [25] model for the face mask classifier, and Viola-Jones [24] for face detection.
Training was done on the LFW dataset [51] for the OpenFace [32] Keras model, which gave an accuracy of around (93.80 ± 1.3)%, along with the other metrics reported in Table 2.
The face mask classifier, based on transfer learning from MobileNetV2 [25], was trained on a custom dataset comprising around 10,563 images downloaded from Kaggle [38] and RFID [39], and again achieved a precision of around 93% under normal conditions. Looking at Fig. 11, we can see that there are only small
signs of overfitting, and Fig. 10 shows the evaluation metrics per epoch on the testing dataset, which comprises 20% of the total images present in the custom dataset.
When the image captured by the microcontroller is fed into the model by the application server after pre-processing, the models return the probabilities of the predictions made and the name of the student recognized. For visualization purposes, we color the bounding boxes red for a student without a mask and green for a student with a mask. We then also print the
class name (i.e. "with_mask" or "without_mask"), the probability, and the name recognized by the models at the top of the bounding box, as shown in Fig. 12 and Fig. 13.
For any good system, the user interface is one of the most significant aspects, since it is through the UI that a person interacts with the system conveniently. Keeping in mind the
convenience of the administrator and the security staff, we designed an interface that serves two fundamental purposes: maintaining a record of each student with name, timestamp, mask status, and body temperature, and tracking any anomalies in the mask-wearing and body temperature measurements of every student entering the college. In the Firebase database, the complete information for each student is stored as a packet indexed by the current date and day. This makes the system more scalable and keeps the data organized for analysis by the administrator. As far as alert generation is concerned, the alert notification is produced by Firebase itself as a push notification/email, which makes it even better suited to the system. Fig. 14 and Fig. 15 show the UI and the Firebase database, respectively, as used in our system.
8 Limitations
Our present strategy for recognizing whether an individual is wearing a mask is a two-step process that performs face detection and then applies a classifier to the detected face. The issue with this approach is that a mask occludes part of the face; if enough of the face is occluded, the face cannot be detected, and hence the face mask detector is never applied.
Another issue is the reliability of the internet connection at the site where the system is set up. The connection must have low latency and high bandwidth to send both the alert to security and the image to the application server for further processing. The power supply of the system must also be stable, as all the components of the security system run on power.
9 Future Work
We obtained quite fair results by simply comparing Euclidean distances to recognize a face. However, to scale the system to a production setting, one should consider applying affine transformations before feeding the image to the neural network.
To further improve our face mask detection model, we need to gather more real images of people wearing masks. Additionally, we need to gather images of faces that may "confuse" our classifier into thinking a person is wearing a mask when in fact they are not; potential examples include shirts wrapped around faces, a handkerchief over the mouth, and so on. Finally, we should consider training a dedicated two-class object detector instead of a simple image classifier.
10 Conclusion
Since the onset of COVID-19, researchers have worked on technological solutions to combat the spread of the coronavirus pandemic, with technologies such as IoT and Artificial Intelligence as the front runners. Our paper discussed using IoT-based sensors and deep learning algorithms to detect breaches of recommended precautionary measures, such as the use of masks in public places, and to deny campus entry to individuals showing COVID-19 symptoms, in our case high body temperature. Our model also records every student's body temperature in a central database on a day-to-day basis, raises an alarm if the generated pattern shows a gradual rise in body temperature, and helps the administration monitor safety standards within the campus. This automated approach prevents security personnel from coming into contact with every student or visitor and reduces the chance of human error in identifying a person entering the facility with COVID-19 symptoms.
References
1. WHO Homepage. [Link] Accessed 16
July 2020
2. Ourworldindata Homepage. [Link] Accessed 14 July 2020
3. Report WHO-China Joint Mission Coronavirus Disease 2019 (COVID-19), February
2020. [Link]
[Link]. Accessed 14 July 2020
4. Modes of Transmission of Virus Causing COVID-19: Implications for IPC Precaution Rec-
ommendations, April 2020. [Link]
of-transmission%-of-virus-causing-covid-19-implications-for-ipc-precaution-recommend
ations. Accessed 14 July 2020
5. Study Suggests New Coronavirus May Remain on Surfaces for Days, March
2020. [Link]
virus-may-remain-surfaces-days. Accessed 15 July 2020
6. Coronavirus Disease (COVID-19) Advice for the Public: When and How to Use Masks, April
2020. [Link]
lic/when-and-how-to-use-masks. Accessed 15 July 2020
7. Ting, D.S.W., Carin, L., Dzau, V., Wong, T.Y.: Digital technology and COVID-19. Nat. Med.
26(4), 459–461 (2020)
8. Digital Technology For Covid-19 Response, April 2020. [Link]
ail/03-04-2020-digital-technology-for-%covid-19-response. Accessed 16 July 2020
9. Nguyen-Meidine, L.T., Granger, E., Kiran, M., Blais-Morin, L.: A comparison of CNN-based
face and head detectors for real-time video surveillance applications. In: 2017 Seventh Inter-
national Conference on Image Processing Theory, Tools and Applications (IPTA), Montreal,
QC, pp. 1–7 (2017). [Link]
10. Alabort-i-medina, J., Antonakos, E., Booth, J., Snape, P.: Menpo: a comprehensive plat-
form for parametric image alignment and visual deformable models categories and subject
descriptors, pp. 3–6 (2014)
11. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild.
In: CVPR (2012)
12. Morency, L.-P., Whitehill, J., Movellan, J.R.: Generalized adaptive view-based appearance
model: integrated frame-work for monocular head pose estimation. In: FG (2008)
13. Fanelli, G., Gall, J., Gool, L.V.: Real time head pose estimation with random regression
forests. In: CVPR, pp. 617–624 (2011)
14. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting
with constrained local models. In: CVPR (2013)
15. Asthana, A., Zafeiriou, S., Cheng, S. Pantic, M.: Incremental face alignment in the wild. In:
CVPR (2014)
16. Hansen, D.W., Ji, Q.: In the eye of the beholder: a survey of models for eyes and gaze. IEEE
Trans. Pattern Anal. Mach. Intell. 32, 478–500 (2010)
17. Lidegaard, M., Hansen, D.W., Krüger, N.: Head mounted device for point-of-gaze estima-
tion in three dimensions. In: Proceedings of the Symposium on Eye Tracking Research and
Applications - ETRA 2014 (2014)
18. Świrski, L., Bulling, A., Dodgson, N.A.: Robust real-time pupil tracking in highly off-axis
images. In: Proceedings of ETRA (2012)
19. Ferhat, O., Vilarino, F.: A cheap portable eye–tracker solution for common setups. In: 3rd
International Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction (2013)
20. Wood, E., Bulling, A.: EyeTab: model-based gaze estimation on unmodified tablet computers.
In: Proceedings of ETRA, March 2014
21. Zielinski, P.: Opengazer: open-source gaze tracker for ordinary webcams (2007)
22. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical
image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition,
Miami, FL, pp. 248–255 (2009). [Link]
23. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE Conference on computer Vision and Pattern Recognition, pp. 770–778
(2016)
25. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals
and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, pp. 4510–4520 (2018). [Link]
00474
26. Sensor. [Link] Accessed 20 Apr 2020
27. GitHub Repository. [Link] Accessed 05 June 2020
28. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In:
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition. CVPR 2001, Kauai, HI, USA, p. I-I (2001) [Link]
990517
29. Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: a general-purpose face recogni-
tion library with mobile applications. CMU-CS-16-118, CMU School of Computer Science,
Technical report (2016)
30. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition
and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Boston, MA, pp. 815–823 (2015). [Link]
31. TensorFlow Homepage. [Link] Accessed 19 June 2020
32. GitHub Repository. [Link]
Accessed 16 Apr 2020
33. Lungu, I.A., Hu, Y., Liu, S.: Multi-resolution siamese networks for one-shot learning. In: 2020
2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS),
Genova, Italy, pp. 183–187 (2020). [Link]
34. Bromley, J., et al.: Signature verification using a siamese time delay neural network. Int. J.
Pattern Recogn. Artif. Intell. 7(04), 669–688 (1993)
35. Koch, G.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning
Workshop (2015)
36. LFW Dataset. [Link] Accessed
02 May 2020
37. OpenCV Homepage. [Link] Accessed 18 June 2020
38. Kaggle Datasets. [Link] Accessed 28 June 2020
39. GitHub Repository. [Link]
Accessed 29 Apr 2020
40. Raspberry Pi Products. [Link]
Accessed 19 Apr 2020
41. Raspberry Pi Products. [Link] Accessed
19 Apr 2020
42. Sparkfun Sensors Datasheets. [Link]
MLX90614_rev001.pdf. Accessed 20 Apr 2020
43. Viola, P., Jones, M.J.: Robust real-time face detection. J. Comput. Vis. 57(2), 137–154 (2004)
44. Yan, J., Zhang, X., Lei, Z., Li, S.Z.: Real-time high-performance deformable model for face
detection in the wild
45. Liu, W., et al.: SSD: single shot multibox detector. CoRR, abs/1512.02325 (2015)
46. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal
networks. CoRR, abs/1506.01497 (2015)
47. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional
networks. CoRR, abs/1605.06409 (2016)
48. Kim, K., Cheon, Y., Hong, S., Roh, B., Park, M.: PVANET: deep but lightweight neural
networks for real-time object detection. CoRR, abs/1608.08021 (2016)
49. Vu, T., Osokin, A., Laptev, I.: Context-aware CNNs for person head detection. In: ICCV
(2015)
50. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR,abs/1612.08242 (2016)
51. Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a
database for studying face recognition in unconstrained environments. Technical Report
07-49, University of Massachusetts, Amherst, October 2007
A 2D ResU-Net Powered Segmentation
of Thoracic Organs at Risk Using
Computed Tomography Images
Abstract. The recent advances in the field of computer vision have led
to the wide use of Convolutional Neural Networks (CNNs) in organ seg-
mentation of computed tomography (CT) images. Image-guided radia-
tion therapy requires the accurate segmentation of organs at risk (OARs).
In this paper, the proposed model is a 2D ResU-Net network to auto-
matically segment thoracic organs at risk in computed tomography (CT)
images. The architecture consists of a downsampling path for capturing
features and a symmetric upsampling path for obtaining precise local-
ization. The proposed approach achieves a 0.93 dice metric (DSC) and
0.26 hausdorff distance (HD) after using ImageNet stats for normalizing
and using pre-trained weights.
1 Introduction
Lung cancer is one of the leading causes of death in both males and females, contributing 26.8% of all cancer deaths [1]. There were approximately 3.05 million cancer survivors treated with radiation, accounting for around 29% of all cancer survivors in 2016, and the number of radiation-treated cancer survivors is projected to reach 4.17 million by 2030 [1]. The introduction of procedures such as stereotactic body radiation therapy and intensity-modulated radiation therapy has improved radiation therapy techniques; therefore, protecting normal organs becomes a primary concern [2].
During radiation treatment, it is necessary to segment organs at risk correctly in the computed tomography (CT) images to avoid exposing them to a very high radiation dose. Image segmentation has had a significant impact on diagnosis and treatment, as it helps doctors view the internal organs.
2 Related Work
Several interesting works using deep neural networks to segment CT images have appeared in recent years. In [4], Olaf Ronneberger et al. introduced a model based on the simple U-Net architecture for biomedical image segmentation. Other modifications have also been proposed, such as localization and organ-specific U-Net models, a pixel-shuffle method on a fully convolutional U-Net architecture, and a two-stage encoder-decoder model with coarse and fine segmentation in [5]. The authors in [6] used multi-task learning on the U-Net architecture. Another U-Net model, with each layer containing a context pathway and a localization pathway, and a 2D residual U-Net with dilation rates, was proposed in [7]. Moreover, a dilated U-Net architecture with convolution, dilation, ReLU, batch normalization, and average pooling is also used in [7]. These architectures use 2D convolutions, but with higher computational requirements. In other research, 3D convolutions have also been used, for example applying VB-Net at two resolutions with a single-class Dice loss in [8], a 3D enhanced multi-scale network with a residual V-Net and 3D dilated convolutions in [9], and a simple dense V-Net with post-processing in [10]. In [11], the authors combined 3D and 2D convolutions in a fully convolutional 3D network.
3 Proposed Methodology
3.1 Data Collection and Pre-processing
The experimental data was collected from the SegTHOR 2019 training and testing datasets. The training dataset includes 40 patients (7390 slices), and the testing dataset contains 20 patients (3694 slices). The provided data is in the Neuroimaging Informatics Technology Initiative (NIfTI) .nii format; it was converted into the NumPy .npy format [13] and later to PNG format using matplotlib and PIL. A sample training image is shown in Fig. 1a, and a masked image is shown in Fig. 1b.
Pre-processing, often overlooked, is a major concern in terms of performance. Generally, there are bright regions in the images compared with external objects, which have a strong effect on the organ voxels when normalizing over the original intensity range. For this reason, normalization was taken as the key step. Re-sampling the images to the same voxel spacing reduced the variability in size; it also helped bring the testing-case distribution closer to the training-case distribution [2]. The CT scans have 512 × 512 pixels, with spatial resolution varying from 0.90 to 1.37 mm; the most frequent spatial resolution is 0.98 × 0.98 × 2.5 mm³. Each 3D CT scan was converted into 2.5D (2D) images formed by stacking the previous and next slices, and the values were normalized to a 256-level range. For visualization of the test data, the 3D CT scan was cut into slices along the axial, sagittal, and coronal planes; the 3D visualization of the testing data is depicted in Fig. 2.
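A minimal sketch of this 2.5D construction is shown below, assuming `volume` is a (slices, 512, 512) NumPy array; note that the first and last slices are necessarily dropped, a point that matters later when reassembling the 3D volume.

```python
# A minimal sketch of the 2.5D construction described above: each training
# sample stacks the previous, current, and next axial slice as channels and
# is normalized to the 0-255 range. `volume` is a (slices, 512, 512) array.
import numpy as np

def to_2p5d(volume):
    lo, hi = volume.min(), volume.max()
    norm = ((volume - lo) / (hi - lo) * 255).astype(np.uint8)
    samples = []
    for i in range(1, norm.shape[0] - 1):          # first/last slice dropped
        samples.append(np.stack([norm[i - 1], norm[i], norm[i + 1]], axis=-1))
    return np.array(samples)                       # (slices-2, 512, 512, 3)
```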
The overlap Dice metric (DSC) has been used to measure the overlap between the segmented area produced by the proposed algorithm and the ground truth [3]:

DSC(X, Y) = (2|X ∩ Y|) / (|X| + |Y|)    (1)
The accuracy metric was also monitored in our study. Research has demonstrated that for highly unbalanced segmentations, the Dice loss yields better results [15]; in this paper, the Dice loss has been used to train the model [2]. The accuracy metric shows high instability, and therefore the localization network takes more time to converge. We also used a flattened cross-entropy loss function, which gave nearly the same results as the Dice loss (Fig. 3).
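A small PyTorch sketch of the Dice loss implied by Eq. (1) follows; the smoothing constant is a common convention, not a value reported in the paper.

```python
# A sketch of the Dice loss implied by Eq. (1); the smoothing constant is a
# common convention (assumed), added to avoid division by zero.
import torch

def dice_loss(pred, target, smooth=1.0):
    # pred: probabilities after sigmoid/softmax, target: binary mask
    pred, target = pred.flatten(), target.flatten()
    intersection = (pred * target).sum()
    dsc = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dsc
```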
Fig. 3. The loss surfaces of ResNet-56 with/without skip connections. The proposed
filter normalization scheme is used to enable comparisons of sharpness/flatness between
the two figures.
between the contraction and the expansion layers of the U-Net. This layer makes use of two 3 × 3 convolutional neural network (CNN) layers preceded by a 2 × 2 up-convolution layer. Like the contraction path on the left, the right expanding section is also formed of many expansion blocks, each of which passes its input through two 3 × 3 convolution layers. To maintain symmetry, only half of the feature maps are carried forward after each block. The number of expansion and contraction blocks on the two sides is equal. The resulting mapping is fed to another 3 × 3 CNN layer, in which the number of feature maps equals the number of desired segments.
The ResU-Net model, shown in Fig. 4, was implemented using the PyTorch framework. ResU-Net achieves appreciable segmentation accuracy compared with many classical convolutional networks, and the residual connections help reduce training difficulty [2]. On the other hand, training a deeper network requires more memory and training time: combining residual connections with a deeper network, as shown in Fig. 4, yields better or equal performance but takes much longer to train.
Dilated convolutions were also attempted, as shown in Fig. 5, with more tunable parameters including the dilation rates; the performance was similar, and hence no further investigation was carried out.
3.6 Training
The proposed model was trained with a weight decay of 1e−2 and a learning rate of 1e−4; the learning rate versus loss curve is shown in Fig. 6. Learning-rate slices were then chosen for different epochs, and the model was trained for ten epochs. The model uses pixel shuffling and average pooling.
The model has 19,946,396 trainable parameters and 11,166,912 non-trainable parameters. ImageNet statistics were used for normalizing the data.
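As a hedged sketch of this setup, the snippet below applies ImageNet statistics for normalization together with the stated weight decay (1e−2) and learning rate (1e−4); the choice of AdamW as the optimizer is an assumption, not reported in the paper.

```python
# A hedged sketch of the training setup: ImageNet statistics for
# normalization plus the stated weight decay and learning rate.
# The optimizer choice (AdamW) is an assumption.
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats
                                 std=[0.229, 0.224, 0.225])

def make_optimizer(model):
    return torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```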
For the task of image super-resolution, Shi et al. [17] proposed using pixel shuffle as an upsampling operator. This operator rearranges input channels to produce a feature map of higher resolution, as shown in Fig. 7; notably, this technique avoids the checkerboard artifacts in the output image. Later, the same concept was employed for semantic segmentation tasks [18, 19]. The loss curve per epoch is shown in Fig. 8.
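For illustration, PyTorch's nn.PixelShuffle implements exactly this rearrangement: an (N, C·r², H, W) tensor becomes (N, C, H·r, W·r).

```python
# nn.PixelShuffle rearranges channels into spatial resolution; here r = 2,
# so 16 input channels become 4 output channels at twice the resolution.
import torch
import torch.nn as nn

up = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(1, 16, 32, 32)     # (N, C*r^2, H, W)
y = up(x)                          # -> (1, 4, 64, 64)
```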
Due to the conversion to 2.5D images, the total number of images formed is two fewer (the first and the last slice) than in the given training data. So, after converting the 2.5D results back to a 3D image by stacking all the 2.5D images depth-wise, void images are added to the 3D image. It was noticed that the missing first and last images are void images in all cases.
4 Experimental Results
The proposed algorithm was implemented in Python 3.6 on a 64-bit Ubuntu Linux platform, inside a Docker container on an Nvidia DGX-1 GPU. The proposed method was validated on the 20 CT scans of the given test data. No external data was used, and the model was trained from scratch. The evaluation metrics are the overlap Dice metric (DSC) and the Hausdorff distance, given in Eqs. 1 and 2. The best result obtained by the proposed algorithm is shown in Fig. 9. Moreover, a comparative result with a recent previous approach is given in Table 1; it is evident from Table 1 that the proposed approach achieves better performance in terms of both DSC and HD. A sample predicted output and ground truth are also shown in Fig. 10.
Fig. 10. Comparison between ground truth and predictions of masks and CT scans of
the validation set
5 Discussion
The networks trained included U-Net with ResNet34 and ResNet50 backbones, but the results and metrics were similar and approximately equal. This network uses a 2D CNN for training, yet it achieves similar or better results than 3D CNN networks such as V-Net or VB-Net [8]. As a result, there are fewer parameters to train, and the model trains faster and more cheaply, with excellent efficiency in the results. A few lessons on convolutional neural network implementation were learned, which are discussed below.
6 Conclusion
The 3D CT scans were converted into 2D images to train our model, so there is some loss in slicing. State-of-the-art architecture was used, and that helped a lot in achieving high accuracy. Without ResNet18, the single-class Dice metric was 0.39. Pre-trained weights for ResNet18, downloaded from the torchvision models, were used; after using ImageNet statistics for normalization and pre-trained weights, the accuracy graph showed a large jump. This methodology gives more accurate and more robust segmentation compared with manual segmentation. The proposed model was applied to the test dataset, and the results are depicted in Table 1.
References
1. Cancer - World Health Organization. [Link]
2. Feng, X., Qing, K., Tustison, N.J., Meyer, C.H., Chen, Q.: Deep convolutional
neural network for segmentation of thoracicorgans-at-risk using cropped 3D images.
Med. Phys. (2019)
3. Trullo, R., Petitjean, C., Ruan, S., Dubray, B., Nie, D., Shen, D.: Segmentation
of organs at risk in thoracic CT images using a sharpmask architecture and con-
ditional random fields. In: IEEE International Symposium on Biomedical Imaging
(ISBI), pp. 1003–1006 (2017)
4. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
[Link] 28
5. Zhang, L., Wang, L., Huang, Y., Chen, H.: Segmentation of thoracic organs at risk
in CT images combining coarse and fine network. In: SegTHOR ISBI (2019)
6. He, T., Guo, J., Wang, J., Xu, X., Yi, Z.: Multi-task learning for the segmentation
of thoracic organs at risk in CT images. In: SegTHOR ISBI (2019)
7. Vesal, S., Ravikumar, N., Maier, A.: A 2D dilated residual U-Net for multi-organ
segmentation in thoracic CT. arXiv preprint arXiv:1905.07710 (2019)
8. Han, M., et al.: Segmentation of CT thoracic organs by multi-resolution VB-nets.
In: SegTHOR ISBI (2019)
9. Wang, Q., et al.: 3D enhanced multi-scale network for thoracic organs segmenta-
tion. In: SegTHOR ISBI (2019)
10. Feng, M., Huang, W., Wang, Y., Xie, Y.: Multi-organ segmentation using simplified
dense V-net with post-processing. In: SegTHOR ISBI (2019)
11. van Harten, L.D., Noothout, J.M., Verhoeff, J.J., Wolterink, J.M., Isgum, I.: Auto-
matic segmentation of organs at risk in thoracic CT scans by combining 2D and
3D convolutional neural networks. In: SegTHOR ISBI (2019)
12. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional
encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal.
Mach. Intell. 39(12), 2481–2495 (2017)
13. Gibson, E., et al.: Niftynet: a deep-learning platform for medical imaging. Comput.
Methods Programs Biomed. 158, 113–122 (2018)
14. Kim, S., Jang, Y., Han, K., Shim, H., Chang, H.J.: A cascaded two-step approach
for segmentation of thoracic organs. In: CEUR Workshop Proceedings, vol. 2349.
CEUR-WS (2019)
15. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised
dice overlap as a deep learning loss function for highly unbalanced segmentations.
In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp.
240–248. Springer, Cham (2017). [Link] 28
16. Lambert, Z., Petitjean, C., Dubray, B., Ruan, S.: SegTHOR: Segmentation of Tho-
racic Organs at Risk in CT images. arXiv preprint arXiv:1912.05950 (2019)
17. Shi, W., et al.: Real-time single image and video super-resolution using an efficient
sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)
18. Chen, K., Kun, F., Yan, M., Gao, X., Sun, X., Wei, X.: Semantic segmentation of
aerial images with shuffling convolutional neural networks. IEEE Geosci. Remote
Sens. Lett. 15(2), 173–177 (2018)
19. Gao, H., Yuan, H., Wang, Z., Ji, S.: Pixel deconvolutional networks. arXiv preprint
arXiv:1705.06820 (2017)
20. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deeply Improved Sparse Coding
for Image Super-Resolution, ArXiv 2015, abs/1507.08905
21. Boureau, Y., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in
vision algorithms. In: Proceedings of International Conference on Machine learning
(ICML 2010), vol. 28 (2010)
A Compact Shape Descriptor Using Empirical
Mode Decomposition to Detect Malignancy
in Breast Tumour
Jyothi Nagar, Pragathi Nagar, Nizampet (S.O), Hyderabad 500090, Telangana, India
Abstract. Breast cancer is the most common cancer in India and the world. Mammograms help radiologists detect abnormalities in the breast, and analysis of breast lesions helps doctors detect cancer in its early stages. Breast lesion contours are characterized by their shape: malignant lesion contours have spiculated and ill-defined shapes, while benign ones have circular and lobulated shapes. In the present work, we propose a method to classify breast lesion contours as benign/malignant using the empirical mode decomposition (EMD) technique. Initially, the two-dimensional contours of breast lesions are compacted into a 1D signature. The 1D signatures are then decomposed into intrinsic mode functions (IMFs) by the EMD algorithm, and statistical features are calculated from these IMFs. These parameters form an input feature vector, which is then fed to a classifier.
1 Introduction
Breast cancer is the most common cancer in India and the world. According to WHO reports, 2.1 million women are affected by breast cancer each year, and it causes the highest cancer mortality among women [1]. In 2018, nearly 627,000 women died due to breast cancer; approximately 15% of cancer deaths in women are due to breast cancer. Mammography plays a prominent role in the detection of breast cancer in its early stages, and computer-aided diagnosis and detection of masses from mammograms helps radiologists obtain an early indication of breast cancer. A mass is one of the breast abnormalities that radiologists look for during diagnosis. Masses are characterized by their shape: a benign mass is circular or round with a well-defined boundary, whereas a malignant mass is spiculated with a fuzzy boundary. Shape descriptors are therefore very important tools for classifying breast masses; the goal of shape-based descriptors is to measure the spiculation of malignant masses based on their boundary. The complexity of the 1D signature of a mass contour has been studied using fractal analysis, achieving an accuracy of 89% with the ruler method [2]. Several studies have been carried out to classify masses as benign or malignant. Shape features such as compactness (C), fractional concavity (Fcc), spiculation index (SI), and a Fourier-descriptor-based factor (FF) have been calculated to discriminate benign and malignant contours [3, 4]. Pohlman et al. [5] applied fractal analysis to benign and malignant contours of breast masses and achieved an accuracy of 80%. Rangayyan et al. [6] employed fractal analysis based on power spectral analysis to classify 1D signatures of breast contours. Texture features can also be extracted from mammograms to classify masses, since benign masses are homogeneous in nature while malignant masses have heterogeneous textures, and many researchers have contributed papers on the classification of masses using texture features. Yang et al. [7] applied the wave atom transform to extract features and classified the masses using random forest classifiers. Prathibha et al. [8] employed bandlet and orthogonal ripplet type II transforms to extract features and applied a KNN classifier to distinguish normal-benign, normal-malignant, and malignant-benign images. Dhahbi et al. [9] used curvelet moments to classify masses. However, the use of texture features results in a high-dimensional feature vector and increases the computational cost of the classification model [7]. Regardless, many studies have shown that shape-based descriptors are more useful than other descriptors such as texture, color, etc. [10]. In the proposed work, we implement the EMD algorithm to extract features from the 1D signatures of 2D mass contours in order to classify masses. The empirical mode decomposition algorithm was developed by Huang et al. [11] to analyse nonstationary and nonlinear signals. Djemili et al. [12] applied the EMD algorithm and artificial neural networks to classify 1D EEG signals, and Orosco et al. [13] employed EMD for epileptic seizure detection.
In this work, we focus on the extraction of a compact shape feature vector from 2D mass contours. The work proceeds in three steps. In the first step, the 2D contour is mapped into a compact 1D signature using Euclidean distance. In the second step, the 1D signature is further compressed using the empirical mode decomposition algorithm, and statistical features are extracted from the IMFs of the 1D signature. In the third step, the extracted features are given to a classifier to discriminate benign and malignant masses. The proposed model for classifying breast masses is shown in Fig. 1.
Classification
Benign masses are almost circular and well defined, which gives a smooth signature, while malignant masses have spiculated and rugged boundaries. The 1D signature curve of a mass contour is an important component for the diagnosis of benign and malignant tumors or masses due to its invariance properties in Euclidean space; the signature curve does not change with the orientation of the mass contour [14] in the mammogram. Mapping of the 2D contour into a 1D signature is performed by the centralized distance function method, as illustrated by the sketch below.
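A minimal sketch of the centralized distance function, assuming `contour` is an (N, 2) array of (x, y) boundary coordinates:

```python
# The 1D signature is the Euclidean distance from the contour centroid to
# each boundary point, which is invariant to the contour's orientation.
import numpy as np

def signature_1d(contour):
    centroid = contour.mean(axis=0)
    return np.linalg.norm(contour - centroid, axis=1)
```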
The procedure to obtain IMFs from a 1D signature x(t) is summarized in the steps below [12]:
Step 1: Initialize m = 0 and r(t) = x(t).
Step 2: Compute the local minima and local maxima of x(t).
Step 3: Obtain the lower and upper envelopes using cubic spline interpolation, denoted E_l(t) (lower envelope) and E_u(t) (upper envelope).
Step 4: Calculate the mean of the envelopes:
M(t) = (E_l(t) + E_u(t)) / 2
Step 5: Compute the candidate mode-1 IMF, h(t) = x(t) − M(t).
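The sketch below implements one sifting iteration of these steps with SciPy; a full EMD would repeat the sifting until h(t) satisfies the IMF conditions and then extract further IMFs from the residue.

```python
# One sifting iteration of the EMD steps above, using SciPy for the
# cubic-spline envelopes; assumes x has enough interior extrema.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]   # Step 2: local extrema
    minima = argrelextrema(x, np.less)[0]
    e_u = CubicSpline(maxima, x[maxima])(t)    # Step 3: upper envelope
    e_l = CubicSpline(minima, x[minima])(t)    #         lower envelope
    m = (e_l + e_u) / 2.0                      # Step 4: envelope mean
    return x - m                               # Step 5: candidate IMF h(t)
```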
Feature Extraction
Statistical features are extracted from the IMFs of the 1D signature obtained by the EMD algorithm. Along with these IMF features, we also calculated the length of the 1D signature and the area, solidity, and eccentricity of the 2D contour. In total, we computed ten features for each contour in the dataset. These features are then given to different classifiers for validation.
2.4 Classification
Classification is an important step to validate the efficacy of the proposed method. The features extracted by the procedure discussed above are given to different classifiers, namely K-nearest neighbor (KNN), support vector machine (SVM), AdaBoost decision tree, and artificial neural network (ANN), to discriminate benign and malignant mass contours.
The performance of the different classification models is analysed by computing parameters such as accuracy, sensitivity, specificity, and area under the curve (AUC), as in the sketch below.
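An illustrative comparison of the named classifiers on the ten-feature vectors, assuming X (n_samples × 10) and y are prepared beforehand; the split ratio and random seed are placeholders.

```python
# A hedged sketch: fit the classifiers named above and report accuracy/AUC.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(X, y, test_size=0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              stratify=y, random_state=0)
    for name, clf in [("SVM", SVC(probability=True)),
                      ("KNN", KNeighborsClassifier()),
                      ("AdaBoost", AdaBoostClassifier())]:
        clf.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te))
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```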
In the proposed work, the ten extracted features were fed to SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and decision tree classifiers. Table 1 shows the accuracies computed with the different classifiers; among them, the SVM classifier achieved an accuracy of 94.7%. Initially, the classifiers were fed with different feature sets, such as only IMF1 features, entropies of IMF1, IMF2, and IMF3, and 2D contour features, and the accuracies were computed as shown in Table 1.
Different training-to-testing ratios of mass contours were considered for classification. First, we used 20% of the mass contours for testing and 80% for training and achieved accuracies of 94.7%, 86.1%, and 77.3% with the three classifiers. The area under the curve (AUC) is 0.85 with the SVM classifier and the 80:20 training-to-testing ratio, as shown in Fig. 4; likewise, Fig. 5 shows the confusion matrix for the testing images with the SVM kernel. Secondly, we used 25% for testing and 75% for training and obtained accuracies of 83.3%, 66.7%, and 79.2% with SVM, KNN, and decision tree. Finally, we used 50% for testing and 50% for training and obtained accuracies of 75%,
72.9%, and 75%. Therefore, from Table 2 we can conclude that the accuracy obtained with the different numbers of testing images is above 75%.
Table 3 compares our proposed method with existing methods. Our proposed model reports all assessment parameters (accuracy, sensitivity, specificity, and AUC), which are not all specified for the other methods, and it also achieved the highest accuracy, 94.7%. A drawback of our model is that it was tested on fewer mass contours than the other methods.
Table 3. Comparison of accuracy, specificity, sensitivity, and AUC with our proposed method

Feature extraction method | Images | Acc (%) | Sens (%) | Spec (%) | AUC
GaborPCA [15] | 114 | 80 | – | – | –
Fractional concavity and spiculation index [3] | 111 | 82 | – | – | 0.79
Fractal dimension using ruler method and fractional concavity [2] | 111 | – | – | – | 0.82
Proposed method | 97 | 94.7 | 100 | 83 | 0.85
4 Conclusion
In this paper, we proposed a compact shape descriptor based on the empirical mode decomposition algorithm applied to the 1D signature of a 2D mass contour, for the classification of benign and malignant masses. This method can help radiologists in the classification of breast masses. The proposed method was validated using different classifiers and achieved a maximum accuracy of 94.7%. The experimental results show that our proposed method achieved an accuracy of 94.7%, a sensitivity of 100%, and a specificity of 83% in classifying benign and malignant masses.
References
1. [Link]
2. Rangayyan, R.M., Nguyen, T.M.: Fractal analysis of contours of breast masses in mammo-
grams. J. Digit. Imaging (2006). [Link]
3. Rangayyan, R.M., El-Faramawy, N.M., Desautels, J.E.L., Alim, O.A.: Measures of acutance
and shape for classification of breast tumors. IEEE Trans. Med. Imag. 16(6), 799–810 (1997)
4. Rangayyan, R.M., Mudigonda, N.R., Desautels, J.E.L.: Boundary modelling and shape analysis methods for classification of mammographic masses. Med. Biol. Eng. Comput. 38, 487–496 (2000)
5. Pohlman, S., Powell, K.A., Obuchowski, N.A., Chilcote, W.A., Grundfest-Broniatowski, S.:
Quantitative classification of breast tumors in digitized mammograms. Med. Phys. 23(8),
1337–1345 (1996)
6. Rangayyan, R.M., Oloumi, F.: Fractal analysis and classification of breast masses using the
power spectra of signatures of contours. J. Electron. Imaging 21(2), 023018 (2012)
7. Yang, W., Tianhui, L.: A robust feature vector based on waveatom transform for mammo-
graphic mass detection. In: Proceedings of the 4th International Conference on Virtual Reality
(2018)
8. Prathibha, G., Mohan, B.C.: Classification of benign and malignant masses using bandelet
and orthogonal ripplet type II transforms. Comput. Methods Biomech. Biomed. Eng. Imaging
Vis. 6(6), 704–717 (2018)
9. Dhahbi, S., Barhoumi, W., Zagrouba, E.: Breast cancer diagnosis in digitized mammograms
using curvelet moments. Comput. Biol. Med. 64, 79–90 (2015)
10. Rojas-Domínguez, A., Nandi, A.K.: Development of tolerant features for characterization of
masses in mammograms. Comput. Biol. Med. 39(8), 678–688 (2009)
11. Huang, N.E., Shen, Z., Long, S.R., et al.: The empirical mode decomposition and the Hilbert
spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London 454,
903–995 (1998)
12. Djemili, R., Bourouba, H., Korba, M.C.A.: Application of empirical mode decomposition and artificial neural network for the classification of normal and epileptic EEG signals. Biocybern. Biomed. Eng. 36(1), 285–291 (2016)
13. Orosco, L., Laciar, E., Correa, A.G., Torres, A., Graffigna, J.P.: An epileptic seizures detection algorithm based on the empirical mode decomposition of EEG. In: Conference Proceedings of the IEEE Engineering in Medicine and Biology Society (2009)
14. Arica, N., Yarman-Vural, F.T.: A compact shape descriptor based on the beam angle statistics.
In: Bakker, E.M., Lew, Michael S., Huang, T.S., Sebe, N., Zhou, X.S. (eds.) CIVR 2003.
LNCS, vol. 2728, pp. 152–162. Springer, Heidelberg (2003). [Link]
45113-7_16
15. Görgel, P., Sertbas, A., Ucan, O.N.: Mammographical mass detection and classification
using local seed region growing–spherical wavelet transform (LSRG–SWT) hybrid scheme.
Comput. Biol. Med. 43(6), 765–774 (2013)
An Intelligent Sign Communication
Machine for People Impaired
with Hearing and Speaking Abilities
Abstract. People with impaired speaking and hearing abilities use sign language to communicate among themselves, but communicating with the outside world remains a tough task for them. Through this paper, we propose a system to convert Indian Sign Language (ISL), American Sign Language (ASL), and British Sign Language (BSL) hand gestures to the textual format of the respective language, as well as to convert text into the preferred sign language. In this paper, we capture ISL, ASL, and BSL gestures through a web camera. The streaming video of hand gestures is sliced into distinct images in order to match the finger orientation to the corresponding alphabet. Finger orientations are preprocessed as features of the hand gestures, in terms of the angles made by the fingers, the numbers of fingers completely open, semi-open, and fully closed, whether each finger's axis is vertical or horizontal, and the identification of each finger; these features are required for gesture recognition. Implementation is done for alphabets that use a single hand, and the results are explained. After preprocessing, the hand part of the sliced frame, in the form of a masked image, is passed to feature extraction. To classify the different gestures, we used SVM (Support Vector Machine) and CNN (Convolutional Neural Network), testing the probable gesture and recording the accuracy of each algorithm. Implementation is done on our own ISL, BSL, and ASL datasets, created by us using the web cameras of our laptops. Our experimental results show that the proposed methodology can work against different backgrounds, e.g. backgrounds containing different objects or some colored background. For text-to-sign conversion, we create a video that renders the given text in sign language.
1 Introduction
All non-vocal communication requires a particular action for a particular context: movement of the face, flipping of the hands, folding of the fingers, or an action by any other body part is a form of gesture. Gesture recognition is a method of making a machine or computer recognize these actions. The algorithms used by these methods act as mediators between human and machine, enabling a computer to interact with humans naturally, without any physical contact, using only cameras as its eyes. People who are deaf or mute use hand gestures for communication within their community, under the name of sign language. This leads to a kind of isolation between their community and others due to the language barrier, as most people do not learn sign language. If we can program our computers to take input in sign language and convert it into the respective language, or into other languages, either as speech or in textual format, then they can act as mediators and remove the language barrier. The difference between communities can be minimized and, most importantly, knowing sign language becomes far more useful in this high-tech world, as it can be translated to English and vice versa. All of this points to the need for a system that can act as a translator, converting sign language to the desired language in the desired format, so that people from different language backgrounds can converse with literate people who, owing to some disability, know only sign language.
Sign languages share grammatical devices such as the use of pauses and full stops, simultaneity, hand postures, hand placement, orientation, motion of the head, and facial gestures. As a country, India is highly diverse in terms of culture, religion, beliefs, and, above all, languages, so no single standard sign language has been adopted in India. Various social groups use Indian Sign Language with native and historical variations across different parts of the country, but the skeleton of the language is similar for most gestures. Work on the system of contrastive relationships among the elements that constitute the fundamental components of ISL started in the 1970s: with help from Woodward and the National Science Foundation, USA, Vasishta and Wilson visited India and collected signs from different points in the country for linguistic analysis.
The organization of the paper is as follows: Sect. 2 reviews the methods and technologies available in the literature; Sect. 3 explains the proposed sign language recognition system, which uses algorithms for skin cropping and an SVM (Support Vector Machine); Sect. 4 covers the implementation results; and Sect. 5 gives the discussion and conclusion.
2 Literature Survey
This paper [14] proposes the HSI color model for segmentation of images instead of the RGB model, since the HSI model works better for skin color recognition. The optimal H and S values for the hand, as specified in [14], are H < 25 or H > 230 and S < 25 or S > 230. After this, the Euclidean distance formula is used to evaluate the distance between the centroid of the palm and the fingers. The distance transform method is used to identify the centroid of the hand: the pixel with the maximum intensity becomes the centroid. To extract each fingertip, the farthest point from the centroid is selected, and every finger is identified from predefined sign gestures. To recognize a semi-open finger, each finger is divided into three parts, and the angle between the centroid and the major axis of the finger is calculated (Figs. 1, 2 and 3).
In [4], the YCbCr color space is used, where the Y channel represents brightness and the (Cb, Cr) channels refer to chrominance. The authors use the Cb and Cr channels to represent color and discard Y, since it relates only to brightness. Because some small regions near the skin are wrongly included as skin, they apply morphological operations, then select the skin region and extract features to recognize the hand gesture. They use three features: velocity, orientation, and location, with orientation as the main feature of their system. The features are then classified using the Baum-Welch algorithm (BW), and the hand motion gesture is recognized using a Left-Right Banded model with 9 states.
In [10], the YCbCr color space is again used; this color model is implemented by defining a skin range in the RGB model and converting these values into the YCbCr model using a conversion formula. The authors use the support vector machine (SVM) algorithm, which separates two classes with a hyperplane defined by the support vectors, a subset of the training data. The algorithm is also used to solve multi-class problems by decomposing them into two-class problems.
The authors of [7] create a dataset using an external Canon EOS camera (29 fps, 18 MP) with an 18–55 mm lens. They eliminate the background and extract the hand region from the remaining upper body. They use RGB frames of 640 * 480 and extract key frames from the video using orientation histograms, then apply different distance metrics (chessboard distance, Euclidean distance, etc.) to recognize a gesture. After successful recognition, the gestures are classified for text formation.
The authors of [8] use a fully convolutional network (FCN); in particular, an 8-layer FCN model that achieves good performance and is used for solving dense prediction problems. The output segmentation of this network is robust under various face conditions because it considers a large range of context information. A CRF algorithm is then used for image matting.
In [9], a convolutional neural network is used to generate the trained model. The network has four stages: five rectified linear units (ReLU) in the first stage, two stochastic pooling layers in the second, then one dense layer and one softmax output layer. Frames of 640 * 480 are resized to 128 * 128 * 3. They took 200 frames from each of 5 different people at 5 different viewing angles, for a dataset of 5000 frames.
In [13], a CNN is used to recognize static sign gestures, trained on the American Sign Language (ASL) dataset provided by Pugeault and Bowden in 2011. Around 60,000 RGB images were used for training and testing. Some operations were performed on the dataset because not every image has the same depth relative to its dimensions. A V3 model was used for the color features and then combined with depth features for better accuracy. The model was trained with the CNN using 50 epochs and a batch size of 100.
Suharjito et al. [1] reviewed the different methods and techniques that researchers are using to develop better sign language recognition systems.
Kakoty et al. [6] address the recognition of sign language numbers and alphabets using hand kinematics captured with a data glove, achieving a 97% recognition rate for these alphabets and numbers.
In [11], the proposed system translates English text into Indian Sign Language (ISL) using human-computer interaction techniques. The implemented system consists of an ISL parser, the Hamburg Notation System and the Signing Gesture Mark-up Language, and generates animations according to ISL grammar.
Paras et al. [12] used the WordNet concept to extend and expand the dictionary, and further constructed an Indian Sign Language system for deaf and mute people.
Matt et al. [5] address video-based feedback to students learning American Sign Language (ASL).
In [3], the authors address a deep learning implementation for sign language based on gesture images. The validation accuracy obtained using the different layers of deep learning is more than 90%.
3 Proposed Work
Flow Chart. The flow chart explains the workflow of our project: segmentation of the video, masking of the images, canny edge detection, feature extraction using the SURF library, clustering of the image features, and comparison between the clusters of the training and testing data using the SVM library, as described in the flow chart below.
Skin Masking. The purpose of this step is to remove extra noise from the segmented frame; after masking, only the Region of Interest (ROI), containing the useful information in the image, should remain. This is achieved via skin masking, by defining a threshold on the RGB schema and then converting the RGB colour space to a grey-scale image (Fig. 4).
Various image processing functions are used to achieve skin masking. First, the frame is converted into a gray schema. The frame is then converted to the HSV schema, which helps detect the skin colour - our main objective - so that the hand region can be identified. After identifying the hand region, noise is removed from the image using a blur function.
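As an illustration of this masking pipeline, a minimal OpenCV sketch follows. The HSV bounds below are placeholders rather than the thresholds used in our experiments, and the helper name is hypothetical:

    import cv2
    import numpy as np

    def skin_mask(frame_bgr):
        """Return a grey-scale skin-masked ROI of a BGR frame (sketch)."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        lower = np.array([0, 30, 60], dtype=np.uint8)    # assumed skin range
        upper = np.array([25, 255, 255], dtype=np.uint8)
        mask = cv2.inRange(hsv, lower, upper)            # skin-colour threshold
        mask = cv2.medianBlur(mask, 5)                   # blur function for noise removal
        roi = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
        return cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)     # grey-scale output as in Fig. 4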
Text to Video. To convert text into a sign video, a video generation function is applied. We use the signs of individual alphabets to convert text into sign language.
4 Experiment Setup
Data-Set. We searched the internet and found no resource from which an Indian Sign Language dataset could be obtained. After a long effort of searching different resources, we made our own ISL dataset under our own lighting conditions and environmental setup. We have 26 × 150 = 3900 static training images and 26 × 30 = 780 images for testing. The actual resolution of the images is 640 × 480, which is cropped and normalized to 120 × 120. The samples from the video are 320 × 260 in size and are taken in various lighting environments. The same process was used for two other sign languages: American Sign Language and British Sign Language.
We have built an interface that lets the user choose the language to operate in, i.e., ISL, BSL or ASL. The user then chooses between sign-to-text and text-to-sign conversion. This makes our system user friendly, so that an ordinary person can easily use it for communication.
Algorithms
– Support Vector Machine Algorithm: The support vector machine (SVM) is an algorithm used for two-class (binary classification) problems, in which the data can be separated by a plane - linear, parabolic, etc. - depending on the number of features. A hyperplane refers to a virtual plane drawn in the feature space of the given data in order to separate the classes on the basis of some features. The training data is used for the supervised learning of the system: every feature vector in the training set is sent with its target value so the system can learn from it. The SVM is mainly used to predict the target value of the given test features according to the plane drawn by the algorithm to distinguish the different classes in the training data [2].
Either a classification or a regression function can be used for the mapping. When the classes are not linearly separable, a non-linear mapping is used to transform the features into an n-dimensional space where they can be distinguished, and maximum-margin hyperplanes are then created. The model works over only a subset of the training data near the class boundaries. A similar model can also be produced by support vector regression (SVR).
The SVM uses different values of gamma and C to draw the hyperplane between the two clusters. Gamma controls how far the influence of a single training example reaches: with lower gamma, points far from the hyperplane are also considered, while higher gamma restricts attention to nearby points. C trades off the smoothness of the decision boundary against classifying the training points strictly: the larger the value of C, the stricter the separation enforced; a short scikit-learn sketch of this tuning follows.
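A minimal scikit-learn sketch of the C/gamma tuning, assuming X_train, y_train, X_test and y_test hold the gesture feature vectors and sign labels produced by the earlier pipeline:

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    # Grid of candidate C/gamma values for an RBF-kernel SVM.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)

    print("best C/gamma:", search.best_params_)
    print("test accuracy:", search.best_estimator_.score(X_test, y_test))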
– Convolutional Neural Network: A convolutional neural network is built from neurons with learnable weights and biases. The neurons in a layer receive input from the previous layer; the product between the weights and the input is computed, optionally followed by a non-linearity.
The CNN architecture is divided into different layers: (1) Convolution Layer: Features are extracted from the frame in the convolution layer, and parts of the image are linked to the following convolution layer. The dot product between the receptive area and a kernel (a 3 × 3 filter) is computed across the whole image, as shown in the figure; the output of the dot product is an integer value known as a feature. Feature extraction is thus performed using a filter (kernel) given by a small matrix. (2) Padding: Padding surrounds the feature map with an additional border (typically zeros) so that the output has the same dimensions as the input volume.
(3) Rectifier Activation Function (ReLU):
After the convolution layer is applied to the image matrix, a ReLU layer adds non-linearity to the system by applying the non-linear activation function to the feature matrix. Many activation functions exist, but ReLU is used here because it does not saturate, which would otherwise make the network hard to train.
(4) Pooling Layer:
The pooling layer controls overfitting and decreases the dimension of the image. It can be done in three ways - max, average and mean pooling. Here we use max pooling, which takes the maximum value from the input region being convolved with the features.
(5) Fully Connected Layer:
This is one of the most important layers of the CNN, as it produces the classified output according to the training dataset. We have used the different sign images for the training set, as discussed above.
(6) Epochs:
One epoch is one pass of the whole dataset through the network, via forward and backward propagation.
(7) Training Accuracy:
The accuracy reported by the model when it is evaluated on the training dataset.
(8) Validation Accuracy:
After successful training, the model is evaluated with the help of the test dataset, and the accuracy of the model on unseen data is reported.
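A minimal Keras sketch tying the layers above together; the filter counts, network depth and training settings are illustrative assumptions rather than the exact model used here:

    from tensorflow import keras
    from tensorflow.keras import layers

    num_classes = 26  # one class per ISL alphabet sign
    model = keras.Sequential([
        # 3x3 convolutions with 'same' padding keep the spatial dimensions.
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),          # max pooling reduces dimension
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"), # fully connected layer
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=50, batch_size=100,
    #           validation_data=(X_val, y_val))  # placeholder training settings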
5 Experimental Result
We performed the training on three different sign languages, each having 45,500 training images, and performed the testing on 20,800 images.
Algorithm              Accuracy
K-nearest neighbour    0.6628820960698
Logistic regression    0.7554585152838
Naive Bayes            0.6283842794759
6 Conclusion
– We have worked on stationary hand gestures, but sign language can also involve moving hands; in future, the system can be extended to moving gestures as well.
– The major limitation of the project is its dependence on lighting conditions; in future, the effect of lighting can be overcome.
References
1. Abraham, A., Rohini, V.: Real time conversion of sign language to speech
and prediction of gestures using artificial neural network. Proc. Comput. Sci.
143, 587–594 (2018). [Link] [Link]
[Link]/science/article/pii/S1877050918321331. 8th International Con-
ference on Advances in Computing & Communications (ICACC-2018)
2. Dai, H.: Research on svm improved algorithm for large data classification. In: 2018
IEEE 3rd International Conference on Big Data Analysis (ICBDA), pp. 181–185,
March 2018. [Link]
3. Das, A., Gawde, S., Suratwala, K., Kalbande, D.: Sign language recognition using
deep learning on custom processed static gesture images. In: 2018 International
Conference on Smart City and Emerging Technology (ICSCET), pp. 1–6 (2018)
4. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Real-time capable system for hand
gesture recognition using hidden Markov models in stereo color image sequence. J.
WSCG 16 (2008)
5. Huenerfauth, M., Gale, E., Penly, B., Pillutla, S., Willard, M., Hariharan, D.: Eval-
uation of language feedback methods for student videos of American sign language.
ACM Trans. Access. Comput. (TACCESS) 10(1), 1–30 (2017). [Link]
1145/3046788
6. Kakoty, N.M., Sharma, M.D.: Recognition of sign language alphabets and num-
bers based on hand kinematics using a data glove. Proc. Comput. Sci. 133, 55–
62 (2018). [Link] [Link]
com/science/article/pii/S1877050918309529. International Conference on Robotics
and Smart Manufacturing (RoSMa2018)
7. Liu, L.: Research on logistic regression algorithm of breast cancer diagnose data by
machine learning. In: 2018 International Conference on Robots Intelligent System
(ICRIS), pp. 157–160, May 2018. [Link]
8. Qin, S., Kim, S., Manduchi, R.: Automatic skin and hair masking using fully
convolutional networks. In: 2017 IEEE International Conference on Multimedia
and Expo (ICME), pp. 103–108, July 2017. [Link]
8019339
9. Rao, G.A., Syamala, K., Kishore, P.V.V., Sastry, A.S.C.S.: Deep convolutional neu-
ral networks for sign language recognition. In: 2018 Conference on Signal Processing
And Communication Engineering Systems (SPACES), pp. 194–197, January 2018.
[Link]
10. Reshna, S., Jayaraju, M.: Spotting and recognition of hand gesture for Indian sign
language recognition system with skin segmentation and SVM. In: 2017 Interna-
tional Conference on Wireless Communications, Signal Processing and Networking
(WiSPNET), pp. 386–390, March 2017. [Link]
8299784
11. Sugandhi, Kumar, P., Kaur, S.: Sign language generation system based on Indian
sign language grammar. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 19(4),
1-26 (2020). [Link]
12. Vij, P., Kumar, P.: Mapping Hindi text to Indian sign language with exten-
sion using WordNet. In: Association for Computing Machinery, New York, NY,
USA (2016). [Link] [Link]
2979779.2979817
13. Xie, B., He, X., Li, Y.: RGB-D static gesture recognition based on convolutional
neural network. J. Eng. 2018(16), 1515–1520 (2018). [Link]
2018.8327
14. Zhou, Q., Zhao, Z.: Substation equipment image recognition based on sift feature
matching. In: 2012 5th International Congress on Image and Signal Processing, pp.
1344–1347, October 2012. [Link]
Features Explaining Malnutrition in India:
A Machine Learning Approach to Demographic
and Health Survey Data
Abstract. India is one of the most severely malnourished countries in the world. Undernutrition was the cause of two-thirds of the 1.04 million deaths among children under the age of five in the year 2019. Several strategies have been adopted by the Government of India and the state governments to minimize the incidence of malnutrition. However, to make these policies effective, it is important to understand the key features explaining malnutrition. Analyzing the Indian Demographic and Health Survey (IDHS) data of 2015–2016, this paper attempts to identify causes along four dimensions of malnutrition, namely the Height-for-Age Z-score (HAZ), Weight-for-Age Z-score (WAZ), Weight-for-Height Z-score (WHZ) and Body Mass Index (BMI). Using a machine learning approach to feature reduction, the paper identifies the ten most important features, out of the 1341 available in the database, for each of the four anthropometric parameters of malnutrition. The features are reduced and ranked using the WEKA tool. The results and findings of this research provide key policy inputs for addressing malnutrition and the related mortality among children under age five.
1 Introduction
Under-nourished women are more likely to have unhealthy babies. In addition, undernourished individuals are less productive at work, leading to low payments and poverty. The Indian Government has started many programs, such as the Midday Meal Scheme launched on 15th August 1995, to eradicate malnutrition; under this scheme, freshly cooked meals are provided to millions of children in almost all government and government-aided schools. Apart from this, the Government of India also started Integrated Child Development Services in 1975 [2], which targets improving the health of mothers and children under the age of 6 by providing health and nutrition education, health services, supplementary food and pre-school education. However, these programmes, and many other national as well as state-level policies, have not been designed considering the variation in factors responsible for malnutrition in children below five. This is the root cause of the slow decrease in the number of deaths of children under age five due to undernutrition.
This paper is organised into six sections. The literature review is the next section; the IDHS dataset is explained in detail in the third section; the technique used in this analysis is described in the fourth section; results and findings are discussed in the fifth section, followed by the conclusion in the sixth section.
2 Literature Survey
Several studies on malnutrition have been carried out in the past decades using different datasets and methodologies, amongst which the most commonly used dataset is Demographic Health Survey data; the Demographic Health Survey is conducted every 10 years. Although many studies have used this dataset, very few have applied machine learning techniques in their analysis; others have used either analytical or statistical approaches. The following are some of the works carried out in the field of analysing the increasing rate of malnutrition.
Nair et al. [3] characterised the causes of malnutrition for the states of India using the IDHS 2005–2006 dataset. With the help of K-means cluster analysis, states were grouped according to different features. The Synthetic Minority Oversampling Technique was used for pre-processing the dataset, and AdaBoost and the Ranker algorithm were used for attribute selection. The analysis generated seven clusters for HAZ, four for WAZ, six for WHZ and five for BMI, the four anthropometric measures used. Later in the research, the features were ranked using the Ranker algorithm; the top-ranked features, those having the highest variance across all four anthropometric parameters, were found to be mainly responsible for malnutrition. These features were considered important for policy makers, as they would be helpful for improving and creating new policies for different regions of India to eradicate malnutrition from its roots [4].
Many studies have used data mining techniques like decision trees and clustering. In [5], a few patterns were found: for example, a child can be malnourished even if a safe water source is used, and there is an 87% chance of malnutrition in a child who acquires a major disease and does not use a good toilet facility. Another study developed a model that can help policy designers and health care facilitators identify children at risk; the major contributing factors to malnutrition were found to be mother's education, child age, region, wealth index and residence [6].
Other studies used statistical analysis methods such as ANOVA, Case-based Reasoning (CBR), Euclidean distance, the ID3 algorithm, probabilistic Bayes theory and logistic regression [7–11]. To prove that the malting technique produces phytase enzyme, least-significant-difference tests on zinc, iron and phytic acid were used; zinc is an essential component of metalloenzymes and widely helps in reducing stunting and wasting and improves brain development in infants [12]. Using multivariate logistic regression on the Bangladesh DHS dataset together with an environmental indicator, the Normalized Difference Vegetation Index (NDVI), trends in the nutrition security of foods in the Ganges Brahmaputra Meghna Delta were found for the years 2007 and 2011. The results showed that wasting probability decreases as NDVI increases, since the food consumption of the medium-income group varies with the variation in vegetation due to climate change [13]. Results of statistical analysis on the Pakistan DHS show that children of parents with secondary or higher education, with access to health facilities, and from rich households have less tendency to become stunted, whereas children in rural residences with no toilet facilities, smaller size at birth and older mothers are more likely to be stunted [14].
Poverty has strong implications for malnutrition. The work in [15] used the Indian Human Development Survey (IHDS) of 2012 to find the factors responsible for escaping and falling into poverty, applying machine learning techniques such as info-gain and a random forest classifier. The work found that livestock such as goats play a vital role in explaining poverty; caste, education and rural-to-urban migration are major factors in falling into poverty, whereas toilet facilities and the financial sector are features of escaping poverty. Another study investigated the infant mortality rate by finding influencing factors such as national income and fertility rate, using data from [Link] [16]. Similarly, several machine learning techniques have been deployed to identify probable causes of malnutrition [17–22].
From the literature survey it is observed that the strategies deployed were country-specific, and that many different techniques have been used to identify the root causes of malnutrition and how it can be dealt with effectively. The features themselves are divided into four classes of anthropometric parameters recognized by the WHO: HAZ, WAZ, WHZ and BMI. Identifying features for these anthropometric parameters is very important. The main objectives of this paper are to select the most important features for all four anthropometric parameters (HAZ, WAZ, WHZ and BMI) from the IDHS data, to find the major impacting features using the Principal Component evaluator, and to rank them with the Ranker algorithm. The features thus identified will help policy makers improve existing policies and address the important causes of malnutrition.
3 Data Source
The dataset used in this paper is the IDHS data of 2015–2016. The DHS program collects information on health and population in 90 developing countries, one of which is India. The data is categorized into fields like birth records, children's records, couples' records, individuals' records and men's records; amongst these, the birth record dataset is employed for this purpose. Information about the child, such as age, sex, HAZ, WAZ, WHZ and BMI, is recorded in this dataset [4]. The mother of the child is also interviewed to collect information about the health status of both mother and child, such as type of place of residence, number of children under five in the household, births in the last five years, whether the child was given pumpkin, carrots or squash, whether the child received the polio vaccine, the number of tetanus injections before and during pregnancy, and whether iron tablets were given or bought. The birth record of 2015–2016 contains 1,315,617 instances of 1,341 features covering all states and union territories of India.
4 Methodology
The methodology used in this analysis is shown in the schematic diagram of Fig. 1. It begins with data collection and the cleaning of irrelevant information from the dataset, followed by selection of useful features for all four anthropometric parameters, determination of the most important malnutrition-impacting variables, and their ranking using the WEKA tool.
Irrelevant variables are eliminated before the analysis, which reduces the variables to 745. On removing duplicate instances using the distinct method of the dplyr package, the total number of observations decreases to 639,916. The remaining useful data contains both numeric and categorical fields. For feature selection using the Boruta algorithm, the data needs to be converted into numeric type; for this purpose, all categorical variable instances are encoded based on the factor levels of the feature, whereas for numeric variables the NA values are replaced by the mean of the column.
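The paper's cleaning pipeline uses R (dplyr and Boruta). Purely as an illustration, an equivalent sketch of the steps above in Python/pandas, with a hypothetical file name, would be:

    import pandas as pd

    df = pd.read_csv("births.csv")       # placeholder for the IDHS birth record
    df = df.drop_duplicates()            # equivalent of dplyr::distinct

    for col in df.columns:
        if df[col].dtype == "object":
            # Categorical variables: encode by factor levels of the feature.
            df[col] = df[col].astype("category").cat.codes
        else:
            # Numeric variables: replace NA values with the column mean.
            df[col] = df[col].fillna(df[col].mean())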
Fig. 2. Plot of Boruta algorithm result for HAZ (Color figure online)
The attributes found common to all four anthropometric parameters are 'Had diarrhoea recently', 'Taking iron pills, sprinkles or syrup', 'Assistance: DAI/Traditional Birth Attendant', 'Place received most vaccinations', and 'Women's age in years', whereas those which are unique are 'Daughters elsewhere', 'Delivery by caesarean section', and 'Haemoglobin level (g/dl - 1 decimal)'. The common attributes have a higher probability of being the main causes of malnutrition than the unique ones.
After finding the 10 most important features for all four anthropometric parameters (HAZ, WAZ, WHZ and BMI), the next step is to rank the factors mainly responsible for malnutrition. For this purpose, the WEKA tool is used: for attribute selection, the Principal Component evaluator is used with the Ranker algorithm to obtain a ranking of features. The former performs Principal Component Analysis (PCA) on the data for dimensionality reduction, choosing enough eigenvectors to account for a given percentage of the variance in the original data, whereas the latter ranks the principal component features.
PCA reduces the dimensionality of a dataset with many interrelated variables while retaining as much of the variation in the data as possible. The dataset then contains variables arranged in order of decreasing variance; the first few, which are ordered and uncorrelated, are called the principal components. PCA finds the correlation pattern among the original variables and substitutes a new component in place of each group of correlated attributes (Table 5).
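As an illustration of this PCA-plus-ranking step (performed here with WEKA), an equivalent scikit-learn sketch follows; the 95% variance threshold is an assumption for illustration, and X denotes the numeric feature matrix prepared earlier:

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=0.95)        # keep enough eigenvectors for 95% variance
    components = pca.fit_transform(X_std)

    # Rank components by the fraction of total variance they explain.
    for rank, ratio in enumerate(pca.explained_variance_ratio_, start=1):
        print(f"PC{rank}: {ratio:.4f} of total variance")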
Table 5. Ranking of features of all anthropometric parameters determined using WEKA tool
5 Discussion
Using Principal Component Analysis along with Ranker algorithm, features were
selected and ranked based on their variance across all the four anthropometric param-
eters. The features having highest variation are identified as the most impactful fea-
tures explaining malnutrition. Three highest ranking features of HAZ are had diarrhoea
recently, taking iron pills, sprinkles or syrup and did eat any solid, semi-solid or soft
food yesterday. Similarly, for WAZ type of mosquito bed nets child slept under IPC,
drank from bottle with nipple yesterday and had diarrhoea recently are the most varying
features of respective anthropometric parameter.
From the analysis of all four anthropometric parameters (HAZ, WAZ, WHZ and BMI), it was identified that six features are common across all parameters: 'Had diarrhoea recently', 'Taking iron pills, sprinkles or syrup', 'Assistance of Dai', 'Received most vaccinations', 'Women's age' and 'Type of mosquito bed nets child slept under (IPC)'. These variables can be used for improving or making new policies. Three features are identified across three parameters: 'Assistance from ANM', 'Drank from bottle with nipple' and 'Number of children under five in the household'. Besides these, four features are found across two parameters, and only three features are unique to any single parameter; BMI did not have any unique feature.
Considering only the features present in all four, or at least three, of the parameters, different characteristics explaining malnutrition can be identified. These characteristics can be classified into broadly three categories. The first category relates to 'availability and awareness' of safe drinking water and iron pills. It is an irony of the country that even after seventy-plus years of independence a large section of society is deprived of safe drinking water; the problem is becoming even more acute in urban areas, especially in the slums, apart from remote terrains. It is not surprising that iron deficiency among pregnant and lactating mothers is one of the most important causes of malnutrition among mothers and children. An effective outreach to these mothers, in rural as well as urban areas, would help address such deficiencies. Easy availability and accessibility of iron-rich foods like fish and drumstick would go a long way in addressing iron deficiency among mothers and children, as would investment in developing iron-rich food products that can be easily stored and made available at a very low price. The second category is 'access to the services of ANMs and trained Dais'. Investment in public health and public health services, especially in creating a large pool of trained paramedical workers, would be effective in addressing malnutrition not only among children but also among mothers, as well as the general well-being of all those in need of healthcare services. Similarly, access to free vaccinations in the vicinity is an important feature in addressing malnutrition. The third category relates to 'awareness and behavioural and social change'. Early marriage among women and insufficient gaps between births are identified as two important features of malnutrition. Investing in education, creating awareness through local governance structures, and increasing household income levels have been identified in the literature as important factors that can have a positive impact on behavioural as well as social change. These would require persistent investment and action on the ground.
6 Conclusion
References
1. The Economic Times. [Link]
india-has-one-third-of-worlds-stunted-children-global-nutrition-report/articleshow/668
[Link]?from=mdr. Accessed 02 June 2020
2. Malnutrition in India. [Link]
3. Anilkumar, N.A., Gupta, D., Khare, S., Gopalkrishna, D. M., Jyotishi, A.: Characteristics and
causes of malnutrition across Indian states: a cluster analysis based on Indian demographic
and health survey data. In: 2017 International Conference on Advances in Computing, Com-
munications and Informatics (ICACCI), Udupi, pp. 2115–2123 (2017). [Link]
1109/ICACCI.2017.8126158.
4. The DHS Program: Demographic and Health Surveys. [Link] Accessed 23
June 2020
5. Ariyadasa, S.N., Munasinghe, L.K., Senanayake, S.H.D., Fernando, N.A.S.: Data mining
approach to minimize child malnutrition in developing countries. In: International Conference
on Advances in ICT for Emerging Regions (ICTer2012), Colombo, p. 225 (2012). [Link]
org/10.1109/ICTer.2012.6423030.
6. Markos, Z., Agide, F.: Predicting under nutrition status of under-five children using data
mining techniques: the case of 2011 ethiopian demographic and health survey. J. Health Med.
Inf. 5, 152 (2014). [Link]
7. Arun, C., Khare, S., Gupta, D., Jyotishi, A.: Influence of health service infrastructure on
the infant mortality rate: an econometric analysis of indian states. In: Nagabhushan, T.N.,
Aradhya, V.N.M., Jagadeesh, P., Shukla, S., Chayadevi, M.L. (eds.) CCIP 2017. CCIS, vol.
801, pp. 81–92. Springer, Singapore (2018). [Link]
8. Jeyaseelan, L., Lakshman, M.: Risk factors for malnutrition in South Indian children. J.
Biosoc. Sci. 29(1), 93–100 (1997). [Link]
9. Fenske, N., Kneib, T., Hothorn, T.: Identifying risk factors for severe childhood malnutrition
by boosting additive quantile regression. J. Am. Stat. Assoc. 106, 494–510 (2011). https://
[Link]/10.1198/jasa.2011.ap09272
10. Mosley, W.H., Chen, L.C.: An analytical framework for the study of child survival in develop-
ing countries. Populat. Dev. Rev. 10, 25–45 (1984). [Link]/stable/2807954. Accessed
14 Aug 2020
11. Hanmer, L., Lensink, R., White, H.: Infant and child mortality in developing countries:
analysing the data for robust determinants. J. Dev. Stud. 40(1), 101–118 (2003). https://
[Link]/10.1080/00220380412331293687
12. Ana, I.M., Udota, H.I.J., Udoakah, Y.N.: Malting technology in the development of safe
and sustainable complementary composite food from cereals and legumes. In: IEEE Global
Humanitarian Technology Conference (GHTC 2014), San Jose, CA, pp. 140–144 (2014).
[Link]
13. Van Soesbergen, A., Nilsen, K., Burgess, N., Szabo, S., Matthews, Z.: Food and Nutrition
Security Trends and Challenges in the Ganges Brahmaputra Meghna (GBM) Delta. Elem Sci
Anth. 5, 56 (2017). [Link]
14. Abbasi, S., Mahmood, H., Zaman, A., Farooq, B., Malik, A., et al.: Indicators of malnutrition
in under 5 Pakistani children: a DHS data secondary analysis. J. Med. Res. Health Educ. 2(3),
12 (2018)
15. Narendranath, S., Khare, S., Gupta, D., Jyotishi, A.: Characteristics of ‘escaping’ and ‘falling into’ poverty in India: an analysis of IHDS panel data using machine learning approach. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, pp. 1391–1397 (2018). [Link]
16. Suriyakala, V., Deepika, M.G., Amalendu, J., Deepa, G.: Factors affecting infant mortality
rate in india: an analysis of Indian states. In: Corchado Rodriguez, J., Mitra, S., Thampi,
S., El-Alfy, E.S. (eds.) Intelligent Systems Technologies and Applications 2016, ISTA 2016.
Advances in Intelligent Systems and Computing, vol. 530, pp. 707–719. Springer, Cham
(2016). [Link]
17. Shyam Sundar, K., Khare, S., Gupta, D., Jyotishi, A.: Analysis of fuel consumption character-
istics: insights from the Indian human development survey using machine learning techniques.
In: Raju, K.S., Govardhan, A., Rani, B.P., Sridevi, R., Murty, M.R. (eds.) Proceedings of the
Third International Conference on Computational Intelligence and Informatics. AISC, vol.
1090, pp. 349–359. Springer, Singapore (2020). [Link]
7_30
18. Khare, S., Kavyashree, S., Gupta, D., Jyotishi, A.: Investigation of nutritional status of children
based on machine learning techniques using Indian demographic and health survey data. Proc.
Comput. Sci. 115, 338–349 (2017). [Link]
19. Khare, S., Gupta, D., Prabhavathi, K., Deepika, M.G., Jyotishi, A.: Health and nutritional
status of children: survey, challenges and directions. In: Nagabhushan, T.N., Aradhya, V.N.M.,
Jagadeesh, P., Shukla, S., M. L., C. (eds.) CCIP 2017. CCIS, vol. 801, pp. 93–104. Springer,
Singapore (2018). [Link]
20. Sharma, V., Sharma, V., Khan, A., et al.: Malnutrition, health and the role of machine learning
in clinical setting. Front Nutr. 7, 44 (2020). [Link]
21. Giabbanelli, P., Adams, J.: Identifying small groups of foods that can predict achievement
of key dietary recommendations. Data mining of the UK national diet and nutrition survey.
Public Health Nutr. 1, 1–9 (2016). [Link]
22. Hearty, A., Gibney, M.: Analysis of meal patterns with the use of supervised data mining
techniques - Artificial neural networks and decision trees. Am. J. Clin. Nutr. 88, 1632–1642
(2009). [Link]
Surveillance System for Monitoring Social
Distance
1 Introduction
Surveillance devices such as drones are among the most remarkable and valuable advancements of technology [16]. Science and technology are developing day by day.
People violating social-distancing norms can thereby be identified and the public made aware. By reducing human effort and helping ensure everyone follows the social-distancing concept, this work is quite promising. The rest of the paper is arranged as follows: in Sect. 2 the literature on related work is presented; in Sect. 3 the methodology for person detection and distance monitoring is described; in Sect. 4 the performance evaluation is done; and the final section gives the conclusion and future work.
2 Literature Review
One line of work improves pedestrian detection to run at real-time speed without losing recognition accuracy; the model is compared with RetinaNet-ResNet50 and HAL-RetinaNet.
In [4], the authors demonstrate three collaboration-based deep learning applications for tracking and detecting objects and estimating distance. Object detection is a well-developed method, high in accuracy, subject to the real-time constraints of identifying objects. They applied the SSD and YOLO v3 algorithms to object detection to determine which is more suitable; YOLO v3 performed better than SSD. The MonoDepth algorithm provides a disparity map as output. They verified the approach on different datasets such as Cityscapes and KITTI, as well as in a vehicle in real time on a city-centre traffic road, and confirmed the results on the Tramway Rouen railway dataset. The new method presented is based on SSD and analyzes the behavior of objects such as pedestrians or vehicles: with the modified SSD algorithm, after identifying an object they assess its future state by including its direction of motion, e.g., pedestrians willing or not willing to cross the road. SSD and YOLO v3 are used for detecting and tracking objects; a large and appropriate dataset is very important to optimize their performance, and changing the detection classes does not yield a significant improvement.
Paper [1] provides a comparison, based on time, accuracy and parameter values, of different algorithms for identifying and localizing objects with different input image dimensions. The authors identify a new method to improve the speed of single-stage models without losing accuracy. The final results show that Tiny YOLO v3 improves detection speed while maintaining accurate results.
Speed and accuracy are important parameters for evaluating pedestrian detection performance. Performance varies across situations because experiments do not always take place under the same conditions [11]; many parameters can differ from one experiment to another. Analysing the characteristics of object detection, there are three popular models: Single Shot Detection (SSD) [13], YOLO [19] and F-RCNN [9]. F-RCNN is highly accurate compared to SSD and YOLO v3, but it is slow: if high-quality accuracy needs to be achieved, F-RCNN is the best solution, but it is not the fastest approach. If speed is important, then YOLO v3 is the best approach. If we want good accuracy and good speed at the same time, SSD is a good solution. YOLO v4 is also a good solution, as it is a fast approach and its accuracy is similar to Faster R-CNN [22].
3 Methodology
The two major steps involved in monitoring social distancing are pedestrian detection and distance calculation. We take the video input from the surveillance system and convert it into image sequences. The model runs detection on these images, and then the distance calculation is done. Once we know which people are breaking the social-distancing threshold, we mark them with a red bounding box, as shown in Fig. 1. This section is divided into two sub-sections: in the first, we discuss the models used for pedestrian detection, and in the second, the approaches used to calculate the distance between each pedestrian.
Fig. 1. The flow chart for the work flow of monitoring social distancing
We trained and evaluated the models on the class “person” from the COCO dataset, with 66,808 samples. Further, we calculated various parameters such as the confusion matrix, mAP, and the detection time for each model; these parameters help in differentiating and selecting among the various pre-selected models. The hyperparameters used in training SSD+MobileNet (SSD+M), SSD+Inception (SSD+I), Faster RCNN (FRCNN), RFCN, YOLOv4 and Tiny YOLOv3 are listed in Table 1.
Once we had the detections, the next step was to calculate the distance between each pair of people, for which we used two approaches. In the first, the distance is computed with the Euclidean distance formula

$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \qquad (1)$$

where p(p_x, p_y) and q(q_x, q_y) are the bottom centre points of the two bounding boxes, and the unit of the distance is the pixel. For conversion from pixels to centimetres (cm), we need to know how many pixels in the horizontal and vertical directions equate to a certain ground-truth distance. For that, we selected four points as shown in Fig. 2: Points 1 and 2 constitute a horizontal distance of 490 cm and Points 3 and 4 a vertical distance of 330 cm (the ground-truth distances were obtained with the help of Google Maps [6]). We then calculated the distance (in pixels) between Points 1 and 2, and similarly between Points 3 and 4, using the Euclidean distance formula in the given input frame; call these distances "distance_w" and "distance_h" respectively. Now, for two image coordinates P(P_x, P_y) and Q(Q_x, Q_y), the distance between them in centimetres is calculated as follows:
Fig. 2. Point 1 to Point 4 used for conversion of units from pixel to cm.
$$\mathrm{Height} = \frac{P_y - Q_y}{\mathrm{distance\_h}} \times 330 \qquad (2)$$

$$\mathrm{Width} = \frac{P_x - Q_x}{\mathrm{distance\_w}} \times 490 \qquad (3)$$

$$\mathrm{Distance} = \sqrt{\mathrm{Height}^2 + \mathrm{Width}^2} \qquad (4)$$
The distance calculated here is in centimetres. The next step was to mark the people not following the social-distancing protocols. As the social-distancing guidelines suggest a minimum of 6 ft (182 cm) between two people, we set a threshold distance of 182 cm, and whoever fell below this threshold was marked with a red bounding box. We also drew red lines between those people to show with whom they were in proximity.
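A small Python sketch of Eqs. (1)-(4) and the thresholding step; the pixel coordinates and calibration values in the example call are made-up numbers for illustration:

    import math

    def distance_cm(p, q, distance_w, distance_h):
        """Convert the pixel offset between two bottom-centre points into cm,
        using the Fig. 2 calibration (490 cm horizontal / 330 cm vertical)."""
        width = (p[0] - q[0]) / distance_w * 490.0   # horizontal span in cm, Eq. (3)
        height = (p[1] - q[1]) / distance_h * 330.0  # vertical span in cm, Eq. (2)
        return math.hypot(width, height)             # Eq. (4)

    # distance_w / distance_h are the pixel distances between Points 1-2 and 3-4.
    if distance_cm((410, 520), (350, 480), distance_w=640, distance_h=420) < 182:
        print("social-distancing violation")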
Conversion from Perspective View to Bird's Eye View. The video input from CCTV, a drone or any other surveillance system can be from any random perspective, so we needed a method to calculate distance as accurately as possible in any view. In the method we came up with, we convert the perspective view into a bird's-eye view. The surveillance system has monocular vision, and it is not possible to calculate the distance between the detected persons directly from that view. By selecting four points from the image (the Region of Interest) we can map the entire image to a bird's-eye-view perspective using a perspective transformation matrix.
For this conversion and mapping, we need to calculate the transformation matrix (M_sd). Assume we have a point P(x, y) in the perspective-view image and want to locate the same point in the bird's-eye view, say Q(u, v), as shown in Fig. 3.
Fig. 3. The selected points from the perspective image and the four corners of the
rectangle where we map the bird’s eye view.
If we have the transformation matrix, each selected point correspondence (x_k, y_k) → (u_k, v_k) satisfies

$$u_k = \frac{a x_k + b y_k + c}{g x_k + h y_k + 1}, \qquad v_k = \frac{d x_k + e y_k + f}{g x_k + h y_k + 1},$$

which rearranges to

$$u_k = a x_k + b y_k + c - g x_k u_k - h y_k u_k, \qquad v_k = d x_k + e y_k + f - g x_k v_k - h y_k v_k.$$

For k = 0, 1, 2, 3 this can be written as the 8 × 8 system:

$$\begin{bmatrix}
x_0 & y_0 & 1 & 0 & 0 & 0 & -x_0 u_0 & -y_0 u_0\\
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 u_1 & -y_1 u_1\\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 u_2 & -y_2 u_2\\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3 u_3 & -y_3 u_3\\
0 & 0 & 0 & x_0 & y_0 & 1 & -x_0 v_0 & -y_0 v_0\\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 v_1 & -y_1 v_1\\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 v_2 & -y_2 v_2\\
0 & 0 & 0 & x_3 & y_3 & 1 & -x_3 v_3 & -y_3 v_3
\end{bmatrix}
\begin{bmatrix} a\\ b\\ c\\ d\\ e\\ f\\ g\\ h \end{bmatrix}
=
\begin{bmatrix} u_0\\ u_1\\ u_2\\ u_3\\ v_0\\ v_1\\ v_2\\ v_3 \end{bmatrix}$$
Solving this system, we can calculate all the elements from "a" to "h" and obtain the transformation matrix (M_sd). Once we have the transformation matrix, we can apply it to the perspective image to map the entire image into the bird's-eye view. After this, we follow the same steps as in the previous approach: calculate the bottom point of each bounding box, convert those points into the bird's-eye view (Points 1 to 4 in Fig. 2 are also converted), and then calculate the distance between them in pixels. We then convert the distance from pixels to centimetres as in the previous method. Using the distances between the bounding boxes, we marked the people who were within 182 cm (6 ft) of each other.
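In practice this system can be solved with OpenCV rather than by hand; a minimal sketch follows, in which the four source points are placeholders for the operator-selected ROI corners:

    import cv2
    import numpy as np

    # Four ROI corners in the perspective view and their bird's-eye targets.
    src = np.float32([[300, 250], [900, 250], [1150, 700], [100, 700]])
    dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

    # getPerspectiveTransform solves the 8x8 system above for a..h.
    M_sd = cv2.getPerspectiveTransform(src, dst)

    # Bottom-centre points of detections, shape (N, 1, 2) as OpenCV expects.
    feet = np.float32([[[500, 650]], [[620, 660]]])
    feet_bev = cv2.perspectiveTransform(feet, M_sd)

    d_px = np.linalg.norm(feet_bev[0, 0] - feet_bev[1, 0])  # distance in pixels

The pixel distance d_px is then converted to centimetres exactly as in the first approach, using the transformed calibration points.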
4 Results
Evaluation of both subtasks of the proposed work, along with their inferences, is discussed in this section. The models were trained on Google Colab.
For evaluating the selected models we used the Oxford Town Centre dataset [2], which contains video from a CCTV camera located at Cornmarket and Market St., Oxford, England. We calculated the Mean Average Precision (mAP) and the prediction time taken per image (in seconds). The following graphs were obtained after the evaluation.
Fig. 4. Prediction time taken per image of all the selected models.
Fig. 6. Error in calculating the distance vs the ground truth distance for both proposed
approaches
From Fig. 4 and Fig. 5, we observe that YOLOv4 and RFCN had the highest mAP but took a long time for detection, while Tiny YOLO and SSD+MobileNet took the least time but had low mAP. For the distance calculation, it is clear from Fig. 6, and from the mean scores of both approaches, that the bird's-eye-view approach is better than the Euclidean distance approach. Also, from Fig. 6, it can be observed that as the distance increases, the error also increases for the Euclidean distance approach, but the same does not happen for the other approach. Figure 7 shows the output of both proposed approaches of this work.
References
1. Adarsh, P., Rathi, P., Kumar, M.: Yolo V3-Tiny: object detection and recog-
nition using one stage improved model. In: 2020 6th International Conference
on Advanced Computing and Communication Systems (ICACCS), pp. 687–694
(2020). [Link]
2. Benfold, B., Reid, I.: Stable multi-target tracking in real-time surveillance video.
In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2011, pp. 3457–3464. IEEE Computer Society (2011). https://
[Link]/10.1109/CVPR.2011.5995667
3. Cabreira, T., Brisolara, L., Ferreira Jr., P.: Survey on coverage path planning
with unmanned aerial vehicles. Drones 3, 4 (2019). [Link]
drones3010004
4. Chen, Z., Khemmar, R., Decoux, B., Atahouet, A., Ertaud, J.: Real time object
detection, tracking, and distance and motion estimation based on deep learning:
application to smart mobility. In: 2019 Eighth International Conference on Emerg-
ing Security Technologies (EST), pp. 1–6 (2019). [Link]
2019.8806222
5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference
on Computer Vision (ICCV), December 2015. [Link]
46493-0 22
6. Google: Google maps. [Link]
15z
7. Guo, Q., Li, Y., Wang, D.: Pedestrian detection in unmanned aerial vehicle scene.
In: Lu, H. (ed.) ISAIR 2018. SCI, vol. 810, pp. 273–278. Springer, Cham (2020).
[Link] 26
8. Gupta, S., Sangeeta, R., Mishra, R., Singal, G., Badal, T., Garg, D.: Corridor
segmentation for automatic robot navigation in indoor environment using edge
devices. Comput. Netw. 178, 107374 (2020). [Link]
2020.107374
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), June 2016
10. Ministry of Health & Family Welfare, Government of India: Social distancing
measure in view of spread of Covid-19 disease. [Link]
[Link]
11. Kushwaha, R., Singal, G., Nain, N.: A texture feature based approach for person
verification using footprint bio-metric. Artif. Intell. Rev. 1–31 (2020). [Link]
org/10.1007/s10462-020-09887-6
12. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D.,
Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp.
740–755. Springer, Cham (2014). [Link] 48
13. Liu, W., et al.: SSD: single shot multibox detector. arXiv abs/1512.02325 (2016)
14. Lygouras, E., Santavas, N., Taitzoglou, A., Tarchanidis, K., Mitropoulos, A.,
Gasteratos, A.: Unsupervised human detection with an embedded vision system
on a fully autonomous UAV for search and rescue operations. Sensors 19(16), 3542
(2019). [Link]
15. Nguyen, D.T., Li, W., Ogunbona, P.: Human detection from images and videos: a
survey. Pattern Recogn. 51 (2015). [Link]
16. Pareek, B., Gupta, P., Singal, G., Kushwaha, R.: Person identification using
autonomous drone through resource constraint devices. In: 2019 Sixth International
Conference on Internet of Things: Systems, Management and Security (IOTSMS),
pp. 124–129 (2019). [Link]
17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified,
real-time object detection. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2016
18. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
19. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement (2018)
20. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time
object detection with region proposal networks. In: Cortes, C., Lawrence,
N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neu-
ral Information Processing Systems 28, pp. 91–99. Curran Associates, Inc.
(2015). [Link]
[Link]
21. Vaddi, S., Kumar, C., Jannesari, A.: Efficient object detection model for real-time
UAV applications. CoRR abs/1906.00786 (2019). [Link]
22. Veeramsetty, V., Singal, G., Badal, T.: Coinnet: platform independent application
to recognize Indian currency notes using deep learning techniques. Multimed. Tools
Appl. 79(31), 22569–22594 (2020). [Link]
Consumer Emotional State Evaluation Using
EEG Based Emotion Recognition Using Deep
Learning Approach
Abstract. The standard marketing methodologies (e.g., newspaper ads and TV commercials) are not effective in selling products, as they do not excite customers to buy any specific item. These methods of advertising try to ascertain consumers' attitudes towards a product, which might not represent actual behaviour; consequently, customer behaviour is misunderstood by advertisers and start-ups, because stated mindsets do not represent the buying behaviour of consumers. Previous studies reflect a lack of experimental work on the classification and prediction of consumer emotional states. In this research, a strategy has been adopted to discover customer emotional states by simply thinking about attributes, using the power spectral density of EEG-based signals. The results revealed that, although the deep neural network (DNN) achieved higher recall, greater precision and better accuracy compared with the support vector machine (SVM) and k-nearest neighbour (k-NN), the random forest (RF) reached values similar to those of deep learning on precisely the same dataset.
1 Introduction
As an emerging field, neuromarketing relates the affective and cognitive sides of consumer conduct by utilizing neuroscience. Neuromarketing is a rising field because individuals do not perceive what occurs unconsciously in their minds; furthermore, it has been demonstrated that individuals are not clear about their own emotions or objectives (Hammou 2013). The use of promotional and advertising media, such as surveys and interviews about needs and purchasing intentions, can bias the conclusions drawn (Telpaz et al. 2015; Barros et al. 2016). Similarly, oral communication about emotions can prompt biased decisions. It is hard to extract the emotions of consumers directly through their decisions, because of ethical issues associated with product purchase and delivery (Telpaz et al. 2015). These factors accentuate a contradiction between shoppers' stated opinions during usability assessments and their genuine assessments, sentiments and observations regarding a product's use (Barros et al. 2016). Hence, neuromarketing needs methodological choices that can assess consumer responses objectively.
BCIs help user brains and computer systems communicate effectively. They do not involve physiological interference, and record signals through system-generated commands (Ramadan et al. 2015). BCIs have applications in advertising, medical science, smart cities and neuroscience (Abdulkader 2015; Hwang 2013). BCI systems are built to aid the user, and are very challenging in the field of advertising and marketing.
The most promising neuroimaging device in neuromarketing is the brain-computer interface (BCI). It permits frameworks and consumers to communicate proficiently. To run and execute commands, a BCI does not require the use of any device or muscle involvement (Abdulkader 2015). Instead, to control a framework, a BCI uses the consumers' voluntarily generated brain activity as signals, which offers the ability to associate or communicate with the nearby marketplace.
For this purpose, various neuromarketing techniques that record brain activity are used: EEG, fNIRS, fMRI, MEG, SST, PET and TMS (Krampe 2018) are all used for recording brain activity (Ohme 2009; Hakim 2019; Harris 2018). Among all these techniques, EEG has the best temporal resolution, as shown in Table 1.
Studies of BCI-based neuroimaging techniques indicate that three of them - MEG, SST and EEG - have good scope for marketing research, but due to the limitations of MEG and SST, these are not used for the current research. Because of the extensive advantages and varied features of EEG over SST and MEG (Cherubino et al. 2019), EEG is being used here. EEG is the BCI used to perform repeated, ongoing assessment of the brain's interactions with high temporal resolution (Ramadan 2017; Ramadan et al. 2015). Thus, in the experimental study, EEG was adopted as the input signal for the BCI framework.
Previous studies on EEG-based recognition systems for emotion-state recognition are presented in this section. Emotional states can be defined as presentations of human behavioural state for the recognition of pleasantness, which can help in making decisions (Ramsøy 2012).
The research by (Hwang 2013; Lotte and Bougrain 2018) stated that more than one classifier, and classifier combinations, are needed to detect and define feature sets and improve performance. The authors (Chew et al. 2016) stated that aesthetic presentation has a great effect on buying decisions; they used 3D EEG signals to record frequency bands and achieved good accuracy on a liking scale. Extensive studies and reviews of the various deep learning and machine learning algorithms used to study consumer preferences are provided by (Lotte and Bougrain 2018; Teo 2018a, b). (Hakim 2019) provided an in-depth study of the classifiers and prediction algorithms used for understanding consumer preference states, and stated that the SVM, with an approximate accuracy of 60%, is the best classifier so far for preference prediction; as per that study, LDA and SVM are the most studied classification algorithms. The authors studied various preferences using EEG-based systems (Hakim 2019). Previous work (Lin 2018; Alvino 2018; Yadava 2017; Teo 2018a, b; Boksem 2015) has done much on EEG-based emotional state detection.
With the emergence of neural networks and deep learning, EEG-based studies have become popular for emotional state prediction. A deep neural network (DNN) is a type of artificial neural network with several layers between the input and output layers; the most basic type is the multi-layer perceptron (MLP). The author (Loke 2017) suggested the use of DNNs for object identification. The authors (Teo 2018a, b; Roy 2019) have explored various deep learning frameworks, and (Teo 2017; 2018a, b) proposed methods for EEG-based preference classification compared against various machine learning classifiers.
Considerable use of EEG has thus been made in emotional state prediction to understand consumer preferences.
1. Signal acquisition for the selected device: the EEG-based DEAP dataset has been taken and pre-processed to remove artifacts. The EEG headset used in the DEAP dataset contains 32 channels. Table 2 provides the mapping of the 14+2-channel Emotiv EEG headset used for the current research work; the channels in bold are those of the Emotiv headset mapped onto the DEAP dataset.
Table 2. Channel positioning according to the 32-channel EEG headset used in the DEAP dataset
Channel 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
No
Channel Fp1 AF3 F3 F7 FC5 FC1 C3 T7 CP5 CP1 P3 P7 PO3 O1 Oz Pz
Channel 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
No
Channel Fp2 AF4 Fz F4 F8 FC6 FC2 Cz C4 T8 CP6 CP2 P4 P8 PO4 O2
2. Pre-processing: Independent Component Analysis (ICA) is used, and the pre-processed data is fed into the SVM, k-NN, RF and DNN classifiers.
3. Feature extraction and selection for the chosen device using the power spectral density (PSD) function; the most relevant optimal features are identified.
4. Classification of the features using machine learning and DNN algorithms: k-NN, SVM, RF, DNN.
5. Prediction of emotional states by comparing the accuracy of each classifier (a sketch of this comparison follows the list).
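As a sketch of steps 4 and 5, the three classical classifiers can be compared with the Scikit-Learn toolbox mentioned below; X and y are assumed to be the PSD feature matrix and emotional-state labels produced in step 3:

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    models = {
        "SVM": SVC(kernel="rbf"),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=100),
    }
    for name, model in models.items():
        # 5-fold cross-validated accuracy for each classifier.
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")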
The selection and extraction of features are the basic techniques used to evaluate the performance of EEG recognition systems. The current study aims to detect two emotional states, pleasant and unpleasant, using classification algorithms on signals from the Emotiv EEG headset (Hwang 2013; Pham and Tran 2012). An off-line analysis was conducted to evaluate the emotional detection and classification. The DEAP dataset was used to explore the performance and computation of deep learning classification techniques, which might effectively replicate the emotional states of consumers for advertisement prediction. To carry out the experiment, the authors compare individual recordings across the k-nearest neighbour (k-NN), random forest (RF), support vector machine (SVM) and deep neural network classifiers for evaluating the emotional states. The Scikit-Learn toolbox was used as the machine learning suite; Python was used for EEG artifact cleaning, filtering and pre-processing; the Python library MNE, an open-source suite for exploring, visualizing and analyzing cognitive and physiological signals, was employed; in addition, the Keras library was used on top of TensorFlow for building and training the deep model. This section discusses the methodology in conjunction with the experimentation details of the proposed work for the detection of emotional states. It starts with an outline of the available pre-recorded dataset of emotional states; then the feature extraction is described, and eventually the DNN classification model is illustrated.
DEAP Dataset
The DEAP dataset is used for the experimentation (Koelstra 2013). It can be divided into two parts:
i. Ratings of valence, arousal and dominance for 120 one-minute music videos.
ii. Participant ratings and recordings of physiological signals and face video for 32 volunteers while they watched 40 of the music videos from the first part; frontal face video was also recorded for 22 of the participants, as shown in Fig. 3.
Data Pre-processing
The experimental trial was done on the already pre-processed EEG recordings from
the DEAP database. The EEG recordings were down-sampled from 512 Hz to 128 Hz, a
band-pass frequency filter between 4.0 Hz and 45.0 Hz was applied, and the EOG
artifacts were eliminated from the epochs using the dimensionality reduction method of
independent component analysis (ICA) (Hadjidimitriou 2012). ICA decomposes the
recordings into independent components, so that noisy components and very
high-dimensional data can be eliminated by selecting a subset (Nezamfar 2016). The
useful features are retained and outliers are removed during the experimentation.
Additionally, this reduces the computational cost of the subsequent steps. Thus, only
the following channels were kept: Fz, F3, F4, AF3, and AF4 (Aldayel et al. 2020).
Figure 4 shows the emotional state engagement in various regions of the brain, with
the channel and frequency band involved.
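A minimal MNE-based sketch of this pre-processing chain is shown below; the file name and the use of Fp1 as an EOG proxy are assumptions for illustration, not details from the study.

```python
# Hypothetical sketch: band-pass filtering, down-sampling and ICA-based
# EOG artifact removal with MNE.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_fif("eeg_recording.fif", preload=True)  # placeholder file
raw.filter(l_freq=4.0, h_freq=45.0)   # 4.0-45.0 Hz band-pass, as in the paper
raw.resample(128)                     # down-sample from 512 Hz to 128 Hz

ica = ICA(n_components=20, random_state=42)
ica.fit(raw)
eog_idx, _ = ica.find_bads_eog(raw, ch_name="Fp1")  # Fp1 as an EOG proxy
ica.exclude = eog_idx
clean = ica.apply(raw.copy())         # recordings with EOG components removed
```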
Fig. 4. EEG emotion detection with the frequency band involved for each EEG channel (Teo 2017)
Valence - In the current study, valence was chosen as the measure of emotional state.
Likert scale values ranging from 0-9 were used to record it. The value of activation
for EEG frontal asymmetry (E) is directly proportional to valence (V), E ∝ V
(Koelstra 2013). The DEAP dataset also reflects this association between valence (V)
and the EEG frequency bands (α, β, γ, θ) (Koelstra 2011), shown in Fig. 6. An increase
in valence leads to an increase in the intensity of the frequency bands, which is in
accordance with the results of a comparable study (Al-Nafjan et al. 2017a, b). The
liking rating from the DEAP dataset is not used in the current experiment (Al-Nafjan
et al. 2017a, b).
(Figure: valence feature pipeline: Channel Selection → PSD → Valence Calculation → Normalization.)
The calculation of valence (V) is done using the equations below (Eqs. 1-4)
(Al-Nafjan et al. 2017a, b):
4 DNN Classification
Deep Neural Networks are frameworks that contain layers of "neurons" connected to
each other. Each layer of neurons applies a different linear transformation to the
input information (Roy 2019; Aldayel et al. 2020); each layer's transformation is then
passed through a nonlinear function, and a cost function over the outputs is minimized
to obtain the optimal outcome. The DNN operates in a single forward direction, from
the input through the hidden layers (if present) to the output neurons. The output of
the neurons in the previous layer acts as the activation input of each neuron in the
next layer.
For the current research, the DNN model uses one input layer, three hidden layers,
one batch normalization layer and one output layer. The hyperparameters used for DNN
model training are the learning rate (adapted by the Adam optimizer), the number of
epochs, the ReLU activation function, and a softmax activation function at the output.
The trained DNN model's accuracy was compared with that of the classification
algorithms SVM, RF and k-NN. The DNN classifier's block structure is displayed in Fig. 7.
The first step was to normalize the extracted features. There are two commonly
used normalization techniques - min-max normalization and z-score. For the current
experiment, min-max normalization (Eq. 5) was used before feeding the features to the
DNN classifier. This is the most common way to normalize data: the data are normalized
into the range 0 to 1, with the minimum (min) value mapped to 0, the maximum (max)
value mapped to 1, and all other values (v) lying in between.
v_normalized = (v − min)/(max − min) (5)
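Equation (5) corresponds to standard min-max scaling; a minimal sketch with scikit-learn:

```python
# Min-max normalization of Eq. (5): column-wise, min -> 0 and max -> 1.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])  # toy features
X_norm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
```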
The Adam gradient descent optimization strategy was used to train the DNN classifier.
It is one of the most effective strategies, using an iterative algorithm to
minimize a function towards a local or global minimum. For the current experiment,
three loss functions are used: binary cross-entropy, categorical cross-entropy, and
hinge loss. Training was stopped when the model started to over-fit, with a threshold
of 0.0001. With acceptable defaults and proper setup, the initial learning rate was
0.001. The network consists of an input layer, three hidden layers with 1700, 1200 and
700 neurons respectively, and an output layer. As per the experimental requirements,
the input layer takes 2125 features per sample, with the size decreasing to roughly
75% after each hidden-layer operation. The output dimension corresponds to the number
of target emotional states. The network was tested on test data comprising roughly
20% of the DEAP data samples. The three hidden layers use rectified linear unit (ReLU)
activations. The DNN output is obtained through a softmax activation function with a
binary cross-entropy loss function; the softmax function normalizes the outputs of the
final hidden layer.
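The described architecture can be sketched in Keras roughly as follows; the exact placement of the batch normalization layer is our reading of the text, not the authors' released code.

```python
# Sketch of the described DNN: 2125 inputs, batch normalization, three
# ReLU hidden layers of 1700/1200/700 units, softmax output, Adam
# optimizer with binary cross-entropy (the pairing reported in the text).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization

model = Sequential([
    Input(shape=(2125,)),             # 2125 extracted features per sample
    BatchNormalization(),
    Dense(1700, activation="relu"),
    Dense(1200, activation="relu"),
    Dense(700, activation="relu"),
    Dense(2, activation="softmax"),   # pleasant vs. unpleasant
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```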
Fig. 8. Accuracy prediction using cross-validation methods for various classifiers on the proposed dataset
Cross-validation techniques first train the model on a benchmark data set and then
perform evaluation. For the current study three methods, namely holdout, k-fold cross
validation, and leave-one-out cross validation (LOOCV), were used.
Holdout - The holdout (train/test splitting) method performs training on 50% of the
data set and uses the remaining 50% as the test dataset. The results of DNN and k-NN
are better than those of random forest (RF) and support vector machine (SVM), as shown
in Table 3.
LOOCV - This method performs training on the whole dataset leaving aside only one
data point, and then iterates over each data point. It is a very time-consuming
process. Random forest (RF) outperformed the other classifiers, as shown in Table 4.
K-fold Cross Validation - The data set is split into a number of subsets known as
folds. The model uses k − 1 folds for training and 1 fold for testing, iterating over
every fold. Random forest and k-nearest neighbor performed better than the other
classifiers, as shown in Table 5.
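The three validation schemes can be reproduced with scikit-learn as in the hedged sketch below; the random forest and the placeholder data stand in for any of the studied classifiers and the extracted features.

```python
# Comparing holdout, k-fold and LOOCV on a generic classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X = np.random.randn(100, 5)              # placeholder features
y = np.random.randint(0, 2, size=100)    # placeholder labels
clf = RandomForestClassifier(random_state=0)

# Holdout: 50/50 train-test split, as described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: train on k-1 folds, test on the remaining fold, iterate.
kfold_acc = cross_val_score(clf, X, y, cv=KFold(n_splits=5)).mean()

# LOOCV: one sample held out per iteration (slow on large datasets).
loocv_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```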
Since the best results among all the validation techniques were achieved using
holdout validation, this technique was chosen when applying the loss functions and
tuning the hyperparameters of the DNN framework. Figure 8 presents the summary of all
the cross-validation techniques.
Figure 9 presents the accuracy results for SVM, RF, k-NN, and DNN using three
different loss functions: binary cross-entropy, categorical cross-entropy,
and hinge loss. Categorical cross-entropy loss combines a softmax activation function
with cross-entropy loss and is used for multi-class classification. Binary
cross-entropy combines a sigmoid activation with cross-entropy loss and is used for
multi-label classification. Hinge loss is used for max-margin classification and shows
the best results with SVM classifiers.
Fig. 9. Classifier accuracy for loss functions for emotional state classification using hold-out
validation
The results demonstrate that the k-NN classifier reached its highest accuracy of 88%
with k = 1 under cross validation. Although an accuracy of 92% was achieved for RF,
the DNN also reached an accuracy of 91%, its highest, with the hinge loss function,
as compared to the other studied algorithms.
Further, this research compared the present work with prior DNN models for EEG-based
emotion recognition. Table 6 summarizes the results of the comparison with existing
research. Two studies that used PSD feature extraction on the DEAP dataset and worked
on detecting arousal were considered. The comparison shows that the proposed method
gives comparable results when applied with a DNN model.
6 Conclusion
In this paper, a DNN-based learning model has been proposed to detect consumer
emotional states from EEG signals. The complete work is carried out on the proposed
dataset and the DEAP dataset. Initially, two types of features are extracted from the
EEG, i.e. PSD and valence; there are around 2125 features per EEG activity. Various
evaluation parameters of accuracy are used to test classifier performance, with
validation using LOOCV, holdout and k-fold techniques. In total, four different
classifiers were used (DNN, SVM, k-NN, RF); the proposed method achieves accuracies of
around 70%, 93%, 91%, 84% and 87% across the validation settings, the highest in
contrast with all other methods. The results were compared with existing research. The
major limitation of the research is that it covers only two emotional states and
evaluates a smaller number of parameters. In future, the DNN method can be explored
further on certain parameters to improve the achieved accuracy for emotional state
evaluation; the exploration of an enhanced DNN model for the valence-arousal model is
proposed as future work. The authors recommend applying the DNN model to multiple
modalities in order to understand consumer emotional states.
References
Abdulkader, S.N.: Brain computer interfacing: applications and challenges. Egypt. Inform. J.
16(2), 213–230 (2015)
Agarwal, S.: Neuromarketing and consumer neuroscience: current understanding and the way
forward. Decision 457–462 (2015)
Aldayel, M., Ykhlef, M., Al-Nafjan, A.: Deep learning for EEG-based preference classification
in neuromarketing. Appl. Sci. 10(4), 1525–1548 (2020)
Al-Nafjan, A., Hosny, M., Al-Ohali, Y., Al-Wabil, A.: Review and classification of emotion recog-
nition based on EEG brain-computer interface system research: a systematic review. Appl. Sci.
7(12), 1239 (2017a)
Al-Nafjan, A., Hosny, M., Al-Wabil, A., Al-Ohali, Y.: Classification of human emotions from
electroencephalogram (EEG) signal using deep neural network. Int. J. Adv. Comput. Sci. Appl.
8(9), 419–425 (2017b)
Alvino, L.C.: Towards a better understanding of consumer behavior: marginal utility as a parameter
in neuromarketing research. Int. J. Mark. Stud. 10(1), 90–106 (2018)
Ameera, A., Saidatul, A., Ibrahim, Z.: Analysis of EEG spectrum bands using power spectral
density for pleasure and displeasure state. In: IOP Conference Series: Materials Science and
Engineering, vol. 557, no. 1, pp. 012030–01203. IOP Publishing (2019)
Barros, R.Q., et al.: Analysis of product use by means of eye tracking and EEG: a study of
neuroergonomics. In: Marcus, A. (ed.) DUXU 2016. LNCS, vol. 9747, pp. 539–548. Springer,
Cham (2016). [Link]
Boksem, M.A.: Brain responses to movie trailers predict individual preferences for movies and
their population-wide commercial success. J. Mark. Res. 52(4), 482–492 (2015)
Chew, L., Teo, J., Mountstephens, J.: Aesthetic preference recognition of 3D shapes using EEG.
Cogn. Neurodyn. 10(2), 165–173 (2016)
Cherubino, P., et al.: Consumer behaviour through the eyes of neurophysiological measures:
state-of-the-art and future trends. Comput. Intell. Neurosci. 1–41 (2019)
Qin, X., Zheng, Y., Chen, B.: Extract EEG features by combining power spectral density and
correntropy spectral density. In: 2019 Chinese Automation Congress (CAC), pp. 2455–2459.
IEEE (2019)
Teo, J.H.: Deep learning for EEG-based preference classification. In: AIP Conference Proceedings,
vol. 1891, p. 020141. AIP Publishing LLC (2017)
Teo, J.C.: Classification of affective states via EEG and deep learning. Int. J. Adv. Comput. Sci.
Appl. 9(5), 132–142 (2018a)
Teo, J.H.: Preference classification using electroencephalography (EEG) and deep learning. J.
Telecommun. Electron. Comput. Eng. (JTEC) 10(1–11), 87–91 (2018b)
Yadava, M.K.: Analysis of EEG signals and its application to neuromarketing. Multimed. Tools
Appl. 76(18), 19087–19111 (2017)
Covid Prediction from Chest X-Rays Using
Transfer Learning
Abstract. The novel coronavirus causes a rapidly spreading viral infection that has
become a pandemic with destructive effects on public health and the global economy.
Early detection and early quarantine of Covid-19 patients therefore have a significant
impact on curtailing its transmission rate, but this has become a major challenge due
to a critical shortage of test kits. A promising method that addresses this challenge
by predicting Covid-19 from patient X-rays using transfer learning, a deep learning
technique, is proposed in this paper. For this we used a dataset consisting of chest
X-rays of Covid-19-infected and normal people. We used the VGG, GoogleNet-Inception
v1, ResNet and CheXNet transfer learning models, which reduce the training time of a
neural network model, and achieve accuracies of 99.49%, 99%, 98.63% and 99.93%
respectively in predicting Covid-19 from the X-ray of a suspected patient.
1 Introduction
In December 2019, the disease caused by the most recently discovered coronavirus was
first reported in Wuhan, China as a special case of pneumonia; it was later named
Covid-19 and the virus SARS-CoV-2. It infects the respiratory system, ranging from a
mild common cold to severe illness such as MERS (Middle East Respiratory Syndrome) and
SARS (Severe Acute Respiratory Syndrome). The clinical features of the disease include
fever, sore throat, headache, cough and mild respiratory symptoms, even leading to
pneumonia. The most accurate test techniques currently used for Covid diagnosis are
the Polymerase Chain Reaction and Reverse Transcription PCR [1] tests: laboratory
methods that interact with RNA and DNA to determine the volume of a specific RNA using
fluorescence, performed on collected samples of nasal secretions. Due to the limited
availability of these test kits, early detection cannot be carried out, which in turn
increases the spread of the disease. Covid became a pandemic affecting the whole
world, and at the time of writing there is no vaccine available to cure it. In this
epidemic situation, Artificial Intelligence techniques are becoming vital. Some
applications in the Covid pandemic scenario that show promising uses of AI are: AI
techniques embedded in cameras to identify infected patients with recent travel
history using facial recognition, robot services to deliver food and medicines to
Covid-infected patients, and drones to disinfect surfaces in public places [2]. A lot
of research is being carried out in using AI for drug discovery for a Covid cure and
for vaccine development by learning about the RNA of the virus. Machine learning
techniques are being used in medical disease diagnosis to reduce manual intervention
and enable automatic diagnosis, and are becoming a supportive tool for clinicians.
Deep learning techniques have been successfully applied to problems such as carcinoma
detection, carcinoma classification, and respiratory disorder detection from chest
X-ray pictures. Since Covid-19 is growing at an exponential rate day by day, the use
of deep learning techniques for Covid prediction may help to increase the testing rate
and thereby reduce the transmission rate. Covid affects the lining of the respiratory
tract and shows preliminary symptoms like pneumonia; as doctors frequently use X-rays
to test for pneumonia, identification of Covid using X-rays can play a significant
role in corona testing. So, to increase the Covid testing rate, an X-ray test can be
used as a preliminary test, and if the AI prediction is positive the patient can
undergo a medical test. In this paper, transfer learning is used: a machine learning
technique that reserves knowledge gained in solving one problem and applies it to
other similar problems. A dataset consisting of X-rays of normal and Covid-19 patients
is used for the transfer learning. A deep neural network is built and implemented with
the VGG, Inception v1, ResNet and CheXNet models. We chose these models because they
are CNNs trained on the large ImageNet dataset and are widely used in image
classification and disease prediction; we selected CheXNet in particular as it was
trained on chest X-rays. Section 2 briefs some recent work on Covid prediction using
AI and Deep Learning (DL) techniques. Section 3 presents our methodology for Covid
prediction using transfer learning. Section 4 discusses the results obtained in
applying the four models VGG, GoogleNet-Inception v1, ResNet and CheXNet. In Sect. 5
the use of transfer learning in Covid prediction is concluded.
2 Related Work
Many researchers have been working rigorously on possibilities of early Covid-19
detection since early 2020. Both laboratory clinical testing methods and
computer-aided testing using Artificial Intelligence, machine learning and deep
learning (DL) approaches are being developed. As this disease does not show symptoms
immediately, early identification of infected persons has become difficult. Artificial
Intelligence can aid easy and rapid X-ray diagnosis using deep learning. The idea of
using X-ray images for predicting Covid-19 came from the deep neural network
approaches used for pneumonia detection in chest X-rays [3]. A deep learning based
automated diagnosis system for X-ray mammograms was proposed by Al-Antari et al. [4];
they used YOLO, a regional deep learning approach, which resulted in a detection
accuracy of 98.96%.
Bar et al. detected chest pathology in chest radiographs using deep learning
models [5], observing the feasibility of detecting pathology through non-medical
pre-training with DL approaches. Later, many works for detection of lung
abnormalities, tuberculosis patterns and vessel extraction using X-rays were developed
[6, 7]. In recent days, extensive work is being carried out in using deep
learning and AI techniques for Covid-19 prediction. More accurate and faster Covid-19
detection can be achieved by AI and DL using chest X-rays. There have been numerous
previous works applying transfer learning models based on Convolutional Neural
Networks to different disease predictions. Apostolopoulos et al. took X-ray image
datasets of patients with common bacterial pneumonia, Covid-19-positive patients, and
normal cases from public repositories for the automated detection of the coronavirus
disease [8]. They used transfer learning models based on CNNs for detecting varied
abnormalities in small medical image datasets, yielding outstanding results of
approximately 96% accuracy. Their promising results show that deep learning techniques
can extract important biomarkers associated with Covid-19 from X-ray images. Three
CNN-based models, ResNet50, InceptionV3 and Inception-ResNetV2, were applied for the
detection of coronavirus using chest X-ray radiographs by Narin, Ceren and Pamuk [9];
they obtained 98%, 97% and 87% accuracies respectively. Salman et al. used a
Convolutional Neural Network for Covid-19 detection [10, 12]. As an alternative to
building a model from scratch, transfer learning helps in reducing the computational
overhead and has proved to be a promising technique in many deep learning
applications. In this paper we propose Covid-19 prediction from X-rays using transfer
learning models with better accuracy.
3 Methodology
Transfer learning is one of the advanced deep learning approaches, in which a model
trained on a similar problem is used as the starting point for another, similar
problem. It decreases the training time of a neural network and eases the optimization
and tuning of hyperparameters. One or more layers from the trained model are reused in
the new model: some are frozen, and fine-tuning is applied to the output layers that
are to be customized. Figure 2 shows the working of the transfer learning technique.
Popular models for this approach are VGG (VGG16 or VGG19), GoogleNet (Inception v1 or
v3), Residual Network (ResNet50) and CheXNet; Keras provides access to a number of
such pretrained models. In transfer learning, Convolutional Neural Networks (CNNs) are
first trained on large datasets and then employed to process new sets of images and
extract features. In
medical tasks, we use transfer learning to exploit CNNs with these models and evaluate
algorithms for image classification and object detection. In this section we discuss
the architectures of the four models VGG, GoogleNet, ResNet and CheXNet, and explore
their applicability, using pretrained weights as part of transfer learning, for
Covid-19 prediction.
GoogleNet: GoogleNet is a 22-layer CNN with almost 12× fewer parameters than AlexNet.
It uses strategies like 1 × 1 convolutions and average pooling that enable it to
create a deeper design. Figure 4 depicts the architecture of the GoogleNet model.
ResNet: ResNet, short for Residual Network, was proposed in 2015 as part of the
ImageNet challenge for computer vision [15]. It won that challenge and is widely used
for computer vision projects. Using the transfer learning concept, its 150-plus layers
can be trained successfully. Residual (skip) connections allow the last two or three
layers containing non-linearities to be skipped, which helps avoid the vanishing
gradient problem. Its architecture is shown in Fig. 5.
CheXNet: CheXNet consists of 121 CNN layers. It produces a heatmap of localized areas
indicating the regions of the image affected by the disease, along with the prediction
probability [16]. It was developed to predict pneumonia from chest X-rays and trained
on the ChestX-ray14 dataset, which contains X-ray images for 14 different pathologies.
Its architecture is shown below in Fig. 6. The test set labels were annotated by four
reputed radiologists and used to evaluate the performance of the model against the
radiologists' annotations.
3.3 Implementation
In our paper, we applied transfer learning models for Covid-19 prediction from
X-rays. The deep architectures helped in predicting the results with good accuracies
for the VGG, GoogleNet, ResNet and CheXNet models. Figure 7 describes our proposed
implementation model.
Algorithm
Step 1: Load the dataset, which contains 1824 images with 2 classes for binary
classification.
Step 2: Resize the images in the dataset to 224 × 224, as the transfer learning CNN
models take input images of size 224 × 224.
Step 3: Select pretrained layers from VGG/GoogleNet/ResNet/CheXNet and modify the
output layers. The number of layers selected and the modifications carried out are
described below for each model individually.
Step 4: Fine-tune the hyperparameters of each model individually; the tuned parameters
are indicated in Table 1.
Step 5: Evaluate the performance of each model using the metrics explained in the next
subsection.
Step 6: Pass a new X-ray image to detect whether the patient has Covid-19 or not.
The VGG16 model contains 16 weight layers, including convolutional, pooling, fully
connected and final dense layers. The final layer predicts 1000 output classes, of
which we consider 2 classes for our model; this is done by freezing the convolutional
layers and constructing 2 new fully connected layers (sketched below). GoogleNet
contains 22 layers with average pooling; all are trained, and in the output layer
2 softmax units are used for prediction. The ResNet model has 50 layers with an output
layer capable of classifying 1000 objects; we froze the final dense layer and added
2 layers for predicting our 2 classes, Covid-19 and non-Covid. Finally, for CheXNet we
considered the DenseNet121 network with pretrained weights and froze the convolutional
weights; then new fully connected sigmoid layers were constructed and appended on top
of DenseNet.
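A hedged Keras sketch of the VGG16 variant follows; the size of the new fully connected layer is an illustrative assumption, while the frozen convolutional base, the 224 × 224 input and the binary output follow the text.

```python
# Sketch: VGG16 transfer learning for Covid / non-Covid classification.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False                     # freeze pretrained conv layers

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)       # new fully connected layer (assumed size)
out = Dense(1, activation="sigmoid")(x)    # binary Covid / non-Covid output

model = Model(base.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```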
The hyperparameters are tuned in order to obtain a highly performing model. We tuned
around 5 different parameters, comprising the learning rate, the choice of optimizer
and loss function, the number of epochs, the batch size, the test size, the rotation
range, etc. The learning rate is passed as a parameter to the optimizer function.
Working with different optimizers and loss functions did not affect the behaviour of
the model much, so we used Adam as the optimizer and binary cross-entropy as the loss
function throughout. The batch size is the number of samples propagated through the
network at once, and the number of epochs is the number of passes of the model over
the training data. Dropout is a regularization technique in which random neurons are
ignored during training; increasing dropout generally increased accuracy in our
experiments. Table 1 shows the values of the hyperparameters that we used for the
different transfer learning models.
In a model, values like accuracy, precision, recall and F1-score are considered as
performance metrics, since they are used to evaluate model performance. Accuracy is
the ratio of correctly classified predictions to the total number of predictions.
Precision is the ratio of true positives to the predicted positives. Recall is the
ratio of true positives to the total actual positives. F1-score is the weighted
average (harmonic mean) of precision and recall. Precision and recall are useful when
the dataset is imbalanced, i.e. when there is a large difference between the number of
X-rays with Covid and without Covid.
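For reference, in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), these metrics take the standard forms:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}\\
\text{Precision} &= \frac{TP}{TP + FP}\\
\text{Recall}    &= \frac{TP}{TP + FN}\\
\text{F1} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```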
4.4 Result
The VGG16 model achieved a good accuracy of 99.49%, with sensitivity and specificity
of 1.0000 and 0.9890 respectively; the GoogleNet-Inception v1 model achieved 99%
accuracy with sensitivity and specificity of 1.0000 and 0.9834; the ResNet50 model
achieved 98.63% accuracy with sensitivity and specificity of 1.0000 and 0.9725; and
the CheXNet model achieved 99.93% accuracy with sensitivity and specificity of 1.000
and 1.000, for the Covid and normal classes in Covid prediction. The performance
measures of all these models are shown below in Table 2.
Fig. 9. Graph showing variations in different measures for GoogleNet inceptionV1 model.
Owing to the strong performance of these proposed models, they can be incorporated
in real-time testing, which in turn increases the testing rate. The graphs in Figs. 8,
9, 10 and 11 show the variation in the different measures of accuracy and loss for the
VGG, GoogleNet, ResNet and CheXNet models.
Fig. 10. Graph showing variations in different measures for ResNet50 model.
Fig. 11. Graph showing variations in different measures for CheXNet model.
In this paper, we used a transfer learning approach to train CNNs on X-ray images to
predict the novel Covid-19 disease. This idea can be implemented in real-time
Covid-19 detection scenarios with further development, and it can also be implemented using
other transfer learning methods. Our work can be further extended by training with
larger datasets so that still better accuracy can be achieved, even for unseen data.
It can also be enhanced to predict the survival chances of Covid-affected patients.
The work carried out in this paper offers potential insight and will contribute
towards further research on COVID-19 prediction.
References
1. World Health Organization: Laboratory testing for coronavirus disease 2019 (Covid-19) in
suspected human cases: interim guidance, 2 March 2020. World Health Organization, World
Health Organization (2020)
2. Ruiz Estrada, M.A.: The uses of drones in case of massive epidemics contagious diseases
relief humanitarian aid: Wuhan-Covid-19 crisis. SSRN Electron. J. (2020).
https://doi.org/10.2139/ssrn.3546547
3. Wu, H., et al.: Predict pneumonia with chest X-ray images based on convolutional deep neural
learning networks. J. Intell. Fuzzy Syst. Preprint (2020)
4. Al-Antari, M.A., et al.: A fully integrated computer-aided diagnosis system for digital X-ray
mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inf.
117, 44–54 (2018)
5. Bar, Y., et al.: Chest pathology detection using deep learning with non-medical training. In:
2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI). IEEE (2015)
6. Bhandary, A., et al.: Deep-learning framework to detect lung abnormality-a study with chest
X-ray and lung CT scan images. Pattern Recogn. Lett. 129, 271–278 (2020)
7. Nasr-Esfahani, E., et al.: Vessel extraction in X-ray angiograms using deep learning. In: 2016
38th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC). IEEE (2016)
8. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from x-ray images
utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 6, 1
(2020)
9. Narin, A., Ceren, K., Pamuk, Z.: Automatic detection of coronavirus disease (Covid-19)
using x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849
(2020)
10. Salman, F.M., et al.: Covid-19 detection using artificial intelligence (2020)
11. [Link]
844e-4e8246751706
12. Ozturk, T., et al.: Automated detection of Covid-19 cases using deep neural networks with
X-ray images. Comput. Biol. Med. 121, 103792 (2020)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego,
7–9 May 2015 (2015)
14. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition (2015)
15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
16. Rajpurkar, P., et al.: CheXNet: radiologist-level pneumonia detection on chest X-rays
with deep learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition (2017)
Machine Learning Based Prediction of H1N1
and Seasonal Flu Vaccination
Abstract. The H1N1 flu that came into existence in 2009 had a great impact on the
lives of people around the world. It was a life-threatening season for hundreds of
people, mainly those below 65 years old, which eventually made the World Health
Organization (WHO) declare it the greatest pandemic in more than 40 years. To find out
the vaccination status of the population, the National 2009 H1N1 Flu Survey (NHFS) was
conducted in the U.S. In this paper, the data from this survey is used to develop a
model that predicts how likely people were to get the H1N1 and seasonal flu vaccines.
For this purpose, various Machine Learning (ML) and Artificial Neural Network (ANN)
models are used to determine the probability of a person receiving the H1N1 and
seasonal flu vaccines.
1 Introduction
The H1N1 or swine flu virus first emerged in the spring of 2009 in Mexico, then in
the United States, and quickly spread across the globe. A distinctive combination of
influenza genes, not previously identified in humans or animals, was discovered in
this novel H1N1 virus [1]. This contagious novel virus had a very powerful impact on
the whole world, spreading like a forest fire; as a result, on June 11, 2009 the World
Health Organization (WHO) declared that a pandemic of 2009 H1N1 flu, or swine flu, had
begun [2]. The effects of this novel H1N1 virus were more severe on people below the
age of 65: there was significantly high pediatric mortality and a higher rate of
hospitalizations for young adults and children [3].
According to the Centers for Disease Control and Prevention (CDC), the first and
foremost step in protecting oneself from this virus is a yearly flu vaccination [4].
Various factors affect the ability of the vaccination to protect the vaccinated
person, such as age, an individual's health perceptions, and the similarity or "match"
between the virus structure in the vaccine and the virus structure affecting the
community [5]. Various social media platforms and broadcasting networks were employed
during the pandemic; for example, Twitter was used to track levels of disease activity
and public concern [6], and social media played an important role in assessing
sentiment towards vaccination and its implications for disease dynamics and control
[7]. Notable among these efforts is the phone survey conducted in the U.S., where
respondents were asked whether they had received the H1N1 and seasonal flu vaccines,
in conjunction with questions about themselves.
In the present study, we used the data obtained from the National 2009 H1N1 Flu
Survey (NHFS) to predict how likely people were to get the H1N1 and seasonal flu
vaccines. The NHFS data is used to estimate the probability of a person receiving the
H1N1 and seasonal flu vaccines using various Machine Learning (ML) and Artificial
Neural Network (ANN) models, and the performance of these techniques is also
discussed. In Sect. 2 the literature review is presented. Section 3 discusses the data
resource, i.e. the NHFS survey, and Sect. 4 presents the methodology used. Section 5
discusses the results obtained, and Sects. 6 and 7 present the conclusion and future
research scope.
2 Literature Review
Mabrouk et al. [8], in "A chaotic study on pandemic and classical (H1N1) using EIIP
sequence indicators", state that methods such as moment invariants, correlation
dimension, and largest Lyapunov exponent, which were used to detect H1N1, indicated
the differences between the pandemic and classical influenza viruses. Chinh et al.
[9], in "A possible mutation that enables the H1N1 influenza A virus to escape
antibody recognition", explained methods such as phylogenetic analysis of pandemic
strains and molecular docking for the predicted epitopes. Huang et al. [10], in
"Aptamer-modified CNTFET (Carbon NanoTube Field Effect Transistor) biosensor for
detecting H1N1 virus in a droplet", suggested an aptamer-modified carbon nanotube
combination yielding a CNTFET that acts as a biosensor for the detection of the H1N1
virus in a droplet.
M. S. Ünlü [11], in "Optical interference for multiplexed, label-free, and dynamic
biosensing: Protein, DNA and single virus detection," described an interferometric
reflectance imaging sensor that can be used for label-free, high-throughput,
high-sensitivity and dynamic detection of the H1N1 virus and nanoparticles. Kamikawa
et al. [12], in "Pandemic influenza detection by electrically active magnetic
nanoparticles and surface plasmon resonance", indicated that detection consists of
several processes, such as nanoparticle synthesis, glycans, polyaniline, and sensor
modification, as means to find H1N1 by nanoparticles and resonance. Jerald et al.
[13], in "Influenza virus vaccine efficacy based on conserved sequence alignment,"
discussed the vital strain sequences from the National Center for Biotechnology
Information (NCBI) and sequence alignment, which help vaccine efficiency for influenza.
Chrysostomou et al. [14], in "Signal-processing-based bioinformatics approach for the
identification of influenza A virus subtypes in Neuraminidase genes", discussed
methods for identification of the influenza virus such as neuraminidase genes, signal
processing, F-score and Support Vector Machines (SVM). Wiriyachaiporn et al. [15], in
"Rapid influenza A antigen detection using carbon nanostrings as the label for lateral
flow immunochromatographic assay," presented the preparation of allantoic fluid
infected with influenza A virus, the conjugation of carbon nanostrings (CNS) to
antibody, and the evaluation of CBNS-MAb using a Lateral Flow Immunoassay (LFIA),
and Ma et al. [16], in "An integrated passive microfluidic device for rapid detection
of influenza A (H1N1) virus by reverse transcription loop-mediated isothermal
amplification (RT-LAMP)", demonstrated the loading of virus and magnetic beads and
discussed virus capture, collection of virus-magnetic-bead complexes, removal of
excess waste, virus particle lysis, the RT-LAMP reaction and the coloration steps to
detect the H1N1 virus.
Nieto-Chaupis [17], in "Face To Face with Next Flu Pandemic with a Wiener-Series-Based
Machine Learning: Fast Decisions to Tackle Rapid Spread", explained the Wiener model
used to increase optimization, efficiency and performance in finding the spread of
seasonal flu. Stalder et al. [18], in "Tracking the flu pandemic by monitoring the
social web", related how retrieving data from Twitter and official health reports
provides inexpensive and timely information about the epidemic, and Motoyama et al.
[19], in "Predicting Flu Trends using Twitter Data", demonstrated the use of the SNEFT
model and Twitter crawler methods for predicting flu from Twitter data.
Wong et al. [20], in "Diagnosis of Response Behavioural Patterns Towards the Risk of
Pandemic Flu Influenza A (H1N1) of Urban Community Based on Rasch Measurement Model",
presented the source of data and the data analysis methodology used for response
behavioral patterns towards H1N1; Bao et al. [21], in "Influenza-A Circulation in
Vietnam through data analysis of Hemagglutinin entries", provided NCBI influenza virus
resource datasets (2001–2012) used for the analysis of the influenza virus; and Hu et
al. [22], in "Computational Study of Interdependence Between Hemagglutinin and
Neuraminidase of Pandemic 2009 H1N1", explained sequence data and the informational
spectrum model.
3 Data Resources
Data is one of the most important and vital aspects of any research study. The
National Flu Survey (NFS) has been conducted since the 2010–11 influenza season [23].
The data for our study is obtained from the National 2009 H1N1 Flu Survey (NHFS),
which was carried out for the Centers for Disease Control and Prevention (CDC). The
main aim of the survey was to monitor and evaluate H1N1 flu vaccination efforts among
adults and children. The survey was conducted through telephones, Twitter and various
other electronic media in all 50 states. It consisted of a national random-digit-
dialed telephone survey, based on a rolling weekly sample of landline and cellular
telephone numbers, contacted to identify residential households. Various questions
were asked about flu-related behaviors, opinions about the flu vaccine's safety and
effectiveness, and medical history such as recent respiratory illness and pneumococcal
vaccination status, apart from the major question about H1N1 and seasonal flu
vaccination status. The NHFS data was collected from October 2009 to May 2010. This
data was obtained to get a fair idea of people's knowledge of the effectiveness and
safety of flu vaccines and to learn why some people refrained from getting vaccinated
against the H1N1 flu and seasonal flu. A huge amount of data was gathered through this
survey, which is commonly used for analysis and research purposes; the data also
measures the number of children and adults nationwide who received vaccinations.
4 Methodology
A methodology is proposed to determine the probability that a person will receive
the H1N1 and seasonal flu vaccinations based on many parameters. The data obtained
from the National 2009 H1N1 Flu Survey (NHFS) contains 3 CSV files, namely the
training set features, the training set labels, and the test set features. The data
has been obtained from over 53,000 people, of which around 26,000 observations have
been considered for the training set and the rest for the testing set.
We have considered various methodologies and compared different Machine Learning and
Artificial Neural Network models to predict the probability. Machine learning
algorithms such as multiple linear regression, support vector regression, random
forest regression and logistic regression were used. The system architecture of the
machine learning model is presented in Fig. 1.
An Artificial Neural Network (ANN) with different optimizers, such as Adam, RMSprop
and SGD, was used to predict the probability on the test set features. The system
architecture of the ANN is presented in Fig. 2.
The training set features and training set labels have been split into a training set
(80%) and a testing set (20%) using train_test_split from sklearn.model_selection,
which splits a dataset into training and testing sets, as sketched below.
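A minimal sketch, with placeholder data standing in for the NHFS CSV contents:

```python
# 80/20 train-test split with scikit-learn, as described above.
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.randn(26000, 35)          # placeholder NHFS features
labels = np.random.randint(0, 2, size=26000)   # placeholder vaccine labels

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)
```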
Hyperparameter tuning is done to find the most optimal parameters for the model, on
which the model gives the best results. We used hyperparameter tuning methods such as
GridSearchCV and RandomSearchCV for our machine learning models to obtain better
results, and the k-fold cross-validation method to tune the hyperparameters of the
Artificial Neural Network.
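A hedged GridSearchCV sketch is shown below; the grid values are illustrative except C = 5, which the results section reports as the optimum for logistic regression.

```python
# Hypothetical hyperparameter sweep for the logistic regression model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X_train = np.random.randn(200, 35)             # placeholder features
y_train = np.random.randint(0, 2, size=200)    # placeholder labels

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.1, 1, 5, 10]},
                    cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)     # e.g. {'C': 5}
```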
Table 1. Results for H1N1 flu and Seasonal flu vaccination prediction
Fig. 3. ROC AUC Curve using Support Vector Machine: RBF Kernel for (a) h1n1 vaccine and
(b) seasonal flu vaccine
Fig. 4. ROC AUC Curve using Random Forest Regressor for (a) h1n1 vaccine and (b) seasonal
flu vaccine
146 S. Inampudi et al.
Fig. 5. ROC AUC Curve using Logistic Regression for (a) h1n1 vaccine and (b) seasonal flu
vaccine
Fig. 6. ROC AUC Curve using Artificial Neural Network for (a) h1n1 vaccine and (b) seasonal
flu vaccine
random forest regression involve training the model with 10 n_estimators, and the
optimal parameter for logistic regression is C = 5. All these results are presented in
tabulated form in Tables 2 and 3. It is observed that the results of seasonal flu
vaccination prediction have not been up to the mark using hyperparameter tuning; they
were better predicted using the default models.
Table 2. Results with Hyperparameter tuning (GridSearchCV) for H1N1 flu vaccination
prediction
Table 3. Results with Hyperparameter tuning (RandomSearchCV) for H1N1 flu vaccination
prediction
The k-fold method is used to fine-tune the hyperparameters of the Artificial Neural
Network. The obtained results are more or less equal to those of the default method,
but a marginal increase in performance is noted, as can be clearly seen in Table 4.
The most optimal parameters obtained for the ANN with the k-fold method are a 1st
hidden layer with selu activation and 60 units, a 2nd hidden layer with selu
activation and 3 units, and an output layer with sigmoid activation and 2 units, as
sketched below. All the results are presented in Table 4.
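A Keras sketch of this tuned network follows; the input width is a placeholder, while the layer sizes and activations match the description above.

```python
# Sketch of the tuned ANN: selu hidden layers of 60 and 3 units, and a
# 2-unit sigmoid output (one unit per vaccine label).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

ann = Sequential([
    Input(shape=(35,)),               # placeholder feature count
    Dense(60, activation="selu"),     # 1st hidden layer
    Dense(3, activation="selu"),      # 2nd hidden layer
    Dense(2, activation="sigmoid"),   # H1N1 and seasonal-flu probabilities
])
ann.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```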
Table 4. Results with Hyperparameter tuning (kfold method) for H1N1 flu and Seasonal
vaccination prediction
6 Conclusion
In this paper, prediction of H1N1 and seasonal flu vaccination is carried out using
the data source provided by the National 2009 H1N1 Flu Survey (NHFS) for the Centers
for Disease Control and Prevention (CDC). Various ML and ANN models are used for the
prediction of H1N1 and seasonal flu vaccination. The model studies are improved using
several techniques, such as taking care of missing data, encoding categorical data,
hyperparameter tuning, and splitting the dataset for training and testing purposes.
The results obtained from the various models are compared and evaluated. They indicate
that H1N1 vaccination is predicted best by the SVM model with RBF kernel, with
hyperparameter tuning using GridSearchCV, yielding an accuracy of 83.97%, while
seasonal flu vaccination is predicted best by the Artificial Neural Network, which
yielded an accuracy of 86.10%.
Acknowledgement. The work presented in this paper was carried out as part of an
internship project at Bennett University, Noida, India. The success of an internship
project demanding such technical proficiency requires patience and the extensive
support of guides. We take this opportunity to express our gratitude to those who have
been instrumental in the successful completion of this work. Big thanks to Dr.
Madhushi Verma for all the encouragement, timely details and guidelines given to our
team. We would also like to thank Dr. Deepak Garg, HOD of the Computer Science
Engineering Department, and Dr. Sudhir Chandra, Dean, School of Engineering & Applied
Sciences, Bennett University, for giving us the opportunity and the environment to
learn and grow.
References
1. CDC. [Link] Accessed 21
June 2020
2. CDC. [Link] Accessed 22 May 2020
3. CDC. [Link] Accessed 22 May
2020
4. CDC. [Link] Accessed 22 May 2020
22. Hu, W.: Molecular features of highly pathogenic Avian and Human H5N1 Influenza a viruses
in Asia. Comput. Mol. Biosci. 2(2), 45–59 (2012)
23. Smith, P.J., Wood, D., Darden, P.M.: Highlights of historical events leading to national surveil-
lance of vaccination coverage in the United States. Public Health Rep. 126(Suppl 2), 3–12
(2011)
24. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12,
2825–2830 (2012)
25. Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn
project (2013)
26. Dubosson, F., Bromuri, S., Schumacher, M.: A python framework for exhaustive machine
learning algorithms and features evaluations. In: Proceedings of IEEE 30th International
Conference on Advanced Information Networking and Applications (AINA), Crans-Montana,
pp. 987–993 (2016)
27. Virtanen, P., Gommers, R., Oliphant, T.E., et al.: SciPy 1.0: fundamental algorithms for
scientific computing in Python. Nat Methods 17, 261–272 (2020)
A Model for Heart Disease Prediction
Using Feature Selection with Deep
Learning
1 Introduction
In the research field, heart disease has raised a lot of serious concerns, and
the significant challenge is accurate detection or prediction at an early stage to
minimize the risk of death. According to the World Health Organization (WHO) [1],
medical professionals have predicted only 67% of heart diseases correctly, and
hence there exists a vast research scope in the area of heart disease prediction.
A lot of technicalities and parameters are involved in predicting the disease
accurately. Various machine learning and deep learning algorithms and several
optimization techniques have been used to predict heart-disease risk. All these
techniques mainly focus on higher accuracy, which shows the importance of correct
prediction of heart disease; it would help doctors predict heart disease at an early
stage and save millions of lives [2]. For temporally sequenced data, recurrent neural
network (RNN) models are best suited, and several variants have been designed for
sequenced features. In various sequence-based tasks such as language modelling and
handwriting recognition, long short-term memory (LSTM) networks have been used and
show impressive performance [3,4]. For better performance, evolutionary algorithms
(EAs) are used for model optimization. Population-based, self-adaptive evolutionary
algorithms are very useful for feature selection and extraction. EAs used in recent
years include ant colony optimization (ACO), particle swarm optimization (PSO) and the
genetic algorithm (GA). The GA is a stochastic method for optimization and global
search, which is very helpful in handling medical data. Possible solutions are
obtained from a set of individuals using the GA, which creates better-quality
solutions for global search and optimization based on the mutation, crossover and
selection operators. The PSO, a meta-heuristic algorithm, is considered in this study
due to its simplicity and ease of implementation: it uses only a few parameters and
requires little parameter tuning. The PSO combines an information-sharing mechanism
with population-based search, and hence has been extended from single- to
multi-objective optimization. It has been successfully applied in the medical field
for heart disease prediction, with good recorded performance [5,6]. The main
contributions of this study are:
– Improving the accuracy of heart disease prediction in humans using efficient
feature selection and classification methods.
– Implementing the GA and PSO for efficient feature selection.
– Implementing the RNN and LSTM to improve the accuracy of heart disease prediction.
– Comparing the performance of the proposed method with existing techniques in terms
of accuracy, precision, recall and f-measure.
The rest of the paper is organized as follows: Sect. 2 includes a literature survey
of existing research related to feature selection techniques and deep learning
classification methods for heart disease prediction. Section 3 discusses the
implementation of the GA and PSO optimization algorithms and the LSTM and RNN
classification. Section 4 discusses the performance analysis of the proposed work. The
conclusion is presented in Sect. 5.
2 Related Work
In [7], researchers showed that optimization algorithms are necessary for efficient
heart disease diagnosis and for estimating disease level. They used a
support vector machine (SVM) and generated an optimization function using the GA for
the selection of the most substantial features to identify heart disease; the dataset
used in this research is the Cleveland heart disease database. G. T. Reddy et al.
developed an adaptive GA with fuzzy logic design (AGAFL) in [8], which helps medical
practitioners diagnose heart disease at an early stage; using the hybrid AGAFL
classifier, heart disease was predicted, with the research performed on the UCI heart
disease datasets. For diagnosing coronary artery disease, the angiography method is
usually used, but it has significant side effects and is highly expensive. Alternative
modalities using data mining and machine learning techniques are stated in [9], where
coronary artery disease diagnosis is done with more accurate hybrid techniques that
increase the performance of a neural network, using a GA to enhance its accuracy; the
Z-Alizadeh Sani dataset is used, yielding above 90% specificity, accuracy and
sensitivity.
In [10], researchers proposed a trained recurrent fuzzy neural network (RFNN) based
on the GA for heart disease prediction, using the UCI Cleveland heart disease dataset;
an accuracy of 97.78% resulted on the testing set. For large health-diagnosis data,
machine learning has been considered an effective support system, although analyzing
this kind of massive data generally requires more execution time and resources. An
effective feature selection algorithm was proposed by J. Vijayashree et al. in [11] to
identify the significant features which contribute most to disease diagnosis; the PSO
was implemented to identify the best solution in reduced time. The PSO also removes
redundant and irrelevant features in addition to selecting the important features in
the given dataset. A novel fitness function for the PSO was designed in this work
using the support vector machine (SVM) to solve the optimal weight selection issue for
updating particle velocity and position. Overall, the optimization algorithms show the
merit of handling difficult non-linear problems with adaptability and flexibility. To
improve heart disease classification quality, the fast correlation-based feature
selection (FCBF) method was used in [12] by Y. Khourdifi et al. to enhance the
classification of heart disease and filter redundant features. Classification based on
SVM, random forest, MLP, k-nearest neighbor and an artificial neural network,
optimized using the PSO mixed with ant colony optimization (ACO) techniques, was
applied to a heart disease dataset, resulting in robust and effective heart disease
classification. Using data mining and artificial intelligence, heart disease was
predicted with less time and cost in [13], which focused on the PSO and a feed-forward
back-propagation neural network, using feature ranking on the disease's effective
factors presented in the Cleveland clinical database; after evaluating the selected
features, the results show that the proposed classification methods achieve the best
accuracy. In [14], machine learning algorithms play a major role in disease risk
prediction, where prediction accuracy is influenced by attribute selection in the
dataset; Matthews correlation coefficient was considered as the performance metric,
and for attribute selection an altered PSO has been
applied. N. S. R. Pillai et al. in [15], using deep RNNs, demonstrated a
language-model-like technique to predict high-risk diagnosis patients (prognosis
prediction), named PP-RNN. The proposed PP-RNN uses several RNNs to learn from
patients' diagnosis codes to predict the existence of high-risk diseases, and achieved
a higher accuracy. In [16], M. S. Islam et al. suggested the grey wolf optimization
algorithm (GWO) combined with an RNN for predicting medical diseases: irrelevant and
redundant attributes are removed by feature selection using GWO, and the feature
dimensionality problem is avoided by the RNN classifier, with which different diseases
were predicted; UCI datasets were used, and enhanced accuracy in disease prediction
was obtained on the Cleveland dataset. From structured and unstructured medical data,
deep learning techniques can extract hidden information. In [17], researchers used the
LSTM for predicting cardiovascular disease (CVD) risk factors, yielding a Matthews
correlation coefficient (MCC) of 0.90 and an accuracy of 95%; compared with other
statistical machine learning algorithms, the proposed LSTM-based module shows the best
performance in predicting CVD risk factors. A novel LSTM deep learning method in [18]
helped in predicting heart failure at an early stage; compared with general methods
like SVM, logistic regression, MLP and KNN, the proposed LSTM method shows superior
performance. CVD also occurs due to mental anxiety, which may increase during a
COVID-19 lockdown period. In [19], researchers proposed an automated tool using an
RNN for a health-care assistance system: a stacked bi-directional LSTM layer was used
to detect cardiac problems from patients' previous health records, and cardiac
troubles were predicted with 93.22% accuracy in the experimental results. In [21],
Senthilkumar Mohan et al. proposed a hybrid machine learning technique for effective
prediction of heart disease: a new method that finds the major features to improve
accuracy in cardiovascular prediction, using different feature combinations and
several known classification techniques. Machine learning techniques were used in this
work to process raw data and provide a new and novel discernment of heart disease. The
challenges seen in existing studies are:
– In the medical field, a challenging requirement is that a large amount of training
data is necessary to avoid over-fitting: if the dataset is imbalanced, predictions are
biased towards the majority samples and over-fitting occurs.
– Deep learning algorithms are optimized through the tuning of hyperparameters such as
activation functions, learning rates and network architecture. However, hyperparameter
selection is a long process, as several values are interdependent and multiple trials
are required.
– Significant memory and computational resources are required to ensure timely
completion. There is also a need to improve the accuracy on the Cleveland heart
disease dataset using deep learning with feature selection techniques.
3 Methodology
The main purpose of this study is to predict heart disease in humans. The proposed
workflow is shown in Fig. 1: it starts with the collection of the dataset and data
pre-processing, implements the PSO and GA for feature selection, and uses the RNN and
LSTM classifiers for classification. Finally, the proposed model is evaluated with
respect to accuracy, precision, recall and f-measure. This section describes the
workflow of the proposed study.
Fig. 1. Proposed heart disease prediction flow with RNN and LSTM classification
The fitness function used for feature selection is given in Eq. (1),

fit = \alpha E(C) + \beta \frac{|s_f|}{|A_f|}    (1)

where E(C) is the classifier's error rate, |s_f| is the length of the selected feature subset, |A_f| is the total count of available features, and the weights controlling feature reduction and classification accuracy satisfy \beta = 1 - \alpha with \alpha \in [0, 1].
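As an illustration, a minimal sketch of how the Eq. (1) fitness could be computed for a binary feature mask is given below. The wrapped classifier and the cross-validation split are illustrative assumptions (a cheap KNN stands in for the classifier here; the paper itself wraps deep models):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, alpha=0.9):
    """Eq. (1): fit = alpha * E(C) + beta * |s_f| / |A_f|, beta = 1 - alpha.

    `mask` is a binary vector over the available features. Lower values
    are better, so GA/PSO treat this as a cost to minimize."""
    if mask.sum() == 0:
        return 1.0  # no features selected: worst possible cost
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    error = 1.0 - acc                      # E(C), the classifier's error rate
    beta = 1.0 - alpha
    return alpha * error + beta * mask.sum() / mask.size
```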
Selection
A portion of the population is selected to breed the next generation. The selection is made based on the fitness values measured using Eq. (1).
Crossover
For further breeding, two parents are randomly selected from the previously selected pool, and the process continues until a suitable population size is reached. Crossover takes place at only one point, the mid-point of the parent solutions. The crossover probability parameter prob_c controls the crossover frequency.
Mutation
Random solutions are selected from the candidates chosen for breeding, and bit flipping is carried out on them. This produces a diverse group of solutions that retain various characteristics of their parents. The mutation probability parameter Prob_m controls the mutation frequency.
Table 1. Algorithm 1

The following steps are repeated until the ending criterion is met: i) evaluate the fitness value of each solution using f(x_i); ii) select the breeding population as x_val = Top_{N/2}(fit_sort); iii) if a generated random value exceeds Prob_c, a random sample from x_val is taken and crossover is applied; iv) the existing solutions are updated with the enhanced new solutions; v) if a generated random value exceeds Prob_m, a random sample from x_val is taken and mutation is applied; vi) the existing solutions are updated with the enhanced new solutions; vii) the combination of x_val and x_newval is generated and considered the new population. Finally, the global best solution is produced and considered the best found solution.
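A compact sketch of Algorithm 1 under these conventions is shown below: binary chromosomes over the feature set, Top_{N/2} selection, mid-point crossover applied with probability prob_c, and bit-flip mutation applied with probability prob_m (the usual comparison direction is assumed for the probability tests). The `fitness` argument is a cost such as Eq. (1), e.g. `lambda m: fitness(m, X, y)` with the sketch above; all parameter values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_feature_selection(fitness, n_features, pop_size=20,
                         prob_c=0.8, prob_m=0.1, max_it=50):
    """Sketch of Algorithm 1; `fitness` is a cost over binary masks
    (lower is better)."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    best, best_cost = None, np.inf
    for _ in range(max_it):
        costs = np.array([fitness(ind) for ind in pop])   # i) evaluate fitness
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:                   # track global best
            best, best_cost = pop[order[0]].copy(), costs[order[0]]
        x_val = pop[order[:pop_size // 2]]                # ii) Top_{N/2} selection
        children = []
        while len(children) < pop_size:
            p1, p2 = x_val[rng.integers(len(x_val), size=2)]
            if rng.random() < prob_c:                     # iii)-iv) mid-point crossover
                mid = n_features // 2
                child = np.concatenate([p1[:mid], p2[mid:]])
            else:
                child = p1.copy()
            if rng.random() < prob_m:                     # v)-vi) bit-flip mutation
                child[rng.integers(n_features)] ^= 1
            children.append(child)
        pop = np.array(children)                          # vii) new population
    return best
```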
The PSO algorithm is described in Table 2. First, the swarm size N, the acceleration constants Ac1 and Ac2, and w_max, w_min, v_max and max_it are initialized. The population and the velocity vectors are randomly initialized as in Eq. (2) and Eq. (3), respectively. The following steps are repeated until the ending criterion is met: i) the inertia weight value w is updated; ii) each solution's fitness value is updated using f(x_i); iii) the personal-best solution pbest and the global-best solution gbest are assigned; iv) the velocity of each particle is updated at each iteration c; v) using the transfer function k, the continuous values are mapped into binary values and new solutions are generated. Finally, the global best is produced as the best found solution.

Table 2. Algorithm 2
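A matching sketch of Algorithm 2 is given below. Since Table 2 is not reproduced here, a sigmoid transfer function, a linearly decreasing inertia weight and the usual velocity update are assumed for k, w and step iv); all parameter values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def bpso_feature_selection(fitness, n_features, n_particles=20, ac1=2.0,
                           ac2=2.0, w_max=0.9, w_min=0.4, v_max=4.0,
                           max_it=50):
    """Sketch of Algorithm 2 (binary PSO); `fitness` is a cost
    (lower is better), e.g. the Eq. (1) sketch above."""
    x = rng.integers(0, 2, size=(n_particles, n_features)).astype(float)
    v = rng.uniform(-v_max, v_max, size=(n_particles, n_features))
    pbest = x.copy()
    pbest_cost = np.array([fitness(p) for p in pbest])
    gbest = pbest[pbest_cost.argmin()].copy()
    for it in range(max_it):
        w = w_max - (w_max - w_min) * it / max_it         # i) inertia weight
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + ac1 * r1 * (pbest - x) + ac2 * r2 * (gbest - x)  # iv)
        v = np.clip(v, -v_max, v_max)
        prob = 1.0 / (1.0 + np.exp(-v))                   # v) transfer function
        x = (rng.random(x.shape) < prob).astype(float)    # new binary solutions
        cost = np.array([fitness(p) for p in x])          # ii) fitness update
        improved = cost < pbest_cost                      # iii) pbest / gbest
        pbest[improved], pbest_cost[improved] = x[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest
```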
LSTM and RNN for Classification
A classification technique to predict heart disease using the RNN and LSTM models is developed. The LSTM model was first proposed by Hochreiter et al. in 1997 and is considered a special RNN model [20]. The RNN connects the current hidden layer state to the previous n-level hidden layer states to obtain long-term memory. On the basis of the RNN network, the LSTM adds valve (gate) nodes to the layers, which overcomes the RNN's long-term memory evaluation problems. Generally, LSTM adds three gates to the original RNN network: an input gate, a forget gate and an output gate. The key insight of the LSTM design is to integrate data-dependent, non-linear controls into the RNN cell so that, once trained, the objective function gradient does not vanish with respect to the state signal. The specifications of RNN and LSTM are shown in Table 3.
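To make the three-gate structure concrete, the following minimal numpy sketch performs one step of a standard LSTM cell (the Hochreiter-Schmidhuber formulation the text refers to); the weight and bias containers are placeholders, not the paper's notation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b hold one weight matrix / bias vector per
    gate, keyed 'i', 'f', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])       # recurrent state + current input
    i = sigmoid(W['i'] @ z + b['i'])        # input gate
    f = sigmoid(W['f'] @ z + b['f'])        # forget gate
    o = sigmoid(W['o'] @ z + b['o'])        # output gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # candidate cell state
    c = f * c_prev + i * c_tilde            # additive, data-dependent state
                                            # update: gradients along c need
                                            # not vanish
    h = o * np.tanh(c)                      # new hidden state
    return h, c
```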
The GA and PSO algorithms with the LSTM deep learning model are shown in Fig. 2 and Fig. 3. Here, GA and PSO are used as the feature selection algorithms, and LSTM is used as the classifier to assign patients to the normal or abnormal class. The selected features are given as input to the classifier. The details of the selected features are given in Table 6.
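A sketch of how the selected features could feed an LSTM binary classifier is shown below, using tf.keras. The layer size, optimizer and training settings are illustrative assumptions rather than the specification in Table 3, and each tabular record is treated as a length-1 sequence:

```python
import tensorflow as tf

def build_lstm_classifier(n_selected_features):
    """Binary normal/abnormal classifier over the GA/PSO-selected features."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1, n_selected_features)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Hypothetical usage: X_sel is (n_samples, n_selected_features), y is 0/1.
# model = build_lstm_classifier(X_sel.shape[1])
# model.fit(X_sel[:, None, :], y, epochs=50, validation_split=0.2)
```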
Accuracy: Accuracy is the percentage of correctly classified samples in the test data set. It is calculated using the formula given in Eq. (5),

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (5)
Precision: The precision value shows the proportion of positively classified subjects that are classified correctly. It is calculated using the formula given in Eq. (6),

Precision = \frac{TP}{TP + FP}    (6)
Recall: Recall is the proportion of relevant instances that have been retrieved; both precision and recall are therefore based on an understanding and measure of relevance. It is estimated by the formula given in Eq. (7),

Recall = \frac{TP}{TP + FN}    (7)
F-measure: The F1 score is defined as the harmonic mean of precision and recall. It can be computed with the aid of the formula given in Eq. (8),

F1 Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (8)
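The four metrics of Eqs. (5)-(8) follow directly from the confusion-matrix counts, as in the self-contained sketch below (label convention assumed: 1 = abnormal, 0 = normal):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score from Eqs. (5)-(8)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)             # Eq. (5)
    precision = tp / (tp + fp) if tp + fp else 0.0         # Eq. (6)
    recall = tp / (tp + fn) if tp + fn else 0.0            # Eq. (7)
    f1 = (2 * precision * recall / (precision + recall)    # Eq. (8)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```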
Fig. 4 shows the accuracy of the deep learning models RNN and LSTM with and without the GA and PSO feature selection algorithms. All six models are compared, and LSTM + PSO shows the best accuracy of 93.5%. Out of 61 records tested, 57 were predicted accurately, of which 25 records are from the normal class and 32 records are from the abnormal class. Moreover, LSTM achieves its accuracy in less time than RNN, as shown in Table 5.
Table 6 shows the features selected by PSO and GA in the evaluation of the proposed method. PSO selected 8 features, achieving an accuracy level of 91%, but takes more time; GA selected 11 features, achieving an accuracy level of 90%, in less time than PSO. In terms of accuracy, however, PSO shows better performance than GA.
Fig. 8 shows the evaluation of RNN performance with the GA and PSO feature selection algorithms. RNN with PSO performs better than both RNN with GA and RNN without any feature selection; the PSO algorithm increases accuracy by 3%.
Fig. 9 shows the evaluation of LSTM performance with the GA and PSO feature selection algorithms. LSTM with PSO performs better than both LSTM with GA and LSTM without any feature selection; the PSO algorithm increases accuracy by 7%.
Table 7. Accuracy comparison with existing methods

Methods                                       Accuracy (%)
DNN + χ2 statistical model [22] (K-fold)      91.57
DNN + χ2 statistical model [22] (holdout)     93.33
RNN + GA (proposed method)                    90
RNN + PSO (proposed method)                   92
LSTM + GA (proposed method)                   90
LSTM + PSO (proposed method)                  93.5
Table 7 shows that, compared with the existing methods, the proposed LSTM + PSO method achieves higher accuracy for predicting heart disease.
5 Conclusion
In this study, an efficient diagnosis approach has been developed for the accurate prediction of heart disease. The proposed approach uses enhanced GA and PSO for optimized feature selection from the heart disease data set. Classification is then achieved using the deep learning models RNN and LSTM. The proposed model has been evaluated using the accuracy, precision, recall and f-measure performance metrics. The obtained results show that the proposed method implementing LSTM with PSO yields an accuracy of 93.5%; its computational time is slightly higher due to the feature selection phase, but it leads to more accurate prediction of heart disease than the existing methods. LSTM + PSO also shows better performance for the other metrics: precision, recall and f-measure. In the future, further enhancement of the proposed model's performance may be considered.
References
1. Kirubha, V., Priya, S.M.: Survey on data mining algorithms in disease prediction.
Int. J. Comput. Trends Tech. 38, 124–128 (2016)
2. Sharma, H., Rizvi, M.: Prediction of heart disease using machine learning algo-
rithms: a survey. Int. J. Recent Innov. Trends Comput. Commun. 5, 99–104 (2017)
3. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network
models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24,
361–370 (2017)
4. Jin, B., Che, C., Liu, Z., Zhang, S., Yin, X., Wei, X.: Predicting the risk of heart
failure with EHR sequential data modelling. IEEE Access 6, 9256–9261 (2018)
5. Salem, T.: Study and analysis of prediction model for heart disease: an optimization
approach using genetic algorithm. Int. J. Pure Appl. Math. 119, 5323–5336 (2018)