Machine Learning
Pradeep Singh
Deepak Singh
Vivek Tiwari
Sanjay Misra Editors
Machine Learning
and Computational
Intelligence
Techniques for Data
Engineering
Proceedings of the 4th International
Conference MISP 2022, Volume 2
Lecture Notes in Electrical Engineering
Volume 998
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore,
Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe,
Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid,
Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München,
Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Yong Li, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova,
Genova, Genova, Italy
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Editors

Pradeep Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
Raipur, Chhattisgarh, India

Deepak Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
Raipur, Chhattisgarh, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
About the Editors

Dr. Pradeep Singh received a Ph.D. in Computer Science and Engineering from the National Institute of Technology, Raipur, and an M.Tech. in Software Engineering from the Motilal Nehru National Institute of Technology, Allahabad, India. Dr. Singh is an Assistant Professor in the Computer Science & Engineering Department at the National Institute of Technology, Raipur. He has over 15 years of experience in various government and reputed engineering institutes. He has published over 80 refereed articles in journals and conference proceedings. His current research interests are machine learning, evolutionary computing, empirical studies on software quality, and software fault prediction models.
Dr. Deepak Singh completed his Bachelor of Engineering from Pt. Ravi Shankar
University, Raipur, India, in 2007. He earned his Master of Technology with honors
from CSVTU Bhilai, India, in 2011. He received a Ph.D. degree from the Department
of Computer Science and Engineering at the National Institute of Technology (NIT)
in Raipur, India, in 2019. Dr. Singh is currently working as an Assistant Professor at
the Department of Computer Science and Engineering, National Institute of Tech-
nology Raipur, India. He has over 8 years of teaching and research experience along
with several publications in journals and conferences. His research interests include
evolutionary computation, machine learning, domain adaptation, protein mining, and
data mining.
Dr. Sanjay Misra, Senior Member of IEEE and ACM Distinguished Lecturer, is a Professor at Østfold University College (HIOF), Halden, Norway. Before coming to HIOF, he was a Professor at Covenant University (ranked 400–500 by THE (2019)) for 9 years. He holds a Ph.D. in Information & Knowledge Engineering (Software Engineering) from the University of Alcala, Spain, and an M.Tech. (Software Engineering) from MLN National Institute of Technology, India. He has authored around 600 articles (Scopus/WoS) with 500 co-authors worldwide (around 130 in JCR/SCIE journals) in the core and applied areas of Software Engineering, Web Engineering, Health Informatics, Cybersecurity, Intelligent Systems, AI, etc.
A Review on Rainfall Prediction Using
Neural Networks
1 Introduction
Rain plays a vital role in human life across all types of meteorological events [1]. Rainfall is a natural climatic phenomenon that has a massive impact on human civilization and demands precise forecasting [2]. Rainfall forecasting is linked to agronomics, which contributes remarkably to a country's economy [3, 4]. There are three approaches to developing rainfall forecasts: (i) numerical, (ii) statistical, and (iii) machine learning.
Numerical Weather Prediction (NWP) produces forecasts using computational power [5, 6]. To forecast future weather, NWP computer models process current weather observations. The model's output is based on current weather monitoring, which is assimilated into the model's framework and used to predict temperature, precipitation, and many other meteorological parameters from the ocean up to the top layer of the atmosphere [7].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques for Data Engineering, Lecture Notes in Electrical Engineering 998, https://doi.org/10.1007/978-981-99-0047-3_1
S. Mandal et al.

Statistical forecasting entails using statistics based on historical data to forecast what might happen in the future [8]. For forecasting, the statistical method employs linear time-series data [9]. Each statistical model comes with its own set of limitations. The Auto-Regressive (AR) model regresses against the series' previous values. The AR term specifies how many linearly associated lagged observations are included, so it is not suitable for data with nonlinear correlations. The Moving Average (MA) model uses previous forecast errors as explanatory variables. It keeps track of the history of distinct periods for each anticipated period, and it frequently overlooks intricate linkages in the dataset. It does not respond to fluctuations that occur due to factors such as cycles and seasonal effects [10]. The Auto-Regressive Integrated Moving Average (ARIMA) model is a versatile and useful time-series model that combines the AR and MA models [11]. Using stationary time-series data, the ARIMA model can only forecast short-term rainfall. Because of the dynamic nature of climatic phenomena and the nonlinear nature of rainfall data, statistical approaches cannot be used to forecast long-term rainfall.
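In standard time-series notation (not reproduced from the surveyed papers), these models can be written as follows, with X_t the rainfall series and ε_t a white-noise error:

```latex
% AR(p): regression on the series' own lagged values
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t

% MA(q): regression on past forecast errors
X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARIMA(p, d, q): AR and MA applied to the d-times differenced series,
% written with the lag operator L, where L X_t = X_{t-1}
\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```

The requirement that the d-times differenced series be stationary is precisely what restricts ARIMA to short-term rainfall forecasts.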
Machine learning can be used to perform real-time comparisons of historical
weather forecasts and observations. Because the physical processes which affect rain-
fall occurrence are extremely complex and nonlinear, some machine learning tech-
niques such as Artificial Neural Network (ANN), Support Vector Machine (SVM),
Random Forest Regression Model, Decision Forest Regression, and Bayesian linear
regression models are better suited for rainfall forecasting. However, among all
machine learning techniques, ANNs perform the best in terms of rainfall forecasting.
The usage of ANNs has grown in popularity, and ANNs are one of the most extensively used models for forecasting rainfall. ANNs are data-driven models that do not require restrictive assumptions about the form of the underlying model. Because of their parallel processing capacity, ANNs are effective at training on huge samples and can implicitly recognize complex nonlinear correlations between dependent and independent variables. The model is dependable and robust since it learns from the original inputs and their relationships and generalizes to unseen data. As a result, ANNs can estimate the approximate peak value of rainfall data with ease.
This paper presents the different rainfall forecasting models proposed using ANNs
and highlights some special features observed during the survey. This study also
reports the suitability of different ANN architectures in different situations for rain-
fall forecasting. Besides, the paper finds the weather parameters responsible for
rainfall and discusses different issues in rainfall forecasting using machine learning.
The rest of the paper is organized as follows. Section 2 presents a literature survey of rainfall prediction models. Section 3 provides a theoretical analysis and discussion of the survey. Finally, Sect. 4 concludes the paper and discusses its future scope.
2 Literature Survey
Rainfall prediction is one of the most demanding tasks in the modern world. In general, weather and rainfall are highly nonlinear and complex phenomena, which require the latest computer modeling and simulation for their accurate prediction. An Artificial Neural Network (ANN) can be used to model the behavior of such nonlinear systems. Soft computing deals with approximate models, through which approximate answers and results are obtained. Soft computing has three primary components: Artificial Neural Networks (ANN), fuzzy logic, and genetic algorithms. ANN is commonly used by researchers in the field of rainfall prediction. The human brain is highly complex and nonlinear, and neural networks are simplified models of biological neuron systems. A neural network is a massively parallel distributed processor built up of simple processing units, which has a natural tendency for storing experiential knowledge and making it available for use. Many researchers have attempted to forecast rainfall using various machine learning models, and in most cases ANNs are used. Table 1 shows some types of ANNs, such as the Backpropagation Neural Network (BPNN) and the Convolutional Neural Network (CNN), that are used based on the quality of the dataset and the rainfall parameters, for better understanding and comprehensibility. Prediction accuracy is measured using error measures such as MSE and RMSE.
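The error measures mentioned here can be made concrete with a short sketch (illustrative function names and toy values, not code from any of the surveyed papers):

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of the MSE,
    # reported in the same units as the rainfall values
    return math.sqrt(mse(y_true, y_pred))

observed = [2.0, 0.0, 5.0, 1.0]    # toy rainfall values
predicted = [1.0, 0.0, 4.0, 3.0]
print(mse(observed, predicted))    # 1.5
print(rmse(observed, predicted))
```

Lower values indicate a better fit; because the errors are squared before averaging, RMSE penalizes large errors more heavily than MAE does, which is one reason it is the measure most of the surveyed papers report.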
One of the most significant advancements in neural networks is the Backprop-
agation learning algorithm. For complicated, multi-layered networks, this network
remains the most common and effective model. One input layer, one output layer,
and at least one hidden layer make up the backpropagation network. The capacity of a network to provide correct outputs for a given dataset is determined by the number of neurons in each layer and the number of hidden layers. The raw data is divided into two portions, one for training and the other for testing the model. Vamsidhar et al. [1] have proposed an efficient rainfall prediction system using BPNN. They created a 3-layered feedforward neural network architecture, initializing the weights of the network with random values between −1.0 and 1.0. Monthly rainfall data from 1901 to 2000 were used. Using humidity, dew point, and pressure as input parameters, they obtained an accuracy of 99.79% in predicting rainfall and 94.28% for testing purposes. Geeta et al. [2] have proposed a monthly monsoon rainfall prediction model for Chennai using BPNN. Chennai's monthly rainfall data from 1978 to 2009 were taken as the desired output data for training and testing purposes. Using wind speed, mean temperature, relative humidity, and aerosol values (RSPM) as rainfall parameters, they obtained an error rate of 9.96. Abhishek et al. [3] have proposed an
Artificial Neural Network system-based rainfall prediction model. They concluded
that when the number of hidden neurons in the ANN increases, the MSE of the model decreases. The model was built in five sequential steps: (1) selection of the input and output data for supervised backpropagation learning; (2) normalization of the input and output data; (3) training on the normalized data using backpropagation learning; (4) testing the fit of the model; and (5) comparing the predicted output with the desired output. The input parameters were the average humidity and the average wind speed for 8 months of each of the 50 years from 1960 to 2010. The Back Propagation Algorithm (BPA) was implemented in NFTool, and they obtained a minimum MSE of 3.646. Shrivastava et al. [4] have
proposed a rainfall prediction model using backpropagation neural network. They
used rainfall data from Ambikapur region of Chhattisgarh, India. They concluded that
Table 1  Summary of the surveyed neural network rainfall prediction models

Year | Authors | Region used | Dataset | Model used | Parameters used | Accuracy
2018 | S. Aswin et al. [13] | Not given | 468 months | Convolutional Neural Networks | Precipitation | RMSE = 2.44
2020 | C. Zhang et al. [14] | Shenzhen, China | 2014 to 2016 (March to September) | Deep convolutional neural network | Gauge rainfall and Doppler radar echo map | RMSE = 9.29
2019 | R. Kaneko et al. [15] | Kyushu region in Japan | From 2016 to 2018 | 2-layer stacked LSTMs | Wind direction and wind velocity, temperature, precipitation, pressure, and relative humidity | RMSE = 2.07
2020 | A. Pranolo et al. [16] | Tenggarong of East Kalimantan, Indonesia | 1986 to 2008 | Long Short-Term Memory | Not mentioned | RMSE = 0.2367
2020 | I. Salehin et al. [17] | Bangladesh Meteorological Department | 2020 (1 Aug to 31 Aug) | Long Short-Term Memory | Temperature, dew point, humidity, wind properties (pressure, speed, and direction) | 76% accuracy
2020 | A. Samad et al. [18] | Albany, Walpole, and Witchcliffe | 2007–2015 for training, 2016 for testing | Long Short-Term Memory | Temperature, pressure, humidity, wind speed, and wind direction | MSE, RMSE, and MAE; RMSE = 5.343
2020 | D. Zatusiva et al. [19] | East Java, Indonesia | December 29, 2014 to August 4, 2019 | Long Short-Term Memory | El Nino Index (NI) and Indian Ocean Dipole (IOD) | MAAPE = 0.9644
2019 | S. Poornima et al. [20] | Hyderabad, India | Hyderabad region starting from 1980 until 2014 | Intensified Long Short-Term Memory | Maximum temperature, minimum temperature, maximum relative humidity, minimum relative humidity, wind speed, sunshine and … | Accuracy = 87.99%
BPN is suitable for the identification of internal dynamics of high dynamic monsoon
rainfall. The performance of the model was evaluated by comparing Standard Devi-
ation (SD) and Mean Absolute Deviation (MAD). Based on backpropagation, they
were able to achieve 94.4% accuracy. Sharma et al. [5] have proposed a rainfall prediction model based on a backpropagation neural network using Delhi's rainfall data. The input and target data had to be normalized because they have different units. Using temperature, humidity, wind speed, pressure, and dew point as input parameters of the prediction model, the MSE was approximately 8.70, and the accuracy graph was plotted with NFTool. Chaturvedi [6] has proposed rainfall prediction using a backpropagation neural network. He took 70% of the data for training, 15% for testing, and the other 15% for validation. The input data for the model consisted of 365 samples, of which 255 were used for training, 55 for testing, and the rest for validation. He plotted a graph using NFTool comparing the predicted values with the target values, which showed a minimized MSE of 8.7. He also concluded that an increase in the number of neurons in the network leads to a decrease in the MSE of the model. Lessnussaa et al. [7] have proposed rainfall prediction using a backpropagation neural network in Ambon city. The researchers used monthly rainfall data from 2011 to 2015 and considered weather parameters such as air temperature, air velocity, and pressure. They obtained 80% accuracy using a learning rate (alpha) of 0.7 and 10,000 iterations (epochs), with an MSE of 0.022.
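The 3-layer backpropagation setup that recurs in these studies can be sketched as follows. This is a minimal illustration with synthetic data, not the configuration of any single paper; the weights start at random values in [−1.0, 1.0] as in [1], and the inputs stand in for normalized weather parameters:

```python
import numpy as np

# Minimal 3-layer feedforward network trained with backpropagation.
# X: 20 samples of 3 normalized inputs (stand-ins for humidity,
# dew point, pressure); y: a synthetic "rainfall" target in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = X.mean(axis=1, keepdims=True)

W1 = rng.uniform(-1.0, 1.0, (3, 5))    # input -> hidden weights
W2 = rng.uniform(-1.0, 1.0, (5, 1))    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    return sigmoid(sigmoid(X @ W1) @ W2)

init_mse = float(np.mean((forward(X) - y) ** 2))

lr = 0.1
for _ in range(5000):
    h = sigmoid(X @ W1)                     # hidden activations
    out = sigmoid(h @ W2)                   # network output
    d_out = (out - y) * out * (1 - out)     # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)      # hidden-layer delta
    W2 -= lr * h.T @ d_out                  # gradient-descent updates
    W1 -= lr * X.T @ d_h

final_mse = float(np.mean((forward(X) - y) ** 2))
print(init_mse, final_mse)                  # training error drops
```

The per-layer deltas are exactly the backpropagated error signals that the surveyed papers tune via the learning rate, hidden-layer size, and epoch count.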
Radial Basis Function (RBF) networks are a class of nonlinear layered feedforward networks. They take a different approach, viewing the design of a neural network as a curve-fitting problem in a high-dimensional space. The construction of an RBF network involves three layers with entirely different roles: the input layer, the single hidden layer, and the output layer. Lee et al. [8] have proposed rainfall prediction using an
artificial neural network. The dataset has been taken from 367 locations based on the
daily rainfall at nearly 100 locations in Switzerland. They proposed a divide-and-
conquer approach where the whole region is divided into four sub-areas and each is
modeled with a different method. For two larger areas, they used radial basis function
(RBF) networks to perform rainfall forecasting. Over the whole dataset they achieved an RMSE of 78.65, with a relative error of 0.46 and an absolute error of 55.9 for rainfall prediction. For the other two smaller sub-areas, they used a simple linear regression model to predict the rainfall. Lee et al. [9] have
proposed “Artificial neural network analysis for reliability prediction of regional
runoff utilization”. They used artificial neural networks to predict regional runoff
utilization, using two different types of artificial neural network models (RBF and
BPNN) to build up small-area rainfall–runoff supply systems. A historical rainfall
for Taipei City in Taiwan was applied in the study. Based on the variances between the training, testing, and prediction results and the actual results, the overall success rates of prediction are about 83% for BPNN and 98.6% for RBF. Liu Xinia et al. [10] have proposed a Filtering and Multi-Scale RBF Prediction
Model of Rainfall Based on EMD Method, a new model based on empirical mode
decomposition (EMD) and the Radial Basis Function Network (RBFN) for rainfall
prediction. They used monthly rainfall data for 39 years in Handan city. Therefore, the
results obtained were evidence of the fact that the RBF network can be successfully
applied to determine the relationship between rainfall and runoff.
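The three-layer RBF structure used in these studies can be sketched as follows, with a synthetic 1-D "rainfall curve" and illustrative centre/width choices (not data from any surveyed paper):

```python
import numpy as np

# RBF network sketch: a hidden layer of Gaussian basis functions
# centred on chosen points, plus a linear output layer fitted by
# least squares.
x = np.linspace(0.0, 1.0, 40)          # normalized input axis (e.g. time)
y = np.sin(2 * np.pi * x)              # stand-in for a seasonal rainfall curve

centers = np.linspace(0.0, 1.0, 10)    # hidden-layer RBF centres
width = 0.1                            # shared Gaussian width

def design_matrix(x):
    # One Gaussian basis function per centre, evaluated at every input
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

Phi = design_matrix(x)                            # shape (40, 10)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # output-layer weights
fit_mse = float(np.mean((Phi @ w - y) ** 2))
print(fit_mse)                                    # very small training error
```

Because only the output layer is linear in the weights, fitting it reduces to a least-squares problem, which is what makes RBF networks fast to train compared with backpropagation networks.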
A Convolutional Neural Network (ConvNet/CNN) is a well-known deep learning algorithm which takes an input image, assigns importance (via learnable weights and biases) to different aspects/objects in the image, and discriminates among them. A CNN is made up of different feedforward neural network layers, such as convolution, pooling, and fully connected layers. CNNs are used to predict rainfall from time-series rainfall data. Qiu et al. [11] have proposed a multi-task convolutional neural
networks-based rainfall prediction system. They evaluated two real-world datasets.
The first one was the daily collected rainfall data from the meteorological station
of Manizales city. Another was a large-scale rainfall dataset taken from the obser-
vation sites of Guangdong province, China. They got a result of RMSE = 11.253
in their work. Halder et al. [12] have proposed a one-dimensional Deep Convolutional Neural Network-based monthly rainfall prediction system. Additional local attributes such as MinT and MaxT were also included. They obtained an RMSE of 15.951 in their work. Aswin et al. [13] have proposed a rainfall prediction model using a convolutional neural network. Using 468 months of precipitation as the input parameter, they obtained an RMSE of 2.44. Zhang et al. [14] have proposed a rainfall prediction model using a
deep convolutional neural network. They collected this rainfall data from the mete-
orological observation center in Shenzhen, China, for the years 2014 to 2016 from
March to September. They got RMSE = 9.29 for their work. They have concluded
that Tiny-RainNet model’s overall performance is better than fully connected LSTM
and convolutional LSTM.
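The core 1-D convolution such models apply to time-series rainfall can be sketched as follows (toy series and kernel values; a real CNN learns the kernel weights and stacks many layers with pooling and a fully connected head):

```python
import numpy as np

# A learnable kernel slides over the series and produces a feature map.
series = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 4.0, 1.0])
kernel = np.array([0.25, 0.5, 0.25])     # toy smoothing kernel

def conv1d_valid(x, k):
    # "Valid" 1-D convolution (no padding):
    # output length is len(x) - len(k) + 1
    n = len(x) - len(k) + 1
    return np.array([float(np.dot(x[i:i + len(k)], k)) for i in range(n)])

feature_map = conv1d_valid(series, kernel)
print(feature_map)   # 5 values: 1.25, 2.25, 1.75, 1.5, 2.25
```

Each output value summarizes a local window of the series, which is how a 1-D CNN picks up short-range temporal patterns in rainfall data.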
A Recurrent Neural Network (RNN) is an extension of feedforward neural networks that possesses intrinsic memory. An RNN is called recurrent because it applies the same function to every input, and its output depends on the past computation. After the output is computed, it is copied and fed back into the recurrent network unit.
LSTM is one of the RNNs that has the potential to forecast rainfall. LSTM is a type of Recurrent Neural Network (RNN) layer designed to address the gradient problem by enforcing constant error flow. An LSTM unit is made up of three primary gates, each of which functions as a controller for the data passing through the network, making it a multi-layer neural network. Kaneko
et al. [15] have proposed a 2-layer stacked RNN-LSTM-based rainfall prediction
system with batch normalization. The LSTM model's performance was compared with the MSM (Meso-Scale Model by JMA) from 2016 to 2018. The LSTM model successfully predicted hourly rainfall, and surprisingly some rainfall events were predicted better by the LSTM model than by the MSM. Using wind direction, wind velocity, temperature, precipitation, pressure, and relative humidity as rainfall parameters, the LSTM model achieved an RMSE of 2.07 mm/h against 2.44 mm/h for the MSM. Pranolo et al. [16] have proposed an LSTM
model for predicting rainfall. The data consisted of 276 data samples, which were
subsequently separated into 216 (75%) training datasets for the years 1986 to 2003,
and 60 (25%) test datasets for the years 2004 to 2008. In this study, the LSTM
and BPNN architecture included a hidden layer of 200, a maximum epoch of 250,
A Review on Rainfall Prediction Using Neural Networks 9
gradient threshold of 1, and learning rates of 0.005, 0.007, and 0.009. These results clearly indicate that the LSTM produced better accuracy than the BPNN algorithm. They obtained an RMSE of 0.2367 in their work. Salehin et al. [17]
have proposed a LSTM and Neural Network-based rainfall prediction system. Time-
series forecasting with LSTM is a modern approach to building a rapid model of
forecasting. After analyzing all data using LSTM, they found 76% accuracy in this
work. LSTM networks are suitable for time-series data categorization, processing,
and prediction. So, they concluded that LSTM gives the most controllability and thus
better results were obtained. Samad et al. [18] have proposed a rainfall prediction model using Long Short-Term Memory. Using temperature, pressure, humidity, wind speed, and wind direction as input parameters on rainfall data from 2007 to 2015, they obtained an RMSE of 5.343. Haq et al. [19] have proposed a rainfall
prediction model using long short-term memory based on El Nino and IOD Data.
They used 60% training data with variation in the hidden layer, batch size, and learn
rate drop periods to achieve the best prediction results. They obtained a MAAPE of 0.9644 in their work. Poornima et al. [20] have proposed an article titled “Prediction of Rainfall Using Intensified LSTM-Based Recurrent Neural Network with Weighted Linear Units”. This paper presented an Intensified Long Short-Term
Memory (Intensified LSTM)-based Recurrent Neural Network (RNN) to predict
rainfall. The parameters considered for the evaluation of the performance and the
efficiency of the proposed rainfall prediction model were Root Mean Square Error
(RMSE), accuracy, number of epochs, loss, and learning rate of the network. The initial learning rate was fixed at 0.1 with no momentum (the default), and a batch size of 2500 was run for 5 iterations, since the dataset contains 12,410 rows with 8 attributes. The accuracy achieved by the Intensified LSTM-based rainfall prediction model is 87.99%.
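The three-gate structure described above can be sketched as a single LSTM time-step. The weights here are random placeholders (in a trained model they are learned), and the input dimensions are illustrative:

```python
import numpy as np

# One time-step of an LSTM cell: forget, input, and output gates
# control what flows through the cell state.
rng = np.random.default_rng(2)
n_in, n_hid = 4, 3                      # e.g. 4 weather inputs, 3 hidden units

x = rng.random(n_in)                    # one time-step of input features
h_prev = np.zeros(n_hid)                # previous hidden state
c_prev = np.zeros(n_hid)                # previous cell state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus the candidate cell update,
# each acting on the concatenated [h_prev, x] vector
z = np.concatenate([h_prev, x])
Wf, Wi, Wo, Wc = (rng.uniform(-1, 1, (n_hid, n_hid + n_in)) for _ in range(4))

f = sigmoid(Wf @ z)                     # forget gate: what to keep of c_prev
i = sigmoid(Wi @ z)                     # input gate: how much new info enters
o = sigmoid(Wo @ z)                     # output gate: what to expose
c = f * c_prev + i * np.tanh(Wc @ z)    # new cell state
h = o * np.tanh(c)                      # new hidden state

print(h.shape, c.shape)
```

The additive update of the cell state `c` is what lets gradients flow across many time steps, which is why LSTMs remember long-range rainfall trends better than plain RNNs.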
For prediction, all of these models use nearly identical rainfall parameters. Humidity, wind speed, and temperature are important parameters for backpropagation [21]. Temperature and precipitation are important factors for convolutional neural networks (ConvNets). Temperature, wind speed, and humidity are all important factors for Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) networks. In most cases, error measures such as MSE, RMSE, and MAE are used. With temperature, air pressure, humidity, wind speed, and wind direction as input parameters, BPNN has achieved a minimum MSE of 3.646, CNN has achieved an RMSE of 2.44, and LSTM has achieved a better RMSE of 0.2367. As a result of this survey, it can be said that LSTM is an effective model for rainfall forecasting.
3 Discussion

High variability in rainfall patterns is the main problem of rainfall forecasting. Data insufficiency and the absence of records such as temperature, wind speed, and wind direction can affect prediction [22, 23], so data preprocessing is required to compensate for missing values. As future data is unpredictable, models have to use estimated data and assumptions to predict future weather [24]. Besides massive deforestation, abrupt changes in climate conditions may prove the prediction false. In the case of a yearly rainfall dataset, there is no manageable procedure to determine rainfall parameters such as wind speed, humidity, and soil temperature. In some models, researchers have used one hidden layer, which requires a large number of hidden nodes and degrades performance. To compensate, two hidden layers are used; more than two hidden layers give the same results. Either too few or too many input parameters can influence the learning or prediction capability of the network [25].
The model simulations use dynamic equations which demonstrate how the atmo-
sphere will respond to changes in temperature, pressure, and humidity over time.
Some frequent challenges when implementing various ANN architectures for modeling weekly, monthly, and yearly rainfall data include choosing the number of hidden layers and nodes and dividing the dataset into training and testing portions, so prior knowledge of these methods and architectures is needed. As ANNs are prone to overfitting, this can be reduced by early stopping or regularization methods. Choosing accurate performance measures and activation functions for the simulation is also an important part of implementing rainfall prediction.
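The early-stopping idea mentioned above can be sketched as a simple rule over per-epoch validation errors (an illustrative helper, not taken from any surveyed paper):

```python
# Halt training when the validation error stops improving for
# `patience` consecutive epochs; the best model was saved earlier.
def early_stop_epoch(val_errors, patience=3):
    best = float("inf")
    since_best = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best = err
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch          # stop here
    return len(val_errors) - 1        # patience never exhausted

# Validation error improves, then starts rising: training stops at epoch 6
errors = [0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.55, 0.6]
print(early_stop_epoch(errors))  # 6
```

Stopping once validation error turns upward keeps the network from memorizing the training rainfall series, which is the overfitting risk noted above.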
4 Conclusions
This paper presents a study of various ANNs used by researchers to forecast rainfall. The survey shows that BPNN, CNN, RNN, LSTM, etc. are more suitable for predicting rainfall than other forecasting techniques such as statistical and numerical methods.
Moreover, this paper discussed the issues that must be addressed when using ANNs
for rainfall forecasting. In most cases, previous daily data of rainfall and maximum
and minimum temperature, humidity, and wind speed are considered. All the models
provide good prediction accuracy, but as the models progress from neural networks
to deep learning, the accuracy improves, implying a lower error rate. Finally, based
on the literature review, it can be stated that ANN is practical for rainfall forecasting
because several ANN models have attained significant accuracy. RNN shows better
accuracy as there are memory units incorporated, so it can remember the past trends
of rainfall. Depending on past trends, the model gives a more accurate prediction.
Accuracy can be enhanced even more if other parameters are taken into account.
Rainfall prediction will be more accurate as ANNs progress, making it easier to
understand weather patterns.
After analyzing all the results from the reviewed research papers, it can be concluded that neural networks perform better; therefore, in further work, rainfall forecasting will be implemented using neural networks. If RNN and LSTM are used, forecasting would improve thanks to their additional memory units. So, as a continuation of this paper, rainfall forecasting for a particular region will be done using LSTM, together with a comparative study against other neural networks for a better understanding of the importance of artificial neural networks in rainfall forecasting.
References
1. Vamsidhar E, Varma KV, Rao PS, Satapati R (2010) Prediction of rainfall using back
propagation neural network model. Int J Comput Sci Eng 02(04):1119–1121
2. Geetha G, Samuel R, Selvaraj (2011) Prediction of monthly rainfall in Chennai using back
propagation neural network model. Int J Eng Sci Technol 3(1):211–213
3. Abhishek K, Kumar A, Ranjan R, Kumar S (2012) A rainfall prediction model using artificial
neural network. IEEE Control Syst Graduate Res Colloquium. https://doi.org/10.1109/ICS
GRC.2012.6287140
4. Shrivastava G, Karmakar S, Kowar MK, Guhathakurta P (2012) Application of artificial neural
networks in weather forecasting: a comprehensive literature review. IJCA 51(18):0975–8887.
https://doi.org/10.5120/8142-1867
5. Sharma A, Nijhawan G (2015) Rainfall prediction using neural network. Int J Comput Sci
Trends Technol (IJCST) 3(3), ISSN 2347–8578
6. Chaturvedi A (2015) Rainfall prediction using back propagation feed forward network. Int J
Comput Appl (0975 – 8887) 119(4)
7. Lesnussa YA, Mustamu CG, Lembang FK, Talakua MW (2018) Application of backpropaga-
tion neural networks in predicting rainfall data in Ambon city. Int J Artif Intell Res 2(2). ISSN
2579–7298
8. Lee S, Cho S, Wong PM (1998) Rainfall prediction using artificial neural network. J Geog Inf
Decision Anal 2:233–242
9. Lee C, Lin HT (2006) Artificial neural network analysis for reliability prediction of regional
runoff utilization. S. CIB W062 symposium 2006
10. Xinia L, Anbing Z, Cuimei S, Haifeng W (2015) Filtering and multi-scale RBF prediction
model of rainfall based on EMD method. ICISE 2009:3785–3788
11. Qiu M, Zha P, Zhang K, Huang J, Shi X, Wa X, Chu W (2017) A short-term rainfall prediction
model using multi-task convolutional neural networks. In: IEEE international conference on
data mining. https://doi.org/10.1109/ICDM.2017.49
12. Haidar A, Verma B (2018) Monthly rainfall forecasting using one-dimensional deep convo-
lutional neural network. Project: Weather Forecasting using Machine Learning Algorithm,
UNSW Sydney. https://doi.org/10.1109/ACCESS.2018.2880044
13. Aswin S, Geetha P, Vinayakumar R (2018) Deep learning models for the prediction of rainfall. In: International conference on communication and signal processing. https://doi.org/10.1109/ICCSP.2018.8523829
14. Zhang CJ, Wang HY, Zeng J, Ma LM, Guan L (2020) Tiny-RainNet: a deep convolutional neural
network with bi-directional long short-term memory model for short-term rainfall prediction.
Meteorolog Appl 27(5)
15. Kaneko R, Nakayoshi M, Onomura S (2019) Rainfall prediction by a recurrent neural network
algorithm LSTM learning surface observation data. Am Geophys Union, Fall Meeting
16. Pranolo A, Mao Y, Tang Y, Wibawa AP (2020) A long short term memory implemented
for rainfall forecasting. In: 6th international conference on science in information technology
(ICSITech). https://doi.org/10.1109/ICSITech49800.2020.9392056
17. Salehin I, Talha IM, Hasan MM, Dip ST, Saifuzzaman M, Moon NN (2020) An artificial
intelligence based rainfall prediction using LSTM and neural network. https://doi.org/10.1109/
WIECON-ECE52138.2020.9398022
18. Samad A, Gautam V, Jain P, Sarkar K (2020) An approach for rainfall prediction using long short
term memory neural network. In: IEEE 5th international conference on computing communi-
cation and automation (ICCCA) Galgotias University, GreaterNoida,UP, India. https://doi.org/
10.1109/ICCCA49541.2020.9250809
19. Haq DZ, Novitasari DC, Hamid A, Ulinnuha N, Farida Y, Nugraheni RD, Nariswari R, Rohayani
H, Pramulya R, Widjayanto A (2020) Long short-term memory algorithm for rainfall prediction
based on El-Nino and IOD Data. In: 5th international conference on computer science and
computational intelligence
20. Poornima S, Pushpalatha M (2019) Prediction of rainfall using intensified LSTM based
recurrent neural network with weighted linear units. Comput Sci Atmos. 10110668
21. Parida BP, Moalafhi DB (2008) Regional rainfall frequency analysis for Botswana using L-
Moments and radial basis function network. Phys Chem Earth Parts A/B/C 33(8). https://doi.
org/10.1016/j.pce.2008.06.011
22. Dubey AD (2015) Artificial neural network models for rainfall prediction in Pondicherry. Int
J Comput Appl (0975–8887). 10.1.1.695.8020
23. Biswas S, Das A, Purkayastha B, Barman D (2013) Techniques for efficient case retrieval
and rainfall prediction using CBR and Fuzzy logic. Int J Electron Commun Comput Eng
4(3):692–698
24. Basha CZ, Bhavana N, Bhavya P, Sowmya V (2020) Proceedings of the international conference
on electronics and sustainable communication systems. IEEE Xplore Part Number: CFP20V66-
ART; ISBN: 978-1-7281-4108-4.
25. Biswas SK, Sinha N, Purkayastha B, Marbaniang L (2014) Weather prediction by recurrent
neural network dynamics. Int J Intell Eng Informat Indersci, 2(2/3):166–180 (ESCI journal)
Identifying the Impact of Crime
in Indian Jail Prison Strength
with Statistical Measures
1 Introduction
The use of machine learning algorithms to forecast crime is becoming commonplace. This research is separated into two parts: the forecast of violent crime and its influence on prisons, and the prediction of detainees in jail. We use data from separate sources: the first two datasets are violent crime and total FIR data from the police department, followed by data on prisoners and detainees sentenced for violent crimes from the Jail Department.
A guide for the correct use of correlation in crime and jail strength is needed to
solve this issue. Data from the NCRB shows how correlation coefficients can be used
in real-world situations. As shown in Fig. 1, a correlation coefficient will be used for
the forecast of crime and the prediction of jail overcrowding.
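As a rough sketch of what such a correlation coefficient measures, the Pearson coefficient can be computed on two invented series as below; the numbers are purely illustrative, not NCRB data:

```python
# Sketch: sign of the Pearson correlation coefficient on invented data.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

crime     = [100, 150, 200, 260, 300]   # hypothetical crime counts
prisoners = [400, 430, 490, 520, 580]   # rising with crime -> r > 0
releases  = [90, 80, 72, 60, 55]        # falling as crime rises -> r < 0

assert pearson(crime, prisoners) > 0
assert pearson(crime, releases) < 0
```

A coefficient near +1 or -1 indicates a strong linear association; a value near 0 indicates no linear relation.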
Regression and correlation are two distinct yet complementary approaches. In general, regression is used to make predictions (which do not extend beyond the data used in the research), whereas correlation is used to establish the degree of association.
There are circumstances in which the x variable is neither fixed nor readily selected
by the researcher but is instead a random covariate of the y variable [1]. In this article,
the observer’s subjective features and the latest methodology are used. The begin-
nings and increases of crime are governed by age groups, racial backgrounds, family
structure, education, housing size [2], employed-to-unemployed ratio, and cops per
capita. Rather than systematic classification to categorize under the impressionistic
S. S. kshatri (B)
Department of Computer Science and Engineering (AI), Shri Shankaracharya Institute of
Professional Management and Technology, Raipur, C.G., India
e-mail: [email protected]
D. Singh
Department of Computer Science and Engineering, National Institute of Technology, Raipur,
C.G., India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 13
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_2
2 Related Work
et al. present how the police department used big data analytics (BDA), support vector machines (SVM), artificial neural networks (ANNs), the K-means algorithm, and naive Bayes, i.e., machine learning (ML) and deep learning (DL) approaches. The aim of this study is to investigate the most accurate ML and DL methods for predicting crime rates and the application of data approaches in attempts to forecast crime, with a focus on the dataset.
Modern methods based on machine learning algorithms can provide predictions in circumstances where the relationships between characteristics and outcomes are complex. Using algorithms to detect potential criminal areas, these forecasts may assist politicians and law enforcement in creating successful programs to minimize crime and improve the nation’s growth. The goal of this project is to construct a machine learning system for predicting a morally acceptable output value. Our results show that utilizing FAMD as a feature selection technique outperforms PCA on machine learning classifiers. With 97.53% accuracy for FAMD and 97.10% accuracy for PCA, the naive Bayes classifier surpasses the other classifiers [9]. Retrospective models employ past crime data to predict future crime. These include hotspot approaches, which assume that yesterday’s hotspots are also tomorrow’s. Empirical research backs this up: although hotspots may flare up and quiet down quickly, they tend to persist over time [10].
Prospective models employ more than just historical data: they examine the causes of major crime and build a mathematical relationship between those causes and the levels of crime. Such models use criminological ideas to anticipate criminal conduct and, as a consequence, should be more relevant and provide more “enduring” projections [11]. Previous models used either socioeconomic factors (e.g., RTM [15]) or near-repeat phenomena (e.g., Promap [12]; PredPol [13]). The term “near-repeat” refers to the phenomenon in which a property, or surrounding properties or sites, is targeted again shortly after the initial criminal incident.
Drones may also be used to map cities, chase criminals, investigate crime scenes and accidents, regulate traffic flow, and assist in search and rescue after a catastrophe. In Ref. [14], legal concerns surrounding drone usage and airspace allocation are discussed. The public has privacy concerns when the police acquire such power and influence, and airspace dispersal raises concerns about drone altitude. Related technologies include body cameras and license plate recognition. In Ref. [15], the authors state that face recognition can gather suspect profiles and evaluate them against various databases. A license plate scanner may also get data on a vehicle suspected of committing a crime. Police may even employ body cameras to see more than the human eye can perceive, meaning the recording captures all that an officer sees; normally, we cannot recall the whole picture of an item we have seen. The influence of body cameras on officer misbehavior and domestic violence was explored in Ref. [16]. Patrol personnel now wear body cameras, which provide protection in cases of police misconduct. However, wearing a body camera is not just for security purposes but also to capture crucial moments during everyday activities or major operations.
While each of these methods is useful, they all function separately. While the police may utilize any of these methods singly or simultaneously, having a device that can combine the benefits of all of these techniques would be immensely advantageous. Threat classification, machine learning, deep learning, threat detection, intelligence interpretation, voice-print recognition, natural language processing, core analytics, computational linguistics, data collection, and neural networks: considering all of these characteristics is critical for crime prediction.
3.1 Dataset
The dataset in this method covers 28 states and seven union territories. The crime data have been split into categories, and for our study we chose those held in the category of violent crime. A field for the total number of FIRs has also been included. The first dataset was gathered from the police and prison departments, and it is vast. Serious sequential data are typically extensive in size, making it challenging to manage terabytes of data every day from various crimes. Time series modeling is performed using classification models, which simplify the data and enable the model to construct an outcome variable. Data from 2001 to 2015 were plotted in an Excel file in a state-by-state format. The most common crime datasets are chosen from a large pool of data. Within the police and jail departments, violent offenses are common, a proposed burdensome factor for both departments. Overcrowding is a problem that is difficult to define and describe, and it is also linked to externally focused concerns. This study aimed to investigate the connection between violent crime, FIRs, and jail strength. There are some well-documented psychological causes of aggression; for example, both impulsivity and anger have been linked to criminal attacks [17].
The frequent crime datasets are selected from huge data [17]. A line graph is used to analyze the total IPC crimes for each state (based on districts) from 2001 to 2015. The attribute “STATE/UT” is used to generate the data, compared against the attribute “average FIR.” Supervised and unsupervised data techniques are used to predict crime accurately from the collected data [18, 19] (Table 1).
The imported dataset is visualized with STATE/UT as the class attribute. The visualization shows the distribution of the attribute STATE/UT against the other attributes in the dataset; each shade in the graph represents a specific state. The dataset is also visualized with crime levels 1–5 of specific attributes against the class attribute, which is the number of people arrested during the year.
The blue region in the chart represents high crime, such as murder, and the pink area represents low crime, such as kidnapping, for a particular attribute in the dataset. The police data label murder, attempted murder, and dowry death as 1; rape as 2; attempted rape as 3; dacoity and assembly for dacoity as 4; and likewise up to 5 (Fig. 2).
Given the dataset of crime rates and prison population, a correlation matrix can help us immediately comprehend the relationship between each pair of variables. One basic premise of multiple linear regression is that no independent variable in the model is substantially associated with another variable. Numerous numerical techniques, in addition to plotting, are available to help assess how well a regression equation fits the data. The sample coefficient of determination, R², of a linear regression fit (with any number of predictors) is a valuable statistic to examine. Assuming a homoscedastic model (wᵢ = 1), R² is the ratio between SSReg and Syy, i.e., the proportion of the sum of squared deviations from the mean (Syy) accounted for by the regression [1].
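For a single predictor, the R² statistic described above can be computed directly from its definition; the data here are invented for illustration:

```python
# Sketch: R^2 = SSReg / Syy for a homoscedastic (w_i = 1) simple linear
# regression. The data are invented for illustration.

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)   # total sum of squares about mean
    b1 = sxy / sxx                        # least squares slope
    ssreg = b1 * b1 * sxx                 # regression sum of squares
    return ssreg / syy

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear -> 1.0
print(r_squared([1, 2, 3, 4], [2, 3, 5, 9]))   # noisy -> about 0.92
```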
The primary objective of regression is to model and examine the relationship between the dependent and independent variables. The errors are assumed independent and normally distributed with mean 0 and variance σ². The βs are estimated by minimizing the error (residual) sum of squares:

S(β₀, β₁, …, β_k) = Σ_{i=1}^{n} [Yᵢ − (β₀ + Σ_{j=1}^{k} βⱼ Xᵢⱼ)]²    (1)

To find the minimum of (1) with respect to the βs, the derivative of (1) with respect to each of the βs is set to zero and solved. This gives the following normal equations:

∂S/∂β₀ |_{β̂₀, β̂₁, …, β̂_k} = −2 Σ_{i=1}^{n} (Yᵢ − β̂₀ − Σ_{j=1}^{k} β̂ⱼ Xᵢⱼ) = 0    (2)

and

∂S/∂βⱼ |_{β̂₀, β̂₁, …, β̂_k} = −2 Σ_{i=1}^{n} Xᵢⱼ (Yᵢ − β̂₀ − Σ_{j=1}^{k} β̂ⱼ Xᵢⱼ) = 0,  j = 1, 2, …, k    (3)

The β̂s, the solutions of (2) and (3), are the least squares estimates of the βs. It is helpful to express both the n equations in (1) and the k + 1 equations in (2) and (3) (which depend on linear functions of the βs) in matrix form. Model (1) can be expressed as

y = Xβ + ε    (4)
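For the single-predictor case (k = 1), the normal equations (2) and (3) reduce to two linear equations that can be solved in closed form. The data below are invented so that the fit is exact:

```python
# Sketch of the least squares estimates from the normal equations
# for a single predictor (k = 1); data are invented for illustration.

def least_squares(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # Solve: n*b0 + sx*b1 = sy   and   sx*b0 + sxx*b1 = sxy
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])  # y = 1 + 2x exactly
print(b0, b1)  # prints: 1.0 2.0
```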
If crime expands while the number of detainees does not, this shows a negative relationship and would, by extension, have a negative correlation coefficient. A positive correlation coefficient would describe the relationship between crime and prisoners’ strength: as crime increases, so does the prison crowd. As we can see in the correlation matrix, there is no relation between crime and prison strength, so we
5 Conclusion
References
1. Asuero AG, Sayago A, González AG (2006) The correlation coefficient: an overview. Crit Rev
Anal Chem 36(1):41–59. https://doi.org/10.1080/10408340500526766
2. Wang Z, Lu J, Beccarelli P, Yang C (2021) Neighbourhood permeability and burglary: a case
study of a city in China. Intell Build Int 1–18. https://doi.org/10.1080/17508975.2021.1904202
3. Mukaka MM (2012) Malawi Med J 24, no. September:69–71. https://www.ajol.info/index.php/
mmj/article/view/81576
4. Andresen MA (2007) Location quotients, ambient populations, and the spatial analysis of crime
in Vancouver, Canada. Environ Plan A Econ Sp 39(10):2423–2444. https://doi.org/10.1068/
a38187
5. Clipper S, Selby C (2021) Crime prediction/forecasting. In: The encyclopedia of research
methods in criminology and criminal justice, John Wiley & Sons, Ltd, 458–462
6. Zhu H, You X, Liu S (2019) Multiple ant colony optimization based on pearson correlation
coefficient. IEEE Access 7:61628–61638. https://doi.org/10.1109/ACCESS.2019.2915673
7. Hu K, Li L, Liu J, Sun D (2021) DuroNet: a dual-robust enhanced spatial-temporal learning
network for urban crime prediction. ACM Trans Internet Technol 21, 1. https://doi.org/10.
1145/3432249
8. Kshatri SS, Singh D, Narain B, Bhatia S, Quasim MT, Sinha GR (2021) An empirical analysis
of machine learning algorithms for crime prediction using stacked generalization: an ensemble
approach. IEEE Access 9:67488–67500. https://doi.org/10.1109/ACCESS.2021.3075140
9. Albahli S, Alsaqabi A, Aldhubayi F, Rauf HT, Arif M, Mohammed MA (2020) Predicting the
type of crime: intelligence gathering and crime analysis. Comput Mater Contin 66(3):2317–
2341. https://doi.org/10.32604/cmc.2021.014113
10. Spelman W (1995) The severity of intermediate sanctions. J Res Crime Delinq 32(2):107–135.
https://doi.org/10.1177/0022427895032002001
11. Caplan JM, Kennedy LW, Miller J (2011) Risk terrain modeling: brokering criminological
theory and GIS methods for crime forecasting. Justice Q 28(2):360–381. https://doi.org/10.
1080/07418825.2010.486037
12. Johnson SD, Birks DJ, McLaughlin L, Bowers KJ, Pease K (2008) Prospective crime mapping
in operational context: final report. London, UK Home Off. online Rep., vol. 19, no. September,
pp. 07–08. http://www-staff.lboro.ac.uk/~ssgf/kp/2007_Prospective_Mapping.pdf
13. Wicks M (2016) Forecasting the future of fish. Oceanus 51(2):94–97
14. McNeal GS (2014) Drones and aerial surveillance: considerations for legislators, p 34. https://
papers.ssrn.com/abstract=2523041.
15. Fatih T, Bekir C (2015) Police use of technology to fight against crime 11(10):286–296
16. Katz CM et al (2014) Evaluating the impact of officer worn body cameras in the Phoenix Police
Department. Centre for Violence Prevention and Community Safety, Arizona State University,
December, pp 1–43
17. Krakowski MI, Czobor P (2013) Depression and impulsivity as pathways to violence: implica-
tions for antiaggressive treatment. Schizophr Bull 40(4):886–894. https://doi.org/10.1093/sch
bul/sbt117
18. Kshatri SS, Narain B (2020) Analytical study of some selected classification algorithms and
crime prediction. Int J Eng Adv Technol 9(6):241–247. https://doi.org/10.35940/ijeat.f1370.
089620
19. Osisanwo FY, Akinsola JE, Awodele O, Hinmikaiye JO, Olakanmi O, Akinjobi J (2017) Super-
vised machine learning algorithms: classification and comparison. Int J Comput Trends Technol
48(3):128–138. https://doi.org/10.14445/22312803/ijctt-v48p126
Visual Question Answering Using
Convolutional and Recurrent Neural
Networks
1 Introduction
“Visual Question Answering” is a task that takes as input an image and a set of questions corresponding to that image which, when fed to neural networks and machine learning models, generate one or more answers. The purpose of building such systems is to assist advanced computer vision tasks such as object detection and automatic answering by machine learning models when receiving data in the form of images or, in more advanced versions, video. This task is essential when we consider research objectives in artificial intelligence. In recent developments in AI [1], the integration of tasks involving textual and image forms of input has become hugely important. Visual question answering is sometimes used to answer open-ended questions, and at other times multiple-choice or close-ended ones. In our methodology, we have considered the formulation of open-ended answers instead of close-ended ones because, in the real world, most human interactions involve non-binary answers to questions. Open-ended questions draw on a much larger pool of possible answers than close-ended, binary, or even multiple-choice ones.
Some of the major challenges that VQA tasks face are computational cost, execution time, and the integration of neural networks for textual and image data. It is practically unachievable and inefficient to implement a neural network that takes into account both text features and image features and learns the weights of the network to
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 23
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_3
make decisions and predictions. For the purposes of our research, we have considered a state-of-the-art dataset that is publicly available. The question set that can be formed using this dataset is very wide. For instance, one question for an image containing multiple 3-D shapes of different colors could be “How many objects of cylinder shape are present?” [1]. As we can see, this question calls for very deep observation, similar to human observation. After observing, experimenting with, and examining the dataset questions, we could see that each answer requires multiple queries to converge on an answer. Performing this task requires knowledge and application of natural language processing techniques in order to analyze the textual question and form answers. In this paper, we discuss a model constructed using convolutional neural network layers for processing image features and a recurrent neural network based model for analyzing text features.
2 Literature Survey
A general idea was to extract features into a global feature vector with a convolutional network and to encode the questions using an LSTM, or long short-term memory network. These are then combined to produce a consolidated result. This gives good answers, but it fails to give accurate results when the answers or questions depend on specific focused regions of the image.
We also came across the use of stacked attention networks for VQA by Yang [3], which extract the semantics of a question to look for the parts and areas of the picture related to the answer. These networks are advanced versions of the “attention mechanism” applied in other problem domains such as image caption generation and machine translation. The paper by Yang [3] proposed a multiple-layer stacked attention network. It mainly consists of the following components: (1) a model dedicated to the image; (2) a separate model dedicated to the question, which can be implemented using a convolutional network or a long short-term memory (LSTM) network [8] to produce the semantic vector for questions; and (3) the stacked attention model to recognize the focal, important parts and areas of the image. But despite its promising results, this approach had its own limitations.
Research by Yi et al. [4] in 2018 proposed a new model with multiple components to deal with images and questions/answers: a “scene parser”, a “question parser”, and a program executor. In the first component, Mask R-CNN was used to create segmented portions of the image. In the second component, meant for the question, they used a “seq2seq” model. The program-execution component was built using Python modules that deal with the logical aspects of the questions in the dataset.
Liang et al. [5] proposed focal visual-text attention for visual question answering. This model (Focal Visual-Text Attention) combines the sequence of image features generated by the network, the text features of the image, and the question. It uses a hierarchical approach to dynamically choose which modalities and snippets in the sequential data to focus on in order to answer the question, and so can not only forecast the correct answers but also identify the correct supporting evidence, enabling people to validate the system’s results. It was implemented on a smaller dataset and was not tested against more standard datasets.
Zhu et al. [7] proposed visual reasoning dialogs with structural and partial observations. Nodes in their graph neural network represent dialog entities (the caption, question-and-response pairs, and the unobserved queried answer) as embeddings, and the edges reflect semantic relationships between nodes. They created an EM-style inference technique to estimate latent links between nodes and missing values for unobserved nodes: the M-step calculates the edge weights, whereas the E-step uses neural message passing to update all hidden node state embeddings.
3 Dataset Description
4 Proposed Method
After reading about the multiple techniques and models used to approach the VQA task, we used CNN+LSTM as the base approach for the model and worked our way up. In the CNN-LSTM model, image features and language features are computed separately, combined together, and a multi-layer perceptron is trained on the combined features. The questions are encoded using a two-layer LSTM, while the images are encoded using the last hidden layer of a CNN. After that, the picture features are l2-normalized. Then the question and image features are transformed to a common space and combined before being fed to the network that predicts the answer. Refer to Fig. 3 for the proposed model.
4.1 Experiment 1
4.1.1 CNN
A CNN takes into account the parts and aspects of an input image. Importance, in the form of learnable weights and biases, is assigned based on the relevance of different aspects of the image, which also distinguishes them from one another. A ConvNet requires far less pre-processing than other classification algorithms. The CNN model is shown in Fig. 4. We have used MobileNetV2 in our CNN model. MobileNetV2 is a convolutional neural network design that, as the name suggests, is portable, or in other words “mobile-friendly”. It is built on an inverted residual structure, with residual connections between bottleneck layers. MobileNetV2 [9] is a powerful feature extractor for detecting and segmenting objects. The CNN model consists of the image input layer, the MobileNetV2 layer, and a global average pooling layer.
4.1.2 MobileNetV2
In MobileNetV2, there are two types of blocks. One is a residual block with stride 1; the other, with stride 2, is used for downsizing. Both types of blocks have three layers. The first layer is a 1 × 1 convolution with ReLU6, followed by a depthwise convolution; the third layer is a 1 × 1 convolution with no non-linearity.
4.1.3 LSTM
where
x1 = output from the CNN,
x2 = output from the LSTM,
Out = concatenation of x1 and x2.
After this, we create a dense layer with a softmax activation function using TensorFlow, and feed the CNN output, the LSTM output, and the concatenated dense layer to the model. Refer to Fig. 6 for the overall architecture. The model was built with the Adam optimizer and sparse categorical cross-entropy loss. For merging the two components, we used element-wise multiplication and fed the result to the network to predict answers.
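The merging step can be sketched as follows; the feature vectors, dense-layer weights, and dimensions here are random stand-ins for illustration, not the trained model's values:

```python
# Minimal sketch of the fusion step: element-wise multiplication of the
# CNN image features (x1) and LSTM question features (x2), followed by a
# dense softmax layer over candidate answers. All values are stand-ins.
import math
import random

random.seed(0)
D, N_ANSWERS = 8, 5                              # arbitrary dimensions

x1 = [random.random() for _ in range(D)]         # stand-in CNN output
x2 = [random.random() for _ in range(D)]         # stand-in LSTM output
fused = [a * b for a, b in zip(x1, x2)]          # element-wise product

# One dense layer with softmax (weights are random stand-ins).
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_ANSWERS)]
logits = [sum(w * f for w, f in zip(row, fused)) for row in W]
exps = [math.exp(z - max(logits)) for z in logits]
probs = [e / sum(exps) for e in exps]            # answer distribution

predicted_answer = probs.index(max(probs))       # argmax over answers
```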
4.2 Experiment 2
As a first step, we preprocessed both the image data and the text data, i.e., the
questions given as input. For this experiment, we used a CNN model to extract features
from the image dataset. Figure 8 shows the model architecture in block form. An input
image of 64 × 64 is fed to the subsequent layers. A convolution layer with eight 3 × 3
filters and "same" padding produces an output of 64 × 64 × 8 dimensions. A max-pooling
layer then reduces it to 32 × 32 × 8, and the next convolution layer, with 16 filters,
produces 32 × 32 × 16. Another max-pooling layer cuts the dimensions down to
16 × 16 × 16. Finally, we flatten it to obtain the output of the 64 × 64 image as
4096 nodes. Refer to Fig. 7.
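The dimension bookkeeping above can be verified with a small shape-propagation sketch ("same" convolutions preserve spatial size; 2 × 2 max-pooling halves it; the single input channel is an assumption):

```python
def conv_same(h, w, c_out):
    # A convolution with "same" padding keeps the spatial dimensions
    return (h, w, c_out)

def maxpool2(h, w, c):
    # A 2x2 max-pooling layer halves each spatial dimension
    return (h // 2, w // 2, c)

shape = (64, 64, 1)                        # input image (channels assumed)
shape = conv_same(shape[0], shape[1], 8)   # eight 3x3 filters -> (64, 64, 8)
shape = maxpool2(*shape)                   # -> (32, 32, 8)
shape = conv_same(shape[0], shape[1], 16)  # 16 filters -> (32, 32, 16)
shape = maxpool2(*shape)                   # -> (16, 16, 16)
flat = shape[0] * shape[1] * shape[2]      # flatten -> 4096 nodes
```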
In this experiment, instead of using a complex RNN architecture to extract features
from the text part (the questions), we used the bag-of-words technique to form a
fixed-length vector and a simple feedforward network to extract the features; refer
to Fig. 8, which represents the process. Here, we passed the bag-of-words vector
through two fully connected layers with the "tanh" activation function to obtain the
output. Both components were then merged using element-wise multiplication, as
discussed in the previous section.
Fig. 7 CNN—Experiment 2
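A minimal sketch of this text pipeline is shown below; the toy vocabulary, the 32-unit layer width, and the random weights are illustrative assumptions only:

```python
import numpy as np

def bag_of_words(question, vocab):
    # Fixed-length vector: count of each vocabulary word in the question
    tokens = question.lower().rstrip('?').split()
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def dense_tanh(x, W, b):
    # One fully connected layer with the "tanh" activation
    return np.tanh(W @ x + b)

vocab = ['does', 'this', 'image', 'contain', 'a', 'circle', 'not']  # toy vocabulary
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, len(vocab))), np.zeros(32)
W2, b2 = rng.normal(size=(32, 32)), np.zeros(32)

x = bag_of_words("Does this image not contain a circle?", vocab)
text_features = dense_tanh(dense_tanh(x, W1, b1), W2, b2)
```

The resulting feature vector would then be merged with the image features by element-wise multiplication, as in the previous section.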
30 A. Azade et al.
5.1 Experiment 1
From Figs. 9 and 10 we can see that the given image contains a few solid and rubber
shapes of different colors. For this image, we have the question "What number of
small rubber balls are there?". The ground-truth answer is 1, and our model also
predicts 1, which is correct.
5.2 Experiment 2
In the second experiment, we considered a simpler form of the CLEVR dataset and,
as explained in the methodology, used different models and variations of the
approach. In Fig. 11 we can see an image with the question "Does this image not
contain a circle?", for which our model predicted the correct answer, "No".
The gradual increase in accuracy with each epoch shows that the model is learning
at every step. Calculating accuracy for a VQA task is not fully objective because of
the open-ended nature of the questions. We achieved a training accuracy of 90.01%
and a test accuracy of 85.5% (Table 3), which is a decent result compared to the
existing methodologies [1]. These results were observed on the easy-VQA dataset.
6 Conclusion
References
1. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) Vqa: Visual
question answering. In: Proceedings of the IEEE international conference on computer vision,
pp 2425–2433
2. Dataset: https://visualqa.org/download.html
3. Yang Z, He X, Gao J, Deng L, Smola A (2016) Stacked attention networks for image question
answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 21–29
4. Yi K, Wu J, Gan C, Torralba A, Kohli P, Tenenbaum J (2018) Neural-symbolic vqa: disentan-
gling reasoning from vision and language understanding. Adv Neural Inf Process Syst 31
5. Liang J, Jiang L, Cao L, Li LJ, Hauptmann AG (2018) Focal visual-text attention for visual
question answering. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 6135–6143
6. Wu C, Liu J, Wang X, Li R (2019) Differential networks for visual question answering. Proc
AAAI Conf Artif Intell 33(01), 8997–9004. https://doi.org/10.1609/aaai.v33i01.33018997
7. Zheng Z, Wang W, Qi S, Zhu SC (2019) Reasoning visual dialogs with structural and partial
observations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 6669–6678
8. https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-
introduction-to-lstm/
9. https://towardsdatascience.com/review-mobilenetv2-light-weight-model-image-
classification-8febb490e61c
10. Liu Y, Zhang X, Huang F, Tang X, Li Z (2019) Visual question answering via attention-based
syntactic structure tree-LSTM. Appl Soft Comput 82, 105584. https://doi.org/10.1016/j.asoc.
2019.105584, https://www.sciencedirect.com/science/article/pii/S1568494619303643
11. Nisar R, Bhuva D, Chawan P (2019) Visual question answering using combination of LSTM
and CNN: a survey. e-ISSN 2395–0056
12. Kan C, Wang J, Chen L-C, Gao H, Xu W, Nevatia R (2015) ABC-CNN: an attention-based
convolutional neural network for visual question answering
13. Sharma N, Jain V, Mishra A (2018) An analysis of convolutional neural networks for image
classification. Procedia Comput Sci 132, 377–384. ISSN 1877-0509. https://doi.org/10.1016/
j.procs.2018.05.198, https://www.sciencedirect.com/science/article/pii/S1877050918309335
14. Staudemeyer RC, Morris ER (2019) Understanding LSTM–a tutorial into long short-term
memory recurrent neural networks. arXiv:1909.09586
Visual Question Answering Using Convolutional and Recurrent Neural Networks 33
15. Zabirul Islam M, Milon Islam M, Asraf A (2020) A combined deep CNN-LSTM network for
the detection of novel coronavirus (COVID-19) using X-ray images. Inform Med Unlocked
20, 100412. ISSN 2352-9148. https://doi.org/10.1016/j.imu.2020.100412
16. Boulila W, Ghandorh H, Ahmed Khan M, Ahmed F, Ahmad J (2021) A novel CNN-LSTM-
based approach to predict urban expansion. Ecol Inform 64. https://doi.org/10.1016/j.ecoinf.
2021.101325, https://www.sciencedirect.com/science/article/pii/S1574954121001163
Brain Tumor Segmentation Using Deep
Neural Networks: A Comparative Study
Pankaj Kumar Gautam, Rishabh Goyal, Udit Upadhyay, and Dinesh Naik
1 Introduction
In a survey conducted in 2020 in the USA, about 3,460 children under 15 years of
age were diagnosed with a brain tumor, along with around 24,530 adults [1]. Gliomas
are among the most common tumors; they are less threatening (lower grade) when life
expectancy is several years or more, and more threatening (higher grade) when it
is almost two years. One of the most common treatments for tumors is brain surgery.
Radiation and chemotherapy have also been used to regulate the growth of tumors
that cannot be removed through surgery. Detailed images of the brain can be obtained
using magnetic resonance imaging (MRI). Brain tumor segmentation from MRI can
significantly improve diagnostics, growth rate prediction, and treatment planning.
There are several categories of tumors, such as gliomas, glioblastomas, and menin-
giomas. Tumors such as meningiomas can be segmented easily, whereas the other
two are much harder to locate and segment [2]. Their scattered, poorly contrasted,
and extended arrangements make them challenging to segment. A further difficulty
in segmentation is that they can be present in any part of the brain with nearly
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 35
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_4
36 P. Kumar Gautam et al.
any size or shape. Depending on the type of MRI machine used, the identical tumor
cell may show different gray-scale values when diagnosed at different hospitals.
Three types of tissue form a healthy brain: white matter, gray matter, and
cerebro-spinal fluid [3]. Tumor image segmentation helps in determining the size,
position, and spread of the tumor [4]. Since glioblastomas are infiltrative, their
edges are usually blurred and tough to distinguish from normal brain tissue.
T1-contrasted (T1-C), T1, and T2 (spin-lattice and spin-spin relaxation,
respectively) pulse sequences are frequently utilized as a solution [5]. Each type
of brain tissue appears differently across these modalities.
Segmenting brain tumors using the 2-pathway CNN design has already been shown
to achieve reasonable accuracy and resilience [6, 7]. That research verified its
methodology on the BRATS 2013 and 2015 MRI scan datasets [7]. Previous
investigations also used encoder-decoder CNN designs based on autoencoder
architectures, attaching an additional path to the end of the encoder section to
recreate the actual scan image [8]. The purpose of adopting the autoencoder path
was to offer further guidance and regularization to the encoder section because the
size of the dataset was restricted [9]. In the past, VGG and ResNet designs were
used for transfer learning in medical applications such as electroencephalograms
(EEG). "EEG is a method of measuring brainwaves that have been often employed
in brain-computer interface (BCI) applications" [10].
In this research, brain tumors are segmented using two different CNN architectures.
Modern advances in convolutional neural network designs and learning methodologies,
including Max-out hidden nodes and Dropout regularization, were utilized in this
experiment. The BRATS-13 [11] dataset, downloaded from the SMIR repository, is
available for educational use and was used to compare our results with those of
previous work [6]. In pre-processing, the one percent highest and lowest intensity
levels were removed to normalize the data. The work then used CNNs to create a
novel 2-pathway model that learns local brain features, together with a two-stage
training technique that was observed to be critical in dealing with the imbalanced
distribution of labels for the target variable [6]. Traditional structured output
techniques were replaced with a unique cascaded design, which was both effective
and theoretically superior. For further implementation, we proposed a U-net model,
which has given extraordinary results in image segmentation [12].
The rest of the research is arranged as follows. Section 2 contains the methodology,
which presents two different approaches for the segmentation of brain tumors, i.e.,
Cascaded CNN and U-net. Section 3 presents empirical studies that include a
description of the data, the experimental setup, and the performance evaluation
metrics. Section 4 presents the visualization and result analysis, while Sect. 5
contains the conclusion of the research.
Brain Tumor Segmentation Using Deep Neural Networks: A Comparative Study 37
2 Methodology
This section presents the adopted methodology for finding the tumor from patients'
MRI scans using two different architectures based on convolutional neural networks
(CNNs): (A) Cascaded CNN [6] and (B) U-net [12]. First, we modeled the CNN
architecture based on the cascading approach and calculated the F1 score for all
three types of cascading architecture. Second, we modeled the U-net architecture
and calculated the dice score and dice loss for our segmented tumor output. Finally,
we compared both models based on their dice scores. Figure 1 represents the adopted
methodology of our research work. The research is divided into two parallel tracks
representing the two approaches described above, whose results were then compared
based on the F1 score and Dice loss.
The architecture includes two paths: a pathway with a large 13 × 13 receptive field
and one with a small 7 × 7 receptive field [6]. These paths are referred to as the
global and local pathways, respectively, as shown in Fig. 2. This architecture
predicts a pixel's label from two characteristics: (i) the visible features of the
area around the pixel, and (ii) the location of the patch. The structure of the two
pathways is as follows:
1. Local: the 1st layer has kernels of size (7, 7) with (4, 4) max-pooling, and the
   2nd layer has kernels of size (3, 3). Because of the limited neighborhood, the
   local path processes the finer visual details of the area around the pixel, as
   the kernel is smaller.
2. Global: the layer has kernels of size (13, 13) with Max-out applied and no
   max-pooling, giving (21, 21) feature maps in the global path.
Two layers were used in the local pathway to concatenate with the primary hidden
layer of the global pathway, with 3 × 3 kernels for the 2nd layer. This means that
the effective receptive field of features in the primary layer of each pathway is
the same, while the global pathway's parametrization models features in that same
region more flexibly. The union of the feature maps of these pathways is then
supplied to the final output layer, to which the "Softmax" activation is applied.
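A sketch of this two-pathway block in Keras follows. The 33 × 33 patch size, the filter counts, the extra stride-1 pooling that aligns both pathways at 21 × 21, and ReLU in place of Max-out are assumptions taken from the original design in [6], not details given in this text:

```python
from tensorflow.keras import layers, Input, Model

inp = Input(shape=(33, 33, 4))  # one patch with 4 MRI modality channels

# Local pathway: 7x7 convolution with 4x4 max-pooling, then a 3x3 layer
local = layers.Conv2D(64, (7, 7), activation='relu')(inp)    # -> 27x27
local = layers.MaxPooling2D((4, 4), strides=(1, 1))(local)   # -> 24x24
local = layers.Conv2D(64, (3, 3), activation='relu')(local)  # -> 22x22
local = layers.MaxPooling2D((2, 2), strides=(1, 1))(local)   # -> 21x21

# Global pathway: a single 13x13 convolution, no max-pooling
glob = layers.Conv2D(160, (13, 13), activation='relu')(inp)  # -> 21x21

# Union of the feature maps, fed to the softmax output layer (5 labels)
merged = layers.Concatenate()([local, glob])
out = layers.Conv2D(5, (21, 21), activation='softmax')(merged)
model = Model(inp, out)
```

With valid padding, the local pathway reaches 21 × 21 (33 → 27 → 24 → 22 → 21), matching the global pathway's 21 × 21 maps, so the two can be concatenated channel-wise before the output layer.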
The 2-Path CNN architecture was expanded using a cascade of CNN blocks. The
model utilizes the first CNN's output as additional input to the hidden layers of
the second CNN block.
This research implements three different cascading designs that add the first
convolutional neural network's results to distinct levels of the 2nd convolutional
neural network block, as described below [6]:
1. Input cascade CNN: the first CNN's output is directly applied to the second CNN
   (Fig. 3); its outputs are thus treated as additional MRI image channels of the
   input patch.
2. Local cascade CNN: in the second CNN, the work moves up a layer in the local
   route and concatenates at its primary hidden layer (Fig. 4).
3. Mean-Field cascade CNN: the work moves to the end of the second CNN and
   concatenates just before the output layer (Fig. 5). This method is similar to
   computations performed in conditional random fields using a single run of
   mean-field inference.
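The input cascade can be illustrated with a small NumPy sketch; the slice size, class count, and the random arrays standing in for real data are assumptions for illustration:

```python
import numpy as np

# Hypothetical shapes: a 2-D region with 4 modality channels and 5 tumor classes.
h, w, n_modalities, n_classes = 64, 64, 4, 5
rng = np.random.default_rng(0)
slice_4ch = rng.random((h, w, n_modalities))       # stacked MRI modalities
first_cnn_probs = rng.random((h, w, n_classes))    # stand-in for CNN #1 output

# Input cascade: treat the first CNN's per-pixel class probabilities as
# additional input channels for the second CNN.
cascade_input = np.concatenate([slice_4ch, first_cnn_probs], axis=-1)
# cascade_input.shape == (64, 64, 9): 4 modalities + 5 probability maps
```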
2.3 U-Net
The traditional convolutional neural network architecture helps us predict the tumor
class but cannot locate the tumor in an MRI scan precisely and effectively. By
applying segmentation, we can recognize where objects of distinct classes are
present in the image. U-net [13] is a convolutional neural network (CNN) modeled in
the shape of a "U", extending the traditional CNN architecture with some changes. It
was designed to semantically segment bio-medical images where the target is to
classify whether there is contagion or not, thus identifying the region of infection
or tumor.
A CNN learns the feature mapping of an image and works well for classification
problems, where an input image is converted into a vector used for classification.
In image segmentation, however, an image must be reproduced from this vector. While
transforming the image into a vector, we have already learned its feature mapping,
so we reuse the same feature maps from the contracting path to expand the vector
back into a segmented image. The U-net model consists of three sections: the
encoder, bottleneck, and decoder blocks, as shown in Fig. 6. The encoder is made of
many contraction layers; each layer takes an input and applies two 3 × 3
convolutions followed by a 2 × 2 max-pooling. The bottom-most (bottleneck) layer
connects the encoder and decoder blocks. Mirroring the encoder, each decoder layer
passes its input through two 3 × 3 convolutional layers, preceded by a 2 × 2
up-sampling layer.
To maintain symmetry, the number of feature maps is halved at each expansion step,
and the numbers of expansion and contraction blocks are the same. After that, the
final mapping passes through another 3 × 3 convolutional layer with the number of
feature maps equal to the number of segments.
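A minimal U-net along these lines can be sketched in Keras; the input size, filter counts, and two-level depth here are illustrative assumptions rather than the configuration used in the experiments:

```python
from tensorflow.keras import layers, Input, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in each contraction/expansion layer
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

inp = Input(shape=(128, 128, 4))          # 4 MRI modalities (size assumed)
# Encoder: contraction layers with 2x2 max-pooling
c1 = conv_block(inp, 32); p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 64);  p2 = layers.MaxPooling2D(2)(c2)
# Bottleneck between the encoder and decoder blocks
b = conv_block(p2, 128)
# Decoder: 2x2 up-sampling; skip connections reuse encoder feature maps,
# and the number of feature maps is halved at each step
u2 = layers.UpSampling2D(2)(b)
u2 = conv_block(layers.Concatenate()([u2, c2]), 64)
u1 = layers.UpSampling2D(2)(u2)
u1 = conv_block(layers.Concatenate()([u1, c1]), 32)
# Final 3x3 convolution with one feature map per segment class
out = layers.Conv2D(5, 3, padding='same', activation='softmax')(u1)
model = Model(inp, out)
```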
3 Empirical Studies
This section presents the dataset description, experimental setup, data pre-processing,
and metrics for performance evaluation.
3.1 Dataset
The BRATS-13 MRI dataset [11] was used for the research. It consists of actual
patient scans and synthetic scans created by SMIR (SICAS medical image repository).
Its size is around 1 GB, and it was stored in Google Drive for further use. Each
category, synthetic and real, contains MRI scans for high-grade gliomas (HG) and
low-grade gliomas (LG): there are 25 patients each with synthetic HG and LG scans,
20 patients with real HG scans, and 10 patients with real LG scans. The dataset
contains four modalities (types of scans): T1, T1-C, T2, and FLAIR. For each patient
and each modality, we get a 3-D image of the brain, and we concatenate these
modalities as four channels slice-wise. Figure 7 shows tumors along with their MRI
scans; we have used the 126th slice for representation. For HG, the dimensions are
(176, 216, 160). The gray-scale image represents the MRI scan, and the blue-colored
one represents the tumor for the respective MRI scan.
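The slice-wise stacking of the four modalities can be sketched as follows; the zero arrays standing in for the loaded volumes, and the assumption that slices run along the third dimension, are illustrative only:

```python
import numpy as np

# Stand-ins for one patient's loaded volumes, HG dims (176, 216, 160)
t1    = np.zeros((176, 216, 160))
t1c   = np.zeros((176, 216, 160))
t2    = np.zeros((176, 216, 160))
flair = np.zeros((176, 216, 160))

# Concatenate the modalities as four channels, slice-wise: each slice
# becomes a (176, 216, 4) multi-channel image.
volume = np.stack([t1, t1c, t2, flair], axis=-1)  # (176, 216, 160, 4)
slice_126 = volume[:, :, 126, :]                  # the 126th slice
```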
The research was carried out using Google Colab, which provides a free web interface
for running Jupyter notebooks. The "Pandas" and "Numpy" libraries were used for
data pre-processing, CNN models were imported from the "Keras" library for
segmentation, and "SkLearn" was used for measuring performance metrics such as the
F1 score (3) and Dice loss (4). The MRI scan data was first downloaded under the
academic agreement and then uploaded to Colab. For data pre-processing,
multiple pre-processing steps were applied to the dataset, as presented in the next
section. The data was split 70:30 into training and testing sets, respectively.
First, slices of MRI scans with no tumor information were removed from the original
dataset. This minimizes the dataset without affecting the segmentation results. Then
the one percent highest and lowest intensity levels were eliminated. Intensity
levels for the T1 and T1-C modalities were normalized using N4ITK bias field
correction [14]. Also, the image data is normalized in each input layer by
subtracting the average and then dividing by the standard deviation of the channel.
Batch normalization was used for the following reasons:
1. It speeds up training: it makes the optimization landscape much smoother,
   producing more predictable and consistent gradient behavior, allowing quicker
   training.
2. With Batch Norm, a much larger learning rate can be used to reach the minima,
   resulting in faster learning.
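The intensity clipping and per-channel standardization can be sketched as below; the exact percentile bounds (assumed here to be 1% and 99%) and the synthetic scan are illustrative:

```python
import numpy as np

def preprocess(channel, low=1.0, high=99.0):
    """Clip the 1% highest/lowest intensities, then standardize the channel."""
    lo, hi = np.percentile(channel, [low, high])
    clipped = np.clip(channel, lo, hi)
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)

rng = np.random.default_rng(42)
scan = rng.normal(loc=100.0, scale=20.0, size=(176, 216))  # synthetic slice
out = preprocess(scan)  # zero mean, unit standard deviation
```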
We used various performance metrics to compare the performance of both models:
Precision, Recall, F1-Score, and Dice Loss were selected as our performance
parameters. Precision (Pr) is the proportion of True Positives among all predicted positives.
Recall (Re) measures how well our model identifies True Positives. F1-Score is a
function of Recall and Precision, namely their harmonic mean; it takes both Recall
and Precision into account. Finally, accuracy is the fraction of predictions our
model got correct.
Precision(Pr) = TP / (FP + TP)    (1)

Recall(Re) = TP / (FN + TP)    (2)

F1 Score = 2 × (Pr × Re) / (Pr + Re)    (3)
where True Positive (TP) means that the actual and predicted labels correspond to
the same positive class, and True Negative (TN) means that they correspond to the
same negative class. False Positive (FP), also called the Type-I error, means that
the actual label belongs to a negative class while the predicted label belongs to a
positive class. False Negative (FN), or the Type-II error, means that the actual
label belongs to a positive class while the model predicted a negative class.
Lossdice = 2 × Σᵢ (pᵢ × gᵢ) / Σᵢ (pᵢ² + gᵢ²)    (4)
Also, the Dice loss (Lossdice) (4) measures how clean and meaningful the computed
boundaries are. Here, pᵢ and gᵢ represent pairs of corresponding pixel values of the
prediction and ground truth, respectively. Dice loss considers the loss of
information both locally and globally, which is critical for high accuracy.
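Equations (1)-(4) translate directly into code; the small binary masks below are toy data for illustration:

```python
import numpy as np

def precision(tp, fp):            # Eq. (1)
    return tp / (fp + tp)

def recall(tp, fn):               # Eq. (2)
    return tp / (fn + tp)

def f1_score(pr, re):             # Eq. (3)
    return 2 * pr * re / (pr + re)

def dice(p, g, eps=1e-8):
    """Overlap measure of Eq. (4) for binary masks p (predicted), g (truth)."""
    p, g = p.astype(float), g.astype(float)
    return 2 * (p * g).sum() / ((p**2 + g**2).sum() + eps)

pred  = np.array([[1, 1, 0], [0, 1, 0]])   # toy predicted mask
truth = np.array([[1, 0, 0], [0, 1, 1]])   # toy ground-truth mask
tp = ((pred == 1) & (truth == 1)).sum()    # 2
fp = ((pred == 1) & (truth == 0)).sum()    # 1
fn = ((pred == 0) & (truth == 1)).sum()    # 1
```

For these masks, precision, recall, F1, and the Dice measure all equal 2/3, since the masks share two of their three positive pixels.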
This section presents the results after running both the Cascaded and U-net
architectures.
The three cascading architectures were trained on 70% of the data using the
cross-entropy loss and the Adam optimizer. Testing was done on the remaining 30% of
the data, and the F1 score was then computed for all three types of cascading
architecture (as shown in Table 1). The F1 score of the Local cascade CNN is the highest for the
research; it is also very similar to the previous work done by [6]. For the Input
cascade and MF cascade, the model differs by around 4% from the previous work.
Figure 8a, b shows the segmentation results on two instances of test MRI scan
images. The segmented output was compared with the ground truth, and it was
concluded that the model obtained an accurate and precise boundary of the tumor
from the test MRI scan images.
4.2 U-Net
This deep neural network (DNN) architecture is modeled using the Dice loss, which
takes into account information loss both globally and locally and is essential for
high accuracy. The Dice measure varies over [0, 1], where 0 means that the segmented
output and the ground truth do not overlap at all, and 1 means that the segmented result
and the ground truth image fully overlap. We achieved a value of 0.6863 on our
testing data, which means most of our segmented output is similar to the ground
truth images in terms of boundaries and region.
Figure 9 shows the segmentation results on three random instances of test MRI scan
images. From left to right, we have the MRI scan, the ground truth image, the
segmented output from the Cascade CNN, and finally the segmented output from the
U-net model. The segmented output was compared to the ground truth, and the model
was capable of obtaining an accurate and precise boundary of the tumor from the
test MRI scan images.
From Fig. 9 it was concluded that the U-net model performs better than the Cascaded
architecture in terms of F1 score.
5 Conclusions
The research used convolutional neural networks (CNNs) to perform brain tumor
segmentation, examining two designs (Cascaded CNN and U-net) and analyzing their
performance. We tested our findings on the BRATS 2013 dataset, which contains
authentic patient images and synthetic images created by SMIR. Significant
performance was produced using a novel 2-pathway model (which can represent both
local features and global context), extending it to three different cascading models
that represent local label dependencies by stacking two convolutional neural
networks. Two-phase training was followed, which allowed us to model the CNNs
efficiently when the label distribution is unbalanced. The model using the cascading
architecture reproduced results close to those of the base paper in terms of F1
score. We also concluded that the Local cascade CNN performs better than the Input
and MF cascade CNNs. Finally, the research compared the F1 scores of the cascaded
architecture and the U-net model, and it was concluded that the overall performance
of the semantic-segmentation model, U-net, is better than that of the cascaded
architecture. The Dice measure for the U-net was 0.6863, which indicates that our
model produces segmented images very similar to the ground truth images.
References
1. ASCO: Brain tumor: Statistics (2021) Accessed 10 Nov 2021 from https://www.cancer.net/
cancer-types/brain-tumor/statistics
2. Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, Davatzikos C (2009)
Classification of brain tumor type and grade using mri texture and shape in a machine learning
scheme. Magn Reson Med: Off J Int Soc Magn Reson Med 62(6):1609–1618
3. 3T How To: Structural MRI Imaging—Center for Functional MRI - UC San Diego. Accessed
10 Nov 2021 from https://cfmriweb.ucsd.edu/Howto/3T/structure.html
4. Rajasekaran KA, Gounder CC (2018) Advanced brain tumour segmentation from mri images.
High-Resolut Neuroimaging: Basic Phys Princ Clin Appl 83
5. Lin X, Zhan H, Li H, Huang Y, Chen Z (2020) Nmr relaxation measurements on complex
samples based on real-time pure shift techniques. Molecules 25(3):473
6. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin PM,
Larochelle H (2017) Brain tumor segmentation with deep neural networks. Med Image Anal
35, 18–31
7. Razzak MI, Imran M, Xu G (2018) Efficient brain tumor segmentation with multiscale two-
pathway-group conventional neural networks. IEEE J Biomed Health Inform 23(5):1911–1919
8. Myronenko, A (2018) 3d mri brain tumor segmentation using autoencoder regularization. In:
International MICCAI brainlesion workshop. Springer, Berlin, pp 311–320
9. Aboussaleh I, Riffi J, Mahraz AM, Tairi H (2021) Brain tumor segmentation based on deep
learning’s feature representation. J Imaging 7(12):269
10. Singh D, Singh S (2020) Realising transfer learning through convolutional neural network and
support vector machine for mental task classification. Electron Lett 56(25):1375–1378
11. SMIR: Brats—sicas medical image repository (2013) Accessed 10 Nov 2021 from https://
www.smir.ch/BRATS/Start2013
12. Yang T, Song J (2018) An automatic brain tumor image segmentation method based on the
u-net. In: 2018 IEEE 4th international conference on computer and communications (ICCC).
IEEE, pp 1600–1604
13. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image
segmentation. In: Medical image computing and computer-assisted intervention (MICCAI).
LNCS, vol 9351, pp 234–241. Springer, Berlin
14. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC (2010) N4itk:
improved n3 bias correction. IEEE Trans Med Imaging 29(6), 1310–1320
Predicting Bangladesh Life Expectancy
Using Multiple Depend Features
and Regression Models
1 Introduction
The term "life expectancy" refers to how long a person can expect to live on average
[1]. Life expectancy is a measurement of a person's projected average lifespan,
computed using a variety of factors such as the year of birth, current age, and
demographic sex. A person's life expectancy is determined by their surroundings,
where "surroundings" refers to the entire social system, not just society. In this
study, our target area is the average life expectancy in Bangladesh, a nation in
South Asia where the average life expectancy is 72.59 years. Research suggests that
average life expectancy depends on lifestyle, economic status (GDP), healthcare,
diet, primary education, and population. The death rate at present is indeed lower
than in the past, and a main reason is the environment. Lifestyle and primary
education are among the many environmental factors, and lifestyle depends on
primary education: a person who does not receive primary education will not be able
to be health conscious, which can lead to premature death from damage to their
health and thus affects the average life expectancy of the whole country. Indeed,
the medical system was not good before, so it is said that both the baby and the
mother might die during childbirth. Many people have died because they did not know
what medicine to take, or how much, because they lacked the right knowledge and
primary education. It is through this elementary education that economic status
(GDP) and population developed. The average lifespan varies
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 47
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_5
48 F. Tuj Jannat et al.
from generation to generation, and we are all aware that our life expectancy is
increasing year after year. Since its independence in 1971, Bangladesh, a poor
nation in South Asia, has achieved significant progress in terms of health outcomes.
The economic sector has expanded, and the late twentieth century brought many
positive developments, with ramifications all around the globe.
In this paper, we used several features to measure life expectancy: GDP, rural
population growth (%), urban population growth (%), services value, industry value,
food production, permanent cropland (%), cereal production (metric tons), and
agriculture, forestry, and fishing value (%). We measure the impact of these
dependent features in predicting life expectancy and use various regression models
to find the most accurate model for predicting the life expectancy of Bangladesh
from these features. This will assist us in determining which features aid in
increasing life expectancy. This research helps a country increase the value of its
features related to life expectancy and also identifies which regression model
performs best for predicting life expectancy.
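A sketch of such a model comparison with scikit-learn follows; the synthetic data, the particular regressors, and the R² metric are illustrative assumptions, since this text does not yet specify the concrete models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the real features (GDP, population growth, etc.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                 # 9 dependent features
true_w = rng.normal(size=9)
y = 72.59 + X @ true_w + rng.normal(scale=0.1, size=200)  # life expectancy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    'linear': LinearRegression(),
    'random_forest': RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
best = max(scores, key=scores.get)  # regression model with the highest R^2
```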
2 Literature Review
Several studies on life expectancy have previously been produced by different
researchers. As part of the literature review, we report a few past studies to
understand the previously identified factors.
Beeksma et al. [2] obtained data from seven different healthcare facilities in
Nijmegen, the Netherlands, with a dataset of 33,509 EMRs. The accuracy of their
model was 29%. While clinicians overestimated life expectancy in 63% of erroneous
prognoses, causing delays in receiving adequate end-of-life care, their keyword
model overestimated life expectancy in only 31% of inaccurate prognoses. Another
study, by Nigri et al. [3], worked on recurrent neural networks with long short-term
memory, a new technique for projecting life expectancy and measuring lifespan
discrepancy. Their projections appeared consistent with past patterns and offered a
more realistic picture of future life expectancy and disparities. The LSTM model
was applied alongside the ARIMA, DG, Lee-Carter, CoDa, and VAR models. It was shown
that both separate and simultaneous projections of life expectancy and lifespan
disparity give fresh insights for a thorough examination of mortality forecasts,
constituting a valuable technique to identify irregular death trajectories. The
development of the age-at-death distribution assumes more compressed tails with
time, indicating a decrease in longevity difference across
industrialized nations. Khan et al. [4] analyzed gender disparities in disability
incidence and disability-free life expectancy (DFLE) among Bangladeshi senior
citizens. They utilized data from a nationwide survey that included 4,189 senior
people aged 60 and above, and they employed the Sullivan technique. They collected
Predicting Bangladesh Life Expectancy Using Multiple Depend … 49
the data from the Bangladeshi household income and expenditure survey (HIES)-2010,
a large nationwide survey conducted by the BBS; the data-collecting procedure was a
year-long program. A total of 12,240 households was chosen, 7,840 from rural regions
and 4,400 from urban areas, and all members of the chosen households, 55,580 people
in total, were surveyed. They discovered that at the age of 70, both men and women
can expect to spend more than half of their remaining lives disabled, with
significant consequences for the likelihood of disability and the requirement for
long-term care services. The study also has limitations: to begin with, its data is
self-reported, and, due to a lack of solid demographic factors, the institutionalized
population was not taken into consideration. The number of senior individuals living
in institutions is small, and they have the same health problems and impairments as
the elderly in the general population.
Tareque et al. [5] explored the link between life expectancy and disability-free
life expectancy (DFLE) in the Rajshahi District of Bangladesh by investigating the
connections between the Active Aging Index (AAI) and DFLE. Data were obtained
during April 2009 from the Socio-Demographic status of the aged population and
elderly abuse study project. They discovered that urban, educated, older men are
more engaged in all parts of life and have a longer DFLE. In rural regions, 93% of older respondents lived with family members, with 45.9% in nuclear families and 54.1% in joint families; in urban regions, 23.4% were in nuclear families and 76.6% in joint families. The study faces limitations in terms of several key indicators, such as the types and duration of physical activity.
For a post-childhood life table, Preston and Bennett's (1983) estimation technique was
used. Because related data was not available, the institutionalized population was
not examined. Tareque et al. [6] utilized multiple linear regression models as well as the Sullivan technique. They based their findings on the World Values
Survey, which was performed between 1996 and 2002 among people aged 15 and
above. They discovered that between 1996 and 2002, people’s perceptions of their
health improved. Males predicted fewer life years spent in excellent SRH in 2002
than females, but a higher proportion of their expected lives were spent in good
SRH. The study has certain limitations, such as the sample size being small, and the
institutionalized population was not included in the HLE calculation. The subjective
character of SRH, as opposed to health assessments based on medical diagnoses,
may have resulted in gender bias in the results. In 2002, the response category “very poor” was missing from the SRH survey, and healthy persons may have been overrepresented. Tareque et al. [6] investigated how many years older
individuals expect to remain in excellent health, as well as the factors that influence
self-reported health (SRH). By integrating SRH, they proposed a link between LE and
HLE. The project’s brief said that it was socioeconomic and demographic research of
Rajshahi district’s elderly population (60 years and over). They employed Sullivan’s
approach for solving the problem. For their work, SRH was utilized to estimate HLE.
They discovered that as people became older, LE and anticipated life in both poor
and good health declined. Individuals in their 60s were anticipated to be in excellent health for approximately 40% of their remaining lives, but those in their 80s projected
just 21% of their remaining lives to be in good health, and their restrictions were
50 F. Tuj Jannat et al.
more severe. The sample size is small, and it comes from only one district, Rajshahi;
it is not indicative of the entire country. As a result, generalizing the findings of this
study to the entire country of Bangladesh should be approached with caution. The
institutionalized population was not factored into the HLE calculation.
Ho et al. [7] examined whether decreases in life expectancy occurred across 18 high-income countries from 2014 to 2016. They conducted a demographic study based on aggregated data from the WHO mortality database, augmented with data from Statistics Canada and Statistics Portugal, and used Arriaga’s decomposition approach to attribute causes of death to changes in life expectancy between 2014 and 2015. They discovered that in the years 2014–15, life
expectancy fell across the board in high-income nations. Women’s life expectancy
fell in 12 of the 18 nations studied, while men’s life expectancy fell in 11 of them.
They also have certain flaws, such as the underreporting of influenza and pneu-
monia on death certificates, the issue of linked causes of death, often known as the
competing hazards dilemma, and the comparability of the cause of death coding
between nations. Meshram et al. [8] applied Linear Regression, Decision Tree, and Random Forest Regressor to compare life expectancy between developed and developing nations. The Random Forest Regressor was chosen for the
construction of the life expectancy prediction model because it had R2 scores of 0.99
and 0.95 on training and testing data, respectively, as well as Mean Squared Error
and Mean Absolute Error of 4.43 and 1.58. The analysis is based on HIV or AIDS,
Adult Mortality, and Healthcare Expenditure, as these are the key aspects indicated
by the model. This suggests that India has a higher adult mortality rate than other
affluent countries due to its low healthcare spending.
Matsuo et al. [9] investigate survival predictions using clinic laboratory data in
women with recurrent cervical cancer, as well as the efficacy of a new analytic technique based on deep-learning neural networks. Alam et al. [10], using annual data from 1972 to 2013, investigated the impact of financial development on Bangladesh’s significant growth in life expectancy. The unit root properties of the variables are examined using a structural break unit root test. In their literature review, they mention some studies on the effects of trade openness and foreign direct investment on life expectancy. Furthermore, the empirical findings support the presence of cointegration in the long-run associations. Income
disparity appears to reduce life expectancy in the long run, according to the long-run
elasticities. Finally, their results provide policymakers with fresh information that is
critical to improving Bangladesh’s life expectancy. Husain et al. [11] conducted a
multivariate cross-national study of national life expectancy factors. Linear and log-linear regression models were employed. The data on explanatory factors come from UNDP, World Bank, and Rudolf’s yearly statistics releases (1981). The findings show that if adequate attention is paid to fertility reduction
and boosting calorie intake, life expectancies in poor nations may be considerably
enhanced.
3 Proposed Methodology
In any research project, we must complete numerous key stages, including data collection, data preparation, picking an appropriate model, implementing it, calculating errors, and producing output. To achieve our aim, we use the step-by-step working technique illustrated in Fig. 1.
Preprocessing, which includes data cleaning and standardization, noisy data filtering, and management of missing information, is necessary before machine learning can be done. Any data analysis will succeed only if there is enough relevant data. The information was gathered from Trends Economics; the dataset contained data from 1960 to 2020. We combined all of the factors that are linked to Bangladesh’s life expectancy and replaced the null values with the mean values. We examined the relationship where GDP, Rural Population Growth (%), Urban Population Growth (%), Services Value, Industry Value, Food Production, Permanent Cropland (%), Cereal Production (metric tons), and Agriculture, Forestry, and Fishing Value (%) were the independent features and Life Expectancy (LE) was the target variable. We split the data into two subsets to develop and test the model: 20% of the data was used for testing, and the remaining 80% was used for training.
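The preprocessing steps above (mean imputation of null values, then an 80/20 split) can be sketched in plain Python. The column values and the fixed random seed below are illustrative assumptions, not the actual dataset:

```python
import random

def impute_mean(column):
    """Replace null (None) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def split_80_20(rows, seed=42):
    """Shuffle the rows and split them into 80% training / 20% testing subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

gdp = [4.27, None, 5.10, 6.80]        # hypothetical series with one null value
filled = impute_mean(gdp)             # None replaced by the mean of the rest
train, test = split_80_20(range(60))  # 60 yearly rows -> 48 training, 12 test
print(len(train), len(test))          # 48 12
```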
Multiple Linear Regression (MLR): A statistical strategy [12] for predicting the
outcome of a variable using the values of two or more variables is known as multiple
linear regression. Multiple regression is an extension of simple linear regression. The dependent variable is the one we’re trying to forecast, and the independent or explanatory variables are employed to predict its value. In the case
of multiple linear regression, the formula is as follows in “(1)”.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon \quad (1)$$
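To make equation (1) concrete, the coefficients β0 … βn can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python on hypothetical data; a real analysis would typically use a library such as scikit-learn instead.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Y = b0 + b1*x1 + ... + bn*xn + e,
    solved via the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination and partial pivoting."""
    rows = [[1.0] + list(x) for x in X]   # prepend intercept column
    n = len(rows[0])
    # Build the augmented normal-equation system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] +
         [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]  # [b0, b1, ..., bn]

# Points lying exactly on the plane y = 1 + 2*x1 + 3*x2 should be recovered.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(fit_mlr(X, y))   # approximately [1.0, 2.0, 3.0]
```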
On the basis of their prediction, error, and accuracy, the estimated models are
compared and contrasted.
Mean Absolute Error (MAE): The MAE is a measure for evaluating regression
models. The MAE of a model concerning the test set is the mean of all individual
prediction errors on all instances in the test set. For each instance, the prediction error is the difference between the true and the predicted value. The formula is given in “(2)”.
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - \hat{A}_i\right| \quad (2)$$
Mean Squared Error (MSE): The MSE shows us how close we are to a collection
of points. By squaring the distances between the points and the regression line, it
achieves this. Squaring is required to eliminate negative signs. Errors of greater magnitude are also given more weight. The fact that we are computing
the average of a series of errors gives the mean squared error its name. The better
the prediction, the smaller the MSE. The following is the formula in “(3)”.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Prediction}_i\right)^2 \quad (3)$$
Root Mean Square Error (RMSE): The RMSE measures the distance between data points and the regression line; it is a measure of how spread out these residuals are. The following is the formula in “(4)”.
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Prediction}_i\right)^2} \quad (4)$$
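The three error measures in equations (2)–(4) can be computed directly. The sketch below uses hypothetical actual/predicted values, not the paper's results:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|, as in Eq. (2)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average squared difference, as in Eq. (3)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE, as in Eq. (4)."""
    return math.sqrt(mse(actual, predicted))

actual = [70, 72, 74]      # hypothetical life-expectancy values
predicted = [69, 72, 76]   # hypothetical model outputs
print(mae(actual, predicted))   # 1.0
print(mse(actual, predicted))
print(rmse(actual, predicted))
```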
Figure 3 shows that the value of GDP has risen steadily over time: GDP was 4,274,893,913.5 in 1960, whereas it reached 353,000,000,000 in 2020. The
two factors of life expectancy and GDP are inextricably linked. The bigger the GDP,
the higher the standard of living will be. As a result, the average life expectancy
may rise. Life expectancy is also influenced by service value and industry value.
The greater the service and industry values are, the better the quality of life will
be. As can be seen, service value and industry value have increased significantly year after year; according to the most recent update in 2020, the service value stood at 5,460,000,000,000 and the industry value at 7,540,000,000,000, which has a positive impact on daily life. Food production
influences life expectancy and quality of life. Our level of living will improve if our
food production is good, and this will have a positive influence on life expectancy.
From 1990 to 2020, food production ranged between 26.13 and 109.07. Agriculture,
forestry, and fishing value percent are also shortly involved with life expectancy.
Fig. 4 Population growth of a Urban Area (%) and b Rural Area (%)
Figure 4a, b shows there are two types of population growth: rural and urban. In the 1990s, the urban population growth percentage was higher than the rural one; year by year, rural population growth decreased while urban population growth increased. The level of living improves as more people move to the city.
Figure 2 shows that life expectancy and rural population growth have a negative relationship. We can see how these characteristics are intertwined with life expectancy and influence how we live our lives; their values have fluctuated over time, increasing at times and decreasing at others. We dropped Rural Population Growth and Agriculture, Forestry, and Fishing Value, as they had negative or weak correlations with life expectancy.
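A sketch of this correlation screening, using a hand-rolled Pearson correlation on hypothetical values (the feature series and the 0.3 threshold are illustrative assumptions):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.3):
    """Keep features whose correlation with the target is at least
    `threshold`; drop negatively or weakly correlated ones."""
    kept, dropped = {}, {}
    for name, series in features.items():
        r = pearson(series, target)
        (kept if r >= threshold else dropped)[name] = round(r, 3)
    return kept, dropped

life_exp = [55, 58, 62, 66, 70]                    # hypothetical values
features = {
    "GDP": [5, 9, 14, 20, 27],                     # rises with life expectancy
    "Rural Population Growth (%)": [3.0, 2.6, 2.1, 1.5, 1.0],  # falls with it
}
kept, dropped = screen_features(features, life_exp)
print(kept)     # GDP is kept
print(dropped)  # Rural Population Growth is dropped (negative correlation)
```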
Table 1 shows that we utilized eight different regression models to determine which models are the most accurate. Among all the models, the Extreme Gradient Boosting Regressor has the best accuracy and the least error: it was 99% accurate. The accuracy of K-Neighbors, Random Forest, and Stacking Regressor was about 94%; among them, the Stacking Regressor had a slightly higher accuracy. We utilized three models for the stacking regressor, K-Neighbors, Gradient Boosting, and Random Forest Regressor, with Random Forest as the meta regressor. Among all the models, the Decision Tree has the lowest accuracy at 79%. With 96% accuracy, the Gradient Boosting Regressor comes in second, followed by 88% and 87% for Multiple Linear Regression and the Light Gradient Boosting Machine Regressor, respectively.
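A sketch of this stacking setup using scikit-learn (assumed available), with K-Neighbors, Gradient Boosting, and Random Forest as base learners and Random Forest as the meta regressor; the data are synthetic stand-ins, so the score will not match Table 1:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))           # stand-in predictors
y = 50 + 2 * X[:, 0] + 0.5 * X[:, 1] - X[:, 2]  # stand-in life expectancy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("gbr", GradientBoostingRegressor(random_state=1)),
        ("rf", RandomForestRegressor(random_state=1)),
    ],
    final_estimator=RandomForestRegressor(random_state=1),  # meta regressor
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 on the held-out 20%
```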
The term “life expectancy” refers to the average amount of time a person can expect to live; it is a measure of a person’s projected average lifespan. Life expectancy is calculated using a variety of factors such as the year
of birth, current age, and demographic sex. Figure 5 shows the accuracy among all
the models. The Extreme Gradient Boosting Regressor has the best accuracy.
Table 1 Error and accuracy comparison between all the regressor models
Models MAE MSE RMSE ACCURACY
Multiple linear regression 1.46 8.82 2.97 88.07%
K-Neighbors regressor 0.96 4.17 2.04 94.35%
Decision tree regressor 2.63 15.30 3.91 79.32%
Random forest regressor 1.06 4.28 2.06 94.21%
Stacking regressor 1.02 3.90 1.97 94.72%
Gradient boosting regressor 0.94 2.43 1.55 96.71%
Extreme gradient boosting regressor 0.58 0.44 0.66 99.39%
Light gradient boosting machine regressor 2.62 9.57 3.09 87.06%
References
1. Rubi MA, Bijoy HI, Bitto AK (2021) Life expectancy prediction based on GDP and population
size of Bangladesh using multiple linear regression and ANN model. In: 2021 12th international
conference on computing communication and networking technologies (ICCCNT), pp 1–6.
https://doi.org/10.1109/ICCCNT51525.2021.9579594.
2. Beeksma M, Verberne S, van den Bosch A, Das E, Hendrickx I, Groenewoud S (2019)
Predicting life expectancy with a long short-term memory recurrent neural network using
electronic medical records. BMC Med Informat Decision Making 19(1):1–15.
3. Nigri A, Levantesi S, Marino M (2021) Life expectancy and lifespan disparity forecasting: a
long short-term memory approach. Scand Actuar J 2021(2):110–133
4. Khan HR, Asaduzzaman M (2007) Literate life expectancy in Bangladesh: a new approach of
social indicator. J Data Sci 5:131–142.
5. Tareque MI, Hoque N, Islam TM, Kawahara K, Sugawa M (2013) Relationships between the
active aging index and disability-free life expectancy: a case study in the Rajshahi district of
Bangladesh. Canadian J Aging/La Revue Canadienne du vieillissement 32(4):417–432
6. Tareque MI, Islam TM, Kawahara K, Sugawa M, Saito Y (2015) Healthy life expectancy and
the correlates of self-rated health in an ageing population in Rajshahi district of Bangladesh.
Ageing & Society 35(5):1075–1094
7. Ho JY, Hendi AS (2018) Recent trends in life expectancy across high income countries: retrospective observational study. BMJ 362
8. Meshram SS (2020) Comparative analysis of life expectancy between developed and devel-
oping countries using machine learning. In 2020 IEEE Bombay Section Signature Conference
(IBSSC), pp 6–10. IEEE
9. Matsuo K, Purushotham S, Moeini A, Li G, Machida H, Liu Y, Roman LD (2017) A pilot
study in using deep learning to predict limited life expectancy in women with recurrent cervical
cancer. Am J Obstet Gynecol 217(6):703–705
10. Alam MS, Islam MS, Shahzad SJ, Bilal S (2021) Rapid rise of life expectancy in Bangladesh:
does financial development matter? Int J Finance Econom 26(4):4918–4931
11. Husain AR (2002) Life expectancy in developing countries: a cross-section analysis. Bangladesh Dev Stud 28(1/2):161–178
12. Choubin B, Khalighi-Sigaroodi S, Malekian A, Kişi Ö (2016) Multiple linear regression,
multi-layer perceptron network and adaptive neuro-fuzzy inference system for forecasting
precipitation based on large-scale climate signals. Hydrol Sci J 61(6):1001–1009
13. Kramer O (2013) K-nearest neighbors. In Dimensionality reduction with unsupervised nearest
neighbors, pp 13–23. Springer, Berlin, Heidelberg
14. Joshi N, Singh G, Kumar S, Jain R, Nagrath P (2020) Airline prices analysis and prediction
using decision tree regressor. In: Batra U, Roy N, Panda B (eds) Data science and analytics.
REDSET 2019. Communications in computer and information science, vol 1229. Springer,
Singapore. https://doi.org/10.1007/978-981-15-5827-6_15
A Data-Driven Approach to Forecasting
Bangladesh Next-Generation Economy
1 Introduction
Although Bangladesh ranks 92nd in terms of landmass, it now ranks 8th in terms of population, showing that Bangladesh is one of the world’s most populous countries. After a nine-month-long and deadly war of freedom waged in 1971 under the leadership of Bangabandhu Sheikh Mujibur Rahman, the Father of the Nation, Bangladesh was recognized as an independent sovereign country. But Bangladesh,
despite being a populous country, remains far behind the wealthy countries of the
world [1] and the developed world, particularly in economic terms. Bangladesh
is a developing country with a primarily agricultural economy. According to the
United Nations, it is a least developed country. Bangladesh’s per capita income was US$1,259.92 in March 2016 and increased to US$2,084 per capita by August 2020. However, relative to our population, this is far too low. The economy of
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 59
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_6
60 Md. M. H. Shohug et al.
2 Literature Review
Many papers, articles, and research projects focus on text categorization, text recog-
nition, and categories, while some focus on particular points. Here are some of the
work reviews that have been provided.
Hassan et al. [5] used the Box-Jenkins method to develop an ARIMA model for the Sudan GDP from 1960 to 2018 and to evaluate alternative orderings of the autoregressive and moving average portions. The four phases of the Box-Jenkins technique were performed to produce an adequate ARIMA model. They used MLE to estimate the model.
From the fiscal year 1972 to the fiscal year 2010, Anam et al. [6] provide a time series model based on agriculture’s contribution to GDP. In this investigation, they found the ARIMA (1, 2, 1) model to be a useful method for estimating Bangladesh’s annual GDP growth rate. From 1972 to 2013, Sultana et al. [7] used
univariate analysis to time series data on annual rice mass production in Bangladesh.
The motivation of this study was to analyze the factors that influence the behavior
of ARIMA and ANN. The backpropagation approach was used to create a simple ANN model with an acceptable number of nodes or neurons in a single hidden layer, a variable threshold value, and a learning value [8]. The values of RMSE, MAE, and MAPE are used. The findings revealed that the ANN’s estimated error is significantly
A Data-Driven Approach to Forecasting Bangladesh Next-Generation Economy 61
larger than the selected ARIMA’s estimated error. In this article, they considered the
ARIMA model and the ANN model using univariate data.
Wang et al. [9] used Shenzhen GDP for time series analysis, and the methodology shows that the ARIMA method created using the Box-Jenkins technique has greater predictive validity. The ARIMA (3, 3, 5) model developed in this work better captures the pattern of economic evolution and is employed to forecast the Shenzhen GDP over the medium and long term. In light of Bangladesh’s GDP data from 1960 to 2017,
Miah et al. [10] developed and forecasted with an ARIMA method. The model used was an autoregressive integrated moving average, ARIMA (1, 2, 1). The residual diagnostics included a correlogram, Q-statistic, histogram, and normality test. For stability testing, they used the Chow test. In Bangladesh, Awal et al. [11] developed
an ARIMA model for predicting short-term rice yields. According to the review, the best-fitted models for short-run forecasting of Aus, Aman, and Boro rice production were ARIMA (4,1,1), ARIMA (2,1,1), and ARIMA (2,2,3), respectively. Abonazel
et al. [12] used the Box-Jenkins approach to create a plausible ARIMA technique for the Egyptian yearly GDP. The World Bank provided yearly GDP statistics for Egypt from 1965 to 2016. They show that the ARIMA (1, 2, 1) method is superior for estimating Egyptian GDP. Lastly, using the fitted ARIMA technique, Egypt’s GDP was projected over the next ten years.
From 2008–09 to 2012–13, Rahman et al. [13] used the ARIMA technique to predict the Boro rice harvest in Bangladesh. The ARIMA (0,1,2) model was shown to be excellent for regional, current, and total Boro rice production, respectively.
Voumik et al. [14] looked at annual statistics for Bangladesh from 1972 to 2019 and used the ARIMA method to estimate future GDP per capita. According to the ADF, PP, and KPSS tests, ARIMA (0, 2, 1) is the best model for estimating Bangladeshi GDP per capita, and they used it to estimate Bangladesh’s GDP per capita for the following 10 years. The use of ARIMA modeling techniques on the Nigerian Gross Domestic Product between 1980 and 2007 is depicted in the research study by Fatoki et al. [15]. Zakai et al. [16] examine
the quality of the International Monetary Fund’s (IMF) annual GDP statistics for
Pakistan from 1953 to 2012. To display the GDP, a number of ARIMA methods are
created using the Box-Jenkins approach. They discovered that by using the expert modeler technique and the best-fit model, they were able to achieve ARIMA (1,1,0). Finally, using the best-fit ARIMA model, forecast values for the next several years were obtained. According to their findings, in-sample estimates were produced from 1953 to 2009, and visual representation of the predicted values revealed appropriate behavior.
To demonstrate and evaluate GDP growth rates in Bangladesh’s economy, Voumik
et al. [17] used the time series methods ARIMA and the method of exponen-
tial smoothing. World Development Indicators (WDI), a World Bank subsidiary,
compiled the data over a 37-year period. The Phillips-Perron (PP) and Augmented Dickey-Fuller (ADF) tests were used to examine the stationarity of the series. Smoothing measures were used to estimate the rate of GDP growth. Furthermore, the triple exponential model outperformed all other Exponential Smoothing models in
terms of the lowest Sum of Square Error (SSE) and Root Mean Square Error (RMSE).
Khan et al. [18] note that a time series model can assess the value added by economic hypotheses relative to the pure predictive capacity of the variable’s own past behavior; continuous improvements in time series analysis suggest that more recent time series techniques might impart more precise benchmarks for economic methods. From the fiscal years 1979–1980 to 2011–2012, the characteristics of annual data on industry’s contribution to GDP were examined. They used two strategies on their data set: Holt’s linear smoothing technique and the Auto-Regressive Integrated Moving Average (ARIMA).
3 Methodology
The main goal of our research is to develop a model to forecast the Gross Domestic Product (GDP) of Bangladesh. The theory relevant to our proposed model is given below.
Autoregressive Model (AR): The AR model stands for the autoregressive model. An autoregressive model is created when a value from a time series is regressed on earlier values of the same series. The order of the model is denoted by the letter “p”; the notation AR(p) indicates an autoregressive model of order p. The AR(p) model is depicted in “(1)”.

$$Y_t = \sigma_0 + \sigma_1 Y_{t-1} + \sigma_2 Y_{t-2} + \cdots + \sigma_p Y_{t-p} + \alpha_t \quad (1)$$

where σ0, σ1, …, σp are the parameters of the model and αt is the white noise error term.
Autoregressive Integrated Moving Average Model (ARIMA): The Autoregressive Integrated Moving Average model [19, 20] is abbreviated as ARIMA. This type of model can capture a variety of common temporal structures in time series data. ARIMA models are statistical models that are used to analyze and forecast time series data. In the model, each of these elements is explicitly specified as a parameter. ARIMA (p, d, q) is the standard notation in which the parameters are replaced by numerical
values in order to identify the ARIMA model. We may suppose that the ARIMA (p, 1, q) model is determined by the condition in “(3)”. In this equation, Y′t is a combination of the AR equation (1) and the MA equation (2); therefore, Y′t = Yt − Yt−1, i.e., to account for a linear trend in the data, a first difference might be utilized.
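First differencing, Y′t = Yt − Yt−1, is simple to compute; a minimal sketch on a hypothetical trending series:

```python
def first_difference(series):
    """Y'_t = Y_t - Y_{t-1}: removes a linear trend so the d = 1 part of
    ARIMA works with a series that is closer to stationary."""
    return [b - a for a, b in zip(series, series[1:])]

gdp_growth = [1, 3, 6, 10]            # hypothetical trending series
print(first_difference(gdp_growth))   # [2, 3, 4]
```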
Seasonal Autoregressive Integrated Moving Average Exogenous Model: SARIMAX stands for Seasonal Autoregressive Integrated Moving Average Exogenous model. The SARIMAX method is created by extending the ARIMA technique. This method has a seasonal component. As we’ve shown, ARIMA can make a non-stationary time series stationary by differencing away the trend. By removing trends and seasonal irregularities, the SARIMAX model is able to handle a non-stationary time series. SARIMAX extends the model with the seasonal parameters (P, D, Q, s). They are described as follows:
P This denotes the autoregressive seasonality’s order.
D This is the seasonal differentiation order.
Q This is the seasonality order of the moving average.
s This is mainly defining our season’s number of periods.
Akaike Information Criterion (AIC): The Akaike Information Criterion (AIC) permits us to examine how well our model fits the data set without overfitting it. The AIC score rewards models that achieve a high goodness-of-fit and penalizes them if they become overly complex. On its own, the AIC score isn’t very useful unless we compare it with the AIC score of a competing time series model. The model with the lower AIC score strikes the best balance between its capacity to fit the data set and its capacity to avoid overfitting it. The formula of the AIC value is:
$$\text{AIC} = 2m - 2\ln(\delta) \quad (4)$$

Here, m is the number of model parameters and δ = δ(θ̂) is the maximum value of the likelihood function of the model, where θ̂ is the maximum likelihood estimate.
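Equation (4) is straightforward to evaluate once the maximized log-likelihood ln(δ) is known; a minimal sketch comparing two hypothetical candidate models (the numbers are invented):

```python
def aic(m, log_likelihood):
    """AIC = 2m - 2 ln(delta), where m is the number of model parameters
    and log_likelihood is ln(delta), the maximized log-likelihood."""
    return 2 * m - 2 * log_likelihood

# Two hypothetical candidate models: the lower AIC wins.
simple = aic(m=2, log_likelihood=-120.0)   # 244.0
complex_ = aic(m=5, log_likelihood=-118.5) # 247.0
print(simple, complex_)
print("prefer simple" if simple < complex_ else "prefer complex")
```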
Autocorrelation Function (ACF): It demonstrates how data values in a time series are correlated, on average, with the data values that precede them.
Partial Autocorrelation Function (PACF): The theoretical PACF for an AR model “shuts off” once the order of the model is reached. The expression “shuts off” implies that the partial autocorrelations are essentially equal to 0 beyond that point. In other words, the number of non-zero partial autocorrelations gives the order of the AR model. The maximum lag of the “Bangladesh GDP growth rate” that is used as a predictor is referred to as the “order of the model”.
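The sample ACF can be computed directly from its definition. The sketch below (plain Python, hypothetical series) shows the slow decay typical of a trending, non-stationary series:

```python
def acf(series, max_lag):
    """Sample autocorrelation function: correlation of the series with
    lagged copies of itself, normalised by the lag-0 autocovariance."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    out = []
    for k in range(max_lag + 1):
        ck = sum((series[t] - mean) * (series[t + k] - mean)
                 for t in range(n - k)) / n
        out.append(ck / c0)
    return out

trend = list(range(20))   # strongly trending (non-stationary) series
r = acf(trend, 3)
print([round(v, 2) for v in r])   # lag-0 value is 1.0; the rest decay slowly
```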
Mean Square Error: The mean square error (MSE) is another strategy for assessing a forecasting method. Every error or residual is squared; these are then summed and divided by the number of observations. This method penalizes large forecasting errors because the errors are squared, which is significant. The MSE is given by
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (5)$$
Root Mean Square Error: The root mean square error (RMSE) is a commonly utilized measure of the difference between the values predicted by a technique or estimator and the values actually observed. The RMSE is calculated as in “(6)”:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (6)$$
We use the target variable Bangladesh GDP growth rate (annual %) from the year 1960 to 2021, collected from the World Bank database’s official website. A portion of these data is shown in Table 1.
We showed the time series plots of the whole dataset from the year 1960 to 2021
on a yearly basis in Fig. 1 for both (a) GDP growth (annual%) data and (b) First
Difference of GDP growth (annual%). It is observed that there is a sharp decrease
in GDP growth from 1970 to 1972. After that, on average, an upward trend is observed, with another decrease in 2020 because of the spread of the Coronavirus infection.
In Fig. 2, the decomposition of the time series yields level, trend, seasonality, and noise components. The Auto ARIMA procedure provided the AIC values for several combinations of the p, d, and q values. The ARIMA model with the minimum AIC value was chosen, and the SARIMAX (0, 1, 1) function was suggested
Fig. 1 The time series plots of yearly a GDP growth (annual %) data and b first difference of GDP
growth (annual %)
to be used. Accordingly, the ARIMA (p, d, q) order is ARIMA (0, 1, 1), obtained with the auto ARIMA procedure, as shown in Table 4. Here the auto ARIMA procedure defines a SARIMAX (0, 1, 1) model to capture seasonality. This choice depends on the AIC value, and the resulting SARIMAX function is used for the ARIMA model subsequently fitted to the training data (Tables 2 and 3).
After finding the function and the ARIMA (p, d, q) values for this dataset, we divided the data into 80% training data and 20% test data to fit the model. Table 5 shows the result of the ARIMA (0, 1, 1) model built on the training data.
After fitting the model with the training dataset, the values of the test data and the predicted data are shown in Fig. 6 and Table 7. Here, for predicting the data using SARIMAX, the seasonality is half a year; for this reason, the SARIMAX function is defined as SARIMAX (0, 1, 1, 6), as it showed less error compared to the other seasonal orders. This is the best-fitted model, which is defined in
the model evaluation. The RMSE value, MAE value, and model accuracy are given in Table 8, which suggests that the SARIMAX (0, 1, 1, 6) model can be used as the best model for predicting the GDP growth rate (annual %).
Figure 7 depicts the forthcoming 10 years of the Bangladesh GDP growth rate after the model was evaluated. We built a web application, GDP indicator [21], based on the time series ARIMA model; Fig. 8a introduces the GDB indicator application, and the table of predicted GDP growth (%) values is shown in Fig. 8b.
Table 8 Evaluation parameter values for the SARIMAX (0, 1, 1, 6) model

Evaluation parameter    Value
RMSE error value        0.991
MAE error value         0.827
Model accuracy          87.51%
According to our study, we successfully predict the Bangladesh GDP growth rate with a machine learning time series ARIMA model of order (0, 1, 1). We found that this model performs with 87.51% accuracy. The model is verified by the minimum AIC value generated by the auto ARIMA function, under which auto ARIMA defines the SARIMAX (0, 1, 1) model observed on
Fig. 8 User interface of GDB indicator a homepage and b predicted GDP growth rate for the next
upcoming year (2022–2050)
the whole historical data. The half-yearly seasonality of these data was captured, after which the model predicts automatically for the upcoming years. We implemented this machine learning time series ARIMA model in the web application GDB Indicator-BD, where users can find Bangladesh’s future GDP growth rate and observe Bangladesh’s upcoming economy for a given year. On this dataset, we could also have implemented other machine learning or more advanced deep learning models, but we did not; we consider this a gap in our research. In the future, we will also work on these data with multiple features, implement other upcoming and upgraded models, and show how they perform on this dataset for future prediction.
70 Md. M. H. Shohug et al.
References
A. K. Punnoose
1 Introduction
A. K. Punnoose (B)
Flare Speech Systems, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 71
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_7
recording and noise could be present. The task is to identify whether noise is present
in the speech. On the other hand, in VAD the default is a noisy recording and speech
could be present. The task is to identify whether speech is present in the recording.
The techniques developed for VAD can be used interchangeably with noisy speech
identification.
2 Problem Statement
3 Prior Work
Noisy speech detection is covered extensively in the literature. Filters like the Kalman filter [2, 3] and spectral subtraction [4, 5] have been used to remove noise from speech. But this requires an understanding of the nature of the noise, which is mostly infeasible. A more generic way is to estimate the signal-to-noise ratio (SNR) of the recording and apply appropriate thresholding on the SNR to filter out noisy recordings [6–9]. Voice activity detection is also extensively covered in the literature. Autocorrelation functions and their various derivatives have been used extensively for voice activity detection. Subband decomposition and suppression of certain sub-bands, based on stationarity assumptions on the autocorrelation function, are used for robust voice activity detection [10]. Autocorrelation-derived features like harmonicity, clarity, and periodicity provide more speech-like characteristics. Pitch continuity in speech has been exploited for robust speech activity detection [11]. For highly degraded channels, GABOR features along with autocorrelation-derived features are also used [12]. Modulation frequency is also used in conjunction with harmonicity for VAD [13].
Another very common method is to use mel frequency cepstral features with
classifiers like SVMs to predict speech regions [14]. Derived spectral features like
low short-time energy ratio, high zero-crossing rate ratio, line spectral pairs, spectral
flux, spectral centroid, spectral rolloff, ratio of magnitude in speech band, top peaks,
and ratio of magnitude under top peaks are also used to predict speech/non-speech
regions [15].
Sparse coding has been used to learn a combined dictionary of speech and noise
and then, remove the noise part to get the pure speech representation [16, 17]. The
correspondence between the features derived from the clean speech dictionary and the
speech/non-speech labels can be learned using discriminative models like conditional
random fields [18]. Along with sparse coding, acoustic-phonetic features are also
explored for speech and noise analysis [19].
From the speech intelligibility perspective, vowels remain more resilient to noise
[20]. Moreover, speech intelligibility in the presence of noise also depends on the
listener’s native language [21–24]. Any robust noisy speech identification system
A Cross Dataset Approach for Noisy Speech Identification 73
must take into consideration the inherent intelligibility of phonemes while scoring the sentence hypothesis. The rest of the paper is organized as follows. The experimental setup is first described. Certain measures that could be used to differentiate clean speech from noisy speech are then explored. A scoring function is defined to score noisy speech, and simple thresholding on this score is used to differentiate noisy speech from clean speech.
4 Experimental Setup
60 h of the Voxforge dataset are used to train the MLP. The rationale behind using Voxforge data is its closeness to real-world conditions in terms of recording, speaker variability, noise, etc. ICSI Quicknet [25] is used for the training. Perceptual linear prediction (PLP) coefficients, along with delta and double-delta coefficients, are used as the input. A softmax layer is employed at the output, and cross-entropy error is the loss function used. The output labels are the standard English phonemes.
For a 9-frame PLP window given as the input, the MLP outputs a probability vector with individual components corresponding to the phonemes. The phoneme which gets the highest probability is treated as the top phoneme for that frame, and the highest softmax probability of the frame is termed the top softmax probability of the frame. A set of consecutive frames classified as the same phoneme constitutes a phoneme chunk. The number of frames in a phoneme chunk is referred to as the phoneme chunk size.
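The frame-level bookkeeping described above (top phoneme, top softmax probability, phoneme chunks) can be sketched as follows, with made-up posteriors standing in for the MLP output:

```python
# Hypothetical frame-level softmax posteriors: one row per frame,
# one column per phoneme (a real MLP would produce these).
posteriors = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]

# Top phoneme and top softmax probability per frame.
top_phonemes = [max(range(len(row)), key=row.__getitem__) for row in posteriors]
top_probs = [max(row) for row in posteriors]

# Consecutive frames sharing the same top phoneme form a phoneme chunk;
# the chunk size is the number of frames in it.
chunks, start = [], 0
for i in range(1, len(top_phonemes) + 1):
    if i == len(top_phonemes) or top_phonemes[i] != top_phonemes[start]:
        chunks.append((top_phonemes[start], i - start))
        start = i
print(chunks)  # [(0, 2), (1, 3)]
```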
For the subsequent stages, the TIMIT training set is used as the clean speech training data. A subset of background noise data from the CHiME dataset [26] is mixed with the TIMIT training set and is treated as the noisy speech training data. We label this dataset dtrain. dtrain is passed through the MLP to get the phoneme posteriors. From the MLP posteriors, the measures and distributions needed to detect noisy speech recordings are computed, and a noisy speech scoring mechanism is defined. For testing, the TIMIT testing set is used as the clean speech testing dataset, and the TIMIT testing set mixed with a different subset of CHiME background noise is used as the noisy speech testing data. This data is labeled dtest.
We define two new measures, the phoneme detection rate and the softmax probability, for clean and noisy speech. These measures are combined into a recording-level score, which is used to determine the noise level in a recording.
For a phoneme p, let g be the ratio of the number of frames recognized as true positives to the number of frames recognized as false positives, for clean speech. Let h represent the same ratio for noisy speech. The phoneme detection behavior of clean speech and noisy speech can be broadly classified into three cases.
In the first category, both g and h are low. In the second case, g is high and h is low.
In the third case, both g and h are high. A phoneme weighting function is defined as
              ⎧ x1   if g < 1 and h < 1
f1(p; g, h) = ⎨ x2   if g > 1 and h < 1                    (1)
              ⎩ x3   if g > 1 and h > 1

where Σi xi = 1 and xi ∈ (0, 1]. This is not a probability distribution function. The optimal values of x1, x2, and x3 will be derived in the next section. Note that g and h are computed from the clean speech and noisy speech training data. x3 corresponds to the most robust phonemes, while x1 corresponds to non-robust phonemes.
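Equation (1) can be written directly as a small function; the weight values used here are placeholders, not the optimal values derived later:

```python
def phoneme_weight(g, h, x=(0.2, 0.3, 0.5)):
    """f1(p; g, h) from Eq. (1). g and h are the true-positive to
    false-positive frame-count ratios for a phoneme on clean and noisy
    speech respectively. x = (x1, x2, x3) are placeholder weights; the
    optimal values come from the constrained search described later."""
    x1, x2, x3 = x
    if g < 1 and h < 1:
        return x1   # non-robust phoneme
    if g > 1 and h < 1:
        return x2   # robust on clean speech only
    return x3       # robust phoneme (g and h both high)

print(phoneme_weight(0.5, 0.4))  # 0.2 (x1)
print(phoneme_weight(2.0, 1.5))  # 0.5 (x3)
```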
Figure 1 plots the density of top softmax probability of the frames of true positive
detections for the noisy speech. Figure 2 plots the same for false positive detections
of clean speech. Any approach to identify noisy recordings must be able to take into
account the subtle difference in these densities. As the plots are asymmetrical and
skewed, we use gamma distribution to model the density. The probability density
function of the gamma distribution is given by
f2(x; α, β) = β^α x^(α−1) e^(−βx) / Γ(α)                  (2)
where
Γ (α) = (α − 1)! (3)
where α+ and β+ are the shape and rate parameters of the true positive detection of
noisy speech and α− and β− are the same for false positive detection of clean speech.
Using wi = f1(pi) and Ai = f2(qi; α+, β+) / f2(qi; α−, β−), Eq. 4 can be rewritten as

s = exp( (1/N) Σi ln(wi Ai) )                              (5)
which implies

s ∝ (1/N) Σi ln(wi Ai)                                     (6)
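A minimal sketch of the recording-level score of Eq. (5), assuming the per-frame weights wi and likelihood ratios Ai have already been computed (the input values below are illustrative):

```python
import math

def recording_score(weights, ratios):
    """s = exp((1/N) * sum_i ln(w_i * A_i)), as in Eq. (5).
    `weights` holds the phoneme weights w_i and `ratios` the gamma
    likelihood ratios A_i, one entry per frame."""
    n = len(weights)
    return math.exp(sum(math.log(w * a) for w, a in zip(weights, ratios)) / n)

# The score is the geometric mean of the per-frame products w_i * A_i.
print(recording_score([0.5, 0.5], [2.0, 8.0]))  # geometric mean of 1 and 4 = 2.0
```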
The conditions defined above can be expressed through appropriate values of the
variables x1 , x2 and x3 , which could be found by solving the following optimization
problem.
min over x1, x2, x3 of  ln(Ax1) + ln(Ax2) − 5 ln(Bx3)
s.t. ln(Ax1 ) − 2 ln(Bx3 ) > 0
ln(Ax2 ) − 3 ln(Bx3 ) > 0
x1 + x2 + x3 = 1
0 < xi ≤ 1
The objective function ensures that the inequalities are just satisfied. The Hessian of
the objective function is given by
H = diag( −1/x1², −1/x2², 5/x3² )                          (9)
H is indefinite and the inequality constraints are not convex; hence, the standard convex optimization approaches can't be employed. In the training phase, the values of A and B have to be found. For a given A and B, the values of x1, x2, and x3 which satisfy the inequalities have to be computed. As the optimization problem is in ℝ³, a grid search will yield the optimal solution.
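The grid search can be sketched as below, using the A and B values reported in the Results section. The grid here is coarse and illustrative, so the minimizer it finds need not match the paper's reported optimum to three decimals:

```python
import math

A, B = 4.1, 1.27   # values computed from d_train, as reported in the Results

def objective(x1, x2, x3):
    return math.log(A * x1) + math.log(A * x2) - 5 * math.log(B * x3)

def feasible(x1, x2, x3):
    # The two inequality constraints of the optimization problem.
    return (math.log(A * x1) - 2 * math.log(B * x3) > 0 and
            math.log(A * x2) - 3 * math.log(B * x3) > 0)

# Coarse grid over the simplex x1 + x2 + x3 = 1 with 0 < xi <= 1.
n = 200
best_val, best_x = float("inf"), None
for i in range(1, n):
    for j in range(1, n - i):
        x1, x2 = i / n, j / n
        x3 = 1 - x1 - x2          # x3 > 0 is guaranteed by j < n - i
        if feasible(x1, x2, x3):
            val = objective(x1, x2, x3)
            if val < best_val:
                best_val, best_x = val, (x1, x2, x3)
print(best_x)
```

Any triple returned satisfies both inequality constraints and the simplex constraint.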
Assume the same wi for all the frames, i.e., the weightage is the same for every phoneme. Now consider a scenario where noisy speech recordings with roughly equal numbers of non-robust and robust frames are recognized, per recording, and assume that the Ai values are high for non-robust phonemes and low for robust phonemes. Then any threshold t set for classification will be dominated by the non-robust phoneme frames. At test time, for a noisy speech recording with predominantly robust phonemes and low Ai values, the recording-level score s will fall below the required threshold t, effectively reducing the recall of the system. To alleviate this issue, conditions are set on the weightage of phonemes based on their robustness.
5 Results
The variable values A = 4.1 and B = 1.27 are computed from dtrain . The optimal
variable values x1 = 0.175, x2 = 0.148, x3 = 0.677 are obtained by grid search on
the variable space. With the optimal variable values, testing is done for noisy speech
recording identification on dtest. A simple thresholding on the recording-level score s is used as the decision mechanism. In this context, a true positive refers to the correct identification of a noisy speech recording. Figure 3 plots the ROC curve for noisy speech recording identification. Note that silence phonemes are excluded from all the computations.
From the ROC curve, it is evident that utterance-level scoring with equal weightage for all phonemes is not useful, but differential scoring of phonemes based on their recognition capability makes the utterance-level scoring much more meaningful.
References
1. Renevey P, Drygajlo A (2001) Entropy based voice activity detection in very noisy conditions.
In: Proceedings of eurospeech, pp 1887–1890
2. Shrawankar U, Thakare V (2010) Noise estimation and noise removal techniques for speech
recognition in adverse environment. In: Shi Z, Vadera S, Aamodt A, Leake D (eds) Intelligent information processing V. IIP 2010. IFIP advances in information and communication technology, vol 340. Springer, Berlin
3. Fujimoto M, Ariki Y (2000) Noisy speech recognition using noise reduction method based
on Kalman filter. In: 2000 IEEE international conference on acoustics, speech, and signal
processing. Proceedings (Cat. No.00CH37100), vol 3, pp 1727–1730. https://doi.org/10.1109/
ICASSP.2000.862085
4. Boll S (1979) Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans
Acoust, Speech, Signal Process 27(2): 113–120. https://doi.org/10.1109/TASSP.1979.1163209
5. Mwema WN, Mwangi E (1996) A spectral subtraction method for noise reduction in speech
signals. In: Proceedings of IEEE. AFRICON ’96, vol 1, pp 382–385. https://doi.org/10.1109/
AFRCON.1996.563142
6. Kim C, Stern R (2008) Robust signal-to-noise ratio estimation based on waveform amplitude
distribution analysis. In: Proceedings of interspeech, pp 2598–2601
7. Papadopoulos P, Tsiartas A, Narayanan S (2016) Long-term SNR estimation of speech signals
in known and unknown channel conditions. IEEE/ACM Trans Audio, Speech, Lang Process
24(12): 2495–2506
1 Introduction
S. Sahu (B)
Faculty, School of Computing Science & Engineering, VIT Bhopal University, Sehore
(MP) 466114, India
e-mail: [email protected]
S. Silakari
Professor, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya,
Bhopal, Madhya Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 81
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_8
been extensively explored and discussed [2]. Some sensors fail to operate after their estimated battery life has expired, reducing the network's total lifespan and functionality. Numerous researchers have made significant contributions to fault-related obstacles such as sensor failures, coverage, connectivity issues, network partitioning, data delivery inaccuracy, and dynamic routing, among others [3, 4].
2 Literature Review
coverage and connection. When a node fails, the “up to fail” node is evaluated and
replaced before the entire network fails. However, if the “up to fail” node cannot
be replaced, a quick rerouting method has been suggested to redirect the routed
traffic initially through the “up to fail” node. The performance assessment of the
proposed technique indicated that the number of nodes suitable for the “up to fail”
node replacement is dependent on characteristics such as the node redundancy level
threshold and network density [10].
Numerous researchers have examined different redundancy mechanisms in WSNs, including route redundancy, time (temporal) redundancy, data redundancy, node redundancy, and physical redundancy [10]. These strategies maximize energy efficiency and assure WSNs' dependability, security, and fault tolerance. When the collector node detects that the central cluster head (CH) has failed, it sends data to the backup CH rather than simultaneously broadcasting data to the leading CH and the backup CH. IHR's efficacy was compared to Dual Homed Routing (DHR) and Low-Energy Adaptive Clustering Hierarchy (LEACH) [11].
In [12], the authors offer a novel fault-tolerant sensor node scheduling method, named FANS (Fault-tolerant Adaptive Node Scheduling), that takes into account not only sensing coverage but also sensing level. The suggested FANS algorithm helps retain sensor coverage, enhance network lifespan, and achieve energy efficiency.
Additionally, data loss may result if sensors or CHs (forwarders) are affected. Fault-tolerant clustering techniques can replace failed sensors with other, redundant sensors, maintaining the network's stability.
We have extended the article proposed in [8] and propose a robust distributed clustered fault-tolerant scheduling (RDCFT) based on the redundancy check algorithm (the sweep-line approach [7, 8]), which provides the number of redundant sensors for the region R. The proposed RDCFT determines the 1-coverage requirement precisely and quickly while ensuring the sensors' redundancy eligibility criterion at a low cost, with better fault tolerance capability through sensor- and cluster-level fault detection and replacement. Additionally, we simulated and analyzed the suggested work's correctness and efficiency in various situations.
3 Proposed Work
ways will be invoked if fault detection and recovery modes are necessary. Why is it called "two-way"? Because the proposed approach has two modes of execution: the first when all sensors are initially deployed, i.e., at the start of the first network round (assuming a 100% energy level), and the second when the remaining energy level of all sensors is 50% or less. The suggested process that we apply in our scheme consists of two phases: randomly selecting a cluster head (CH) and forming a collection of clusters. We should emphasize that we presume the WSNs employed in our method are homogeneous. Figure 1 shows the clustered architecture of WSNs that we consider in this section.
The failure of one or more sensors may disrupt connectivity and result in the network being divided into several disconnected segments. It may also result in connectivity and coverage gaps in the surrounding region, which may harm the monitoring of the environment. The only way to solve this issue is to replace the dead sensors with redundant ones. Typically, the CH monitors the distribution and processing of data and makes decisions. When a CH fails, its replacement alerts all sensors of its failure.
The first step of the fault management process is fault identification, the most critical phase: faults must be identified precisely and appropriately. One challenge is the specification of fault tolerance in WSNs; there is a trade-off between energy usage and accuracy. As a result, we use a cluster-based fault detection approach that
A Robust Distributed Clustered Fault-Tolerant Scheduling for Wireless … 85
conserves node energy and is highly accurate. Our technique for detecting faults is
as follows [12–15]:
Detection of intra-cluster failures: If the CH does not receive data from a node for a preset length of time, it waits for the next period, since data may be lost to interference and noise even when the node is healthy. If the CH still does not receive a packet after the second period, the node is presumed to be malfunctioning. As a result, the CH transmits a message to all surrounding CHs and cluster nodes, designating the node with this ID as faulty.
Intra-cluster error detection: When the CH obtains data from nodes that are physically close together, it computes and saves a "median value" for the data. The CH compares newly collected data to the per-request "median value". When the difference between the two values exceeds a predefined constant deviation, the CH detects an error and declares the node that generated the data faulty. Again, the CH notifies all surrounding CHs and the nodes in its cluster that the node with this ID is faulty.
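The median-based check above can be sketched as follows; the node ids, readings, and deviation threshold are illustrative:

```python
def detect_faulty_readings(readings, delta):
    """CH-side check: flag nodes whose reading deviates from the cluster
    median by more than delta (the predefined constant deviation)."""
    values = sorted(readings.values())
    n = len(values)
    median = values[n // 2] if n % 2 else (values[n // 2 - 1] + values[n // 2]) / 2
    return [node for node, v in readings.items() if abs(v - median) > delta]

# Hypothetical temperature readings from four nearby cluster members.
readings = {"s1": 20.1, "s2": 19.8, "s3": 35.0, "s4": 20.3}
print(detect_faulty_readings(readings, delta=2.0))  # ['s3']
```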
Detection of inter-cluster faults: CHs are a vital component of WSNs, and their failure must be identified promptly. As a result, we employ this method: CHs communicate with other CHs regularly through a packet that contains information on the cluster's nodes. If a CH does not receive this packet from an adjacent CH, that CH is deemed faulty.
The sensors are assumed to be arranged randomly and densely over a dispersed rectangular grid region R. All sensors are identical in terms of sensing and communication ranges, as well as battery power consumption. Consider two sensors Si and Sj with a distance between them of at most Rs (|Si Sj| ≤ Rs). The sector apexed at Si with an angle of 2α can be used to approximate the fraction of Si's sensing region covered by Sj, as illustrated in Fig. 2. As indicated in Eq. 1, the angle can be computed using the simple cosine rule, also explained in [6].
cos α = (|Si p|² + |Si Sj|² − |Sj p|²) / (2 |Si p| |Si Sj|) = |Si Sj| / (2 Rs). Hence α = arccos( |Si Sj| / (2 Rs) )    (1)
In their initial setup phase, the sensors each create a table of 1-hop sensing neighbors based on received HELLO messages, and the contribution of each sensor's one-hop sensing neighbors is determined. A sensor Sj is redundant for full coverage if its 1-hop active sensing neighbors cover the complete 360° circle surrounding it. To put it another way, a sensor Sj's redundancy criterion for full coverage is that the union of the sectors contributed by the sensors in its vicinity covers the entire 360°; if this holds, the sensor Sj is redundant.
It is possible to accomplish this algorithmically by extending the sweep-line-based algorithm for sensor redundancy checking. Assume an imaginary vertical line sweeps these intervals from 0° to 360°. If the sweep-line intersects kp intervals from INj and is in the interval ip in ICQj, the sensor Sj is redundant in ip. If this condition holds true for all intervals in ICQj, then the sensor Sj is redundant; Fig. 3 illustrates this as a flowchart for the sweep-line-based redundancy check of a sensor [7, 8]. We hold a variable CCQ for the current CQ and a sweep-line status l for the number of intervals from INj intersected by the sweep-line. The status changes only when the sweep-line crosses the left or right endpoint of an interval. As a result, the event queue Q retains the endpoints of the intervals in ICQj and INj.
• If the sweep-line crosses the left endpoint of ip in ICQj, then CCQ is set to kp.
• If the sweep-line crosses the left endpoint of sp in INj, then increment l.
• If the sweep-line crosses the right endpoint of sp in INj, then decrement l.
Fig. 3 Flowchart for Redundancy check of a sensor (sweep-line based [7, 8])
If the sweep-line status l remains greater than or equal to CCQ for the whole sweep, the sensor node Sj serves as a redundant sensor for full coverage. Figure 4 shows the transition states of a sensor, i.e., it can be in one of the states active, presleep, or sleep. The sleeping competition can be avoided using a simple back-off time.
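A simplified sketch of the redundancy check for the 1-coverage case: each active neighbor contributes an angular sector, and the sensor is redundant if the sectors jointly cover the full 360°. The CQ/CCQ bookkeeping for higher coverage degrees is omitted here:

```python
def covers_full_circle(sectors):
    """True if the angular sectors (start, end) in degrees jointly cover
    the full 0-360 circle around the sensor (the 1-coverage case)."""
    intervals = []
    for start, end in sectors:
        if start <= end:
            intervals.append((start, end))
        else:                      # sector wraps past 0 degrees: split it
            intervals.append((start, 360.0))
            intervals.append((0.0, end))
    intervals.sort()
    reach = 0.0                    # rightmost angle covered so far
    for start, end in intervals:
        if start > reach:          # the sweep found an uncovered gap
            return False
        reach = max(reach, end)
    return reach >= 360.0

# Three neighbors whose sectors close the circle -> the sensor is redundant.
print(covers_full_circle([(0, 150), (140, 290), (280, 360)]))  # True
print(covers_full_circle([(0, 150), (170, 360)]))              # False
```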
The cluster head (CH) is initially chosen at random by the base station (BS) among the cluster's members. The CH then sends a greeting message to all cluster members, requesting their energy levels, and communicates these energy levels to the BS, which establishes a hierarchy based on the residual energy. CHs are then created according to the established hierarchy, which grows with each round. The first round's CH will be the node with the largest energy storage capacity; the second round adds the node with the second-highest energy level; similarly, the third round's CHs will comprise the top three nodes. Thus CHs are selected dynamically for each round. The first round continues until the highest node's energy level meets the energy level of the second-highest node, and the second round continues until the first two nodes with the most significant energy levels reach the third level; likewise, the process is repeated for the third round. As a result, time allocation is also accomplished dynamically.
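The round-based hierarchy above can be sketched as follows (the function name, node ids, and energies are illustrative):

```python
def select_cluster_heads(energies, round_no):
    """Round r picks the r nodes with the highest residual energy as CHs,
    following the hierarchy described above (ties broken by node id)."""
    ranked = sorted(energies, key=lambda node: (-energies[node], node))
    return ranked[:round_no]

# Hypothetical residual energies (J) reported to the BS.
energies = {"n1": 9.5, "n2": 8.0, "n3": 9.9, "n4": 7.2}
print(select_cluster_heads(energies, 1))  # ['n3']
print(select_cluster_heads(energies, 3))  # ['n3', 'n1', 'n2']
```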
Sensors may be deterministically or randomly scattered over the target region R for monitoring. We propose a clustered fault-tolerant sensor scheduling consisting of a sequence of algorithms to operate the deployed WSN effectively. Each sensor executes the defined duties in every round of the total network lifetime and periodically detects faulty sensor nodes. The flowchart of the proposed mechanism is shown in Fig. 5.
Our proposed clustered fault-tolerant sensor scheduling protocol has the following
assumptions:
• All deployed sensor nodes are assigned a unique identifier (sensorid).
• Sensors are homogeneous and all are locally synchronized.
• Sensors are densely deployed in static mode (redundancy gives better fault tolerance capability in WSNs).
• The set of active/alive sensor nodes is represented by {San} = {Sa1, Sa2, Sa3, …, San}.
• The set of cluster head nodes is represented by {CHm} = {SCH1, SCH2, SCH3, …, SCHm}.
• The set of faulty sensor nodes is represented by {Sfn} = {Sf1, Sf2, Sf3, …, Sfn}.
Table 1 Simulation parameters

Parameter | Value
Number of sensor nodes | 100, 200, 300, & random
Network area (meter²) | 100 × 100
Clusters | Differs
Distributed subregion size | 30 m × 30 m
Initial level of energy in each sensor | 10 J
Energy consumption for transmission | 0.02 J
Energy consumption for receiving | 0.001 J
Communication & sensing ranges | 4 m & 3 m
Threshold energy in each sensor | 2 J
Simulation & round time | 1000–1500 s and 200 s
This section illustrates the experimental setup and the findings for the proposed algorithms. We assessed the proposed algorithms' performance using a network simulator [16]. The proposed RDCFT protocol is simulated using NS2, and the parameters utilized are shown in Table 1. RDCFT is tested against the existing LEACH method and randomized scenarios using the standard metrics mentioned above. The simulation is divided into the following steps:
The proposed approach is based on the redundancy of sensors and CHs. This clustering approach maximizes the longevity of the network. We have extended the article proposed in [8]: the proposed method begins by detecting defects, finding the faulty sensor nodes using the fault detection algorithm, and replacing each faulty sensor with a redundant sensor for the same R. The fault detection process, carried out by scheduled communication messages exchanged between nodes and CHs, runs in O(n log n). Second, the approach commences a recovery period in which CHs/common sensors are recovered with the help of redundant sensors using the proposed algorithm. This is a novel and self-contained technique, since the proposed method does not need communication with the BS/sink to work. Simulations are performed to evaluate the efficiency and validity of the overall proposed work in terms of energy consumption, coverage ratio, and fault-tolerant scheduling.

Fig. 7 Measurements of average alive, CH, backup, and faulty sensors for R
Our proposed efforts are based on the largely two-dimensional architecture of WSNs; future work should include 3D-based WSNs. Future studies will also address three critical challenges in 3D-WSNs, namely energy, coverage, and faults, which may pave the way for a new approach to researching sustainable WSNs that optimizes the overall network lifetime.
References
1. Yick J, Mukherjee B, Ghosal D (2008) Wireless sensor network survey. Comput Netw
52(12):2292–2330
2. Alrajei N, Fu H (2014) A survey on fault tolerance in wireless sensor networks. Sensor COMM
09366–09371
3. Sandeep S, Sanjay S (2021) Analysis of energy, coverage, and fault issues and their impacts
on applications of wireless sensor networks: a concise survey. IJCNA 8(4):358–380
1 Introduction
With the rapidly increasing availability of digital media, the ability to efficiently process audio data has become essential. Organizing audio documents with topic labels is useful for sorting, filtering, and efficient searching to find the most relevant audio file. Audio scene recognition (ASR) involves identifying the location and surroundings where the audio was recorded. It is similar to visual scene recognition, which involves identifying the environment of an image as a whole, with the only difference being that here it is applied to audio data [1–3].
In this paper, we attempt to perform ASR with topic modelling. Topic modelling
is a popular text mining technique that involves using the semantic structure of
documents to group similar documents based on the high-level subject discussed.
Assigning topic labels to documents enables efficient information retrieval by yielding more relevant search results. In recent times, researchers have applied
topic modelling to audio data and achieved significant results. Topic modelling can
be extended to ASR due to the presence of analogous counterparts between text
documents and audio documents.
While a text document can be split into words and then lemmatized, an audio document can be segmented at the right positions to derive the words, with each frame corresponding to the lemmatization results. There are several advantages to using audio over video for classification tasks, such as ease of recording, lower storage requirements, lower pre-processing overhead, and ease of streaming over networks.
ASR has many useful applications [4]. ASR aids in the development of intelligent
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 95
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_9
96 J. Sangeetha et al.
2 Related Work
and word topics, eventually creating a Bayesian form of PLSA. Mesaros et al. [10] performed audio event detection with an HMM model; PLSA was used for the prior probabilities of the audio events, which were then transformed to derive the transition probabilities. Hu et al. [11] improved the performance of LDA for audio retrieval by modelling with a Gaussian distribution. It uses a multivariate distribution for the topic and word distributions to alleviate the effects of vector quantization.
In this work, we adopt LSA and LDA to achieve ASR. Compared with the PLSA/LDA-based ASR algorithms [12–14], the proposed algorithm utilizes a document-event co-occurrence matrix, whereas [12–14] use a document-word co-occurrence matrix for analyzing the topics. This method extracts a distribution of topics that expresses the audio document in a better way, so better recognition results can be attained. Suppression of common audio events and emphasis of unique topics are achieved by weighting the event distributions of the audio documents.
LSA is a technique which uses vector-based representations of texts to model them through their terms. LSA is a statistical model which compares the similarity between texts based on their semantics. It is used in information retrieval, in analyzing relationships between documents and terms, in identifying hidden topics, etc. The LSA technique analyzes a large corpus and forms a document-term co-occurrence matrix recording the occurrences of terms in documents; it is a technique to find the hidden structure in a document collection [15]. Every document and term is represented as a vector whose elements relate to the topics, and the degree of match for a document or term is found using these vector elements. Hidden similarities can be found because documents and terms are specified in a common space.
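As an illustration of the LSA machinery on a toy document-term co-occurrence matrix (the counts are invented, not taken from the audio data):

```python
import numpy as np

# Toy document-term co-occurrence matrix: rows are terms, columns are
# documents (hypothetical counts; the real entries would come from
# matching audio frames against the vocabulary).
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
])

# Truncated SVD keeps k latent topics; each document gets a k-dim vector.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_topic = (np.diag(s[:k]) @ Vt[:k]).T   # one topic vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 2 share terms; documents 0 and 1 share none.
print(cosine(doc_topic[0], doc_topic[2]))  # close to 1
print(cosine(doc_topic[0], doc_topic[1]))  # close to 0
```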
The proposed framework is shown in Fig. 3. First, the audio input vocabulary set is prepared; then the document-term co-occurrence matrix is generated to finally classify the audio.
Considering the audio vocabulary set as input, each frame is matched with a similar term in the vocabulary for training the model. Then the document-term co-occurrence matrix is counted, represented as Ztrain. In the training set, the labels of the audio frames are known in advance, allowing the event-term co-occurrence matrix Xtrain to be calculated. If the training dataset has 'J' documents {d1, d2, …, dJ} and 'j' audio events {ae1, ae2, …, aej}, and the audio vocabulary set size is 'I', then Ztrain is an I × J matrix and Xtrain is an I × j matrix. Ytrain denotes the j × J document-event co-occurrence matrix and takes the form [p(aeh | dg)] j × J, where p(aeh | dg), the (h, g)th item of Ytrain, represents the distribution of document dg on the event aeh.
As many audio events occur simultaneously, the event-term cooccurrence matrix Xtrain must be counted with care across the audio documents. We annotate at most three audio events for a particular time interval. An audio frame carrying multiple labels is credited to all of its audio events in equal proportion when counting the event-term cooccurrence statistics: if a frame carries m audio events, each event receives a count of 1/m.
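The fractional 1/m counting rule can be sketched as follows (a minimal sketch; the frame-to-term matches and event labels are invented for illustration):

```python
from collections import defaultdict

def count_event_term_matrix(frames):
    """Count event-term cooccurrences, crediting each of a frame's
    m simultaneous events with 1/m, as described above.

    frames: list of (vocabulary term, [event labels]) pairs.
    Returns a dict mapping (term, event) -> fractional count.
    """
    counts = defaultdict(float)
    for term, events in frames:
        m = len(events)
        for ev in events:
            counts[(term, ev)] += 1.0 / m
    return dict(counts)

# Hypothetical frames, each matched to a vocabulary term and carrying
# up to three simultaneous event labels.
frames = [
    ("t1", ["speech"]),
    ("t1", ["speech", "music"]),         # two events -> 1/2 each
    ("t2", ["speech", "music", "car"]),  # three events -> 1/3 each
]
X = count_event_term_matrix(frames)
print(X[("t1", "speech")])  # 1.5
```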
Different annotators will produce different results for the same set of audio events in a given time interval, so we need at least three annotators to annotate the same set of audio events for each interval [17]. Finally, we retain the event labels assigned by more than one annotator and omit the rest. The document-term cooccurrence matrix Ztest of the test set can be calculated by splitting the audio into terms and matching the frames with the terms. Since the event-term cooccurrence matrix is the same for the test and training stages, we derive the document-event cooccurrence matrix of the test set in the same way as in the training stage. The document-event cooccurrence matrices for the test and training sets are obtained through LSA matrix factorization; alternatively, Ytrain can be obtained by directly counting the number of occurrences.
Weighting the distribution of audio events is required to recognize the influence of the events. The topic distribution along with its feature set is the input for the classifier. If an event's occurrences are spread over many topics, the event is less influential; if they are concentrated on a few topics, it is more influential. Using entropy, we can find the influence of the events as mentioned in [18]. If there are t1 latent topics, then T = [p(aeh, dg)] is the t1 × j event-topic distribution matrix (topic index g = 1, 2, ..., t1; event index h = 1, 2, ..., j), where p(aeh, dg) denotes the distribution of event aeh on topic dg. The event entropies form the vector E = E(aeh), where E(aeh), the entropy of event aeh, is computed over the t1 topics as

E(aeh) = − Σg p(aeh, dg) log p(aeh, dg).
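The event entropy described above can be sketched numerically (a minimal sketch using only the standard library; the topic weights are invented for illustration):

```python
import math

def event_entropy(topic_weights):
    """Entropy of an audio event's distribution over latent topics.
    A low value means the event is concentrated on a few topics
    (more discriminative); a high value means it is spread out.

    topic_weights: non-negative weights of one event over the topics.
    """
    total = sum(topic_weights)
    probs = [w / total for w in topic_weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical events over 4 latent topics (weights are illustrative).
specific_event = [0.97, 0.01, 0.01, 0.01]  # concentrated -> low entropy
common_event = [0.25, 0.25, 0.25, 0.25]    # uniform -> maximal entropy

print(event_entropy(specific_event) < event_entropy(common_event))  # True
```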
If the entropy value is small, the event is specific to very few topics; if the entropy value is larger, the audio event is common to many topics. So, we choose audio events with smaller entropy values for classification. Using this entropy value, we calculate a coefficient to quantify the influence of an audio event [19]. The vector z, whose entry z(aeh) is the coefficient of event aeh, is designed so that each coefficient is larger than or equal to 1.
The document-event distributions in Ytrain and Ytest can be reweighted using the coefficient vector z. Reframing the formula for the document-event distribution, we get

p(aeh, dg) ← z(aeh) · p(aeh, dg), where h = 1, 2, ..., j and g = 1, 2, ..., J.
100 J. Sangeetha et al.
4.3.1 Pre-processing
• The document-term matrix is taken as the input for topic models. The documents
are considered as rows in the matrix and the terms as columns.
• The size of the corpus is equal to the number of rows and the vocabulary size is
the number of columns.
• We tokenize and normalize each document to represent it by term frequencies: stemming, punctuation removal, number removal, stop-word removal, case conversion, and omission of very short terms.
• For each term in the vocabulary, an index maps back to the exact document in which the term was found.
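The preprocessing steps above can be sketched as follows (a minimal sketch using only the standard library; the stop-word list is a toy stand-in for a full pipeline, and stemming is omitted):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in"}  # toy list

def tokenize(text, min_len=3):
    """Lowercase, strip punctuation and numbers, drop stop words
    and low-length terms, as the bullet list describes."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) >= min_len]

def document_term_matrix(docs):
    """Rows = documents, columns = vocabulary terms, entries = counts."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted(set(t for toks in token_lists for t in toks))
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)
        matrix.append([counts.get(t, 0) for t in vocab])
    return vocab, matrix

docs = ["The dog barks in the park.", "A car horn and a dog in traffic 42."]
vocab, M = document_term_matrix(docs)
print(vocab)  # sorted vocabulary
print(M[0])   # term counts for the first document
```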
Here the distribution of topics is taken as the feature set for topic modelling. Ytrain (Fig. 1) and Ytest (Fig. 2) are factorized to find the topic distributions of the training and test audio documents, respectively. We factorize Ytrain into Y1train and Y2train; Ytest is then factorized into Y1train and Y2test, keeping Y1train fixed. If there are L2 latent topics, Y2train is an L2 × J matrix, each column of which represents a training audio document's topic distribution. If there are J test audio inputs, Y2test is an L2 × J matrix, each column of which represents a test audio document's topic distribution. We take this topic distribution as the feature set for the audio documents and perform classification using SVM, adopting the one-vs-one multiclass classification technique, which has been used in many applications [1, 20].
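The one-vs-one scheme trains one binary classifier per pair of classes and lets the pairwise winners vote. A minimal sketch of the voting logic (a toy nearest-mean rule stands in for the per-pair SVMs, purely to keep the example self-contained; the features and class names are invented):

```python
from itertools import combinations

def train_pairwise(features, labels):
    """One-vs-one setup: here each pairwise 'classifier' just compares
    distances to the two class means; a real system would fit an SVM
    per class pair instead."""
    classes = sorted(set(labels))
    means = {c: [sum(x[i] for x, l in zip(features, labels) if l == c) /
                 labels.count(c) for i in range(len(features[0]))]
             for c in classes}
    return classes, means

def predict(x, classes, means):
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):  # one vote per class pair
        da = sum((xi - mi) ** 2 for xi, mi in zip(x, means[a]))
        db = sum((xi - mi) ** 2 for xi, mi in zip(x, means[b]))
        votes[a if da <= db else b] += 1
    return max(votes, key=votes.get)  # class with most pairwise wins

# Toy topic-distribution features for three audio scene classes.
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],   # "street"
     [0.1, 0.9, 0.0], [0.2, 0.8, 0.0],   # "park"
     [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]]   # "office"
y = ["street", "street", "park", "park", "office", "office"]

classes, means = train_pairwise(X, y)
print(predict([0.85, 0.15, 0.0], classes, means))  # street
```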
5 Experimental Results
The proposed algorithm is tested on two publicly available datasets: the IEEE AASP challenge dataset and the DEMAND (Diverse Environments Multi-channel Acoustic Noise Database) dataset [21]. The AASP dataset has 10 classes, including tube, busy street, office, park, quiet street, restaurant, open market, supermarket, and bus. Each class consists of ten audio files, each 30 s long, sampled at 44.1 kHz in stereo. The DEMAND dataset [22] offers various indoor and outdoor settings across eighteen audio scene classes, including kitchen, living, field, park, washing, river, hallway, office, cafeteria, restaurant, meeting, station, cafe, traffic, car, metro, and bus. Each audio class includes 16 recordings corresponding to 16 channels. For the experiments, only the first-channel recording is used; every recording is three hundred seconds long and is sliced into 10 equal documents of 30 s each. In summary, the DEMAND dataset contains 18 categories of audio scenes, each with 10 audio files of 30 s.
Audio Scene Classification Based on Topic Modelling and Audio Events … 101
[Fig. 3: Framework block diagram — input audio, audio vocabulary generation, classifier, output]
In the present work, audio documents were partitioned into 30 ms frames with 50% overlap using a Hamming window. For every frame, 39-dimensional MFCC features were extracted as the feature set. After topic analysis through LSA/LDA, the topic distribution characterizing each audio document was given as input to the SVM. A one-vs-one strategy was followed in the SVM for multiclass classification, with an RBF (Radial Basis Function) kernel. The algorithms were evaluated in terms of classification accuracy.
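The framing parameters above can be sketched numerically (a minimal sketch; 39 dimensions typically means 13 MFCCs plus delta and delta-delta coefficients, though the chapter does not spell this out):

```python
def frame_boundaries(n_samples, sr, frame_ms=30, overlap=0.5):
    """Start/end sample indices for fixed-length frames with overlap,
    as used above (30 ms frames, 50% overlap)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    return [(s, s + frame_len)
            for s in range(0, n_samples - frame_len + 1, hop)]

sr = 44100                   # 44.1 kHz sampling rate
n = sr * 30                  # one 30 s audio document
frames = frame_boundaries(n, sr)
print(len(frames))           # number of 30 ms frames at 50% overlap
print(frames[0], frames[1])  # consecutive frames share half their samples
```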
Table 1 Classification performance: document event (DE) cooccurrence matrix and document word (DW) cooccurrence matrix

Dataset   Topic model   Algorithm                   Accuracy (%)
AASP      LSA           Document event (DE) & LSA   45.6
                        Document word (DW) & LSA    60.1
          LDA           Document event (DE) & LDA   46.9
                        Document word (DW) & LDA    52.8
DEMAND    LSA           Document event (DE) & LSA   62.1
                        Document word (DW) & LSA    81.3
          LDA           Document event (DE) & LDA   62.6
                        Document word (DW) & LDA    76.5
Table 2 Performance of SVM on AASP and DEMAND

Dataset   Topic model   Accuracy (%)
AASP      LSA           61
          LDA           55
DEMAND    LSA           82
          LDA           77
In this proposed approach, a new audio scene recognition algorithm is presented that utilizes the document-event cooccurrence matrix for topic modelling instead of the most widely used document-word cooccurrence matrix, and the adopted technique is compared against the existing matrix-based topic modelling. To acquire the document-event cooccurrence matrix more efficiently, the proposed work uses a matrix factorization method. Even though this work found its weakest results on the AASP dataset, it verifies how topic analysis with the existing matrix compares with the proposed matrix. As a future enhancement of this work, deep learning models can be taken as a reference: by incorporating neural networks into the present system and combining the merits of topic models and neural networks, the recognition performance can be improved.
References
1 Introduction
A brain tumor is a cluster of irregular cells that form a mass; such a growth can be cancerous. As benign or malignant tumors get larger, the pressure inside the skull rises, which harms the brain and may even result in death [1]. This sort of tumor affects 5–10 people per 100,000 in India, and the incidence is rising [12]. Brain and central nervous system tumors are also the second most common cancers in children, accounting for about 26% of childhood cancers. In the last decade, various advancements have been made in the field of computer-aided diagnosis of brain tumors. These approaches are always available to aid radiologists who are unsure about the type of tumor or wish to visually analyze it in greater detail. MRI (Magnetic Resonance Imaging) and CT scan (Computed Tomography) are the two methods doctors use to detect tumors; MRI is preferred, so researchers have concentrated on it. A major task of brain tumor diagnosis is segmentation.
Researchers are addressing this task using deep learning techniques [3]. In medical imaging, deep learning models offer various advantages, from identification of important regions and pattern recognition in cell structures to feature extraction, and they give good results even on smaller datasets [3]. Transfer learning is a technique in deep
learning where the parameters (weights and biases) of the network are copied from
another network trained on a different dataset. It helps identify generalized features in the target dataset with the help of features learned on the source dataset. The
new network can now be trained by using the transferred parameters as initialization
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 105
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_10
106 T. Shelatkar and U. Bansal
(this is called fine-tuning), or new layers can be added on top of the network and
only the new layers are trained on the dataset of interest.
Deep learning is a subset of machine learning. It is used to solve complex prob-
lems with large amounts of data using an artificial neural network. The artificial
neural network is a network that mimics the functioning of the brain. The ‘deep’ in
deep learning represents more than one layer network. Here each neuron represents a
function and each connection has its weight. The network is trained using the adjust-
ment of weights which is known as the backpropagation algorithm. Deep learning
has revolutionized the computer vision field with increased accuracy on complex datasets. Image analysis employs a specific sort of network known as a convolutional network, which accepts images as input and convolves them into feature maps using kernels; the kernel weights are updated during training.
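The convolution described above can be sketched directly (a minimal sketch: one 3×3 kernel slid over a tiny grayscale image, no padding or stride; the values are invented for illustration):

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most deep
    learning frameworks): slide the kernel over the image and take
    weighted sums to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy 4x4 "image" with a vertical edge, and a 3x3 edge-like kernel.
img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 0, 0]]
k = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]
print(conv2d(img, k))  # responds strongly where the edge is
```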
A frequent practice for deep learning models is to use pre-trained parameters
on dataset. The new network can now be trained by using transferred parameters
as initialization (this is called fine-tuning), or new layers can be added on top of
the network and only the new layers are trained on the dataset of interest. Some advantages of transfer learning are that it reduces the data collection effort, benefits generalization, and shortens the training needed on a large dataset.
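The two options above (fine-tune with the transferred parameters as initialization, or freeze them and train only newly added layers) can be sketched in PyTorch. The small network here is a stand-in for a real pretrained backbone, purely for illustration:

```python
import torch.nn as nn

# Stand-in for a network whose parameters were transferred from
# another network trained on a different dataset.
pretrained_backbone = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
)

# Option 2 from the text: freeze the transferred layers and add a new
# head on top; only the head's parameters will receive gradients.
for p in pretrained_backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(
    pretrained_backbone,
    nn.Linear(16, 3),  # new head for the dataset of interest (3 classes)
)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # only the new head's weight and bias remain trainable
```

Option 1 (full fine-tuning) is the same construction with the freezing loop omitted, usually with a smaller learning rate for the transferred layers.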
2 Motivation
The motivation behind this research is to build a feasible model in terms of time
and computing power so that small healthcare systems will also benefit from the
advancements in computer-aided brain tumor analysis. The model should be versatile enough to deal with customized data and provide an acceptable result in an acceptable amount of time.
3 Literature Review
Various deep learning models have been employed for the diagnosis of brain tumors, but only limited research has used object detection models. Some of the reviewed papers are summarized below.
Pereira and co-authors used a modern 3D U-Net deep learning model, which helps grade tumors according to severity and achieves up to 92% accuracy. It considered two regions of interest: the whole brain and the tumor region [1].
Neelum et al. achieved great success in analyzing this problem by using the pre-trained DenseNet and Inception-v3 models, reaching 99% accuracy. Feature concatenation helped a great deal in improving the model [4].
Mohammad et al. applied various machine learning algorithms, such as decision tree, support vector machine, and convolutional neural network, as well as deep
Diagnosis of Brain Tumor Using Light Weight Deep Learning Model … 107
learning models (VGG16, ResNet, Inception, etc.) on a limited dataset of 2D images without using any image processing techniques. The most successful model was VGG19, which achieved an F1 score of 97.8% on top of the CNN framework. The authors noted a trade-off between time complexity and model performance: the ML methods have lower complexity, while DL performs better. The requirement of a benchmark dataset was also stated by Majib et al. They employed two methods, FastAI and YOLOv5, for the automation of tumor analysis, but YOLOv5 gained only 85% accuracy compared with 95% for FastAI; no transfer learning technique was employed there to compensate for the smaller dataset [18].
A comprehensive study [7] has been provided on brain tumor analysis for small healthcare facilities. The authors surveyed various challenges in the techniques and also proposed advice for their improvement.
Al-masni et al. used the YOLO model for bone detection; the method achieves a remarkable 99% accuracy, showing that the YOLO model can give much superior results in medical imaging [13].
Yale et al. [14] detected melanoma skin disease using the YOLO network. The results were promising even though the test was conducted on a smaller dataset, and the DarkNet framework improved feature-extraction performance. A better understanding of the workings of YOLO is still needed.
Kang et al. [21] proposed a hybrid model of machine learning classifiers and deep features, ensembling various DL methods with classifiers such as SVM, RBF, and KNN. The ensembled features helped the model reach higher performance, but the authors suggested that the developed model is not feasible for real-time medical diagnosis.
Muhammad et al. [18] studied various deep learning and transfer learning techniques from 2015 to 2019. The authors identified challenges for deploying these techniques in the real world: apart from higher accuracy, researchers should also focus on other parameters while implementing models. Some concerns highlighted are the need for end-to-end deep learning models, improved run time, reduced computational cost, and adaptability. The authors also suggested integrating modern technologies such as edge computing, fog and cloud computing, federated learning, GAN techniques, and the Internet of Things.
As discussed, various techniques are used in medical imaging, and specifically on MRI images of brain tumors. Classification, segmentation, and detection algorithms have all been used, but each has its limitations. We can refer to Table 1 for a better understanding of the literature review.
4 Research Gap
Although classification methods take fewer resources, they are unable to pinpoint the specific site of a tumor, while segmentation methods that can detect exact locations take large amounts of resources. The existing models do not work efficiently on the comparatively smaller datasets of small healthcare facilities, and models that are hard to implement are out of reach for facilities with limited resources and custom-created data. In addition, human intervention is needed for feature extraction and preprocessing of the dataset.
5 Our Contribution
Our model needs to consume less storage and computing resources, as we aim to design a model that can be used by smaller healthcare facilities. The model must therefore be small and occupy little storage.
6.2 Reliability
The radiologist must beware of false positives: the analysis may not be completely precise and cannot be relied on directly. The system should be used only by qualified radiologists, as it cannot completely replace doctors.
The system for brain tumor diagnosis must consume less time to be implementable in the real world. Its time complexity must be feasible even without the availability of high-end systems at healthcare facilities.
7 Dataset
Various datasets are available for brain tumor analysis, from 2D to 3D data. Since we are focusing on MRI data, the datasets include high-grade glioma, low-grade glioma, etc. The images can be of 2D or 3D nature, and the MRI scans are mostly T1-weighted. Some datasets are (a) the multigrade brain tumor dataset, (b) the brain tumor public dataset, (c) the Cancer Imaging Archive, (d) BraTS, and (e) the Internet Brain Segmentation Repository.
BraTS 2020 is an updated version of the BraTS dataset. The BraTS dataset has been used in challenge events organized from 2012 up to now, which encourage participants to do research on the collected dataset. BraTS 2017–2019 differ largely from all previous versions, and BraTS 2020 is an upgraded version of this series. Figure 1 displays our selected dataset.
8.1 Yolov5
The three major architectural blocks of the YOLO family of models are the backbone, neck, and head. The backbone of YOLOv5 is CSPDarknet, made up of cross-stage partial networks, which extracts features from images. The YOLOv5 neck generates a feature pyramid network using PANet to do feature aggregation and passes it to the head for prediction. The YOLOv5 head has layers that produce object detection predictions from anchor boxes. YOLOv5 is built on the PyTorch platform, unlike previous versions built on DarkNet; because of this it has fewer dependencies and does not need to be built from source.
9 Proposed Model
As mentioned above we are going to use the state-of-the-art model Yolov5. The
pre-trained weights are taken from COCO (Microsoft Common Objects in Context)
dataset. Fine-tuning is done using these parameters. The model is trained using the
BraTS 2020 dataset. The model is fed with 3D scans of patients; once trained, it takes a test image as input and returns information about the tumor. As noted earlier, the transferred parameters serve as initialization (fine-tuning), or new layers can be added on top of the network with only those layers trained on the dataset of interest; this reduces data collection effort, benefits generalization, and shortens training.
Some preprocessing is needed before training: for the YOLO model, the area of the tumor must be marked by a box region, which can be done using a tool that creates a bounding box around the object of interest in an image. For transfer learning we can use the NVIDIA Transfer Learning Toolkit and feed it the COCO dataset, as it also supports the YOLO architecture; this fine-tunes our model and compensates for an insufficient or unlabeled dataset. Afterward, we can train our model on the BraTS dataset. The environment used for development is Google Colab, which gives 100 GB storage, 12 GB RAM, and GPU support. The YOLOv5 authors have made their training results on the COCO dataset available for download, so their pre-trained parameters can be used in our own model. Applying the YOLOv5 algorithm requires a labeled training dataset, which is present in the BraTS dataset. Since we need better results on the BraTS dataset, we will freeze some layers and add our own layers on top of the YOLO model. Since we need a model that takes less space, we will use the YOLOv5n model. As mentioned in the official repository, the YOLOv5 model provides a mean average precision score of 72.4 at a speed of 3136 ms on the COCO dataset [25]. The main advantages of this model are that it is smaller and easier to use in production: it is 88 percent smaller than the previous YOLO model [26] and can process images at 140 FPS. Here specifically we are going to use the YOLOv5 nano model
since it has a smaller architecture than the other models and our main priority is model size; the YOLOv5n model has far fewer parameters (about 1.9 M) than the other models. Our model needs a certain configuration to perform on brain scans. Since the scanned BraTS data is complex, we perform various preprocessing steps, from resizing to masking; because the image data is stored in nii format with different scan types (FLAIR, T1, T2), the dataset must be processed into a form the model can handle. The model is fed with patients' scans. For evaluating the results we use the Dice score, Jaccard score, and mAP value, but our main focus is on the speed of the model, to increase its usability. The BraTS dataset is already partitioned for training and testing: almost 360 patient scans for training and 122 for testing. The flow of our model is shown in Fig. 2. The YOLOv5 models provide a yml file for custom configuration, so we can set up the network according to our own provisions: since we have only 3 classes, we configure it to three, and parameters for several convolution layers in the backbone or head of the model must be set as well. Once the model is trained, we can input the test image dataset. The expected result must be close to a Dice score of 0.85, comparable with segmentation models, while taking less storage and processing the BraTS dataset faster than previous models.
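The Dice and Jaccard scores mentioned above can be sketched for binary tumor masks (a minimal sketch; the masks are flattened 0/1 lists invented for illustration):

```python
def dice_score(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def jaccard_score(pred, truth):
    """Jaccard (IoU) = |A∩B| / |A∪B|; related to Dice by J = D/(2-D)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - inter
    return inter / union if union else 1.0

# Hypothetical flattened tumor masks (1 = tumor pixel).
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
d = dice_score(pred, truth)     # 2*2/(3+3) = 0.666...
j = jaccard_score(pred, truth)  # 2/4 = 0.5
print(round(d, 3), j)
```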
10 Conclusion
Various models and toolkits for brain tumor analysis have been developed in the past with promising results, but the viability of the models for real-time application has not been considered. Here we present a deep learning-based method for brain tumor identification and classification using YOLOv5. Such models are crucial in the development of a lightweight brain tumor detection system: a model with lower computational requirements and relatively reduced storage provides a feasible solution for various healthcare facilities.
References
1. Pereira S et al (2018) Automatic brain tumor grading from MRI data using convolutional
neural networks and quality assessment. In: Understanding and interpreting machine learning
in medical image computing applications. Springer, Cham, pp 106–114
2. Rehman A et al (2020) A deep learning-based framework for automatic brain tumors classifi-
cation using transfer learning. Circuits Syst Signal Process 39(2): 757–775
3. Salçin K (2019) Detection and classification of brain tumours from MRI images using
faster R-CNN. Tehnički glasnik 13(4):337–342
4. Noreen N et al (2020) A deep learning model based on concatenation approach for the diagnosis
of brain tumor. IEEE Access 8: 55135–55144
5. Montalbo FJP (2020) A computer-aided diagnosis of brain tumors using a fine-tuned YOLO-
based model with transfer learning. KSII Trans Internet Inf Syst 14(12)
6. Dipu NM, Shohan SA, Salam KMA (2021) Deep learning based brain tumor detection and
classification. In: 2021 international conference on intelligent technologies (CONIT). IEEE
7. Futrega M et al (2021) Optimized U-net for brain tumor segmentation. arXiv:2110.03352
8. Khan P et al (2021) Machine learning and deep learning approaches for brain disease diagnosis:
principles and recent advances. IEEE Access 9:37622–37655
9. Khan P, Machine learning and deep learning approaches for brain disease diagnosis: principles
and recent advances
10. Amin J et al (2021) Brain tumor detection and classification using machine learning: a com-
prehensive survey. Complex Intell Syst 1–23
11. https://www.ncbi.nlm.nih.gov/
12. Krawczyk Z, Starzyński J (2020) YOLO and morphing-based method for 3D individualised
bone model creation. In: 2020 international joint conference on neural networks (IJCNN).
IEEE
13. Al-masni MA et al (2017) Detection and classification of the breast abnormalities in digital
mammograms via regional convolutional neural network. In: 2017 39th annual international
conference of the IEEE engineering in medicine and biology society (EMBC). IEEE
14. Nie Y et al (2019) Automatic detection of melanoma with yolo deep convolutional neural
networks. In: 2019 E-health and bioengineering conference (EHB). IEEE
15. Krawczyk Z, Starzyński J (2018) Bones detection in the pelvic area on the basis of YOLO neural
network. In: 19th international conference computational problems of electrical engineering.
IEEE
16. https://blog.roboflow.com/yolov5-v6-0-is-here/
17. Hammami M, Friboulet D, Kechichian R (2020) Cycle GAN-based data augmentation for
multi-organ detection in CT images via Yolo. In: 2020 IEEE international conference on image
processing (ICIP). IEEE
18. Majib MS et al (2021) VGG-SCNet: A VGG net-based deep learning framework for brain
tumor detection on MRI images. IEEE Access 9:116942–116952
19. Muhammad K et al (2020) Deep learning for multigrade brain tumor classification in smart
healthcare systems: a prospective survey. IEEE Trans Neural Netw Learn Syst 32(2): 507–522
20. Baid U et al (2021) The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation
and radiogenomic classification. arXiv:2107.02314
21. Kang J, Ullah Z, Gwak J (2021) MRI-based brain tumor classification using ensemble of deep
features and machine learning classifiers. Sensors 21(6):2222
22. Lu S-Y, Wang S-H, Zhang Y-D (2020) A classification method for brain MRI via MobileNet
and feedforward network with random weights. Pattern Recognit Lett 140:252–260
23. Saba T et al (2020) Brain tumor detection using fusion of hand crafted and deep learning
features. Cogn Syst Res 59:221–230
24. Menze BH et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS).
IEEE Trans Med Imaging 34(10):1993–2024. https://doi.org/10.1109/TMI.2014.2377694
25. https://github.com/ultralytics/yolov5/
26. https://models.roboflow.com/object-detection/yolov5
Comparative Study of Loss Functions
for Imbalanced Dataset of Online
Reviews
1 Introduction
Google Play serves as the official application store for authorized devices running the Android operating system. It allows users to browse applications developed using the Android Software Development Kit (SDK) and download them. As the name indicates, the digital distribution service has been developed, released, and maintained by Google [1]. It is the
largest app store globally, with over 82 billion app downloads and over 3.5 million
published apps. The Google Play Store is one of the most widely used digital distri-
bution services globally and has many apps and users. For this reason, there is a lot
of data about app and user ratings. In the Google Play Console, you can get a top-level view of users' ratings of an application, your app's ratings, and summary information about your app's ratings. An application can be rated and evaluated on Google Play in the form of stars and reviews by the users. Users can rate an app only once, but these ratings and reviews can be updated at any time. The Play Store also displays the top reviews of verified users along with their ratings [2]. These user ratings help many other users assess an app's performance before using it.
Different developers from different companies also take their suggestions for further
product development seriously and help them improve their software.
Leaving an app rating is helpful to users, developers, and the Google Play Store itself [3]. The goal of the Play Store as an app platform is to quickly display accurate, personalized, spam-free results when users search for the app they need. This requires information about the performance of the app, conveyed through user ratings [4]. A 4.5-star rated app may be safer and more
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 115
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_11
116 P. Vyas et al.
relevant than a 2-star app in the same genre. This information helps Google's algorithms rank apps in the Play Store and provide high-quality results for a great experience [5]. The better an app's ratings and reviews, the more people will download it and use the Play Store services.
Natural language processing (NLP) has gained immense momentum in previous
years, and this paper covers one such sub-topic of NLP: sentiment analysis. Sentiment
analysis refers to the classification of sentiment present in a sentence, paragraph, or
manuscript based on a trained dataset [6]. Sentiment analysis has traditionally been done with simple machine learning algorithms such as k-nearest neighbors (KNN) or support vector machines (SVM) [7, 8]. However, for better optimization of this problem, the model selected for sentiment analysis of Google Play reviews was the Bidirectional Encoder Representations from Transformers (BERT) model, a transfer learning model [9]. BERT is a pre-trained transformer-based model that learns from words and predicts the sentiment conveyed by each word in a sentence.
For this paper, selected Google BERT models were applied to the textual data of the Google Play reviews dataset, where they performed better than deep neural network models. The Google BERT model was implemented for sentiment analysis and loss-function evaluation. After selecting the training model, the loss functions to be evaluated were studied, namely cross-entropy loss and focal loss. After testing the model with these loss functions, the F1 score was calculated on the Google Play reviews dataset, and the best loss function for sentiment analysis of an imbalanced dataset is concluded [10].
2 Literature Review
The datasets on which NLP tasks are performed nowadays are increasingly imbalanced [6]. If a correct or optimized loss function is not used with these imbalanced datasets, the results may carry errors introduced by the loss function. For this reason, many research papers were studied extensively, and the conclusion was to compare the five loss functions and find the best-optimized loss function for sentiment analysis of imbalanced datasets. This section provides a literature review of the results achieved in this field.
Comparing the loss functions first required an imbalanced dataset. Therefore, from
the various datasets available, it was decided to construct a dataset of Google Play
app reviews manually and then modify it to create an imbalance [11]. This dataset
was chosen because deep learning sentiment analysis has previously been studied
on Google Play customer reviews in Chinese [12]. That paper proposes long
short-term memory (LSTM), SVM, and Naïve Bayes models for sentiment
analysis [7, 13–15]. However, the dataset still had to be prepared for comparing
the cross-entropy and focal loss functions.
Focal loss is a modified loss function based on cross-entropy loss which
Comparative Study of Loss Functions for Imbalanced Dataset of Online Reviews 117
is frequently used with imbalanced datasets. Thus, both losses will be compared
to check which performs better on the balanced and on both imbalanced datasets.
Multimodal Sentiment Analysis of #MeToo Tweets using Focal Loss proposes the
RoBERTa model, a robust BERT variant that does not account for data bias during
classification; to further reduce errors due to misclassification on an imbalanced
dataset, its authors used the focal loss function [16].
After finalizing the dataset, the next topic of discussion is the model to be trained
on it. The research started with classical machine learning models based on KNN
and SVM [7, 8]. Sentiment Analysis Using SVM suggests the SVM model for
sentiment analysis of the Pang corpus, a 4000-movie-review dataset, and the
Taboada corpus, a 400-website opinion dataset [7, 17, 18]. In Sentiment Analysis
of Law Enforcement Performance Using SVM and KNN, the KNN model was
trained on law-enforcement data from the trial of Jessica Kumala Wongso; the
paper's results show that the SVM model is better than the KNN model [8].
However, machine learning algorithms like KNN and SVM perform well only on
small datasets with few outliers; they cease to perform well on a large dataset
with a large imbalance. To train a larger, highly imbalanced dataset, the model
was changed to the LSTM model.
The LSTM model is based on recurrent neural network units, which can be trained
on a large set of data and classify it efficiently [7, 15]. An LSTM model can
efficiently deal with the exploding and vanishing gradient problems [19]. However,
since an LSTM model has to be trained for classification from scratch, there are no
pre-trained LSTM models. An LSTM model is trained sequentially from left to
right, and, in the case of a bidirectional LSTM, simultaneously from right to left as
well. Thus, an LSTM model predicts the sentiment of a token based on its
predecessors or successors and not on the contextual meaning of the token. So, in
search of a model that avoids these problems, the transfer learning model BERT
was finally selected for data classification.
Bidirectional Encoder Representations from Transformers, abbreviated BERT, is
a stack of transformer encoder blocks connected at the end to a classification
layer [20–22]. The BERT model is based on transformers, which are mainly used
for end-to-end speech translation by creating embeddings of sentences in one
language, which the decoder then uses to produce the sentence in a different
language [20, 23]. These models are known as transfer learning models because
they are pre-trained on a language; BERT, for example, is trained on the English
Wikipedia corpus and then only needs to be fine-tuned before training and
testing [21, 22]. Comparing BERT with classical machine learning algorithms, in
Comparing BERT against traditional machine learning text classification, BERT
performed far better than the other algorithms on NLP tasks [24]. Similarly,
comparing the BERT model with the LSTM model, in A Comparison of LSTM
and BERT for Small Corpus, the BERT model achieved better accuracy on a
small training dataset, whereas LSTM performed better when the training split
was increased above 80 percent [25]. Also, a bidirectional LSTM is trained both
left-to-right, to predict the next word, and right-to-left, to predict the previous word.
118 P. Vyas et al.
In BERT, by contrast, the model learns from words in all positions, that is, from
the entire sentence, and in that paper the bidirectional LSTM made the model
overfit strongly. Ultimately, the BERT model was finalized for training and
performance evaluation. Within the BERT family there are still two famous models:
the Google BERT model and the Facebook AI Research BERT model
"RoBERTa" [26]. In comparison, the RoBERTa model outperforms Google's
BERT on the General Language Understanding Evaluation benchmark because of
its enhanced training methodology [27].
After finalizing the training model, the loss functions to be compared for imbal-
anced-dataset evaluation were studied through previous research. Each of the
five loss functions is described in the following sections.
3 Loss Functions
CE = -\sum_{c=1}^{M} y_c \log(p_c) \quad (2)
where M refers to the total number of classes and the loss is calculated by summation
of all the losses calculated for each class separately.
Focal loss was introduced to address the high class imbalance that arises in
classification and object detection [16, 30]. Starting from the cross-entropy loss,
a weighting factor is added to account for high imbalance in a dataset: α for
class 1 and 1 − α for class 0. Even though α can differentiate between positive and
negative examples, it is still unable to differentiate between easy and hard examples.
The hard examples correspond to examples of the minority class. Thus, instead of
α alone, a modulating factor is introduced into the cross-entropy loss, reshaping the
loss function to focus on hard negatives and down-weight the easy examples. The
modulating factor (1 − p_t)^γ contains the tunable factor γ ≥ 0; when γ is set to
zero, the loss reduces to the standard cross-entropy loss. The focal loss equation is

FL(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)
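As a concrete illustration of the two losses, a minimal pure-Python sketch of the per-example values is given below (this is an illustration, not the paper's training code; gamma and alpha default to the values fixed later in the paper):

```python
import math

def cross_entropy(p_t):
    """Cross-entropy for the probability p_t the model assigns to the true class."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0, alpha=0.8):
    """Focal loss: cross-entropy down-weighted by the modulating factor (1 - p_t)^gamma.

    With gamma = 0 and alpha = 1 this reduces to the standard cross-entropy loss.
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

For an easy example (p_t = 0.9) the modulating factor is 0.01, while for a hard example (p_t = 0.1) it is 0.81, so hard examples dominate the total loss.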
4 Dataset
The dataset used in this manuscript is a Google Play reviews dataset, which was
scraped manually using the Google Play scraper library based on NodeJS [11].
The data was scraped from the productivity category of Google Play. Various apps
were picked from the productivity category, and the info of each app was kept in a
separate Excel file, which was then used to scrape the reviews of each app it
contained. The scraped data contained the user's name, the user's review, the stars
the user gave the app, the user image, and other pieces of information that are not
needed for model training. There were 15,746 user reviews in total, of which 5,674
reviews had stars of 4 and above, 5,042 reviews had stars of 3, and 5,030 reviews
had stars of 2 and below. The model training was done using the users' reviews
and the stars given to an
app. Since the stars given to an app ranged from 1 to 5, the range was normalized
into three classes: negative, neutral, and positive. Reviews with 1 or 2 stars were
classified as negative, reviews with 3 stars as neutral, and reviews with 4 stars
and above as positive. The text blob of positive reviews can be seen in Fig. 1,
which shows the words most relevant to positive sentiment in a review; the more
relevant a word is in the reviews, the larger its token appears. Since the two loss
functions were to be compared on an imbalanced dataset, the class percentages
were calculated as shown in Table 1. As Table 1 shows, the dataset as formed was
balanced, so imbalance was created artificially to compare the two loss functions.
For one dataset, the number of neutral reviews was decreased by 20 percent.
Similarly, the neutral reviews of the balanced dataset were decreased by 40 percent
to create a second imbalanced dataset. The classification percentages for both
datasets can also be seen in Table 1. Finally, these three datasets (the balanced
dataset, the dataset with 20 percent fewer neutral reviews, and the dataset with
40 percent fewer neutral reviews) were used for training, and cross-entropy and
focal loss were compared on them to see the difference in accuracy when using a
weighted loss function on imbalanced datasets.
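The star-to-class normalization and the artificial imbalance described above can be sketched as follows (a minimal standard-library illustration; the actual dataset construction worked on the scraped review files):

```python
import random

def star_to_class(stars):
    """Normalize a 1-5 star rating into the three sentiment classes used in the paper."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"

def drop_neutral(labels, fraction, seed=0):
    """Return a copy of the label list with `fraction` of the neutral reviews removed,
    producing the 20%- and 40%-reduced imbalanced datasets."""
    rng = random.Random(seed)
    neutral_idx = [i for i, lab in enumerate(labels) if lab == "neutral"]
    dropped = set(rng.sample(neutral_idx, int(len(neutral_idx) * fraction)))
    return [lab for i, lab in enumerate(labels) if i not in dropped]
```

Calling `drop_neutral(labels, 0.20)` and `drop_neutral(labels, 0.40)` on the balanced label list yields the two imbalanced variants while leaving the positive and negative classes untouched.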
Table 1 Data percentage for different reviews in all the different datasets

Dataset                                  Positive (%)  Negative (%)  Neutral (%)
Reviews without any imbalance            36            31            33
Reviews with 20 percent fewer neutral    38            34            28
Reviews with 40 percent fewer neutral    42            36            22
5 Methodology
The basic model structure is deployed as given in Fig. 2. First, the RoBERTa model
was trained on all three Google Play review datasets. After training, the model
was tested with both loss functions, and the accuracy and F1 score were calculated
to compare them [16, 31]. The following steps are executed in the model for data
processing and evaluation.
• Data Pre-Processing
– Class Normalization: as described in the previous section, the stars in the
reviews are first normalized into the positive, neutral, and negative classes.
– Data Cleaning: in this phase, all non-alphabetic characters are removed. For
example, Twitter-style hashtags like #Googleplayreview are removed; since
nearly every review contains such hashtags, leaving them in would lead to
classification errors.
– Tokenization: in this step, a sentence is split into the words that compose it.
In this manuscript, the BERT tokenizer is used for the tokenization of reviews.
– Stop-word Removal: irrelevant general tokens such as "of", "and", and "our",
which are present in almost every sentence, are removed.
– Lemmatization: complex words, or different words sharing the same root, are
reduced to the root word for greater accuracy and easier classification.
• The tokens present in the dataset were transformed implicitly by the BERT model
into 768-dimensional BERT embeddings. An advantage of the BERT vectorizer
over other vectorization methods such as Word2Vec is that BERT produces word
representations that depend dynamically on the words surrounding the token.
After the embeddings were created, the BERT training model was selected.
• Two BERT models are available for training: "BERT uncased" and "BERT
cased." In the BERT uncased model, the input words are lowercased before
WordPiece tokenization, so the model is not case sensitive [21]. In the BERT
cased model, the input is not lowercased, so the uppercase and lowercase forms
of a particular word are trained separately, making the process more
time-consuming and complex. Therefore, this paper uses the BERT uncased
model.
• After training the BERT model on the pre-processed Google Play reviews
dataset, the accuracy on the test data was calculated with each loss function
separately. Then the accuracy and F1 score were calculated for each loss
function on each class of the dataset.
• After calculating the F1 score on the Google Play reviews dataset, the training
and testing steps were repeated for the dataset with 20 percent fewer neutral
reviews and for the dataset with 40 percent fewer neutral reviews.
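The pre-processing bullets above can be sketched as a single function (a simplified stand-in: the paper uses the BERT tokenizer and a full stop-word list, whereas this sketch uses whitespace splitting and an illustrative stop-word subset, and omits lemmatization):

```python
import re

STOP_WORDS = {"of", "and", "our", "the", "a", "to", "is"}  # illustrative subset only

def preprocess(review):
    """Clean, lowercase (BERT-uncased style), tokenize, and remove stop words."""
    review = re.sub(r"#\w+", " ", review)          # data cleaning: drop hashtags
    review = re.sub(r"[^A-Za-z\s]", " ", review)   # keep alphabetic characters only
    tokens = review.lower().split()                # uncased: lowercase, then tokenize
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Great app! #Googleplayreview, one of our favorites.")` returns `["great", "app", "one", "favorites"]`.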
Lastly, Tables 2, 3, and 4 summarize the results for ease of comparison, as shown
in the next section.
Table 2 Performance metrics for both focal loss and cross-entropy loss for balanced dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.80       0.73    0.76      245         0.88       0.84    0.86      245
Neutral     0.69       0.70    0.69      254         0.79       0.80    0.80      254
Positive    0.83       0.88    0.85      289         0.89       0.91    0.90      289
Accuracy                       0.77      788                            0.86      788
Table 3 Performance metrics for both focal loss and cross-entropy loss for 20 percent fewer neutral
classes of the dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.77       0.81    0.79      245         0.87       0.84    0.86      245
Neutral     0.67       0.65    0.66      220         0.69       0.77    0.73      220
Positive    0.84       0.81    0.83      269         0.87       0.82    0.85      269
Accuracy                       0.77      734                            0.81      734
Table 4 Performance metrics for both focal loss and cross-entropy loss for 40 percent fewer neutral
classes of the dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.82       0.87    0.84      243         0.91       0.91    0.91      243
Neutral     0.70       0.62    0.66      152         0.77       0.79    0.78      152
Positive    0.87       0.87    0.87      289         0.91       0.91    0.91      289
Accuracy                       0.76      684                            0.88      684
The BERT base uncased model was used for training and classification on all
three datasets. Each dataset was split into train and test sets, with 10 percent of
the data held out for testing using a random seed. The gamma and alpha values
used in the focal loss function were fixed at gamma = 2 and alpha = 0.8. The
number of training epochs for BERT was fixed at three for all three datasets.
Lastly, the F1 score was calculated for each class individually, and the accuracy
of the model was then derived from the F1 scores, where the F1 score is defined
as the harmonic mean of the precision and recall of the evaluated model [10].
Support, precision, and recall were further calculated for each class, and the
overall support was then calculated for the model on each dataset [32, 33].
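Since the reported F1 scores follow directly from precision and recall, the computation can be sketched as below (a minimal helper for illustration; the paper's metrics come from a standard classification report):

```python
def f1_score(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For instance, for the negative class of the balanced dataset under focal loss (Table 2), `f1_score(0.80, 0.73)` gives approximately 0.76, matching the tabulated value.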
7 Results
In this paper, the model is trained for three epochs on the three categories of data
as follows:
8 Conclusion
As the reach of technology grows, so does the number of users of the different
apps that provide benefits, solve problems, and make life easier. As users use
these apps, they tend to give reviews on the Google Play Store about how an app
helped them or what problems they faced with it. Most developers take note of
the reviews to fix any bugs in their applications and to improve them more
efficiently. Reviews of different apps on the Google Play Store also often help
other users; an application with good reviews tends to grow faster. Similarly,
other people's reviews can help users navigate an app better and reassure them
that their downloads are safe and problem-free. However, if the number of
positive reviews far exceeds the negative ones, the negative reviews may be
overshadowed, and the developer may not take note of the bugs. The loss
function that showed better results is the cross-entropy loss function over the
focal loss function. Focal loss does not differentiate between multiple classes as
well as cross-entropy loss does. Although focal loss is a modification of the
cross-entropy loss function, it outperforms cross-entropy only when the
imbalance is high; on slightly imbalanced data, the focal loss function ignores
many loss values because of the modulating factor. In the future, more
experiments will be conducted on different datasets to establish whether a
particular loss function performs well with a particular model. The comparison
can also be extended to other loss functions to find the most reliable loss
function for imbalanced data. Lastly, focal loss has to be upgraded
mathematically so that it can perform well even on multiclass and slightly
imbalanced datasets.
References
1. Malavolta I, Ruberto S, Soru T, Terragni V (2015) Hybrid mobile apps in the google play
store: an exploratory investigation. In: 2nd ACM international conference on mobile software
engineering and systems, pp. 56–59
2. Viennot N, Garcia E, Nieh J (2014) A measurement study of google play. ACM SIGMETRICS
Perform Eval Rev 42(1):221–233
3. McIlroy S, Shang W, Ali N, Hassan AE (2017) Is it worth responding to reviews? Studying
the top free apps in Google Play. IEEE Softw 34(3):64–71
4. Shashank S, Naidu B (2020) Google play store apps—data analysis and ratings prediction. Int
Res J Eng Technol (IRJET) 7:265–274
5. Arxiv A Longitudinal study of Google Play page, https://arxiv.org/abs/1802.02996, Accessed
21 Dec 2021
6. Patil HP, Atique M (2015) Sentiment analysis for social media: a survey. In: 2nd international
conference on information science and security (ICISS), pp. 1–4
7. Zainuddin N, Selamat A (2014) Sentiment analysis using support vector machine. In: International
conference on computer, communications, and control technology (I4CT) 2014, pp. 333–337
8. Dubey A, Rasool A (2021) Efficient technique of microarray missing data imputation using
clustering and weighted nearest neighbor. Sci Rep 11(1)
9. Li X, Wang X, Liu H (2021) Research on fine-tuning strategy of sentiment analysis model based
on BERT. In: International conference on communications, information system and computer
engineering (CISCE), pp. 798–802
10. Mohammadian S, Karsaz A, Roshan YM (2017) A comparative analysis of classification algo-
rithms in diabetic retinopathy screening. In: 7th international conference on computer and
knowledge engineering (ICCKE) 2017, pp. 84–89
11. Latif R, Talha Abdullah M, Aslam Shah SU, Farhan M, Ijaz F, Karim A (2019) Data scraping
from Google Play Store and visualization of its content for analytics. In: 2nd international
conference on computing, mathematics and engineering technologies (iCoMET) 2019, pp. 1–8
12. Day M, Lin Y (2017) Deep learning for sentiment analysis on Google Play consumer review.
IEEE Int Conf Inf Reuse Integr (IRI) 2017:382–388
13. Abdul Khalid KA, Leong TJ, Mohamed K (2016) Review on thermionic energy converters.
IEEE Trans Electron Devices 63(6):2231–2241
14. Regulin D, Aicher T, Vogel-Heuser B (2016) Improving transferability between different engi-
neering stages in the development of automated material flow modules. IEEE Trans Autom Sci
Eng 13(4):1422–1432
15. Li D, Qian J (2016) Text sentiment analysis based on long short-term memory. In: First IEEE
international conference on computer communication and the internet (ICCCI) 2016, pp. 471–
475
16. Lin T, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE
Trans Pattern Anal Mach Intell 42(2):318–327
17. Arxiv A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based
on Minimum Cuts, https://arxiv.org/abs/cs/0409058, Accessed 21 Dec 2021
18. Sfu Webpage Methods for Creating Semantic Orientation Dictionaries, https://www.sfu.ca/
~mtaboada/docs/publications/Taboada_et_al_LREC_2006.pdf, Accessed 21 Dec 2021
19. Sudhir P, Suresh VD (2021) Comparative study of various approaches, applications and
classifiers for sentiment analysis. Glob TransitS Proc 2(2):205–211
20. Gillioz A, Casas J, Mugellini E, Khaled OA (2020) Overview of the transformer-based models
for NLP tasks. In: 15th conference on computer science and information systems (FedCSIS)
2020, pp. 179–183
21. Zhou Y, Li M (2020) Online course quality evaluation based on BERT. In: 2020 International
conference on communications, information system and computer engineering (CISCE) 2020,
pp. 255–258
22. Truong TL, Le HL, Le-Dang TP (2020) Sentiment analysis implementing BERT-based pre-
trained language model for Vietnamese. In: 7th NAFOSTED conference on information and
computer science (NICS) 2020, pp. 362–367
23. Kano T, Sakti S, Nakamura S (2021) Transformer-based direct speech-to-speech translation
with transcoder. IEEE spoken language technology workshop (SLT) 2021, pp. 958–965
24. Arxiv Comparing BERT against traditional machine learning text classification, https://arxiv.
org/abs/2005.13012, Accessed 21 Dec 2021
25. Arxiv A Comparison of LSTM and BERT for Small Corpus, https://arxiv.org/abs/2009.05451,
Accessed 21 Dec 2021
26. Arxiv BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,
https://arxiv.org/abs/1810.04805, Accessed 21 Dec 2021
27. Naseer M, Asvial M, Sari RF (2021) An empirical comparison of BERT, RoBERTa, and Electra
for fact verification. In: International conference on artificial intelligence in information and
communication (ICAIIC) 2021, pp. 241–246
28. Ho Y, Wookey S (2020) The real-world-weight cross-entropy loss function: modeling the costs
of mislabeling. IEEE Access 8:4806–4813
29. Zhou Y, Wang X, Zhang M, Zhu J, Zheng R, Wu Q (2019) MPCE: a maximum probability based
cross entropy loss function for neural network classification. IEEE Access 7:146331–146341
30. Yessou H, Sumbul G, Demir B (2020) A comparative study of deep learning loss functions for
multi-label remote sensing image classification. In: IGARSS 2020 IEEE international geoscience
and remote sensing symposium, pp. 1349–1352
31. Liu L, Qi H (2017) Learning effective binary descriptors via cross entropy. In: IEEE winter
conference on applications of computer vision (WACV) 2017, pp. 1251–1258
32. Riquelme N, Von Lücken C, Baran B (2015) Performance metrics in multi-objective
optimization. In: Latin American Computing Conference (CLEI) 2015, pp. 1–11
33. Dubey A, Rasool A (2020) Clustering-based hybrid approach for multivariate missing data
imputation. Int J Adv Comput Sci Appl (IJACSA) 11(11):710–714
A Hybrid Approach for Missing Data
Imputation in Gene Expression Dataset
Using Extra Tree Regressor
and a Genetic Algorithm
1 Introduction
Missing data is a typical problem in data sets gathered from real-world applications
[1]. Missing data imputation has received considerable interest from researchers as
it widely affects the accuracy and efficiency of various machine learning models.
Missing values typically occur due to manual data entry practices, device errors,
operator failure, and inaccurate measurements [2]. A common approach to dealing
with missing values is to compute a statistic (such as the mean) for each column and
substitute it for the missing values, to delete rows with missing values, or to
replace them with zeros. A significant limitation of these methods, however, is a
decrease in efficiency due to incomplete and biased information [3]. If missing
values are not handled appropriately, they can lead to wrong deductions about the
data.
This issue becomes more prominent in gene expression data, which often contain
missing expression values. Microarray technology plays a significant role in current
biomedical research [4]. It allows observation of the relative expression of thousands
of genes under diverse experimental conditions. Hence, it has been used widely in
multiple analyses, including cancer diagnosis, active gene discovery, and drug
identification [5].
Microarray expression data often contain missing values for different reasons,
such as scratches on the slide, blotting issues, fabrication mistakes, etc. Microarray
data may have 1–15% missing data, which could impact up to 90–95% of genes.
Hence, there is a need for precise algorithms that accurately impute the missing
data in the dataset utilizing modern machine learning approaches. The imputation
technique known as k-POD uses the K-Means approach to predict missing
values [6]. This approach
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 127
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_12
128 A. Yadav et al.
works even when external knowledge is unavailable and the percentage of missing
data is high. Another method, based on Fuzzy C-Means clustering, uses support
vector regression and a genetic algorithm to optimize parameters [7]. The technique
suggested in this paper uses both of these models as baselines. This paper presents
a hybrid method for solving the issue: the proposed technique applies a hybrid
model that optimizes the parameters of the K-Means clustering algorithm using
Extra Tree regression and a genetic algorithm. The proposed model is implemented
on the Mice Protein Expression Data Set, and its performance is compared with
the baseline models.
2 Literature Survey
Missing value, also known as missing data, is where some of the observations in
a dataset are empty. Missing data is classified into three distinctive classes. These
classes are missing completely at random (MCAR), missing at random (MAR), and
missing not at random (MNAR) [2, 8]. These classes are crucial as missing data
in the dataset generates issues, and the remedies to these concerns vary depending
on which of the three types induces the situation. MCAR estimation presumes that
missing data is irrelevant to any unobserved response, indicating any observation in
the data set does not impact the chances of missing data. MCAR produces unbiased
and reliable estimates, but there is still a loss of power due to inadequate design but
not the absence of the data [2]. MAR means an organized association between the
tendency of missing data and the experimental data, while not the missing data.
For instance, men are less likely to fill in depression surveys, but this is not asso-
ciated with their level of depression after accounting for maleness. In this case, the
missing and observed observations are no longer coming from the same distribution
[2, 9]. MNAR describes an association between the propensity of an attribute entry
to be missing and its actual value. For example, individuals with little schooling
are missing out on education, and the unhealthiest people will probably drop out
of school. MNAR is termed “non-ignorable” as it needs to be handled efficiently
with the knowledge of missing data. It requires mechanisms to address such missing
data issues using prior information about missing values [2, 9]. There must be some
model for reasoning the missing data and possible values. MCAR and MAR are both
viewed as “ignorable” because they do not require any knowledge about the missing
data when dealing with it.
Researchers have proposed many methods for accurate imputation of missing
data. Depending on the type of knowledge employed, existing methodologies can
be classified into four distinct categories: (i) global approaches, (ii) local
approaches, (iii) hybrid approaches, and (iv) knowledge-assisted approaches
[10, 11]. Each approach has distinct characteristics. Global methods use
information about the data drawn from global correlation [11, 12]. Two widely
utilized global techniques are Singular Value Decomposition imputation
(SVDimpute) and Bayesian Principal Component Analysis (BPCA) [13, 14]. These
A Hybrid Approach for Missing Data Imputation in Gene Expression … 129
In the knowledge-assisted approach, domain knowledge from the data is utilized
for the imputation of missing values. Fuzzy C-Means clustering (FCM) and
Projection Onto Convex Sets (POCS) are two knowledge-assisted methods [1, 25].
FCM performs missing value imputation using gene ontology annotation as
external information. On the other hand, prior knowledge is hard to extract and
regulate, and the computation time is increased.
J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \quad (1)

In Eq. (1), j indexes the clusters, c_j is the centroid of cluster j, x_i represents
case i, k is the number of clusters, n is the number of cases, and \| x_i - c_j \|
is the distance function.
In addition to the K-Means assignment, each case x_i has a membership function
representing its degree of belongingness to a particular cluster c_j. The
membership function is described as

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\| x_i - c_j \|}{\| x_i - c_k \|} \right)^{2/(m-1)}} \quad (2)
In Eq. (2), m is the weighting factor parameter, whose domain runs from one to
infinity; c is the number of clusters, and c_k is the centroid of the kth cluster.
Only the complete attributes are considered when revising the membership
functions and centroids.
The missing value for any case x_i is calculated using the membership function
and the centroid values of the clusters. The function used for missing value
imputation is described as

\hat{x} = \sum_{i=1}^{c} m_i c_i \quad (3)

In Eq. (3), m_i is the estimated membership value for the ith cluster, c_i is the
centroid of the ith cluster, and c is the number of clusters; \sum denotes the
summation of the products of m_i and c_i.
4 About Dataset
For the implementation of the model, this paper uses the Mice Protein Expression
Data Set from UCI Machine Learning Repository. The data set consists of the expres-
sion levels of 77 proteins/protein modifications. There are 38 control mice and 34
trisomic mice for 72 mice. This dataset contains eight categories of mice which are
defined based on characteristics such as genotype, type of behavior, and treatment.
The dataset contains 1080 rows and 82 attributes. These attributes are Mouse ID,
Values of expression levels of 77 proteins, Genotype, Treatment type, and Behavior.
Dataset is artificially renewed such that it has 1%, 5%, 10%, and 15% missing
value ratios. All the irrelevant attributes such as MouseID, Genotype, Behavior, and
Treatment are removed from the dataset. Next, 558 rows were selected from shuf-
fled datasets for the experiment. For dimensionality reduction, the PCA (Principal
Component Analysis) method was used to reduce the dimensions of the dataset to
20. To normalize the data values between 0 and 1, a MinMax scaler was used.
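The dimensionality reduction and scaling described above can be sketched with NumPy (a minimal stand-in for the sklearn PCA and MinMaxScaler the paper uses; the random matrix here only mimics the dataset's 558 × 77 shape):

```python
import numpy as np

def minmax_scale(X):
    """Rescale each column of X to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def pca_reduce(X, n_components=20):
    """Project the centred data onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal axes
    return Xc @ Vt[:n_components].T
```

Applied to a 558 × 77 matrix, `pca_reduce` yields a 558 × 20 matrix, matching the 20 dimensions used in the experiments.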
5 Proposed Model
This research proposes a method to evaluate missing values using K-means clustering
optimized with an Extra Tree regression and a genetic algorithm. The novelty of the
proposed approach is the application of an ensemble technique named Extra Tree
regression for estimating accurate missing values. These accurate predictions with
the genetic algorithm further help in the better optimization of K-Means parameters.
Figure 2 represents the implementation of the proposed model. First, to implement
the model on the dataset, missing values are created artificially. Then the dataset with
missing values is divided into a complete dataset and an incomplete dataset. In the
complete dataset, those rows are considered in which none of the attributes contains
a missing value. In contrast, an incomplete dataset contains rows with attributes with
one or more missing values.
In the proposed approach, Extra Tree regression and a genetic algorithm are used
to optimize the parameters of the K-Means algorithm. The Extra Tree regression
and K-Means models are trained on the complete-row dataset to predict the
output. Then, K-Means is used to estimate the missing data in the dataset with
incomplete rows. The K-Means outcome is compared with the output vector
received from the Extra Tree regression. The optimized values for the c and m
parameters are obtained by running the genetic algorithm to minimize the
difference between the Extra Tree regression and K-Means outputs. The main
objective is to minimize the error function E = (X − Y)², where X is the prediction
output of the Extra Tree regression method and Y is the prediction from the
K-Means model. Finally, the missing data are estimated using K-Means with the
optimized parameters.
The code for the presented model is written in Python version 3.4. The K-means clustering
and Extra Tree regression are imported from the sklearn library. The number
of clusters = 3 and the membership value m = 1.5 are fed into the K-Means algorithm.
In the Extra Tree regression, the number of decision trees = 100 is used as a
parameter. The genetic algorithm uses a population size of 20, 40 generations, a
crossover fraction of 0.60, and a mutation fraction of 0.03 as parameters.
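A minimal sketch of instantiating these components with the stated values follows. Note that scikit-learn's KMeans exposes no fuzzy membership parameter, so m = 1.5 is kept here only as a plain variable that the paper's genetic algorithm would tune; the GA loop itself is not shown, and its settings are stored in an illustrative dictionary.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import ExtraTreesRegressor

# Clustering and regression components with the reported hyperparameters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
etr = ExtraTreesRegressor(n_estimators=100, random_state=0)

# Membership value from the text; tuned by the GA, not a sklearn argument
m = 1.5

# Genetic algorithm settings as stated above (GA implementation not shown)
ga_params = {"population_size": 20, "generations": 40,
             "crossover_fraction": 0.60, "mutation_fraction": 0.03}
```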
134 A. Yadav et al.
6 Performance Analysis
\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|\hat{y}_j - y_j\right| \qquad (4)
RMSE is one of the most commonly used standards for estimating the quality of
predictions.
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\hat{y}_j - y_j\right)^2} \qquad (5)
In Eqs. (4) and (5), ŷj represents the predicted output, yj represents the actual output,
and n denotes the total number of cases. The relative classification accuracy is given by:

A = \frac{c_t}{c} \times 100 \qquad (6)
In Eq. (6), c represents the number of all predictions, and ct represents the number
of accurate predictions within a specific tolerance. A 10% tolerance is used for
comparative prediction, which estimates data as correct for values within a range of
±10% of the exact value.
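The three metrics above can be sketched directly in NumPy; the small arrays at the end are illustrative values, not results from the paper's experiments.

```python
import numpy as np

def mae(y_true, y_pred):
    # Eq. (4): mean absolute error
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    # Eq. (5): root mean squared error
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def tolerance_accuracy(y_true, y_pred, tol=0.10):
    # Eq. (6): percentage of predictions within ±tol of the exact value
    ok = np.abs(y_pred - y_true) <= tol * np.abs(y_true)
    return 100.0 * np.sum(ok) / len(y_true)

# Illustrative values only
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.05, 2.5, 4.0])
```

With these values, two of the three predictions fall inside the ±10% band, so the tolerance accuracy is about 66.7%.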
7 Experimental Results
This section discusses the performance evaluation of the proposed model. Figures 3
and 4 show box plots of the performance of the three different methods on
the Mice Protein Expression Data Set with 1%, 5%, 10%, and 15% missing values.
In the box plots, the line inside each box represents the median. The whiskers
cover most of the data points except outliers, which are plotted separately. Figure 3a
compares the three methods on the dataset with 1–15% missing data; each box
aggregates the four RMSE results. The median RMSE values are 0.01466, 0.01781, and 0.68455.
Figure 3b compares the MAE on the dataset with 1–15% missing values; the median
MAE values are 0.10105, 0.10673, and 0.78131. Lower error indicates better performance.
Figure 4 compares the accuracy of the different models used for the
experiment. This accuracy is estimated by computing the difference between the
correct and predicted value using a 10% tolerance. Accuracy is calculated for three
techniques executed on the dataset with 1–15% missing values. The median accuracy
Fig. 3 Box plot for RMSE and MAE in three methods for 1–15% missing ratio
values are 22.32143, 19.04762, and 0.67. Higher accuracy indicates better imputation.
It is evident from the box plots that the proposed method gives the lowest RMSE
and MAE error and the highest relative accuracy on the given dataset. Figures 5
and 6 present line graphs of the performance evaluation of the three different
methods against the missing ratios. Figure 5a illustrates that the hybrid K-Means and
ExtraTree-based method has a lower RMSE error value compared to both methods
for the mice dataset. Figure 5b indicates that the proposed hybrid K-Means and
136 A. Yadav et al.
Fig. 5 RMSE and MAE comparison of different techniques for 1–15% missing ratio in the dataset
ExtraTree-based hybrid method has a lower MAE error value than both methods for
the mice dataset. Figure 6 demonstrates that the accuracy of the evaluated and actual
data with 10% tolerance is higher for the proposed method than the FcmSvrGa and
k-POD method.
The graphs in Figs. 5 and 6 indicate that k-POD gives the highest error and lowest
accuracy at the different missing ratios [6]. The FcmSvrGa method gives a slightly lower
error at a 1% missing ratio, but across all missing ratios the KExtraGa method
provides lower error than the other baseline models. Furthermore, compared to other
methods, the KExtraGa method gives better accuracy over each missing ratio. It is
clearly illustrated from Figs. 3–6 that the proposed model KExtraGa performs better
than the FcmSvrGa and k-POD method. The Extra Tree regression-based method
achieves better relative accuracy than the FcmSvrGa and k-POD method. In addition,
the proposed method also achieves a lower overall median RMSE and MAE error than
both methods. There are some drawbacks to the proposed method. The training of Extra
Tree regression is a substantial issue: although the training time of the proposed
model is slightly better than that of FcmSvrGa, it still requires a high overall
computation time.
8 Conclusion
This paper proposes a hybrid method based on K-Means clustering, which utilizes
Extra Tree regression and a genetic algorithm to optimize the parameters of the K-
Means algorithm. This model was applied to the Mice Protein Expression dataset and
gave better performance than the other algorithms. In the proposed model, the complete
dataset rows were clustered based on similarity, and each data point was assigned
a membership value for each cluster. Hence, this method yields more practical
results, as each missing value belongs to more than one cluster. The experimental
results clearly illustrate that the KExtraGa model yields better accuracy (with 10%
tolerance) and lower RMSE and MAE error than the FcmSvrGa and k-POD algorithms.
The limitation of the model proposed in this paper indicates a need for a faster
algorithm. Hence, the main focus for future work would be reducing the computation
time of the proposed algorithm. Another future goal would be to
implement the proposed model on a large dataset and enhance its accuracy [22].
References
1. Gan X, Liew AWC, Yan H (2006) Microarray missing data imputation based on a set theoretic
framework and biological knowledge. Nucleic Acids Res 34(5):1608–1619
2. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, Kristensen NR, Pham TM, Pedersen L,
Petersen I (2017) Missing data and multiple imputation in clinical epidemiological research.
Clin Epidemiol 9:157
3. Dubey A, Rasool A (2020) Time series missing value prediction: algorithms and applications.
In: International Conference on Information, Communication and Computing Technology.
Springer, pp. 21–36
4. Trevino V, Falciani F, Barrera- HA (2007) DNA microarrays: a powerful genomic tool for
biomedical and clinical research. Mol Med 13(9):527–541
5. Chakravarthi BV, Nepal S, Varambally S (2016) Genomic and epigenomic alterations in cancer.
Am J Pathol 186(7):1724–1735
6. Chi JT, Chi EC, Baraniuk RG (2016) k-pod: A method for k-means clustering of missing data.
Am Stat 70(1):91–99
7. Aydilek IB, Arslan A (2013) A hybrid method for imputation of missing values using optimized
fuzzy c-means with support vector regression and a genetic algorithm. Inf Sci 233:25–35
8. Dubey A, Rasool A (2020) Clustering-based hybrid approach for multivariate missing data
imputation. Int J Adv Comput Sci Appl (IJACSA) 11(11):710–714
9. Gomer B (2019) Mcar, mar, and mnar values in the same dataset: a realistic evaluation of
methods for handling missing data. Multivar Behav Res 54(1):153–153
10. Meng F, Cai C, Yan H (2013) A bicluster-based bayesian principal component analysis method
for microarray missing value estimation. IEEE J Biomed Health Inform 18(3):863–871
11. Liew AWC, Law NF, Yan H (2011) Missing value imputation for gene expression data:
computational techniques to recover missing data from available information. Brief Bioinform
12(5):498–513
12. Li H, Zhao C, Shao F, Li GZ, Wang X (2015) A hybrid imputation approach for microarray
missing value estimation. BMC Genomics 16(S9), S1
13. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB
(2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
14. Oba S, Sato M-a, Takemasa I, Monden M, Matsubara K-i, Ishii S (2003) A Bayesian missing
value estimation method for gene expression profile data. Bioinformatics 19(16):2088–2096
15. Celton M, Malpertuy A, Lelandais G, De Brevern AG (2010) Comparative analysis of missing
value imputation methods to improve clustering and interpretation of microarray experiments.
BMC Genomics 11(1):1–16
16. Kim H, Golub GH, Park H (2005) Missing value estimation for DNA microarray gene
expression data: local least squares imputation. Bioinformatics 21(2):187–198
17. Ouyang M, Welsh WJ, Georgopoulos P (2004) Gaussian mixture clustering and imputation of
microarray data. Bioinformatics 20(6):917–923
18. Sehgal MSB, Gondal I, Dooley LS (2005) Collateral missing value imputation: a new robust
missing value estimation algorithm for microarray data. Bioinformatics 21(10):2417–2423
19. Burgette LF, Reiter JP (2010) Multiple imputation for missing data via sequential regression
trees. Am J Epidemiol 172(9):1070–1076
20. Yu Z, Li T, Horng SJ, Pan Y, Wang H, Jing Y (2016) An iterative locally auto-weighted least
squares method for microarray missing value estimation. IEEE Trans Nanobiosci 16(1):21–33
21. Dubey A, Rasool A (2021) Efficient technique of microarray missing data imputation using
clustering and weighted nearest neighbour. Sci Rep 11(1):24–29
22. Dubey A, Rasool A (2020) Local similarity-based approach for multivariate missing data
imputation. Int J Adv Sci Technol 29(06):9208–9215
23. Purwar A, Singh SK (2015) Hybrid prediction model with missing value imputation for medical
data. Expert Syst Appl 42(13):5621–5631
24. Aydilek IB, Arslan A (2012) A novel hybrid approach to estimating missing values in databases
using k-nearest neighbors and neural networks. Int J Innov Comput, Inf Control 7(8):4705–4717
25. Tang J, Zhang G, Wang Y, Wang H, Liu F (2015) A hybrid approach to integrate fuzzy c-means
based imputation method with genetic algorithm for missing traffic volume data estimation.
Transp Res Part C: Emerg Technol 51:29–40
26. Marwala T, Chakraverty S (2006) Fault classification in structures with incomplete measured
data using autoassociative neural networks and genetic algorithm. Curr Sci 542–548
27. Hans-Hermann B (2008) Origins and extensions of the k-means algorithm in cluster analysis.
Electron J Hist Probab Stat 4(2)
28. Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
29. Yadav A, Dubey A, Rasool A, Khare N (2021) Data mining based imputation techniques to
handle missing values in gene expressed dataset. Int J Eng Trends Technol 69(9):242–250
30. Gond VK, Dubey A, Rasool A (2021) A survey of machine learning-based approaches for
missing value imputation. In: Proceedings of the 3rd International Conference on Inventive
Research in Computing Applications, ICIRCA 2021, pp. 841–846
A Clustering and TOPSIS-Based
Developer Ranking Model
for Decision-Making in Software Bug
Triaging
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 139
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_13
140 P. Rathoriya et al.
attributes and generating the ranked list of solutions to such problems. TOPSIS [2]
is one of the popular techniques under the MADM paradigm to solve problems. The
main attributes for bug triaging include the consideration of the attributes, namely,
the experience of developers in years (D), the number of assigned bugs (A), the
number of newly assigned bugs (N), the number of fixed or resolved bugs (R), and
the average resolving time (T). Software bugs are managed through online software
bug repositories. For example, the Mozilla bugs are available online at https://bugzilla.mozilla.org/users_profile?.user_id=X, where X is the id of the user.
In this paper, Sect. 2 presents motivation, and Sect. 3 presents some related work
to bug triaging. In Sect. 4, the methodology is presented. Sect. 5 describes our model
with an illustrative example. Sect. 6 covers some threats to validity, and Sect. 7
discusses the conclusion and future work.
2 Motivation
Machine learning techniques mostly depend on a historical dataset for training
and do not consider the availability of developers in bug triaging.
For example, a machine learning algorithm can identify one developer as an expert
for a newly reported bug, but that developer might already have been
assigned numerous bugs precisely because of that expertise.
With the help of MCDM approaches, such a problem can be handled efficiently
by considering the availability of the developer as one of the non-beneficial (cost)
criteria, i.e., attributes to be minimized.
3 Related Work
Limitations of existing machine learning mechanisms are that it is difficult to label bug reports with
missing or insufficient label information, and that most classification algorithms used in
existing approaches are costly and inefficient on large datasets.
Deep learning techniques to automate the bug assignment process are another set
of approaches that can be used with large datasets being researched by researchers
[11–18]. By extracting features, Mani et al. [19] proposed the DeepTriage technique,
which is based on the Deep Bidirectional Recurrent Neural Network with Attention
(DBRNN-A) model; unsupervised learning is used to learn the semantic and syntactic
characteristics of words. Similarly, Tuzun et al. [15] improved the accuracy of the
method by using Gated Recurrent Unit (GRU) instead of Long-Short Term Memory
(LSTM), combining different datasets to create the corpus, and changing the dense
layer structure (simply doubling the number of nodes and increasing the layer) for
better results. Guo et al. [17] proposed an automatic bug triaging technique based on
CNN and developer activity, in which they first apply text preprocessing, then create
the word2vec, and finally use CNN to predict developer activity. The problem asso-
ciated with the deep learning approach is that, based on the description, a developer
can be selected accurately, but availability and expertise can’t be determined.
Several studies [19–23] have increased bug triaging accuracy by including addi-
tional information such as components, products, severity, and priority. Hadi et al.
[19] have presented the Dependency-aware Bug Triaging Method (DABT). This
considers both bug dependency and bug fixing time in bug triaging, using the topic
model Latent Dirichlet Allocation (LDA) and integer programming to determine
the appropriate developer who can fix the bug in a given time slot. Iyad et al. [20]
have proposed the graph-based feature augmentation approach, which uses graph-
based neighborhood relationships among the terms of the bug report to classify the
bug based on their priority using the RFSH [22] technique.
Some bugs must be tossed due to the inexperience of the developer. However, this
may be decreased by selecting the most suitable developer. Another MCDM-based
method is discussed in [24–28] for selecting the best developer. Goyal et al. [27]
used the MCDM method, namely the Analytic Hierarchy Process (AHP) method,
for bug triaging, in which the first newly reported bug term tokens are generated and
developers are ranked based on various criteria with different weightages. Gupta et al.
[28] used a fuzzy technique for order of preference by similarity to ideal solution
(F-TOPSIS) with the Bacterial Foraging Optimization Algorithm (BFOA) and Bar
Systems (BAR) to improve bug triaging results.
From the above-discussed methods, it can be concluded that the existing methods
do not consider ranking bugs or developers using metadata and multi-criteria
decision-making for selecting the developer. In reality, all the parameters/features
are not equally important, so weights should be assigned explicitly for features
and their prioritization. In addition, the existing methods do not consider developer
availability. Hence, this paper identifies this gap and suggests a hybrid bug triaging
mechanism.
4 Methodology
The proposed method explained in this section consists of the following steps
(Fig. 1).
In the first step, the bug data is collected from open sources like Kaggle or Bugzilla.
In the present paper, the dataset has 10,000 rows and 28 columns of attributes which
contain information related to bugs, such as the developer who fixed the bug, when the bug
was triggered, when the bug was fixed, the bug id, the bug summary, etc. The dataset is taken
from Kaggle.
In step two, preprocessing tasks are applied to the bug summary, for example, text
lowercasing, stop word removal, tokenization, stemming, and lemmatization.
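A minimal, dependency-free sketch of this preprocessing step is shown below. The stop word set is a tiny illustrative subset, and the suffix-stripping rule stands in for a real stemmer/lemmatizer (such as those in NLTK), so the outputs are only approximations of proper stems.

```python
import re

# Tiny illustrative stop word set (a real pipeline would use a full list)
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "when", "to"}

def preprocess(summary: str) -> list[str]:
    # Lowercase and tokenize into alphabetic tokens
    tokens = re.findall(r"[a-z]+", summary.lower())
    # Stop word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Naive suffix stripping as a stand-in for stemming/lemmatization
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
```

For example, `preprocess("Crash when opening settings dialog")` yields `["crash", "open", "setting", "dialog"]`.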
In the third step, developer metadata is extracted from the dataset. It consists of
the following: developer name, total number of bugs assigned to each developer,
number of bugs resolved by each developer, new bugs assigned to each developer,
total experience of the developer, and the average fixing time of the developer over
all resolved bugs. A developer vocabulary is also created using the developer names
and bug summaries.
In the fourth step, the newly reported preprocessed bug summary is matched
with developer vocabulary using the cosine similarity [17] threshold filter. Based on
similarity, developers are filtered from the developer vocabulary for further steps.
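The cosine-similarity filter can be sketched with a simple term-frequency representation; the developer vocabularies, the new bug summary, and the 0.3 threshold below are hypothetical values chosen for illustration only.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    # Term-frequency vectors over whitespace tokens
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical developer vocabularies (developer -> terms from past bug summaries)
vocab = {"dev_a": "crash render gpu driver",
         "dev_b": "login token session timeout"}

new_bug = "crash in gpu driver on resume"
threshold = 0.3  # illustrative threshold

# Keep only developers whose vocabulary is similar enough to the new summary
candidates = [d for d, v in vocab.items()
              if cosine_similarity(new_bug, v) >= threshold]
```

Here only `dev_a` passes the filter, since its vocabulary shares three of the new bug's terms.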
In the fifth step, developer metadata is extracted from step 3 only for the filtered
developer from step 4.
In step six, the AHP method is applied to find the criteria weights. It involves the following
steps for bug triaging:
1. Problem definition: The problem is to identify the appropriate developer to fix
the newly reported bug.
2. Parameter selection: Appropriate parameters (criteria) are selected for finding
their weights. The criteria are: name of developer (D), developer
experience in years (E), total number of bugs assigned (A), newly assigned bugs
(N), total bugs fixed (R), and average fixing time (F).
3. Create a judgement matrix (A): A square matrix A of order m × m is created
for the pairwise comparison of all the criteria, where each element gives
the relative importance of one criterion over another:

A_{m \times m} = (a_{ij}) \qquad (1)

Here i, j = 1, 2, 3, …, m.
For relative importance, the following data will be used:
4. Normalize the matrix A.
5. Find the eigenvalue and the eigenvector W^t.
6. Perform a consistency check of the weights, which has the following steps:
i. Calculate λ_max using Eq. (4):

\lambda_{max} = \frac{1}{n}\sum_{i=1}^{n} \frac{(AW^t)_i}{(W^t)_i} \qquad (4)
CI = \frac{\lambda_{max} - n}{n - 1} \qquad (5)
CR = \frac{CI}{RI} \qquad (6)
Here RI is the random index [24]. If the consistency ratio is less than 0.10, the
weights are consistent and the weight vector (W) can be used for further measurement
in the next step. If not, repeat from step 3 of AHP.
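The AHP steps above can be sketched with NumPy's eigendecomposition. The 3 × 3 judgement matrix below is a hypothetical example on the Saaty scale, not one of the paper's matrices, and the RI table is truncated to small orders.

```python
import numpy as np

# Random index values by matrix order [24] (truncated for illustration)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}

def ahp_weights(A):
    """Principal-eigenvector weights and consistency ratio for judgement matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # principal eigenvalue index
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                              # normalized weight vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)              # Eq. (5)
    cr = ci / RI[n] if RI[n] else 0.0         # Eq. (6)
    return w, cr

# Hypothetical judgement matrix: criterion 1 strongly dominates 2 and 3
A = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
w, cr = ahp_weights(A)
```

For this matrix CR is well below 0.10, so the weights would be accepted as consistent.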
In step 7, the TOPSIS [27] model is applied for ranking the developer. The TOPSIS
model has the following steps for developer ranking:
1. Make a performance matrix (D) for the selected developers with order m × n,
where n is the number of criteria and m is the number of developers (alternatives).
Each element of the matrix is the value of the respective criterion for that developer.
2. Normalize the matrix using the following equation:

R_{ij} = \frac{a_{ij}}{\sqrt{\sum_{k=1}^{m} a_{kj}^2}} \qquad (7)
3. Multiply the normalized matrix (R_ij) by the weights calculated from the AHP
method.
4. Determine the positive ideal solution (best alternative) (A*) and the negative ideal
solution (worst alternative) (A−) using the following equations.
5. Find the Euclidean distance of each alternative from the best alternative,
called d*, and similarly from the worst alternative, called d−, using the following
formulas:

d_i^{*} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{*}\right)^2} \qquad (11)
d_i^{-} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{-}\right)^2} \qquad (12)
6. Find the similarity to the worst condition (CC_i), also called the closeness
ratio. The higher the closeness ratio of an alternative, the higher its ranking:

CC_i = \frac{d_i^{-}}{d_i^{*} + d_i^{-}} \qquad (13)
For bug triaging, the developer with the highest closeness ratio is ranked
first, and the developer with the lowest closeness ratio receives the last rank.
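The full TOPSIS ranking procedure (Eqs. 7 to 13) can be sketched as below. The performance matrix, weights, and benefit/cost flags are hypothetical: three developers scored on two criteria, where bugs fixed should be maximized and average fixing time minimized.

```python
import numpy as np

def topsis(D, w, benefit):
    """Closeness ratio of each alternative; benefit[j] marks criteria to maximize."""
    D = np.asarray(D, dtype=float)
    R = D / np.sqrt((D ** 2).sum(axis=0))                     # Eq. (7)
    V = R * w                                                 # weighted normalized matrix
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))    # A*
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))   # A-
    d_best = np.sqrt(((V - best) ** 2).sum(axis=1))           # Eq. (11)
    d_worst = np.sqrt(((V - worst) ** 2).sum(axis=1))         # Eq. (12)
    return d_worst / (d_best + d_worst)                       # Eq. (13)

# Hypothetical: 3 developers x 2 criteria (bugs fixed, avg fixing time)
D = [[30, 5.0],
     [10, 2.0],
     [20, 4.0]]
w = np.array([0.6, 0.4])
cc = topsis(D, w, np.array([True, False]))
ranking = np.argsort(-cc)  # developer indices from best to worst
```

With these numbers, developer 0 ranks first despite the longest fixing time, because the bugs-fixed criterion carries the larger weight.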
relative importance of criteria to other criteria. For example, in Table 4, the criterion "total
bugs assigned" has strong importance over "experience", hence 5 is assigned, and 1/5 in the
reverse case; the criterion "total bugs resolved" has very strong importance over
"experience", so 7 is assigned and 1/7 in the reverse case. Similarly, the other
values can be filled in by referring to Table 2, and the diagonal values are always fixed to one.
The resultant judgement matrix is given in table format in Table 3.
In the next step, the judgement matrix is normalized and the eigenvalue and eigenvector
are calculated; the transpose of the eigenvector gives the criteria weights, which are
shown in Table 4.
Then the consistency of the calculated criteria weights is checked by following
Eqs. (4), (5), and (6), with the result shown below:

λ_max = 5.451

Since CR < 0.1, the weights are consistent and can be used for further calculation.
In the next step, the TOPSIS method is applied. The TOPSIS method first generates
the evaluation matrix (D) of size m × n, where m is the number of alternatives
(developers) and n is the number of criteria; in our example, there are 5 developers
and 5 criteria, so the evaluation matrix is 5 × 5. Next, matrix D is normalized using
Eq. (7) to obtain the matrix R, and R is multiplied by the weight W using Eq. (8) to
obtain the weighted normalized matrix shown in Table 5.
In the next step, the best alternative and worst alternative are calculated by using
Eqs. 9 and 10 shown in Table 6.
Next, the distance of each alternative from the best alternative and from the worst
alternative is found using Eqs. (11) and (12). Then, using Eq. (13), the closeness
ratio shown in Table 7 is obtained.
Generally, CC = 1 if the alternative is the best solution; similarly, CC = 0 if the
alternative is the worst solution. Based on the closeness ratio, D1 has the first rank,
D4 the second rank, and D5, D3, and D2 the third, fourth, and fifth ranks, respectively.
A bar graph of the ranking based on the closeness ratio is shown in Fig. 2.
6 Threats to Validity
The suggested model poses a threat due to the use of the AHP approach to calculate
the criteria weights. Because the judgement matrix is formed by humans, subjective
judgement may introduce inconsistency when assigning weights to criteria, and there
is a chance of obtaining different criteria weight vectors, which may affect the
overall rank of a developer in bug triaging.
7 Conclusion and Future Work
A new algorithm is proposed for bug triaging using a hybridization of two MCDM
algorithms: AHP for criteria weight calculation and TOPSIS for ranking the
developers, while considering the availability of the developers. Future work could
apply other MCDM algorithms for the effective ranking of developers in bug triaging.
References
1. Yalcin AS, Kilic HS, Delen D (2022) The use of multi-criteria decision-making methods
in business analytics: A comprehensive literature review. Technol Forecast Soc Chang 174,
121193
2. Mojaver M, et al. (2022) Comparative study on air gasification of plastic waste and conventional
biomass based on coupling of AHP/TOPSIS multi-criteria decision analysis. Chemosphere 286,
131867
3. Sawarkar R, Nagwani NK, Kumar S (2019) Predicting available expert developer for newly
reported bugs using machine learning algorithms. In: 2019 IEEE 5th International Conference
A Clustering and TOPSIS-Based Developer Ranking Model … 149
26. Goyal A, Sardana N (2017) Optimizing bug report assignment using multi criteria decision
making technique. Intell Decis Technol 11(3):307–320
27. Gupta C, Inácio PRM, Freire MM (2021) Improving software maintenance with improved
bug triaging in open source cloud and non-cloud based bug tracking systems. J King Saud
Univ-Comput Inf Sci
28. Goyal A, Sardana N (2021) Feature ranking and aggregation for bug triaging in open-source
issue tracking systems. In: 2021 11th International Conference on Cloud Computing, Data
Science & Engineering (Confluence). IEEE
GujAGra: An Acyclic Graph to Unify
Semantic Knowledge, Antonyms,
and Gujarati–English Translation
of Input Text
1 Introduction
One of the most challenging issues in NLP is recognizing the correct sense of each
word that appears in input expressions. Words in natural languages can have many
meanings, and several separate words frequently signify the same notion. WordNet
can assist in overcoming such challenges. WordNet is an electronic lexical database
that was created for English and has now been made available in various other
languages [1]. Words in WordNet are grouped together based on their semantic simi-
larity. It segregates words into synonym sets or synsets, which are sets of cognitively
synonymous terms. A synset is a collection of words that share the same part of
speech and may be used interchangeably in a particular situation. WordNet is widely
regarded as a vital resource for scholars working in computational linguistics, text
analysis, and a variety of other fields. A number of WordNet compilation initiatives
have been undertaken and carried out in recent years under a common framework
for lexical representation, and they are becoming more essential resources for a wide
range of NLP applications such as a Machine Translation System (MTS).
The rest of the paper is organized as follows:
The next section gives a brief overview of the Gujarati language. Section 3 reviews
relevant previous work on this topic. Section 4 describes each
component of the system architecture for the software used to build the WordNet graph
with respect to the Gujarati–English–Gujarati languages. Section 5 presents the
proposed algorithm. Section 6 covers the experiment and results.
Section 7 brings the work covered in this article to a conclusion.
M. Patel (B)
Indore Institute of Science and Technology, Indore, India
e-mail: [email protected]
B. K. Joshi
Military College of Telecommunication Engineering, Mhow, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 151
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_14
152 M. Patel and B. K. Joshi
2 Gujarati Language
3 Literature Review
Word Sense Disambiguation (WSD) is the task of identifying the correct sense of
a word in a given context. WSD is an important intermediate step in many NLP
tasks especially in Information extraction, Machine translation [N3]. Word sense
ambiguity arises when a word has more than one sense. Words which have multiple
meanings are called homonyms or polysemous words. The word mouse clearly has
different senses. In the first sense it falls in the electronic category, the computer
mouse that is used to move the cursor in computers and in the second sense it falls
in animal category. The distinction might be clear to the humans but for a computer
to recognize the difference it needs a knowledge base or needs to be trained. Various
approaches have been proposed to achieve WSD: Knowledge-based methods rely on
dictionaries, lexical databases, thesauri, or knowledge graphs as primary resources,
and use algorithms such as lexical similarity measures or graph-based measures.
Supervised methods, on the other hand, make use of sense-annotated corpora as
training instances. These use machine learning techniques to learn a classifier from
labeled training sets. Some of the common techniques used are decision lists, decision
trees, Naive Bayes, neural networks, and support vector machines (SVM).
Finally, unsupervised methods make use of only raw unannotated corpora and do
not exploit any sense-tagged corpus to provide a sense choice for a word in context.
These methods include context clustering, word clustering, and co-occurrence graphs.
Supervised methods are by far the most predominant, as they generally offer the best
results [N1]. Many works try to alleviate this problem by creating new sense-annotated
corpora, either automatically, semi-automatically, or through crowdsourcing.
In this work, the idea is to solve this issue by taking advantage of the semantic
relationships between senses included in WordNet, such as the hypernymy, the
hyponymy, the meronymy, and the antonymy. The English WordNet was the first of
its kind in this field to be developed. It was devised in 1985 and is still being worked
on today at Princeton University’s Cognitive Science Laboratory [6]. The success
of English WordNet has inspired additional projects to create WordNets for other
languages or to create multilingual WordNets. EuroWordNet is a semantic network
system for European languages. The Dutch, Italian, Spanish, German, French, Czech,
and Estonian languages are covered by the Euro WordNet project [7]. The BalkaNet
WordNet project [8] was launched in 2004 with the goal of creating WordNets for
Bulgarian, Greek, Romanian, Serbian, and Turkish languages. IIT, Bombay, created
the Hindi WordNet in India. Hindi WordNet was later expanded to include Marathi
WordNet. Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani,
Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu,
and Urdu are among the main Indian languages represented in the Indo WordNet
project [9]. These WordNets were generated using the expansion method, with Hindi
WordNet serving as a kingpin and being partially connected to English WordNet.
4 Software Description
In this section, we describe the salient features of the architecture of the system. The
Gujarati WordNet is implemented on Google Colaboratory platform. To automati-
cally generate semantic networks from text, we need to provide some preliminary
information to the algorithm so that additional unknown relation instances may be
retrieved. We used Indo WordNet, which was developed utilizing the expansion
strategy with Hindi WordNet as a pivot, for this purpose. In addition, we manually
created Gujarati antonyms for over 700 words as a small knowledge base.
5 Proposed Algorithm
This section describes a method for producing an acyclic graph, which is essentially a
visualization tool for the WordNet lexical database. Through the proposed algorithm,
we wish to view the WordNet structure from the perspective of a specific word in
the database. Here we have focused on WordNet's main relations: the synonymy
(SYNSET) relation, the antonym relation, and the word's English translation.
This algorithm is based on what we will call a sense graph, which we formulate as
follows. Nodes in the sense graph comprise the words wi in a vocabulary W together
with the senses sij for those words. Labeled, undirected edges include word-sense
edges (wi, sij), which connect each word to all of its possible senses, and sense-sense
edges (sij, skl) labeled with a meaning relationship r that holds between the two senses.
WordNet is used to define their sense graph. Synsets in the WordNet ontology define
the sense nodes, a word-sense edge exists between any word and every synset to which
it belongs, and WordNet’s synset-to-synset relations of synonymy, hypernymy, and
hyponymy define the sense-sense edges. Figures 4 and 5 illustrate a fragment of a
WordNet- based sense graph.
A key point to observe is that this graph can be based on any inventory of word-sense
and sense-sense relationships. In particular, given a parallel corpus, we can follow
the tradition of translation-as-sense-annotation: the senses of a Gujarati word type
can be defined by the different possible translations of that word in any other language.
Operationalizing this observation is straightforward, given a word-aligned parallel
corpus. If English word form ei is aligned with Gujarati word form gj, then ei(gj) is a
sense of ei in the sense graph, and there is a word-sense edge (ei, ei(gj)). Edges signifying
a meaning relation are drawn between sense nodes if those senses are defined
by the same translation word. For instance, the English senses Defeat and Necklace both
arise via alignment with (Haar), so a sense-sense edge will be drawn between these
sense nodes.
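A minimal sketch of this alignment-to-sense construction, with transliterations standing in for the Gujarati script and invented alignment pairs echoing the Haar example:

```python
from collections import defaultdict
from itertools import combinations

# Word alignments (English word, Gujarati translation) from a hypothetical
# word-aligned parallel corpus; "haar" stands in for the Gujarati script.
alignments = [("defeat", "haar"), ("necklace", "haar"), ("house", "ghar")]

word_sense_edges = []                 # word-sense edges (e_i, e_i(g_j))
by_translation = defaultdict(list)    # translation word -> senses it defines
for e, gj in alignments:
    sense = f"{e}({gj})"              # e_i(g_j) is a sense of e_i
    word_sense_edges.append((e, sense))
    by_translation[gj].append(sense)

# Sense-sense edges connect senses defined by the same translation word.
sense_sense_edges = [pair
                     for senses in by_translation.values()
                     for pair in combinations(senses, 2)]

print(sense_sense_edges)  # [('defeat(haar)', 'necklace(haar)')]
```

Only "haar" defines two senses, so Defeat and Necklace end up linked, exactly as described in the text; "ghar" defines a single sense and contributes no sense-sense edge.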
For experimental purposes, more than 200 random sentences were collected from
various Gujarati-language e-books, e-newspapers, etc. A separate Excel document
named 'Gujarati Opposite words.xlsx' (containing 700+ words) was created, keeping one
word and its corresponding antonym in each row. For the generation
of the WordNet graph, Google Colab is used, as it is an online cloud service provided
by Google (a standalone system running Jupyter Notebook can also be used).
Firstly, all the APIs are installed using the pip install command. Then the packages
required for token processing are imported: pywin, the TensorFlow/Keras tokenizer,
Google Translator, and networkx. Figure 2 displays the content of 'Sheet 1' of the Excel
file named 'Gujarati Opposite words.xlsx' using the pandas (pd) library. Then, an instance
of 'Tokenizer' from the Keras API is called to split each sentence into
tokens, as shown in Fig. 3.
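This loading and tokenization step can be sketched as follows, with a plain dict standing in for the Excel knowledge base (in the actual pipeline it is read via pandas.read_excel) and a whitespace split standing in for the Keras Tokenizer; the transliterated words are illustrative:

```python
# Stand-in for 'Gujarati Opposite words.xlsx': one (word, antonym) pair per
# row, read here from a hard-coded dict instead of pandas.read_excel.
antonym_kb = {
    "subah": "sham",   # illustrative transliterations, not real KB entries
    "din": "raat",
}

def tokenize(sentence):
    """Stand-in for the Keras Tokenizer: lowercase whitespace split."""
    return sentence.lower().split()

tokens = tokenize("Subah din")
# Look up each token in the antonym knowledge base (None if absent).
antonyms = {t: antonym_kb.get(t) for t in tokens}
print(antonyms)  # {'subah': 'sham', 'din': 'raat'}
```

The real Keras Tokenizer additionally builds a word index over a fitted corpus; a plain split suffices to show where each token's knowledge-base lookup happens.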
Different color coding is used to represent different elements: light blue represents
a token of the input string, yellow represents synonyms, green represents antonyms,
red represents the English translation of the token, and pink represents the
pronunciation of the token. Hence, if no work has been done on a particular synset of
the Gujarati WordNet, the acyclic graph will not contain a yellow node. In our example,
no synonym of (hoy) is found, so no acyclic graph is plotted for it. In the same way,
if an antonym is not available in the knowledge base, the green node is omitted, and so on.
Thereafter, a custom function is created which reads each unique token and calls
different functions to obtain its synonyms, antonyms, English translation, and
pronunciation, and then creates an acyclic WordNet graph.
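Such a per-token function can be sketched as below; the lookup stubs are hypothetical stand-ins for the real linguistic resources, and the color scheme follows the coding described above:

```python
# Hypothetical lookup stubs standing in for the real resources
# (IndoWordNet synsets, the antonym knowledge base, Google Translator, etc.).
def get_synonym(tok):       return {"subah": "savar"}.get(tok)
def get_antonym(tok):       return {"subah": "sham"}.get(tok)
def get_translation(tok):   return {"subah": "morning"}.get(tok)
def get_pronunciation(tok): return {"subah": "su-bah"}.get(tok)

# Color coding from the text: yellow=synonym, green=antonym,
# red=English translation, pink=pronunciation (token itself is light blue).
COLORS = {"synonym": "yellow", "antonym": "green",
          "translation": "red", "pronunciation": "pink"}

def token_graph(tok):
    """Gather each relation of a token and emit the star-shaped
    (hence acyclic) edge list around it; a relation whose resource
    has no entry is omitted, so its colored node never appears."""
    relations = {
        "synonym": get_synonym(tok),
        "antonym": get_antonym(tok),
        "translation": get_translation(tok),
        "pronunciation": get_pronunciation(tok),
    }
    return [(tok, value, COLORS[rel])
            for rel, value in relations.items() if value is not None]

edges = token_graph("subah")
print(edges)
```

In the actual system these edges would be handed to networkx and rendered to a .png per token; the star shape around the token is what guarantees the graph stays acyclic.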
Finally, the results are saved in separate .png files showing the acyclic
WordNet graph for each token, as shown in Fig. 4.
We have generated acyclic graphs for more than 200 sentences through our proposed
system. In some cases, we faced challenges. One of them follows:
when (Ram e prithvilok tyag karyu) was given as input, the
acyclic graph for the word (prithvilok) shown in Fig. 5 was produced. Here, the linguistic
resource used to extract synonyms of (prithvilok) is the Synset provided
by IIT, ID 1427. The concept means the
place meant for all of us to live. But in the Synset, (Mratyulok) is given as
a co-synonym of it.
7 Conclusion
A concept is represented by simply listing the different word forms that might be used
to describe it in a synonym set (synset). Through the proposed architecture, we extracted
tokens from the input sentence; the synonyms, antonyms, pronunciation, and translation
of these tokens are identified and then plotted to form an acyclic graph giving a pictorial
view. Different color coding is used to represent the tokens, their synonyms,
antonyms, pronunciation, and translation (Gujarati or English). We demonstrated
the visualization of the WordNet structure from the perspective of a specific
word: that is, we focus on a specific term and then survey the
greater structure of WordNet from there. While we did not design our method with
the intention of creating WordNets for languages other than Gujarati, we recognize the
possibility of using it in this fashion with other language combinations as well. Some
changes must be made to the system's architecture; for example, in the Concept Extraction
phase, linguistic resources of other languages providing the needed synonyms
have to be made available. But the overall design of displaying the information of
the Gujarati WordNet can easily be applied in developing a WordNet for another
language. We have presented an alternative means of deriving information about
GujAGra: An Acyclic Graph to Unify Semantic Knowledge, Antonyms … 159
References
1. Miller GA, Fellbaum C (2007) WordNet then and now. Lang Resour Eval 41(2):209–214.
http://www.jstor.org/stable/30200582
2. "Scheduled Languages in descending order of speakers' strength – 2001". Census of India.
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers_in_India
3. Parkvall M, "Världens 100 största språk 2007" (The World's 100 Largest Languages
in 2007), in Nationalencyklopedin. https://en.wikipedia.org/wiki/List_of_languages_by_num
ber_of_native_speakers
4. "Gujarati: The language spoken by more than 55 million people". The Straits Times,
2017-01-19. https://www.straitstimes.com/singapore/gujarati-the-language-spoken-by-more-
than-55-million-people
5. Introduction to Gujarati WordNet (GWC12), IIT Bombay, Powai, Mumbai-400076, Maharashtra,
India. http://www.cse.iitb.ac.in/~pb/papers/gwc12-gujarati-in.pdf
6. Miller GA (1990) WordNet: an on-line lexical database. Int J Lexicogr 3(4):235–312
(Special Issue)
7. Vossen P (1998) EuroWordNet: a multilingual database with lexical semantic networks. J
Comput Linguist 25(4):628–630
8. Tufis D, Cristea D, Stamou S (2004) BalkaNet: aims, methods, results and perspectives: a
general overview. Romanian J Sci Technol Inf 7(1):9–43
9. Bhattacharyya P (2010) IndoWordNet. In: Language Resources and Evaluation Conference
(LREC 2010), Malta
10. Narang A, Sharma RK, Kumar P (2013) Development of Punjabi WordNet. CSIT 1:349–354.
https://doi.org/10.1007/s40012-013-0034-0
11. Kanojia D, Patel K, Bhattacharyya P (2018) Indian language WordNets and their linkages
with Princeton WordNet. In: Proceedings of the Eleventh International Conference on Language
Resources and Evaluation (LREC 2018), Miyazaki, Japan
12. Patel M, Joshi BK (2021) Issues in machine translation of Indian languages for information
retrieval. Int J Comput Sci Inf Secur (IJCSIS) 19(8):59–62
13. Patel M, Joshi BK (2021) GEDset: automatic dataset builder for machine translation system
with specific reference to Gujarati–English. Presented at the 11th International Advanced
Computing Conference, 18–19 December 2021
Attribute-Based Encryption Techniques:
A Review Study on Secure Access
to Cloud System
1 Introduction
Cloud computing is becoming the principal computing model of the future
because of its benefits, for example, a high resource-utilization rate and savings
in the significant cost of execution. The existing algorithms for security issues in cloud
computing are advanced versions of cryptography. Cloud computing algorithms are
mainly concerned with data security and the privacy preservation of the user. Most
privacy solutions are based on encryption: the data to be outsourced is encrypted
and stored in the cloud. To implement privacy protection for data owners and data
users, the encrypted data are shared and warehoused in cloud storage by applying
Ciphertext-Policy Attribute-Based Encryption (CP-ABE). Algorithms such as AES,
DES, and so on are utilized for encrypting the information before uploading it to
the cloud.
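The access-control idea behind CP-ABE can be illustrated purely conceptually, with no actual cryptography (a real deployment would use an ABE library): the ciphertext carries an access policy, and decryption can succeed only when the user's attribute set satisfies it. The policy, attribute names, and tuple encoding below are invented for illustration:

```python
def satisfies(policy, attrs):
    """Evaluate a CP-ABE-style access policy tree against a set of
    user attributes. Policies are nested tuples:
      ("ATTR", name), ("AND", p1, p2, ...), ("OR", p1, p2, ...)."""
    op = policy[0]
    if op == "ATTR":
        return policy[1] in attrs
    if op == "AND":
        return all(satisfies(p, attrs) for p in policy[1:])
    if op == "OR":
        return any(satisfies(p, attrs) for p in policy[1:])
    raise ValueError(f"unknown operator: {op}")

# Example policy attached to a ciphertext: (doctor AND cardiology) OR admin
policy = ("OR",
          ("AND", ("ATTR", "doctor"), ("ATTR", "cardiology")),
          ("ATTR", "admin"))

print(satisfies(policy, {"doctor", "cardiology"}))  # True: may decrypt
print(satisfies(policy, {"doctor"}))                # False: may not
```

In actual CP-ABE the check is not an explicit boolean test but falls out of the algebra: the decryption computation only reconstructs the message when the attribute keys satisfy the embedded policy.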
The three main features of clouds are easy allocation of resources, a
platform for service management, and massive scalability, which designate the key design
components of cloud processing and storage. A customer of cloud services
may see a different set of characteristics depending on their unique
requirements and point of view [1]:
• Location-free resource pools: compute and storage assets may be located anywhere
that the network reaches; resource pools reduce the dangers of
weak links through redundancy,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 161
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_15
162 A. Kumar and G. Verma
The paper is divided into five main sections. Section 1 gives an introduction
to the research work with a brief explanation of the basic concepts. Section 2
covers the background of cloud computing security issues. Section 3 surveys
the existing studies, which serve as exploratory data for the research work
and as a basis for evaluating the review toward designing a new framework;
this section also presents the survey studies in tabular form. Section 4
summarizes the literature study of Sect. 3 and presents the research-gap
analysis. Finally, Sect. 5 concludes the paper.
In the current conventional framework, there exist security issues in storing
information in the cloud. Cloud computing security incorporates various issues such as
data loss, cloud authorization, multi-tenancy, insider threats, leakage, and so
forth. It is not easy to implement security measures that fulfill the security needs of
all clients, because clients may have dissimilar security concerns depending
on their purpose in using the cloud services [5].
• Access controls: The security concept in a cloud system requires the CSP to provide
an access-control policy so that the data owner can restrict end-users to accessing
data only from authenticated network connections and devices.
• Long-term resiliency of the encryption system: With most current cryptography,
the ability to keep encrypted data secret rests not on the cryptographic
algorithm, which is generally known, but on a number called a key that must
be used with the algorithm to produce an encrypted result or to decrypt
encrypted data. Decryption with the correct key is simple. Decryption without
the correct key is extremely difficult, and in some cases impossible
for all practical purposes.
3 Review Study
4 Review Summary
There are some points summarized after surveying the distinctive encryption-based
cloud security strategies for late exploration improvements that are as per the
following: