Machine Learning
Pradeep Singh
Deepak Singh
Vivek Tiwari
Sanjay Misra Editors
Machine Learning
and Computational
Intelligence
Techniques for Data
Engineering
Proceedings of the 4th International
Conference MISP 2022, Volume 2
Lecture Notes in Electrical Engineering
Volume 998
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore,
Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe,
Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid,
Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München,
Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Yong Li, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova,
Genova, Genova, Italy
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Editors

Pradeep Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
Raipur, Chhattisgarh, India

Deepak Singh
Department of Computer Science and Engineering
National Institute of Technology Raipur
Raipur, Chhattisgarh, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
About the Editors

Dr. Pradeep Singh received a Ph.D. in Computer Science and Engineering from the National Institute of Technology, Raipur, and an M.Tech. in Software Engineering from the Motilal Nehru National Institute of Technology, Allahabad, India. Dr. Singh is an Assistant Professor in the Computer Science & Engineering Department at the National Institute of Technology, Raipur. He has over 15 years of experience in various government and reputed engineering institutes. He has published over 80 refereed articles in journals and conference proceedings. His current research interests are machine learning, evolutionary computing, empirical studies on software quality, and software fault prediction models.
Dr. Deepak Singh completed his Bachelor of Engineering from Pt. Ravi Shankar
University, Raipur, India, in 2007. He earned his Master of Technology with honors
from CSVTU Bhilai, India, in 2011. He received a Ph.D. degree from the Department
of Computer Science and Engineering at the National Institute of Technology (NIT)
in Raipur, India, in 2019. Dr. Singh is currently working as an Assistant Professor at
the Department of Computer Science and Engineering, National Institute of Tech-
nology Raipur, India. He has over 8 years of teaching and research experience along
with several publications in journals and conferences. His research interests include
evolutionary computation, machine learning, domain adaptation, protein mining, and
data mining.
Dr. Sanjay Misra, Senior Member of IEEE and ACM Distinguished Lecturer, is a Professor at Østfold University College (HIOF), Halden, Norway. Before coming to HIOF, he was a Professor at Covenant University (ranked 400–500 by THE (2019)) for 9 years. He holds a Ph.D. in Information & Knowledge Engineering (Software Engineering) from the University of Alcala, Spain, and an M.Tech. (Software Engineering) from MLN National Institute of Technology, India. He has authored around 600 articles (Scopus/WoS) with 500 co-authors worldwide (around 130 in JCR/SCIE journals) in the core and applied areas of Software Engineering, Web Engineering, Health Informatics, Cybersecurity, Intelligent Systems, AI, etc.
A Review on Rainfall Prediction Using
Neural Networks
1 Introduction
Rain plays a vital role in human life across all types of meteorological events [1]. Rainfall is a natural climatic phenomenon that has a massive impact on human civilization and demands precise forecasting [2]. Rainfall forecasting is linked to agronomics, which contributes remarkably to a country's economy [3, 4]. There are three approaches to developing rainfall forecasts: (i) numerical, (ii) statistical, and (iii) machine learning.
Numerical Weather Prediction (NWP) produces forecasts using computational power [5, 6]. To forecast future weather, NWP computer models process current weather observations. The model's output is based on current weather monitoring, which is assimilated into the model's framework and used to predict temperature, precipitation, and many other meteorological parameters from the ocean up to the top layer of the atmosphere [7].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques for Data Engineering, Lecture Notes in Electrical Engineering 998, https://doi.org/10.1007/978-981-99-0047-3_1
S. Mandal et al.

Statistical forecasting entails using statistics based on historical data to forecast what might happen in the future [8]. For forecasting, the statistical method employs linear time-series data [9]. Each statistical model comes with its own set of limitations. The Auto-Regressive (AR) model regresses against the series' previous values. The AR term specifies how many linearly associated lagged observations are included, so it is not suitable for data with nonlinear correlations. The Moving Average (MA) model uses previous forecast errors as explanatory variables. It keeps track of the history of distinct periods for each anticipated period, and it frequently overlooks intricate linkages in the dataset. It does not respond to fluctuations that occur due to factors such as cycles and seasonal effects [10]. The Auto-Regressive Integrated Moving Average (ARIMA) model is a versatile and useful time-series model that combines the AR and MA models [11]. Using stationary time-series data, the ARIMA model can only forecast short-term rainfall. Because of the dynamic nature of climatic phenomena and the nonlinear nature of rainfall data, statistical approaches cannot be used to forecast long-term rainfall.
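In standard time-series notation (not reproduced from the surveyed papers), these models can be written as follows, with X_t the rainfall series and ε_t a white-noise error:

```latex
% AR(p): regression on the series' own lagged values
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t

% MA(q): regression on past forecast errors
X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARIMA(p, d, q): AR and MA applied to the d-times differenced series,
% written with the lag operator L, where L X_t = X_{t-1}
\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```

The requirement that the d-times differenced series be stationary is precisely what restricts ARIMA to short-term rainfall forecasts.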
Machine learning can be used to perform real-time comparisons of historical
weather forecasts and observations. Because the physical processes which affect rain-
fall occurrence are extremely complex and nonlinear, some machine learning tech-
niques such as Artificial Neural Network (ANN), Support Vector Machine (SVM),
Random Forest Regression Model, Decision Forest Regression, and Bayesian linear
regression models are better suited for rainfall forecasting. However, among all
machine learning techniques, ANNs perform the best in terms of rainfall forecasting.
The usage of ANNs has grown in popularity, and ANNs are one of the most extensively used models for forecasting rainfall. ANNs are data-driven models that do not require restrictive assumptions about the form of the underlying model. Because of their parallel processing capacity, ANNs are effective at training on huge samples and can implicitly recognize complex nonlinear correlations between dependent and independent variables. The model is dependable and robust since it learns from the original inputs and their relationships and generalizes to unseen data. As a result, ANNs can estimate the approximate peak value of rainfall data with ease.
This paper presents the different rainfall forecasting models proposed using ANNs
and highlights some special features observed during the survey. This study also
reports the suitability of different ANN architectures in different situations for rain-
fall forecasting. Besides, the paper finds the weather parameters responsible for
rainfall and discusses different issues in rainfall forecasting using machine learning.
The rest of the paper is organized as follows. Section 2 presents a literature survey of rainfall prediction models. Section 3 provides a theoretical analysis and discussion of the survey. Finally, Sect. 4 concludes the paper and discusses its future scope.
2 Literature Survey
Rainfall prediction is one of the most demanding tasks in the modern world. In general, weather and rainfall are highly nonlinear and complex phenomena, which require the latest computer modeling and simulation for their accurate prediction. An Artificial Neural Network (ANN) can be used to model the behavior of such nonlinear systems. Soft computing deals with approximate models, through which approximate answers and results are obtained. Soft computing has three primary components: Artificial Neural Networks (ANN), fuzzy logic, and genetic algorithms. ANN is commonly used by researchers in the field of rainfall prediction. The human brain is highly complex and nonlinear, and neural networks are simplified models of biological neuron systems. A neural network is a massively parallel distributed processor built up of simple processing units, which has a natural tendency for storing experiential knowledge and making it available for use. Many researchers have attempted to forecast rainfall using various machine learning models, and in most cases ANNs are used. Table 1 shows some types of ANNs, such as the Backpropagation Neural Network (BPNN) and the Convolutional Neural Network (CNN), that are used based on the quality of the dataset and the rainfall parameters, for better understanding and comprehensibility. Prediction accuracy is measured using error measures such as MSE and RMSE.
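The error measures mentioned here can be made concrete with a short sketch (illustrative function names and toy values, not code from any of the surveyed papers):

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of the MSE,
    # reported in the same units as the rainfall values
    return math.sqrt(mse(y_true, y_pred))

observed = [2.0, 0.0, 5.0, 1.0]    # toy rainfall values
predicted = [1.0, 0.0, 4.0, 3.0]
print(mse(observed, predicted))    # 1.5
print(rmse(observed, predicted))
```

Lower values indicate a better fit; because the errors are squared before averaging, RMSE penalizes large errors more heavily than MAE does, which is one reason it is the measure most of the surveyed papers report.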
One of the most significant advancements in neural networks is the Backprop-
agation learning algorithm. For complicated, multi-layered networks, this network
remains the most common and effective model. One input layer, one output layer,
and at least one hidden layer make up the backpropagation network. The capacity of a network to provide correct outputs for a given dataset is determined by the number of neurons in each layer and the number of hidden layers. The raw data is divided into two portions, one for training and the other for testing the model. Vamsidhar et al. [1] have proposed an efficient rainfall prediction system using BPNN. They created a 3-layered feedforward neural network architecture, initializing the weights of the network with random values between −1.0 and 1.0. Monthly rainfall data from 1901 to 2000 were used. Using humidity, dew point, and pressure as input parameters, they obtained an accuracy of 99.79% in predicting rainfall and 94.28% for testing purposes. Geeta et al. [2] have proposed a monthly monsoon rainfall prediction model for Chennai using BPNN. Chennai's monthly rainfall data from 1978 to 2009 were taken as the desired output data for training and testing purposes. Using wind speed, mean temperature, relative humidity, and aerosol values (RSPM) as rainfall parameters, they obtained an error rate of 9.96. Abhishek et al. [3] have proposed an
Artificial Neural Network system-based rainfall prediction model. They concluded
that when the number of hidden neurons in the ANN increases, the MSE of the model decreases. The model was built in five sequential steps: (1) selection of the input and output data for supervised backpropagation learning; (2) normalization of the input and output data; (3) training on the normalized data using backpropagation learning; (4) testing the fit of the model; and (5) comparing the predicted output with the desired output. The input parameters were the average humidity and the average wind speed for 8 months of each of the 50 years from 1960 to 2010. The Back Propagation Algorithm (BPA) was implemented in NFTool, and they obtained a minimum MSE of 3.646. Shrivastava et al. [4] have
proposed a rainfall prediction model using backpropagation neural network. They
used rainfall data from Ambikapur region of Chhattisgarh, India. They concluded that
Table 1  Summary of the surveyed neural network rainfall prediction models

Year | Authors | Region used | Dataset | Model used | Parameters used | Accuracy
2018 | S. Aswin et al. [13] | Not given | 468 months | Convolutional Neural Networks | Precipitation | RMSE = 2.44
2020 | C. Zhang et al. [14] | Shenzhen, China | 2014 to 2016 (March to September) | Deep convolutional neural network | Gauge rainfall and Doppler radar echo map | RMSE = 9.29
2019 | R. Kaneko et al. [15] | Kyushu region in Japan | From 2016 to 2018 | 2-layer stacked LSTMs | Wind direction and wind velocity, temperature, precipitation, pressure, and relative humidity | RMSE = 2.07
2020 | A. Pranolo et al. [16] | Tenggarong of East Kalimantan, Indonesia | 1986 to 2008 | Long Short-Term Memory | Not mentioned | RMSE = 0.2367
2020 | I. Salehin et al. [17] | Bangladesh Meteorological Department | 2020 (1 Aug to 31 Aug) | Long Short-Term Memory | Temperature, dew point, humidity, wind properties (pressure, speed, and direction) | 76% accuracy
2020 | A. Samad et al. [18] | Albany, Walpole, and Witchcliffe | 2007–2015 for training, 2016 for testing | Long Short-Term Memory | Temperature, pressure, humidity, wind speed, and wind direction | MSE, RMSE, and MAE; RMSE = 5.343
2020 | D. Zatusiva et al. [19] | East Java, Indonesia | December 29, 2014 to August 4, 2019 | Long Short-Term Memory | El Nino Index (NI) and Indian Ocean Dipole (IOD) | MAAPE = 0.9644
2019 | S. Poornima et al. [20] | Hyderabad, India | Hyderabad region starting from 1980 until 2014 | Intensified Long Short-Term Memory | Maximum temperature, minimum temperature, maximum relative humidity, minimum relative humidity, wind speed, sunshine and … | Accuracy = 87.99%
BPN is suitable for the identification of internal dynamics of high dynamic monsoon
rainfall. The performance of the model was evaluated by comparing Standard Devi-
ation (SD) and Mean Absolute Deviation (MAD). Based on backpropagation, they
were able to achieve 94.4% accuracy. Sharma et al. [5] have proposed a rainfall prediction model based on a backpropagation neural network using Delhi's rainfall data. The input and target data had to be normalized because they have different units. Using temperature, humidity, wind speed, pressure, and dew point as input parameters of the prediction model, the MSE was approximately 8.70, and the accuracy graph was plotted with NFTool. Chaturvedi [6] has proposed rainfall prediction using a backpropagation neural network. He took 70% of the data for training, 15% for testing, and the other 15% for validation. The input data for the model consisted of 365 samples, of which 255 were used for training, 55 for testing, and the rest for validation. He plotted a graph using NFTool comparing the predicted values with the target values, which showed a minimized MSE of 8.7. He also concluded that an increase in the number of neurons in the network leads to a decrease in the MSE of the model. Lessnussaa et al. [7] have proposed rainfall prediction using a backpropagation neural network in Ambon city. The researchers used monthly rainfall data from 2011 to 2015 and considered weather parameters such as air temperature, air velocity, and pressure. They obtained 80% accuracy using a learning rate (alpha) of 0.7 and 10,000 iterations (epochs), with an MSE of 0.022.
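The 3-layer backpropagation setup that recurs in these studies can be sketched as follows. This is a minimal illustration with synthetic data, not the configuration of any single paper; the weights start at random values in [−1.0, 1.0] as in [1], and the inputs stand in for normalized weather parameters:

```python
import numpy as np

# Minimal 3-layer feedforward network trained with backpropagation.
# X: 20 samples of 3 normalized inputs (stand-ins for humidity,
# dew point, pressure); y: a synthetic "rainfall" target in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = X.mean(axis=1, keepdims=True)

W1 = rng.uniform(-1.0, 1.0, (3, 5))    # input -> hidden weights
W2 = rng.uniform(-1.0, 1.0, (5, 1))    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    return sigmoid(sigmoid(X @ W1) @ W2)

init_mse = float(np.mean((forward(X) - y) ** 2))

lr = 0.1
for _ in range(5000):
    h = sigmoid(X @ W1)                     # hidden activations
    out = sigmoid(h @ W2)                   # network output
    d_out = (out - y) * out * (1 - out)     # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)      # hidden-layer delta
    W2 -= lr * h.T @ d_out                  # gradient-descent updates
    W1 -= lr * X.T @ d_h

final_mse = float(np.mean((forward(X) - y) ** 2))
print(init_mse, final_mse)                  # training error drops
```

The per-layer deltas are exactly the backpropagated error signals that the surveyed papers tune via the learning rate, hidden-layer size, and epoch count.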
Radial Basis Function (RBF) networks are a class of nonlinear layered feedforward networks. They take a different approach, viewing the design of a neural network as a curve-fitting problem in a high-dimensional space. The construction of an RBF network involves three layers with entirely different roles: the input layer, the single hidden layer, and the output layer. Lee et al. [8] have proposed rainfall prediction using an
artificial neural network. The dataset has been taken from 367 locations based on the
daily rainfall at nearly 100 locations in Switzerland. They proposed a divide-and-
conquer approach where the whole region is divided into four sub-areas and each is
modeled with a different method. For two larger areas, they used radial basis function
(RBF) networks to perform rainfall forecasting. Over the whole dataset they achieved an RMSE of 78.65, with a relative error of 0.46 and an absolute error of 55.9 for rainfall prediction. For the other two smaller sub-areas, they used a simple linear regression model to predict the rainfall. Lee et al. [9] have
proposed “Artificial neural network analysis for reliability prediction of regional
runoff utilization”. They used artificial neural networks to predict regional runoff
utilization, using two different types of artificial neural network models (RBF and
BPNN) to build up small-area rainfall–runoff supply systems. A historical rainfall
for Taipei City in Taiwan was applied in the study. Based on the variances between the training, testing, and prediction results and the actual results, the overall success rates of prediction are about 83% for BPNN and 98.6% for RBF. Liu Xinia et al. [10] have proposed a Filtering and Multi-Scale RBF Prediction
Model of Rainfall Based on EMD Method, a new model based on empirical mode
decomposition (EMD) and the Radial Basis Function Network (RBFN) for rainfall
prediction. They used monthly rainfall data for 39 years in Handan city. Therefore, the
results obtained were evidence of the fact that the RBF network can be successfully
applied to determine the relationship between rainfall and runoff.
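The three-layer RBF structure used in these studies can be sketched as follows, with a synthetic 1-D "rainfall curve" and illustrative centre/width choices (not data from any surveyed paper):

```python
import numpy as np

# RBF network sketch: a hidden layer of Gaussian basis functions
# centred on chosen points, plus a linear output layer fitted by
# least squares.
x = np.linspace(0.0, 1.0, 40)          # normalized input axis (e.g. time)
y = np.sin(2 * np.pi * x)              # stand-in for a seasonal rainfall curve

centers = np.linspace(0.0, 1.0, 10)    # hidden-layer RBF centres
width = 0.1                            # shared Gaussian width

def design_matrix(x):
    # One Gaussian basis function per centre, evaluated at every input
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

Phi = design_matrix(x)                            # shape (40, 10)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # output-layer weights
fit_mse = float(np.mean((Phi @ w - y) ** 2))
print(fit_mse)                                    # very small training error
```

Because only the output layer is linear in the weights, fitting it reduces to a least-squares problem, which is what makes RBF networks fast to train compared with backpropagation networks.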
A Convolutional Neural Network (ConvNet/CNN) is a well-known deep learning algorithm which takes an input image, assigns importance (via learnable weights and biases) to different aspects/objects in the image, and discriminates among them. A CNN is made up of different feedforward neural network layers, such as convolution, pooling, and fully connected layers. CNNs are used to predict rainfall from time-series rainfall data. Qiu et al. [11] have proposed a multi-task convolutional neural
networks-based rainfall prediction system. They evaluated two real-world datasets.
The first one was the daily collected rainfall data from the meteorological station
of Manizales city. Another was a large-scale rainfall dataset taken from the obser-
vation sites of Guangdong province, China. They got a result of RMSE = 11.253
in their work. Halder et al. [12] have proposed a one-dimensional Deep Convolutional Neural Network-based monthly rainfall prediction system. Additional local attributes such as MinT and MaxT were also included. They obtained an RMSE of 15.951 in their work. Aswin et al. [13] have proposed a rainfall prediction model using a convolutional neural network. Using 468 months of precipitation as the input parameter, they obtained an RMSE of 2.44. Zhang et al. [14] have proposed a rainfall prediction model using a
deep convolutional neural network. They collected this rainfall data from the mete-
orological observation center in Shenzhen, China, for the years 2014 to 2016 from
March to September. They got RMSE = 9.29 for their work. They have concluded
that Tiny-RainNet model’s overall performance is better than fully connected LSTM
and convolutional LSTM.
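The core 1-D convolution such models apply to time-series rainfall can be sketched as follows (toy series and kernel values; a real CNN learns the kernel weights and stacks many layers with pooling and a fully connected head):

```python
import numpy as np

# A learnable kernel slides over the series and produces a feature map.
series = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 4.0, 1.0])
kernel = np.array([0.25, 0.5, 0.25])     # toy smoothing kernel

def conv1d_valid(x, k):
    # "Valid" 1-D convolution (no padding):
    # output length is len(x) - len(k) + 1
    n = len(x) - len(k) + 1
    return np.array([float(np.dot(x[i:i + len(k)], k)) for i in range(n)])

feature_map = conv1d_valid(series, kernel)
print(feature_map)   # 5 values: 1.25, 2.25, 1.75, 1.5, 2.25
```

Each output value summarizes a local window of the series, which is how a 1-D CNN picks up short-range temporal patterns in rainfall data.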
A Recurrent Neural Network (RNN) is an extension of feedforward neural networks that possesses intrinsic memory. An RNN is called recurrent because it applies the same function to every input, and its output depends on the past computation. After the output is computed, it is copied and fed back into the recurrent network unit.
LSTM is one of the RNNs that has the potential to forecast rainfall. LSTM is a type of Recurrent Neural Network (RNN) layer designed to address the gradient problem by enforcing constant error flow. An LSTM unit is made up of three primary gates, each of which functions as a controller for the data passing through the network, making it a multi-layer neural network. Kaneko
et al. [15] have proposed a 2-layer stacked RNN-LSTM-based rainfall prediction
system with batch normalization. The LSTM model's performance was compared with the MSM (Meso-Scale Model by JMA) from 2016 to 2018. The LSTM model successfully predicted hourly rainfall, and surprisingly some rainfall events were predicted better by the LSTM model than by the MSM. Using wind direction, wind velocity, temperature, precipitation, pressure, and relative humidity as rainfall parameters, the LSTM model achieved an RMSE of 2.07 mm/h against 2.44 mm/h for the MSM. Pranolo et al. [16] have proposed an LSTM
model for predicting rainfall. The data consisted of 276 data samples, which were
subsequently separated into 216 (75%) training datasets for the years 1986 to 2003,
and 60 (25%) test datasets for the years 2004 to 2008. In this study, the LSTM
and BPNN architecture included a hidden layer of 200, a maximum epoch of 250,
A Review on Rainfall Prediction Using Neural Networks 9
gradient threshold of 1, and learning rates of 0.005, 0.007, and 0.009. These results clearly indicate that the LSTM produced better accuracy than the BPNN algorithm. They obtained an RMSE of 0.2367 in their work. Salehin et al. [17]
have proposed a LSTM and Neural Network-based rainfall prediction system. Time-
series forecasting with LSTM is a modern approach to building a rapid model of
forecasting. After analyzing all data using LSTM, they found 76% accuracy in this
work. LSTM networks are suitable for time-series data categorization, processing,
and prediction. So, they concluded that LSTM gives the most controllability and thus
better results were obtained. Samad et al. [18] have proposed a rainfall prediction model using Long Short-Term Memory. Using temperature, pressure, humidity, wind speed, and wind direction as input parameters on rainfall data from 2007 to 2015, they obtained an RMSE of 5.343. Haq et al. [19] have proposed a rainfall
prediction model using long short-term memory based on El Nino and IOD Data.
They used 60% training data with variation in the hidden layer, batch size, and learn
rate drop periods to achieve the best prediction results. They obtained a MAAPE of 0.9644 in their work. Poornima et al. [20] have proposed an article titled “Prediction of Rainfall Using Intensified LSTM-Based Recurrent Neural Network with Weighted Linear Units”. This paper presented an Intensified Long Short-Term
Memory (Intensified LSTM)-based Recurrent Neural Network (RNN) to predict
rainfall. The parameters considered for the evaluation of the performance and the
efficiency of the proposed rainfall prediction model were Root Mean Square Error
(RMSE), accuracy, number of epochs, loss, and learning rate of the network. The initial learning rate was fixed at 0.1 with no momentum (the default), and a batch size of 2500 was run for 5 iterations, since the dataset contains 12,410 rows with 8 attributes. The accuracy achieved by the Intensified LSTM-based rainfall prediction model is 87.99%.
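The three-gate structure described above can be sketched as a single LSTM time-step. The weights here are random placeholders (in a trained model they are learned), and the input dimensions are illustrative:

```python
import numpy as np

# One time-step of an LSTM cell: forget, input, and output gates
# control what flows through the cell state.
rng = np.random.default_rng(2)
n_in, n_hid = 4, 3                      # e.g. 4 weather inputs, 3 hidden units

x = rng.random(n_in)                    # one time-step of input features
h_prev = np.zeros(n_hid)                # previous hidden state
c_prev = np.zeros(n_hid)                # previous cell state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus the candidate cell update,
# each acting on the concatenated [h_prev, x] vector
z = np.concatenate([h_prev, x])
Wf, Wi, Wo, Wc = (rng.uniform(-1, 1, (n_hid, n_hid + n_in)) for _ in range(4))

f = sigmoid(Wf @ z)                     # forget gate: what to keep of c_prev
i = sigmoid(Wi @ z)                     # input gate: how much new info enters
o = sigmoid(Wo @ z)                     # output gate: what to expose
c = f * c_prev + i * np.tanh(Wc @ z)    # new cell state
h = o * np.tanh(c)                      # new hidden state

print(h.shape, c.shape)
```

The additive update of the cell state `c` is what lets gradients flow across many time steps, which is why LSTMs remember long-range rainfall trends better than plain RNNs.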
For prediction, all of these models use nearly identical rainfall parameters. Humidity, wind speed, and temperature are important parameters for backpropagation [21]. Temperature and precipitation are important factors for convolutional neural networks (ConvNets). Temperature, wind speed, and humidity are all important factors for Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) networks. In most cases, error measures such as MSE, RMSE, and MAE are used. With temperature, air pressure, humidity, wind speed, and wind direction as input parameters, BPNN has achieved a minimum MSE of 3.646, CNN has achieved an RMSE of 2.44, and LSTM has achieved a better RMSE of 0.2367. As a result of this survey, it can be said that LSTM is an effective model for rainfall forecasting.
3 Discussion

High variability in rainfall patterns is the main problem of rainfall forecasting. Data insufficiency and the absence of records such as temperature, wind speed, and wind direction can affect prediction [22, 23], so data preprocessing is required to compensate for missing values. As future data is unpredictable, models have to use estimated data and assumptions to predict future weather [24]. Besides massive deforestation, abrupt changes in climate conditions may prove the prediction false. In the case of a yearly rainfall dataset, there is no manageable procedure to determine rainfall parameters such as wind speed, humidity, and soil temperature. In some models, researchers have used one hidden layer, which requires a large number of hidden nodes and degrades performance. To compensate, two hidden layers are used; more than two hidden layers give the same results. Either too few or too many input parameters can influence the learning or prediction capability of the network [25].
The model simulations use dynamic equations which demonstrate how the atmo-
sphere will respond to changes in temperature, pressure, and humidity over time.
Some frequent challenges when implementing various ANN architectures for modeling weekly, monthly, and yearly rainfall data include choosing the number of hidden layers and nodes and dividing the dataset into training and testing portions, so prior knowledge of these methods and architectures is needed. As ANNs are prone to overfitting, this can be reduced by early stopping or regularization methods. Choosing accurate performance measures and activation functions for the simulation is also an important part of implementing rainfall prediction.
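The early-stopping idea mentioned above can be sketched as a simple rule over per-epoch validation errors (an illustrative helper, not taken from any surveyed paper):

```python
# Halt training when the validation error stops improving for
# `patience` consecutive epochs; the best model was saved earlier.
def early_stop_epoch(val_errors, patience=3):
    best = float("inf")
    since_best = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best = err
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch          # stop here
    return len(val_errors) - 1        # patience never exhausted

# Validation error improves, then starts rising: training stops at epoch 6
errors = [0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.55, 0.6]
print(early_stop_epoch(errors))  # 6
```

Stopping once validation error turns upward keeps the network from memorizing the training rainfall series, which is the overfitting risk noted above.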
4 Conclusions
This paper presents a study of various ANNs used by researchers to forecast rainfall. The survey shows that BPNN, CNN, RNN, LSTM, etc. are more suitable for predicting rainfall than other forecasting techniques such as statistical and numerical methods.
Moreover, this paper discussed the issues that must be addressed when using ANNs
for rainfall forecasting. In most cases, previous daily data of rainfall and maximum
and minimum temperature, humidity, and wind speed are considered. All the models
provide good prediction accuracy, but as the models progress from neural networks
to deep learning, the accuracy improves, implying a lower error rate. Finally, based
on the literature review, it can be stated that ANN is practical for rainfall forecasting
because several ANN models have attained significant accuracy. RNN shows better
accuracy as there are memory units incorporated, so it can remember the past trends
of rainfall. Depending on past trends, the model gives a more accurate prediction.
Accuracy can be enhanced even more if other parameters are taken into account.
Rainfall prediction will be more accurate as ANNs progress, making it easier to
understand weather patterns.
After analyzing all the results from the reviewed research papers, it can be concluded that neural networks perform better; therefore, in further work, rainfall forecasting will be implemented using neural networks. If RNN and LSTM are used, forecasting would improve thanks to their additional memory units. So, as a continuation of this paper, rainfall forecasting for a particular region will be done using LSTM, together with a comparative study against other neural networks for a better understanding of the importance of artificial neural networks in rainfall forecasting.
References
1. Vamsidhar E, Varma KV, Rao PS, Satapati R (2010) Prediction of rainfall using back
propagation neural network model. Int J Comput Sci Eng 02(04):1119–1121
2. Geetha G, Samuel R, Selvaraj (2011) Prediction of monthly rainfall in Chennai using back
propagation neural network model. Int J Eng Sci Technol 3(1):211–213
3. Abhishek K, Kumar A, Ranjan R, Kumar S (2012) A rainfall prediction model using artificial
neural network. IEEE Control Syst Graduate Res Colloquium. https://doi.org/10.1109/ICS
GRC.2012.6287140
4. Shrivastava G, Karmakar S, Kowar MK, Guhathakurta P (2012) Application of artificial neural
networks in weather forecasting: a comprehensive literature review. IJCA 51(18):0975–8887.
https://doi.org/10.5120/8142-1867
5. Sharma A, Nijhawan G (2015) Rainfall prediction using neural network. Int J Comput Sci
Trends Technol (IJCST) 3(3), ISSN 2347–8578
6. Chaturvedi A (2015) Rainfall prediction using back propagation feed forward network. Int J
Comput Appl (0975 – 8887) 119(4)
7. Lesnussa YA, Mustamu CG, Lembang FK, Talakua MW (2018) Application of backpropaga-
tion neural networks in predicting rainfall data in Ambon city. Int J Artif Intell Res 2(2). ISSN
2579–7298
8. Lee S, Cho S, Wong PM (1998) Rainfall prediction using artificial neural network. J Geog Inf
Decision Anal 2:233–242
9. Lee C, Lin HT (2006) Artificial neural network analysis for reliability prediction of regional
runoff utilization. S. CIB W062 symposium 2006
10. Xinia L, Anbing Z, Cuimei S, Haifeng W (2015) Filtering and multi-scale RBF prediction
model of rainfall based on EMD method. ICISE 2009:3785–3788
11. Qiu M, Zha P, Zhang K, Huang J, Shi X, Wa X, Chu W (2017) A short-term rainfall prediction
model using multi-task convolutional neural networks. In: IEEE international conference on
data mining. https://doi.org/10.1109/ICDM.2017.49
12. Haidar A, Verma B (2018) Monthly rainfall forecasting using one-dimensional deep convo-
lutional neural network. Project: Weather Forecasting using Machine Learning Algorithm,
UNSW Sydney. https://doi.org/10.1109/ACCESS.2018.2880044
13. Aswin S, Geetha P, Vinayakumar R (2018) Deep learning models for the prediction of rainfall. In: International conference on communication and signal processing. https://doi.org/10.1109/ICCSP.2018.8523829
14. Zhang CJ, Wang HY, Zeng J, Ma LM, Guan L (2020) Tiny-RainNet: a deep convolutional neural
network with bi-directional long short-term memory model for short-term rainfall prediction.
Meteorolog Appl 27(5)
15. Kaneko R, Nakayoshi M, Onomura S (2019) Rainfall prediction by a recurrent neural network
algorithm LSTM learning surface observation data. Am Geophys Union, Fall Meeting
16. Pranolo A, Mao Y, Tang Y, Wibawa AP (2020) A long short term memory implemented
for rainfall forecasting. In: 6th international conference on science in information technology
(ICSITech). https://doi.org/10.1109/ICSITech49800.2020.9392056
17. Salehin I, Talha IM, Hasan MM, Dip ST, Saifuzzaman M, Moon NN (2020) An artificial
intelligence based rainfall prediction using LSTM and neural network. https://doi.org/10.1109/
WIECON-ECE52138.2020.9398022
18. Samad A, Gautam V, Jain P, Sarkar K (2020) An approach for rainfall prediction using long short
term memory neural network. In: IEEE 5th international conference on computing communi-
cation and automation (ICCCA) Galgotias University, GreaterNoida,UP, India. https://doi.org/
10.1109/ICCCA49541.2020.9250809
19. Haq DZ, Novitasari DC, Hamid A, Ulinnuha N, Farida Y, Nugraheni RD, Nariswari R, Rohayani
H, Pramulya R, Widjayanto A (2020) Long short-term memory algorithm for rainfall prediction
based on El-Nino and IOD Data. In: 5th international conference on computer science and
computational intelligence
20. Poornima S, Pushpalatha M (2019) Prediction of rainfall using intensified LSTM based
recurrent neural network with weighted linear units. Comput Sci Atmos. 10110668
21. Parida BP, Moalafhi DB (2008) Regional rainfall frequency analysis for Botswana using L-
Moments and radial basis function network. Phys Chem Earth Parts A/B/C 33(8). https://doi.
org/10.1016/j.pce.2008.06.011
22. Dubey AD (2015) Artificial neural network models for rainfall prediction in Pondicherry. Int
J Comput Appl (0975–8887). 10.1.1.695.8020
23. Biswas S, Das A, Purkayastha B, Barman D (2013) Techniques for efficient case retrieval
and rainfall prediction using CBR and Fuzzy logic. Int J Electron Commun Comput Eng
4(3):692–698
24. Basha CZ, Bhavana N, Bhavya P, Sowmya V (2020) Proceedings of the international conference
on electronics and sustainable communication systems. IEEE Xplore Part Number: CFP20V66-
ART; ISBN: 978-1-7281-4108-4.
25. Biswas SK, Sinha N, Purkayastha B, Marbaniang L (2014) Weather prediction by recurrent
neural network dynamics. Int J Intell Eng Informat Indersci, 2(2/3):166–180 (ESCI journal)
Identifying the Impact of Crime
in Indian Jail Prison Strength
with Statistical Measures
1 Introduction
The use of machine learning algorithms to forecast crime is becoming commonplace. This research is separated into two parts: the forecast of violent crime and its influence on prisons, and the prediction of detainees in jail. We use data from separate sources: the first two datasets are violent crime and total FIR data from the police department, followed by data on prisoners and detainees sentenced for violent crimes from the Jail Department.
A guide for the correct use of correlation in crime and jail strength is needed to
solve this issue. Data from the NCRB shows how correlation coefficients can be used
in real-world situations. As shown in Fig. 1, a correlation coefficient will be used for
the forecast of crime and the prediction of jail overcrowding.
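As a rough sketch of what such a correlation coefficient measures, the Pearson coefficient can be computed on two invented series as below; the numbers are purely illustrative, not NCRB data:

```python
# Sketch: sign of the Pearson correlation coefficient on invented data.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

crime     = [100, 150, 200, 260, 300]   # hypothetical crime counts
prisoners = [400, 430, 490, 520, 580]   # rising with crime -> r > 0
releases  = [90, 80, 72, 60, 55]        # falling as crime rises -> r < 0

assert pearson(crime, prisoners) > 0
assert pearson(crime, releases) < 0
```

A coefficient near +1 or -1 indicates a strong linear association; a value near 0 indicates no linear relation.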
Regression and correlation are two distinct yet complementary approaches. In general, regression is used to make predictions (which do not extend beyond the data used in the research), whereas correlation is used to establish the degree of association.
There are circumstances in which the x variable is neither fixed nor readily selected
by the researcher but is instead a random covariate of the y variable [1]. In this article,
the observer’s subjective features and the latest methodology are used. The begin-
nings and increases of crime are governed by age groups, racial backgrounds, family
structure, education, housing size [2], employed-to-unemployed ratio, and cops per
capita. Rather than systematic classification to categorize under the impressionistic
S. S. kshatri (B)
Department of Computer Science and Engineering (AI), Shri Shankaracharya Institute of
Professional Management and Technology, Raipur, C.G., India
e-mail: [email protected]
D. Singh
Department of Computer Science and Engineering, National Institute of Technology, Raipur,
C.G., India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 13
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_2
2 Related Work
et al. present how the police department used big data analytics (BDA), support vector machines (SVM), artificial neural networks (ANNs), the K-means algorithm, and naive Bayes, i.e., machine learning (ML) and deep learning (DL) approaches. The aim of this study is to investigate the most accurate ML and DL methods for predicting crime rates and the application of data approaches in attempts to forecast crime, with a focus on the dataset.
Modern methods based on machine learning algorithms can provide predictions in circumstances where the relationships between characteristics and outcomes are complex. Using algorithms to detect potential criminal areas, these forecasts may assist politicians and law enforcement in creating successful programs to minimize crime and improve the nation’s growth. The goal of this project is to construct a machine learning system for predicting a morally acceptable output value. Our results show that utilizing FAMD as a feature selection technique outperforms PCA on machine learning classifiers. With 97.53% accuracy for FAMD and 97.10% accuracy for PCA, the naive Bayes classifier surpasses the other classifiers [9]. Retrospective models employ past crime data to predict future crime. These include hotspot approaches, which assume that yesterday’s hotspots are also tomorrow’s. Empirical research backs this up: although hotspots may flare up and quiet down quickly, they tend to persist over time [10].
Prospective models employ more than just historical data: they examine the causes of major crime and build a mathematical relationship between those causes and the levels of crime. Such models use criminological ideas to anticipate criminal conduct and, as a consequence, should be more relevant and provide more “enduring” projections [11]. Previous models used either socioeconomic factors (e.g., RTM [15]) or near-repeat phenomena (e.g., Promap [12]; PredPol [13]). The term “near-repeat” refers to the phenomenon in which a property, or surrounding properties or sites, is targeted again shortly after the initial criminal incident.
Drones may also be used to map cities, chase criminals, investigate crime scenes and accidents, regulate traffic flow, and assist in search and rescue after a catastrophe. In Ref. [14], legal concerns surrounding drone usage and airspace allocation are discussed. The public has privacy concerns when the police acquire such power and influence, and airspace dispersal raises concerns about drone altitude. Related technologies include body cameras and license plate recognition. In Ref. [15], the authors state that face recognition can gather suspect profiles and evaluate them against various databases. A license plate scanner may also get data on a vehicle suspected of committing a crime. Police may even employ body cameras to see more than the human eye can perceive, meaning the recording captures all that an officer sees; normally, we cannot recall the whole picture of an item we have seen. The influence of body cameras on officer misbehavior and domestic violence was explored in Ref. [16]. Patrol personnel now wear body cameras, which provide protection in cases of police misconduct. However, wearing a body camera is not just for security purposes but also to capture crucial moments during everyday activities or major operations.
While each of these methods is useful, they all function separately. While the police may utilize any of these methods singly or simultaneously, having a device that can combine the benefits of all of these techniques would be immensely advantageous. Threat classification, machine learning, deep learning, threat detection, intelligence interpretation, voice-print recognition, natural language processing, core analytics, computational linguistics, data collection, and neural networks: considering all of these characteristics is critical for crime prediction.
3.1 Dataset
The dataset in this method covers 28 states and seven union territories. The crime data have been split into categories, and for our study we chose those held in the category of violent crime. A field for the total number of FIRs has also been included. The first dataset was gathered from the police and prison departments, and it is vast. Serious sequential data are typically extensive in size, making it challenging to manage terabytes of data every day from various crimes. Time series modeling is performed using classification models, which simplify the data and enable the model to construct an outcome variable. Data from 2001 to 2015 were plotted in an Excel file in a state-by-state format. The most common crime datasets are chosen from a large pool of data. Within the police and jail departments, violent offenses are common, a proposed burdensome factor for both departments. Overcrowding is a problem that is difficult to define and describe, and it is also linked to externally focused concerns. This study aimed to investigate the connection between violent crime, FIRs, and jail strength. There are some well-documented psychological causes of aggression; for example, both impulsivity and anger have been linked to criminal attacks [17].
The frequent crime datasets are selected from huge data [17]. A line graph is used to analyze the total IPC crimes for each state (based on districts) from 2001 to 2015. The attribute “STATE/UT” is used to generate the data, compared against the attribute “average FIR.” Supervised and unsupervised data techniques are used to predict crime accurately from the collected data [18, 19] (Table 1).
The imported dataset is visualized with STATE/UT as the class attribute. The visualization shows the distribution of the attribute STATE/UT against the other attributes in the dataset; each shade in the graph represents a specific state. The dataset is also visualized with crime levels 1–5 of specific attributes against the class attribute, which is the number of people arrested during the year.
The blue region in the chart represents high crime, such as murder, and the pink area represents low crime, such as kidnapping, for a particular attribute in the dataset. The police data label murder, attempted murder, and dowry death as 1; rape as 2; attempted rape as 3; dacoity and assembly for dacoity as 4; and likewise up to 5 (Fig. 2).
Given the dataset of crime rates and prison population, a correlation matrix can help us immediately comprehend the relationship between each pair of variables. One basic premise of multiple linear regression is that no independent variable in the model is substantially associated with another variable. Numerous numerical techniques, in addition to plotting, are available to help assess how well a regression equation fits the data. The sample coefficient of determination, R², of a linear regression fit (with any number of predictors) is a valuable statistic to examine. Assuming a homoscedastic model (wᵢ = 1), R² is the ratio between SSReg and Syy, i.e., the proportion of the sum of squared deviations from the mean (Syy) accounted for by the regression [1].
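For a single predictor, the R² statistic described above can be computed directly from its definition; the data here are invented for illustration:

```python
# Sketch: R^2 = SSReg / Syy for a homoscedastic (w_i = 1) simple linear
# regression. The data are invented for illustration.

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)   # total sum of squares about mean
    b1 = sxy / sxx                        # least squares slope
    ssreg = b1 * b1 * sxx                 # regression sum of squares
    return ssreg / syy

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear -> 1.0
print(r_squared([1, 2, 3, 4], [2, 3, 5, 9]))   # noisy -> about 0.92
```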
The primary objective of regression is to model and examine the relationship between the dependent and independent variables. The errors are assumed independent and normally distributed with mean 0 and variance σ². The βs are estimated by minimizing the error (residual) sum of squares:

S(β₀, β₁, …, β_k) = Σ_{i=1}^{n} [Yᵢ − (β₀ + Σ_{j=1}^{k} βⱼ Xᵢⱼ)]²    (1)

To find the minimum of (1) with respect to the βs, the derivative of (1) with respect to each of the βs is set to zero and solved. This gives the following normal equations:

∂S/∂β₀ |_{β̂₀, β̂₁, …, β̂_k} = −2 Σ_{i=1}^{n} (Yᵢ − β̂₀ − Σ_{j=1}^{k} β̂ⱼ Xᵢⱼ) = 0    (2)

and

∂S/∂βⱼ |_{β̂₀, β̂₁, …, β̂_k} = −2 Σ_{i=1}^{n} Xᵢⱼ (Yᵢ − β̂₀ − Σ_{j=1}^{k} β̂ⱼ Xᵢⱼ) = 0,  j = 1, 2, …, k    (3)

The β̂s, the solutions of (2) and (3), are the least squares estimates of the βs. It is helpful to express both the n equations in (1) and the k + 1 equations in (2) and (3) (which depend on linear functions of the βs) in matrix form. Model (1) can be expressed as

y = Xβ + ε    (4)
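For the single-predictor case (k = 1), the normal equations (2) and (3) reduce to two linear equations that can be solved in closed form. The data below are invented so that the fit is exact:

```python
# Sketch of the least squares estimates from the normal equations
# for a single predictor (k = 1); data are invented for illustration.

def least_squares(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # Solve: n*b0 + sx*b1 = sy   and   sx*b0 + sxx*b1 = sxy
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])  # y = 1 + 2x exactly
print(b0, b1)  # prints: 1.0 2.0
```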
If crime expands while the number of detainees does not, this shows a negative relationship and would, by extension, have a negative correlation coefficient. A positive correlation coefficient would describe the relationship between crime and prisoners’ strength: as crime increases, so does the prison crowd. As we can see in the correlation matrix, there is no relation between crime and prison strength, so we
5 Conclusion
References
1. Asuero AG, Sayago A, González AG (2006) The correlation coefficient: an overview. Crit Rev
Anal Chem 36(1):41–59. https://doi.org/10.1080/10408340500526766
2. Wang Z, Lu J, Beccarelli P, Yang C (2021) Neighbourhood permeability and burglary: a case
study of a city in China. Intell Build Int 1–18. https://doi.org/10.1080/17508975.2021.1904202
3. Mukaka MM (2012) Malawi Med J 24, no. September:69–71. https://www.ajol.info/index.php/
mmj/article/view/81576
4. Andresen MA (2007) Location quotients, ambient populations, and the spatial analysis of crime
in Vancouver, Canada. Environ Plan A Econ Sp 39(10):2423–2444. https://doi.org/10.1068/
a38187
5. Clipper S, Selby C (2021) Crime prediction/forecasting. In: The encyclopedia of research
methods in criminology and criminal justice, John Wiley & Sons, Ltd, 458–462
6. Zhu H, You X, Liu S (2019) Multiple ant colony optimization based on pearson correlation
coefficient. IEEE Access 7:61628–61638. https://doi.org/10.1109/ACCESS.2019.2915673
7. Hu K, Li L, Liu J, Sun D (2021) DuroNet: a dual-robust enhanced spatial-temporal learning
network for urban crime prediction. ACM Trans Internet Technol 21, 1. https://doi.org/10.
1145/3432249
8. Kshatri SS, Singh D, Narain B, Bhatia S, Quasim MT, Sinha GR (2021) An empirical analysis
of machine learning algorithms for crime prediction using stacked generalization: an ensemble
approach. IEEE Access 9:67488–67500. https://doi.org/10.1109/ACCESS.2021.3075140
9. Albahli S, Alsaqabi A, Aldhubayi F, Rauf HT, Arif M, Mohammed MA (2020) Predicting the
type of crime: intelligence gathering and crime analysis. Comput Mater Contin 66(3):2317–
2341. https://doi.org/10.32604/cmc.2021.014113
10. Spelman W (1995) The severity of intermediate sanctions. J Res Crime Delinq 32(2):107–135.
https://doi.org/10.1177/0022427895032002001
11. Caplan JM, Kennedy LW, Miller J (2011) Risk terrain modeling: brokering criminological
theory and GIS methods for crime forecasting. Justice Q 28(2):360–381. https://doi.org/10.
1080/07418825.2010.486037
12. Johnson SD, Birks DJ, McLaughlin L, Bowers KJ, Pease K (2008) Prospective crime mapping
in operational context: final report. London, UK Home Off. online Rep., vol. 19, no. September,
pp. 07–08. http://www-staff.lboro.ac.uk/~ssgf/kp/2007_Prospective_Mapping.pdf
13. Wicks M (2016) Forecasting the future of fish. Oceanus 51(2):94–97
14. McNeal GS (2014) Drones and aerial surveillance: considerations for legislators, p 34. https://
papers.ssrn.com/abstract=2523041.
15. Fatih T, Bekir C (2015) Police use of technology to fight against crime 11(10):286–296
16. Katz CM et al (2014) Evaluating the impact of officer worn body cameras in the Phoenix Police
Department. Centre for Violence Prevention and Community Safety, Arizona State University,
December, pp 1–43
17. Krakowski MI, Czobor P (2013) Depression and impulsivity as pathways to violence: implica-
tions for antiaggressive treatment. Schizophr Bull 40(4):886–894. https://doi.org/10.1093/sch
bul/sbt117
18. Kshatri SS, Narain B (2020) Analytical study of some selected classification algorithms and
crime prediction. Int J Eng Adv Technol 9(6):241–247. https://doi.org/10.35940/ijeat.f1370.
089620
19. Osisanwo FY, Akinsola JE, Awodele O, Hinmikaiye JO, Olakanmi O, Akinjobi J (2017) Super-
vised machine learning algorithms: classification and comparison. Int J Comput Trends Technol
48(3):128–138. https://doi.org/10.14445/22312803/ijctt-v48p126
Visual Question Answering Using
Convolutional and Recurrent Neural
Networks
1 Introduction
“Visual Question Answering” is a task that takes as input an image and a set of questions corresponding to that image which, when fed to neural networks and machine learning models, generate one or more answers. The purpose of building such systems is to assist advanced computer vision tasks such as object detection and automatic answering by machine learning models when receiving data in the form of images or, in more advanced versions, video. This task is essential when we consider research objectives in artificial intelligence. In recent developments in AI [1], the integration of tasks involving textual and image forms of input has become hugely important. Visual question answering is sometimes used to answer open-ended questions, and at other times multiple-choice or close-ended ones. In our methodology, we have considered the formulation of open-ended answers instead of close-ended ones because, in the real world, most human interactions involve non-binary answers to questions. Open-ended questions draw on a much larger pool of possible answers than close-ended, binary, or even multiple-choice ones.
Some of the major challenges that VQA tasks face are computational cost, execution time, and the integration of neural networks for textual and image data. It is practically unachievable and inefficient to implement a neural network that takes into account both text features and image features and learns the weights of the network to
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 23
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_3
make decisions and predictions. For the purposes of our research, we have considered a state-of-the-art dataset that is publicly available. The question set that can be formed using this dataset is very wide. For instance, one question for an image containing multiple 3-D shapes of different colors could be “How many objects of cylinder shape are present?” [1]. As we can see, this question calls for very deep observation, similar to human observation. After observing, experimenting with, and examining the dataset questions, we could see that each answer requires multiple queries to converge on an answer. Performing this task requires knowledge and application of natural language processing techniques in order to analyze the textual question and form answers. In this paper, we discuss a model constructed using convolutional neural network layers for processing image features and a recurrent neural network based model for analyzing text features.
2 Literature Survey
A general idea was to extract features into a global feature vector with a convolutional network and to encode the questions using an LSTM, or long short-term memory network. These are then combined to produce a consolidated result. This gives good answers, but it fails to give accurate results when the answers or questions depend on specific focused regions of the image.
We also came across the use of stacked attention networks for VQA by Yang [3], which extract the semantics of a question to look for the parts and areas of the picture related to the answer. These networks are advanced versions of the “attention mechanism” applied in other problem domains such as image caption generation and machine translation. The paper by Yang [3] proposed a multiple-layer stacked attention network. It mainly consists of the following components: (1) a model dedicated to the image; (2) a separate model dedicated to the question, which can be implemented using a convolutional network or a long short-term memory (LSTM) network [8] to produce the semantic vector for questions; and (3) the stacked attention model to recognize the focal, important parts and areas of the image. But despite its promising results, this approach had its own limitations.
Research by Yi et al. [4] in 2018 proposed a new model with multiple components to deal with images and questions/answers: a “scene parser”, a “question parser”, and a program executor. In the first component, Mask R-CNN was used to create segmented portions of the image. In the second component, meant for the question, they used a “seq2seq” model. The program-execution component was built using Python modules that deal with the logical aspects of the questions in the dataset.
Liang et al. [5] proposed focal visual-text attention for visual question answering. This model (Focal Visual-Text Attention) combines the sequence of image features generated by the network, the text features of the image, and the question. It uses a hierarchical approach to dynamically choose which modalities and snippets in the sequential data to focus on in order to answer the question, and so can not only forecast the correct answers but also identify the correct supporting evidence, enabling people to validate the system’s results. It was implemented on a smaller dataset and was not tested against more standard datasets.
Zhu et al. [7] proposed visual reasoning dialogs with structural and partial observations. Nodes in their graph neural network represent dialog entities (the caption, question-and-response pairs, and the unobserved queried answer) as embeddings, and the edges reflect semantic relationships between nodes. They created an EM-style inference technique to estimate latent links between nodes and missing values for unobserved nodes: the M-step calculates the edge weights, whereas the E-step uses neural message passing to update all hidden node state embeddings.
3 Dataset Description
4 Proposed Method
After reading about the multiple techniques and models used to approach the VQA task, we used CNN+LSTM as the base approach for the model and worked our way up. In the CNN-LSTM model, image features and language features are computed separately, combined together, and a multi-layer perceptron is trained on the combined features. The questions are encoded using a two-layer LSTM, while the images are encoded using the last hidden layer of a CNN. After that, the picture features are l2-normalized. Then the question and image features are transformed to a common space and combined before being fed to the network that predicts the answer. Refer to Fig. 3 for the proposed model.
4.1 Experiment 1
4.1.1 CNN
A CNN takes into account the parts and aspects of an input image. Importance, in the form of learnable weights and biases, is assigned based on the relevance of different aspects of the image, which also distinguishes them from one another. A ConvNet requires far less pre-processing than other classification algorithms. The CNN model is shown in Fig. 4. We have used MobileNetV2 in our CNN model. MobileNetV2 is a convolutional neural network design that, as the name suggests, is portable, or in other words “mobile-friendly”. It is built on an inverted residual structure, with residual connections between bottleneck layers. MobileNetV2 [9] is a powerful feature extractor for detecting and segmenting objects. The CNN model consists of the image input layer, the MobileNetV2 layer, and a global average pooling layer.
4.1.2 MobileNetV2
In MobileNetV2, there are two types of blocks. One is a residual block with stride 1; the other, with stride 2, is used for downsizing. Both types of blocks have three layers. The first layer is a 1 × 1 convolution with ReLU6, followed by a depthwise convolution; the third layer is a 1 × 1 convolution with no non-linearity.
4.1.3 LSTM
where
x1 = output from the CNN,
x2 = output from the LSTM,
Out = concatenation of x1 and x2.
After this, we create a dense layer with a softmax activation function using TensorFlow, and feed the CNN output, the LSTM output, and the concatenated dense layer to the model. Refer to Fig. 6 for the overall architecture. The model was built with the Adam optimizer and sparse categorical cross-entropy loss. For merging the two components, we used element-wise multiplication and fed the result to the network to predict answers.
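The merging step can be sketched as follows; the feature vectors, dense-layer weights, and dimensions here are random stand-ins for illustration, not the trained model's values:

```python
# Minimal sketch of the fusion step: element-wise multiplication of the
# CNN image features (x1) and LSTM question features (x2), followed by a
# dense softmax layer over candidate answers. All values are stand-ins.
import math
import random

random.seed(0)
D, N_ANSWERS = 8, 5                              # arbitrary dimensions

x1 = [random.random() for _ in range(D)]         # stand-in CNN output
x2 = [random.random() for _ in range(D)]         # stand-in LSTM output
fused = [a * b for a, b in zip(x1, x2)]          # element-wise product

# One dense layer with softmax (weights are random stand-ins).
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_ANSWERS)]
logits = [sum(w * f for w, f in zip(row, fused)) for row in W]
exps = [math.exp(z - max(logits)) for z in logits]
probs = [e / sum(exps) for e in exps]            # answer distribution

predicted_answer = probs.index(max(probs))       # argmax over answers
```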
4.2 Experiment 2
As a first step, we preprocessed both the image data and the text data, i.e., the
questions given as input. For this experiment, we used a CNN model to extract features
from the image dataset. Figure 8 shows the model architecture in block form. An input
image of 64 × 64 is fed to the subsequent layers. A convolution layer with eight 3 × 3
filters and "same" padding produces an output of 64 × 64 × 8 dimensions. A max-pooling
layer then reduces it to 32 × 32 × 8, and the next convolution layer, with 16 filters,
produces 32 × 32 × 16. Another max-pooling layer cuts the dimensions down to
16 × 16 × 16. Finally, we flatten it to obtain the output of the 64 × 64 image as
4096 nodes. Refer to Fig. 7.
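The dimension bookkeeping above can be verified with a small shape-propagation sketch ("same" convolutions preserve spatial size; 2 × 2 max-pooling halves it; the single input channel is an assumption):

```python
def conv_same(h, w, c_out):
    # A convolution with "same" padding keeps the spatial dimensions
    return (h, w, c_out)

def maxpool2(h, w, c):
    # A 2x2 max-pooling layer halves each spatial dimension
    return (h // 2, w // 2, c)

shape = (64, 64, 1)                        # input image (channels assumed)
shape = conv_same(shape[0], shape[1], 8)   # eight 3x3 filters -> (64, 64, 8)
shape = maxpool2(*shape)                   # -> (32, 32, 8)
shape = conv_same(shape[0], shape[1], 16)  # 16 filters -> (32, 32, 16)
shape = maxpool2(*shape)                   # -> (16, 16, 16)
flat = shape[0] * shape[1] * shape[2]      # flatten -> 4096 nodes
```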
In this experiment, instead of using a complex RNN architecture to extract features
from the text part (the questions), we used the bag-of-words technique to form a
fixed-length vector and a simple feedforward network to extract the features; refer
to Fig. 8, which represents the process. Here, we passed the bag-of-words vector
through two fully connected layers with the "tanh" activation function to obtain the
output. Both components were then merged using element-wise multiplication, as
discussed in the previous section.
Fig. 7 CNN—Experiment 2
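A minimal sketch of this text pipeline is shown below; the toy vocabulary, the 32-unit layer width, and the random weights are illustrative assumptions only:

```python
import numpy as np

def bag_of_words(question, vocab):
    # Fixed-length vector: count of each vocabulary word in the question
    tokens = question.lower().rstrip('?').split()
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def dense_tanh(x, W, b):
    # One fully connected layer with the "tanh" activation
    return np.tanh(W @ x + b)

vocab = ['does', 'this', 'image', 'contain', 'a', 'circle', 'not']  # toy vocabulary
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, len(vocab))), np.zeros(32)
W2, b2 = rng.normal(size=(32, 32)), np.zeros(32)

x = bag_of_words("Does this image not contain a circle?", vocab)
text_features = dense_tanh(dense_tanh(x, W1, b1), W2, b2)
```

The resulting feature vector would then be merged with the image features by element-wise multiplication, as in the previous section.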
30 A. Azade et al.
5.1 Experiment 1
From Figs. 9 and 10 we can see that the given image contains a few solid and rubber
shapes of different colors. For this image, we have the question "What number of
small rubber balls are there?". The ground-truth answer is 1, and our model also
predicts 1, which is correct.
5.2 Experiment 2
In the second experiment, we considered a simpler form of the CLEVR dataset and,
as explained in the methodology, used different models and variations of the
approach. In Fig. 11 we can see an image with the question "Does this image not
contain a circle?", for which our model predicted the correct answer, "No".
The gradual increase in accuracy with each epoch shows that the model is learning
at every step. Calculating accuracy for a VQA task is not fully objective because of
the open-ended nature of the questions. We achieved a training accuracy of 90.01%
and a test accuracy of 85.5% (Table 3), which is a decent result compared to the
existing methodologies [1]. These results were observed on the easy-VQA dataset.
6 Conclusion
References
1. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) Vqa: Visual
question answering. In: Proceedings of the IEEE international conference on computer vision,
pp 2425–2433
2. Dataset: https://visualqa.org/download.html
3. Yang Z, He X, Gao J, Deng L, Smola A (2016) Stacked attention networks for image question
answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 21–29
4. Yi K, Wu J, Gan C, Torralba A, Kohli P, Tenenbaum J (2018) Neural-symbolic vqa: disentan-
gling reasoning from vision and language understanding. Adv Neural Inf Process Syst 31
5. Liang J, Jiang L, Cao L, Li LJ, Hauptmann AG (2018) Focal visual-text attention for visual
question answering. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 6135–6143
6. Wu C, Liu J, Wang X, Li R (2019) Differential networks for visual question answering. Proc
AAAI Conf Artif Intell 33(01), 8997–9004. https://doi.org/10.1609/aaai.v33i01.33018997
7. Zheng Z, Wang W, Qi S, Zhu SC (2019) Reasoning visual dialogs with structural and partial
observations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 6669–6678
8. https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-
introduction-to-lstm/
9. https://towardsdatascience.com/review-mobilenetv2-light-weight-model-image-
classification-8febb490e61c
10. Liu Y, Zhang X, Huang F, Tang X, Li Z (2019) Visual question answering via attention-based
syntactic structure tree-LSTM. Appl Soft Comput 82, 105584. https://doi.org/10.1016/j.asoc.
2019.105584, https://www.sciencedirect.com/science/article/pii/S1568494619303643
11. Nisar R, Bhuva D, Chawan P (2019) Visual question answering using combination of LSTM
and CNN: a survey. e-ISSN 2395–0056
12. Kan C, Wang J, Chen L-C, Gao H, Xu W, Nevatia R (2015) ABC-CNN: an attention-based
convolutional neural network for visual question answering
13. Sharma N, Jain V, Mishra A (2018) An analysis of convolutional neural networks for image
classification. Procedia Comput Sci 132, 377–384. ISSN 1877-0509. https://doi.org/10.1016/
j.procs.2018.05.198, https://www.sciencedirect.com/science/article/pii/S1877050918309335
14. Staudemeyer RC, Morris ER (2019) Understanding LSTM–a tutorial into long short-term
memory recurrent neural networks. arXiv:1909.09586
Visual Question Answering Using Convolutional and Recurrent Neural Networks 33
15. Zabirul Islam M, Milon Islam M, Asraf A (2020) A combined deep CNN-LSTM network for
the detection of novel coronavirus (COVID-19) using X-ray images. Inform Med Unlocked
20, 100412. ISSN 2352-9148. https://doi.org/10.1016/j.imu.2020.100412
16. Boulila W, Ghandorh H, Ahmed Khan M, Ahmed F, Ahmad J (2021) A novel CNN-LSTM-
based approach to predict urban expansion. Ecol Inform 64. https://doi.org/10.1016/j.ecoinf.
2021.101325, https://www.sciencedirect.com/science/article/pii/S1574954121001163
Brain Tumor Segmentation Using Deep
Neural Networks: A Comparative Study
Pankaj Kumar Gautam, Rishabh Goyal, Udit Upadhyay, and Dinesh Naik
1 Introduction
In a survey conducted in 2020 in the USA, about 3,460 children under 15 years of
age were diagnosed with a brain tumor, along with around 24,530 adults [1]. Gliomas
are among the most common tumors; they are less threatening (lower grade) when life
expectancy is several years or more, and more threatening (higher grade) when it
is almost two years. One of the most common treatments for tumors is brain surgery.
Radiation and chemotherapy have also been used to regulate the growth of tumors
that cannot be removed through surgery. Detailed images of the brain can be obtained
using magnetic resonance imaging (MRI). Brain tumor segmentation from MRI can
significantly improve diagnostics, growth rate prediction, and treatment planning.
There are several categories of tumors, such as gliomas, glioblastomas, and menin-
giomas. Tumors such as meningiomas can be segmented easily, whereas the other
two are much harder to locate and segment [2]. Their scattered, poorly contrasted,
and extended arrangements make them challenging to segment. A further difficulty
in segmentation is that they can be present in any part of the brain with nearly
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 35
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_4
36 P. Kumar Gautam et al.
any size or shape. Depending on the type of MRI machine used, the identical tumor
cell may show different gray-scale values when diagnosed at different hospitals.
Three types of tissue form a healthy brain: white matter, gray matter, and
cerebro-spinal fluid [3]. Tumor image segmentation helps in determining the size,
position, and spread of the tumor [4]. Since glioblastomas are infiltrative, their
edges are usually blurred and tough to distinguish from normal brain tissue.
T1-contrasted (T1-C), T1, and T2 (spin-lattice and spin-spin relaxation,
respectively) pulse sequences are frequently utilized as a solution [5]. Each type
of brain tissue appears differently across these modalities.
Segmenting brain tumors using the 2-pathway CNN design has already been shown
to achieve reasonable accuracy and resilience [6, 7]. That research verified its
methodology on the BRATS 2013 and 2015 MRI scan datasets [7]. Previous
investigations also used encoder-decoder CNN designs based on autoencoder
architectures, attaching an additional path to the end of the encoder section to
recreate the actual scan image [8]. The purpose of adopting the autoencoder path
was to offer further guidance and regularization to the encoder section because the
size of the dataset was restricted [9]. In the past, VGG and ResNet designs were
used for transfer learning in medical applications such as electroencephalograms
(EEG). "EEG is a method of measuring brainwaves that have been often employed
in brain-computer interface (BCI) applications" [10].
In this research, brain tumors are segmented using two different CNN architectures.
Modern advances in convolutional neural network designs and learning methodologies,
including Max-out hidden nodes and Dropout regularization, were utilized in this
experiment. The BRATS-13 [11] dataset, downloaded from the SMIR repository, is
available for educational use and was used to compare our results with those of
previous work [6]. In pre-processing, the one percent highest and lowest intensity
levels were removed to normalize the data. The work then used CNNs to create a
novel 2-pathway model that learns local brain features, together with a two-stage
training technique that was observed to be critical in dealing with the imbalanced
distribution of labels for the target variable [6]. Traditional structured output
techniques were replaced with a unique cascaded design, which was both effective
and theoretically superior. For further implementation, we proposed a U-net model,
which has given extraordinary results in image segmentation [12].
The rest of the research is arranged as follows. Section 2 contains the methodology,
which presents two different approaches for the segmentation of brain tumors, i.e.,
Cascaded CNN and U-net. Section 3 presents empirical studies that include a
description of the data, the experimental setup, and the performance evaluation
metrics. Section 4 presents the visualization and result analysis, while Sect. 5
contains the conclusion of the research.
Brain Tumor Segmentation Using Deep Neural Networks: A Comparative Study 37
2 Methodology
This section presents the adopted methodology for finding the tumor from patients'
MRI scans using two different architectures based on convolutional neural networks
(CNNs): (A) Cascaded CNN [6] and (B) U-net [12]. First, we modeled the CNN
architecture based on the cascading approach and calculated the F1 score for all
three types of cascading architecture. Second, we modeled the U-net architecture
and calculated the dice score and dice loss for our segmented tumor output. Finally,
we compared both models based on their dice scores. Figure 1 represents the adopted
methodology of our research work. The research is divided into two parallel tracks
representing the two approaches described above, whose results were then compared
based on the F1 score and Dice loss.
The architecture includes two paths: a pathway with a large 13 × 13 receptive field
and one with a small 7 × 7 receptive field [6]. These paths are referred to as the
global and local pathways, respectively, as shown in Fig. 2. This architecture
predicts a pixel's label from two characteristics: (i) the visible features of the
area around the pixel, and (ii) the location of the patch. The structure of the two
pathways is as follows:
1. Local: the 1st layer has kernels of size (7, 7) with (4, 4) max-pooling, and the
   2nd layer has kernels of size (3, 3). Because of the limited neighborhood, the
   local path processes the finer visual details of the area around the pixel, as
   the kernel is smaller.
2. Global: the layer has kernels of size (13, 13) with Max-out applied and no
   max-pooling, giving (21, 21) feature maps in the global path.
Two layers were used in the local pathway to concatenate with the primary hidden
layer of the global pathway, with 3 × 3 kernels for the 2nd layer. This means that
the effective receptive field of features in the primary layer of each pathway is
the same, while the global pathway's parametrization models features in that same
region more flexibly. The union of the feature maps of these pathways is then
supplied to the final output layer, to which the "Softmax" activation is applied.
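A sketch of this two-pathway block in Keras follows. The 33 × 33 patch size, the filter counts, the extra stride-1 pooling that aligns both pathways at 21 × 21, and ReLU in place of Max-out are assumptions taken from the original design in [6], not details given in this text:

```python
from tensorflow.keras import layers, Input, Model

inp = Input(shape=(33, 33, 4))  # one patch with 4 MRI modality channels

# Local pathway: 7x7 convolution with 4x4 max-pooling, then a 3x3 layer
local = layers.Conv2D(64, (7, 7), activation='relu')(inp)    # -> 27x27
local = layers.MaxPooling2D((4, 4), strides=(1, 1))(local)   # -> 24x24
local = layers.Conv2D(64, (3, 3), activation='relu')(local)  # -> 22x22
local = layers.MaxPooling2D((2, 2), strides=(1, 1))(local)   # -> 21x21

# Global pathway: a single 13x13 convolution, no max-pooling
glob = layers.Conv2D(160, (13, 13), activation='relu')(inp)  # -> 21x21

# Union of the feature maps, fed to the softmax output layer (5 labels)
merged = layers.Concatenate()([local, glob])
out = layers.Conv2D(5, (21, 21), activation='softmax')(merged)
model = Model(inp, out)
```

With valid padding, the local pathway reaches 21 × 21 (33 → 27 → 24 → 22 → 21), matching the global pathway's 21 × 21 maps, so the two can be concatenated channel-wise before the output layer.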
The 2-Path CNN architecture was expanded using a cascade of CNN blocks. The
model utilizes the first CNN's output as additional input to the hidden layers of
the second CNN block.
This research implements three different cascading designs that add the first
convolutional neural network's results to distinct levels of the 2nd convolutional
neural network block, as described below [6]:
1. Input cascade CNN: the first CNN's output is directly applied to the second CNN
   (Fig. 3); its outputs are thus treated as additional MRI image channels of the
   input patch.
2. Local cascade CNN: in the second CNN, the work moves up a layer in the local
   route and concatenates at its primary hidden layer (Fig. 4).
3. Mean-Field cascade CNN: the work moves to the end of the second CNN and
   concatenates just before the output layer (Fig. 5). This method is similar to
   computations performed in conditional random fields using a single run of
   mean-field inference.
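The input cascade can be illustrated with a small NumPy sketch; the slice size, class count, and the random arrays standing in for real data are assumptions for illustration:

```python
import numpy as np

# Hypothetical shapes: a 2-D region with 4 modality channels and 5 tumor classes.
h, w, n_modalities, n_classes = 64, 64, 4, 5
rng = np.random.default_rng(0)
slice_4ch = rng.random((h, w, n_modalities))       # stacked MRI modalities
first_cnn_probs = rng.random((h, w, n_classes))    # stand-in for CNN #1 output

# Input cascade: treat the first CNN's per-pixel class probabilities as
# additional input channels for the second CNN.
cascade_input = np.concatenate([slice_4ch, first_cnn_probs], axis=-1)
# cascade_input.shape == (64, 64, 9): 4 modalities + 5 probability maps
```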
2.3 U-Net
The traditional convolutional neural network architecture helps us predict the tumor
class but cannot locate the tumor in an MRI scan precisely and effectively. By
applying segmentation, we can recognize where objects of distinct classes are
present in the image. U-net [13] is a convolutional neural network (CNN) modeled in
the shape of a "U", extending the traditional CNN architecture with some changes. It
was designed to semantically segment bio-medical images where the target is to
classify whether there is contagion or not, thus identifying the region of infection
or tumor.
A CNN learns the feature mapping of an image and works well for classification
problems, where an input image is converted into a vector used for classification.
In image segmentation, however, an image must be reproduced from this vector. While
transforming the image into a vector, we have already learned its feature mapping,
so we reuse the same feature maps from the contracting path to expand the vector
back into a segmented image. The U-net model consists of three sections: the
encoder, bottleneck, and decoder blocks, as shown in Fig. 6. The encoder is made of
many contraction layers; each layer takes an input and applies two 3 × 3
convolutions followed by a 2 × 2 max-pooling. The bottom-most (bottleneck) layer
connects the encoder and decoder blocks. Mirroring the encoder, each decoder layer
passes its input through two 3 × 3 convolutional layers, preceded by a 2 × 2
up-sampling layer.
To maintain symmetry, the number of feature maps is halved at each expansion step,
and the numbers of expansion and contraction blocks are the same. After that, the
final mapping passes through another 3 × 3 convolutional layer with the number of
feature maps equal to the number of segments.
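A minimal U-net along these lines can be sketched in Keras; the input size, filter counts, and two-level depth here are illustrative assumptions rather than the configuration used in the experiments:

```python
from tensorflow.keras import layers, Input, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in each contraction/expansion layer
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

inp = Input(shape=(128, 128, 4))          # 4 MRI modalities (size assumed)
# Encoder: contraction layers with 2x2 max-pooling
c1 = conv_block(inp, 32); p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 64);  p2 = layers.MaxPooling2D(2)(c2)
# Bottleneck between the encoder and decoder blocks
b = conv_block(p2, 128)
# Decoder: 2x2 up-sampling; skip connections reuse encoder feature maps,
# and the number of feature maps is halved at each step
u2 = layers.UpSampling2D(2)(b)
u2 = conv_block(layers.Concatenate()([u2, c2]), 64)
u1 = layers.UpSampling2D(2)(u2)
u1 = conv_block(layers.Concatenate()([u1, c1]), 32)
# Final 3x3 convolution with one feature map per segment class
out = layers.Conv2D(5, 3, padding='same', activation='softmax')(u1)
model = Model(inp, out)
```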
3 Empirical Studies
This section presents the dataset description, experimental setup, data pre-processing,
and metrics for performance evaluation.
3.1 Dataset
The BRATS-13 MRI dataset [11] was used for the research. It consists of actual
patient scans and synthetic scans created by SMIR (SICAS medical image repository).
Its size is around 1 GB, and it was stored in Google Drive for further use. Each
category, synthetic and real, contains MRI scans for high-grade gliomas (HG) and
low-grade gliomas (LG): there are 25 patients each with synthetic HG and LG scans,
20 patients with real HG scans, and 10 patients with real LG scans. The dataset
contains four modalities (types of scans): T1, T1-C, T2, and FLAIR. For each patient
and each modality, we get a 3-D image of the brain, and we concatenate these
modalities as four channels slice-wise. Figure 7 shows tumors along with their MRI
scans; we have used the 126th slice for representation. For HG, the dimensions are
(176, 216, 160). The gray-scale image represents the MRI scan, and the blue-colored
one represents the tumor for the respective MRI scan.
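The slice-wise stacking of the four modalities can be sketched as follows; the zero arrays standing in for the loaded volumes, and the assumption that slices run along the third dimension, are illustrative only:

```python
import numpy as np

# Stand-ins for one patient's loaded volumes, HG dims (176, 216, 160)
t1    = np.zeros((176, 216, 160))
t1c   = np.zeros((176, 216, 160))
t2    = np.zeros((176, 216, 160))
flair = np.zeros((176, 216, 160))

# Concatenate the modalities as four channels, slice-wise: each slice
# becomes a (176, 216, 4) multi-channel image.
volume = np.stack([t1, t1c, t2, flair], axis=-1)  # (176, 216, 160, 4)
slice_126 = volume[:, :, 126, :]                  # the 126th slice
```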
The research was carried out using Google Colab, which provides a free web interface
for running Jupyter notebooks. The "Pandas" and "Numpy" libraries were used for
data pre-processing, CNN models were imported from the "Keras" library for
segmentation, and "SkLearn" was used for measuring performance metrics such as the
F1 score (3) and Dice loss (4). The MRI scan data was first downloaded under the
academic agreement and then uploaded to Colab. For data pre-processing,
multiple pre-processing steps were applied to the dataset, as presented in the next
section. The data was split 70:30 into training and testing sets, respectively.
First, slices of MRI scans with no tumor information were removed from the original
dataset. This minimizes the dataset without affecting the segmentation results. Then
the one percent highest and lowest intensity levels were eliminated. Intensity
levels for the T1 and T1-C modalities were normalized using N4ITK bias field
correction [14]. Also, the image data is normalized in each input layer by
subtracting the average and then dividing by the standard deviation of the channel.
Batch normalization was used for the following reasons:
1. It speeds up training: it makes the optimization landscape much smoother,
   producing more predictable and consistent gradient behavior, allowing quicker
   training.
2. With Batch Norm, a much larger learning rate can be used to reach the minima,
   resulting in faster learning.
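The intensity clipping and per-channel standardization can be sketched as below; the exact percentile bounds (assumed here to be 1% and 99%) and the synthetic scan are illustrative:

```python
import numpy as np

def preprocess(channel, low=1.0, high=99.0):
    """Clip the 1% highest/lowest intensities, then standardize the channel."""
    lo, hi = np.percentile(channel, [low, high])
    clipped = np.clip(channel, lo, hi)
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)

rng = np.random.default_rng(42)
scan = rng.normal(loc=100.0, scale=20.0, size=(176, 216))  # synthetic slice
out = preprocess(scan)  # zero mean, unit standard deviation
```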
We used various performance metrics to compare the performance of both models:
Precision, Recall, F1-Score, and Dice Loss were selected as our performance
parameters. Precision (Pr) is the proportion of True Positives among all predicted positives.
Recall (Re) measures how well our model identifies True Positives. F1-Score is a
function of Recall and Precision, namely their harmonic mean; it takes both Recall
and Precision into account. Finally, accuracy is the fraction of predictions our
model got correct.
Precision(Pr) = TP / (FP + TP)    (1)

Recall(Re) = TP / (FN + TP)    (2)

F1 Score = 2 × (Pr × Re) / (Pr + Re)    (3)
where True Positive (TP) means that the actual and predicted labels correspond to
the same positive class, and True Negative (TN) means that they correspond to the
same negative class. False Positive (FP), also called the Type-I error, means that
the actual label belongs to a negative class while the predicted label belongs to a
positive class. False Negative (FN), or the Type-II error, means that the actual
label belongs to a positive class while the model predicted a negative class.
Lossdice = 2 × Σᵢ (pᵢ × gᵢ) / Σᵢ (pᵢ² + gᵢ²)    (4)
Also, the Dice loss (Lossdice) (4) measures how clean and meaningful the computed
boundaries are. Here, pᵢ and gᵢ represent pairs of corresponding pixel values of the
prediction and ground truth, respectively. Dice loss considers the loss of
information both locally and globally, which is critical for high accuracy.
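Equations (1)-(4) translate directly into code; the small binary masks below are toy data for illustration:

```python
import numpy as np

def precision(tp, fp):            # Eq. (1)
    return tp / (fp + tp)

def recall(tp, fn):               # Eq. (2)
    return tp / (fn + tp)

def f1_score(pr, re):             # Eq. (3)
    return 2 * pr * re / (pr + re)

def dice(p, g, eps=1e-8):
    """Overlap measure of Eq. (4) for binary masks p (predicted), g (truth)."""
    p, g = p.astype(float), g.astype(float)
    return 2 * (p * g).sum() / ((p**2 + g**2).sum() + eps)

pred  = np.array([[1, 1, 0], [0, 1, 0]])   # toy predicted mask
truth = np.array([[1, 0, 0], [0, 1, 1]])   # toy ground-truth mask
tp = ((pred == 1) & (truth == 1)).sum()    # 2
fp = ((pred == 1) & (truth == 0)).sum()    # 1
fn = ((pred == 0) & (truth == 1)).sum()    # 1
```

For these masks, precision, recall, F1, and the Dice measure all equal 2/3, since the masks share two of their three positive pixels.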
This section presents the results after running both the Cascaded and U-net
architectures.
The three cascading architectures were trained on 70% of the data using the
cross-entropy loss and the Adam optimizer. Testing was done on the remaining 30% of
the data, and the F1 score was then computed for all three types of cascading
architecture (as shown in Table 1). The F1 score of the Local cascade CNN is the highest for the
research; it is also very similar to the previous work done by [6]. For the Input
cascade and MF cascade, the model differs by around 4% from the previous work.
Figure 8a, b shows the segmentation results on two instances of test MRI scan
images. The segmented output was compared with the ground truth, and it was
concluded that the model obtained an accurate and precise boundary of the tumor
from the test MRI scan images.
4.2 U-Net
This deep neural network (DNN) architecture is modeled using the Dice loss, which
takes into account information loss both globally and locally and is essential for
high accuracy. The Dice measure varies over [0, 1], where 0 means that the segmented
output and the ground truth do not overlap at all, and 1 means that the segmented result
and the ground truth image fully overlap. We achieved a value of 0.6863 on our
testing data, which means most of our segmented output is similar to the ground
truth images in terms of boundaries and region.
Figure 9 shows the segmentation results on three random instances of test MRI scan
images. From left to right, we have the MRI scan, the ground truth image, the
segmented output from the Cascade CNN, and finally the segmented output from the
U-net model. The segmented output was compared to the ground truth, and the model
was capable of obtaining an accurate and precise boundary of the tumor from the
test MRI scan images.
From Fig. 9 it was concluded that the U-net model performs better than the Cascaded
architecture in terms of F1 score.
5 Conclusions
The research used convolutional neural networks (CNNs) to perform brain tumor
segmentation, examining two designs (Cascaded CNN and U-net) and analyzing their
performance. We tested our findings on the BRATS 2013 dataset, which contains
authentic patient images and synthetic images created by SMIR. Significant
performance was produced using a novel 2-pathway model (which can represent both
local features and global context), extending it to three different cascading models
that represent local label dependencies by stacking two convolutional neural
networks. Two-phase training was followed, which allowed us to model the CNNs
efficiently when the label distribution is unbalanced. The model using the cascading
architecture reproduced results close to those of the base paper in terms of F1
score. We also concluded that the Local cascade CNN performs better than the Input
and MF cascade CNNs. Finally, the research compared the F1 scores of the cascaded
architecture and the U-net model, and it was concluded that the overall performance
of the semantic-segmentation model, U-net, is better than that of the cascaded
architecture. The Dice measure for the U-net was 0.6863, which indicates that our
model produces segmented images very similar to the ground truth images.
References
1. ASCO: Brain tumor: Statistics (2021) Accessed 10 Nov 2021 from https://www.cancer.net/
cancer-types/brain-tumor/statistics
2. Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, Davatzikos C (2009)
Classification of brain tumor type and grade using mri texture and shape in a machine learning
scheme. Magn Reson Med: Off J Int Soc Magn Reson Med 62(6):1609–1618
3. 3T How To: Structural MRI Imaging—Center for Functional MRI - UC San Diego. Accessed
10 Nov 2021 from https://cfmriweb.ucsd.edu/Howto/3T/structure.html
4. Rajasekaran KA, Gounder CC (2018) Advanced brain tumour segmentation from mri images.
High-Resolut Neuroimaging: Basic Phys Princ Clin Appl 83
5. Lin X, Zhan H, Li H, Huang Y, Chen Z (2020) Nmr relaxation measurements on complex
samples based on real-time pure shift techniques. Molecules 25(3):473
6. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin PM,
Larochelle H (2017) Brain tumor segmentation with deep neural networks. Med Image Anal
35, 18–31
7. Razzak MI, Imran M, Xu G (2018) Efficient brain tumor segmentation with multiscale two-
pathway-group conventional neural networks. IEEE J Biomed Health Inform 23(5):1911–1919
8. Myronenko, A (2018) 3d mri brain tumor segmentation using autoencoder regularization. In:
International MICCAI brainlesion workshop. Springer, Berlin, pp 311–320
9. Aboussaleh I, Riffi J, Mahraz AM, Tairi H (2021) Brain tumor segmentation based on deep
learning’s feature representation. J Imaging 7(12):269
10. Singh D, Singh S (2020) Realising transfer learning through convolutional neural network and
support vector machine for mental task classification. Electron Lett 56(25):1375–1378
11. SMIR: Brats—sicas medical image repository (2013) Accessed 10 Nov 2021 from https://
www.smir.ch/BRATS/Start2013
12. Yang T, Song J (2018) An automatic brain tumor image segmentation method based on the
u-net. In: 2018 IEEE 4th international conference on computer and communications (ICCC).
IEEE, pp 1600–1604
13. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image
segmentation. In: Medical image computing and computer-assisted intervention (MICCAI).
LNCS, vol 9351, pp 234–241. Springer, Berlin
14. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC (2010) N4itk:
improved n3 bias correction. IEEE Trans Med Imaging 29(6), 1310–1320
Predicting Bangladesh Life Expectancy
Using Multiple Depend Features
and Regression Models
1 Introduction
The term "life expectancy" refers to how long a person can expect to live on average
[1]. Life expectancy is a measurement of a person's projected average lifespan,
computed using a variety of factors such as the year of birth, current age, and
demographic sex. A person's life expectancy is determined by their surroundings,
where "surroundings" refers to the entire social system, not just society. In this
study, our target area is the average life expectancy in Bangladesh, a nation in
South Asia where the average life expectancy is 72.59 years. Research suggests that
average life expectancy depends on lifestyle, economic status (GDP), healthcare,
diet, primary education, and population. The death rate at present is indeed lower
than in the past, and a main reason is the environment. Lifestyle and primary
education are among the many environmental factors, and lifestyle depends on
primary education: a person who does not receive primary education will not be able
to be health conscious, which can lead to premature death from damage to their
health and thus affects the average life expectancy of the whole country. Indeed,
the medical system was not good before, so it is said that both the baby and the
mother might die during childbirth. Many people have died because they did not know
what medicine to take, or how much, because they lacked the right knowledge and
primary education. It is through this elementary education that economic status
(GDP) and population developed. The average lifespan varies
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 47
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_5
48 F. Tuj Jannat et al.
from generation to generation, and we are all aware that our life expectancy is
increasing year after year. Since its independence in 1971, Bangladesh, a poor
nation in South Asia, has achieved significant progress in terms of health outcomes.
The economic sector has expanded, and the late twentieth century brought many
positive developments, with ramifications all around the globe.
In this paper, we used several features to measure life expectancy: GDP, rural
population growth (%), urban population growth (%), services value, industry value,
food production, permanent cropland (%), cereal production (metric tons), and
agriculture, forestry, and fishing value (%). We measure the impact of these
dependent features in predicting life expectancy and use various regression models
to find the most accurate model for predicting the life expectancy of Bangladesh
from these features. This will assist us in determining which features aid in
increasing life expectancy. This research helps a country increase the value of its
features related to life expectancy and also identifies which regression model
performs best for predicting life expectancy.
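A sketch of such a model comparison with scikit-learn follows; the synthetic data, the particular regressors, and the R² metric are illustrative assumptions, since this text does not yet specify the concrete models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the real features (GDP, population growth, etc.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                 # 9 dependent features
true_w = rng.normal(size=9)
y = 72.59 + X @ true_w + rng.normal(scale=0.1, size=200)  # life expectancy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    'linear': LinearRegression(),
    'random_forest': RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
best = max(scores, key=scores.get)  # regression model with the highest R^2
```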
2 Literature Review
Several studies on life expectancy have previously been produced by different
researchers. As part of the literature review, we report a few past studies to
understand the previously identified factors.
Beeksma et al. [2] obtained data from seven different healthcare facilities in
Nijmegen, the Netherlands, with a dataset of 33,509 EMRs. The accuracy of their
model was 29%. While clinicians overestimated life expectancy in 63% of erroneous
prognoses, causing delays in receiving adequate end-of-life care, their keyword
model overestimated life expectancy in only 31% of inaccurate prognoses. Another
study, by Nigri et al. [3], worked on recurrent neural networks with long short-term
memory, a new technique for projecting life expectancy and measuring lifespan
discrepancy. Their projections appeared consistent with past patterns and offered a
more realistic picture of future life expectancy and disparities. The LSTM model
was applied alongside the ARIMA, DG, Lee-Carter, CoDa, and VAR models. It was shown
that both separate and simultaneous projections of life expectancy and lifespan
disparity give fresh insights for a thorough examination of mortality forecasts,
constituting a valuable technique to identify irregular death trajectories. The
development of the age-at-death distribution assumes more compressed tails with
time, indicating a decrease in longevity difference across
industrialized nations. Khan et al. [4] analyzed gender disparities in disability
incidence and disability-free life expectancy (DFLE) among Bangladeshi senior
citizens. They utilized data from a nationwide survey that included 4,189 senior
people aged 60 and above, and they employed the Sullivan technique. They collected
Predicting Bangladesh Life Expectancy Using Multiple Depend … 49
the data from the Bangladeshi household income and expenditure survey (HIES)-2010,
a large nationwide survey conducted by the BBS; the data-collecting procedure was a
year-long program. A total of 12,240 households was chosen, 7,840 from rural regions
and 4,400 from urban areas, and all members of the chosen households, 55,580 people
in total, were surveyed. They discovered that at the age of 70, both men and women
can expect to spend more than half of their remaining lives disabled, with
significant consequences for the likelihood of disability and the requirement for
long-term care services. The study also has limitations: to begin with, its data is
self-reported, and, due to a lack of solid demographic factors, the institutionalized
population was not taken into consideration. The number of senior individuals living
in institutions is small, and they have the same health problems and impairments as
the elderly in the general population.
Tareque et al. [5] explored the link between life expectancy and disability-free
life expectancy (DFLE) in the Rajshahi District of Bangladesh by investigating the
connections between the Active Aging Index (AAI) and DFLE. Data were obtained
during April 2009 from the Socio-Demographic status of the aged population and
elderly abuse study project. They discovered that urban, educated, older men are
more engaged in all parts of life and have a longer DFLE. In rural regions, 93% of older respondents lived with family members, with 45.9% in nuclear families and 54.1% in joint families; in urban regions, 23.4% were in nuclear families and 76.6% in joint families. The study faces limitations in terms of several key indicators, such as the types and duration of physical activity.
For a post-childhood life table, Preston and Bennett's (1983) estimation technique was
used. Because related data was not available, the institutionalized population was
not examined. Tareque et al. [6] utilized multiple linear regression models as well as the Sullivan technique. They based their findings on the World Values
Survey, which was performed between 1996 and 2002 among people aged 15 and
above. They discovered that between 1996 and 2002, people’s perceptions of their
health improved. Males predicted fewer life years spent in excellent SRH in 2002
than females, but a higher proportion of their expected lives were spent in good
SRH. The study has certain limitations, such as the sample size being small, and the
institutionalized population was not included in the HLE calculation. The subjective
character of SRH, as opposed to health assessments based on medical diagnoses,
may have resulted in gender bias in the results. In 2002, the response category “very poor” was missing from the SRH survey, and healthy persons may have been overrepresented. Tareque et al. [6] investigated how many years older
individuals expect to remain in excellent health, as well as the factors that influence
self-reported health (SRH). By integrating SRH, they proposed a link between LE and
HLE. The project’s brief said that it was socioeconomic and demographic research of
Rajshahi district’s elderly population (60 years and over). They employed Sullivan’s
approach for solving the problem. For their work, SRH was utilized to estimate HLE.
They discovered that as people became older, LE and anticipated life in both poor
and good health declined. Individuals in their 60s were anticipated to be in excellent health for approximately 40% of their remaining lives, but those in their 80s projected
just 21% of their remaining lives to be in good health, and their restrictions were
50 F. Tuj Jannat et al.
more severe. The sample size is small, and it comes from only one district, Rajshahi;
it is not indicative of the entire country. As a result, generalizing the findings of this
study to the entire country of Bangladesh should be approached with caution. The
institutionalized population was not factored into the HLE calculation.
Ho et al. [7] examined whether decreases in life expectancy occurred across 18 high-income countries from 2014 to 2016. They conducted a demographic study based on aggregated data from the WHO mortality database, augmented with data from Statistics Canada and Statistics Portugal, and used Arriaga’s decomposition approach to attribute causes of death to changes in life expectancy between 2014 and 2015. They discovered that in the years 2014–15, life
expectancy fell across the board in high-income nations. Women’s life expectancy
fell in 12 of the 18 nations studied, while men’s life expectancy fell in 11 of them.
They also have certain flaws, such as the underreporting of influenza and pneu-
monia on death certificates, the issue of linked causes of death, often known as the
competing hazards dilemma, and the comparability of the cause of death coding
between nations. Meshram et al. [8] applied Linear Regression, Decision Tree, and Random Forest Regressor to compare life expectancy between developed and developing nations. The Random Forest Regressor was chosen for the
construction of the life expectancy prediction model because it had R2 scores of 0.99
and 0.95 on training and testing data, respectively, as well as Mean Squared Error
and Mean Absolute Error of 4.43 and 1.58. The analysis is based on HIV or AIDS,
Adult Mortality, and Healthcare Expenditure, as these are the key aspects indicated
by the model. This suggests that India has a higher adult mortality rate than other
affluent countries due to its low healthcare spending.
Matsuo et al. [9] investigate survival predictions using clinic laboratory data in
women with recurrent cervical cancer, as well as the efficacy of a new analytic technique based on deep-learning neural networks. Alam et al. [10], using annual data from 1972 to 2013, investigated the impact of financial development on Bangladesh’s significant growth in life expectancy. The unit root properties of the variables are examined using a structural break unit root test. In their literature review, they mention some studies on the effects of trade openness and foreign direct investment on life expectancy. Furthermore, the empirical findings support the presence of cointegration in the long-run associations. Income
disparity appears to reduce life expectancy in the long run, according to the long-run
elasticities. Finally, their results provide policymakers with fresh information that is
critical to improving Bangladesh’s life expectancy. Husain et al. [11] conducted a
multivariate cross-national study of national life expectancy factors. Linear and log-linear regression models were employed. The data on explanatory factors come from UNDP, World Bank, and Rudolf’s yearly statistics releases (1981). The findings show that if adequate attention is paid to fertility reduction
and boosting calorie intake, life expectancies in poor nations may be considerably
enhanced.
3 Proposed Methodology
In any research project, we must complete numerous key stages, including data collection, data preparation, picking an appropriate model, implementing it, calculating errors, and producing output. To achieve our aim, we use the step-by-step working technique illustrated in Fig. 1.
Preprocessing, which includes data cleaning and standardization, noisy data filtering, and management of missing information, is necessary before machine learning can be done. Any data analysis will succeed only if there is enough relevant data. The information was gathered from Trends Economics; the dataset contained data from 1960 to 2020. We combined all of the factors that are linked to Bangladesh’s life expectancy and replaced the null values with the mean values. We examined the relationship where GDP, Rural Population Growth (%), Urban Population Growth (%), Services Value, Industry Value, Food Production, Permanent Cropland (%), Cereal Production (metric tons), and Agriculture, Forestry, and Fishing Value (%) were the independent features and Life Expectancy (LE) was the target variable. We split the data into two subsets to develop and test the model: 20% of the data was used for testing, and the remaining 80% was used for training.
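The preprocessing steps above (mean imputation of null values, then an 80/20 split) can be sketched in plain Python. The column values and the fixed random seed below are illustrative assumptions, not the actual dataset:

```python
import random

def impute_mean(column):
    """Replace null (None) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def split_80_20(rows, seed=42):
    """Shuffle the rows and split them into 80% training / 20% testing subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

gdp = [4.27, None, 5.10, 6.80]        # hypothetical series with one null value
filled = impute_mean(gdp)             # None replaced by the mean of the rest
train, test = split_80_20(range(60))  # 60 yearly rows -> 48 training, 12 test
print(len(train), len(test))          # 48 12
```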
Multiple Linear Regression (MLR): A statistical strategy [12] for predicting the
outcome of a variable using the values of two or more variables is known as multiple
linear regression. Multiple regression is an extension of simple linear regression. The dependent variable is the one we’re trying to forecast, and the independent or explanatory variables are employed to predict its value. In the case
of multiple linear regression, the formula is as follows in “(1)”.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon \quad (1)$$
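To make equation (1) concrete, the coefficients β0 … βn can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python on hypothetical data; a real analysis would typically use a library such as scikit-learn instead.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Y = b0 + b1*x1 + ... + bn*xn + e,
    solved via the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination and partial pivoting."""
    rows = [[1.0] + list(x) for x in X]   # prepend intercept column
    n = len(rows[0])
    # Build the augmented normal-equation system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] +
         [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]  # [b0, b1, ..., bn]

# Points lying exactly on the plane y = 1 + 2*x1 + 3*x2 should be recovered.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(fit_mlr(X, y))   # approximately [1.0, 2.0, 3.0]
```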
On the basis of their prediction, error, and accuracy, the estimated models are
compared and contrasted.
Mean Absolute Error (MAE): The MAE is a measure for evaluating regression
models. The MAE of a model concerning the test set is the mean of all individual
prediction errors on all instances in the test set. For each instance, the prediction error is the difference between the true and the predicted value. The formula is given in “(2)”.
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|A_i - \hat{A}_i\right| \quad (2)$$
Mean Squared Error (MSE): The MSE shows us how close we are to a collection
of points. By squaring the distances between the points and the regression line, it
achieves this. Squaring is required to eliminate negative signs. Errors of greater magnitude are also given more weight. The fact that we are computing
the average of a series of errors gives the mean squared error its name. The better
the prediction, the smaller the MSE. The following is the formula in “(3)”.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Prediction}_i\right)^2 \quad (3)$$
Root Mean Square Error (RMSE): The RMSE measures the distance between data points and the regression line; it is a measure of how spread out these residuals are. The following is the formula in “(4)”.
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{Actual}_i - \text{Prediction}_i\right)^2} \quad (4)$$
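The three error measures in equations (2)–(4) can be computed directly. The sketch below uses hypothetical actual/predicted values, not the paper's results:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|, as in Eq. (2)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average squared difference, as in Eq. (3)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE, as in Eq. (4)."""
    return math.sqrt(mse(actual, predicted))

actual = [70, 72, 74]      # hypothetical life-expectancy values
predicted = [69, 72, 76]   # hypothetical model outputs
print(mae(actual, predicted))   # 1.0
print(mse(actual, predicted))
print(rmse(actual, predicted))
```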
Figure 3 shows that the value of GDP has risen steadily over time: GDP was 4,274,893,913.5 in 1960, whereas it reached 353,000,000,000 in 2020. The
two factors of life expectancy and GDP are inextricably linked. The bigger the GDP,
the higher the standard of living will be. As a result, the average life expectancy
may rise. Life expectancy is also influenced by service value and industry value.
The greater the service and industry values are, the better the quality of life will
be. As can be seen, service value and industry value have increased significantly year after year; according to the most recent update in 2020, the service value stood at 5,460,000,000,000 and the industry value at 7,540,000,000,000, which has a positive impact on daily life. Food production
influences life expectancy and quality of life. Our level of living will improve if our
food production is good, and this will have a positive influence on life expectancy.
From 1990 to 2020, food production ranged between 26.13 and 109.07. Agriculture,
forestry, and fishing value percent are also shortly involved with life expectancy.
Fig. 4 Population growth of a Urban Area (%) and b Rural Area (%)
Figure 4a, b shows there are two types of population growth: rural and urban. In the 1990s, the urban population growth percentage was higher than the rural one; year by year, rural population growth decreased while urban population growth increased. The level of living improves as more people move to the city.
Figure 2 shows that life expectancy and rural population growth have a negative relationship. We can see how these characteristics are intertwined with life expectancy and influence how we live our lives; their values have fluctuated over time, increasing at times and decreasing at others. We dropped Rural Population Growth and Agriculture, Forestry, and Fishing Value, as they had negative or weak correlations with life expectancy.
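A sketch of this correlation screening, using a hand-rolled Pearson correlation on hypothetical values (the feature series and the 0.3 threshold are illustrative assumptions):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.3):
    """Keep features whose correlation with the target is at least
    `threshold`; drop negatively or weakly correlated ones."""
    kept, dropped = {}, {}
    for name, series in features.items():
        r = pearson(series, target)
        (kept if r >= threshold else dropped)[name] = round(r, 3)
    return kept, dropped

life_exp = [55, 58, 62, 66, 70]                    # hypothetical values
features = {
    "GDP": [5, 9, 14, 20, 27],                     # rises with life expectancy
    "Rural Population Growth (%)": [3.0, 2.6, 2.1, 1.5, 1.0],  # falls with it
}
kept, dropped = screen_features(features, life_exp)
print(kept)     # GDP is kept
print(dropped)  # Rural Population Growth is dropped (negative correlation)
```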
Table 1 shows that we utilized eight different regression models to determine which models are the most accurate. Among all the models, the Extreme Gradient Boosting Regressor has the best accuracy and the least error: it was 99% accurate. The accuracy of K-Neighbors, Random Forest, and Stacking Regressor was about 94%; among them, the Stacking Regressor had a slightly higher accuracy. We utilized three models for the stacking regressor, K-Neighbors, Gradient Boosting, and Random Forest Regressor, with Random Forest as the meta regressor. Among all the models, the Decision Tree has the lowest accuracy at 79%. With 96% accuracy, the Gradient Boosting Regressor comes in second, followed by 88% and 87% for Multiple Linear Regression and the Light Gradient Boosting Machine Regressor, respectively.
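A sketch of this stacking setup using scikit-learn (assumed available), with K-Neighbors, Gradient Boosting, and Random Forest as base learners and Random Forest as the meta regressor; the data are synthetic stand-ins, so the score will not match Table 1:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))           # stand-in predictors
y = 50 + 2 * X[:, 0] + 0.5 * X[:, 1] - X[:, 2]  # stand-in life expectancy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("gbr", GradientBoostingRegressor(random_state=1)),
        ("rf", RandomForestRegressor(random_state=1)),
    ],
    final_estimator=RandomForestRegressor(random_state=1),  # meta regressor
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 on the held-out 20%
```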
The term “life expectancy” refers to the average amount of time a person can expect to live; it is a measure of a person’s projected average lifespan. Life expectancy is calculated using a variety of factors such as the year
of birth, current age, and demographic sex. Figure 5 shows the accuracy among all
the models. The Extreme Gradient Boosting Regressor has the best accuracy.
Table 1 Error and accuracy comparison between all the regressor models
Models MAE MSE RMSE ACCURACY
Multiple linear regression 1.46 8.82 2.97 88.07%
K-Neighbors regressor 0.96 4.17 2.04 94.35%
Decision tree regressor 2.63 15.30 3.91 79.32%
Random forest regressor 1.06 4.28 2.06 94.21%
Stacking regressor 1.02 3.90 1.97 94.72%
Gradient boosting regressor 0.94 2.43 1.55 96.71%
Extreme gradient boosting regressor 0.58 0.44 0.66 99.39%
Light gradient boosting machine regressor 2.62 9.57 3.09 87.06%
References
1. Rubi MA, Bijoy HI, Bitto AK (2021) Life expectancy prediction based on GDP and population
size of Bangladesh using multiple linear regression and ANN model. In: 2021 12th international
conference on computing communication and networking technologies (ICCCNT), pp 1–6.
https://doi.org/10.1109/ICCCNT51525.2021.9579594.
2. Beeksma M, Verberne S, van den Bosch A, Das E, Hendrickx I, Groenewoud S (2019)
Predicting life expectancy with a long short-term memory recurrent neural network using
electronic medical records. BMC Med Informat Decision Making 19(1):1–15.
3. Nigri A, Levantesi S, Marino M (2021) Life expectancy and lifespan disparity forecasting: a
long short-term memory approach. Scand Actuar J 2021(2):110–133
4. Khan HR, Asaduzzaman M (2007) Literate life expectancy in Bangladesh: a new approach of
social indicator. J Data Sci 5:131–142.
5. Tareque MI, Hoque N, Islam TM, Kawahara K, Sugawa M (2013) Relationships between the
active aging index and disability-free life expectancy: a case study in the Rajshahi district of
Bangladesh. Canadian J Aging/La Revue Canadienne du vieillissement 32(4):417–432
6. Tareque MI, Islam TM, Kawahara K, Sugawa M, Saito Y (2015) Healthy life expectancy and
the correlates of self-rated health in an ageing population in Rajshahi district of Bangladesh.
Ageing & Society 35(5):1075–1094
7. Ho JY, Hendi AS (2018) Recent trends in life expectancy across high income countries: retrospective observational study. BMJ 362
8. Meshram SS (2020) Comparative analysis of life expectancy between developed and devel-
oping countries using machine learning. In 2020 IEEE Bombay Section Signature Conference
(IBSSC), pp 6–10. IEEE
9. Matsuo K, Purushotham S, Moeini A, Li G, Machida H, Liu Y, Roman LD (2017) A pilot
study in using deep learning to predict limited life expectancy in women with recurrent cervical
cancer. Am J Obstet Gynecol 217(6):703–705
10. Alam MS, Islam MS, Shahzad SJ, Bilal S (2021) Rapid rise of life expectancy in Bangladesh:
does financial development matter? Int J Finance Econom 26(4):4918–4931
11. Husain AR (2002) Life expectancy in developing countries: a cross-section analysis. Bangladesh Dev Stud 28(1/2):161–178
12. Choubin B, Khalighi-Sigaroodi S, Malekian A, Kişi Ö (2016) Multiple linear regression,
multi-layer perceptron network and adaptive neuro-fuzzy inference system for forecasting
precipitation based on large-scale climate signals. Hydrol Sci J 61(6):1001–1009
13. Kramer O (2013) K-nearest neighbors. In Dimensionality reduction with unsupervised nearest
neighbors, pp 13–23. Springer, Berlin, Heidelberg
14. Joshi N, Singh G, Kumar S, Jain R, Nagrath P (2020) Airline prices analysis and prediction
using decision tree regressor. In: Batra U, Roy N, Panda B (eds) Data science and analytics.
REDSET 2019. Communications in computer and information science, vol 1229. Springer,
Singapore. https://doi.org/10.1007/978-981-15-5827-6_15
A Data-Driven Approach to Forecasting
Bangladesh Next-Generation Economy
1 Introduction
Although Bangladesh ranks 92nd in terms of landmass, it now ranks 8th in terms of population, showing that Bangladesh is one of the world’s most populous countries. After a nine-month-long and deadly war of freedom waged in 1971 under the leadership of Bangabandhu Sheikh Mujibur Rahman, the Father of the Nation, Bangladesh was recognized as an independent sovereign country. But Bangladesh,
despite being a populous country, remains far behind the wealthy countries of the
world [1] and the developed world, particularly in economic terms. Bangladesh
is a developing country with a primarily agricultural economy. According to the
United Nations, it is a least developed country. Bangladesh’s per capita income was US$1,259.92 in March 2016 and increased to US$2,084 per capita by August 2020. However, relative to our population, this is far too low. The economy of
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 59
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_6
60 Md. M. H. Shohug et al.
2 Literature Review
Many papers, articles, and research projects focus on text categorization, text recog-
nition, and categories, while some focus on particular points. Here are some of the
work reviews that have been provided.
Hassan et al. [5] used the Box-Jenkins method to develop an ARIMA model for the Sudan GDP from 1960 to 2018 and to evaluate alternative orderings of the autoregressive and moving average portions. The four phases of the Box-Jenkins technique were performed to produce an adequate ARIMA model. They used MLE to estimate the model.
From the fiscal year 1972 to the fiscal year 2010, Anam et al. [6] provide a time series model based on agriculture’s contribution to GDP. In this investigation, they found the ARIMA (1, 2, 1) model to be a useful method for estimating Bangladesh’s annual GDP growth rate. From 1972 to 2013, Sultana et al. [7] used
univariate analysis to time series data on annual rice mass production in Bangladesh.
The motivation of this study was to analyze the factors that influence the behavior
of ARIMA and ANN. The backpropagation approach was used to create a simple ANN model with an acceptable number of nodes or neurons in a single hidden layer, a variable threshold value, and a learning value [8]. The values of RMSE, MAE, and MAPE are used. The findings revealed that the ANN’s estimated error is significantly
A Data-Driven Approach to Forecasting Bangladesh Next-Generation Economy 61
larger than the selected ARIMA’s estimated error. In this article, they considered the
ARIMA model and the ANN model using univariate data.
Wang et al. [9] used Shenzhen GDP for time series analysis, and the methodology shows that the ARIMA method created using the Box-Jenkins technique has greater predictive validity. The ARIMA (3, 3, 5) model developed in this work better captures the pattern of economic evolution and is employed to forecast the Shenzhen GDP over the medium and long term. In light of Bangladesh’s GDP data from 1960 to 2017,
Miah et al. [10] developed and forecasted with an ARIMA method. The model used was an autoregressive integrated moving average, ARIMA (1, 2, 1). The residual diagnostics included a correlogram, Q-statistic, histogram, and normality test. For stability testing, they used the Chow test. In Bangladesh, Awal et al. [11] developed
an ARIMA model for predicting short-term rice yields. According to the review, the best-fitted models for short-run forecasting of Aus, Aman, and Boro rice production were ARIMA (4,1,1), ARIMA (2,1,1), and ARIMA (2,2,3), respectively. Abonazel
et al. [12] used the Box-Jenkins approach to create a plausible ARIMA technique for the Egyptian yearly GDP. The World Bank provided yearly GDP statistics for Egypt from 1965 to 2016. They show that the ARIMA (1, 2, 1) method is superior for estimating Egyptian GDP. Lastly, using the fitted ARIMA technique, Egypt’s GDP was projected over the next ten years.
From 2008–09 to 2012–13, Rahman et al. [13] used the ARIMA technique to predict the Boro rice harvest in Bangladesh. The ARIMA (0,1,2) model was shown to be excellent for regional, current, and total Boro rice production, respectively.
Voumik et al. [14] looked at annual statistics for Bangladesh from 1972 to 2019 and used the ARIMA method to estimate future GDP per capita. According to the ADF, PP, and KPSS tests, ARIMA (0, 2, 1) is the best model for estimating Bangladeshi GDP per capita, and they used it to estimate Bangladesh’s GDP per capita for the following 10 years. The use of ARIMA modeling techniques on the Nigerian Gross Domestic Product between 1980 and 2007 is depicted in the research study by Fatoki et al. [15]. Zakai et al. [16] examine
the quality of the International Monetary Fund’s (IMF) annual GDP statistics for
Pakistan from 1953 to 2012. To display the GDP, a number of ARIMA methods are
created using the Box-Jenkins approach. They discovered that by using the expert modeler technique and the best-fit model, they were able to achieve ARIMA (1,1,0). Finally, using the best-fit ARIMA model, forecast values for the next several years were obtained. According to their findings, in-sample estimates were produced from 1953 to 2009, and visual representation of the predicted values revealed appropriate behavior.
To demonstrate and evaluate GDP growth rates in Bangladesh’s economy, Voumik
et al. [17] used the time series methods ARIMA and the method of exponen-
tial smoothing. World Development Indicators (WDI), a World Bank subsidiary,
compiled the data over a 37-year period. The Phillips-Perron (PP) and Augmented Dickey-Fuller (ADF) tests were used to examine the stationarity of the series. Smoothing measures were used to estimate the rate of GDP growth. Furthermore, the triple exponential model outperformed all other Exponential Smoothing models in
terms of the lowest Sum of Square Error (SSE) and Root Mean Square Error (RMSE).
Khan et al. [18] note that a time series model can assess the value added by economic hypotheses relative to the pure predictive capacity of the variable’s own past behavior; continuous improvements in time series analysis suggest that more recent time series techniques might impart more precise benchmarks for economic methods. From the fiscal years 1979–1980 to 2011–2012, the characteristics of annual data on industry’s contribution to GDP were examined. They used two strategies on their data set: Holt’s linear smoothing technique and the Auto-Regressive Integrated Moving Average (ARIMA).
3 Methodology
The main goal of our research is to develop a model to forecast the Gross Domestic Product (GDP) of Bangladesh. The theory relevant to our proposed model is given below.
Autoregressive Model (AR): The AR model stands for the autoregressive model. An autoregressive model is created when a value from a time series is regressed on earlier values of the same series. The order of the model is denoted by the letter “p”; the notation AR(p) indicates an autoregressive model of order p. The AR(p) model is depicted in “(1)”.

$$Y_t = \sigma_0 + \sigma_1 Y_{t-1} + \sigma_2 Y_{t-2} + \cdots + \sigma_p Y_{t-p} + \alpha_t \quad (1)$$

where σ0, σ1, …, σp are the parameters of the model and αt is the white noise error term.
Autoregressive Integrated Moving Average Model (ARIMA): The Autoregressive Integrated Moving Average model [19, 20] is abbreviated as ARIMA. This type of model can capture a variety of common temporal structures in time series data. ARIMA models are statistical models that are used to analyze and forecast time series data. In the model, each of these elements is explicitly specified as a parameter. ARIMA (p, d, q) is the standard notation in which the parameters are replaced by numerical
values in order to identify the ARIMA model. We may suppose that the ARIMA (p, 1, q) model is determined by the condition in “(3)”. In this equation, Y′t is a combination of the AR equation (1) and the MA equation (2); therefore, Y′t = Yt − Yt−1, i.e., to account for a linear trend in the data, a first difference might be utilized.
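First differencing, Y′t = Yt − Yt−1, is simple to compute; a minimal sketch on a hypothetical trending series:

```python
def first_difference(series):
    """Y'_t = Y_t - Y_{t-1}: removes a linear trend so the d = 1 part of
    ARIMA works with a series that is closer to stationary."""
    return [b - a for a, b in zip(series, series[1:])]

gdp_growth = [1, 3, 6, 10]            # hypothetical trending series
print(first_difference(gdp_growth))   # [2, 3, 4]
```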
Seasonal Autoregressive Integrated Moving Average Exogenous Model: SARIMAX stands for Seasonal Autoregressive Integrated Moving Average Exogenous model. The SARIMAX method is created by extending the ARIMA technique. This method has a seasonal component. As we’ve shown, ARIMA can make a non-stationary time series stationary by differencing away the trend. By removing trends and seasonal irregularities, the SARIMAX model is able to handle a non-stationary time series. SARIMAX extends the model with the seasonal parameters (P, D, Q, s). They are described as follows:
P This denotes the autoregressive seasonality’s order.
D This is the seasonal differentiation order.
Q This is the seasonality order of the moving average.
s This is mainly defining our season’s number of periods.
Akaike Information Criterion (AIC): The Akaike Information Criterion (AIC) permits us to examine how well our model fits the data set without overfitting it. The AIC score rewards models that achieve a high goodness-of-fit and penalizes them if they become overly complex. On its own, the AIC score isn’t very useful unless we compare it with the AIC score of a competing time series model. The model with the lower AIC score strikes the best balance between its capacity to fit the data set and its capacity to avoid overfitting it. The formula of the AIC value is:
$$\text{AIC} = 2m - 2\ln(\delta) \quad (4)$$

Here, m is the number of model parameters and δ = δ(θ̂) is the maximum value of the likelihood function of the model, where θ̂ is the maximum likelihood estimate.
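Equation (4) is straightforward to evaluate once the maximized log-likelihood ln(δ) is known; a minimal sketch comparing two hypothetical candidate models (the numbers are invented):

```python
def aic(m, log_likelihood):
    """AIC = 2m - 2 ln(delta), where m is the number of model parameters
    and log_likelihood is ln(delta), the maximized log-likelihood."""
    return 2 * m - 2 * log_likelihood

# Two hypothetical candidate models: the lower AIC wins.
simple = aic(m=2, log_likelihood=-120.0)   # 244.0
complex_ = aic(m=5, log_likelihood=-118.5) # 247.0
print(simple, complex_)
print("prefer simple" if simple < complex_ else "prefer complex")
```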
Autocorrelation Function (ACF): It demonstrates how data values in a time series are correlated, on average, with the data values that precede them.
Partial Autocorrelation Function (PACF): The theoretical PACF for an AR model “shuts off” once the order of the model is reached. The expression “shuts off” implies that the partial autocorrelations are essentially equal to 0 beyond that point. In other words, the number of non-zero partial autocorrelations gives the order of the AR model. The maximum lag of the “Bangladesh GDP growth rate” that is used as a predictor is referred to as the “order of the model”.
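The sample ACF can be computed directly from its definition. The sketch below (plain Python, hypothetical series) shows the slow decay typical of a trending, non-stationary series:

```python
def acf(series, max_lag):
    """Sample autocorrelation function: correlation of the series with
    lagged copies of itself, normalised by the lag-0 autocovariance."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    out = []
    for k in range(max_lag + 1):
        ck = sum((series[t] - mean) * (series[t + k] - mean)
                 for t in range(n - k)) / n
        out.append(ck / c0)
    return out

trend = list(range(20))   # strongly trending (non-stationary) series
r = acf(trend, 3)
print([round(v, 2) for v in r])   # lag-0 value is 1.0; the rest decay slowly
```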
Mean Square Error: The mean square error (MSE) is another strategy for assessing a forecasting method. Every error or residual is squared; these are then summed and divided by the number of observations. This method penalizes large forecasting errors because the errors are squared, which is significant. The MSE is given by
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (5)$$
Root Mean Square Error: The root mean square error (RMSE) is a commonly utilized measure of the difference between the values predicted by a technique or estimator and the values actually observed. The RMSE is calculated as in “(6)”:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (6)$$
We use the target variable Bangladesh GDP growth rate (annual %) from the year 1960 to 2021, collected from the World Bank database’s official website. A portion of these data is shown in Table 1.
We showed the time series plots of the whole dataset from the year 1960 to 2021
on a yearly basis in Fig. 1 for both (a) GDP growth (annual%) data and (b) First
Difference of GDP growth (annual%). It is observed that there is a sharp decrease
in GDP growth from 1970 to 1972. After that, on average, an upward trend is observed, with another decrease in 2020 because of the spread of the Coronavirus infection.
In Fig. 2, the decomposition of the time series yields level, trend, seasonality, and noise components. The Auto ARIMA procedure provided the AIC values for several combinations of the p, d, and q values. The ARIMA model with the minimum AIC value was chosen, and the SARIMAX (0, 1, 1) function was suggested
Fig. 1 The time series plots of yearly a GDP growth (annual %) data and b first difference of GDP
growth (annual %)
to be used. Accordingly, the ARIMA (p, d, q) order is ARIMA (0, 1, 1), obtained with the auto ARIMA procedure, as shown in Table 4. Here the auto ARIMA procedure defines a SARIMAX (0, 1, 1) model to capture seasonality. This choice depends on the AIC value, and the resulting SARIMAX function is used for the ARIMA model subsequently fitted to the training data (Tables 2 and 3).
After finding the function and the ARIMA (p, d, q) values for this dataset, we divided the data into 80% training data and 20% test data to fit the model. Table 5 shows the result of the ARIMA (0, 1, 1) model built on the training data.
After fitting the model with the training dataset, the values of the test data and the predicted data are shown in Fig. 6 and Table 7. Here, for predicting the data using SARIMAX, the seasonality is half a year; for this reason, the SARIMAX function is defined as SARIMAX (0, 1, 1, 6), as it showed less error compared to the other seasonal orders. This is the best-fitted model, which is defined in
the model evaluation. The RMSE value, MAE value, and model accuracy are given in Table 8, which suggests that the SARIMAX (0, 1, 1, 6) model can be used as the best model for predicting the GDP growth rate (annual %).
Figure 7 depicts the forthcoming 10 years of the Bangladesh GDP growth rate after the model was evaluated. We built a web application, GDP indicator [21], based on the time series ARIMA model; Fig. 8a introduces the GDB indicator application, and the table of predicted GDP growth (%) values is shown in Fig. 8b.
Table 8 Evaluation parameter values for the SARIMAX (0, 1, 1, 6) model

Evaluation parameter    Value
RMSE error value        0.991
MAE error value         0.827
Model accuracy          87.51%
According to our study, we successfully predict the Bangladesh GDP growth rate with a machine learning time series ARIMA model of order (0, 1, 1). We found that this model performs with 87.51% accuracy. The model is verified by the minimum AIC value generated by the auto ARIMA function, under which auto ARIMA defines the SARIMAX (0, 1, 1) model observed on
Fig. 8 User interface of GDB indicator a homepage and b predicted GDP growth rate for the next
upcoming year (2022–2050)
the whole historical data. The half-yearly seasonality of these data was captured, after which the model predicts automatically for the upcoming years. We implemented this machine learning time series ARIMA model in the web application GDB Indicator-BD, where users can find Bangladesh’s future GDP growth rate and observe Bangladesh’s upcoming economy for a given year. On this dataset, we could also have implemented other machine learning or more advanced deep learning models, but we did not; we consider this a gap in our research. In the future, we will also work on these data with multiple features, implement other upcoming and upgraded models, and show how they perform on this dataset for future prediction.
70 Md. M. H. Shohug et al.
References
A. K. Punnoose
1 Introduction
A. K. Punnoose (B)
Flare Speech Systems, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 71
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_7
recording and noise could be present. The task is to identify whether noise is present
in the speech. On the other hand, in VAD the default is a noisy recording and speech
could be present. The task is to identify whether speech is present in the recording.
The techniques developed for VAD can be used interchangeably with noisy speech
identification.
2 Problem Statement
3 Prior Work
Noisy speech detection is covered extensively in the literature. Filters like the Kalman filter [2, 3] and spectral subtraction [4, 5] have been used to remove noise from speech. But this requires an understanding of the nature of the noise, which is mostly infeasible. A more generic way is to estimate the signal-to-noise ratio (SNR) of the recording and apply appropriate thresholding on the SNR to filter out noisy recordings [6–9]. Voice activity detection is also extensively covered in the literature. Autocorrelation functions and their various derivatives have been used extensively for voice activity detection. Subband decomposition and suppression of certain sub-bands, based on stationarity assumptions on the autocorrelation function, are used for robust voice activity detection [10]. Autocorrelation-derived features like harmonicity, clarity, and periodicity provide more speech-like characteristics. Pitch continuity in speech has been exploited for robust speech activity detection [11]. For highly degraded channels, GABOR features along with autocorrelation-derived features are also used [12]. Modulation frequency is also used in conjunction with harmonicity for VAD [13].
Another very common method is to use mel frequency cepstral features with
classifiers like SVMs to predict speech regions [14]. Derived spectral features like
low short-time energy ratio, high zero-crossing rate ratio, line spectral pairs, spectral
flux, spectral centroid, spectral rolloff, ratio of magnitude in speech band, top peaks,
and ratio of magnitude under top peaks are also used to predict speech/non-speech
regions [15].
Sparse coding has been used to learn a combined dictionary of speech and noise
and then, remove the noise part to get the pure speech representation [16, 17]. The
correspondence between the features derived from the clean speech dictionary and the
speech/non-speech labels can be learned using discriminative models like conditional
random fields [18]. Along with sparse coding, acoustic-phonetic features are also
explored for speech and noise analysis [19].
From the speech intelligibility perspective, vowels remain more resilient to noise
[20]. Moreover, speech intelligibility in the presence of noise also depends on the
listener’s native language [21–24]. Any robust noisy speech identification system
A Cross Dataset Approach for Noisy Speech Identification 73
must take into consideration the inherent intelligibility of phonemes while scoring the sentence hypothesis. The rest of the paper is organized as follows. The experimental setup is first described. Certain measures that could be used to differentiate clean speech from noisy speech are then explored. A scoring function is defined to score noisy speech, and simple thresholding on this score is used to differentiate noisy speech from clean speech.
4 Experimental Setup
60 h of the Voxforge dataset are used to train the MLP. The rationale behind using Voxforge data is its closeness to real-world conditions in terms of recording, speaker variability, noise, etc. ICSI Quicknet [25] is used for the training. Perceptual linear prediction (PLP) coefficients, along with delta and double-delta coefficients, are used as the input. A softmax layer is employed at the output, and cross-entropy error is the loss function used. The output labels are the standard English phonemes.
For a 9-frame PLP window given as the input, the MLP outputs a probability vector with individual components corresponding to the phonemes. The phoneme which gets the highest probability is treated as the top phoneme for that frame, and the highest softmax probability of the frame is termed the top softmax probability of the frame. A set of consecutive frames classified as the same phoneme constitutes a phoneme chunk. The number of frames in a phoneme chunk is referred to as the phoneme chunk size.
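The frame-level bookkeeping described above (top phoneme, top softmax probability, phoneme chunks) can be sketched as follows, with made-up posteriors standing in for the MLP output:

```python
# Hypothetical frame-level softmax posteriors: one row per frame,
# one column per phoneme (a real MLP would produce these).
posteriors = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]

# Top phoneme and top softmax probability per frame.
top_phonemes = [max(range(len(row)), key=row.__getitem__) for row in posteriors]
top_probs = [max(row) for row in posteriors]

# Consecutive frames sharing the same top phoneme form a phoneme chunk;
# the chunk size is the number of frames in it.
chunks, start = [], 0
for i in range(1, len(top_phonemes) + 1):
    if i == len(top_phonemes) or top_phonemes[i] != top_phonemes[start]:
        chunks.append((top_phonemes[start], i - start))
        start = i
print(chunks)  # [(0, 2), (1, 3)]
```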
For the subsequent stages, the TIMIT training set is used as the clean speech training data. A subset of background noise data from the CHiME dataset [26] is mixed with the TIMIT training set and is treated as the noisy speech training data. We label this dataset dtrain. dtrain is passed through the MLP to get the phoneme posteriors. From the MLP posteriors, the measures and distributions needed to detect noisy speech recordings are computed, and a noisy speech scoring mechanism is defined. For testing, the TIMIT testing set is used as the clean speech testing dataset, and the TIMIT testing set mixed with a different subset of CHiME background noise is used as the noisy speech testing data. This data is labeled dtest.
We define two new measures, the phoneme detection rate and the softmax probability, for clean and noisy speech. These measures are combined into a recording-level score, which is used to determine the noise level in a recording.
For a phoneme p, let g be the ratio of the number of frames recognized as true positives to the number of frames recognized as false positives, for clean speech. Let h represent the same ratio for noisy speech. The phoneme detection behavior of clean speech and noisy speech can be broadly classified into three cases.
In the first category, both g and h are low. In the second case, g is high and h is low.
In the third case, both g and h are high. A phoneme weighting function is defined as
              ⎧ x1   if g < 1 and h < 1
f1(p; g, h) = ⎨ x2   if g > 1 and h < 1                    (1)
              ⎩ x3   if g > 1 and h > 1

where Σi xi = 1 and xi ∈ (0, 1]. This is not a probability distribution function. The optimal values of x1, x2, and x3 will be derived in the next section. Note that g and h are computed from the clean speech and noisy speech training data. x3 corresponds to the most robust phonemes, while x1 corresponds to non-robust phonemes.
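Equation (1) can be written directly as a small function; the weight values used here are placeholders, not the optimal values derived later:

```python
def phoneme_weight(g, h, x=(0.2, 0.3, 0.5)):
    """f1(p; g, h) from Eq. (1). g and h are the true-positive to
    false-positive frame-count ratios for a phoneme on clean and noisy
    speech respectively. x = (x1, x2, x3) are placeholder weights; the
    optimal values come from the constrained search described later."""
    x1, x2, x3 = x
    if g < 1 and h < 1:
        return x1   # non-robust phoneme
    if g > 1 and h < 1:
        return x2   # robust on clean speech only
    return x3       # robust phoneme (g and h both high)

print(phoneme_weight(0.5, 0.4))  # 0.2 (x1)
print(phoneme_weight(2.0, 1.5))  # 0.5 (x3)
```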
Figure 1 plots the density of top softmax probability of the frames of true positive
detections for the noisy speech. Figure 2 plots the same for false positive detections
of clean speech. Any approach to identify noisy recordings must be able to take into
account the subtle difference in these densities. As the plots are asymmetrical and
skewed, we use gamma distribution to model the density. The probability density
function of the gamma distribution is given by
f2(x; α, β) = β^α x^(α−1) e^(−βx) / Γ(α)                  (2)
where
Γ (α) = (α − 1)! (3)
where α+ and β+ are the shape and rate parameters of the true positive detection of
noisy speech and α− and β− are the same for false positive detection of clean speech.
Using wi = f1(pi) and Ai = f2(qi; α+, β+) / f2(qi; α−, β−), Eq. 4 can be rewritten as

s = exp( (1/N) Σi ln(wi Ai) )                              (5)
which implies

s ∝ (1/N) Σi ln(wi Ai)                                     (6)
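A minimal sketch of the recording-level score of Eq. (5), assuming the per-frame weights wi and likelihood ratios Ai have already been computed (the input values below are illustrative):

```python
import math

def recording_score(weights, ratios):
    """s = exp((1/N) * sum_i ln(w_i * A_i)), as in Eq. (5).
    `weights` holds the phoneme weights w_i and `ratios` the gamma
    likelihood ratios A_i, one entry per frame."""
    n = len(weights)
    return math.exp(sum(math.log(w * a) for w, a in zip(weights, ratios)) / n)

# The score is the geometric mean of the per-frame products w_i * A_i.
print(recording_score([0.5, 0.5], [2.0, 8.0]))  # geometric mean of 1 and 4 = 2.0
```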
The conditions defined above can be expressed through appropriate values of the
variables x1 , x2 and x3 , which could be found by solving the following optimization
problem.
min over x1, x2, x3 of  ln(Ax1) + ln(Ax2) − 5 ln(Bx3)
s.t. ln(Ax1 ) − 2 ln(Bx3 ) > 0
ln(Ax2 ) − 3 ln(Bx3 ) > 0
x1 + x2 + x3 = 1
0 < xi ≤ 1
The objective function ensures that the inequalities are just satisfied. The Hessian of
the objective function is given by
H = diag( −1/x1², −1/x2², 5/x3² )                          (9)
H is indefinite and the inequality constraints are not convex; hence, the standard convex optimization approaches can't be employed. In the training phase, the values of A and B have to be found. For a given A and B, the values of x1, x2, and x3 which satisfy the inequalities have to be computed. As the optimization problem is in ℝ³, a grid search will yield the optimal solution.
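The grid search can be sketched as below, using the A and B values reported in the Results section. The grid here is coarse and illustrative, so the minimizer it finds need not match the paper's reported optimum to three decimals:

```python
import math

A, B = 4.1, 1.27   # values computed from d_train, as reported in the Results

def objective(x1, x2, x3):
    return math.log(A * x1) + math.log(A * x2) - 5 * math.log(B * x3)

def feasible(x1, x2, x3):
    # The two inequality constraints of the optimization problem.
    return (math.log(A * x1) - 2 * math.log(B * x3) > 0 and
            math.log(A * x2) - 3 * math.log(B * x3) > 0)

# Coarse grid over the simplex x1 + x2 + x3 = 1 with 0 < xi <= 1.
n = 200
best_val, best_x = float("inf"), None
for i in range(1, n):
    for j in range(1, n - i):
        x1, x2 = i / n, j / n
        x3 = 1 - x1 - x2          # x3 > 0 is guaranteed by j < n - i
        if feasible(x1, x2, x3):
            val = objective(x1, x2, x3)
            if val < best_val:
                best_val, best_x = val, (x1, x2, x3)
print(best_x)
```

Any triple returned satisfies both inequality constraints and the simplex constraint.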
Assume the same wi for all the frames, i.e., the weightage is the same for every phoneme. Now consider a scenario where noisy speech recordings with roughly equal numbers of non-robust and robust frames are recognized, per recording, and assume that the Ai values are high for non-robust phonemes and low for robust phonemes. Then any threshold t set for classification will be dominated by the non-robust phoneme frames. At test time, for a noisy speech recording with predominantly robust phonemes and low Ai values, the recording-level score s will fall below the required threshold t, effectively reducing the recall of the system. To alleviate this issue, conditions are set on the weightage of phonemes based on their robustness.
5 Results
The variable values A = 4.1 and B = 1.27 are computed from dtrain . The optimal
variable values x1 = 0.175, x2 = 0.148, x3 = 0.677 are obtained by grid search on
the variable space. With the optimal variable values, testing is done for noisy speech
recording identification on dtest. A simple thresholding on the recording-level score s is used as the decision mechanism. In this context, a true positive refers to the correct identification of a noisy speech recording. Figure 3 plots the ROC curve for noisy speech recording identification. Note that silence phonemes are excluded from all the computations.
From the ROC curve, it is evident that utterance-level scoring with equal weightage for all phonemes is not useful, but differential scoring of phonemes based on their recognition capability makes the utterance-level scoring much more meaningful.
References
1. Renevey P, Drygajlo A (2001) Entropy based voice activity detection in very noisy conditions.
In: Proceedings of eurospeech, pp 1887–1890
2. Shrawankar U, Thakare V (2010) Noise estimation and noise removal techniques for speech
recognition in adverse environment. In: Shi Z, Vadera S, Aamodt A, Leake D (eds) Intelligent information processing V. IIP 2010. IFIP advances in information and communication technology, vol 340. Springer, Berlin
3. Fujimoto M, Ariki Y (2000) Noisy speech recognition using noise reduction method based
on Kalman filter. In: 2000 IEEE international conference on acoustics, speech, and signal
processing. Proceedings (Cat. No.00CH37100), vol 3, pp 1727–1730. https://doi.org/10.1109/
ICASSP.2000.862085
4. Boll S (1979) Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans
Acoust, Speech, Signal Process 27(2): 113–120. https://doi.org/10.1109/TASSP.1979.1163209
5. Mwema WN, Mwangi E (1996) A spectral subtraction method for noise reduction in speech
signals. In: Proceedings of IEEE. AFRICON ’96, vol 1, pp 382–385. https://doi.org/10.1109/
AFRCON.1996.563142
6. Kim C, Stern R (2008) Robust signal-to-noise ratio estimation based on waveform amplitude
distribution analysis. In: Proceedings of interspeech, pp 2598–2601
7. Papadopoulos P, Tsiartas A, Narayanan S (2016) Long-term SNR estimation of speech signals
in known and unknown channel conditions. IEEE/ACM Trans Audio, Speech, Lang Process
24(12): 2495–2506
1 Introduction
S. Sahu (B)
Faculty, School of Computing Science & Engineering, VIT Bhopal University, Sehore
(MP) 466114, India
e-mail: [email protected]
S. Silakari
Professor, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya,
Bhopal, Madhya Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 81
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_8
been extensively explored and discussed [2]. Some sensors fail to operate after their estimated battery life has expired, reducing the network's total lifespan and functionality. Numerous researchers have made significant contributions to fault-related obstacles such as sensor failures, coverage, connectivity issues, network partitioning, data delivery inaccuracy, and dynamic routing, among others [3, 4].
2 Literature Review
coverage and connection. When a node fails, the “up to fail” node is evaluated and
replaced before the entire network fails. However, if the “up to fail” node cannot
be replaced, a quick rerouting method has been suggested to redirect the routed
traffic initially through the “up to fail” node. The performance assessment of the
proposed technique indicated that the number of nodes suitable for the “up to fail”
node replacement is dependent on characteristics such as the node redundancy level
threshold and network density [10].
Numerous researchers have examined different redundancy mechanisms in WSNs, including route redundancy, time (temporal) redundancy, data redundancy, node redundancy, and physical redundancy [10]. These strategies maximize energy efficiency and assure WSNs' dependability, security, and fault tolerance. When the collector node detects that the central cluster head (CH) has failed, it sends data to the backup CH rather than simultaneously broadcasting data to the leading CH and the backup CH. IHR's efficacy was compared to Dual Homed Routing (DHR) and Low-Energy Adaptive Clustering Hierarchy (LEACH) [11].
In [12], the authors offer a novel fault-tolerant sensor node scheduling method, named FANS (Fault-tolerant Adaptive Node Scheduling), that takes into account not only sensing coverage but also sensing level. The suggested FANS algorithm helps retain sensor coverage, enhance network lifespan, and achieve energy efficiency.
Additionally, data loss may result if sensors or CHs (forwarders) are affected. Fault-tolerant clustering techniques can replace failed sensors with other, redundant sensors, maintaining the network's stability.
We have extended the article proposed in [8] and propose a robust distributed clustered fault-tolerant scheduling (RDCFT) based on the redundancy check algorithm (the sweep-line approach [7, 8]), which provides the number of redundant sensors for the region R. The proposed RDCFT determines the 1-coverage requirement precisely and quickly while ensuring the sensors' redundancy eligibility criterion at a low cost, with better fault tolerance capability through sensor- and cluster-level fault detection and replacement. Additionally, we simulated and analyzed the suggested work's correctness and efficiency in various situations.
3 Proposed Work
ways will be invoked if fault detection and recovery modes are necessary. Why is it called "two-way"? Because the proposed approach has two modes of execution: the first when all sensors are initially deployed, i.e., at the start of the first network round (assuming a 100% energy level), and the second when the remaining energy level of all sensors is 50% or less. The suggested process that we apply in our scheme consists of two phases: randomly selecting a cluster head (CH) and forming a collection of clusters. We should emphasize that we presume the WSNs employed in our method are homogeneous. Figure 1 shows the clustered architecture of WSNs that we consider in this section.
The failure of one or more sensors may disrupt connectivity and result in the network being divided into several disconnected segments. It may also result in connectivity and coverage gaps in the surrounding region, which may harm the monitoring of the environment. The only way to solve this issue is to replace the dead sensors with redundant ones. Typically, the CH monitors the distribution and processing of data and makes decisions. When a CH fails, its replacement alerts all sensors of its failure.
The first step of the fault management process is fault identification, the most critical phase: faults must be identified precisely and appropriately. One challenge is the specification of fault tolerance in WSNs; there is a trade-off between energy usage and accuracy. As a result, we use a cluster-based fault detection approach that
A Robust Distributed Clustered Fault-Tolerant Scheduling for Wireless … 85
conserves node energy and is highly accurate. Our technique for detecting faults is
as follows [12–15]:
Detection of intra-cluster failures: If the CH does not receive data from a node for a preset length of time, it waits for the next period, since data may be lost to interference and noise even when the node is healthy. If the CH still does not receive a packet after the second period, the node is presumed to be malfunctioning. As a result, the CH transmits a message to all surrounding CHs and cluster nodes, designating the node with this ID as faulty.
Intra-cluster error detection: When the CH obtains data from nodes that are physically close together, it computes and saves a "median value" for the data. The CH compares newly collected data to the per-request "median value". When the difference between the two values exceeds a predefined constant deviation, the CH detects an error and declares the node that generated the data faulty. Again, the CH notifies all surrounding CHs and the nodes in its cluster that the node with this ID is faulty.
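The median-based check above can be sketched as follows; the node ids, readings, and deviation threshold are illustrative:

```python
def detect_faulty_readings(readings, delta):
    """CH-side check: flag nodes whose reading deviates from the cluster
    median by more than delta (the predefined constant deviation)."""
    values = sorted(readings.values())
    n = len(values)
    median = values[n // 2] if n % 2 else (values[n // 2 - 1] + values[n // 2]) / 2
    return [node for node, v in readings.items() if abs(v - median) > delta]

# Hypothetical temperature readings from four nearby cluster members.
readings = {"s1": 20.1, "s2": 19.8, "s3": 35.0, "s4": 20.3}
print(detect_faulty_readings(readings, delta=2.0))  # ['s3']
```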
Detection of inter-cluster faults: CHs are a vital component of WSNs, and their failure must be identified promptly. As a result, we employ this method: CHs communicate with other CHs regularly through a packet that contains information on the cluster's nodes. If a CH does not receive this packet from an adjacent CH, that CH is deemed faulty.
The sensors are assumed to be arranged randomly and densely over a dispersed rectangular grid region R. All sensors are identical in terms of sensing and communication ranges, as well as battery power consumption. Consider two sensors Si and Sj with a distance between them of at most Rs (|Si Sj| ≤ Rs). The sector apexed at Si with an angle of 2α can be used to approximate the fraction of Si's sensing region covered by Sj, as illustrated in Fig. 2. As indicated in Eq. 1, the angle can be computed using the simple cosine rule, also explained in [6].
cos α = (|Si p|² + |Si Sj|² − |Sj p|²) / (2 |Si p| |Si Sj|) = |Si Sj| / (2 Rs). Hence α = arccos( |Si Sj| / (2 Rs) )    (1)
In their initial setup phase, the sensors each create a table of 1-hop sensing neighbors based on received HELLO messages, and the contribution of each sensor's one-hop sensing neighbors is determined. A sensor Sj is redundant for full coverage if its 1-hop active sensing neighbors cover the complete 360° circle surrounding it. To put it another way, a sensor Sj's redundancy criterion for full coverage is that the union of the sectors contributed by the sensors in its vicinity covers the entire 360°; if this holds, the sensor Sj is redundant.
It is possible to accomplish this algorithmically by extending the sweep-line-based algorithm for sensor redundancy checking. Assume an imaginary vertical line sweeps these intervals from 0° to 360°. If the sweep-line intersects kp intervals from INj and is in the interval ip in ICQj, the sensor Sj is redundant in ip. If this condition holds true for all intervals in ICQj, then the sensor Sj is redundant; Fig. 3 illustrates this as a flowchart for the sweep-line-based redundancy check of a sensor [7, 8]. We hold a variable CCQ for the current CQ and a sweep-line status l for the number of intervals from INj intersected by the sweep-line. The status changes only when the sweep-line crosses the left or right endpoint of an interval. As a result, the event queue Q retains the endpoints of the intervals in ICQj and INj.
• If the sweep-line crosses the left endpoint of ip in ICQj, then CCQ is set to kp.
• If the sweep-line crosses the left endpoint of sp in INj, then increment l.
• If the sweep-line crosses the right endpoint of sp in INj, then decrement l.
Fig. 3 Flowchart for Redundancy check of a sensor (sweep-line based [7, 8])
If the sweep-line status l remains greater than or equal to CCQ for the whole sweep, the sensor node Sj serves as a redundant sensor for full coverage. Figure 4 shows the transition states of a sensor, i.e., it can be in one of the states active, presleep, or sleep. The sleeping competition can be avoided using a simple back-off time.
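A simplified sketch of the redundancy check for the 1-coverage case: each active neighbor contributes an angular sector, and the sensor is redundant if the sectors jointly cover the full 360°. The CQ/CCQ bookkeeping for higher coverage degrees is omitted here:

```python
def covers_full_circle(sectors):
    """True if the angular sectors (start, end) in degrees jointly cover
    the full 0-360 circle around the sensor (the 1-coverage case)."""
    intervals = []
    for start, end in sectors:
        if start <= end:
            intervals.append((start, end))
        else:                      # sector wraps past 0 degrees: split it
            intervals.append((start, 360.0))
            intervals.append((0.0, end))
    intervals.sort()
    reach = 0.0                    # rightmost angle covered so far
    for start, end in intervals:
        if start > reach:          # the sweep found an uncovered gap
            return False
        reach = max(reach, end)
    return reach >= 360.0

# Three neighbors whose sectors close the circle -> the sensor is redundant.
print(covers_full_circle([(0, 150), (140, 290), (280, 360)]))  # True
print(covers_full_circle([(0, 150), (170, 360)]))              # False
```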
The cluster head (CH) is initially chosen at random by the base station (BS) among the cluster's members. The CH then sends a greeting message to all cluster members, requesting their energy levels, and communicates these energy levels to the BS, which establishes a hierarchy based on the residual energy. CHs are then created according to the established hierarchy, which grows with each round. The first round's CH will be the node with the largest energy storage capacity; the second round adds the node with the second-highest energy level; similarly, the third round's CHs will comprise the top three nodes. Thus CHs are selected dynamically for each round. The first round continues until the highest node's energy level meets the energy level of the second-highest node, and the second round continues until the first two nodes with the most significant energy levels reach the third level; likewise, the process is repeated for the third round. As a result, time allocation is also accomplished dynamically.
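The round-based hierarchy above can be sketched as follows (the function name, node ids, and energies are illustrative):

```python
def select_cluster_heads(energies, round_no):
    """Round r picks the r nodes with the highest residual energy as CHs,
    following the hierarchy described above (ties broken by node id)."""
    ranked = sorted(energies, key=lambda node: (-energies[node], node))
    return ranked[:round_no]

# Hypothetical residual energies (J) reported to the BS.
energies = {"n1": 9.5, "n2": 8.0, "n3": 9.9, "n4": 7.2}
print(select_cluster_heads(energies, 1))  # ['n3']
print(select_cluster_heads(energies, 3))  # ['n3', 'n1', 'n2']
```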
Sensors may be deterministically or randomly scattered over the target region R for monitoring. We propose a clustered fault-tolerant sensor scheduling consisting of a sequence of algorithms to operate the deployed WSN effectively. Each sensor executes the defined duties in every round of the total network lifetime and periodically detects faulty sensor nodes. The flowchart of the proposed mechanism is shown in Fig. 5.
Our proposed clustered fault-tolerant sensor scheduling protocol has the following
assumptions:
• All deployed sensor nodes are assigned a unique identifier (sensorid).
• Sensors are homogeneous and all are locally synchronized.
• Sensors are densely deployed in static mode (redundancy gives better fault tolerance capability in WSNs).
• The set of active/alive sensor nodes is represented by {San} = {Sa1, Sa2, Sa3, …, San}.
• The set of cluster head nodes is represented by {CHm} = {SCH1, SCH2, SCH3, …, SCHm}.
• The set of faulty sensor nodes is represented by {Sfn} = {Sf1, Sf2, Sf3, …, Sfn}.
Table 1 Simulation parameters

Parameter | Value
Number of sensor nodes | 100, 200, 300, & random
Network area (meter²) | 100 × 100
Clusters | Differs
Distributed subregion size | 30 m × 30 m
Initial level of energy in each sensor | 10 J
Energy consumption for transmission | 0.02 J
Energy consumption for receiving | 0.001 J
Communication & sensing ranges | 4 m & 3 m
Threshold energy in each sensor | 2 J
Simulation & round time | 1000–1500 s and 200 s
This section illustrates the experimental setup and the findings for the proposed algorithms. We assessed the proposed algorithms' performance using a network simulator [16]. The proposed RDCFT protocol is simulated using NS2, and the parameters utilized are shown in Table 1. RDCFT is tested against the existing LEACH method and randomized scenarios using the standard metrics mentioned above. The simulation is divided into the following steps:
The proposed approach is based on the redundancy of sensors and CHs. This clustering approach maximizes the longevity of the network. We have extended the article proposed in [8]: the proposed method begins by detecting defects, finding the faulty sensor nodes using the fault detection algorithm, and replacing each faulty sensor with a redundant sensor for the same R. The fault detection process, carried out by scheduled communication messages exchanged between nodes and CHs, runs in O(n log n). Second, the approach commences a recovery period in which CHs/common sensors are recovered with the help of redundant sensors using the proposed algorithm. This is a novel and self-contained technique, since the proposed method does not need communication with the BS/sink to work. Simulations are performed to evaluate the efficiency and validity of the overall proposed work in terms of energy consumption, coverage ratio, and fault-tolerant scheduling.

Fig. 7 Measurements of average alive, CH, backup, and faulty sensors for R
Our proposed efforts are based on the largely two-dimensional architecture of WSNs; future work should include 3D-based WSNs. Future studies will also address three critical challenges in 3D-WSNs, namely energy, coverage, and faults, which may pave the way for a new approach to researching sustainable WSNs that optimizes the overall network lifetime.
References
1. Yick J, Mukherjee B, Ghosal D (2008) Wireless sensor network survey. Comput Netw
52(12):2292–2330
2. Alrajei N, Fu H (2014) A survey on fault tolerance in wireless sensor networks. Sensor COMM
09366–09371
3. Sandeep S, Sanjay S (2021) Analysis of energy, coverage, and fault issues and their impacts
on applications of wireless sensor networks: a concise survey. IJCNA 8(4):358–380
1 Introduction
With the rapidly increasing availability of digital media, the ability to efficiently process audio data has become essential. Organizing audio documents with topic labels is useful for sorting, filtering, and efficient searching to find the most relevant audio file. Audio scene recognition (ASR) involves identifying the location and surroundings where the audio was recorded. It is similar to visual scene recognition, which involves identifying the environment of an image as a whole, with the only difference being that here it is applied to audio data [1–3].
In this paper, we attempt to perform ASR with topic modelling. Topic modelling
is a popular text mining technique that involves using the semantic structure of
documents to group similar documents based on the high-level subject discussed.
Assigning topic labels to documents enables efficient information retrieval by yielding more relevant search results. In recent times, researchers have applied
topic modelling to audio data and achieved significant results. Topic modelling can
be extended to ASR due to the presence of analogous counterparts between text
documents and audio documents.
While a text document can be split into words and then lemmatized, an audio document can be segmented at the right positions to derive the words, with each frame corresponding to the lemmatization results. There are several advantages to using audio over video for classification tasks, such as ease of recording, lower storage requirements, lower pre-processing overhead, and ease of streaming over networks.
ASR has many useful applications [4]. ASR aids in the development of intelligent
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 95
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_9
96 J. Sangeetha et al.
2 Related Work
and word topics, eventually creating a Bayesian form of PLSA. Mesaros et al. [10] performed audio event detection with an HMM model; PLSA was used for the prior probabilities of the audio events, which were then transformed to derive the transition probabilities. Hu et al. [11] improved the performance of LDA for audio retrieval by modelling with a Gaussian distribution. It uses a multivariate distribution for the topic and word distributions to alleviate the effects of vector quantization.
In this work, we adopt LSA and LDA to achieve ASR. Compared with the PLSA/LDA-based ASR algorithms [12–14], the proposed algorithm utilizes a document-event co-occurrence matrix, whereas [12–14] use a document-word co-occurrence matrix for analyzing the topics. This method extracts a distribution of topics that expresses the audio document in a better way, so better recognition results can be attained. Suppression of common audio events and emphasis of unique topics are achieved by weighting the event distributions of the audio documents.
LSA is a technique which uses vector-based representations of texts to model them through their terms. LSA is a statistical model which compares the similarity between texts based on their semantics. It is used in information retrieval, in analyzing relationships between documents and terms, in identifying hidden topics, etc. The LSA technique analyzes a large corpus and forms a document-term co-occurrence matrix recording the occurrences of terms in documents; it is a technique to find the hidden structure in a document collection [15]. Every document and term is represented as a vector whose elements relate to the topics, and the degree of match for a document or term is found using these vector elements. Hidden similarities can be found because documents and terms are specified in a common space.
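As an illustration of the LSA machinery on a toy document-term co-occurrence matrix (the counts are invented, not taken from the audio data):

```python
import numpy as np

# Toy document-term co-occurrence matrix: rows are terms, columns are
# documents (hypothetical counts; the real entries would come from
# matching audio frames against the vocabulary).
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
])

# Truncated SVD keeps k latent topics; each document gets a k-dim vector.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_topic = (np.diag(s[:k]) @ Vt[:k]).T   # one topic vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 2 share terms; documents 0 and 1 share none.
print(cosine(doc_topic[0], doc_topic[2]))  # close to 1
print(cosine(doc_topic[0], doc_topic[1]))  # close to 0
```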
The proposed framework is shown in Fig. 3. First, the audio input vocabulary set is prepared; then the document-term co-occurrence matrix is generated to finally classify the audio.
Considering the audio vocabulary set as input, each frame is matched with a similar term in the vocabulary for training the model. Then the document-term co-occurrence matrix is counted, represented as Ztrain. In the training set, the labels of the audio frames are known in advance, allowing the event-term co-occurrence matrix Xtrain to be calculated. If the training dataset has 'J' documents {d1, d2, …, dJ} and 'j' audio events {ae1, ae2, …, aej}, and the audio vocabulary set size is 'I', then Ztrain is an I × J matrix and Xtrain is an I × j matrix. Ytrain denotes the j × J document-event co-occurrence matrix and takes the form [p(aeh | dg)] j × J, where p(aeh | dg), the (h, g)th item of Ytrain, represents the distribution of document dg on the event aeh.
As many audio events occur simultaneously, the event-term cooccurrence matrix Xtrain must be counted with care across the audio documents. We annotate at most three audio events for a particular time interval. An audio frame carrying multiple labels is credited to all of its audio events in equal proportion when counting the event-term cooccurrence statistics: if a frame carries m audio events, each event receives a count of 1/m.
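The fractional 1/m counting rule can be sketched as follows (a minimal sketch; the frame-to-term matches and event labels are invented for illustration):

```python
from collections import defaultdict

def count_event_term_matrix(frames):
    """Count event-term cooccurrences, crediting each of a frame's
    m simultaneous events with 1/m, as described above.

    frames: list of (vocabulary term, [event labels]) pairs.
    Returns a dict mapping (term, event) -> fractional count.
    """
    counts = defaultdict(float)
    for term, events in frames:
        m = len(events)
        for ev in events:
            counts[(term, ev)] += 1.0 / m
    return dict(counts)

# Hypothetical frames, each matched to a vocabulary term and carrying
# up to three simultaneous event labels.
frames = [
    ("t1", ["speech"]),
    ("t1", ["speech", "music"]),         # two events -> 1/2 each
    ("t2", ["speech", "music", "car"]),  # three events -> 1/3 each
]
X = count_event_term_matrix(frames)
print(X[("t1", "speech")])  # 1.5
```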
Different annotators will produce different results for the same set of audio events in a given time interval, so we need at least three annotators to annotate the same set of audio events for each interval [17]. Finally, we retain the event labels assigned by more than one annotator and omit the rest. The document-term cooccurrence matrix Ztest of the test set can be calculated by splitting the audio into terms and matching the frames with the terms. Since the event-term cooccurrence matrix is the same for the test and training stages, we derive the document-event cooccurrence matrix of the test set in the same way as in the training stage. The document-event cooccurrence matrices for the test and training sets are obtained through LSA matrix factorization; alternatively, Ytrain can be obtained by directly counting the number of occurrences.
Weighting the distribution of audio events is required to recognize the influence of the events. The topic distribution along with its feature set is the input for the classifier. If an event's occurrences are spread over many topics, the event is less influential; if they are concentrated on a few topics, it is more influential. Using entropy, we can find the influence of the events as mentioned in [18]. If there are t1 latent topics, then T = [p(aeh, dg)] is the t1 × j event-topic distribution matrix (topic index g = 1, 2, ..., t1; event index h = 1, 2, ..., j), where p(aeh, dg) denotes the distribution of event aeh on topic dg. The event entropies form the vector E = E(aeh), where E(aeh), the entropy of event aeh, is computed over the t1 topics as

E(aeh) = − Σg p(aeh, dg) log p(aeh, dg).
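The event entropy described above can be sketched numerically (a minimal sketch using only the standard library; the topic weights are invented for illustration):

```python
import math

def event_entropy(topic_weights):
    """Entropy of an audio event's distribution over latent topics.
    A low value means the event is concentrated on a few topics
    (more discriminative); a high value means it is spread out.

    topic_weights: non-negative weights of one event over the topics.
    """
    total = sum(topic_weights)
    probs = [w / total for w in topic_weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical events over 4 latent topics (weights are illustrative).
specific_event = [0.97, 0.01, 0.01, 0.01]  # concentrated -> low entropy
common_event = [0.25, 0.25, 0.25, 0.25]    # uniform -> maximal entropy

print(event_entropy(specific_event) < event_entropy(common_event))  # True
```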
If the entropy value is small, the event is specific to very few topics; if the entropy value is larger, the audio event is common to many topics. So, we choose audio events with smaller entropy values for classification. Using this entropy value, we calculate a coefficient to quantify the influence of an audio event [19]. The vector z, whose entry z(aeh) is the coefficient of event aeh, is designed so that each coefficient is larger than or equal to 1.
The document-event distributions in Ytrain and Ytest can be reweighted using the coefficient vector z. Reframing the formula for the document-event distribution, we get

p(aeh, dg) ← z(aeh) · p(aeh, dg), where h = 1, 2, ..., j and g = 1, 2, ..., J.
100 J. Sangeetha et al.
4.3.1 Pre-processing
• The document-term matrix is taken as the input for topic models. The documents
are considered as rows in the matrix and the terms as columns.
• The size of the corpus is equal to the number of rows and the vocabulary size is
the number of columns.
• We tokenize and normalize each document to represent it by term frequencies: stemming, punctuation removal, number removal, stop-word removal, case conversion, and omission of very short terms.
• For each term in the vocabulary, an index maps back to the exact document in which the term was found.
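The preprocessing steps above can be sketched as follows (a minimal sketch using only the standard library; the stop-word list is a toy stand-in for a full pipeline, and stemming is omitted):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in"}  # toy list

def tokenize(text, min_len=3):
    """Lowercase, strip punctuation and numbers, drop stop words
    and low-length terms, as the bullet list describes."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) >= min_len]

def document_term_matrix(docs):
    """Rows = documents, columns = vocabulary terms, entries = counts."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted(set(t for toks in token_lists for t in toks))
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)
        matrix.append([counts.get(t, 0) for t in vocab])
    return vocab, matrix

docs = ["The dog barks in the park.", "A car horn and a dog in traffic 42."]
vocab, M = document_term_matrix(docs)
print(vocab)  # sorted vocabulary
print(M[0])   # term counts for the first document
```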
Here the distribution of topics is taken as the feature set for topic modelling. Ytrain (Fig. 1) and Ytest (Fig. 2) are factorized to find the topic distributions of the training and test audio documents, respectively. We factorize Ytrain into Y1train and Y2train; Ytest is then factorized into Y1train and Y2test, keeping Y1train fixed. If there are L2 latent topics, Y2train is an L2 × J matrix, each column of which represents a training audio document's topic distribution. If there are J test audio inputs, Y2test is an L2 × J matrix, each column of which represents a test audio document's topic distribution. We take this topic distribution as the feature set for the audio documents and perform classification using SVM, adopting the one-vs-one multiclass classification technique, which has been used in many applications [1, 20].
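The one-vs-one scheme trains one binary classifier per pair of classes and lets the pairwise winners vote. A minimal sketch of the voting logic (a toy nearest-mean rule stands in for the per-pair SVMs, purely to keep the example self-contained; the features and class names are invented):

```python
from itertools import combinations

def train_pairwise(features, labels):
    """One-vs-one setup: here each pairwise 'classifier' just compares
    distances to the two class means; a real system would fit an SVM
    per class pair instead."""
    classes = sorted(set(labels))
    means = {c: [sum(x[i] for x, l in zip(features, labels) if l == c) /
                 labels.count(c) for i in range(len(features[0]))]
             for c in classes}
    return classes, means

def predict(x, classes, means):
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):  # one vote per class pair
        da = sum((xi - mi) ** 2 for xi, mi in zip(x, means[a]))
        db = sum((xi - mi) ** 2 for xi, mi in zip(x, means[b]))
        votes[a if da <= db else b] += 1
    return max(votes, key=votes.get)  # class with most pairwise wins

# Toy topic-distribution features for three audio scene classes.
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],   # "street"
     [0.1, 0.9, 0.0], [0.2, 0.8, 0.0],   # "park"
     [0.0, 0.1, 0.9], [0.0, 0.2, 0.8]]   # "office"
y = ["street", "street", "park", "park", "office", "office"]

classes, means = train_pairwise(X, y)
print(predict([0.85, 0.15, 0.0], classes, means))  # street
```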
5 Experimental Results
The proposed algorithm is tested on two publicly available datasets: the IEEE AASP challenge dataset and the DEMAND (Diverse Environments Multi-channel Acoustic Noise Database) dataset [21]. The AASP dataset has 10 classes, including tube, busy street, office, park, quiet street, restaurant, open market, supermarket, and bus. Each class consists of ten audio files, each 30 s long, sampled at 44.1 kHz in stereo. The DEMAND dataset [22] offers various indoor and outdoor settings across eighteen audio scene classes, including kitchen, living, field, park, washing, river, hallway, office, cafeteria, restaurant, meeting, station, cafe, traffic, car, metro, and bus. Each audio class includes 16 recordings corresponding to 16 channels. For the experiments, only the first-channel recording is used; every recording is three hundred seconds long and is sliced into 10 equal documents of 30 s each. In summary, the DEMAND dataset contains 18 categories of audio scenes, each with 10 audio files of 30 s.
Audio Scene Classification Based on Topic Modelling and Audio Events … 101
[Fig. 3: Framework block diagram — input audio, audio vocabulary generation, classifier, output]
In the present work, audio documents were partitioned into 30 ms frames with 50% overlap using a Hamming window. For every frame, 39-dimensional MFCC features were extracted as the feature set. After topic analysis through LSA/LDA, the topic distribution characterizing each audio document was given as input to the SVM. A one-vs-one strategy was followed in the SVM for multiclass classification, with an RBF (Radial Basis Function) kernel. The algorithms were evaluated in terms of classification accuracy.
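The framing parameters above can be sketched numerically (a minimal sketch; 39 dimensions typically means 13 MFCCs plus delta and delta-delta coefficients, though the chapter does not spell this out):

```python
def frame_boundaries(n_samples, sr, frame_ms=30, overlap=0.5):
    """Start/end sample indices for fixed-length frames with overlap,
    as used above (30 ms frames, 50% overlap)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    return [(s, s + frame_len)
            for s in range(0, n_samples - frame_len + 1, hop)]

sr = 44100                   # 44.1 kHz sampling rate
n = sr * 30                  # one 30 s audio document
frames = frame_boundaries(n, sr)
print(len(frames))           # number of 30 ms frames at 50% overlap
print(frames[0], frames[1])  # consecutive frames share half their samples
```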
Table 1 Classification performance: document event (DE) cooccurrence matrix and document word (DW) cooccurrence matrix

Dataset   Topic model   Algorithm                   Accuracy (%)
AASP      LSA           Document event (DE) & LSA   45.6
                        Document word (DW) & LSA    60.1
          LDA           Document event (DE) & LDA   46.9
                        Document word (DW) & LDA    52.8
DEMAND    LSA           Document event (DE) & LSA   62.1
                        Document word (DW) & LSA    81.3
          LDA           Document event (DE) & LDA   62.6
                        Document word (DW) & LDA    76.5
Table 2 Performance of SVM on AASP and DEMAND

Dataset   Topic model   Accuracy (%)
AASP      LSA           61
          LDA           55
DEMAND    LSA           82
          LDA           77
In this proposed approach, a new audio scene recognition algorithm is presented that utilizes the document-event cooccurrence matrix for topic modelling instead of the most widely used document-word cooccurrence matrix, and the adopted technique is compared against the existing matrix-based topic modelling. To acquire the document-event cooccurrence matrix more efficiently, the proposed work uses a matrix factorization method. Even though this work found its weakest results on the AASP dataset, it verifies how topic analysis with the existing matrix compares with the proposed matrix. As a future enhancement of this work, deep learning models can be taken as a reference: by incorporating neural networks into the present system and combining the merits of topic models and neural networks, the recognition performance can be improved.
References
1 Introduction
A brain tumor is a cluster of irregular cells that form a mass; such a growth can be cancerous. As benign or malignant tumors get larger, the pressure inside the skull rises, which harms the brain and may even result in death [1]. This sort of tumor affects 5–10 people per 100,000 in India, and the incidence is rising [12]. Brain and central nervous system tumors are also the second most common cancers in children, accounting for about 26% of childhood cancers. In the last decade, various advancements have been made in the field of computer-aided diagnosis of brain tumors. These approaches are always available to aid radiologists who are unsure about the type of tumor or wish to visually analyze it in greater detail. MRI (Magnetic Resonance Imaging) and CT scan (Computed Tomography) are the two methods doctors use to detect tumors; MRI is preferred, so researchers have concentrated on it. A major task of brain tumor diagnosis is segmentation.
Researchers are addressing this task using deep learning techniques [3]. In medical imaging, deep learning models offer various advantages, from identification of important regions and pattern recognition in cell structures to feature extraction, and they give good results even on smaller datasets [3]. Transfer learning is a technique in deep
learning where the parameters (weights and biases) of the network are copied from
another network trained on a different dataset. It helps identify generalized features in the target dataset with the help of features learned on the source dataset. The
new network can now be trained by using the transferred parameters as initialization
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 105
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_10
106 T. Shelatkar and U. Bansal
(this is called fine-tuning), or new layers can be added on top of the network and
only the new layers are trained on the dataset of interest.
Deep learning is a subset of machine learning. It is used to solve complex prob-
lems with large amounts of data using an artificial neural network. The artificial
neural network is a network that mimics the functioning of the brain. The ‘deep’ in
deep learning represents more than one layer network. Here each neuron represents a
function and each connection has its weight. The network is trained using the adjust-
ment of weights which is known as the backpropagation algorithm. Deep learning
has revolutionized the computer vision field with increased accuracy on complex datasets. Image analysis employs a specific sort of network known as a convolutional network, which accepts images as input and convolves them into feature maps using kernels; the kernel weights are updated during training.
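The convolution described above can be sketched directly (a minimal sketch: one 3×3 kernel slid over a tiny grayscale image, no padding or stride; the values are invented for illustration):

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most deep
    learning frameworks): slide the kernel over the image and take
    weighted sums to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Toy 4x4 "image" with a vertical edge, and a 3x3 edge-like kernel.
img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 0, 0]]
k = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]
print(conv2d(img, k))  # responds strongly where the edge is
```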
A frequent practice for deep learning models is to use pre-trained parameters
on dataset. The new network can now be trained by using transferred parameters
as initialization (this is called fine-tuning), or new layers can be added on top of
the network and only the new layers are trained on the dataset of interest. Some advantages of transfer learning are that it reduces the data collection effort, benefits generalization, and shortens the training needed on a large dataset.
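The two options above (fine-tune with the transferred parameters as initialization, or freeze them and train only newly added layers) can be sketched in PyTorch. The small network here is a stand-in for a real pretrained backbone, purely for illustration:

```python
import torch.nn as nn

# Stand-in for a network whose parameters were transferred from
# another network trained on a different dataset.
pretrained_backbone = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
)

# Option 2 from the text: freeze the transferred layers and add a new
# head on top; only the head's parameters will receive gradients.
for p in pretrained_backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(
    pretrained_backbone,
    nn.Linear(16, 3),  # new head for the dataset of interest (3 classes)
)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # only the new head's weight and bias remain trainable
```

Option 1 (full fine-tuning) is the same construction with the freezing loop omitted, usually with a smaller learning rate for the transferred layers.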
2 Motivation
The motivation behind this research is to build a feasible model in terms of time
and computing power so that small healthcare systems will also benefit from the
advancements in computer-aided brain tumor analysis. The model should be versatile enough to deal with customized data and provide an acceptable result in an acceptable amount of time.
3 Literature Review
Various deep learning models have been employed for the diagnosis of brain tumors, but only limited research has used object detection models. Some of the reviewed papers are summarized below.
Pereira and co-authors used a modern 3D U-Net deep learning model, which helps grade tumors according to severity and achieves up to 92% accuracy. It considered two regions of interest: the whole brain and the tumor region [1].
Neelum et al. achieved great success in analyzing this problem by using the pre-trained DenseNet and Inception-v3 models, reaching 99% accuracy. Feature concatenation helped a great deal in improving the model [4].
Mohammad et al. applied various machine learning algorithms, such as decision tree, support vector machine, and convolutional neural network, as well as deep
Diagnosis of Brain Tumor Using Light Weight Deep Learning Model … 107
learning models (VGG16, ResNet, Inception, etc.) on a limited dataset of 2D images without using any image processing techniques. The most successful model was VGG19, which achieved an F1 score of 97.8% on top of the CNN framework. The authors noted a trade-off between time complexity and model performance: the ML methods have lower complexity, while DL performs better. The requirement of a benchmark dataset was also stated by Majib et al. They employed two methods, FastAI and YOLOv5, for the automation of tumor analysis, but YOLOv5 gained only 85% accuracy compared with 95% for FastAI; no transfer learning technique was employed there to compensate for the smaller dataset [18].
A comprehensive study [7] has been provided on brain tumor analysis for small healthcare facilities. The authors surveyed various challenges in the techniques and also proposed advice for their improvement.
Al-masni et al. used the YOLO model for bone detection; the method achieves a remarkable 99% accuracy, showing that the YOLO model can give much superior results in medical imaging [13].
Yale et al. [14] detected melanoma skin disease using the YOLO network. The results were promising even though the test was conducted on a smaller dataset, and the DarkNet framework improved feature-extraction performance. A better understanding of the workings of YOLO is still needed.
Kang et al. [21] proposed a hybrid model of machine learning classifiers and deep features, ensembling various DL methods with classifiers such as SVM, RBF, and KNN. The ensembled features helped the model reach higher performance, but the authors suggested that the developed model is not feasible for real-time medical diagnosis.
Muhammad et al. [18] studied various deep learning and transfer learning techniques from 2015 to 2019. The authors identified challenges for deploying these techniques in the real world: apart from higher accuracy, researchers should also focus on other parameters while implementing models. Some concerns highlighted are the need for end-to-end deep learning models, improved run time, reduced computational cost, and adaptability. The authors also suggested integrating modern technologies such as edge computing, fog and cloud computing, federated learning, GAN techniques, and the Internet of Things.
As discussed, various techniques are used in medical imaging, and specifically on MRI images of brain tumors. Classification, segmentation, and detection algorithms have all been used, but each has its limitations. We can refer to Table 1 for a better understanding of the literature review.
4 Research Gap
Although classification methods take fewer resources, they are unable to pinpoint the specific site of a tumor, while segmentation methods that can detect exact locations take large amounts of resources. The existing models do not work efficiently on the comparatively smaller datasets of small healthcare facilities, and models that are hard to implement are out of reach for facilities with limited resources and custom-created data. In addition, human intervention is needed for feature extraction and preprocessing of the dataset.
5 Our Contribution
Our model needs to consume less storage and computing resources, as we aim to design a model that can be used by smaller healthcare facilities. The model must therefore be small and occupy little storage.
6.2 Reliability
The radiologist must beware of false positives: the analysis may not be completely precise and cannot be relied on directly. The system should be used only by qualified radiologists, as it cannot completely replace doctors.
The system for brain tumor diagnosis must consume less time to be implementable in the real world. Its time complexity must be feasible even without the availability of high-end systems at healthcare facilities.
7 Dataset
Various datasets are available for brain tumor analysis, from 2D to 3D data. Since we are focusing on MRI data, the datasets include high-grade glioma, low-grade glioma, etc. The images can be of 2D or 3D nature, and the MRI scans are mostly T1-weighted. Some datasets are (a) the multigrade brain tumor dataset, (b) the brain tumor public dataset, (c) the Cancer Imaging Archive, (d) BraTS, and (e) the Internet Brain Segmentation Repository.
BraTS 2020 is an updated version of the BraTS dataset. The BraTS dataset has been used in challenge events organized from 2012 up to now, which encourage participants to do research on the collected dataset. BraTS 2017–2019 differ largely from all previous versions, and BraTS 2020 is an upgraded version of this series. Figure 1 displays our selected dataset.
8.1 Yolov5
The three major architectural blocks of the YOLO family of models are the backbone, neck, and head. The backbone of YOLOv5 is CSPDarknet, made up of cross-stage partial networks, which extracts features from images. The YOLOv5 neck generates a feature pyramid network using PANet to do feature aggregation and passes it to the head for prediction. The YOLOv5 head has layers that produce object detection predictions from anchor boxes. YOLOv5 is built on the PyTorch platform, unlike previous versions built on DarkNet; because of this it has fewer dependencies and does not need to be built from source.
9 Proposed Model
As mentioned above we are going to use the state-of-the-art model Yolov5. The
pre-trained weights are taken from COCO (Microsoft Common Objects in Context)
dataset. Fine-tuning is done using these parameters. The model is trained using the
BraTS 2020 dataset. The model is fed with 3D scans of patients; once trained, it takes a test image as input and returns information about the tumor. As noted earlier, the transferred parameters serve as initialization (fine-tuning), or new layers can be added on top of the network with only those layers trained on the dataset of interest; this reduces data collection effort, benefits generalization, and shortens training.
Some preprocessing is needed before training: for the YOLO model, the area of the tumor must be marked by a box region, which can be done using a tool that creates a bounding box around the object of interest in an image. For transfer learning we can use the NVIDIA Transfer Learning Toolkit and feed it the COCO dataset, as it also supports the YOLO architecture; this fine-tunes our model and compensates for an insufficient or unlabeled dataset. Afterward, we can train our model on the BraTS dataset. The environment used for development is Google Colab, which gives 100 GB storage, 12 GB RAM, and GPU support. The YOLOv5 authors have made their training results on the COCO dataset available for download, so their pre-trained parameters can be used in our own model. Applying the YOLOv5 algorithm requires a labeled training dataset, which is present in the BraTS dataset. Since we need better results on the BraTS dataset, we will freeze some layers and add our own layers on top of the YOLO model. Since we need a model that takes less space, we will use the YOLOv5n model. As mentioned in the official repository, the YOLOv5 model provides a mean average precision score of 72.4 at a speed of 3136 ms on the COCO dataset [25]. The main advantages of this model are that it is smaller and easier to use in production: it is 88 percent smaller than the previous YOLO model [26] and can process images at 140 FPS. Here specifically we are going to use the YOLOv5 nano model
since it has a smaller architecture than the other models and our main priority is model size; the YOLOv5n model has far fewer parameters (about 1.9 M) than the other models. Our model needs a certain configuration to perform on brain scans. Since the scanned BraTS data is complex, we perform various preprocessing steps, from resizing to masking; because the image data is stored in nii format with different scan types (FLAIR, T1, T2), the dataset must be processed into a form the model can handle. The model is fed with patients' scans. For evaluating the results we use the Dice score, Jaccard score, and mAP value, but our main focus is on the speed of the model, to increase its usability. The BraTS dataset is already partitioned for training and testing: almost 360 patient scans for training and 122 for testing. The flow of our model is shown in Fig. 2. The YOLOv5 models provide a yml file for custom configuration, so we can set up the network according to our own provisions: since we have only 3 classes, we configure it to three, and parameters for several convolution layers in the backbone or head of the model must be set as well. Once the model is trained, we can input the test image dataset. The expected result must be close to a Dice score of 0.85, comparable with segmentation models, while taking less storage and processing the BraTS dataset faster than previous models.
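The Dice and Jaccard scores mentioned above can be sketched for binary tumor masks (a minimal sketch; the masks are flattened 0/1 lists invented for illustration):

```python
def dice_score(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def jaccard_score(pred, truth):
    """Jaccard (IoU) = |A∩B| / |A∪B|; related to Dice by J = D/(2-D)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - inter
    return inter / union if union else 1.0

# Hypothetical flattened tumor masks (1 = tumor pixel).
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
d = dice_score(pred, truth)     # 2*2/(3+3) = 0.666...
j = jaccard_score(pred, truth)  # 2/4 = 0.5
print(round(d, 3), j)
```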
10 Conclusion
Various models and toolkits for brain tumor analysis have been developed in the past with promising results, but the viability of the models for real-time application has not been considered. Here we present a deep learning-based method for brain tumor identification and classification using YOLOv5. Such models are crucial in the development of a lightweight brain tumor detection system: a model with lower computational requirements and relatively reduced storage provides a feasible solution for various healthcare facilities.
References
1. Pereira S et al (2018) Automatic brain tumor grading from MRI data using convolutional
neural networks and quality assessment. In: Understanding and interpreting machine learning
in medical image computing applications. Springer, Cham, pp 106–114
2. Rehman A et al (2020) A deep learning-based framework for automatic brain tumors classifi-
cation using transfer learning. Circuits Syst Signal Process 39(2): 757–775
3. Salçin K (2019) Detection and classification of brain tumours from MRI images using
faster R-CNN. Tehnički glasnik 13(4):337–342
4. Noreen N et al (2020) A deep learning model based on concatenation approach for the diagnosis
of brain tumor. IEEE Access 8: 55135–55144
5. Montalbo FJP (2020) A computer-aided diagnosis of brain tumors using a fine-tuned YOLO-
based model with transfer learning. KSII Trans Internet Inf Syst 14(12)
6. Dipu NM, Shohan SA, Salam KMA (2021) Deep learning based brain tumor detection and
classification. In: 2021 international conference on intelligent technologies (CONIT). IEEE
7. Futrega M et al (2021) Optimized U-net for brain tumor segmentation. arXiv:2110.03352
8. Khan P et al (2021) Machine learning and deep learning approaches for brain disease diagnosis:
principles and recent advances. IEEE Access 9:37622–37655
9. Khan P, Machine learning and deep learning approaches for brain disease diagnosis: principles
and recent advances
10. Amin J et al (2021) Brain tumor detection and classification using machine learning: a com-
prehensive survey. Complex Intell Syst 1–23
11. https://www.ncbi.nlm.nih.gov/
12. Krawczyk Z, Starzyński J (2020) YOLO and morphing-based method for 3D individualised
bone model creation. In: 2020 international joint conference on neural networks (IJCNN).
IEEE
13. Al-masni MA et al (2017) Detection and classification of the breast abnormalities in digital
mammograms via regional convolutional neural network. In: 2017 39th annual international
conference of the IEEE engineering in medicine and biology society (EMBC). IEEE
14. Nie Y et al (2019) Automatic detection of melanoma with yolo deep convolutional neural
networks. In: 2019 E-health and bioengineering conference (EHB). IEEE
15. Krawczyk Z, Starzyński J (2018) Bones detection in the pelvic area on the basis of YOLO neural
network. In: 19th international conference computational problems of electrical engineering.
IEEE
16. https://blog.roboflow.com/yolov5-v6-0-is-here/
17. Hammami M, Friboulet D, Kechichian R (2020) Cycle GAN-based data augmentation for
multi-organ detection in CT images via Yolo. In: 2020 IEEE international conference on image
processing (ICIP). IEEE
18. Majib MS et al (2021) VGG-SCNet: A VGG net-based deep learning framework for brain
tumor detection on MRI images. IEEE Access 9:116942–116952
19. Muhammad K et al (2020) Deep learning for multigrade brain tumor classification in smart
healthcare systems: a prospective survey. IEEE Trans Neural Netw Learn Syst 32(2): 507–522
20. Baid U et al (2021) The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation
and radiogenomic classification. arXiv:2107.02314
21. Kang J, Ullah Z, Gwak J (2021) MRI-based brain tumor classification using ensemble of deep
features and machine learning classifiers. Sensors 21(6):2222
22. Lu S-Y, Wang S-H, Zhang Y-D (2020) A classification method for brain MRI via MobileNet
and feedforward network with random weights. Pattern Recognit Lett 140:252–260
23. Saba T et al (2020) Brain tumor detection using fusion of hand crafted and deep learning
features. Cogn Syst Res 59:221–230
24. Menze BH et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS).
IEEE Trans Med Imaging 34(10):1993–2024. https://doi.org/10.1109/TMI.2014.2377694
25. https://github.com/ultralytics/yolov5/
26. https://models.roboflow.com/object-detection/yolov5
Comparative Study of Loss Functions
for Imbalanced Dataset of Online
Reviews
1 Introduction
Google Play serves as the official application store for authorized devices running the Android operating system. It allows users to browse applications developed using the Android Software Development Kit (SDK) and download them. As the name indicates, the digital distribution service has been developed, released, and maintained by Google [1]. It is the
largest app store globally, with over 82 billion app downloads and over 3.5 million
published apps. The Google Play Store is one of the most widely used digital distri-
bution services globally and has many apps and users. For this reason, there is a lot
of data about app and user ratings. In the Google Play Console, you can get a top-level view of users' ratings of an application, your app's ratings, and summary information about your app's ratings. An application can be rated and evaluated on Google Play in the form of stars and reviews by the users. Users can rate an app only once, but these ratings and reviews can be updated at any time. The Play Store also displays the top reviews of verified users along with their ratings [2]. These user ratings help many other users assess an app's performance before using it.
Different developers from different companies also take their suggestions for further
product development seriously and help them improve their software.
Leaving an app rating is helpful to users, developers, and the Google Play Store itself [3]. The goal of the Play Store as an app platform is to quickly display accurate, personalized, spam-free results when users search for the app they need. This requires information about the performance of the app, conveyed through user ratings [4]. A 4.5-star rated app may be safer and more
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 115
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_11
116 P. Vyas et al.
relevant than a 2-star app in the same genre. This information helps Google's algorithms rank apps in the Play Store and provide high-quality results for a great experience [5]. The better an app's ratings and reviews, the more people will download it and use the Play Store services.
Natural language processing (NLP) has gained immense momentum in previous
years, and this paper covers one such sub-topic of NLP: sentiment analysis. Sentiment
analysis refers to the classification of sentiment present in a sentence, paragraph, or
manuscript based on a trained dataset [6]. Sentiment analysis has traditionally been done with simple machine learning algorithms such as k-nearest neighbors (KNN) or support vector machines (SVM) [7, 8]. However, for better optimization of this problem, the model selected for sentiment analysis of Google Play reviews was the Bidirectional Encoder Representations from Transformers (BERT) model, a transfer learning model [9]. BERT is a pre-trained transformer-based model that learns from words and predicts the sentiment conveyed by each word in a sentence.
For this paper, selected Google BERT models were applied to the textual data of the Google Play reviews dataset, where they performed better than deep neural network models. The Google BERT model was implemented for sentiment analysis and loss-function evaluation. After selecting the training model, the loss functions to be evaluated were studied, namely cross-entropy loss and focal loss. After testing the model with these loss functions, the F1 score was calculated on the Google Play reviews dataset, and the best loss function for sentiment analysis of an imbalanced dataset is concluded [10].
2 Literature Review
The datasets on which NLP tasks are performed nowadays are increasingly imbalanced [6]. If a correct or optimized loss function is not used with these imbalanced datasets, the results may carry errors introduced by the loss function. For this reason, many research papers were studied extensively, and the conclusion was to compare the five loss functions and find the best-optimized loss function for sentiment analysis of imbalanced datasets. This section provides a literature review of the results achieved in this field.
Comparing the loss functions first required an imbalanced dataset. Therefore, from
the various datasets available, it was decided to construct a dataset of Google Play
app reviews manually and then modify it to create an imbalance [11]. This dataset
was chosen because deep learning sentiment analysis has previously been studied
on Google Play customer reviews in Chinese [12]. That paper proposes long
short-term memory (LSTM), SVM, and Naïve Bayes models for sentiment
analysis [7, 13–15]. However, the dataset still had to be prepared for comparing
the cross-entropy and focal loss functions.
Focal loss is a modified loss function based on cross-entropy loss which
Comparative Study of Loss Functions for Imbalanced Dataset of Online Reviews 117
is frequently used with imbalanced datasets. Thus, both losses will be compared
to check which performs better on the balanced and on both imbalanced datasets.
Multimodal Sentiment Analysis of #MeToo Tweets using Focal Loss proposes the
RoBERTa model, a robust BERT variant that does not account for data bias during
classification; to further reduce errors due to misclassification on an imbalanced
dataset, its authors used the focal loss function [16].
After finalizing the dataset, the next topic of discussion is the model to be trained
on it. The research started with classical machine learning models based on KNN
and SVM [7, 8]. Sentiment Analysis Using SVM suggests the SVM model for
sentiment analysis of the Pang corpus, a 4000-movie-review dataset, and the
Taboada corpus, a 400-website opinion dataset [7, 17, 18]. In Sentiment Analysis
of Law Enforcement Performance Using SVM and KNN, the KNN model was
trained on law-enforcement data from the trial of Jessica Kumala Wongso; the
paper's results show that the SVM model is better than the KNN model [8].
However, machine learning algorithms like KNN and SVM perform well only on
small datasets with few outliers; they cease to perform well on a large dataset
with a large imbalance. To train a larger, highly imbalanced dataset, the model
was changed to the LSTM model.
The LSTM model is based on recurrent neural network units, which can be trained
on a large set of data and classify it efficiently [7, 15]. An LSTM model can
efficiently deal with the exploding and vanishing gradient problems [19]. However,
since an LSTM model has to be trained for classification from scratch, there are no
pre-trained LSTM models. An LSTM model is trained sequentially from left to
right, and, in the case of a bidirectional LSTM, simultaneously from right to left as
well. Thus, an LSTM model predicts the sentiment of a token based on its
predecessors or successors and not on the contextual meaning of the token. So, in
search of a model that avoids these problems, the transfer learning model BERT
was finally selected for data classification.
Bidirectional Encoder Representations from Transformers, abbreviated BERT, is
a stack of transformer encoder blocks connected at the end to a classification
layer [20–22]. The BERT model is based on transformers, which are mainly used
for end-to-end speech translation by creating embeddings of sentences in one
language, which the decoder then uses to produce the sentence in a different
language [20, 23]. These models are known as transfer learning models because
they are pre-trained on a language; BERT, for example, is trained on the English
Wikipedia corpus and then only needs to be fine-tuned before training and
testing [21, 22]. Comparing BERT with classical machine learning algorithms, in
Comparing BERT against traditional machine learning text classification, BERT
performed far better than the other algorithms on NLP tasks [24]. Similarly,
comparing the BERT model with the LSTM model, in A Comparison of LSTM
and BERT for Small Corpus, the BERT model achieved better accuracy on a
small training dataset, whereas LSTM performed better when the training split
was increased above 80 percent [25]. Also, a bidirectional LSTM is trained both
left-to-right, to predict the next word, and right-to-left, to predict the previous word.
118 P. Vyas et al.
In BERT, by contrast, the model learns from words in all positions, that is, from
the entire sentence, and in that paper the bidirectional LSTM made the model
overfit strongly. Ultimately, the BERT model was finalized for training and
performance evaluation. Within the BERT family there are still two famous models:
the Google BERT model and the Facebook AI Research BERT model
"RoBERTa" [26]. In comparison, the RoBERTa model outperforms Google's
BERT on the General Language Understanding Evaluation benchmark because of
its enhanced training methodology [27].
After finalizing the training model, the loss functions to be compared for imbal-
anced-dataset evaluation were studied through previous research. Each of the
five loss functions is described in the following sections.
3 Loss Functions
CE = -\sum_{c=1}^{M} y_c \log(p_c) \quad (2)
where M refers to the total number of classes and the loss is calculated by summation
of all the losses calculated for each class separately.
Focal loss was introduced to address the high class imbalance that arises in
classification and object detection [16, 30]. Starting from the cross-entropy loss,
a weighting factor is added to account for high imbalance in a dataset: α for
class 1 and 1 − α for class 0. Even though α can differentiate between positive and
negative examples, it is still unable to differentiate between easy and hard examples.
The hard examples correspond to examples of the minority class. Thus, instead of
α alone, a modulating factor is introduced into the cross-entropy loss, reshaping the
loss function to focus on hard negatives and down-weight the easy examples. The
modulating factor (1 − p_t)^γ contains the tunable factor γ ≥ 0; when γ is set to
zero, the loss reduces to the standard cross-entropy loss. The focal loss equation is

FL(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)
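As a concrete illustration of the two losses, a minimal pure-Python sketch of the per-example values is given below (this is an illustration, not the paper's training code; gamma and alpha default to the values fixed later in the paper):

```python
import math

def cross_entropy(p_t):
    """Cross-entropy for the probability p_t the model assigns to the true class."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0, alpha=0.8):
    """Focal loss: cross-entropy down-weighted by the modulating factor (1 - p_t)^gamma.

    With gamma = 0 and alpha = 1 this reduces to the standard cross-entropy loss.
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

For an easy example (p_t = 0.9) the modulating factor is 0.01, while for a hard example (p_t = 0.1) it is 0.81, so hard examples dominate the total loss.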
4 Dataset
The dataset used in this manuscript is a Google Play reviews dataset, which was
scraped manually using the Google Play scraper library based on NodeJS [11].
The data was scraped from the productivity category of Google Play. Various apps
were picked from the productivity category, and the info of each app was kept in a
separate Excel file, which was then used to scrape the reviews of each app it
contained. The scraped data contained the user's name, the user's review, the stars
the user gave the app, the user image, and other pieces of information that are not
needed for model training. There were 15,746 user reviews in total, of which 5,674
reviews had stars of 4 and above, 5,042 reviews had stars of 3, and 5,030 reviews
had stars of 2 and below. The model training was done using the users' reviews
and the stars given to an
app. Since the stars given to an app ranged from 1 to 5, the range was normalized
into three classes: negative, neutral, and positive. Reviews with 1 or 2 stars were
classified as negative, reviews with 3 stars as neutral, and reviews with 4 stars
and above as positive. The text blob of positive reviews can be seen in Fig. 1,
which shows the words most relevant to positive sentiment in a review; the more
relevant a word is in the reviews, the larger its token appears. Since the two loss
functions were to be compared on an imbalanced dataset, the class percentages
were calculated as shown in Table 1. As Table 1 shows, the dataset as formed was
balanced, so imbalance was created artificially to compare the two loss functions.
For one dataset, the number of neutral reviews was decreased by 20 percent.
Similarly, the neutral reviews of the balanced dataset were decreased by 40 percent
to create a second imbalanced dataset. The classification percentages for both
datasets can also be seen in Table 1. Finally, these three datasets (the balanced
dataset, the dataset with 20 percent fewer neutral reviews, and the dataset with
40 percent fewer neutral reviews) were used for training, and cross-entropy and
focal loss were compared on them to see the difference in accuracy when using a
weighted loss function on imbalanced datasets.
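The star-to-class normalization and the artificial imbalance described above can be sketched as follows (a minimal standard-library illustration; the actual dataset construction worked on the scraped review files):

```python
import random

def star_to_class(stars):
    """Normalize a 1-5 star rating into the three sentiment classes used in the paper."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"

def drop_neutral(labels, fraction, seed=0):
    """Return a copy of the label list with `fraction` of the neutral reviews removed,
    producing the 20%- and 40%-reduced imbalanced datasets."""
    rng = random.Random(seed)
    neutral_idx = [i for i, lab in enumerate(labels) if lab == "neutral"]
    dropped = set(rng.sample(neutral_idx, int(len(neutral_idx) * fraction)))
    return [lab for i, lab in enumerate(labels) if i not in dropped]
```

Calling `drop_neutral(labels, 0.20)` and `drop_neutral(labels, 0.40)` on the balanced label list yields the two imbalanced variants while leaving the positive and negative classes untouched.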
Table 1 Data percentage for different reviews in all the different datasets

Dataset                                  Positive (%)  Negative (%)  Neutral (%)
Reviews without any imbalance            36            31            33
Reviews with 20 percent fewer neutral    38            34            28
Reviews with 40 percent fewer neutral    42            36            22
5 Methodology
The basic model structure is deployed as given in Fig. 2. First, the RoBERTa model
was trained on all three Google Play review datasets. After training, the model
was tested with both loss functions, and the accuracy and F1 score were calculated
to compare them [16, 31]. The following steps are executed in the model for data
processing and evaluation.
• Data Pre-Processing
– Class Normalization: as described in the previous section, the stars in the
reviews are first normalized into the positive, neutral, and negative classes.
– Data Cleaning: in this phase, all non-alphabetic characters are removed. For
example, Twitter-style hashtags like #Googleplayreview are removed; since
nearly every review contains such hashtags, leaving them in would lead to
classification errors.
– Tokenization: in this step, a sentence is split into the words that compose it.
In this manuscript, the BERT tokenizer is used for the tokenization of reviews.
– Stop-word Removal: irrelevant general tokens such as "of", "and", and "our",
which are present in almost every sentence, are removed.
– Lemmatization: complex words, or different words sharing the same root, are
reduced to the root word for greater accuracy and easier classification.
• The tokens present in the dataset were transformed implicitly by the BERT model
into 768-dimensional BERT embeddings. An advantage of the BERT vectorizer
over other vectorization methods such as Word2Vec is that BERT produces word
representations that depend dynamically on the words surrounding the token.
After the embeddings were created, the BERT training model was selected.
• Two BERT models are available for training: "BERT uncased" and "BERT
cased." In the BERT uncased model, the input words are lowercased before
WordPiece tokenization, so the model is not case sensitive [21]. In the BERT
cased model, the input is not lowercased, so the uppercase and lowercase forms
of a particular word are trained separately, making the process more
time-consuming and complex. Therefore, this paper uses the BERT uncased
model.
• After training the BERT model on the pre-processed Google Play reviews
dataset, the accuracy on the test data was calculated with each loss function
separately. Then the accuracy and F1 score were calculated for each loss
function on each class of the dataset.
• After calculating the F1 score on the Google Play reviews dataset, the training
and testing steps were repeated for the dataset with 20 percent fewer neutral
reviews and for the dataset with 40 percent fewer neutral reviews.
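The pre-processing bullets above can be sketched as a single function (a simplified stand-in: the paper uses the BERT tokenizer and a full stop-word list, whereas this sketch uses whitespace splitting and an illustrative stop-word subset, and omits lemmatization):

```python
import re

STOP_WORDS = {"of", "and", "our", "the", "a", "to", "is"}  # illustrative subset only

def preprocess(review):
    """Clean, lowercase (BERT-uncased style), tokenize, and remove stop words."""
    review = re.sub(r"#\w+", " ", review)          # data cleaning: drop hashtags
    review = re.sub(r"[^A-Za-z\s]", " ", review)   # keep alphabetic characters only
    tokens = review.lower().split()                # uncased: lowercase, then tokenize
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Great app! #Googleplayreview, one of our favorites.")` returns `["great", "app", "one", "favorites"]`.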
Lastly, Tables 2, 3, and 4 summarize the results for ease of comparison, as shown
in the next section.
Table 2 Performance metrics for both focal loss and cross-entropy loss for balanced dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.80       0.73    0.76      245         0.88       0.84    0.86      245
Neutral     0.69       0.70    0.69      254         0.79       0.80    0.80      254
Positive    0.83       0.88    0.85      289         0.89       0.91    0.90      289
Accuracy                       0.77      788                            0.86      788
Table 3 Performance metrics for both focal loss and cross-entropy loss for 20 percent fewer neutral
classes of the dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.77       0.81    0.79      245         0.87       0.84    0.86      245
Neutral     0.67       0.65    0.66      220         0.69       0.77    0.73      220
Positive    0.84       0.81    0.83      269         0.87       0.82    0.85      269
Accuracy                       0.77      734                            0.81      734
Table 4 Performance metrics for both focal loss and cross-entropy loss for 40 percent fewer neutral
classes of the dataset

            Focal loss                               Cross-entropy loss
            Precision  Recall  F1-score  Support     Precision  Recall  F1-score  Support
Negative    0.82       0.87    0.84      243         0.91       0.91    0.91      243
Neutral     0.70       0.62    0.66      152         0.77       0.79    0.78      152
Positive    0.87       0.87    0.87      289         0.91       0.91    0.91      289
Accuracy                       0.76      684                            0.88      684
The BERT base uncased model was used for training and classification on all
three datasets. Each dataset was split into train and test sets, with 10 percent of
the data held out for testing using a random seed. The gamma and alpha values
used in the focal loss function were fixed at gamma = 2 and alpha = 0.8. The
number of training epochs for BERT was fixed at three for all three datasets.
Lastly, the F1 score was calculated for each class individually, and the accuracy
of the model was then derived from the F1 scores, where the F1 score is defined
as the harmonic mean of the precision and recall of the evaluated model [10].
Support, precision, and recall were further calculated for each class, and the
overall support was then calculated for the model on each dataset [32, 33].
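Since the reported F1 scores follow directly from precision and recall, the computation can be sketched as below (a minimal helper for illustration; the paper's metrics come from a standard classification report):

```python
def f1_score(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For instance, for the negative class of the balanced dataset under focal loss (Table 2), `f1_score(0.80, 0.73)` gives approximately 0.76, matching the tabulated value.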
7 Results
In this paper, the model is trained for three epochs on the three categories of data
as follows:
8 Conclusion
As the reach of technology grows, so does the number of users of the different
apps that provide benefits, solve problems, and make life easier. As users use
these apps, they tend to give reviews on the Google Play Store about how an app
helped them or what problems they faced with it. Most developers take note of
the reviews to fix any bugs in their applications and to improve them more
efficiently. Reviews of different apps on the Google Play Store also often help
other users; an application with good reviews tends to grow faster. Similarly,
other people's reviews can help users navigate an app better and reassure them
that their downloads are safe and problem-free. However, if the number of
positive reviews far exceeds the negative ones, the negative reviews may be
overshadowed, and the developer may not take note of the bugs. The loss
function that showed better results is the cross-entropy loss function over the
focal loss function. Focal loss does not differentiate between multiple classes as
well as cross-entropy loss does. Although focal loss is a modification of the
cross-entropy loss function, it outperforms cross-entropy only when the
imbalance is high; on slightly imbalanced data, the focal loss function ignores
many loss values because of the modulating factor. In the future, more
experiments will be conducted on different datasets to establish whether a
particular loss function performs well with a particular model. The comparison
can also be extended to other loss functions to find the most reliable loss
function for imbalanced data. Lastly, focal loss has to be upgraded
mathematically so that it can perform well even on multiclass and slightly
imbalanced datasets.
References
1. Malavolta I, Ruberto S, Soru T, Terragni V (2015) Hybrid mobile apps in the google play
store: an exploratory investigation. In: 2nd ACM international conference on mobile software
engineering and systems, pp. 56–59
2. Viennot N, Garcia E, Nieh J (2014) A measurement study of google play. ACM SIGMETRICS
Perform Eval Rev 42(1):221–233
3. McIlroy S, Shang W, Ali N, Hassan AE (2017) Is it worth responding to reviews? Studying
the top free apps in Google Play. IEEE Softw 34(3):64–71
4. Shashank S, Naidu B (2020) Google play store apps—data analysis and ratings prediction. Int
Res J Eng Technol (IRJET) 7:265–274
5. Arxiv A Longitudinal study of Google Play page, https://arxiv.org/abs/1802.02996, Accessed
21 Dec 2021
6. Patil HP, Atique M (2015) Sentiment analysis for social media: a survey. In: 2nd international
conference on information science and security (ICISS), pp. 1–4
7. Zainuddin N, Selamat A (2014) Sentiment analysis using support vector machine. In: International
conference on computer, communications, and control technology (I4CT) 2014, pp. 333–337
8. Dubey A, Rasool A (2021) Efficient technique of microarray missing data imputation using
clustering and weighted nearest neighbor. Sci Rep 11(1)
9. Li X, Wang X, Liu H (2021) Research on fine-tuning strategy of sentiment analysis model based
on BERT. In: International conference on communications, information system and computer
engineering (CISCE), pp. 798–802
10. Mohammadian S, Karsaz A, Roshan YM (2017) A comparative analysis of classification algo-
rithms in diabetic retinopathy screening. In: 7th international conference on computer and
knowledge engineering (ICCKE) 2017, pp. 84–89
11. Latif R, Talha Abdullah M, Aslam Shah SU, Farhan M, Ijaz F, Karim A (2019) Data scraping
from Google Play Store and visualization of its content for analytics. In: 2nd international
conference on computing, mathematics and engineering technologies (iCoMET) 2019, pp. 1–8
12. Day M, Lin Y (2017) Deep learning for sentiment analysis on Google Play consumer review.
IEEE Int Conf Inf Reuse Integr (IRI) 2017:382–388
13. Abdul Khalid KA, Leong TJ, Mohamed K (2016) Review on thermionic energy converters.
IEEE Trans Electron Devices 63(6):2231–2241
14. Regulin D, Aicher T, Vogel-Heuser B (2016) Improving transferability between different engi-
neering stages in the development of automated material flow modules. IEEE Trans Autom Sci
Eng 13(4):1422–1432
15. Li D, Qian J (2016) Text sentiment analysis based on long short-term memory. In: First IEEE
international conference on computer communication and the internet (ICCCI) 2016, pp. 471–
475
16. Lin T, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE
Trans Pattern Anal Mach Intell 42(2):318–327
17. Arxiv A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based
on Minimum Cuts, https://arxiv.org/abs/cs/0409058, Accessed 21 Dec 2021
18. Sfu Webpage Methods for Creating Semantic Orientation Dictionaries, https://www.sfu.ca/
~mtaboada/docs/publications/Taboada_et_al_LREC_2006.pdf, Accessed 21 Dec 2021
19. Sudhir P, Suresh VD (2021) Comparative study of various approaches, applications and
classifiers for sentiment analysis. Glob TransitS Proc 2(2):205–211
20. Gillioz A, Casas J, Mugellini E, Khaled OA (2020) Overview of the transformer-based models
for NLP tasks. In: 15th conference on computer science and information systems (FedCSIS)
2020, pp. 179–183
21. Zhou Y, Li M (2020) Online course quality evaluation based on BERT. In: 2020 International
conference on communications, information system and computer engineering (CISCE) 2020,
pp. 255–258
22. Truong TL, Le HL, Le-Dang TP (2020) Sentiment analysis implementing BERT-based pre-
trained language model for Vietnamese. In: 7th NAFOSTED conference on information and
computer science (NICS) 2020, pp. 362–367
23. Kano T, Sakti S, Nakamura S (2021) Transformer-based direct speech-to-speech translation
with transcoder. IEEE spoken language technology workshop (SLT) 2021, pp. 958–965
24. Arxiv Comparing BERT against traditional machine learning text classification, https://arxiv.
org/abs/2005.13012, Accessed 21 Dec 2021
25. Arxiv A Comparison of LSTM and BERT for Small Corpus, https://arxiv.org/abs/2009.05451,
Accessed 21 Dec 2021
26. Arxiv BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,
https://arxiv.org/abs/1810.04805, Accessed 21 Dec 2021
27. Naseer M, Asvial M, Sari RF (2021) An empirical comparison of BERT, RoBERTa, and Electra
for fact verification. In: International conference on artificial intelligence in information and
communication (ICAIIC) 2021, pp. 241–246
28. Ho Y, Wookey S (2020) The real-world-weight cross-entropy loss function: modeling the costs
of mislabeling. IEEE Access 8:4806–4813
29. Zhou Y, Wang X, Zhang M, Zhu J, Zheng R, Wu Q (2019) MPCE: a maximum probability based
cross entropy loss function for neural network classification. IEEE Access 7:146331–146341
30. Yessou H, Sumbul G, Demir B (2020) A comparative study of deep learning loss functions for
multi-label remote sensing image classification. In: IGARSS 2020 IEEE international geoscience
and remote sensing symposium, pp. 1349–1352
31. Liu L, Qi H (2017) Learning effective binary descriptors via cross entropy. In: IEEE winter
conference on applications of computer vision (WACV) 2017, pp. 1251–1258
32. Riquelme N, Von Lücken C, Baran B (2015) Performance metrics in multi-objective
optimization. In: Latin American Computing Conference (CLEI) 2015, pp. 1–11
33. Dubey A, Rasool A (2020) Clustering-based hybrid approach for multivariate missing data
imputation. Int J Adv Comput Sci Appl (IJACSA) 11(11):710–714
A Hybrid Approach for Missing Data
Imputation in Gene Expression Dataset
Using Extra Tree Regressor
and a Genetic Algorithm
1 Introduction
Missing data is a typical problem in data sets gathered from real-world applications
[1]. Missing data imputation has received considerable interest from researchers as
it widely affects the accuracy and efficiency of various machine learning models.
Missing values typically occur due to manual data entry practices, device errors,
operator failure, and inaccurate measurements [2]. A common approach to dealing
with missing values is to compute a statistic (such as the mean) for each column and
substitute it for the missing values, to delete rows with missing values, or to
replace them with zeros. A significant limitation of these methods, however, is a
decrease in efficiency due to incomplete and biased information [3]. If missing
values are not handled appropriately, they can lead to wrong deductions about the
data.
This issue becomes more prominent in gene expression data, which often contain
missing expression values. Microarray technology plays a significant role in current
biomedical research [4]. It allows observation of the relative expression of thousands
of genes under diverse experimental conditions. Hence, it has been used widely in
multiple analyses, including cancer diagnosis, active gene discovery, and drug
identification [5].
Microarray expression data often contain missing values for different reasons,
such as scratches on the slide, blotting issues, fabrication mistakes, etc. Microarray
data may have 1–15% missing data, which could impact up to 90–95% of genes.
Hence, there is a need for precise algorithms that accurately impute the missing
data in the dataset utilizing modern machine learning approaches. The imputation
technique known as k-POD uses the K-Means approach to predict missing
values [6]. This approach
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 127
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_12
128 A. Yadav et al.
works even when external knowledge is unavailable and the percentage of missing
data is high. Another method, based on Fuzzy C-Means clustering, uses support
vector regression and a genetic algorithm to optimize parameters [7]. The technique
suggested in this paper uses both of these models as baselines. This paper presents
a hybrid method for solving the issue: the proposed technique applies a hybrid
model that optimizes the parameters of the K-Means clustering algorithm using
Extra Tree regression and a genetic algorithm. The proposed model is implemented
on the Mice Protein Expression Data Set, and its performance is compared with
the baseline models.
2 Literature Survey
Missing value, also known as missing data, is where some of the observations in
a dataset are empty. Missing data is classified into three distinctive classes. These
classes are missing completely at random (MCAR), missing at random (MAR), and
missing not at random (MNAR) [2, 8]. These classes are crucial as missing data
in the dataset generates issues, and the remedies to these concerns vary depending
on which of the three types induces the situation. MCAR estimation presumes that
missing data is irrelevant to any unobserved response, indicating any observation in
the data set does not impact the chances of missing data. MCAR produces unbiased
and reliable estimates, but there is still a loss of power due to inadequate design but
not the absence of the data [2]. MAR means an organized association between the
tendency of missing data and the experimental data, while not the missing data.
For instance, men are less likely to fill in depression surveys, but this is not asso-
ciated with their level of depression after accounting for maleness. In this case, the
missing and observed observations are no longer coming from the same distribution
[2, 9]. MNAR describes an association between the propensity of an attribute entry
to be missing and its actual value. For example, individuals with little schooling
are missing out on education, and the unhealthiest people will probably drop out
of school. MNAR is termed “non-ignorable” as it needs to be handled efficiently
with the knowledge of missing data. It requires mechanisms to address such missing
data issues using prior information about missing values [2, 9]. There must be some
model for reasoning the missing data and possible values. MCAR and MAR are both
viewed as “ignorable” because they do not require any knowledge about the missing
data when dealing with it.
Researchers have proposed many methods for accurate imputation of missing
data. Depending on the type of knowledge employed, existing methodologies can
be classified into four distinct categories: (i) global approaches, (ii) local
approaches, (iii) hybrid approaches, and (iv) knowledge-assisted approaches
[10, 11]. Each approach has distinct characteristics. Global methods use
information about the data drawn from global correlation [11, 12]. Two widely
utilized global techniques are Singular Value Decomposition imputation
(SVDimpute) and Bayesian Principal Component Analysis (BPCA) [13, 14]. These
A Hybrid Approach for Missing Data Imputation in Gene Expression … 129
In the knowledge-assisted approach, domain knowledge from the data is utilized
for the imputation of missing values. Fuzzy C-Means clustering (FCM) and
Projection Onto Convex Sets (POCS) are two knowledge-assisted methods [1, 25].
FCM performs missing value imputation using gene ontology annotation as
external information. On the other hand, prior knowledge is hard to extract and
regulate, and the computation time is increased.
J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \quad (1)

In Eq. (1), j indexes the clusters, c_j is the centroid of cluster j, x_i represents
case i, k is the number of clusters, n is the number of cases, and \| x_i - c_j \|
is the distance function.
In addition to the K-Means assignment, each case x_i has a membership function
representing its degree of belongingness to a particular cluster c_j. The
membership function is described as

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\| x_i - c_j \|}{\| x_i - c_k \|} \right)^{2/(m-1)}} \quad (2)
In Eq. (2), m is the weighting factor parameter, whose domain runs from one to
infinity; c is the number of clusters, and c_k is the centroid of the kth cluster.
Only the complete attributes are considered when revising the membership
functions and centroids.
The missing value for any case x_i is calculated using the membership function
and the centroid values of the clusters. The function used for missing value
imputation is described as

\hat{x} = \sum_{i=1}^{c} m_i c_i \quad (3)

In Eq. (3), m_i is the estimated membership value for the ith cluster, c_i is the
centroid of the ith cluster, and c is the number of clusters; \sum denotes the
summation of the products of m_i and c_i.
4 About Dataset
For the implementation of the model, this paper uses the Mice Protein Expression
Data Set from UCI Machine Learning Repository. The data set consists of the expres-
sion levels of 77 proteins/protein modifications. There are 38 control mice and 34
trisomic mice for 72 mice. This dataset contains eight categories of mice which are
defined based on characteristics such as genotype, type of behavior, and treatment.
The dataset contains 1080 rows and 82 attributes. These attributes are Mouse ID,
Values of expression levels of 77 proteins, Genotype, Treatment type, and Behavior.
Dataset is artificially renewed such that it has 1%, 5%, 10%, and 15% missing
value ratios. All the irrelevant attributes such as MouseID, Genotype, Behavior, and
Treatment are removed from the dataset. Next, 558 rows were selected from shuf-
fled datasets for the experiment. For dimensionality reduction, the PCA (Principal
Component Analysis) method was used to reduce the dimensions of the dataset to
20. To normalize the data values between 0 and 1, a MinMax scaler was used.
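The dimensionality reduction and scaling described above can be sketched with NumPy (a minimal stand-in for the sklearn PCA and MinMaxScaler the paper uses; the random matrix here only mimics the dataset's 558 × 77 shape):

```python
import numpy as np

def minmax_scale(X):
    """Rescale each column of X to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def pca_reduce(X, n_components=20):
    """Project the centred data onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal axes
    return Xc @ Vt[:n_components].T
```

Applied to a 558 × 77 matrix, `pca_reduce` yields a 558 × 20 matrix, matching the 20 dimensions used in the experiments.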
5 Proposed Model
This research proposes a method to evaluate missing values using K-means clustering
optimized with an Extra Tree regression and a genetic algorithm. The novelty of the
proposed approach is the application of an ensemble technique named Extra Tree
regression for estimating accurate missing values. These accurate predictions with
the genetic algorithm further help in the better optimization of K-Means parameters.
Figure 2 represents the implementation of the proposed model. First, to implement
the model on the dataset, missing values are created artificially. Then the dataset with
missing values is divided into a complete dataset and an incomplete dataset. In the
complete dataset, those rows are considered in which none of the attributes contains
a missing value. In contrast, an incomplete dataset contains rows with attributes with
one or more missing values.
In the proposed approach, Extra Tree regression and a genetic algorithm are used
to optimize the parameters of the K-Means algorithm. The Extra Tree regression
and K-Means models are trained on the complete-row dataset to predict the
output. Then, K-Means is used to estimate the missing data in the dataset with
incomplete rows. The K-Means outcome is compared with the output vector
received from the Extra Tree regression. The optimized values for the c and m
parameters are obtained by running the genetic algorithm to minimize the
difference between the Extra Tree regression and K-Means outputs. The main
objective is to minimize the error function E = (X − Y)², where X is the prediction
output of the Extra Tree regression method and Y is the prediction from the
K-Means model. Finally, the missing data are estimated using K-Means with the
optimized parameters.
The code for the presented model is written in Python version 3.4. The K-means clustering
and Extra Tree regression are imported from the sklearn library. The number
of clusters = 3 and the membership value m = 1.5 are fed into the K-Means algorithm.
In the Extra Tree regression, the number of decision trees = 100 is used as a
parameter. The genetic algorithm uses a population size of 20, 40 generations, a
crossover fraction of 0.60, and a mutation fraction of 0.03 as parameters.
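A minimal sketch of instantiating these components with the stated values follows. Note that scikit-learn's KMeans exposes no fuzzy membership parameter, so m = 1.5 is kept here only as a plain variable that the paper's genetic algorithm would tune; the GA loop itself is not shown, and its settings are stored in an illustrative dictionary.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import ExtraTreesRegressor

# Clustering and regression components with the reported hyperparameters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
etr = ExtraTreesRegressor(n_estimators=100, random_state=0)

# Membership value from the text; tuned by the GA, not a sklearn argument
m = 1.5

# Genetic algorithm settings as stated above (GA implementation not shown)
ga_params = {"population_size": 20, "generations": 40,
             "crossover_fraction": 0.60, "mutation_fraction": 0.03}
```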
134 A. Yadav et al.
6 Performance Analysis
\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|\hat{y}_j - y_j\right| \qquad (4)
RMSE is one of the most commonly used standards for estimating the quality of
predictions.
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\hat{y}_j - y_j\right)^2} \qquad (5)
In Eqs. (4) and (5), ŷj represents the predicted output, yj represents the actual output,
and n denotes the total number of cases. The relative classification accuracy is given by:

A = \frac{c_t}{c} \times 100 \qquad (6)
In Eq. (6), c represents the number of all predictions, and ct represents the number
of accurate predictions within a specific tolerance. A 10% tolerance is used for
comparative prediction, which estimates data as correct for values within a range of
±10% of the exact value.
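The three metrics above can be sketched directly in NumPy; the small arrays at the end are illustrative values, not results from the paper's experiments.

```python
import numpy as np

def mae(y_true, y_pred):
    # Eq. (4): mean absolute error
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    # Eq. (5): root mean squared error
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def tolerance_accuracy(y_true, y_pred, tol=0.10):
    # Eq. (6): percentage of predictions within ±tol of the exact value
    ok = np.abs(y_pred - y_true) <= tol * np.abs(y_true)
    return 100.0 * np.sum(ok) / len(y_true)

# Illustrative values only
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.05, 2.5, 4.0])
```

With these values, two of the three predictions fall inside the ±10% band, so the tolerance accuracy is about 66.7%.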
7 Experimental Results
This section discusses the performance evaluation of the proposed model. Figures 3
and 4 show box plots of the performance of the three different methods on
the Mice Protein Expression Data Set with 1%, 5%, 10%, and 15% missing values.
In the box plots, the line inside each box represents the median. The whiskers
cover most of the data points except outliers, which are plotted separately. Figure 3a
compares the three methods on the dataset with 1–15% missing data; each box
aggregates the four RMSE results. The median RMSE values are 0.01466, 0.01781, and 0.68455.
Figure 3b compares the MAE on the dataset with 1–15% missing values; the median
MAE values are 0.10105, 0.10673, and 0.78131. Lower error indicates better performance.
Figure 4 compares the accuracy of the different models used for the
experiment. This accuracy is estimated by computing the difference between the
correct and predicted value using a 10% tolerance. Accuracy is calculated for three
techniques executed on the dataset with 1–15% missing values. The median accuracy
Fig. 3 Box plot for RMSE and MAE in three methods for 1–15% missing ratio
values are 22.32143, 19.04762, and 0.67. Higher accuracy indicates better imputation.
It is evident from the box plots that the proposed method gives the lowest RMSE
and MAE error and the highest relative accuracy on the given dataset. Figures 5
and 6 present line graphs of the performance evaluation of the three different
methods against the missing ratios. Figure 5a illustrates that the hybrid K-Means and
ExtraTree-based method has a lower RMSE error value compared to both methods
for the mice dataset. Figure 5b indicates that the proposed hybrid K-Means and
136 A. Yadav et al.
Fig. 5 RMSE and MAE comparison of different techniques for 1–15% missing ratio in the dataset
ExtraTree-based hybrid method has a lower MAE error value than both methods for
the mice dataset. Figure 6 demonstrates that the accuracy of the evaluated and actual
data with 10% tolerance is higher for the proposed method than the FcmSvrGa and
k-POD method.
The graphs in Figs. 5 and 6 indicate that k-POD gives the highest error and lowest
accuracy at the different missing ratios [6]. The FcmSvrGa method gives a slightly lower
error at a 1% missing ratio, but across all missing ratios the KExtraGa method
provides lower error than the other baseline models. Furthermore, compared to other
methods, the KExtraGa method gives better accuracy over each missing ratio. It is
clearly illustrated from Figs. 3–6 that the proposed model KExtraGa performs better
than the FcmSvrGa and k-POD method. The Extra Tree regression-based method
achieves better relative accuracy than the FcmSvrGa and k-POD method. In addition,
the proposed method also achieves a lower overall median RMSE and MAE error than
both methods. There are some drawbacks to the proposed method. The training of Extra
Tree regression is a substantial issue: although the training time of the proposed
model is slightly better than that of FcmSvrGa, it still requires a high overall
computation time.
8 Conclusion
This paper proposes a hybrid method based on K-Means clustering, which utilizes
Extra Tree regression and a genetic algorithm to optimize the parameters of the K-
Means algorithm. This model was applied to the Mice Protein Expression dataset and
gave better performance than the other algorithms. In the proposed model, the complete
dataset rows were clustered based on similarity, and each data point was assigned
a membership value for each cluster. Hence, this method yields more practical
results, as each missing value belongs to more than one cluster. The experimental
results clearly illustrate that the KExtraGa model yields better accuracy (with 10%
tolerance) and lower RMSE and MAE error than the FcmSvrGa and k-POD algorithms.
The limitation of the model proposed in this paper indicates a need for a faster
algorithm. Hence, the main focus for future work would be reducing the computation
time of the proposed algorithm. Another future goal would be to
implement the proposed model on a large dataset and enhance its accuracy [22].
References
1. Gan X, Liew AWC, Yan H (2006) Microarray missing data imputation based on a set theoretic
framework and biological knowledge. Nucleic Acids Res 34(5):1608–1619
2. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, Kristensen NR, Pham TM, Pedersen L,
Petersen I (2017) Missing data and multiple imputation in clinical epidemiological research.
Clin Epidemiol 9:157
3. Dubey A, Rasool A (2020) Time series missing value prediction: algorithms and applications.
In: International Conference on Information, Communication and Computing Technology.
Springer, pp. 21–36
4. Trevino V, Falciani F, Barrera- HA (2007) DNA microarrays: a powerful genomic tool for
biomedical and clinical research. Mol Med 13(9):527–541
5. Chakravarthi BV, Nepal S, Varambally S (2016) Genomic and epigenomic alterations in cancer.
Am J Pathol 186(7):1724–1735
6. Chi JT, Chi EC, Baraniuk RG (2016) k-pod: A method for k-means clustering of missing data.
Am Stat 70(1):91–99
7. Aydilek IB, Arslan A (2013) A hybrid method for imputation of missing values using optimized
fuzzy c-means with support vector regression and a genetic algorithm. Inf Sci 233:25–35
8. Dubey A, Rasool A (2020) Clustering-based hybrid approach for multivariate missing data
imputation. Int J Adv Comput Sci Appl (IJACSA) 11(11):710–714
9. Gomer B (2019) Mcar, mar, and mnar values in the same dataset: a realistic evaluation of
methods for handling missing data. Multivar Behav Res 54(1):153–153
10. Meng F, Cai C, Yan H (2013) A bicluster-based bayesian principal component analysis method
for microarray missing value estimation. IEEE J Biomed Health Inform 18(3):863–871
11. Liew AWC, Law NF, Yan H (2011) Missing value imputation for gene expression data:
computational techniques to recover missing data from available information. Brief Bioinform
12(5):498–513
12. Li H, Zhao C, Shao F, Li GZ, Wang X (2015) A hybrid imputation approach for microarray
missing value estimation. BMC Genomics 16(S9), S1
13. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB
(2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
14. Oba S, Sato M-a, Takemasa I, Monden M, Matsubara K-i, Ishii S (2003) A Bayesian missing
value estimation method for gene expression profile data. Bioinformatics 19(16):2088–2096
15. Celton M, Malpertuy A, Lelandais G, De Brevern AG (2010) Comparative analysis of missing
value imputation methods to improve clustering and interpretation of microarray experiments.
BMC Genomics 11(1):1–16
16. Kim H, Golub GH, Park H (2005) Missing value estimation for DNA microarray gene
expression data: local least squares imputation. Bioinformatics 21(2):187–198
17. Ouyang M, Welsh WJ, Georgopoulos P (2004) Gaussian mixture clustering and imputation of
microarray data. Bioinformatics 20(6):917–923
18. Sehgal MSB, Gondal I, Dooley LS (2005) Collateral missing value imputation: a new robust
missing value estimation algorithm for microarray data. Bioinformatics 21(10):2417–2423
19. Burgette LF, Reiter JP (2010) Multiple imputation for missing data via sequential regression
trees. Am J Epidemiol 172(9):1070–1076
20. Yu Z, Li T, Horng SJ, Pan Y, Wang H, Jing Y (2016) An iterative locally auto-weighted least
squares method for microarray missing value estimation. IEEE Trans Nanobiosci 16(1):21–33
21. Dubey A, Rasool A (2021) Efficient technique of microarray missing data imputation using
clustering and weighted nearest neighbour. Sci Rep 11(1):24–29
22. Dubey A, Rasool A (2020) Local similarity-based approach for multivariate missing data
imputation. Int J Adv Sci Technol 29(06):9208–9215
23. Purwar A, Singh SK (2015) Hybrid prediction model with missing value imputation for medical
data. Expert Syst Appl 42(13):5621–5631
24. Aydilek IB, Arslan A (2012) A novel hybrid approach to estimating missing values in databases
using k-nearest neighbors and neural networks. Int J Innov Comput, Inf Control 7(8):4705–4717
25. Tang J, Zhang G, Wang Y, Wang H, Liu F (2015) A hybrid approach to integrate fuzzy c-means
based imputation method with genetic algorithm for missing traffic volume data estimation.
Transp Res Part C: Emerg Technol 51:29–40
26. Marwala T, Chakraverty S (2006) Fault classification in structures with incomplete measured
data using autoassociative neural networks and genetic algorithm. Curr Sci 542–548
27. Hans-Hermann B (2008) Origins and extensions of the k-means algorithm in cluster analysis.
Electron J Hist Probab Stat 4(2)
28. Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
29. Yadav A, Dubey A, Rasool A, Khare N (2021) Data mining based imputation techniques to
handle missing values in gene expressed dataset. Int J Eng Trends Technol 69(9):242–250
30. Gond VK, Dubey A, Rasool A (2021) A survey of machine learning-based approaches for
missing value imputation. In: Proceedings of the 3rd International Conference on Inventive
Research in Computing Applications, ICIRCA 2021, pp. 841–846
A Clustering and TOPSIS-Based
Developer Ranking Model
for Decision-Making in Software Bug
Triaging
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 139
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_13
140 P. Rathoriya et al.
attributes and generating the ranked list of solutions to such problems. TOPSIS [2]
is one of the popular techniques under the MADM paradigm to solve problems. The
main attributes for bug triaging include the consideration of the attributes, namely,
the experience of developers in years (D), the number of assigned bugs (A), the
number of newly assigned bugs (N), the number of fixed or resolved bugs (R), and
the average resolving time (T). Software bugs are managed through online software
bug repositories. For example, the Mozilla bugs are available online at https://bugzilla.mozilla.org/users_profile?.user_id=X, where X is the id of the user.
In this paper, Sect. 2 presents motivation, and Sect. 3 presents some related work
to bug triaging. In Sect. 4, the methodology is presented. Sect. 5 describes our model
with an illustrative example. Sect. 6 covers some threats to validity, and Sect. 7
discusses the conclusion and future work.
2 Motivation
Machine learning techniques mostly depend on a historical dataset for training
and do not consider the availability of developers in bug triaging.
For example, a machine learning algorithm can identify one developer as an expert
for a newly reported bug, but that developer might already have been
assigned numerous bugs precisely because of that expertise.
With the help of MCDM approaches, such a problem can be handled efficiently
by considering the availability of the developer as one of the non-beneficial (cost)
criteria, i.e., attributes to be minimized.
3 Related Work
Limitations of existing machine learning mechanisms are that it is difficult to label bug reports with
missing or insufficient label information, and that most classification algorithms used in
existing approaches are costly and inefficient on large datasets.
Deep learning techniques to automate the bug assignment process are another set
of approaches that can be used with large datasets being researched by researchers
[11–18]. By extracting features, Mani et al. [19] proposed the DeepTriage technique,
which is based on the Deep Bidirectional Recurrent Neural Network with Attention
(DBRNN-A) model; unsupervised learning is used to learn the semantic and syntactic
characteristics of words. Similarly, Tuzun et al. [15] improved the accuracy of the
method by using Gated Recurrent Unit (GRU) instead of Long-Short Term Memory
(LSTM), combining different datasets to create the corpus, and changing the dense
layer structure (simply doubling the number of nodes and increasing the layer) for
better results. Guo et al. [17] proposed an automatic bug triaging technique based on
CNN and developer activity, in which they first apply text preprocessing, then create
the word2vec, and finally use CNN to predict developer activity. The problem asso-
ciated with the deep learning approach is that, based on the description, a developer
can be selected accurately, but availability and expertise can’t be determined.
Several studies [19–23] have increased bug triaging accuracy by including addi-
tional information such as components, products, severity, and priority. Hadi et al.
[19] have presented the Dependency-aware Bug Triaging Method (DABT). This
considers both bug dependency and bug fixing time in bug triaging, using the topic
model Latent Dirichlet Allocation (LDA) and integer programming to determine
the appropriate developer who can fix the bug in a given time slot. Iyad et al. [20]
have proposed the graph-based feature augmentation approach, which uses graph-
based neighborhood relationships among the terms of the bug report to classify the
bug based on their priority using the RFSH [22] technique.
Some bugs must be tossed due to the inexperience of the developer. However, this
may be decreased by selecting the most suitable developer. Another MCDM-based
method is discussed in [24–28] for selecting the best developer. Goyal et al. [27]
used the MCDM method, namely the Analytic Hierarchy Process (AHP) method,
for bug triaging, in which the first newly reported bug term tokens are generated and
developers are ranked based on various criteria with different weightages. Gupta et al.
[28] used a fuzzy technique for order of preference by similarity to ideal solution
(F-TOPSIS) with the Bacterial Foraging Optimization Algorithm (BFOA) and Bar
Systems (BAR) to improve bug triaging results.
From the above-discussed methods, it can be concluded that the existing methods
do not consider ranking bugs or developers using metadata and multi-criteria
decision-making for selecting the developer. In reality, all the parameters/features
are not equally important, so weights should be assigned explicitly for features
and their prioritization. In addition, the existing methods do not consider developer
availability. Hence, this paper identifies this gap and suggests a hybrid bug triaging
mechanism.
4 Methodology
The proposed method explained in this section consists of the following steps
(Fig. 1).
In the first step, the bug data is collected from open sources like Kaggle or Bugzilla.
In the present paper, the dataset has 10,000 rows and 28 columns of attributes which
contain information related to bugs, such as the developer who fixed the bug, when the bug
was triggered, when the bug was fixed, the bug id, the bug summary, etc. The dataset is taken
from Kaggle.
In step two, preprocessing tasks are applied to the bug summary, for example, text
lowercasing, stop word removal, tokenization, stemming, and lemmatization.
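A minimal, dependency-free sketch of this preprocessing step is shown below. The stop word set is a tiny illustrative subset, and the suffix-stripping rule stands in for a real stemmer/lemmatizer (such as those in NLTK), so the outputs are only approximations of proper stems.

```python
import re

# Tiny illustrative stop word set (a real pipeline would use a full list)
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "when", "to"}

def preprocess(summary: str) -> list[str]:
    # Lowercase and tokenize into alphabetic tokens
    tokens = re.findall(r"[a-z]+", summary.lower())
    # Stop word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Naive suffix stripping as a stand-in for stemming/lemmatization
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
```

For example, `preprocess("Crash when opening settings dialog")` yields `["crash", "open", "setting", "dialog"]`.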
In the third step, developer metadata is extracted from the dataset. It consists of
the following: developer name, total number of bugs assigned to each developer,
number of bugs resolved by each developer, new bugs assigned to each developer,
total experience of the developer, and the average fixing time of the developer over
all resolved bugs. A developer vocabulary is also created using the developer names
and bug summaries.
In the fourth step, the newly reported preprocessed bug summary is matched
with developer vocabulary using the cosine similarity [17] threshold filter. Based on
similarity, developers are filtered from the developer vocabulary for further steps.
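The cosine-similarity filter can be sketched with a simple term-frequency representation; the developer vocabularies, the new bug summary, and the 0.3 threshold below are hypothetical values chosen for illustration only.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    # Term-frequency vectors over whitespace tokens
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical developer vocabularies (developer -> terms from past bug summaries)
vocab = {"dev_a": "crash render gpu driver",
         "dev_b": "login token session timeout"}

new_bug = "crash in gpu driver on resume"
threshold = 0.3  # illustrative threshold

# Keep only developers whose vocabulary is similar enough to the new summary
candidates = [d for d, v in vocab.items()
              if cosine_similarity(new_bug, v) >= threshold]
```

Here only `dev_a` passes the filter, since its vocabulary shares three of the new bug's terms.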
In the fifth step, developer metadata is extracted from step 3 only for the filtered
developer from step 4.
In step six, the AHP method is applied to find the criteria weights. It involves the following
steps for bug triaging:
1. Problem definition: The problem is to identify the appropriate developer to fix
the newly reported bug.
2. Parameter selection: Appropriate parameters (criteria) are selected for finding
their weights. The criteria are: name of developer (D), developer
experience in years (E), total number of bugs assigned (A), newly assigned bugs
(N), total bugs fixed (R), and average fixing time (F).
3. Create a judgement matrix (A): A square matrix A of order m × m is created
for the pairwise comparison of all the criteria, where each element gives
the relative importance of one criterion over another:

A_{m \times m} = (a_{ij}) \qquad (1)

Here i, j = 1, 2, 3, …, m.
For relative importance, the following data will be used:
4. Normalize the matrix A.
5. Find the eigenvalue and the eigenvector W^t.
6. Perform a consistency check of the weights, which has the following steps:
i. Calculate λ_max using Eq. (4):

\lambda_{max} = \frac{1}{n}\sum_{i=1}^{n} \frac{(AW^t)_i}{(W^t)_i} \qquad (4)
CI = \frac{\lambda_{max} - n}{n - 1} \qquad (5)
CR = \frac{CI}{RI} \qquad (6)
Here RI is the random index [24]. If the consistency ratio is less than 0.10, the
weights are consistent and the weight vector (W) can be used for further measurement
in the next step. If not, repeat from step 3 of AHP.
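The AHP steps above can be sketched with NumPy's eigendecomposition. The 3 × 3 judgement matrix below is a hypothetical example on the Saaty scale, not one of the paper's matrices, and the RI table is truncated to small orders.

```python
import numpy as np

# Random index values by matrix order [24] (truncated for illustration)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}

def ahp_weights(A):
    """Principal-eigenvector weights and consistency ratio for judgement matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # principal eigenvalue index
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                              # normalized weight vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)              # Eq. (5)
    cr = ci / RI[n] if RI[n] else 0.0         # Eq. (6)
    return w, cr

# Hypothetical judgement matrix: criterion 1 strongly dominates 2 and 3
A = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
w, cr = ahp_weights(A)
```

For this matrix CR is well below 0.10, so the weights would be accepted as consistent.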
In step 7, the TOPSIS [27] model is applied for ranking the developer. The TOPSIS
model has the following steps for developer ranking:
1. Make a performance matrix (D) for the selected developers with order m × n,
where n is the number of criteria and m is the number of developers (alternatives).
Each element of the matrix is the value of the respective criterion for that developer.
2. Normalize the matrix using the following equation:

R_{ij} = \frac{a_{ij}}{\sqrt{\sum_{k=1}^{m} a_{kj}^2}} \qquad (7)
3. Multiply the normalized matrix (R_ij) by the weights calculated from the AHP
method.
4. Determine the positive ideal solution (best alternative) (A*) and the negative ideal
solution (worst alternative) (A−) using the following equations.
5. Find the Euclidean distance of each alternative from the best alternative,
called d*, and similarly from the worst alternative, called d−, using the following
formulas:

d_i^{*} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{*}\right)^2} \qquad (11)
d_i^{-} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^{-}\right)^2} \qquad (12)
6. Find the similarity to the worst condition (CC_i), also called the closeness
ratio. The higher the closeness ratio of an alternative, the higher its ranking:

CC_i = \frac{d_i^{-}}{d_i^{*} + d_i^{-}} \qquad (13)
For bug triaging, the developer with the highest closeness ratio is ranked
first, and the developer with the lowest closeness ratio receives the last rank.
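The full TOPSIS ranking procedure (Eqs. 7 to 13) can be sketched as below. The performance matrix, weights, and benefit/cost flags are hypothetical: three developers scored on two criteria, where bugs fixed should be maximized and average fixing time minimized.

```python
import numpy as np

def topsis(D, w, benefit):
    """Closeness ratio of each alternative; benefit[j] marks criteria to maximize."""
    D = np.asarray(D, dtype=float)
    R = D / np.sqrt((D ** 2).sum(axis=0))                     # Eq. (7)
    V = R * w                                                 # weighted normalized matrix
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))    # A*
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))   # A-
    d_best = np.sqrt(((V - best) ** 2).sum(axis=1))           # Eq. (11)
    d_worst = np.sqrt(((V - worst) ** 2).sum(axis=1))         # Eq. (12)
    return d_worst / (d_best + d_worst)                       # Eq. (13)

# Hypothetical: 3 developers x 2 criteria (bugs fixed, avg fixing time)
D = [[30, 5.0],
     [10, 2.0],
     [20, 4.0]]
w = np.array([0.6, 0.4])
cc = topsis(D, w, np.array([True, False]))
ranking = np.argsort(-cc)  # developer indices from best to worst
```

With these numbers, developer 0 ranks first despite the longest fixing time, because the bugs-fixed criterion carries the larger weight.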
relative importance of criteria to other criteria. For example, in Table 4, the criterion "total
bugs assigned" has strong importance over "experience", hence 5 is assigned, and 1/5 in the
reverse case; the criterion "total bugs resolved" has very strong importance over
"experience", so 7 is assigned and 1/7 in the reverse case. Similarly, the other
values can be filled in by referring to Table 2, and the diagonal values are always fixed to one.
The resultant judgement matrix is given in table format in Table 3.
In the next step, the judgement matrix is normalized and the eigenvalue and eigenvector
are calculated; the transpose of the eigenvector gives the criteria weights, which are
shown in Table 4.
Then the consistency of the calculated criteria weights is checked by following
Eqs. (4), (5), and (6), with the result shown below:

λ_max = 5.451

Since CR < 0.1, the weights are consistent and can be used for further calculation.
In the next step, the TOPSIS method is applied. The TOPSIS method first generates
the evaluation matrix (D) of size m × n, where m is the number of alternatives
(developers) and n is the number of criteria; in our example, there are 5 developers
and 5 criteria, so the evaluation matrix is 5 × 5. Next, matrix D is normalized using
Eq. (7) to obtain the matrix R, and R is multiplied by the weight W using Eq. (8) to
obtain the weighted normalized matrix shown in Table 5.
In the next step, the best alternative and worst alternative are calculated by using
Eqs. 9 and 10 shown in Table 6.
Next, the distance of each alternative from the best alternative and from the worst
alternative is found using Eqs. (11) and (12). Then, using Eq. (13), the closeness
ratio shown in Table 7 is obtained.
Generally, CC = 1 if the alternative is the best solution; similarly, CC = 0 if the
alternative is the worst solution. Based on the closeness ratio, D1 has the first rank,
D4 the second rank, and D5, D3, and D2 the third, fourth, and fifth ranks, respectively.
A bar graph of the ranking based on the closeness ratio is shown in Fig. 2.
6 Threats to Validity
The suggested model poses a threat due to the use of the AHP approach to calculate
the criteria weights. Because the judgement matrix is formed by humans, subjective
judgement may introduce inconsistency when assigning weights to criteria, and there
is a chance of obtaining different criteria weight vectors, which may affect the
overall rank of a developer in bug triaging.
7 Conclusion and Future Work
A new algorithm is proposed for bug triaging using a hybridization of two MCDM
algorithms: AHP for criteria weight calculation and TOPSIS for ranking the
developers, while considering the availability of the developers. Future work could
apply other MCDM algorithms for the effective ranking of developers in bug triaging.
References
1. Yalcin AS, Kilic HS, Delen D (2022) The use of multi-criteria decision-making methods
in business analytics: A comprehensive literature review. Technol Forecast Soc Chang 174,
121193
2. Mojaver M, et al. (2022) Comparative study on air gasification of plastic waste and conventional
biomass based on coupling of AHP/TOPSIS multi-criteria decision analysis. Chemosphere 286,
131867
3. Sawarkar R, Nagwani NK, Kumar S (2019) Predicting available expert developer for newly
reported bugs using machine learning algorithms. In: 2019 IEEE 5th International Conference
A Clustering and TOPSIS-Based Developer Ranking Model … 149
26. Goyal A, Sardana N (2017) Optimizing bug report assignment using multi criteria decision
making technique. Intell Decis Technol 11(3):307–320
27. Gupta C, Inácio PRM, Freire MM (2021) Improving software maintenance with improved
bug triaging in open source cloud and non-cloud based bug tracking systems. J King Saud
Univ-Comput Inf Sci
28. Goyal A, Sardana N (2021) Feature ranking and aggregation for bug triaging in open-source
issue tracking systems. In: 2021 11th International Conference on Cloud Computing, Data
Science & Engineering (Confluence). IEEE
GujAGra: An Acyclic Graph to Unify
Semantic Knowledge, Antonyms,
and Gujarati–English Translation
of Input Text
1 Introduction
One of the most challenging issues in NLP is recognizing the correct sense of each
word that appears in input expressions. Words in natural languages can have many
meanings, and several separate words frequently signify the same notion. WordNet
can assist in overcoming such challenges. WordNet is an electronic lexical database
that was created for English and has now been made available in various other
languages [1]. Words in WordNet are grouped together based on their semantic simi-
larity. It segregates words into synonym sets or synsets, which are sets of cognitively
synonymous terms. A synset is a collection of words that share the same part of
speech and may be used interchangeably in a particular situation. WordNet is widely
regarded as a vital resource for scholars working in computational linguistics, text
analysis, and a variety of other fields. A number of WordNet compilation initiatives
have been undertaken and carried out in recent years under a common framework
for lexical representation, and they are becoming more essential resources for a wide
range of NLP applications such as a Machine Translation System (MTS).
The rest of the paper is organized as follows:
The next section gives a brief overview of the Gujarati language. Section 3 reviews
relevant previous work on this topic. Section 4 describes each
component of the system architecture for the software used to build the WordNet graph
with respect to the Gujarati–English–Gujarati languages. Section 5 presents the
proposed algorithm. Section 6 covers the experiment and results.
Section 7 brings the work covered in this article to a conclusion.
M. Patel (B)
Indore Institute of Science and Technology, Indore, India
e-mail: [email protected]
B. K. Joshi
Military College of Telecommunication Engineering, Mhow, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 151
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_14
152 M. Patel and B. K. Joshi
2 Gujarati Language
3 Literature Review
Word Sense Disambiguation (WSD) is the task of identifying the correct sense of
a word in a given context. WSD is an important intermediate step in many NLP
tasks especially in Information extraction, Machine translation [N3]. Word sense
ambiguity arises when a word has more than one sense. Words which have multiple
meanings are called homonyms or polysemous words. The word mouse clearly has
different senses. In the first sense it falls in the electronic category, the computer
mouse that is used to move the cursor in computers and in the second sense it falls
in animal category. The distinction might be clear to the humans but for a computer
to recognize the difference it needs a knowledge base or needs to be trained. Various
approaches have been proposed to achieve WSD: Knowledge-based methods rely on
dictionaries, lexical databases, thesauri, or knowledge graphs as primary resources,
and use algorithms such as lexical similarity measures or graph-based measures.
Supervised methods, on the other hand, make use of sense-annotated corpora as
training instances. These use machine learning techniques to learn a classifier from
labeled training sets. Some of the common techniques used are decision lists, decision
trees, Naive Bayes, neural networks, and support vector machines (SVM).
Finally, unsupervised methods make use of only raw unannotated corpora and do
not exploit any sense-tagged corpus to provide a sense choice for a word in context.
These methods include context clustering, word clustering, and co-occurrence graphs.
Supervised methods are by far the most predominant, as they generally offer the best
results [N1]. Many works try to alleviate this problem by creating new sense-annotated
corpora, either automatically, semi-automatically, or through crowdsourcing.
In this work, the idea is to solve this issue by taking advantage of the semantic
relationships between senses included in WordNet, such as the hypernymy, the
hyponymy, the meronymy, and the antonymy. The English WordNet was the first of
its kind in this field to be developed. It was devised in 1985 and is still being worked
on today at Princeton University’s Cognitive Science Laboratory [6]. The success
of English WordNet has inspired additional projects to create WordNets for other
languages or to create multilingual WordNets. EuroWordNet is a semantic network
system for European languages. The Dutch, Italian, Spanish, German, French, Czech,
and Estonian languages are covered by the Euro WordNet project [7]. The BalkaNet
WordNet project [8] was launched in 2004 with the goal of creating WordNets for
Bulgarian, Greek, Romanian, Serbian, and Turkish languages. IIT, Bombay, created
the Hindi WordNet in India. Hindi WordNet was later expanded to include Marathi
WordNet. Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani,
Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu,
and Urdu are among the main Indian languages represented in the Indo WordNet
project [9]. These WordNets were generated using the expansion method, with Hindi
WordNet serving as a kingpin and being partially connected to English WordNet.
4 Software Description
In this section, we describe the salient features of the architecture of the system. The
Gujarati WordNet is implemented on Google Colaboratory platform. To automati-
cally generate semantic networks from text, we need to provide some preliminary
information to the algorithm so that additional unknown relation instances may be
retrieved. We used Indo WordNet, which was developed utilizing the expansion
strategy with Hindi WordNet as a pivot, for this purpose. In addition, we manually
created Gujarati antonyms for over 700 words as a small knowledge base.
5 Proposed Algorithm
This section describes a method for producing an acyclic graph, which is essentially a
visualization tool for the WordNet lexical database. Through the proposed algorithm,
we wish to view the WordNet structure from the perspective of a specific word in
the database. Here we have focused on WordNet's main relations: the synonymy
(SYNSET) relation, the antonym relation, and the word's English translation.
This algorithm is based on what we will call a sense graph, which we formulate as
follows. Nodes in the sense graph comprise the words wi in a vocabulary W together
with the senses sij for those words. Labeled, undirected edges include word-sense
edges (wi, sij), which connect each word to all of its possible senses, and sense-sense
edges (sij, skl) labeled with a meaning relationship r that holds between the two senses.
WordNet is used to define their sense graph. Synsets in the WordNet ontology define
the sense nodes, a word-sense edge exists between any word and every synset to which
it belongs, and WordNet’s synset-to-synset relations of synonymy, hypernymy, and
hyponymy define the sense-sense edges. Figures 4 and 5 illustrate a fragment of a
WordNet- based sense graph.
A key point to observe is that this graph can be based on any inventory of word-sense
and sense-sense relationships. In particular, given a parallel corpus, we can follow
the tradition of translation-as-sense-annotation: the senses of a Gujarati word type
can be defined by the different possible translations of that word in any other language.
Operationalizing this observation is straightforward, given a word-aligned parallel
corpus. If English word form ei is aligned with Gujarati word form gj, then ei(gj) is a
sense of ei in the sense graph, and there is a word-sense edge (ei, ei(gj)). Edges signifying
a meaning relation are drawn between sense nodes if those senses are defined
by the same translation word. For instance, the English senses Defeat and Necklace both
arise via alignment with (Haar), so a sense-sense edge will be drawn between these
sense nodes.
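A minimal sketch of this alignment-to-sense construction, with transliterations standing in for the Gujarati script and invented alignment pairs echoing the Haar example:

```python
from collections import defaultdict
from itertools import combinations

# Word alignments (English word, Gujarati translation) from a hypothetical
# word-aligned parallel corpus; "haar" stands in for the Gujarati script.
alignments = [("defeat", "haar"), ("necklace", "haar"), ("house", "ghar")]

word_sense_edges = []                 # word-sense edges (e_i, e_i(g_j))
by_translation = defaultdict(list)    # translation word -> senses it defines
for e, gj in alignments:
    sense = f"{e}({gj})"              # e_i(g_j) is a sense of e_i
    word_sense_edges.append((e, sense))
    by_translation[gj].append(sense)

# Sense-sense edges connect senses defined by the same translation word.
sense_sense_edges = [pair
                     for senses in by_translation.values()
                     for pair in combinations(senses, 2)]

print(sense_sense_edges)  # [('defeat(haar)', 'necklace(haar)')]
```

Only "haar" defines two senses, so Defeat and Necklace end up linked, exactly as described in the text; "ghar" defines a single sense and contributes no sense-sense edge.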
For experimental purposes, more than 200 random sentences were collected from
various Gujarati-language e-books, e-newspapers, etc. A separate Excel document
named 'Gujarati Opposite words.xlsx' (containing 700+ words) was created, keeping one
word and its corresponding antonym in each row. For the generation
of the WordNet graph, Google Colab is used, as it is an online cloud service provided
by Google (a standalone system running Jupyter Notebook can also be used).
Firstly, all the APIs are installed using the pip install command. Then the packages
required for token processing are imported: pywin, the TensorFlow/Keras tokenizer,
Google Translator, and networkx. Figure 2 displays the content of 'Sheet 1' of the Excel
file named 'Gujarati Opposite words.xlsx' using the pandas (pd) library. Then, an instance
of 'Tokenizer' from the Keras API is called to split each sentence into
tokens, as shown in Fig. 3.
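This loading and tokenization step can be sketched as follows, with a plain dict standing in for the Excel knowledge base (in the actual pipeline it is read via pandas.read_excel) and a whitespace split standing in for the Keras Tokenizer; the transliterated words are illustrative:

```python
# Stand-in for 'Gujarati Opposite words.xlsx': one (word, antonym) pair per
# row, read here from a hard-coded dict instead of pandas.read_excel.
antonym_kb = {
    "subah": "sham",   # illustrative transliterations, not real KB entries
    "din": "raat",
}

def tokenize(sentence):
    """Stand-in for the Keras Tokenizer: lowercase whitespace split."""
    return sentence.lower().split()

tokens = tokenize("Subah din")
# Look up each token in the antonym knowledge base (None if absent).
antonyms = {t: antonym_kb.get(t) for t in tokens}
print(antonyms)  # {'subah': 'sham', 'din': 'raat'}
```

The real Keras Tokenizer additionally builds a word index over a fitted corpus; a plain split suffices to show where each token's knowledge-base lookup happens.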
Different color coding is used to represent different elements: light blue represents
a token of the input string, yellow represents synonyms, green represents antonyms,
red represents the English translation of the token, and pink represents the
pronunciation of the token. Hence, if no work has been done on a particular synset of
the Gujarati WordNet, the acyclic graph will not contain a yellow node. In our example,
no synonym of (hoy) is found, so no acyclic graph is plotted for it. In the same way,
if an antonym is not available in the knowledge base, the green node is omitted, and so on.
Thereafter, a custom function is created which reads each unique token and calls
different functions to obtain its synonyms, antonyms, English translation, and
pronunciation, and then creates an acyclic WordNet graph.
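Such a per-token function can be sketched as below; the lookup stubs are hypothetical stand-ins for the real linguistic resources, and the color scheme follows the coding described above:

```python
# Hypothetical lookup stubs standing in for the real resources
# (IndoWordNet synsets, the antonym knowledge base, Google Translator, etc.).
def get_synonym(tok):       return {"subah": "savar"}.get(tok)
def get_antonym(tok):       return {"subah": "sham"}.get(tok)
def get_translation(tok):   return {"subah": "morning"}.get(tok)
def get_pronunciation(tok): return {"subah": "su-bah"}.get(tok)

# Color coding from the text: yellow=synonym, green=antonym,
# red=English translation, pink=pronunciation (token itself is light blue).
COLORS = {"synonym": "yellow", "antonym": "green",
          "translation": "red", "pronunciation": "pink"}

def token_graph(tok):
    """Gather each relation of a token and emit the star-shaped
    (hence acyclic) edge list around it; a relation whose resource
    has no entry is omitted, so its colored node never appears."""
    relations = {
        "synonym": get_synonym(tok),
        "antonym": get_antonym(tok),
        "translation": get_translation(tok),
        "pronunciation": get_pronunciation(tok),
    }
    return [(tok, value, COLORS[rel])
            for rel, value in relations.items() if value is not None]

edges = token_graph("subah")
print(edges)
```

In the actual system these edges would be handed to networkx and rendered to a .png per token; the star shape around the token is what guarantees the graph stays acyclic.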
Finally, the results are saved in separate .png files showing the acyclic
WordNet graph for each token, as shown in Fig. 4.
We have generated acyclic graphs for more than 200 sentences through our proposed
system. In some cases, we faced challenges. One of them follows:
when (Ram e prithvilok tyag karyu) was given as input, the
acyclic graph for the word (prithvilok) shown in Fig. 5 was produced. Here, the linguistic
resource used to extract synonyms of (prithvilok) is the Synset provided
by IIT, ID 1427. The concept means the
place meant for all of us to live. But in the Synset, (Mratyulok) is given as
a co-synonym of it.
7 Conclusion
A concept is represented by simply listing the different word forms that might be used
to describe it in a synonym set (synset). Through the proposed architecture, we extracted
tokens from the input sentence; the synonyms, antonyms, pronunciation, and translation
of these tokens are identified and then plotted to form an acyclic graph giving a pictorial
view. Different color coding is used to represent the tokens, their synonyms,
antonyms, pronunciation, and translation (Gujarati or English). We demonstrated
the visualization of the WordNet structure from the perspective of a specific
word: that is, we focus on a specific term and then survey the
greater structure of WordNet from there. While we did not design our method with
the intention of creating WordNets for languages other than Gujarati, we recognize the
possibility of using it in this fashion with other language combinations as well. Some
changes must be made to the system's architecture; for example, in the Concept Extraction
phase, linguistic resources of other languages providing the needed synonyms
have to be made available. But the overall design of displaying the information of
the Gujarati WordNet can easily be applied in developing a WordNet for another
language. We have presented an alternative means of deriving information about
GujAGra: An Acyclic Graph to Unify Semantic Knowledge, Antonyms … 159
References
1. Miller GA, Fellbaum C (2007) WordNet then and now. Lang Resour Eval 41(2):209–214.
http://www.jstor.org/stable/30200582
2. "Scheduled Languages in descending order of speakers' strength – 2001". Census of India.
https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers_in_India
3. Parkvall M, "Världens 100 största språk 2007" (The World's 100 Largest Languages
in 2007), in Nationalencyklopedin. https://en.wikipedia.org/wiki/List_of_languages_by_num
ber_of_native_speakers
4. "Gujarati: The language spoken by more than 55 million people". The Straits Times,
2017-01-19. https://www.straitstimes.com/singapore/gujarati-the-language-spoken-by-more-
than-55-million-people
5. Introduction to Gujarati WordNet (GWC12), IIT Bombay, Powai, Mumbai-400076, Maharashtra,
India. http://www.cse.iitb.ac.in/~pb/papers/gwc12-gujarati-in.pdf
6. Miller GA (1990) WordNet: an on-line lexical database. Int J Lexicogr 3(4):235–312
(Special Issue)
7. Vossen P (1998) EuroWordNet: a multilingual database with lexical semantic networks. J
Comput Linguist 25(4):628–630
8. Tufis D, Cristea D, Stamou S (2004) BalkaNet: aims, methods, results and perspectives: a
general overview. Romanian J Sci Technol Inf 7(1):9–43
9. Bhattacharyya P (2010) IndoWordNet. In: Language Resources and Evaluation Conference
(LREC 2010), Malta
10. Narang A, Sharma RK, Kumar P (2013) Development of Punjabi WordNet. CSIT 1:349–354.
https://doi.org/10.1007/s40012-013-0034-0
11. Kanojia D, Patel K, Bhattacharyya P (2018) Indian language WordNets and their linkages
with Princeton WordNet. In: Proceedings of the Eleventh International Conference on Language
Resources and Evaluation (LREC 2018), Miyazaki, Japan
12. Patel M, Joshi BK (2021) Issues in machine translation of Indian languages for information
retrieval. Int J Comput Sci Inf Secur (IJCSIS) 19(8):59–62
13. Patel M, Joshi BK (2021) GEDset: automatic dataset builder for machine translation system
with specific reference to Gujarati–English. Presented at the 11th International Advanced
Computing Conference, 18–19 December 2021
Attribute-Based Encryption Techniques:
A Review Study on Secure Access
to Cloud System
1 Introduction
Cloud computing is becoming the principal computing model of the future
because of its benefits, for example, a high resource-utilization rate and savings
in the significant cost of execution. The existing algorithms for security issues in cloud
computing are advanced versions of cryptography. Cloud computing algorithms are
mainly concerned with data security and the privacy preservation of the user. Most
privacy solutions are based on encryption: the data to be outsourced is encrypted
and stored in the cloud. To implement privacy protection for data owners and data
users, the encrypted data are shared and warehoused in cloud storage by applying
Ciphertext-Policy Attribute-Based Encryption (CP-ABE). Algorithms such as AES,
DES, and so on are utilized for encrypting the information before uploading it to
the cloud.
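The access-control idea behind CP-ABE can be illustrated purely conceptually, with no actual cryptography (a real deployment would use an ABE library): the ciphertext carries an access policy, and decryption can succeed only when the user's attribute set satisfies it. The policy, attribute names, and tuple encoding below are invented for illustration:

```python
def satisfies(policy, attrs):
    """Evaluate a CP-ABE-style access policy tree against a set of
    user attributes. Policies are nested tuples:
      ("ATTR", name), ("AND", p1, p2, ...), ("OR", p1, p2, ...)."""
    op = policy[0]
    if op == "ATTR":
        return policy[1] in attrs
    if op == "AND":
        return all(satisfies(p, attrs) for p in policy[1:])
    if op == "OR":
        return any(satisfies(p, attrs) for p in policy[1:])
    raise ValueError(f"unknown operator: {op}")

# Example policy attached to a ciphertext: (doctor AND cardiology) OR admin
policy = ("OR",
          ("AND", ("ATTR", "doctor"), ("ATTR", "cardiology")),
          ("ATTR", "admin"))

print(satisfies(policy, {"doctor", "cardiology"}))  # True: may decrypt
print(satisfies(policy, {"doctor"}))                # False: may not
```

In actual CP-ABE the check is not an explicit boolean test but falls out of the algebra: the decryption computation only reconstructs the message when the attribute keys satisfy the embedded policy.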
The three main features of clouds are easy allocation of resources, a
platform for service management, and massive scalability, which designate the key design
components of cloud processing and storage. A customer of cloud services
may see a different set of characteristics depending on their unique
requirements and point of view [1]:
• Location-free resource pools: compute and storage assets may be located anywhere
that the network reaches; resource pools reduce the dangers of
weak links through redundancy,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 161
P. Singh et al. (eds.), Machine Learning and Computational Intelligence Techniques
for Data Engineering, Lecture Notes in Electrical Engineering 998,
https://doi.org/10.1007/978-981-99-0047-3_15
162 A. Kumar and G. Verma
The paper is divided into five main sections. Section 1 gives an introduction
to the research work with a brief explanation of the basic concepts. Section 2
covers the background of cloud computing security issues. Section 3 surveys
the existing studies, which serve as exploratory data for the research work
and as a basis for evaluating the review toward designing a new framework;
this section also presents the survey studies in tabular form. Section 4
summarizes the literature study of Sect. 3 and presents the research-gap
analysis. Finally, Sect. 5 concludes the paper.
In the current conventional framework, there exist security issues in storing
information in the cloud. Cloud computing security incorporates various issues such as
data loss, cloud authorization, multi-tenancy, insider threats, leakage, and so
forth. It is not easy to implement security measures that fulfill the security needs of
all clients, because clients may have dissimilar security concerns depending
on their purpose in using the cloud services [5].
• Access controls: The security concept in a cloud system requires the CSP to provide
an access-control policy so that the data owner can restrict end-users to accessing
data only from authenticated network connections and devices.
• Long-term resiliency of the encryption system: With most current cryptography,
the ability to keep encrypted data secret rests not on the cryptographic
algorithm, which is generally known, but on a number called a key that must
be used with the algorithm to produce an encrypted result or to decrypt
encrypted data. Decryption with the correct key is simple. Decryption without
the correct key is extremely difficult, and in some cases impossible
for all practical purposes.
3 Review Study
4 Review Summary
There are some points summarized after surveying the distinctive encryption-based
cloud security strategies for late exploration improvements that are as per the
following: