Advanced Computing
Deepak Garg · Kit Wong · Jagannathan Sarangapani · Suneet Kumar Gupta (Eds.)
Advanced Computing
10th International Conference, IACC 2020
Panaji, Goa, India, December 5–6, 2020
Revised Selected Papers, Part I
Communications in Computer and Information Science 1367
Editors

Deepak Garg
Bennett University
Greater Noida, Uttar Pradesh, India

Kit Wong
University College London
London, UK

Jagannathan Sarangapani
Missouri University of Science and Technology
Rolla, MO, USA

Suneet Kumar Gupta
Bennett University
Greater Noida, Uttar Pradesh, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
The 10th International Advanced Computing Conference (IACC 2020) was organized
with the objective of bringing together researchers, developers, and practitioners from
academia and industry working in the area of advanced computing. The conference
consisted of keynote lectures, tutorials, workshops, and oral presentations on all aspects
of advanced computing. It was organized specifically to help the computer industry to
derive benefits from the advances of next-generation computer and communication
technology. Researchers invited to speak presented the latest developments and tech-
nical solutions in the areas of High Performance Computing, Advances in Commu-
nication and Networks, Advanced Algorithms, Image and Multimedia Processing,
Databases, Machine Learning, Deep Learning, Data Science, and Computing in
Education.
IACC promotes fundamental and applied research which can help enhance the
quality of life. The conference was held during December 5–6, 2020, making it an
ideal platform for people to share views and experiences in futuristic research
techniques in various related areas.
The conference has a track record of acceptance rates from 20% to 25% over the last
10 years. More than 10 IEEE/ACM Fellows hold key positions on the conference
committee, giving it a quality edge. Over the last 10 years the conference’s citation score
has consistently increased, moving it into the top 10% of cited conferences globally.
This has been possible due to strict adherence to quality parameters for review and
acceptance rate without any exception, which allows us to make some of the best
research available through this platform.
Honorary Co-chairs
Sundaraja Sitharama Iyengar Florida International University, USA
Sartaj Sahni University of Florida, USA
Jagannathan Sarangapani Missouri University of Science and Technology, USA
General Co-chairs
Deepak Garg Bennett University, India
Ajay Gupta Western Michigan University, USA
M. A. Maluk Mohamed M.A.M. College of Engineering and Technology, India
Program Co-chairs
Kit Wong University College London, UK
George Ghinea Brunel University London, UK
Carol Smidts Ohio State University, USA
Ram D. Sriram National Institute of Standards and Technology, USA
Kamisetty R. Rao University of Texas at Arlington, USA
Sanjay Madria Missouri University of Science and Technology, USA
Oge Marques Florida Atlantic University, USA
Vijay Kumar University of Missouri-Kansas City, USA
Publication Co-chair
Suneet K. Gupta Bennett University, India
MaskNet: Detecting Different Kinds of Face Mask for Indian Ethnicity . . . . . 492
Abhinav Gola, Sonia Panesar, Aradhna Sharma,
Gayathri Ananthakrishnan, Gaurav Singal,
and Debajyoti Mukhopadhyay
Novel Design Approach for Optimal Execution Plan and Strategy
for Query Execution . . . . . 308
Rajendra D. Gawali and Subhash K. Shinde
Kit Wong received the BEng, MPhil, and PhD degrees, all in
Electrical and Electronic Engineering, from the Hong Kong
University of Science and Technology, Hong Kong, in 1996,
1998, and 2001, respectively. Since August 2006, he has been
with University College London. Prof. Wong is a Fellow of
the IEEE and the IET. He is an Area Editor for IEEE
Transactions on Wireless Communications, and a Senior
Editor for IEEE Communications Letters and IEEE Wireless
Communications Letters.
1 Introduction
Brain waves are electrical brain impulses. The behavior, emotions and thoughts of an
individual within our brains are communicated between neurons. Brainwaves are pro-
duced by synchronized electric pulses from neuron masses that communicate with each
other. Brainwaves happen at different frequencies [1]. Some are fast, and others are
slow. Such EEG (Electroencephalogram) Bands are generally called delta, theta, alpha,
and beta and gamma and are measured in cycles per second or hertz(Hz). Irregularity
in these waves results in several problems ranging from irregular sleeping patterns to
several neural diseases like epilepsy. An EEG (Electroencephalography) can be used to
identify possible issues relevant to the irregularity in brainwaves [2].
Electroencephalography (EEG) is an electrophysiological monitoring method to
record the activity of the brain [3]. It is a non-invasive method in which electrodes are
placed along the scalp. During an EEG, wired electrodes are attached to the head; the
electrodes detect the brain waves, the EEG machine amplifies them, and the wave
pattern is recorded on screen or paper [4]. It is most commonly used for evaluating the
type and origin of seizures.
Epilepsy is a neurological disorder in which brain activity becomes abnormal, result-
ing in unusual sensations, loss of consciousness, seizures, and unusual behavior [5].
The two main types of seizures are focal and generalized. Focal seizures start at a
particular part of the brain and are named after their point of origin. Generalized
seizures are those in which the brain misfires, resulting in muscle spasms and blackouts.
Using new and emerging technologies like deep learning, this research paper aims
to make a directional change in the field of medical science. Deep learning can be used
as a prime tool in diagnosis, the most important phase in medical science. Epilepsy
being one of the most complicated diseases, it needs accurate detection facilities.
Recorded EEG signals are analyzed by neurophysicians and related specialists [6]. This
detection and diagnosis method depends solely on human judgment, is susceptible to
human error, and is very time-consuming. Using deep learning algorithms, an automated
alternative is found that is faster and less error-prone, thereby increasing the patient’s
quality of life [7].
A detailed study of a CNN (Convolutional Neural Network) for epileptic seizure detec-
tion is presented in this research paper. The network performance is tested using four
approaches: two validation methods (a 70–30 split and 10-fold cross-validation) com-
bined with two databases (binary and multiclass). The results are presented using
confusion matrices as well as accuracy-loss graphs. The overall performance of our
model and the results obtained from this study demonstrate the superiority of our
CNN-based deep learning technique in effectively detecting epileptic seizures.
2 Related Work
As deep learning is one of the most emerging and advanced technology now, there are
numerous effective studies done in order to effectively incorporate it in different means
of life. Like any other, epilepsy detection using deep learning have already undergone
different assessment by scholars. Here some of the state-of-the-art work is described
which were done earlier.
Sirwan Tofiq Jaafar and Mokhtar Mohammadi [8] described how deep neural networks
that learn directly from the data can be useful. In almost all machine learning applica-
tions, this approach has been hugely successful. They created a new framework which
also learns directly from the data, without extracting a set of handcrafted features. The
EEG signal is segmented into 4 segments and used to train a long short-term memory
network. The trained model is used to discriminate seizures from background EEG.
The Freiburg EEG dataset is used, and approximately 97.75% accuracy is achieved.
Vikrant Doma and Martin Pirouz [9] conducted an in-depth analysis of EEG dataset
epochs and carried out a comparative study of multiple machine learning techniques
such as SVM, K-nearest neighbors, LDA, and decision trees. The accuracy was between
55% and 75%.
The study carried out by Mi Li, Hongpei Xu, Xingwang Liu, and Shengfu Liu [10]
used various EEG channel combinations and classified the emotional states into two
dimensions, valence and arousal. Entropy and energy were then measured as features
for K-nearest neighbor classification. The accuracy ranged from 89% to 95%.
Jong-Seob Yun and Jin Heon Kim [11] used the DEAP dataset to classify emotions
by modeling artificial neural network, k-NN, and SVM models, selecting EEG training
data based on the valence and arousal values calculated using the SAM (Self-Assessment
Manikin) process. An accuracy of 60–70% was shown.
Ramy Hussein, Hamid Palangi, Rabab Ward, and Z. Jane Wang [12] used an LSTM
network for classification in their model. SoftMax functions were also found helpful
in their research, and the approach was found robust to noise in real-life situations.
3 Methodology
In this section proposed system architecture of detecting epilepsy using CNN and the
architecture of proposed CNN is described next.
Data is the basis for any machine learning based classification problem. Data collection
is a crucial task, as the data gathered affects the model used for the classification problem.
Initially, input epilepsy data is taken. After the data is collected, it is preprocessed,
as real-world data is noisy, incomplete, and inconsistent; preprocessing resolves such
issues. The data is then divided into training and testing sets, where the training set is
larger than the testing set. An appropriate model that suits our dataset and requirements
is then selected. In our project, a CNN model is used for classifying the data into two
groups. The model makes predictions in two classes: class 1 for patients suffering from
epileptic seizures and class 0 for patients not suffering.
The Convolutional Neural Network (CNN) is a class of deep neural network most widely
used for working with 2D image data, although it can also work with 1D and 3D data.
CNNs were inspired by biological processes, and their architecture is similar to the
connections between neurons in the human brain. The name CNN refers to the network
using a mathematical operation called convolution.
A CNN usually consists of an input layer, an output layer, and multiple hidden layers;
sometimes only one hidden layer is present. Typically, the hidden layers of a CNN
consist of a series of convolutional layers that convolve with a multiplication or other
dot product (Fig. 1).
A CNN model has the ability to learn filters. These filters are usually smaller in size
than the input, and a dot product is applied between the filter-sized patch of the input
and the filter, then summed to get a value. Sometimes the size of the output differs from
the input; to retain the size and make them equal, padding is applied. Specifically, it is
possible to inspect and visualize the two-dimensional filters learned by the model to
discover the types of features that the model can detect, and it is possible to inspect the
activation maps produced by convolutional layers to understand precisely what features
were detected for a given input. Compared to other classification algorithms, the
preprocessing required for a CNN is much lower.
A neural network is a method of information processing influenced by the way infor-
mation is processed by the biological neural network of the human brain. The main
goal is to build a system that performs specific computational tasks faster than con-
ventional systems. These tasks include identification and classification of patterns,
approximation, optimization, and clustering of data. Such a network includes a huge
number of highly interconnected processing units that integrate to solve a particular
problem. Messages passing through the network can influence the configuration of the
ANN, as the neural network changes or learns depending on the input/output.
Figure 2 shows the architecture of our CNN model. The CNN model consists of an
input layer, an output layer, and 4 hidden layers, which include a dense layer. The
dataset is fed to the input layer, and filters are applied to produce outputs which are
fed as input to the next layer.
Max pooling and dropout are applied on the 1D convolutional layers to avoid over-
fitting of the data and to reduce the computational cost. Max pooling takes the
maximum value from the previous layer as the neuron in the next layer, and dropout
reduces the number of neurons active in the convolution to reduce computation cost.
The ReLU activation function is used for the convolutional layers, and the SoftMax
function is used on the output layer. The output layer classifies the data into seizure
or non-seizure.
4 Experimental Results
This section contains the description of the dataset used, the implementation of the
proposed CNN model, and a discussion and analysis of the results obtained, as follows:
4.1 Dataset
Figure 3 shows the dataset, which is hosted by the UCI machine learning repository [13].
It is a preprocessed dataset with a sampling rate of 173.61 Hz. It includes 11,500 rows
and 179 attributes, with the last attribute representing the class. The dataset includes
recordings of 500 people’s brain activity, where each data point represents the EEG
recording value at a given time point. Each of the 500 individuals was recorded for
23.6 s, giving 4097 data points per recording. These 4097 data points were divided into
23 chunks, each containing 178 data points, with each data point representing the EEG
value at a particular time point. The dataset therefore has 23 * 500 = 11,500 records
(rows), and each record contains 178 data points. The last column holds the label y,
with values 1, 2, 3, 4, 5. For classification, the dataset is converted into a binary-class
problem of epilepsy versus non-epilepsy.
The values of y are given in the 179th column of the input vector, and the explana-
tory variables are referred to as X1, X2, …, X178. The label y represents the EEG
recordings of people taken in different states.
Classes 2 to 5 are records of people not having epileptic seizures, and class 1 is for
people suffering from epileptic seizures. In this research paper we implemented both a
binary classification and a multiclass classification; for the binary case, class 1 is taken
as epileptic and the remaining classes are merged into a class 0, which is considered
non-epileptic.
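A short sketch of this relabeling step; the file name `epileptic_seizure_data.csv` is hypothetical, but the column layout (X1…X178 plus a label y in {1,…,5}) follows the dataset description above.

```python
import numpy as np
import pandas as pd

# Hypothetical file name; the UCI set is distributed as a CSV with
# columns X1..X178 and a label column y in {1,...,5}.
df = pd.read_csv("epileptic_seizure_data.csv")

X = df[[f"X{i}" for i in range(1, 179)]].values.reshape(-1, 178, 1)
y = df["y"].values

# Class 1 = epileptic seizure; classes 2-5 are merged into class 0 (non-epileptic).
y_binary = np.where(y == 1, 1, 0)
```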
Our proposed CNN model is a 1-dimensional fully connected sequential model with
an input layer, an output layer, and 4 hidden layers including one dense layer. The
model was implemented using two approaches: splitting the dataset in a 70–30 ratio,
and the 10-fold cross-validation method. For the implementation of the CNN algorithm,
the Keras API was used to develop the CNN model with input size 178 × 1.
The dataset was divided into training and testing sets in the ratio 70:30. The input is fed
into the CNN architecture, a sequence of convolutional and pooling layers. Max pooling
and dropout are applied on the convolutional layers to avoid overfitting and to reduce the
computational cost. Padding was applied for each layer, and a stride of 2 was used.
SoftMax and ReLU were used as the activation functions and Adam as the optimizer.
The CNN model was compiled with the loss function “categorical cross-entropy” and
the evaluation metric “accuracy”. The CNN model was trained with a batch size of 16
for 200 epochs. When validated with the test set, an accuracy of 97.19% was obtained.
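The following Keras sketch illustrates a model of this kind; the input size (178 × 1), activations, optimizer, loss, batch size, and epoch count follow the paper, while the filter counts and kernel sizes are illustrative assumptions, since the paper does not list them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense

def build_cnn(n_classes=2):
    model = Sequential([
        # Filter counts and kernel sizes are illustrative assumptions;
        # the paper specifies 4 hidden layers including one dense layer,
        # with stride 2, padding, ReLU, max pooling, and dropout.
        Conv1D(32, kernel_size=3, strides=2, padding="same",
               activation="relu", input_shape=(178, 1)),
        MaxPooling1D(pool_size=2),
        Dropout(0.2),
        Conv1D(64, kernel_size=3, strides=2, padding="same", activation="relu"),
        MaxPooling1D(pool_size=2),
        Dropout(0.2),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training setup from the paper: 70-30 split, batch size 16, 200 epochs.
# model.fit(X_train, y_train, batch_size=16, epochs=200,
#           validation_data=(X_test, y_test))
```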
For the 10-fold cross-validation model, KFold was imported from the
sklearn.model_selection package and the number of folds was set to 10. The same
model was trained within each of the 10 folds, with batch size 20 and 200 epochs;
accuracy was calculated for each fold and the mean accuracy over all folds was taken,
giving an accuracy of 98.32%. Ten-fold cross-validation provides better accuracy
estimates for both testing and training, and it is also beneficial when the dataset is small.
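A minimal sketch of this procedure with scikit-learn's KFold, assuming the `X`, `y_binary`, and `build_cnn` names from the earlier sketches; the fold count, batch size, and epoch count follow the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.utils import to_categorical

Y = to_categorical(y_binary)            # one-hot labels from the earlier sketch
kf = KFold(n_splits=10, shuffle=True, random_state=42)
fold_accuracies = []

for train_idx, test_idx in kf.split(X):
    model = build_cnn(n_classes=2)      # fresh model per fold (build_cnn defined above)
    model.fit(X[train_idx], Y[train_idx], batch_size=20, epochs=200, verbose=0)
    _, acc = model.evaluate(X[test_idx], Y[test_idx], verbose=0)
    fold_accuracies.append(acc)

print("Mean accuracy over 10 folds:", np.mean(fold_accuracies))
```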
To ensure that the results were valid and generalizable to predictions on new data, the
detection was further tested on a different, multiclass dataset. For the new dataset, both
a 70–30 training/testing split and 10-fold cross-validation were performed. With the
70–30 split, an accuracy of 77% was obtained; with 10-fold cross-validation, the
obtained accuracy was 90.2%.
Convolutional Neural Networks are computationally efficient in terms of memory
and time because of parameter sharing. They tend to perform better than regular neural
networks. However, CNN has high computational cost and training is slow if you don’t
have a good GPU. In addition, they demand large training data in order to make accurate
classifications.
Table 1 shows the hyper-parameters used for the CNN model. Extensive hyper-parameter
tuning was carried out while finalizing the network parameters. In the CNN architecture,
Conv1D layers are used because they are most suitable for time-series data. Both max
pooling and average pooling were tried, but max pooling gave better results, as expected
from the literature. Other parameters such as the number of epochs, batch size, optimizer,
loss function, activation functions, and learning rate were finalized using grid search.
The epoch count finalized for the CNN architecture is 200, with a batch size of 16. The
models were trained on various train-test splits such as 80–20 and 75–25, and K-fold cross-
validation with 10 folds was also used to find the most appropriate accuracy metric.
The loss function used for updating the weights during back-propagation is categorical
cross-entropy, and the optimizer used is Adam. The activation functions for the last
layers are SoftMax and ReLU.
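The paper reports using grid search for these choices; a minimal manual grid-search loop in the same spirit is sketched below. The exact grid values and the `X_train`/`y_train`/`X_val`/`y_val` names are assumptions.

```python
from itertools import product

# Illustrative grid over a few of the tuned hyper-parameters; the paper's
# exact grid is not given, so these values are assumptions.
param_grid = {
    "batch_size": [16, 20, 32],
    "epochs": [100, 200],
    "optimizer": ["adam", "rmsprop"],
}

best_acc, best_params = 0.0, None
for batch_size, epochs, optimizer in product(*param_grid.values()):
    model = build_cnn(n_classes=2)          # build_cnn from the earlier sketch
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])     # recompile with the candidate optimizer
    model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc = acc
        best_params = dict(zip(param_grid, (batch_size, epochs, optimizer)))

print(best_params, best_acc)
```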
We collected the multiclass dataset to predict epileptic seizures, preprocessed the data
to fill in missing values, and performed binary classification on the data to predict
whether a patient has an epileptic seizure or not. The proposed CNN model is used as
the classifier. The data was split into training and testing sets in a 70–30 ratio and
evaluated with 10-fold cross-validation to check the performance of our model in
classifying epileptic versus non-epileptic data. The performance was evaluated using
different metrics such as accuracy, recall, precision, and F1 score.
Fig. 4. (a) Loss vs. epoch performance graph of the proposed CNN model for the 70–30
validation technique. (b) Accuracy vs. epoch graph of the proposed CNN model for the 70–30
validation technique.
Fig. 5. (a) Loss vs. epoch performance graph of the proposed CNN model for the 10-fold
cross-validation technique. (b) Accuracy vs. epoch graph of the proposed CNN model for
10-fold cross-validation.
Figure 5 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is 10-fold cross-validation. From the figure it is
clear that our model minimizes the loss down to 0.02. The results show that the CNN
achieved high accuracy and low loss; the achieved accuracy is up to 98.32% after
200 epochs of the testing phase for the 10-fold cross-validation. Also, the validation
loss deviates from the training loss, but the deviation is not too large, which indicates
our model is not over-fitted; nor do the curves overlap, which indicates the model is
not under-fitted either.
A confusion matrix is a table that compares the actual values to the predicted values,
thereby evaluating the performance of the classifier on test data for which the true
values are known. Figures 6 and 7 show that our model is able to classify the classes
correctly. The matrices show high TP and TN values compared to low FP and FN
values, so it can be stated that our model predicts samples correctly with high accuracy.
Fig. 6. Confusion matrix of the proposed CNN model for 70–30 validation technique
Fig. 7. Confusion matrix of the proposed CNN model for 10-fold cross validation technique
A binary classification was done to predict epileptic seizures. When the dataset was
divided into training and testing sets with a 70–30 ratio, the accuracy obtained was
97.72%. When 10-fold cross-validation was applied, the model obtained a slightly
better accuracy of 98.32%, as shown in Table 2. Therefore, cross-validation of the
dataset has given better accuracy. Table 2 also reports other performance metrics:
precision, recall, and F1 score.
The proposed CNN model achieved recall of 97.65% and 99.71% for the 70–30 and
10-fold cross-validation data partition methods, respectively. In terms of precision,
our model achieved 96.64% and 91.21% for the 70–30 and 10-fold cross-validation
data partition methods, respectively.
Since the dataset was highly unbalanced, with more samples of non-epileptic data,
the F1-score is also calculated for the proposed CNN model. For the 70–30 and 10-fold
cross-validation data partition methods, the F1-scores obtained are 97.14% and 95.27%,
which shows that our proposed CNN-based epilepsy classification model can handle
and accurately classify an unbalanced dataset too, as shown in Table 2.
Table 2. Performance of the proposed CNN model on the binary dataset

Validation technique       Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
70–30 split                97.72          96.64           97.65        97.14
10-fold cross validation   98.32          91.21           99.71        95.27
Fig. 8. (a) Accuracy vs. epoch graph of the proposed CNN model for the 70–30 validation
technique on the multiclass dataset. (b) Loss vs. epoch performance graph of the proposed
CNN model for the 70–30 validation technique on the multiclass dataset.
Figure 8 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is a 70–30 data split. The categorical cross-entropy
function is used to calculate the loss, and accuracy is used as our metric. From the
figure it is clear that our model minimizes the loss down to 0.5. The results show that
our model achieved high accuracy and low loss, which means our model has low FP
and FN values. It achieved an average accuracy of 78% over 200 epochs.
Figure 9 shows the accuracy vs. epoch and loss vs. epoch graphs for the CNN model
where the validation technique used is 10-fold cross-validation. From the figure it is
clear that our model minimizes the loss down to 0.2. The results show that our CNN
model achieved high accuracy and low loss, which means our model has low FP and
FN values. The proposed model achieved an accuracy of 89.40% after 200 epochs of
the testing phase for the 10-fold cross-validation. Also, the validation loss deviates
from the training loss, but the deviation is not too large, which indicates our model is
not overfitted; nor do the curves overlap, which indicates the model is not underfitted
either.
Fig. 9. (a) Accuracy vs. epoch graph of the proposed CNN model for 10-fold cross-validation
on the multiclass dataset. (b) Loss vs. epoch performance graph of the proposed CNN model
for 10-fold cross-validation on the multiclass dataset.
A confusion matrix is a table that compares the actual values to the predicted values,
thereby evaluating the performance of the classifier on test data for which the true
values are known. Figures 10 and 11 show that our model is able to classify the classes
correctly. The matrices show high TP and TN values compared to low FP and FN
values, so we can say that our model predicts samples correctly with good accuracy.
Table 3 presents the values of different performance metrics, namely precision, recall,
and F1 score, for the 4 classes. In the dataset, class 0 is Sad, class 1 is Amusement,
class 2 is Disgust, and class 3 is Fear. The overall accuracy for the multiclass dataset
using the 70–30 data split is 78.9%, with a loss of 0.5. For 10-fold cross-validation the
overall accuracy is 89.40%, with a loss of 0.3. Therefore, 10-fold cross-validation has
given better results in terms of accuracy, loss, and the other performance metrics.
Fig. 10. Confusion matrix of the proposed CNN model for 70–30 validation technique having
multiclass data set
Fig. 11. Confusion matrix of the proposed CNN model for the 10-fold cross validation technique
having multiclass data set
The proposed CNN model achieved its highest recall for multiclass classification of
0.94 in class 0 for the 70–30 partition and 0.85 in class 3 for the 10-fold cross-validation
partition method. In terms of precision for multiclass classification, our model achieved
its highest values of 0.93 and 0.94 in class 2 for the 70–30 and 10-fold cross-validation
partition methods, respectively.
Since the dataset was highly unbalanced, with more samples of non-epileptic data,
the F1-score is also calculated for the proposed CNN model. For the 70–30 and 10-fold
cross-validation partition methods, the highest F1-scores obtained are 0.70 and 0.80,
both in class 0, which shows that our proposed CNN-based classification model can
handle and accurately classify an unbalanced dataset too, as shown in Table 3.
Acknowledgment. This research work was performed under the nationwide initiative
leadingin[Link] and at Bennett University, India, which supported us with lab facilities
and equipment during the experiments.
References
1. Bhardwaj, A., et al.: An analysis of integration of hill climbing in crossover and mutation
operation for EEG signal classification. In: Proceedings of the 2015 Annual Conference on
Genetic and Evolutionary Computation (2015)
2. Acharya, D., Goel, S., Bhardwaj, H., Sakalle, A., Bhardwaj, A.: A long short term memory
deep learning network for the classification of negative emotions using EEG signals. In: 2020
International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, pp. 1–8 (2020).
[Link]
3. Bhardwaj, H., et al.: Classification of electroencephalogram signal for the detection of epilepsy
using innovative genetic programming. Expert Syst. 36(1), e12338 (2019)
4. Acharya, D., et al.: An enhanced fitness function to recognize unbalanced human emotions
data. Expert Syst. Appl. 166, 114011 (2020)
5. Acharya, U.R., et al.: Application of entropies for automated diagnosis of epilepsy using EEG
signals: a review. Knowl.-Based Syst. 88, 85–96 (2015)
6. Acharya, D., et al.: Emotion recognition using fourier transform and genetic programming.
Appl. Acoust. 164, 107260 (2020)
7. Acharya, D., et al.: A novel fitness function in genetic programming to handle unbalanced
emotion recognition data. Pattern Recogn. Lett. 133, 272–279 (2020)
8. Jaafar, S.T., Mohammadi, M.: Epileptic seizure detection using deep learning approach.
UHD J. Sci. Technol. 3(41), 41–50 (2019). [Link]
9. Doma, V., Pirouz, M.: A comparative analysis of machine learning methods for emotion
recognition using EEG and peripheral physiological signals. J. Big Data 7(1), 1–21 (2020).
[Link]
10. Li, M., Xu, H., Liu, X., Liu, S.: Emotion recognition from multichannel EEG signals using
k-nearest neighbour classification. Technol. Health Care 26(S1), 509–519 (2018). https://
doi.org/10.3233/THC-174836
11. Yun, J.-S., Kim, J.H.: A study on training data selection method for EEG emotion analysis
using machine learning algorithm. Int. J. Adv. Sci. Technol. 119, 79–88 (2018). https://
doi.org/10.14257/ijast.2018.119.07
12. Hussein, R., Palangi, H., Ward, R., Wang, Z.J.: Epileptic seizure detection: a deep learning
approach (2018). arXiv:1803.09848
13. Andrzejak, R.G., Lehnertz, K., Rieke, C., Mormann, F., David, P., Elger, C.E.: Indications
of nonlinear deterministic and finite dimensional structures in time series of brain electrical
activity: dependence on recording region and brain state. Phys. Rev. E 64, 061907 (2001)
Residual Dense U-Net for Segmentation
of Lung CT Images Infected
with Covid-19
1 Introduction
The novel coronavirus is believed to have originated from bats [1], with wet markets
considered its primary source. The high transmission rate of the novel COVID-19 is so
threatening that it has forced humankind to shelter through long lockdown periods. It
created an ever-increasing demand for clinical treatment, forcing medical workers to
work round the clock, risking their own lives to help the infected. It is observed that a
COVID-19 positive person will infect roughly three new susceptible individuals (the
reproductive number [2] averages 3.28), and the number increases further if precautions
are not taken. Symptoms in patients infected with Covid-19 vary from person to person
based on immune response, with some patients remaining asymptomatic [3], but the
common ones are fever, cough, fatigue, and breathing problems. It was reported [4]
that 44% of the patients from China suffered from fever in the beginning, whereas 89%
of them developed a fever while in hospital [5]. It was also revealed later that the
patients had varying symptoms like cough (68%), fatigue (38%), sputum production
(34%), and shortness of breath (19%), and some who already suffered from other
illnesses were more vulnerable to the impact of COVID-19. Not every community has
sufficient infrastructure for dealing with outbreaks like this, so there is a need to do
whatever we can to control the spread.
A standard procedure is recommended by the World Health Organization (W.H.O.)
to test for the presence of pathogens in a suspected host, known as real-time fluorescence
RT-PCR [6]. In this procedure, an oropharyngeal or a nasopharyngeal swab is used to
collect a specimen from the suspected person to determine the nucleic acid in the
sputum [7]. Still, due to its high false negative rate, resampling of the suspected person
is suggested by W.H.O. Computed Tomography (CT scan) imaging is one of the good
options for the diagnosis of the SARS-CoV-2 virus [8]. With the demand for finding a
vaccine for COVID-19 (SARS-CoV-2), many laboratories and pharmaceutical industries
are working to design vaccines based on immune response, targeting specific epitopes
at binding sites. But apart from these classic and important procedures and research
efforts, it was discovered that subjects infected with COVID-19 develop abnormalities
such as bilateral and unilateral pneumonia involving the lower lobes, pleural thickening,
pleural effusion, and lymphadenopathy, which experts then analyze for such charac-
teristic features for diagnosis. Computer Aided Diagnosis (CAD) tools, based on
applications of machine learning algorithms, help in better diagnosis from CT
scans [9]. Moreover, CT scans have an improved false negative rate compared to
RT-PCR. Several studies have exploited deep learning architectures for various
applications in medical imaging, viz. lesion segmentation, object/cell detection, tissue
segmentation, image registration, anatomy localization, etc. The Dice similarity
coefficient is widely used to validate the segmentation of white matter lesions in MRIs
and CT scans [10]. In a recent work, Chen et al. [11] proposed a residual attention
U-Net for automatic quantification of lung infection in Covid-19 cases. They used
aggregated residual transformation (ResNeXt) blocks on the encoder side, followed by
soft attention focused on the relative position of features on the decoder side, in a
U-Net-like architecture evaluated for multi-class segmentation on Covid-19 data from
the Italian Society of Medical and Interventional Radiology.
2.1 Dataset
Medical scans and data are usually private as they contain the information
of patients making it hard to access publicly. But due to the rapid spread of
Covid-19 many researches and organizations have released datasets which can
be accessed publicly for CAD development. This research is based on two pub-
licly available datasets described below.
COVID-CT. This CT- Scans based Covid-19 dataset [17]1 consists of 349 CT
images containing clinical findings of Covid-19 and numerous Normal patients’
slices.
1 [Link]
Fig. 1. Two different masks, consolidation and pleural effusion, for a Covid-19 patient;
multi-class segmentation of these was the prime task, from the CTSeg dataset [18].
In this section the components of the proposed model, viz. dense residual blocks,
U-Net, and residual connections, are described at length.
Residual Blocks. Residual blocks [19] are a special case of highway networks
without any gates in their skip connections. Essentially, residual blocks allow the
flow of memory from initial layers to last layers, avoiding the training of some
parameters for our output segmentation. Despite the absence of gates in their skip
connections, residual networks perform as well as any other highway network in
practice.
Residual blocks ease the training of a few layers: the skip connection provides an
identity function, so the model only has to learn the residual F(x), which is easier
to learn than the full mapping H(x), as shown in Fig. 2. We deployed several
residual blocks on the encoder and decoder parts to avoid vanishing gradients
during training.
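A minimal Keras sketch of such a residual block; the (3×3), (3×3), (1×1) kernel sizes mirror the ResNet layer rows in Table 1, while the use of batch normalization is an assumption.

```python
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add

def residual_block(x, filters):
    """A canonical residual block (Fig. 2): the skip connection adds the
    input x back onto the transformed branch F(x), so the layers only need
    to learn the residual F(x) = H(x) - x."""
    shortcut = x
    y = Conv2D(filters, 3, padding="same")(x)
    y = BatchNormalization()(y)            # batch norm here is an assumption
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)
    y = BatchNormalization()(y)
    # 1x1 convolution on the shortcut if the channel counts differ.
    if shortcut.shape[-1] != filters:
        shortcut = Conv2D(filters, 1, padding="same")(shortcut)
    y = Add()([y, shortcut])               # H(x) = F(x) + x
    return Activation("relu")(y)
```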
U-Net. The U-Net architecture is designed mainly for segmentation of biomedical
images. The encoder part comprises several fully connected network (FCN) stages [20]
to extract spatial features from the subject; similarly, the decoder is equipped with a
series of convolution and up-sampling layers, with skip connections between the two
to retain the features from each encoder level. However, the receptive field of U-Net
is very small, and it does not have enough capability to distinguish such trivial
differences.
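For illustration, a toy two-level U-Net in Keras showing the encoder-decoder structure and skip connections described above; the depth and filter counts are deliberately small and are not the paper's configuration.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, UpSampling2D,
                                     Concatenate)

def tiny_unet(input_shape=(512, 512, 1), n_classes=1):
    """A minimal two-level U-Net: encoder convolutions with max pooling,
    decoder with up-sampling, and skip connections between matching levels."""
    inp = Input(input_shape)
    e1 = Conv2D(32, 3, padding="same", activation="relu")(inp)    # encoder level 1
    p1 = MaxPooling2D(2)(e1)
    e2 = Conv2D(64, 3, padding="same", activation="relu")(p1)     # encoder level 2
    p2 = MaxPooling2D(2)(e2)
    b = Conv2D(128, 3, padding="same", activation="relu")(p2)     # bottleneck
    u2 = UpSampling2D(2)(b)
    d2 = Conv2D(64, 3, padding="same", activation="relu")(Concatenate()([u2, e2]))
    u1 = UpSampling2D(2)(d2)
    d1 = Conv2D(32, 3, padding="same", activation="relu")(Concatenate()([u1, e1]))
    out = Conv2D(n_classes, 1, activation="sigmoid")(d1)          # segmentation mask
    return Model(inp, out)
```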
2 [Link]
Fig. 2. Canonical form of a ResNet block. A skip connection allows reusing activations
from a previous layer until the current layer learns its weights, hence avoiding vanishing
gradients during early back-propagation.
Fig. 4. Residual dense block consisting of densely connected layers and local residual
learning: the R_d feature maps are produced by concatenating the feature maps of the
densely connected layers [B_{d,1}, B_{d,2}, B_{d,3}, B_{d,4}, B_{d,5}] with R_{d-1},
leading to a contiguous memory (CM) mechanism that improves the information flow.
3 Proposed Model
The proposed model generates the segmentation map from spatial and hierarchical
features extracted from all convolution layers in the encoder. A full description of
the model layers is provided in Table 1, along with the hyper-parameters used during
the training process in Table 2.
Fig. 5. 3-RrDB network consisting of RDB blocks, used at a later stage for the encoder
stem of the U-Net. Information flow from the input is processed through global residual
learning by concatenating the feature maps produced through local residual learning of
the 3-RrDB blocks with the feature maps produced by the encoder of the U-Net. The
extracted feature maps from the encoder branch are passed through the 3-RrDB network
blocks and concatenated with the encoder feature maps to give rise to global residual
pooling.
R_g = R_0 + [R_{d,i}]    (1)
The feature maps obtained through the 3-RrDB network are fed into the decoder
part of the RrDB-U-Net. A skip connection is added from each filter level of the
encoder straight to the decoder at every interval, in order to obtain more precise
locations. The traditional CNNs used in the decoder often have a limited receptive
field, which creates a shallow feature map of the encoder output. The dense blocks
provide a continuous memory mechanism that preserves both the low-dimensional
and the high-dimensional features of the encoder output, as shown in Eqs. (2) to (8).
X → C1 → X1                      (2)
(X, X1) → C2 → X2                (3)
(X, X1, X2) → C3 → X3            (4)
(X, X1, X2, X3) → C4 → X4        (5)
(X, X1, X2, X3, X4) → C5 → X5    (6)
X5 = X5 * α                      (7)
X = X + X5                       (8)
where X denotes the input to the decoder layer, C1 through C5 are the first to fifth
convolution layers, and α is a constant. The lower output channel counts of
(X1, X2, X3, X4, X5) ensure that the continuous memory mechanism of the dense
blocks stays intact. At each level of the dense blocks, only the necessary higher- and
lower-dimensional features are extracted and propagated to the decoder layers to
allow better generation of the mask.
Fig. 7. Proposed residual dense U-Net with residual connection and 3-RrDB network.
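A sketch of a dense block following Eqs. (2)-(8) in Keras; the growth size of 32, output channel count of 512, and scaling constant α = 0.4 are read off Table 1, but the exact wiring is our interpretation.

```python
from tensorflow.keras.layers import Conv2D, LeakyReLU, Concatenate, Add, Lambda

def residual_dense_block(x, growth=32, channels=512, alpha=0.4):
    """Dense block per Eqs. (2)-(8): each convolution receives the
    concatenation of the block input and all earlier feature maps; the final
    output is scaled by alpha and added back to the input (local residual
    learning). The block input x must already have `channels` feature maps."""
    feats = [x]
    for _ in range(4):   # C1..C4 each produce a growth-sized feature map
        inp = Concatenate()(feats) if len(feats) > 1 else x
        y = Conv2D(growth, 3, padding="same")(inp)
        y = LeakyReLU(0.25)(y)
        feats.append(y)
    y = Conv2D(channels, 3, padding="same")(Concatenate()(feats))  # C5, Eq. (6)
    y = LeakyReLU(0.25)(y)
    y = Lambda(lambda t: t * alpha)(y)   # X5 = X5 * alpha, Eq. (7)
    return Add()([x, y])                 # X = X + X5, Eq. (8)
```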
Extraction of quality information is one of the tough tasks that needs to be addressed
before designing any model, due to the noise (low signal-to-noise ratio, SNR) present
in CT scans during acquisition. This may result in poor performance of deep convolu-
tional networks. To address this issue, RrDB blocks were included in the U-Net. This
improves the flow of information, leading to a dense fusion of features along with deep
supervision, acting as a catalyst for learning fine features from and around the region
of interest, as the deep model has a strong representation capacity to capture semantic
information.
Table 1. Dimension description of each layer incorporated within the proposed convolution
model. The table is laid out in two side-by-side panels, layers 1–40 (left) and layers 41–80
(right); the columns of each panel are: number of layer, type of layer, output features,
output size, and kernel size.
1 Input Layer 1 512*512 NA 41 Convolution a8 32 32*32 3*3
2 ResNet Layer R1 32 512*512 (3*3), (3*3), (1*1) 42 Leaky Relu l 9 32 32*32 Alpha = 0.25
3 Convolution C1 32 512*512 3*3 43 Concatenate c8 640 32*32 NA
4 Maxpool M1 32 256*256 2*2 44 Convolution a9 512 32*32 3*3
5 ResNet Layer R2 64 256*256 (3*3), (3*3), (1*1) 45 Leaky Relu l 10 512 32*32 Alpha = 0.25
6 Convolution C2 64 256*256 3*3 46 Lambda 2 512 32*32 x * 0.4
7 Maxpool M2 64 128*128 2*2 47 Add 2 512 32*32 NA
8 ResNet Layer R3 128 128*128 (3*3), (3*3), (1*1) 48 Convolution a10 32 32*32 3*3
9 Convolution C3 128 128*128 3*3 49 Leaky Relu l 11 32 32*32 Alpha = 0.25
10 Maxpool M3 128 64*64 2*2 50 Concatenate c9 544 32*32 NA
11 ResNet Layer R4 256 64*64 (3*3), (3*3), (1*1) 51 Convolution a11 32 32*32 3*3
12 Convolution C4 256 64*64 3*3 52 Leaky Relu l 12 32 32*32 Alpha = 0.25
13 Maxpool M4 256 32*32 2*2 53 Concatenate c10 576 32*32 NA
14 Convolution C5 512 32*32 3*3 54 Convolution a12 32 32*32 3*3
15 Convolution C6 512 32*32 3*3 55 Leaky Relu l 13 32 32*32 Alpha = 0.25
16 Convolution a1 32 32*32 3*3 56 Concatenate c11 604 32*32 NA
17 Leaky Relu l 1 32 32*32 Alpha = 0.25 57 Convolution a13 32 32*32 3*3
18 Concatenate c1 544 32*32 NA 58 Leaky Relu l 14 32 32*32 Alpha = 0.25
19 Convolution a2 32 32*32 3*3 59 Concatenate c14 640 32*32 NA
20 Leaky Relu l 2 32 32*32 Alpha = 0.25 60 Convolution a14 512 32*32 3*3
21 Concatenate c2 576 32*32 NA 61 Leaky Relu 15 512 32*32 Alpha = 0.25
22 Convolution a3 32 32*32 3*3 62 Lambda 3 512 32*32 x * 0.4
23 Leaky Relu l 3 32 32*32 Alpha = 0.25 63 Add 3 512 32*32 NA
24 Concatenate c3 608 32*32 NA 64 Lambda 4 512 32*32 x * 0.2
25 Convolution a4 32 32*32 3*3 65 Add 4 512 32*32 NA
26 Leaky Relu l 4 32 32*32 Alpha = 0.25 66 DropOut 1 512 32*32 NA
27 Concatenate c4 640 32*32 NA 67 Up Sampling 1 512 64*64 2*2
28 Convolution a5 512 32*32 3*3 68 Convolution C7 256 64*64 3*3
29 Leaky Relu l 5 512 32*32 Alpha = 0.25 69 Convolution C8 256 64*64 3*3
30 Lambda 1 512 32*32 x * 0.4 70 Up Sampling 2 256 128*128 2*2
31 Add 1 512 32*32 NA 71 Convolution C9 128 128*128 3*3
32 Convolution a6 32 32*32 3*3 72 Convolution C10 128 128*128 3*3
33 Leaky Relu l 6 32 32*32 Alpha = 0.25 73 Up Sampling 3 128 256*256 2*2
34 Concatenate c5 544 32*32 NA 74 Convolution C11 64 256*256 3*3
35 Convolution a6 32 32*32 3*3 75 Convolution C12 64 256*256 3*3
36 Leaky Relu l 7 32 32*32 Alpha = 0.25 76 Up Sampling 4 64 512*512 2*2
37 Concatenate c6 576 32*32 NA 77 Convolution C13 32 512*512 3*3
38 Convolution a7 32 32*32 3*3 78 Convolution C14 32 512*512 3*3
39 Leaky Relu l 8 32 32*32 Alpha = 0.25 79 Convolution C15 32 512*512 3*3
40 Concatenate c7 604 32*32 NA 80 Output Segmented Mask 1 512*512 NA
In contrast to the common observation that deeper models are hard to train, the
proposed model trained easily and delivered better performance.
Table 2. List of hyperparameters used for training the proposed network for COVID-19
CT scan segmentation

Hyperparameter         Value
Epochs                 150
Batch size             20
Activation functions   Softmax, leaky ReLU, sigmoid [30, 31]
Optimizer              Adam [29]
Loss                   Categorical cross-entropy
Learning rate          0.001
Performance metrics    Dice coefficient, accuracy
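Since Table 2 lists the Dice coefficient as a performance metric, a standard Keras implementation of it is sketched below (a common formulation, not necessarily the authors' exact code).

```python
import tensorflow.keras.backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice similarity coefficient: 2*|A intersect B| / (|A| + |B|), with a
    smoothing term to avoid division by zero on empty masks."""
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

# Usage when compiling a segmentation model:
# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               metrics=["accuracy", dice_coefficient])
```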
Preprocessing was applied to prevent noise and black-frame issues in the raw data.
The total of 838 images was split into a training set (60%), a validation set (20%),
and a test set (20%). The experiment was performed for 150 epochs on an 8th Gen
Intel Core i5-9300H (2.4 GHz, up to 4.1 GHz, 8 MB cache, 4 cores) with an NVIDIA
GeForce GTX 1050 (3 GB) GPU.
Fig. 9. Plot between training and validation data confirms that no over-fitting or
under-fitting takes place and the model converges at around 30–40 epochs.
A model with very high variance would fit the training data perfectly but give poor
performance on the test/validation set.
Fig. 10. Results of the proposed architecture. (A) (i) Lungs affected by COVID-19,
(ii) labelled consolidation, and (iii) the generated segmented mask (in green). (B) Similar
to the above: (i) CT scan of human lungs, (ii) labelled pleural effusion, and (iii) its
generated mask (in blue). (C) Finally, cases where both consolidation and pleural effusion
were identified ((i), (ii), (iii)) and (iv) the segmented masks in green and blue for the two
labels, respectively. (Color figure online)
5 Conclusion
CT imaging is used for screening Covid-19 patients and for analyzing the severity
of the disease. Deep learning has played an important role in Computer Aided
Diagnosis. In this work, we explored the use of a Residual Dense U-Net for seg-
mentation of lung CT images infected with Covid-19. The proposed approach can
accurately and efficiently identify regions of interest within CT images of patients
infected with Covid-19. As current clinical tests take relatively long, this approach
of incorporating RrDB blocks in the standard encoder-decoder structure of U-Net
improves the quality of segmentations and serves as a useful component in COVID-19
analysis and testing through CT images. A superior performance was observed, with
a dice coefficient of 97.6%.
References
1. Zhou, P., et al.: A pneumonia outbreak associated with a new coronavirus of probable
bat origin. Nature 579, 270–273 (2020). https://doi.org/10.1038/s41586-020-2012-7
2. Liu, Y., Gayle, A.A., Wilder-Smith, A., Rocklöv, J.: The reproductive number of
COVID-19 is higher compared to SARS coronavirus. J. Travel Med. 27 (2020).
[Link]
3. Gao, Z., et al.: A systematic review of asymptomatic infections with COVID-19.
J. Microbiol. Immunol. Infect. (2020). [Link]
4. Huang, C., et al.: Clinical features of patients infected with 2019 novel coronavirus
in Wuhan, China. Lancet 395, 497–506 (2020). https://doi.org/10.1016/S0140-
6736(20)30183-5
5. Guan, W.J., et al.: Clinical characteristics of coronavirus disease 2019 in China
(2020). [Link]
6. Ai, T., et al.: Correlation of chest CT and RT-PCR testing for coronavirus disease
2019 (COVID-19) in China: a report of 1014 cases. Radiology 296 (2020). https://
doi.org/10.1148/radiol.2020200642
7. Di Gennaro, F., et al.: Coronavirus diseases (COVID-19) current status and future
perspectives: a narrative review. Int. J. Environ. Res. Public Health 17, 2690
(2020). [Link]
8. Yang, W., Yan, F.: Patients with RT-PCR-confirmed COVID-19 and normal chest
CT. Radiology. 295 (2020). [Link]
9. Lee, E., Ng, M.Y., Khong, P.: COVID-19 pneumonia: what has CT taught
us? Lancet Infect. Dis. 20, 384–385 (2020). https://doi.org/10.1016/S1473-
3099(20)30134-1
10. Zijdenbos, A., Dawant, B., Margolin, R., Palmer, A.: Morphometric analysis of
white matter lesions in MR images. IEEE Trans. Med. Imaging 13, 716–724 (1994).
[Link]
11. Chen, X., Yao, L., Zhang, Y.: Residual attention U-Net for automated multi-class
segmentation of COVID-19 chest CT images (2020). arXiv:2004.05645
12. Shan, F., et al.: Lung infection quantification of Covid-19 in CT images with deep
learning (2020). arXiv:2003.04655
13. Wu, Y.H., et al.: JCS: An explainable Covid-19 diagnosis system by classification
and segmentation (2020). arXiv:2004.07054
14. Zhou, T., Canu, S., Ruan, S.: An automatic Covid-19 CT segmentation network
using spatial and channel attention mechanism (2020). arXiv:2004.06673
15. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomed-
ical image segmentation (2015). arXiv:1505.04597
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition
(2015). arXiv:1512.03385
17. Zhao, J., Zhang, Y., He, X., Xie, P.: COVID-CT-dataset: a CT scan dataset about
Covid-19 (2020). arXiv:2003.13865
18. Jenssen, H.B.: Covid-19 CT-segmentation (dataset). [Link]com/covid19/.
Accessed 13 April 2020
19. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image
super-resolution. In: IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 2472–2481 (2018). [Link]
20. Basha, S.H.S., Dubey, S.R., Pulabaigari, V., Mukherjee, S.: Impact of fully con-
nected layers on performance of convolutional neural networks for image classifi-
cation. Neurocomputing (2019). [Link]
21. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-
level performance on ImageNet classification (2015). arXiv:1502.01852
22. Freeman, T.G.: The Mathematics of Medical Imaging: A Beginner’s Guide.
Springer Undergraduate Texts in Mathematics and Technology. Springer, Heidel-
berg (2010)
23. O’Shea, K., Nash, R.: An introduction to convolutional neural networks (2015).
arXiv:1511.08458
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
25. Wang, X., et al.: ESRGAN: enhanced super resolution generative adversarial net-
works (2018). arXiv:1809.00219
26. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift (2015). arXiv:1502.03167
27. Salman, S., Liu, X.: Overfitting mechanism and avoidance in deep neural net-
works (2019). arXiv:1901.06566
28. Shamir, R.R., Duchin, Y., Kim, J., Sapiro, G., Harel, N.: Continuous dice coeffi-
cient: a method for evaluating probabilistic segmentations. medRxiv and bioRxiv
(2018). [Link]
29. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014).
arXiv:1412.6980
30. Maas, A. L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural net-
work acoustic models. In: International Conference on Machine Learning (2013)
31. Nwankpa, C., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions:
comparison of trends in practice and research for deep learning (2018).
arXiv:1811.03378
Leveraging Deep Learning and IoT
for Monitoring COVID19 Safety Guidelines
Within College Campus
1 Introduction
Coronavirus 2019, since the day it originated in Wuhan city of Hubei Province of China
in December 2019, was declared a pandemic on March 11, 2020. Globally, 14.6 Million
confirmed cases had been reported with 610110 death cases by July 21, 2020. India
registered its first COVID19 case of a student returned from Wuhan, China, in the state
of Kerala, on January 30, 2020. Following this, numerous incidents were reported from
different states of the country, mainly from travelers returning from abroad, and then
local transmission led to widespread COVID19.
The graph depicts the severity of this pandemic and the rate at which it is spreading.
The trajectory for all the affected countries started when 100 confirmed cases were
reported within that country. This helps us in realizing how quickly the number of
confirmed cases has grown worldwide. India recorded its 1 million cases on July 17,
2020 (Fig. 1).
COVID-19 displays clinical symptoms varying from an asymptomatic state to multiple
organ dysfunction syndrome and acute respiratory distress syndrome. According to a
recent study released by the World Health Organization based on laboratory-confirmed
cases, the majority showed clinical characteristics such as fever, the most common
symptom, in 87.9%, dry cough in 67.7%, fatigue in 38.1%, and sputum production in
33.4%. Fewer cases had symptoms like sore throat (13.9%), headache (13.6%), myalgia
(14.8%), and breathlessness (18.6%), while symptoms such as nausea (5.04%), nasal
congestion (4.8%), hemoptysis (0.9%), diarrhea (3.7%), and conjunctival congestion
(0.8%) were seen rarely [3].
At its inception, Coronavirus research was linked with human exposure to suspected
animal species; the sudden outburst and quick spread have shifted the direction of
research to transmission through human contact. The study of COVID-19 cases has
confirmed that the Coronavirus is principally transmitted among humans through the
spread of respiratory droplets via coughing and sneezing [4]. Respiratory droplets can
cover a distance of up to 6 feet (1.8 m). Thus, any human being coming in close contact
with an infected person is at high risk of being exposed to these virus traces and can
contract the Coronavirus. Touch in any form, direct or indirect, with infected surfaces
has been acknowledged as one of the likely routes of Coronavirus spread. There is
evidence that the coronavirus can live on metal and plastic surfaces for three days, on
cardboard for up to 24 h, and on copper for nearly 4 h [5].
As the world struggles with the COVID-19 pandemic, it is essential to follow useful
preventive guidelines to reduce the probability of becoming another fatality. Every
individual and group must adhere to the practices given below; if these practices are
strictly followed, the world may soon see a flattened Coronavirus curve. Curve
flattening means lowering the transmission of the Coronavirus to a level where
available healthcare arrangements can adequately manage the effect of the disease.
1. Hands must be washed often using an alcohol-based sanitizer, or thoroughly with
soap and water at regular intervals when away from home.
2. Practice social distancing: maintain a distance of at least 1 m from others.
3. Make sure you don’t touch your eyes, nose, or mouth with bare hands.
4. Spraying disinfectant on regularly touched surfaces is essential.
5. Try staying at home unless it’s an emergency. Pregnant women and elderly people
with any health conditions should avoid social interactions.
6. One should not sneeze or cough in the open; cover your face with a cloth or use
the elbow pit.
7. One must always wear a mask when surrounded by people. However, care should
be taken while disposing of used masks [6].
With the rate at which COVID19 is spreading across the world, the globe is facing
falling economies and increasing casualties. Regrettably, the human race is still under
persistent threat of contracting the infection, with the situation getting worse every day.
However, researchers worldwide are coming up with technological approaches to deal
with the Coronavirus pandemic’s impacts. These technologies include AI, IoT,
Blockchain, and the upcoming 5G telecommunication networks, which have been at
the forefront [7]. As per the CDC and the WHO, cutting-edge technologies will play
an important role in the fight against the Coronavirus pandemic [8].
In this paper, we focus on the post-lockdown scenario in which schools and colleges
reopen and pending examinations are held. This reopening will lead to considerable
human movement and gathering on campuses. We propose a model in which
precautionary measures are automated with the help of technology, alerting the
administration on any lapse in adequate precautionary measures or on finding symptoms
such as high body temperature in a person entering the facility. The highlights of our
research are the following:
Today, at a time of severe crisis, screening of potential risk bearers is crucial, and it
must be done without human interaction; hence this process must be automated so that
a person can be identified uniquely and preventive measures can be taken if they are
considered a risk.
Machine learning and deep learning models have been used to detect various kinds
of objects and even faces. A wide range of applications use object detection techniques,
yet no model uniquely identifies a person and checks whether a mask is present at the
same time. In the current scenario, there is a need for such a model, so that we can
identify every person by their unique features and thus automate the facemask detection
process along with identity verification. Just detecting whether a person is wearing a
facemask is not enough: according to the World Health Organization, one of the
primary symptoms of COVID-19 is a rise in body temperature. If the fever patterns
of a person can be monitored, it will be easy to take preventive measures and break the
chain of spread.
Due to advancements in the field of IoT, we are surrounded by various types of
sensors. Infrared thermal sensors are the best way to scan and detect body temperature:
scanning is fast, measuring body temperature with an accuracy of ±0.5 °C, and
processing is fast enough for these sensors to detect body temperatures even in larger
groups of people. Another reliable method of scanning for high temperature is thermal
imaging cameras, which work by rendering infrared radiation as visible light. Each
college/university has a well-defined database of students studying in its facility, and
the database can be accessed from any programming language. If the model runs on
the same server where the database resides, computation and processing time will be
much lower. Migrating from a relational database to a NoSQL database will make the
application scalable and make it easy to store data by date for pattern checking.
Database access will also make it easy for the admin to find out who the potential risk
bearers are.
2 Literature Review
Various techniques exist for face detection with varying levels of accuracy and com-
putation speed. The major deciding factor in determining the technique was a balance
between accuracy and performance as the operation is run on a Raspberry Pi 3B+. Results
from the paper “A comparison of CNN-based face and head detectors for real-time video
surveillance applications” suggest that, although CNNs can accomplish a high level of
precision in comparison to old-style detectors, they require high computational resources
which are a constraint for several practical real-time applications [9]. The method of face
recognition developed by P. Viola and M. Jones has appropriate accuracy for the purpose
and can be run on a Raspberry Pi 3B+.
2.4 IoT
We based the embedded system design on systems already in use, since hardware
design was not the primary objective of the paper. A fusion of the methods of
temperature sensor interfacing [26] and the Pi camera library [27] was used to capture
an image of the user’s face and simultaneously record the user’s temperature.
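A minimal sketch of this capture step on a Raspberry Pi, assuming the `picamera` library and a PyMLX90614-style driver for the MLX90614 sensor; the I2C address 0x5A is the sensor's common default.

```python
from smbus2 import SMBus
from mlx90614 import MLX90614      # PyMLX90614-style driver; assumed interface
from picamera import PiCamera

bus = SMBus(1)                     # I2C bus 1 on the Raspberry Pi
sensor = MLX90614(bus, address=0x5A)
camera = PiCamera()

def capture_reading(path="face.jpg"):
    """Capture an image and record the body temperature at the same instant."""
    camera.capture(path)
    temperature = sensor.get_object_1()   # object (body) temperature in deg C
    return path, temperature

img, temp = capture_reading()
print(f"Saved {img}, measured {temp:.1f} C")
```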
4 Algorithm
The main motivation behind the development of the framework in Fig. 2 was to build
a robust system that does not require heavy computing resources or high cost, while at
the same time not compromising on accuracy. We therefore propose an architecture
that is cost-effective and ensures that all the safety protocols are followed by tracking
every individual entering the college.
When the students and the staff enter the college, they are required to go through the
following process:
1. Their image is captured by the camera using the face detection model, and their temperature is read by the MLX90614 infrared temperature sensor once a face is detected.
2. The image and temperature data are sent to the central server, where the details of each student and staff member are stored.
3. The machine learning models are applied to the captured image to identify the
student and check if a mask is present or not.
4. For face recognition, we have used OpenFace.
Figure 3 shows the working of the system for a single individual when he/she approaches
the entry point of the college. The process is repeated continuously in a loop for all the
individuals entering the college.
5 Software Design
Initially, the input is provided in the form of a captured image. The image is sent to the application server as soon as a face is detected. At the application server, its features are extracted. The features are compared with the stored features for the face recognition part, and the same features are also passed to the face mask classifier model to determine whether the student is wearing a mask. If the student's face matches and the other parameters, namely body temperature and mask detection, are within permissible limits, the student is permitted to enter the campus. If the student's face is unrecognized, or any parameter such as the face mask or body temperature is out of bounds, an alert service notifies the security personnel sitting at a safe social distance from the entry/exit point. The software stack, ranging from the Python programming language to human-computer interaction through the face detection, mask detection, and face recognition models, with Firebase for the user interface, has been analyzed in detail.
Face detection is performed to find the trigger to capture an image from the camera and
simultaneously record the temperature at that instant. The entire procedure takes place
on a Raspberry Pi 3B+. This necessitates an object detection algorithm that is robust and can run in real time without using too much processing power, since processing power
is a limited resource on this platform. The limitations and demands of the algorithm are
satisfied with the Viola-Jones Object Detection framework. When implemented on 384
× 288-pixel images, faces are detected at 15 frames per second on a 700 MHz Intel
Pentium III, which is an x86 processor from 1999 [28]. The performance of the system
and its accuracy suit the application perfectly.
The algorithm has four stages:
– Haar features: used to match human faces, since all faces share some common characteristics, e.g. the upper cheeks are lighter than the eyes and the eyes are darker than the nose bridge.
– Integral image: rectangle features are quick to compute using an intermediate representation of the image known as an integral image, which lets any rectangular sum be computed with four array references [28]. The integral image method thus reduces the number of calculations and saves a lot of time.
– AdaBoost training: selects the most discriminative features and trains weak classifiers on them.
– Cascading classifiers: classifiers work in a sequence, with simpler classifiers first in line, rejecting the majority of sub-windows before more complex classifiers are even necessary. This results in low false-positive rates. The detection process resembles a degenerate decision tree and is referred to as 'cascading classifiers' [28].
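As a rough illustration, the sketch below runs OpenCV's bundled frontal-face Haar cascade (an implementation of this framework) on a camera frame; the cascade file name and the scaleFactor/minNeighbors values are illustrative assumptions, not the settings used in this work.

```python
# A minimal sketch of Viola-Jones face detection with OpenCV's bundled
# Haar cascade; parameters below are illustrative, not the authors' settings.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # Pi camera exposed as a video device
ret, frame = cap.read()
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade detection rate against speed,
    # which matters on a Raspberry Pi 3B+.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_roi = frame[y:y + h, x:x + w]   # trigger image capture here
cap.release()
```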
For facial recognition, we use OpenFace [29], an open-source deep learning face recognition model. It is based on the FaceNet paper [30] by Google researchers. OpenFace is implemented in Python and Torch, allowing the system to run smoothly on a CPU as well as with GPU acceleration. The recognition pipeline proceeds as follows:
1. Pre-trained models from libraries such as OpenCV [37] or dlib are used to detect faces.
2. The detected faces are then fed into the neural network.
3. A deep neural network embeds each face on a 128-dimensional unit hypersphere. The embedding is a generic representation of any face and, unlike other representations, has a useful property: a larger distance between two face embeddings means the faces are likely not of the same person. This makes clustering, similarity detection, and classification tasks simpler than with other face recognition approaches, where the Euclidean distance between features is not meaningful.
4. Any preferred clustering or classification method can then be applied to the embeddings to complete the recognition task.
Working
We use the pre-trained model to compare the embedding vectors of the images stored in the file system with the embedding vector of the image captured by the webcam, as illustrated in Fig. 6.
All the images stored in the file system are converted to a dictionary with names as keys and embedding vectors as values. When handling an image, face detection is first performed to find bounding boxes around faces; we use the same face detection code that runs on the Raspberry Pi to extract the face Region of Interest from the captured image. Before passing the image to the neural network, it is resized to 96 × 96 pixels, since the deep neural network expects a fixed (96 × 96) input image size. When the image is fed into the model, we produce the 128-dimensional embedding vector for the unknown image with the help of the pre-trained model. At the same time, we load the stored embedding vectors for the known dataset. To compare two images for similarity, we compute the distance between their embeddings. This can be done by computing either the Euclidean (L2)
distance or the cosine distance between the 128-dimensional vectors. If the distance is less than a threshold (which is a hyperparameter), the faces in the two images belong to the same person; otherwise, they are two different people.
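The following is a minimal sketch of this comparison step, where `embed` stands in for the OpenFace model's forward pass, `known` for the dictionary built from the file system, and the threshold value is a placeholder hyperparameter.

```python
# A hedged sketch of the embedding comparison described above; `embed` and
# `known` are assumptions standing in for the model and the stored dictionary.
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(face_img, known, embed, threshold=0.75):
    """Return the matching name, or None if every distance exceeds the
    threshold (a hyperparameter, as noted above)."""
    vec = embed(face_img)                  # 128-d embedding of the capture
    name, best = None, threshold
    for person, stored_vec in known.items():
        d = euclidean(vec, stored_vec)
        if d < best:
            name, best = person, d
    return name
```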
Once the face ROI is extracted, we feed it into our face mask classifier model to obtain predictions for that ROI. Finally, we determine the class label based on the probability scores returned by the mask classifier and assign the corresponding class name, "with_mask" or "without_mask", to the captured image of the student.
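A hedged sketch of this classification step is given below; the saved-model path, the 224 × 224 input size, and the class ordering are assumptions for illustration, not the exact configuration used in the paper.

```python
# Illustrative only: a MobileNetV2-based mask classifier loaded from a saved
# Keras model; the file name, input size, and label order are assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("mask_classifier.h5")      # hypothetical path

def classify_mask(face_roi):
    img = cv2.resize(face_roi, (224, 224)).astype("float32") / 255.0
    probs = model.predict(np.expand_dims(img, axis=0))[0]
    labels = ["with_mask", "without_mask"]    # class order is an assumption
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])
```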
5.4 Firebase
Firebase Firestore is a horizontally scaling NoSQL cloud database service provided by Google. Being serverless, Firestore can be integrated easily with any platform, and because its services run in the cloud, they are available from anywhere. The Firebase cloud messaging service provides a way to send notifications to the admin about a potential carrier of the virus. Since Firestore scales horizontally, the database is highly scalable; if new functionality is required at any point, it can be integrated in the next versions of our database, so the scope of the project can grow.
Firebase is used as follows:
1. First, the captured image is transferred to the central server along with the temperature.
2. Face recognition algorithms then predict whether the user is wearing a mask and assign an identity to the captured image.
3. The complete data packet is checked for any vulnerabilities or null values.
4. If the checks pass, the data is stored in Firebase Firestore under the current date. If the temperature reading is above normal or the student is not wearing a mask, the admin/security personnel are notified.
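The sketch below illustrates steps 3 and 4 with the firebase_admin SDK; the collection and field names, the service-account path, and the alert threshold are illustrative assumptions.

```python
# A minimal sketch with the firebase_admin SDK; collection/field names and
# the alert threshold are illustrative, not taken from the paper.
from datetime import date
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("service-account.json")   # hypothetical key file
firebase_admin.initialize_app(cred)
db = firestore.client()

def log_entry(student_id, name, temperature, mask):
    record = {"name": name, "temperature": temperature, "mask": mask,
              "timestamp": firestore.SERVER_TIMESTAMP}
    # One document per day, one sub-document per student, as described above.
    db.collection("entries").document(str(date.today())) \
      .collection("students").document(student_id).set(record)
    if temperature > 37.5 or mask == "without_mask":   # threshold assumed
        db.collection("alerts").add(record)            # picked up by admin UI
```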
6 Hardware Design
We use the Raspberry Pi 3B+ as a platform to capture user images and temperature
readings. The Raspberry Pi 3B+ has an ARMv8 64-bit SoC with Wi-Fi and Bluetooth
support. Gigabit Ethernet is also supported over the USB 2.0 connection [40]. This allows
the Raspberry Pi to perform basic face detection and communicate with the central server
effectively. The camera used is the Raspberry Pi Camera v2, which interfaces over the
Camera Serial Interface (CSI) port of the Raspberry Pi 3B+ [40]. It supports many video
resolutions and has libraries to access the camera feed [41]. The MLX90614 [42] (3.3 V) infrared temperature sensor is used to measure user temperature. The sensor interfaces over the i2c hardware bus through the i2c_bcm2708 kernel module and the libi2c library [26]. The camera and temperature sensor have to be adjusted so that the field of view of the sensor is aligned with the centre of the camera frame (Fig. 8).
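As a brief illustration of the sensor interface, the following sketch reads the MLX90614 object-temperature register (0x07, in units of 0.02 K) over i2c with the Python smbus module; the default device address 0x5A is assumed.

```python
# A hedged sketch of reading the MLX90614 over the i2c bus with smbus;
# register 0x07 holds the object temperature in units of 0.02 K.
import smbus

BUS = 1            # i2c-1 on the Raspberry Pi 3B+
ADDR = 0x5A        # default MLX90614 address
OBJ_TEMP_REG = 0x07

def read_object_temp_c():
    bus = smbus.SMBus(BUS)
    raw = bus.read_word_data(ADDR, OBJ_TEMP_REG)   # 16-bit little-endian
    bus.close()
    return raw * 0.02 - 273.15                     # Kelvin -> Celsius
```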
7 Results
The main motivation behind developing such a framework was to build a robust system that does not require heavy computing resources and, at the same time, does not compromise on accuracy. The models we used had to be computationally efficient and deployable to embedded systems (Raspberry Pi, Google Coral, etc.). This is the very reason we used the OpenFace [29] model for facial recognition, transfer learning on the MobileNetV2 [25] model for the face mask classifier, and Viola-Jones [24] for face detection.
Training was done on the LFW dataset [51] for the OpenFace [32] Keras model, which gave an accuracy of around (93.80 ± 1.3)%, along with the other metrics reported in Table 2.
The face mask classifier, based on transfer learning from MobileNetV2 [25], was trained on a custom dataset comprising around 10,563 images downloaded from Kaggle [38] and RFID [39], and again achieved a precision of around 93% under normal conditions. Looking at Fig. 11, we can see that there are only small
signs of overfitting, and Fig. 10 shows the evaluation metrics per epoch on the testing dataset, which comprises 20% of the total images present in the custom dataset.
When the image captured by the microcontroller is fed into the model by the application server after pre-processing, the models return the probabilities of the predictions made and the name of the student recognized. For visualization purposes, we color the bounding boxes red for a student without a mask and green for a student with a mask. We then also print the
class name (i.e. "with_mask" or "without_mask"), the probability, and the name recognized by the models at the top of the bounding box, as shown in Fig. 12 and Fig. 13.
For any good system, the user interface is one of the most significant aspects, since it is through the UI that a person interacts with the system conveniently. Keeping in mind the
convenience of the administrator and the security staff, we designed an interface that serves two fundamental purposes: maintaining a record of each student with name, timestamp, mask status, and body temperature, and tracking any anomalies in the mask-wearing and body temperature measurements of every student entering the college. In the Firebase database, the complete information for each student is stored as a packet indexed by the current date and day. This makes the system more scalable and keeps the data organized for analysis by the administrator. As far as alert generation is concerned, the alert notification is produced by Firebase itself as a push notification/email, which makes it even better suited to the system. Fig. 14 and Fig. 15 show the UI and the Firebase database, respectively, as used in our system.
8 Limitations
Our present strategy for recognizing whether an individual is wearing a mask is a two-step process that performs face detection and then applies a classifier to the detected face. The issue with this approach is that a mask occludes part of the face; if enough of the face is occluded, the face cannot be detected, and hence the face mask detector is never applied.
Another issue is the reliability of the internet connection at the site where the system is set up. The connection must have low latency and high bandwidth to send both the alert to security and the image to the application server for further processing. The power supply of the system must also be stable, as all the components of the security system run on power.
9 Future Work
We obtained quite fair results by simply comparing Euclidean distances to recognize a face. However, to scale the system to a production setting, one should consider applying affine transformations before feeding the image to the neural network.
To further improve our face mask detection model, we need to gather more real images of people wearing masks. Additionally, we need to gather images of faces that may "confuse" our classifier into thinking a person is wearing a mask when in fact they are not; potential examples include shirts wrapped around faces, a handkerchief over the mouth, and so on. Finally, we should consider training a dedicated two-class object detector instead of a simple image classifier.
10 Conclusion
Since the onset of COVID-19, researchers have worked on technological solutions to combat the spread of the coronavirus pandemic, with technologies such as IoT and Artificial Intelligence as the front runners. Our paper discussed using IoT-based sensors and deep learning algorithms to detect breaches of recommended precautionary measures, such as the use of masks in public places, and to deny campus entry to individuals showing COVID-19 symptoms, in our case high body temperature. Our model also records every student's body temperature in a central database on a day-to-day basis, raises an alarm if the generated pattern shows a gradual rise in body temperature, and helps the administration monitor safety standards within the campus. This automated approach prevents security personnel from coming into contact with every student or visitor and reduces the chance of human error in identifying a person entering the facility with COVID-19 symptoms.
References
1. WHO Homepage. [Link] Accessed 16
July 2020
2. Ourworldindata Homepage. [Link] Accessed 14 July 2020
3. Report WHO-China Joint Mission Coronavirus Disease 2019 (COVID-19), February
2020. [Link]
[Link]. Accessed 14 July 2020
4. Modes of Transmission of Virus Causing COVID-19: Implications for IPC Precaution Rec-
ommendations, April 2020. [Link]
of-transmission%-of-virus-causing-covid-19-implications-for-ipc-precaution-recommend
ations. Accessed 14 July 2020
5. Study Suggests New Coronavirus May Remain on Surfaces for Days, March
2020. [Link]
virus-may-remain-surfaces-days. Accessed 15 July 2020
6. Coronavirus Disease (COVID-19) Advice for the Public: When and How to Use Masks, April
2020. [Link]
lic/when-and-how-to-use-masks. Accessed 15 July 2020
7. Ting, D.S.W., Carin, L., Dzau, V., Wong, T.Y.: Digital technology and COVID-19. Nat. Med.
26(4), 459–461 (2020)
8. Digital Technology For Covid-19 Response, April 2020. [Link]
ail/03-04-2020-digital-technology-for-%covid-19-response. Accessed 16 July 2020
9. Nguyen-Meidine, L.T., Granger, E., Kiran, M., Blais-Morin, L.: A comparison of CNN-based
face and head detectors for real-time video surveillance applications. In: 2017 Seventh Inter-
national Conference on Image Processing Theory, Tools and Applications (IPTA), Montreal,
QC, pp. 1–7 (2017). [Link]
10. Alabort-i-medina, J., Antonakos, E., Booth, J., Snape, P.: Menpo: a comprehensive plat-
form for parametric image alignment and visual deformable models categories and subject
descriptors, pp. 3–6 (2014)
11. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild.
In: CVPR (2012)
12. Morency, L.-P., Whitehill, J., Movellan, J.R.: Generalized adaptive view-based appearance
model: integrated frame-work for monocular head pose estimation. In: FG (2008)
13. Fanelli, G., Gall, J., Gool, L.V.: Real time head pose estimation with random regression
forests. In: CVPR, pp. 617–624 (2011)
14. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting
with constrained local models. In: CVPR (2013)
15. Asthana, A., Zafeiriou, S., Cheng, S. Pantic, M.: Incremental face alignment in the wild. In:
CVPR (2014)
16. Hansen, D.W., Ji, Q.: In the eye of the beholder: a survey of models for eyes and gaze. IEEE
Trans. Pattern Anal. Mach. Intell. 32, 478–500 (2010)
17. Lidegaard, M., Hansen, D.W., Krüger, N.: Head mounted device for point-of-gaze estima-
tion in three dimensions. In: Proceedings of the Symposium on Eye Tracking Research and
Applications - ETRA 2014 (2014)
18. Świrski, L., Bulling, A., Dodgson, N.A.: Robust real-time pupil tracking in highly off-axis
images. In: Proceedings of ETRA (2012)
19. Ferhat, O., Vilarino, F.: A cheap portable eye–tracker solution for common setups. In: 3rd
International Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction (2013)
20. Wood, E., Bulling, A.: EyeTab: model-based gaze estimation on unmodified tablet computers.
In: Proceedings of ETRA, March 2014
21. Zielinski, P.: Opengazer: open-source gaze tracker for ordinary webcams (2007)
22. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical
image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition,
Miami, FL, pp. 248–255 (2009). [Link]
23. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE Conference on computer Vision and Pattern Recognition, pp. 770–778
(2016)
25. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals
and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, pp. 4510–4520 (2018). [Link]
00474
26. Sensor. [Link] Accessed 20 Apr 2020
27. GitHub Repository. [Link] Accessed 05 June 2020
28. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In:
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition. CVPR 2001, Kauai, HI, USA, p. I-I (2001) [Link]
990517
29. Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: a general-purpose face recogni-
tion library with mobile applications. CMU-CS-16-118, CMU School of Computer Science,
Technical report (2016)
30. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition
and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Boston, MA, pp. 815–823 (2015). [Link]
31. TensorFlow Homepage. [Link] Accessed 19 June 2020
32. GitHub Repository. [Link]
Accessed 16 Apr 2020
33. Lungu, I.A., Hu, Y., Liu, S.: Multi-resolution siamese networks for one-shot learning. In: 2020
2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS),
Genova, Italy, pp. 183–187 (2020). [Link]
34. Bromley, J., et al.: Signature verification using a siamese time delay neural network. Int. J.
Pattern Recogn. Artif. Intell. 7(04), 669–688 (1993)
35. Koch, G.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning
Workshop (2015)
36. LFW Dataset. [Link] Accessed
02 May 2020
37. OpenCV Homepage. [Link] Accessed 18 June 2020
38. Kaggle Datasets. [Link] Accessed 28 June 2020
39. GitHub Repository. [Link]
Accessed 29 Apr 2020
40. Raspberry Pi Products. [Link]
Accessed 19 Apr 2020
41. Raspberry Pi Products. [Link] Accessed
19 Apr 2020
42. Sparkfun Sensors Datasheets. [Link]
MLX90614_rev001.pdf. Accessed 20 Apr 2020
43. Viola, P., Jones, M.J.: Robust real-time face detection. J. Comput. Vis. 57(2), 137–154 (2004)
44. Yan, J., Zhang, X., Lei, Z., Li, S.Z.: Real-time high-performance deformable model for face
detection in the wild
45. Liu, W., et al.: SSD: single shot multibox detector. CoRR, abs/1512.02325 (2015)
46. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal
networks. CoRR, abs/1506.01497 (2015)
47. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional
networks. CoRR, abs/1605.06409 (2016)
48. Kim, K., Cheon, Y., Hong, S., Roh, B., Park, M.: PVANET: deep but lightweight neural
networks for real-time object detection. CoRR, abs/1608.08021 (2016)
49. Vu, T., Osokin, A., Laptev, I.: Context-aware CNNs for person head detection. In: ICCV
(2015)
50. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR,abs/1612.08242 (2016)
51. Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a
database for studying face recognition in unconstrained environments. Technical Report
07-49, University of Massachusetts, Amherst, October 2007
A 2D ResU-Net Powered Segmentation
of Thoracic Organs at Risk Using
Computed Tomography Images
Abstract. The recent advances in the field of computer vision have led
to the wide use of Convolutional Neural Networks (CNNs) in organ seg-
mentation of computed tomography (CT) images. Image-guided radia-
tion therapy requires the accurate segmentation of organs at risk (OARs).
In this paper, the proposed model is a 2D ResU-Net network to auto-
matically segment thoracic organs at risk in computed tomography (CT)
images. The architecture consists of a downsampling path for capturing
features and a symmetric upsampling path for obtaining precise local-
ization. The proposed approach achieves a 0.93 dice metric (DSC) and
0.26 hausdorff distance (HD) after using ImageNet stats for normalizing
and using pre-trained weights.
1 Introduction
Lung cancer is one of the leading causes of death in both males and females, contributing 26.8% of all cancer deaths [1]. There were approximately 3.05 million cancer survivors treated with radiation, accounting for around 29% of all cancer survivors in 2016, and the number of radiation-treated cancer survivors is projected to reach 4.17 million by 2030 [1]. The introduction of procedures such as stereotactic body radiation therapy and intensity-modulated radiation therapy has improved radiation therapy techniques; therefore, protecting normal organs becomes a primary concern [2].
During radiation treatment, it is necessary to segment organs at risk correctly in the computed tomography (CT) images to avoid exposing them to a very high radiation dose. Image segmentation has had a significant impact on diagnosis and treatment, as it helps doctors view the internal organs.
2 Related Work
Several interesting works using deep neural networks to segment CT images have appeared in recent years. In [4], Olaf Ronneberger et al. introduced a model based on the simple U-Net architecture for biomedical image segmentation. Other modifications have also been proposed, such as localization and organ-specific U-Net models, a pixel-shuffle method on a fully convolutional U-Net architecture, and a two-stage encoder-decoder model with coarse and fine segmentation in [5]. The authors in [6] used multi-task learning on the U-Net architecture. Another U-Net model, with each layer containing a context pathway and a localization pathway, and a 2D residual U-Net with dilation rates, was proposed in [7]. Moreover, a dilated U-Net architecture with convolution, dilation, ReLU, batch normalization, and average pooling is also used in [7]. These architectures use 2D convolutions, but with higher computational requirements. In other research, 3D convolutions have also been used, for example applying VB-Net at two resolutions with a single-class Dice loss in [8], a 3D enhanced multi-scale network with a residual V-Net and 3D dilated convolutions in [9], and a simple dense V-Net with post-processing in [10]. In [11], the authors combined 3D and 2D convolutions in a fully convolutional 3D network.
3 Proposed Methodology
3.1 Data Collection and Pre-processing
The experimental data was collected from the SegTHOR 2019 training and testing datasets. The training dataset includes 40 patients (7390 slices), and the testing dataset contains 20 patients (3694 slices). The provided data is in the Neuroimaging Informatics Technology Initiative (NIfTI) .nii format; it was converted into the NumPy .npy format [13] and later to PNG format using matplotlib and PIL. A sample training image is shown in Fig. 1a, and a masked image is shown in Fig. 1b.
Pre-processing, often overlooked, is a major concern in terms of performance. Generally, there are bright regions in the images compared with external objects, which have a strong effect on the organ voxels when normalizing over the original intensity range. For this reason, normalization was taken as the key step. Re-sampling the images to the same voxel spacing reduced the variability in size; it also helped bring the testing-case distribution closer to the training-case distribution [2]. The CT scans have 512 × 512 pixels, with spatial resolution varying from 0.90 to 1.37 mm; the most frequent spatial resolution is 0.98 × 0.98 × 2.5 mm³. Each 3D CT scan was converted into 2.5D (2D) images formed by stacking the previous and next slices, and the values were normalized to a 256-level range. For visualization of the test data, the 3D CT scan was cut into slices along the axial, sagittal, and coronal planes; the 3D visualization of the testing data is depicted in Fig. 2.
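A minimal sketch of this 2.5D construction is shown below, assuming `volume` is a (slices, 512, 512) NumPy array; note that the first and last slices are necessarily dropped, a point that matters later when reassembling the 3D volume.

```python
# A minimal sketch of the 2.5D construction described above: each training
# sample stacks the previous, current, and next axial slice as channels and
# is normalized to the 0-255 range. `volume` is a (slices, 512, 512) array.
import numpy as np

def to_2p5d(volume):
    lo, hi = volume.min(), volume.max()
    norm = ((volume - lo) / (hi - lo) * 255).astype(np.uint8)
    samples = []
    for i in range(1, norm.shape[0] - 1):          # first/last slice dropped
        samples.append(np.stack([norm[i - 1], norm[i], norm[i + 1]], axis=-1))
    return np.array(samples)                       # (slices-2, 512, 512, 3)
```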
The overlap Dice metric (DSC) has been used to measure the overlap between the segmented area produced by the proposed algorithm and the ground truth [3]:

DSC(X, Y) = (2|X ∩ Y|) / (|X| + |Y|)    (1)
The accuracy metric was also monitored in our study. Research has demonstrated that for highly unbalanced segmentations, the Dice loss yields better results [15]; in this paper, the Dice loss has been used to train the model [2]. The accuracy metric shows high instability, and therefore the localization network takes more time to converge. We also used a flattened cross-entropy loss function, which gave nearly the same results as the Dice loss (Fig. 3).
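A small PyTorch sketch of the Dice loss implied by Eq. (1) follows; the smoothing constant is a common convention, not a value reported in the paper.

```python
# A sketch of the Dice loss implied by Eq. (1); the smoothing constant is a
# common convention (assumed), added to avoid division by zero.
import torch

def dice_loss(pred, target, smooth=1.0):
    # pred: probabilities after sigmoid/softmax, target: binary mask
    pred, target = pred.flatten(), target.flatten()
    intersection = (pred * target).sum()
    dsc = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dsc
```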
Fig. 3. The loss surfaces of ResNet-56 with/without skip connections. The proposed
filter normalization scheme is used to enable comparisons of sharpness/flatness between
the two figures.
between the contraction and the expansion layers of the U-Net. This layer makes use of two 3 × 3 convolutional neural network (CNN) layers preceded by a 2 × 2 up-convolution layer. Like the contraction path on the left, the right expanding section is also formed of many expansion blocks, each of which passes its input through two 3 × 3 convolution layers. To maintain symmetry, only half of the feature maps are carried forward after each block. The number of expansion and contraction blocks on the two sides is equal. The resulting mapping is fed to another 3 × 3 CNN layer, in which the number of feature maps equals the number of desired segments.
The ResU-Net model, shown in Fig. 4, was implemented using the PyTorch framework. ResU-Net achieves appreciable segmentation accuracy compared with many classical convolutional networks, and the residual connections help reduce training difficulty [2]. On the other hand, training a deeper network requires more memory and training time: combining residual connections with a deeper network, as shown in Fig. 4, yields better or equal performance but takes much longer to train.
Dilated convolutions were also attempted, as shown in Fig. 5, with more tunable parameters including the dilation rates; the performance was similar, and hence no further investigation was carried out.
3.6 Training
The proposed model was trained with a weight decay of 1e−2 and a learning rate of 1e−4; the learning rate versus loss curve is shown in Fig. 6. Learning-rate slices were then chosen for different epochs, and the model was trained for ten epochs. The model uses pixel shuffling and average pooling.
The model has 19,946,396 trainable parameters and 11,166,912 non-trainable parameters. ImageNet statistics were used for normalizing the data.
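As a hedged sketch of this setup, the snippet below applies ImageNet statistics for normalization together with the stated weight decay (1e−2) and learning rate (1e−4); the choice of AdamW as the optimizer is an assumption, not reported in the paper.

```python
# A hedged sketch of the training setup: ImageNet statistics for
# normalization plus the stated weight decay and learning rate.
# The optimizer choice (AdamW) is an assumption.
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats
                                 std=[0.229, 0.224, 0.225])

def make_optimizer(model):
    return torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```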
For the task of image super-resolution, Shi et al. [17] proposed using pixel shuffle as an upsampling operator. This operator rearranges input channels to produce a feature map of higher resolution, as shown in Fig. 7; notably, this technique avoids the checkerboard artifacts in the output image. Later, the same concept was employed for semantic segmentation tasks [18, 19]. The loss curve per epoch is shown in Fig. 8.
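For illustration, PyTorch's nn.PixelShuffle implements exactly this rearrangement: an (N, C·r², H, W) tensor becomes (N, C, H·r, W·r).

```python
# nn.PixelShuffle rearranges channels into spatial resolution; here r = 2,
# so 16 input channels become 4 output channels at twice the resolution.
import torch
import torch.nn as nn

up = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(1, 16, 32, 32)     # (N, C*r^2, H, W)
y = up(x)                          # -> (1, 4, 64, 64)
```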
Due to the conversion to 2.5D images, the total number of images formed is two fewer (the first and the last slice) than in the given training data. So, after converting the 2.5D results back to a 3D image by stacking all the 2.5D images depth-wise, void images are added to the 3D image. It was noticed that the missing first and last images are void images in all cases.
4 Experimental Results
The proposed algorithm was implemented in Python 3.6 on a 64-bit Ubuntu Linux platform, inside a Docker container on an Nvidia DGX-1 GPU. The proposed method was validated on the 20 CT scans of the given test data. No external data was used, and the model was trained from scratch. The evaluation metrics are the overlap Dice metric (DSC) and the Hausdorff distance, given in Eqs. 1 and 2. The best result obtained by the proposed algorithm is shown in Fig. 9. Moreover, a comparative result with a recent previous approach is given in Table 1; it is evident from Table 1 that the proposed approach achieves better performance in terms of both DSC and HD. A sample predicted output and ground truth are also shown in Fig. 10.
Fig. 10. Comparison between ground truth and predictions of masks and CT scans of
the validation set
5 Discussion
The networks trained included U-Net with ResNet34 and ResNet50 backbones, but the results and metrics were similar and approximately equal. This network uses a 2D CNN for training, yet it achieves similar or better results than 3D CNN networks such as V-Net or VB-Net [8]. As a result, there are fewer parameters to train, and the model trains faster and more cheaply, with excellent efficiency in the results. A few lessons on convolutional neural network implementation were learned, which are discussed below.
6 Conclusion
The 3D CT scans were converted into 2D images to train our model, so there is some loss in slicing. State-of-the-art architecture was used, and that helped a lot in achieving high accuracy. Without ResNet18, the single-class Dice metric was 0.39. Pre-trained weights for ResNet18, downloaded from the torchvision models, were used; after using ImageNet statistics for normalization and pre-trained weights, the accuracy graph showed a large jump. This methodology gives more accurate and more robust segmentation compared with manual segmentation. The proposed model was applied to the test dataset, and the results are depicted in Table 1.
References
1. Cancer - World Health Organization. [Link]
2. Feng, X., Qing, K., Tustison, N.J., Meyer, C.H., Chen, Q.: Deep convolutional
neural network for segmentation of thoracicorgans-at-risk using cropped 3D images.
Med. Phys. (2019)
3. Trullo, R., Petitjean, C., Ruan, S., Dubray, B., Nie, D., Shen, D.: Segmentation
of organs at risk in thoracic CT images using a sharpmask architecture and con-
ditional random fields. In: IEEE International Symposium on Biomedical Imaging
(ISBI), pp. 1003–1006 (2017)
4. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
[Link] 28
5. Zhang, L., Wang, L., Huang, Y., Chen, H.: Segmentation of thoracic organs at risk
in CT images combining coarse and fine network. In: SegTHOR ISBI (2019)
6. He, T., Guo, J., Wang, J., Xu, X., Yi, Z.: Multi-task learning for the segmentation
of thoracic organs at risk in CT images. In: SegTHOR ISBI (2019)
7. Vesal, S., Ravikumar, N., Maier, A.: A 2D dilated residual U-Net for multi-organ
segmentation in thoracic CT. arXiv preprint arXiv:1905.07710 (2019)
8. Han, M., et al.: Segmentation of CT thoracic organs by multi-resolution VB-nets.
In: SegTHOR ISBI (2019)
9. Wang, Q., et al.: 3D enhanced multi-scale network for thoracic organs segmenta-
tion. In: SegTHOR ISBI (2019)
10. Feng, M., Huang, W., Wang, Y., Xie, Y.: Multi-organ segmentation using simplified
dense V-net with post-processing. In: SegTHOR ISBI (2019)
11. van Harten, L.D., Noothout, J.M., Verhoeff, J.J., Wolterink, J.M., Isgum, I.: Auto-
matic segmentation of organs at risk in thoracic CT scans by combining 2D and
3D convolutional neural networks. In: SegTHOR ISBI (2019)
12. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional
encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal.
Mach. Intell. 39(12), 2481–2495 (2017)
13. Gibson, E., et al.: Niftynet: a deep-learning platform for medical imaging. Comput.
Methods Programs Biomed. 158, 113–122 (2018)
14. Kim, S., Jang, Y., Han, K., Shim, H., Chang, H.J.: A cascaded two-step approach
for segmentation of thoracic organs. In: CEUR Workshop Proceedings, vol. 2349.
CEUR-WS (2019)
15. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised
dice overlap as a deep learning loss function for highly unbalanced segmentations.
In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp.
240–248. Springer, Cham (2017). [Link] 28
16. Lambert, Z., Petitjean, C., Dubray, B., Ruan, S.: SegTHOR: Segmentation of Tho-
racic Organs at Risk in CT images. arXiv preprint arXiv:1912.05950 (2019)
17. Shi, W., et al.: Real-time single image and video super-resolution using an efficient
sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)
18. Chen, K., Kun, F., Yan, M., Gao, X., Sun, X., Wei, X.: Semantic segmentation of
aerial images with shuffling convolutional neural networks. IEEE Geosci. Remote
Sens. Lett. 15(2), 173–177 (2018)
19. Gao, H., Yuan, H., Wang, Z., Ji, S.: Pixel deconvolutional networks. arXiv preprint
arXiv:1705.06820 (2017)
20. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deeply Improved Sparse Coding
for Image Super-Resolution, ArXiv 2015, abs/1507.08905
21. Boureau, Y., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in
vision algorithms. In: Proceedings of International Conference on Machine learning
(ICML 2010), vol. 28 (2010)
A Compact Shape Descriptor Using Empirical
Mode Decomposition to Detect Malignancy
in Breast Tumour
Jyothi Nagar, Pragathi Nagar, Nizampet (S.O), Hyderabad 500090, Telangana, India
Abstract. Breast cancer is the most common cancer in India and the world. Mammograms help radiologists detect abnormalities in the breast, and analysis of breast lesions helps doctors detect cancer in its early stages. Breast lesion contours are characterized by their shape: malignant lesion contours have spiculated and ill-defined shapes, while benign ones have circular and lobulated shapes. In the present work, we propose a method to classify breast lesion contours as benign/malignant using the empirical mode decomposition (EMD) technique. Initially, the two-dimensional contours of breast lesions are compacted into a 1D signature. The 1D signatures are then decomposed into intrinsic mode functions (IMFs) by the EMD algorithm, and statistical features are calculated from these IMFs. These parameters form an input feature vector, which is then fed to a classifier.
1 Introduction
Breast cancer is the most common cancer in India and the world. According to WHO reports, 2.1 million women are affected by breast cancer each year, and it causes the highest cancer mortality among women [1]. In 2018, nearly 627,000 women died due to breast cancer; approximately 15% of cancer deaths in women are due to breast cancer. Mammography plays a prominent role in the detection of breast cancer in its early stages, and computer-aided diagnosis and detection of masses from mammograms helps radiologists obtain an early indication of breast cancer. A mass is one of the breast abnormalities that radiologists look for during diagnosis. Masses are characterized by their shape: a benign mass is circular or round with a well-defined boundary, whereas a malignant mass is spiculated with a fuzzy boundary. Shape descriptors are therefore very important tools for classifying breast masses; the goal of shape-based descriptors is to measure the spiculation of malignant masses based on their boundary. The complexity of the 1D signature of a mass contour has been studied using fractal analysis, achieving an accuracy of 89% with the ruler method [2]. Several studies have been carried out to classify masses as benign or malignant. Shape features such as compactness (C), fractional concavity (Fcc), spiculation index (SI), and a Fourier-descriptor-based factor (FF) have been calculated to discriminate benign and malignant contours [3, 4]. Pohlman et al. [5] applied fractal analysis to benign and malignant contours of breast masses and achieved an accuracy of 80%. Rangayyan et al. [6] employed fractal analysis based on power spectral analysis to classify 1D signatures of breast contours. Texture features can also be extracted from mammograms to classify masses, since benign masses are homogeneous in nature while malignant masses have heterogeneous textures, and many researchers have contributed papers on the classification of masses using texture features. Yang et al. [7] applied the wave atom transform to extract features and classified the masses using random forest classifiers. Prathibha et al. [8] employed bandlet and orthogonal ripplet type II transforms to extract features and applied a KNN classifier to distinguish normal-benign, normal-malignant, and malignant-benign images. Dhahbi et al. [9] used curvelet moments to classify masses. However, the use of texture features results in a high-dimensional feature vector and increases the computational cost of the classification model [7]. Regardless, many studies have shown that shape-based descriptors are more useful than other descriptors such as texture, color, etc. [10]. In the proposed work, we implement the EMD algorithm to extract features from the 1D signatures of 2D mass contours in order to classify masses. The empirical mode decomposition algorithm was developed by Huang et al. [11] to analyse nonstationary and nonlinear signals. Djemili et al. [12] applied the EMD algorithm and artificial neural networks to classify 1D EEG signals, and Orosco et al. [13] employed EMD for epileptic seizure detection.
In this work, we focus on the extraction of a compact shape feature vector from 2D mass contours. The work proceeds in three steps. In the first step, the 2D contour is mapped into a compact 1D signature using Euclidean distance. In the second step, the 1D signature is further compressed using the empirical mode decomposition algorithm, and statistical features are extracted from the IMFs of the 1D signature. In the third step, the extracted features are given to a classifier to discriminate benign and malignant masses. The proposed model for classifying breast masses is shown in Fig. 1.
Classification
Benign masses are almost circular and well defined, which gives a smooth signature, while malignant masses have spiculated and rugged boundaries. The 1D signature curve of a mass contour is an important component for the diagnosis of benign and malignant tumors or masses due to its invariance properties in Euclidean space; the signature curve does not change with the orientation of the mass contour [14] in the mammogram. Mapping of the 2D contour into a 1D signature is performed by the centralized distance function method, as illustrated by the sketch below.
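A minimal sketch of the centralized distance function, assuming `contour` is an (N, 2) array of (x, y) boundary coordinates:

```python
# The 1D signature is the Euclidean distance from the contour centroid to
# each boundary point, which is invariant to the contour's orientation.
import numpy as np

def signature_1d(contour):
    centroid = contour.mean(axis=0)
    return np.linalg.norm(contour - centroid, axis=1)
```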
The procedure to obtain IMFs from a 1D signature x(t) is summarized in the steps below [12]:
Step 1: Initialize m = 0 and r(t) = x(t).
Step 2: Compute the local minima and local maxima of x(t).
Step 3: Obtain the lower and upper envelopes using cubic spline interpolation, denoted E_l(t) (lower envelope) and E_u(t) (upper envelope).
Step 4: Calculate the mean of the envelopes:
M(t) = (E_l(t) + E_u(t)) / 2
Step 5: Compute the candidate mode-1 IMF, h(t) = x(t) − M(t).
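The sketch below implements one sifting iteration of these steps with SciPy; a full EMD would repeat the sifting until h(t) satisfies the IMF conditions and then extract further IMFs from the residue.

```python
# One sifting iteration of the EMD steps above, using SciPy for the
# cubic-spline envelopes; assumes x has enough interior extrema.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]   # Step 2: local extrema
    minima = argrelextrema(x, np.less)[0]
    e_u = CubicSpline(maxima, x[maxima])(t)    # Step 3: upper envelope
    e_l = CubicSpline(minima, x[minima])(t)    #         lower envelope
    m = (e_l + e_u) / 2.0                      # Step 4: envelope mean
    return x - m                               # Step 5: candidate IMF h(t)
```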
Feature Extraction
Statistical features are extracted from the IMFs of the 1D signature obtained by the EMD algorithm. Along with these IMF features, we also calculated the length of the 1D signature and the area, solidity, and eccentricity of the 2D contour. In total, we computed ten features for each contour in the dataset. These features are then given to different classifiers for validation.
2.4 Classification
Classification is an important step to validate the efficacy of the proposed method. The features extracted by the procedure discussed above are given to different classifiers, namely K-nearest neighbor (KNN), support vector machine (SVM), AdaBoost decision tree, and artificial neural network (ANN), to discriminate benign and malignant mass contours.
The performance of the different classification models is analysed by computing parameters such as accuracy, sensitivity, specificity, and area under the curve (AUC), as in the sketch below.
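An illustrative comparison of the named classifiers on the ten-feature vectors, assuming X (n_samples × 10) and y are prepared beforehand; the split ratio and random seed are placeholders.

```python
# A hedged sketch: fit the classifiers named above and report accuracy/AUC.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(X, y, test_size=0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              stratify=y, random_state=0)
    for name, clf in [("SVM", SVC(probability=True)),
                      ("KNN", KNeighborsClassifier()),
                      ("AdaBoost", AdaBoostClassifier())]:
        clf.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te))
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```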
In the proposed work, the ten extracted features were fed to SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and decision tree classifiers. Table 1 shows the accuracies computed with the different classifiers; among them, the SVM classifier achieved an accuracy of 94.7%. Initially, the classifiers were fed with different feature sets, such as only IMF1 features, entropies of IMF1, IMF2, and IMF3, and 2D contour features, and the accuracies were computed as shown in Table 1.
Different training-to-testing ratios of mass contours were considered for classification. First, we used 20% of the mass contours for testing and 80% for training and achieved accuracies of 94.7%, 86.1%, and 77.3% with the three classifiers. The area under the curve (AUC) is 0.85 with the SVM classifier and the 80:20 training-to-testing ratio, as shown in Fig. 4; likewise, Fig. 5 shows the confusion matrix for the testing images with the SVM kernel. Secondly, we used 25% for testing and 75% for training and obtained accuracies of 83.3%, 66.7%, and 79.2% with SVM, KNN, and decision tree. Finally, we used 50% for testing and 50% for training and obtained accuracies of 75%,
72.9%, and 75%. Therefore, from Table 2 we can conclude that the accuracy obtained with the different numbers of testing images is above 75%.
Table 3 compares our proposed method with existing methods. Our proposed model reports all assessment parameters (accuracy, sensitivity, specificity, and AUC), which are not all specified for the other methods, and it also achieved the highest accuracy, 94.7%. A drawback of our model is that it was tested on fewer mass contours than the other methods.
Table 3. Comparison of accuracy, specificity, sensitivity, and AUC with our proposed method

Feature extraction method | Images | Acc (%) | Sens (%) | Spec (%) | AUC
GaborPCA [15] | 114 | 80 | – | – | –
Fractional concavity and spiculation index [3] | 111 | 82 | – | – | 0.79
Fractal dimension using ruler method and fractional concavity [2] | 111 | – | – | – | 0.82
Proposed method | 97 | 94.7 | 100 | 83 | 0.85
4 Conclusion
In this paper, we proposed a compact shape descriptor based on the empirical mode decomposition algorithm applied to the 1D signature of a 2D mass contour, for the classification of benign and malignant masses. This method can help radiologists in the classification of breast masses. The proposed method was validated using different classifiers and achieved a maximum accuracy of 94.7%. The experimental results show that our proposed method achieved an accuracy of 94.7%, a sensitivity of 100%, and a specificity of 83% in classifying benign and malignant masses.
References
1. [Link]
2. Rangayyan, R.M., Nguyen, T.M.: Fractal analysis of contours of breast masses in mammo-
grams. J. Digit. Imaging (2006). [Link]
3. Rangayyan, R.M., El-Faramawy, N.M., Desautels, J.E.L., Alim, O.A.: Measures of acutance
and shape for classification of breast tumors. IEEE Trans. Med. Imag. 16(6), 799–810 (1997)
4. Rangayyan, R.M., Mudigonda, N.R., Desautels, J.E.L.: Boundary modelling and shape analysis methods for classification of mammographic masses. Med. Biol. Eng. Comput. 38, 487–496 (2000)
5. Pohlman, S., Powell, K.A., Obuchowski, N.A., Chilcote, W.A., Grundfest-Broniatowski, S.:
Quantitative classification of breast tumors in digitized mammograms. Med. Phys. 23(8),
1337–1345 (1996)
6. Rangayyan, R.M., Oloumi, F.: Fractal analysis and classification of breast masses using the
power spectra of signatures of contours. J. Electron. Imaging 21(2), 023018 (2012)
7. Yang, W., Tianhui, L.: A robust feature vector based on waveatom transform for mammo-
graphic mass detection. In: Proceedings of the 4th International Conference on Virtual Reality
(2018)
8. Prathibha, G., Mohan, B.C.: Classification of benign and malignant masses using bandelet
and orthogonal ripplet type II transforms. Comput. Methods Biomech. Biomed. Eng. Imaging
Vis. 6(6), 704–717 (2018)
9. Dhahbi, S., Barhoumi, W., Zagrouba, E.: Breast cancer diagnosis in digitized mammograms
using curvelet moments. Comput. Biol. Med. 64, 79–90 (2015)
10. Rojas-Domínguez, A., Nandi, A.K.: Development of tolerant features for characterization of
masses in mammograms. Comput. Biol. Med. 39(8), 678–688 (2009)
11. Huang, N.E., Shen, Z., Long, S.R., et al.: The empirical mode decomposition and the Hilbert
spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London 454,
903–995 (1998)
12. Djemili, R., Bourouba, H., Korba, M.C.A.: Application of empirical mode decomposition and artificial neural network for the classification of normal and epileptic EEG signals. Biocybern. Biomed. Eng. 36(1), 285–291 (2016)
13. Orosco, L., Laciar, E., Correa, A.G., Torres, A., Graffigna, J.P.: An epileptic seizures detection algorithm based on the empirical mode decomposition of EEG. In: Conference Proceedings of the IEEE Engineering in Medicine and Biology Society (2009)
14. Arica, N., Yarman-Vural, F.T.: A compact shape descriptor based on the beam angle statistics.
In: Bakker, E.M., Lew, Michael S., Huang, T.S., Sebe, N., Zhou, X.S. (eds.) CIVR 2003.
LNCS, vol. 2728, pp. 152–162. Springer, Heidelberg (2003). [Link]
45113-7_16
15. Görgel, P., Sertbas, A., Ucan, O.N.: Mammographical mass detection and classification
using local seed region growing–spherical wavelet transform (LSRG–SWT) hybrid scheme.
Comput. Biol. Med. 43(6), 765–774 (2013)
An Intelligent Sign Communication
Machine for People Impaired
with Hearing and Speaking Abilities
Abstract. People with impaired speaking and hearing abilities use sign language to communicate among themselves, but communicating with the outside world remains a tough task for them. Through this paper, we propose a system to convert Indian Sign Language (ISL), American Sign Language (ASL), and British Sign Language (BSL) hand gestures to the textual format of the respective language, as well as to convert text into the preferred sign language. In this paper, we capture ISL, ASL, and BSL gestures through a web camera. The streaming video of hand gestures is sliced into distinct images in order to match the finger orientation to the corresponding alphabet. Finger orientations are preprocessed as features of the hand gestures, in terms of the angles made by the fingers, the numbers of fingers completely open, semi-open, and fully closed, whether each finger's axis is vertical or horizontal, and the identification of each finger; these features are required for gesture recognition. Implementation is done for alphabets that use a single hand, and the results are explained. After preprocessing, the hand part of the sliced frame, in the form of a masked image, is passed to feature extraction. To classify the different gestures, we used SVM (Support Vector Machine) and CNN (Convolutional Neural Network), testing the probable gesture and recording the accuracy of each algorithm. Implementation is done on our own ISL, BSL, and ASL datasets, created by us using the web cameras of our laptops. Our experimental results show that the proposed methodology can work against different backgrounds, e.g. backgrounds containing different objects or some colored background. For text-to-sign conversion, we create a video that renders the given text in sign language.
1 Introduction
All non-vocal communication requires a particular action for a particular context: movement of the face, flipping of the hands, folding of the fingers, or an action by any other body part is a form of gesture. Gesture recognition is a method of making a machine or computer recognize these actions. The algorithms used by these methods act as mediators between human and machine, enabling a computer to interact with humans naturally, without any physical contact, using only cameras as its eyes. People who are deaf or mute use hand gestures for communication within their community, under the name of sign language. This leads to a kind of isolation between their community and others due to the language barrier, as most people do not learn sign language. If we can program our computers to take input in sign language and convert it into the respective language, or into other languages, either as speech or in textual format, then they can act as mediators and remove the language barrier. The difference between communities can be minimized and, most importantly, knowing sign language becomes far more useful in this high-tech world, as it can be translated to English and vice versa. All of this points to the need for a system that can act as a translator, converting sign language to the desired language in the desired format, so that people from different language backgrounds can converse with literate people who, owing to some disability, know only sign language.
Sign languages share grammatical devices such as the use of pauses and full stops, simultaneity, hand postures, hand placement, orientation, motion of the head, and facial gestures. As a country, India is highly diverse in terms of culture, religion, beliefs, and, above all, languages, so no single standard sign language has been adopted in India. Various social groups use Indian Sign Language with native and historical variations across different parts of the country, but the skeleton of the language is similar for most gestures. Work on the system of contrastive relationships among the elements that constitute the fundamental components of ISL started in the 1970s: with help from Woodward and the National Science Foundation, USA, Vasishta and Wilson visited India and collected signs from different points in the country for linguistic analysis.
The organization of the paper is as follows: Sect. 2 reviews the methods and technologies available in the literature; Sect. 3 explains the proposed sign language recognition system, which uses algorithms for skin cropping and an SVM (Support Vector Machine); Sect. 4 covers the implementation results; and Sect. 5 gives the discussion and conclusion.
2 Literature Survey
This paper [14] proposes the HSI color model for segmentation of images instead of the RGB model, since the HSI model works better for skin color recognition. The optimal H and S values for the hand, as specified in [14], are H < 25 or H > 230 and S < 25 or S > 230. After this, the Euclidean distance formula is used to evaluate the distance between the centroid of the palm and the fingers. The distance transform method is used to identify the centroid of the hand: the pixel with the maximum intensity becomes the centroid. To extract each fingertip, the farthest point from the centroid is selected, and every finger is identified from predefined sign gestures. To recognize a semi-open finger, each finger is divided into three parts, and the angle between the centroid and the major axis of the finger is calculated (Figs. 1, 2 and 3).
In [4], the YCbCr color space is used, where the Y channel represents brightness and the (Cb, Cr) channels refer to chrominance. The authors use the Cb and Cr channels to represent color and discard Y, since it relates only to brightness. Because some small regions near the skin are wrongly included as skin, they apply morphological operations, then select the skin region and extract features to recognize the hand gesture. They use three features: velocity, orientation, and location, with orientation as the main feature of their system. The features are then classified using the Baum-Welch algorithm (BW), and the hand motion gesture is recognized using a Left-Right Banded model with 9 states.
In [10], the YCbCr color space is again used; this color model is implemented by defining a skin range in the RGB model and converting these values into the YCbCr model using a conversion formula. The authors use the support vector machine (SVM) algorithm, which separates two classes with a hyperplane defined by the support vectors, a subset of the training data. The algorithm is also used to solve multi-class problems by decomposing them into two-class problems.
The authors of [7] create a dataset using an external Canon EOS camera (29 fps, 18 MP) with an 18–55 mm lens. They eliminate the background and extract the hand region from the remaining upper body. They use RGB frames of 640 * 480 and extract key frames from the video using orientation histograms, then apply different distance metrics (chessboard distance, Euclidean distance, etc.) to recognize a gesture. After successful recognition, the gestures are classified for text formation.
The authors of [8] use a fully convolutional network (FCN); in particular, an 8-layer FCN model that achieves good performance and is used for solving dense prediction problems. The output segmentation of this network is robust under various face conditions because it considers a large range of context information. A CRF algorithm is then used for image matting.
In [9], a convolutional neural network is used to generate the trained model. The network has four stages: five rectified linear units (ReLU) in the first stage, two stochastic pooling layers in the second, then one dense layer and one softmax output layer. Frames of 640 * 480 are resized to 128 * 128 * 3. They took 200 frames from each of 5 different people at 5 different viewing angles, for a dataset of 5000 frames.
In [13], a CNN is used to recognize static sign gestures, trained on the American Sign Language (ASL) dataset provided by Pugeault and Bowden in 2011. Around 60,000 RGB images were used for training and testing. Some operations were performed on the dataset because not every image has the same depth relative to its dimensions. A V3 model was used for the color features and then combined with depth features for better accuracy. The model was trained with the CNN using 50 epochs and a batch size of 100.
Suharjito et al. [1] reviewed the different methods and techniques that researchers are using to develop better sign language recognition systems.
Kakoty et al. [6] address the recognition of sign language numbers and alphabets using hand kinematics captured with a data glove, achieving a 97% recognition rate for these alphabets and numbers.
In [11], the proposed system translates English text into Indian Sign Language (ISL) using human-computer interaction techniques. The implemented system consists of an ISL parser, the Hamburg Notation System and the Signing Gesture Mark-up Language, and generates animations according to ISL grammar.
Paras et al. [12] used the WordNet concept to extend and expand the dictionary, and further constructed an Indian Sign Language system for deaf and mute people.
Matt et al. [5] address video-based feedback to students learning American Sign Language (ASL).
In [3], the authors address a deep learning implementation for sign language based on gesture images. The validation accuracy obtained using the different layers of deep learning is more than 90%.
3 Proposed Work
Flow Chart. The flow chart explains the workflow of our project: segmentation of the video, masking of the images, canny edge detection, feature extraction using the SURF library, clustering of the image features, and comparison between the clusters of the training and testing data using the SVM library, as described in the flow chart below.
Skin Masking. The purpose of this step is to remove extra noise from the segmented frame; after masking, only the Region of Interest (ROI), containing the useful information in the image, should remain. This is achieved via skin masking, by defining a threshold on the RGB schema and then converting the RGB colour space to a grey-scale image (Fig. 4).
Various image processing functions are used to achieve skin masking. First, the frame is converted into a gray schema. The frame is then converted to the HSV schema, which helps detect the skin colour - our main objective - so that the hand region can be identified. After identifying the hand region, noise is removed from the image using a blur function.
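As an illustration of this masking pipeline, a minimal OpenCV sketch follows. The HSV bounds below are placeholders rather than the thresholds used in our experiments, and the helper name is hypothetical:

    import cv2
    import numpy as np

    def skin_mask(frame_bgr):
        """Return a grey-scale skin-masked ROI of a BGR frame (sketch)."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        lower = np.array([0, 30, 60], dtype=np.uint8)    # assumed skin range
        upper = np.array([25, 255, 255], dtype=np.uint8)
        mask = cv2.inRange(hsv, lower, upper)            # skin-colour threshold
        mask = cv2.medianBlur(mask, 5)                   # blur function for noise removal
        roi = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
        return cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)     # grey-scale output as in Fig. 4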
Text to Video. To convert text into a sign video, a video generation function is applied. We use the signs of individual alphabets to convert text into sign language.
4 Experiment Setup
Data-Set. We searched the internet and found no resource from which an Indian Sign Language dataset could be obtained. After a long effort of searching different resources, we made our own ISL dataset under our own lighting conditions and environmental setup. We have 26 × 150 = 3900 static training images and 26 × 30 = 780 images for testing. The actual resolution of the images is 640 × 480, which is cropped and normalized to 120 × 120. The samples from the video are 320 × 260 in size and are taken in various lighting environments. The same process was used for two other sign languages: American Sign Language and British Sign Language.
We have built an interface that lets the user choose the language to operate in, i.e., ISL, BSL or ASL. The user then chooses between sign-to-text and text-to-sign conversion. This makes our system user friendly, so that an ordinary person can easily use it for communication.
Algorithms
– Support Vector Machine Algorithm: The support vector machine (SVM) is an algorithm used for two-class (binary classification) problems, in which the data can be separated by a plane - linear, parabolic, etc. - depending on the number of features. A hyperplane refers to a virtual plane drawn in the feature space of the given data in order to separate the classes on the basis of some features. The training data is used for the supervised learning of the system: every feature vector in the training set is sent with its target value so the system can learn from it. The SVM is mainly used to predict the target value of the given test features according to the plane drawn by the algorithm to distinguish the different classes in the training data [2].
Either a classification or a regression function can be used for the mapping. When the classes are not linearly separable, a non-linear mapping is used to transform the features into an n-dimensional space where they can be distinguished, and maximum-margin hyperplanes are then created. The model works over only a subset of the training data near the class boundaries. A similar model can also be produced by support vector regression (SVR).
The SVM uses different values of gamma and C to draw the hyperplane between the two clusters. Gamma controls how far the influence of a single training example reaches: with lower gamma, points far from the hyperplane are also considered, while higher gamma restricts attention to nearby points. C trades off the smoothness of the decision boundary against classifying the training points strictly: the larger the value of C, the stricter the separation enforced; a short scikit-learn sketch of this tuning follows.
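A minimal scikit-learn sketch of the C/gamma tuning, assuming X_train, y_train, X_test and y_test hold the gesture feature vectors and sign labels produced by the earlier pipeline:

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    # Grid of candidate C/gamma values for an RBF-kernel SVM.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)

    print("best C/gamma:", search.best_params_)
    print("test accuracy:", search.best_estimator_.score(X_test, y_test))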
– Convolutional Neural Network: A convolutional neural network is built from neurons with learnable weights and biases. The neurons in a layer receive input from the previous layer; the product between the weights and the input is computed, optionally followed by a non-linearity.
The CNN architecture is divided into different layers: (1) Convolution Layer: Features are extracted from the frame in the convolution layer, and parts of the image are linked to the following convolution layer. The dot product between the receptive area and a kernel (a 3 × 3 filter) is computed across the whole image, as shown in the figure; the output of the dot product is an integer value known as a feature. Feature extraction is thus performed using a filter (kernel) given by a small matrix. (2) Padding: Padding surrounds the feature map with an additional border (typically zeros) so that the output has the same dimensions as the input volume.
(3) Rectifier Activation Function (ReLU):
After the convolution layer is applied to the image matrix, a ReLU layer adds non-linearity to the system by applying the non-linear activation function to the feature matrix. Many activation functions exist, but ReLU is used here because it does not saturate, which would otherwise make the network hard to train.
(4) Pooling Layer:
The pooling layer controls overfitting and decreases the dimension of the image. It can be done in three ways - max, average and mean pooling. Here we use max pooling, which takes the maximum value from the input region being convolved with the features.
(5) Fully Connected Layer:
This is one of the most important layers of the CNN, as it produces the classified output according to the training dataset. We have used the different sign images for the training set, as discussed above.
(6) Epochs:
One epoch is one pass of the whole dataset through the network, via forward and backward propagation.
(7) Training Accuracy:
The accuracy reported by the model when it is evaluated on the training dataset.
(8) Validation Accuracy:
After successful training, the model is evaluated with the help of the test dataset, and the accuracy of the model on unseen data is reported.
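A minimal Keras sketch tying the layers above together; the filter counts, network depth and training settings are illustrative assumptions rather than the exact model used here:

    from tensorflow import keras
    from tensorflow.keras import layers

    num_classes = 26  # one class per ISL alphabet sign
    model = keras.Sequential([
        # 3x3 convolutions with 'same' padding keep the spatial dimensions.
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),          # max pooling reduces dimension
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"), # fully connected layer
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=50, batch_size=100,
    #           validation_data=(X_val, y_val))  # placeholder training settings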
5 Experimental Result
We performed the training on three different sign languages, each having 45,500 training images, and performed the testing on 20,800 images.
Algorithm              Accuracy
K-nearest neighbour    0.6628820960698
Logistic regression    0.7554585152838
Naive Bayes            0.6283842794759
6 Conclusion
– We have worked on stationary hand gestures, but sign language can also involve moving hands; in future, the system can be extended to moving gestures as well.
– The major limitation of the project is its dependence on lighting conditions; in future, the effect of lighting can be overcome.
References
1. Abraham, A., Rohini, V.: Real time conversion of sign language to speech
and prediction of gestures using artificial neural network. Proc. Comput. Sci.
143, 587–594 (2018). [Link] [Link]
[Link]/science/article/pii/S1877050918321331. 8th International Con-
ference on Advances in Computing & Communications (ICACC-2018)
2. Dai, H.: Research on svm improved algorithm for large data classification. In: 2018
IEEE 3rd International Conference on Big Data Analysis (ICBDA), pp. 181–185,
March 2018. [Link]
3. Das, A., Gawde, S., Suratwala, K., Kalbande, D.: Sign language recognition using
deep learning on custom processed static gesture images. In: 2018 International
Conference on Smart City and Emerging Technology (ICSCET), pp. 1–6 (2018)
4. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Real-time capable system for hand
gesture recognition using hidden Markov models in stereo color image sequence. J.
WSCG 16 (2008)
5. Huenerfauth, M., Gale, E., Penly, B., Pillutla, S., Willard, M., Hariharan, D.: Eval-
uation of language feedback methods for student videos of American sign language.
ACM Trans. Access. Comput. (TACCESS) 10(1), 1–30 (2017). [Link]
1145/3046788
6. Kakoty, N.M., Sharma, M.D.: Recognition of sign language alphabets and num-
bers based on hand kinematics using a data glove. Proc. Comput. Sci. 133, 55–
62 (2018). [Link] [Link]
com/science/article/pii/S1877050918309529. International Conference on Robotics
and Smart Manufacturing (RoSMa2018)
7. Liu, L.: Research on logistic regression algorithm of breast cancer diagnose data by
machine learning. In: 2018 International Conference on Robots Intelligent System
(ICRIS), pp. 157–160, May 2018. [Link]
8. Qin, S., Kim, S., Manduchi, R.: Automatic skin and hair masking using fully
convolutional networks. In: 2017 IEEE International Conference on Multimedia
and Expo (ICME), pp. 103–108, July 2017. [Link]
8019339
9. Rao, G.A., Syamala, K., Kishore, P.V.V., Sastry, A.S.C.S.: Deep convolutional neu-
ral networks for sign language recognition. In: 2018 Conference on Signal Processing
And Communication Engineering Systems (SPACES), pp. 194–197, January 2018.
[Link]
10. Reshna, S., Jayaraju, M.: Spotting and recognition of hand gesture for Indian sign
language recognition system with skin segmentation and SVM. In: 2017 Interna-
tional Conference on Wireless Communications, Signal Processing and Networking
(WiSPNET), pp. 386–390, March 2017. [Link]
8299784
11. Sugandhi, Kumar, P., Kaur, S.: Sign language generation system based on Indian
sign language grammar. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 19(4),
1-26 (2020). [Link]
12. Vij, P., Kumar, P.: Mapping Hindi text to Indian sign language with exten-
sion using WordNet. In: Association for Computing Machinery, New York, NY,
USA (2016). [Link] [Link]
2979779.2979817
13. Xie, B., He, X., Li, Y.: RGB-D static gesture recognition based on convolutional
neural network. J. Eng. 2018(16), 1515–1520 (2018). [Link]
2018.8327
14. Zhou, Q., Zhao, Z.: Substation equipment image recognition based on sift feature
matching. In: 2012 5th International Congress on Image and Signal Processing, pp.
1344–1347, October 2012. [Link]
Features Explaining Malnutrition in India:
A Machine Learning Approach to Demographic
and Health Survey Data
Abstract. India is one of the most severely malnourished countries in the world. Undernutrition was the cause of two-thirds of the 1.04 million deaths among children under the age of five in the year 2019. Several strategies have been adopted by the Government of India and the state governments to minimize the incidence of malnutrition. However, to make these policies effective, it is important to understand the key features explaining malnutrition. Analyzing the Indian Demographic and Health Survey (IDHS) data of 2015–2016, this paper attempts to identify causes along four dimensions of malnutrition, namely the Height-for-Age Z-score (HAZ), Weight-for-Age Z-score (WAZ), Weight-for-Height Z-score (WHZ) and Body Mass Index (BMI). Using a machine learning approach to feature reduction, the paper identifies the ten most important features, out of the 1341 available in the database, for each of the four anthropometric parameters of malnutrition. The features are reduced and ranked using the WEKA tool. The results and findings of this research provide key policy inputs for addressing malnutrition and the related mortality among children under age five.
1 Introduction
Under-nourished women are more likely to have unhealthy babies. In addition, undernourished individuals are less productive at work, leading to low payments and poverty. The Indian Government has started many programs, such as the Midday Meal Scheme launched on 15th August 1995, to eradicate malnutrition; under this scheme, freshly cooked meals are provided to millions of children in almost all government and government-aided schools. Apart from this, the Government of India also started Integrated Child Development Services in 1975 [2], which targets improving the health of mothers and children under the age of 6 by providing health and nutrition education, health services, supplementary food and pre-school education. However, these programmes, and many other national as well as state-level policies, have not been designed considering the variation in factors responsible for malnutrition in children below five. This is the root cause of the slow decrease in the number of deaths of children under age five due to undernutrition.
This paper is organised into six sections. The literature review is the next section; the IDHS dataset is explained in detail in the third section; the technique used in this analysis is described in the fourth section; results and findings are discussed in the fifth section, followed by the conclusion in the sixth section.
2 Literature Survey
Several studies on malnutrition have been carried out in the past decades using different datasets and methodologies, amongst which the most commonly used dataset is Demographic Health Survey data; the Demographic Health Survey is conducted every 10 years. Although many studies have used this dataset, very few have applied machine learning techniques in their analysis; others have used either analytical or statistical approaches. The following are some of the works carried out in the field of analysing the increasing rate of malnutrition.
Nair et al. [3] characterised the causes of malnutrition for the states of India using the IDHS 2005–2006 dataset. With the help of K-means cluster analysis, states were grouped according to different features. The Synthetic Minority Oversampling Technique was used for pre-processing the dataset, and AdaBoost and the Ranker algorithm were used for attribute selection. The analysis generated seven clusters for HAZ, four for WAZ, six for WHZ and five for BMI, the four anthropometric measures used. Later in the research, the features were ranked using the Ranker algorithm; the top-ranked features, those having the highest variance across all four anthropometric parameters, were found to be mainly responsible for malnutrition. These features were considered important for policy makers, as they would be helpful for improving and creating new policies for different regions of India to eradicate malnutrition from its roots [4].
Many studies have used data mining techniques like decision trees and clustering. In [5], a few patterns were found: for example, a child can be malnourished even if a safe water source is used, and there is an 87% chance of malnutrition in a child who acquires a major disease and does not use a good toilet facility. Another study developed a model that can help policy designers and health care facilitators identify children at risk; the major contributing factors to malnutrition were found to be mother's education, child age, region, wealth index and residence [6].
Other studies used statistical analysis methods such as ANOVA, Case-based Reasoning (CBR), Euclidean distance, the ID3 algorithm, probabilistic Bayes theory and logistic regression [7–11]. To prove that the malting technique produces phytase enzyme, least-significant-difference tests on zinc, iron and phytic acid were used; zinc is an essential component of metalloenzymes and widely helps in reducing stunting and wasting and improves brain development in infants [12]. Using multivariate logistic regression on the Bangladesh DHS dataset together with an environmental indicator, the Normalized Difference Vegetation Index (NDVI), trends in the nutrition security of foods in the Ganges Brahmaputra Meghna Delta were found for the years 2007 and 2011. The results showed that wasting probability decreases as NDVI increases, since the food consumption of the medium-income group varies with the variation in vegetation due to climate change [13]. Results of statistical analysis on the Pakistan DHS show that children of parents with secondary or higher education, with access to health facilities, and from rich households have less tendency to become stunted, whereas children in rural residences with no toilet facilities, smaller size at birth and older mothers are more likely to be stunted [14].
Poverty has strong implications for malnutrition. The work in [15] used the Indian Human Development Survey (IHDS) of 2012 to find the factors responsible for escaping and falling into poverty, applying machine learning techniques such as info-gain and a random forest classifier. The work found that livestock such as goats play a vital role in explaining poverty; caste, education and rural-to-urban migration are major factors in falling into poverty, whereas toilet facilities and the financial sector are features of escaping poverty. Another study investigated the infant mortality rate by finding influencing factors such as national income and fertility rate, using data from [Link] [16]. Similarly, several machine learning techniques have been deployed to identify probable causes of malnutrition [17–22].
From the literature survey it is observed that the strategies deployed were country-specific, and that many different techniques have been used to identify the root causes of malnutrition and how it can be dealt with effectively. The features themselves are divided into four classes of anthropometric parameters recognized by the WHO: HAZ, WAZ, WHZ and BMI. Identifying features for these anthropometric parameters is very important. The main objectives of this paper are to select the most important features for all four anthropometric parameters (HAZ, WAZ, WHZ and BMI) from the IDHS data, to find the major impacting features using the Principal Component evaluator, and to rank them with the Ranker algorithm. The features thus identified will help policy makers improve existing policies and address the important causes of malnutrition.
3 Data Source
The dataset used in this paper is the IDHS data of 2015–2016. The DHS program collects information on health and population in 90 developing countries, one of which is India. The data is categorized into fields like birth records, children's records, couples' records, individuals' records and men's records; amongst these, the birth record dataset is employed for this purpose. Information about the child, such as age, sex, HAZ, WAZ, WHZ and BMI, is recorded in this dataset [4]. The mother of the child is also interviewed to collect information about the health status of both mother and child, such as type of place of residence, number of children under five in the household, births in the last five years, whether the child was given pumpkin, carrots or squash, whether the child received the polio vaccine, the number of tetanus injections before and during pregnancy, and whether iron tablets were given or bought. The birth record of 2015–2016 contains 1,315,617 instances of 1,341 features covering all states and union territories of India.
4 Methodology
The methodology used in this analysis is shown in the schematic diagram of Fig. 1. It begins with data collection and the cleaning of irrelevant information from the dataset, followed by selection of useful features for all four anthropometric parameters, determination of the most important malnutrition-impacting variables, and their ranking using the WEKA tool.
Irrelevant variables are eliminated before the analysis, which reduces the variables to 745. On removing duplicate instances using the distinct method of the dplyr package, the total number of observations decreases to 639,916. The remaining useful data contains both numeric and categorical fields. For feature selection using the Boruta algorithm, the data needs to be converted into numeric type; for this purpose, all categorical variable instances are encoded based on the factor levels of the feature, whereas for numeric variables the NA values are replaced by the mean of the column.
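The paper's cleaning pipeline uses R (dplyr and Boruta). Purely as an illustration, an equivalent sketch of the steps above in Python/pandas, with a hypothetical file name, would be:

    import pandas as pd

    df = pd.read_csv("births.csv")       # placeholder for the IDHS birth record
    df = df.drop_duplicates()            # equivalent of dplyr::distinct

    for col in df.columns:
        if df[col].dtype == "object":
            # Categorical variables: encode by factor levels of the feature.
            df[col] = df[col].astype("category").cat.codes
        else:
            # Numeric variables: replace NA values with the column mean.
            df[col] = df[col].fillna(df[col].mean())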
Fig. 2. Plot of Boruta algorithm result for HAZ (Color figure online)
The attributes found common to all four anthropometric parameters are 'Had diarrhoea recently', 'Taking iron pills, sprinkles or syrup', 'Assistance: DAI/Traditional Birth Attendant', 'Place received most vaccinations', and 'Women's age in years', whereas those which are unique are 'Daughters elsewhere', 'Delivery by caesarean section', and 'Haemoglobin level (g/dl - 1 decimal)'. The common attributes have a higher probability of being the main causes of malnutrition than the unique ones.
After finding the 10 most important features for all four anthropometric parameters (HAZ, WAZ, WHZ and BMI), the next step is to rank the factors mainly responsible for malnutrition. For this purpose, the WEKA tool is used: for attribute selection, the Principal Component evaluator is used with the Ranker algorithm to obtain a ranking of features. The former performs Principal Component Analysis (PCA) on the data for dimensionality reduction, choosing enough eigenvectors to account for a given percentage of the variance in the original data, whereas the latter ranks the principal component features.
PCA reduces the dimensionality of a dataset with many interrelated variables while retaining as much of the variation in the data as possible. The dataset then contains variables arranged in order of decreasing variance; the first few, which are ordered and uncorrelated, are called the principal components. PCA finds the correlation pattern among the original variables and substitutes a new component in place of each group of correlated attributes (Table 5).
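As an illustration of this PCA-plus-ranking step (performed here with WEKA), an equivalent scikit-learn sketch follows; the 95% variance threshold is an assumption for illustration, and X denotes the numeric feature matrix prepared earlier:

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=0.95)        # keep enough eigenvectors for 95% variance
    components = pca.fit_transform(X_std)

    # Rank components by the fraction of total variance they explain.
    for rank, ratio in enumerate(pca.explained_variance_ratio_, start=1):
        print(f"PC{rank}: {ratio:.4f} of total variance")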
Table 5. Ranking of features of all anthropometric parameters determined using WEKA tool
5 Discussion
Using Principal Component Analysis along with Ranker algorithm, features were
selected and ranked based on their variance across all the four anthropometric param-
eters. The features having highest variation are identified as the most impactful fea-
tures explaining malnutrition. Three highest ranking features of HAZ are had diarrhoea
recently, taking iron pills, sprinkles or syrup and did eat any solid, semi-solid or soft
food yesterday. Similarly, for WAZ type of mosquito bed nets child slept under IPC,
drank from bottle with nipple yesterday and had diarrhoea recently are the most varying
features of respective anthropometric parameter.
From the analysis of all four anthropometric parameters (HAZ, WAZ, WHZ and BMI), it was identified that six features are common across all parameters: 'Had diarrhoea recently', 'Taking iron pills, sprinkles or syrup', 'Assistance of Dai', 'Received most vaccinations', 'Women's age' and 'Type of mosquito bed nets child slept under (IPC)'. These variables can be used for improving or making new policies. Three features are identified across three parameters: 'Assistance from ANM', 'Drank from bottle with nipple' and 'Number of children under five in the household'. Besides these, four features are found across two parameters, and only three features are unique to any single parameter; BMI did not have any unique feature.
Considering only the features present in all four, or at least three, of the parameters, different characteristics explaining malnutrition can be identified. These characteristics can be classified into broadly three categories. The first category relates to 'availability and awareness' of safe drinking water and iron pills. It is an irony of the country that even after seventy-plus years of independence a large section of society is deprived of safe drinking water; the problem is becoming even more acute in urban areas, especially in the slums, apart from remote terrains. It is not surprising that iron deficiency among pregnant and lactating mothers is one of the most important causes of malnutrition among mothers and children. An effective outreach to these mothers, in rural as well as urban areas, would help address such deficiencies. Easy availability and accessibility of iron-rich foods like fish and drumstick would go a long way in addressing iron deficiency among mothers and children, as would investment in developing iron-rich food products that can be easily stored and made available at a very low price. The second category is 'access to the services of ANMs and trained Dais'. Investment in public health and public health services, especially in creating a large pool of trained paramedical workers, would be effective in addressing malnutrition not only among children but also among mothers, as well as the general well-being of all those in need of healthcare services. Similarly, access to free vaccinations in the vicinity is an important feature in addressing malnutrition. The third category relates to 'awareness and behavioural and social change'. Early marriage among women and insufficient gaps between births are identified as two important features of malnutrition. Investing in education, creating awareness through local governance structures, and increasing household income levels have been identified in the literature as important factors that can have a positive impact on behavioural as well as social change. These would require persistent investment and action on the ground.
6 Conclusion
References
1. The Economic Times. [Link]
india-has-one-third-of-worlds-stunted-children-global-nutrition-report/articleshow/668
[Link]?from=mdr. Accessed 02 June 2020
2. Malnutrition in India. [Link]
3. Anilkumar, N.A., Gupta, D., Khare, S., Gopalkrishna, D. M., Jyotishi, A.: Characteristics and
causes of malnutrition across Indian states: a cluster analysis based on Indian demographic
and health survey data. In: 2017 International Conference on Advances in Computing, Com-
munications and Informatics (ICACCI), Udupi, pp. 2115–2123 (2017). [Link]
1109/ICACCI.2017.8126158.
4. The DHS Program: Demographic and Health Surveys. [Link] Accessed 23
June 2020
5. Ariyadasa, S.N., Munasinghe, L.K., Senanayake, S.H.D., Fernando, N.A.S.: Data mining
approach to minimize child malnutrition in developing countries. In: International Conference
on Advances in ICT for Emerging Regions (ICTer2012), Colombo, p. 225 (2012). [Link]
org/10.1109/ICTer.2012.6423030.
6. Markos, Z., Agide, F.: Predicting under nutrition status of under-five children using data
mining techniques: the case of 2011 ethiopian demographic and health survey. J. Health Med.
Inf. 5, 152 (2014). [Link]
7. Arun, C., Khare, S., Gupta, D., Jyotishi, A.: Influence of health service infrastructure on
the infant mortality rate: an econometric analysis of indian states. In: Nagabhushan, T.N.,
Aradhya, V.N.M., Jagadeesh, P., Shukla, S., Chayadevi, M.L. (eds.) CCIP 2017. CCIS, vol.
801, pp. 81–92. Springer, Singapore (2018). [Link]
8. Jeyaseelan, L., Lakshman, M.: Risk factors for malnutrition in South Indian children. J.
Biosoc. Sci. 29(1), 93–100 (1997). [Link]
9. Fenske, N., Kneib, T., Hothorn, T.: Identifying risk factors for severe childhood malnutrition
by boosting additive quantile regression. J. Am. Stat. Assoc. 106, 494–510 (2011). https://
[Link]/10.1198/jasa.2011.ap09272
10. Mosley, W.H., Chen, L.C.: An analytical framework for the study of child survival in develop-
ing countries. Populat. Dev. Rev. 10, 25–45 (1984). [Link]/stable/2807954. Accessed
14 Aug 2020
11. Hanmer, L., Lensink, R., White, H.: Infant and child mortality in developing countries:
analysing the data for robust determinants. J. Dev. Stud. 40(1), 101–118 (2003). https://
[Link]/10.1080/00220380412331293687
12. Ana, I.M., Udota, H.I.J., Udoakah, Y.N.: Malting technology in the development of safe
and sustainable complementary composite food from cereals and legumes. In: IEEE Global
Humanitarian Technology Conference (GHTC 2014), San Jose, CA, pp. 140–144 (2014).
[Link]
13. Van Soesbergen, A., Nilsen, K., Burgess, N., Szabo, S., Matthews, Z.: Food and Nutrition
Security Trends and Challenges in the Ganges Brahmaputra Meghna (GBM) Delta. Elem Sci
Anth. 5, 56 (2017). [Link]
14. Abbasi, S., Mahmood, H., Zaman, A., Farooq, B., Malik, A., et al.: Indicators of malnutrition
in under 5 Pakistani children: a DHS data secondary analysis. J. Med. Res. Health Educ. 2(3),
12 (2018)
15. Narendranath, S., Khare, S., Gupta, D., Jyotishi, A.: Characteristics of ‘escaping’ and ‘falling into’ poverty in India: an analysis of IHDS panel data using machine learning approach. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, pp. 1391–1397 (2018). [Link]
16. Suriyakala, V., Deepika, M.G., Amalendu, J., Deepa, G.: Factors affecting infant mortality
rate in india: an analysis of Indian states. In: Corchado Rodriguez, J., Mitra, S., Thampi,
S., El-Alfy, E.S. (eds.) Intelligent Systems Technologies and Applications 2016, ISTA 2016.
Advances in Intelligent Systems and Computing, vol. 530, pp. 707–719. Springer, Cham
(2016). [Link]
17. Shyam Sundar, K., Khare, S., Gupta, D., Jyotishi, A.: Analysis of fuel consumption character-
istics: insights from the Indian human development survey using machine learning techniques.
In: Raju, K.S., Govardhan, A., Rani, B.P., Sridevi, R., Murty, M.R. (eds.) Proceedings of the
Third International Conference on Computational Intelligence and Informatics. AISC, vol.
1090, pp. 349–359. Springer, Singapore (2020). [Link]
7_30
18. Khare, S., Kavyashree, S., Gupta, D., Jyotishi, A.: Investigation of nutritional status of children
based on machine learning techniques using Indian demographic and health survey data. Proc.
Comput. Sci. 115, 338–349 (2017). [Link]
19. Khare, S., Gupta, D., Prabhavathi, K., Deepika, M.G., Jyotishi, A.: Health and nutritional
status of children: survey, challenges and directions. In: Nagabhushan, T.N., Aradhya, V.N.M.,
Jagadeesh, P., Shukla, S., M. L., C. (eds.) CCIP 2017. CCIS, vol. 801, pp. 93–104. Springer,
Singapore (2018). [Link]
20. Sharma, V., Sharma, V., Khan, A., et al.: Malnutrition, health and the role of machine learning
in clinical setting. Front Nutr. 7, 44 (2020). [Link]
21. Giabbanelli, P., Adams, J.: Identifying small groups of foods that can predict achievement
of key dietary recommendations. Data mining of the UK national diet and nutrition survey.
Public Health Nutr. 1, 1–9 (2016). [Link]
22. Hearty, A., Gibney, M.: Analysis of meal patterns with the use of supervised data mining
techniques - Artificial neural networks and decision trees. Am. J. Clin. Nutr. 88, 1632–1642
(2009). [Link]
Surveillance System for Monitoring Social
Distance
1 Introduction
Surveillance devices such as drones are among the most remarkable and valuable advancements of technology [16]. Science and technology are developing day by day.
People violating social-distancing norms can thereby be identified and the public made aware. By reducing human effort and helping ensure everyone follows the social-distancing concept, this work is quite promising. The rest of the paper is arranged as follows: in Sect. 2 the literature on related work is presented; in Sect. 3 the methodology for person detection and distance monitoring is described; in Sect. 4 the performance evaluation is done; and the final section gives the conclusion and future work.
2 Literature Review
One line of work improves pedestrian detection to run at real-time speed without losing recognition accuracy; the model is compared with RetinaNet-ResNet50 and HAL-RetinaNet.
In [4], the authors demonstrate three collaboration-based deep learning applications for tracking and detecting objects and estimating distance. Object detection is a well-developed method, high in accuracy, subject to the real-time constraints of identifying objects. They applied the SSD and YOLO v3 algorithms to object detection to determine which is more suitable; YOLO v3 performed better than SSD. The MonoDepth algorithm provides a disparity map as output. They verified the approach on different datasets such as Cityscapes and KITTI, as well as in a vehicle in real time on a city-centre traffic road, and confirmed the results on the Tramway Rouen railway dataset. The new method presented is based on SSD and analyzes the behavior of objects such as pedestrians or vehicles: with the modified SSD algorithm, after identifying an object they assess its future state by including its direction of motion, e.g., pedestrians willing or not willing to cross the road. SSD and YOLO v3 are used for detecting and tracking objects; a large and appropriate dataset is very important to optimize their performance, and changing the detection classes does not yield a significant improvement.
Paper [1] provides a comparison, based on time, accuracy and parameter values, of different algorithms for identifying and localizing objects with different input image dimensions. The authors identify a new method to improve the speed of single-stage models without losing accuracy. The final results show that Tiny YOLO v3 improves detection speed while maintaining accurate results.
Speed and accuracy are important parameters for evaluating pedestrian detection performance. Performance varies across situations because experiments do not always take place under the same conditions [11]; many parameters can differ from one experiment to another. Analysing the characteristics of object detection, there are three popular models: Single Shot Detection (SSD) [13], YOLO [19] and F-RCNN [9]. F-RCNN is highly accurate compared to SSD and YOLO v3, but it is slow: if high-quality accuracy needs to be achieved, F-RCNN is the best solution, but it is not the fastest approach. If speed is important, then YOLO v3 is the best approach. If we want good accuracy and good speed at the same time, SSD is a good solution. YOLO v4 is also a good solution, as it is a fast approach and its accuracy is similar to Faster R-CNN [22].
3 Methodology
The two major steps involved in monitoring social distancing are pedestrian detection and distance calculation. We take the video input from the surveillance system and convert it into image sequences. The model runs detection on these images, and then the distance calculation is done. Once we know which people are breaking the social-distancing threshold, we mark them with a red bounding box, as shown in Fig. 1. This section is divided into two sub-sections: in the first, we discuss the models used for pedestrian detection, and in the second, the approaches used to calculate the distance between each pedestrian.
Fig. 1. The flow chart for the work flow of monitoring social distancing
We trained and evaluated the models on the class “person” from the COCO dataset, with 66,808 samples. Further, we calculated various parameters such as the confusion matrix, mAP, and the detection time for each model; these parameters help in differentiating and selecting among the various pre-selected models. The hyperparameters used in training SSD+MobileNet (SSD+M), SSD+Inception (SSD+I), Faster RCNN (FRCNN), RFCN, YOLOv4 and Tiny YOLOv3 are listed in Table 1.
Once we had the detections, the next step was to calculate the distance between each pair of people, for which we used two approaches. In the first, the distance is computed with the Euclidean distance formula

$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \qquad (1)$$

where p(p_x, p_y) and q(q_x, q_y) are the bottom centre points of the two bounding boxes, and the unit of the distance is the pixel. For conversion from pixels to centimetres (cm), we need to know how many pixels in the horizontal and vertical directions equate to a certain ground-truth distance. For that, we selected four points as shown in Fig. 2: Points 1 and 2 constitute a horizontal distance of 490 cm and Points 3 and 4 a vertical distance of 330 cm (the ground-truth distances were obtained with the help of Google Maps [6]). We then calculated the distance (in pixels) between Points 1 and 2, and similarly between Points 3 and 4, using the Euclidean distance formula in the given input frame; call these distances "distance_w" and "distance_h" respectively. Now, for two image coordinates P(P_x, P_y) and Q(Q_x, Q_y), the distance between them in centimetres is calculated as follows:
Fig. 2. Point 1 to Point 4 used for conversion of units from pixel to cm.
$$\mathrm{Height} = \frac{P_y - Q_y}{\mathrm{distance\_h}} \times 330 \qquad (2)$$

$$\mathrm{Width} = \frac{P_x - Q_x}{\mathrm{distance\_w}} \times 490 \qquad (3)$$

$$\mathrm{Distance} = \sqrt{\mathrm{Height}^2 + \mathrm{Width}^2} \qquad (4)$$
The distance calculated here is in centimetres. The next step was to mark the people not following the social-distancing protocols. As the social-distancing guidelines suggest a minimum of 6 ft (182 cm) between two people, we set a threshold distance of 182 cm, and whoever fell below this threshold was marked with a red bounding box. We also drew red lines between those people to show with whom they were in proximity.
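A small Python sketch of Eqs. (1)-(4) and the thresholding step; the pixel coordinates and calibration values in the example call are made-up numbers for illustration:

    import math

    def distance_cm(p, q, distance_w, distance_h):
        """Convert the pixel offset between two bottom-centre points into cm,
        using the Fig. 2 calibration (490 cm horizontal / 330 cm vertical)."""
        width = (p[0] - q[0]) / distance_w * 490.0   # horizontal span in cm, Eq. (3)
        height = (p[1] - q[1]) / distance_h * 330.0  # vertical span in cm, Eq. (2)
        return math.hypot(width, height)             # Eq. (4)

    # distance_w / distance_h are the pixel distances between Points 1-2 and 3-4.
    if distance_cm((410, 520), (350, 480), distance_w=640, distance_h=420) < 182:
        print("social-distancing violation")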
Conversion from Perspective View to Bird's Eye View. The video input from CCTV, a drone or any other surveillance system can be from any random perspective, so we needed a method to calculate distance as accurately as possible in any view. In the method we came up with, we convert the perspective view into a bird's-eye view. The surveillance system has monocular vision, and it is not possible to calculate the distance between the detected persons directly from that view. By selecting four points from the image (the Region of Interest) we can map the entire image to a bird's-eye-view perspective using a perspective transformation matrix.
For this conversion and mapping, we need to calculate the transformation matrix (M_sd). Assume we have a point P(x, y) in the perspective-view image and want to locate the same point in the bird's-eye view, say Q(u, v), as shown in Fig. 3.
Fig. 3. The selected points from the perspective image and the four corners of the
rectangle where we map the bird’s eye view.
If we have the transformation matrix, each selected point correspondence (x_k, y_k) → (u_k, v_k) satisfies

$$u_k = \frac{a x_k + b y_k + c}{g x_k + h y_k + 1}, \qquad v_k = \frac{d x_k + e y_k + f}{g x_k + h y_k + 1},$$

which rearranges to

$$u_k = a x_k + b y_k + c - g x_k u_k - h y_k u_k, \qquad v_k = d x_k + e y_k + f - g x_k v_k - h y_k v_k.$$

For k = 0, 1, 2, 3 this can be written as the 8 × 8 system:

$$\begin{bmatrix}
x_0 & y_0 & 1 & 0 & 0 & 0 & -x_0 u_0 & -y_0 u_0\\
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 u_1 & -y_1 u_1\\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 u_2 & -y_2 u_2\\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3 u_3 & -y_3 u_3\\
0 & 0 & 0 & x_0 & y_0 & 1 & -x_0 v_0 & -y_0 v_0\\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 v_1 & -y_1 v_1\\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 v_2 & -y_2 v_2\\
0 & 0 & 0 & x_3 & y_3 & 1 & -x_3 v_3 & -y_3 v_3
\end{bmatrix}
\begin{bmatrix} a\\ b\\ c\\ d\\ e\\ f\\ g\\ h \end{bmatrix}
=
\begin{bmatrix} u_0\\ u_1\\ u_2\\ u_3\\ v_0\\ v_1\\ v_2\\ v_3 \end{bmatrix}$$
Solving this system, we can calculate all the elements from "a" to "h" and obtain the transformation matrix (M_sd). Once we have the transformation matrix, we can apply it to the perspective image to map the entire image into the bird's-eye view. After this, we follow the same steps as in the previous approach: calculate the bottom point of each bounding box, convert those points into the bird's-eye view (Points 1 to 4 in Fig. 2 are also converted), and then calculate the distance between them in pixels. We then convert the distance from pixels to centimetres as in the previous method. Using the distances between the bounding boxes, we marked the people who were within 182 cm (6 ft) of each other.
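In practice this system can be solved with OpenCV rather than by hand; a minimal sketch follows, in which the four source points are placeholders for the operator-selected ROI corners:

    import cv2
    import numpy as np

    # Four ROI corners in the perspective view and their bird's-eye targets.
    src = np.float32([[300, 250], [900, 250], [1150, 700], [100, 700]])
    dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

    # getPerspectiveTransform solves the 8x8 system above for a..h.
    M_sd = cv2.getPerspectiveTransform(src, dst)

    # Bottom-centre points of detections, shape (N, 1, 2) as OpenCV expects.
    feet = np.float32([[[500, 650]], [[620, 660]]])
    feet_bev = cv2.perspectiveTransform(feet, M_sd)

    d_px = np.linalg.norm(feet_bev[0, 0] - feet_bev[1, 0])  # distance in pixels

The pixel distance d_px is then converted to centimetres exactly as in the first approach, using the transformed calibration points.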
4 Results
Evaluation of both subtasks of the proposed work, along with their inferences, is discussed in this section. The models were trained on Google Colab.
For evaluating the selected models we used the Oxford Town Centre dataset [2], which contains video from a CCTV camera located at Cornmarket and Market St., Oxford, England. We calculated the Mean Average Precision (mAP) and the prediction time taken per image (in seconds). The following graphs were obtained after the evaluation.
Fig. 4. Prediction time taken per image of all the selected models.
Fig. 6. Error in calculating the distance vs the ground truth distance for both proposed
approaches
From Fig. 4 and Fig. 5, we observe that YOLOv4 and RFCN had the highest mAP but took a long time for detection, while Tiny YOLO and SSD+MobileNet took the least time but had low mAP. For the distance calculation, it is clear from Fig. 6, and from the mean scores of both approaches, that the bird's-eye-view approach is better than the Euclidean distance approach. Also, from Fig. 6, it can be observed that as the distance increases, the error also increases for the Euclidean distance approach, but the same does not happen for the other approach. Figure 7 shows the output of both proposed approaches of this work.
References
1. Adarsh, P., Rathi, P., Kumar, M.: Yolo V3-Tiny: object detection and recog-
nition using one stage improved model. In: 2020 6th International Conference
on Advanced Computing and Communication Systems (ICACCS), pp. 687–694
(2020). [Link]
2. Benfold, B., Reid, I.: Stable multi-target tracking in real-time surveillance video.
In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2011, pp. 3457–3464. IEEE Computer Society (2011). https://
[Link]/10.1109/CVPR.2011.5995667
3. Cabreira, T., Brisolara, L., Ferreira Jr., P.: Survey on coverage path planning
with unmanned aerial vehicles. Drones 3, 4 (2019). [Link]
drones3010004
4. Chen, Z., Khemmar, R., Decoux, B., Atahouet, A., Ertaud, J.: Real time object
detection, tracking, and distance and motion estimation based on deep learning:
application to smart mobility. In: 2019 Eighth International Conference on Emerg-
ing Security Technologies (EST), pp. 1–6 (2019). [Link]
2019.8806222
5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference
on Computer Vision (ICCV), December 2015. [Link]
46493-0 22
6. Google: Google maps. [Link]
15z
7. Guo, Q., Li, Y., Wang, D.: Pedestrian detection in unmanned aerial vehicle scene.
In: Lu, H. (ed.) ISAIR 2018. SCI, vol. 810, pp. 273–278. Springer, Cham (2020).
[Link] 26
8. Gupta, S., Sangeeta, R., Mishra, R., Singal, G., Badal, T., Garg, D.: Corridor
segmentation for automatic robot navigation in indoor environment using edge
devices. Comput. Netw. 178, 107374 (2020). [Link]
2020.107374
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), June 2016
10. Ministry of Health & Family Welfare, Government of India: Social distancing
measure in view of spread of Covid-19 disease. [Link]
[Link]
11. Kushwaha, R., Singal, G., Nain, N.: A texture feature based approach for person
verification using footprint bio-metric. Artif. Intell. Rev. 1–31 (2020). [Link]
org/10.1007/s10462-020-09887-6
12. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D.,
Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp.
740–755. Springer, Cham (2014). [Link] 48
13. Liu, W., et al.: SSD: single shot multibox detector. arXiv abs/1512.02325 (2016)
14. Lygouras, E., Santavas, N., Taitzoglou, A., Tarchanidis, K., Mitropoulos, A.,
Gasteratos, A.: Unsupervised human detection with an embedded vision system
on a fully autonomous UAV for search and rescue operations. Sensors 19(16), 3542
(2019). [Link]
15. Nguyen, D.T., Li, W., Ogunbona, P.: Human detection from images and videos: a
survey. Pattern Recogn. 51 (2015). [Link]
16. Pareek, B., Gupta, P., Singal, G., Kushwaha, R.: Person identification using
autonomous drone through resource constraint devices. In: 2019 Sixth International
Conference on Internet of Things: Systems, Management and Security (IOTSMS),
pp. 124–129 (2019). [Link]
17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified,
real-time object detection. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), June 2016
18. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
19. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement (2018)
20. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time
object detection with region proposal networks. In: Cortes, C., Lawrence,
N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neu-
ral Information Processing Systems 28, pp. 91–99. Curran Associates, Inc.
(2015). [Link]
[Link]
21. Vaddi, S., Kumar, C., Jannesari, A.: Efficient object detection model for real-time
UAV applications. CoRR abs/1906.00786 (2019). [Link]
22. Veeramsetty, V., Singal, G., Badal, T.: Coinnet: platform independent application
to recognize Indian currency notes using deep learning techniques. Multimed. Tools
Appl. 79(31), 22569–22594 (2020). [Link]
Consumer Emotional State Evaluation Using
EEG Based Emotion Recognition Using Deep
Learning Approach
Abstract. The standard marketing methodologies (e.g., newspaper ads and TV commercials) are not effective in selling products, as they do not excite customers to buy any specific item. These methods of advertising try to ascertain consumers' attitudes towards a product, which might not represent actual behaviour; consequently, customer behaviour is misunderstood by advertisers and start-ups, because stated mindsets do not represent the buying behaviour of consumers. Previous studies reflect a lack of experimental work on the classification and prediction of consumer emotional states. In this research, a strategy has been adopted to discover customer emotional states by simply thinking about attributes, using the power spectral density of EEG-based signals. The results revealed that, although the deep neural network (DNN) achieved higher recall, greater precision and better accuracy compared with the support vector machine (SVM) and k-nearest neighbour (k-NN), the random forest (RF) reached values similar to those of deep learning on precisely the same dataset.
1 Introduction
As an emerging field, neuromarketing relates the affective and cognitive sides of consumer conduct by utilizing neuroscience. Neuromarketing is a rising field because individuals do not perceive what occurs unconsciously in their minds; furthermore, it has been demonstrated that individuals are not clear about their own emotions or objectives (Hammou 2013). The use of promotional and advertising media, such as surveys and interviews about needs and purchasing intentions, can bias the conclusions drawn (Telpaz et al. 2015; Barros et al. 2016). Similarly, oral communication about emotions can prompt biased decisions. It is hard to extract the emotions of consumers directly through their decisions, because of ethical issues associated with product purchase and delivery (Telpaz et al. 2015). These factors accentuate a contradiction between shoppers' stated opinions during usability assessments and their genuine assessments, sentiments and observations regarding a product's use (Barros et al. 2016). Hence, neuromarketing needs methodological choices that can assess consumer responses objectively.
BCIs help user brains and computer systems communicate effectively. They do not involve physiological interference, and record signals through system-generated commands (Ramadan et al. 2015). BCIs have applications in advertising, medical science, smart cities and neuroscience (Abdulkader 2015; Hwang 2013). BCI systems are built to aid the user, and are very challenging in the field of advertising and marketing.
The most promising neuroimaging device in neuromarketing is the brain-computer interface (BCI). It permits frameworks and consumers to communicate proficiently. To run and execute commands, a BCI does not require the use of any device or muscle involvement (Abdulkader 2015). Instead, to control a framework, a BCI uses the consumers' voluntarily generated brain activity as signals, which offers the ability to associate or communicate with the nearby marketplace.
For this purpose, various neuromarketing techniques that record brain activity are used: EEG, fNIRS, fMRI, MEG, SST, PET and TMS (Krampe 2018) are all used for recording brain activity (Ohme 2009; Hakim 2019; Harris 2018). Among all these techniques, EEG has the best temporal resolution, as shown in Table 1.
Studies of BCI-based neuroimaging techniques indicate that three of them - MEG, SST and EEG - have good scope for marketing research, but due to the limitations of MEG and SST, these are not used for the current research. Because of the extensive advantages and varied features of EEG over SST and MEG (Cherubino et al. 2019), EEG is being used here. EEG is the BCI used to perform repeated, ongoing assessment of the brain's interactions with high temporal resolution (Ramadan 2017; Ramadan et al. 2015). Thus, in the experimental study, EEG was adopted as the input signal for the BCI framework.
Previous studies on EEG-based recognition systems for emotion-state recognition are presented in this section. Emotional states can be defined as presentations of human behavioural state for the recognition of pleasantness, which can help in making decisions (Ramsøy 2012).
The research by (Hwang 2013; Lotte and Bougrain 2018) stated that more than one classifier, and classifier combinations, are needed to detect and define feature sets and improve performance. The authors (Chew et al. 2016) stated that aesthetic presentation has a great effect on buying decisions; they used 3D EEG signals to record frequency bands and achieved good accuracy on a liking scale. Extensive studies and reviews of the various deep learning and machine learning algorithms used to study consumer preferences are provided by (Lotte and Bougrain 2018; Teo 2018a, b). (Hakim 2019) provided an in-depth study of the classifiers and prediction algorithms used for understanding consumer preference states, and stated that the SVM, with an approximate accuracy of 60%, is the best classifier so far for preference prediction; as per that study, LDA and SVM are the most studied classification algorithms. The authors studied various preferences using EEG-based systems (Hakim 2019). Previous work (Lin 2018; Alvino 2018; Yadava 2017; Teo 2018a, b; Boksem 2015) has done much on EEG-based emotional state detection.
With the emergence of neural networks and deep learning, EEG-based studies have become popular for emotional state prediction. A deep neural network (DNN) is a type of artificial neural network with several layers between the input and output layers; the most basic type is the multi-layer perceptron (MLP). The author (Loke 2017) suggested the use of DNNs for object identification. The authors (Teo 2018a, b; Roy 2019) have explored various deep learning frameworks, and (Teo 2017; 2018a, b) proposed methods for EEG-based preference classification compared against various machine learning classifiers.
Considerable use of EEG has thus been made in emotional state prediction to understand consumer preferences.
1. Signal acquisition for the selected device: the EEG-based DEAP dataset has been taken and pre-processed to remove artifacts. The EEG headset used in the DEAP dataset contains 32 channels. Table 2 provides the mapping of the 14+2-channel Emotiv EEG headset used for the current research work; the channels in bold are those of the Emotiv headset mapped onto the DEAP dataset.
Table 2. Channel positioning according to the 32-channel EEG headset used in the DEAP dataset
Channel 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
No
Channel Fp1 AF3 F3 F7 FC5 FC1 C3 T7 CP5 CP1 P3 P7 PO3 O1 Oz Pz
Channel 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
No
Channel Fp2 AF4 Fz F4 F8 FC6 FC2 Cz C4 T8 CP6 CP2 P4 P8 PO4 O2
2. Pre-processing: Independent Component Analysis (ICA) is used, and the pre-processed data is fed into the SVM, k-NN, RF and DNN classifiers.
3. Feature extraction and selection for the chosen device using the power spectral density (PSD) function; the most relevant optimal features are identified.
4. Classification of the features using machine learning and DNN algorithms: k-NN, SVM, RF, DNN.
5. Prediction of emotional states by comparing the accuracy of each classifier (a sketch of this comparison follows the list).
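As a sketch of steps 4 and 5, the three classical classifiers can be compared with the Scikit-Learn toolbox mentioned below; X and y are assumed to be the PSD feature matrix and emotional-state labels produced in step 3:

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    models = {
        "SVM": SVC(kernel="rbf"),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=100),
    }
    for name, model in models.items():
        # 5-fold cross-validated accuracy for each classifier.
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")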
The selection and extraction of features are the basic techniques used to evaluate the performance of EEG recognition systems. The current study aims to detect two emotional states, pleasant and unpleasant, using classification algorithms on signals from the Emotiv EEG headset (Hwang 2013; Pham and Tran 2012). An off-line analysis was conducted to evaluate the emotional detection and classification. The DEAP dataset was used to explore the performance and computation of deep learning classification techniques, which might effectively replicate the emotional states of consumers for advertisement prediction. To carry out the experiment, the authors compare individual recordings across the k-nearest neighbour (k-NN), random forest (RF), support vector machine (SVM) and deep neural network classifiers for evaluating the emotional states. The Scikit-Learn toolbox was used as the machine learning suite; Python was used for EEG artifact cleaning, filtering and pre-processing; the Python library MNE, an open-source suite for exploring, visualizing and analyzing cognitive and physiological signals, was employed; in addition, the Keras library was used on top of TensorFlow for building and training the deep model. This section discusses the methodology in conjunction with the experimentation details of the proposed work for the detection of emotional states. It starts with an outline of the available pre-recorded dataset of emotional states; then the feature extraction is described, and eventually the DNN classification model is illustrated.
DEAP Dataset
The DEAP dataset is used for the experimentation (Koelstra 2013). It can be divided into two parts:
i. Ratings of valence, arousal and dominance for 120 one-minute music videos.
ii. Participant ratings and recordings of physiological signals and face video for 32 volunteers while they watched 40 of the music videos from the first part; frontal face video was also recorded for 22 of the participants, as shown in Fig. 3.
Data Pre-processing
The experimental trial was done on the already pre-processed EEG recordings from
the DEAP database. The EEG recordings were down-sampled from 512 Hz to 128 Hz, a
band-pass frequency filter between 4.0 Hz and 45.0 Hz was applied, and the EOG
artifacts were eliminated from the epochs using the dimensionality reduction method of
independent component analysis (ICA) (Hadjidimitriou 2012). ICA decomposes the
recordings into independent components, so that noisy components and very
high-dimensional data can be eliminated by selecting a subset (Nezamfar 2016). The
useful features are retained and outliers are removed during the experimentation.
Additionally, this reduces the computational cost of the subsequent steps. Thus, only
the following channels were kept: Fz, F3, F4, AF3, and AF4 (Aldayel et al. 2020).
Figure 4 shows the emotional state engagement in various regions of the brain, with
the channel and frequency band involved.
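A minimal MNE-based sketch of this pre-processing chain is shown below; the file name and the use of Fp1 as an EOG proxy are assumptions for illustration, not details from the study.

```python
# Hypothetical sketch: band-pass filtering, down-sampling and ICA-based
# EOG artifact removal with MNE.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_fif("eeg_recording.fif", preload=True)  # placeholder file
raw.filter(l_freq=4.0, h_freq=45.0)   # 4.0-45.0 Hz band-pass, as in the paper
raw.resample(128)                     # down-sample from 512 Hz to 128 Hz

ica = ICA(n_components=20, random_state=42)
ica.fit(raw)
eog_idx, _ = ica.find_bads_eog(raw, ch_name="Fp1")  # Fp1 as an EOG proxy
ica.exclude = eog_idx
clean = ica.apply(raw.copy())         # recordings with EOG components removed
```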
Fig. 4. EEG emotion detection with the frequency band involved for each EEG channel (Teo 2017)
Valence - In the current study, valence was chosen as the measure of emotional state.
Likert scale values ranging from 0-9 were used to record it. The value of activation
for EEG frontal asymmetry (E) is directly proportional to valence (V), E ∝ V
(Koelstra 2013). The DEAP dataset also reflects this association between valence (V)
and the EEG frequency bands (α, β, γ, θ) (Koelstra 2011), shown in Fig. 6. An increase
in valence leads to an increase in the intensity of the frequency bands, which is in
accordance with the results of a comparable study (Al-Nafjan et al. 2017a, b). The
liking rating from the DEAP dataset is not used in the current experiment (Al-Nafjan
et al. 2017a, b).
(Figure: valence feature pipeline: Channel Selection → PSD → Valence Calculation → Normalization.)
The calculation of valence (V) is done using the equations below (Eqs. 1-4)
(Al-Nafjan et al. 2017a, b):
4 DNN Classification
Deep Neural Networks are frameworks that contain layers of "neurons" connected to
each other. Each layer of neurons applies a different linear transformation to the
input information (Roy 2019; Aldayel et al. 2020); each layer's transformation is then
passed through a nonlinear function, and a cost function over the outputs is minimized
to obtain the optimal outcome. The DNN operates in a single forward direction, from
the input through the hidden layers (if present) to the output neurons. The output of
the neurons in the previous layer acts as the activation input of each neuron in the
next layer.
For the current research, the DNN model uses one input layer, three hidden layers,
one batch normalization layer and one output layer. The hyperparameters used for DNN
model training are the learning rate (adapted by the Adam optimizer), the number of
epochs, the ReLU activation function, and a softmax activation function at the output.
The trained DNN model's accuracy was compared with that of the classification
algorithms SVM, RF and k-NN. The DNN classifier's block structure is displayed in Fig. 7.
The first step was to normalize the extracted features. There are two commonly
used normalization techniques - min-max normalization and z-score. For the current
experiment, min-max normalization (Eq. 5) was used before feeding the features to the
DNN classifier. This is the most common way to normalize data: the data are normalized
into the range 0 to 1, with the minimum (min) value mapped to 0, the maximum (max)
value mapped to 1, and all other values (v) lying in between.
v_normalized = (v − min)/(max − min) (5)
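Equation (5) corresponds to standard min-max scaling; a minimal sketch with scikit-learn:

```python
# Min-max normalization of Eq. (5): column-wise, min -> 0 and max -> 1.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])  # toy features
X_norm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
```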
The Adam gradient descent optimization strategy was used to train the DNN classifier.
It is one of the most effective strategies, using an iterative algorithm to
minimize a function towards a local or global minimum. For the current experiment,
three loss functions are used: binary cross-entropy, categorical cross-entropy, and
hinge loss. Training was stopped when the model started to over-fit, with a threshold
of 0.0001. With acceptable defaults and proper setup, the initial learning rate was
0.001. The network consists of an input layer, three hidden layers with 1700, 1200 and
700 neurons respectively, and an output layer. As per the experimental requirements,
the input layer takes 2125 features per sample, with the size decreasing to roughly
75% after each hidden-layer operation. The output dimension corresponds to the number
of target emotional states. The network was tested on test data comprising roughly
20% of the DEAP data samples. The three hidden layers use rectified linear unit (ReLU)
activations. The DNN output is obtained through a softmax activation function with a
binary cross-entropy loss function; the softmax function normalizes the outputs of the
final hidden layer.
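The described architecture can be sketched in Keras roughly as follows; the exact placement of the batch normalization layer is our reading of the text, not the authors' released code.

```python
# Sketch of the described DNN: 2125 inputs, batch normalization, three
# ReLU hidden layers of 1700/1200/700 units, softmax output, Adam
# optimizer with binary cross-entropy (the pairing reported in the text).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization

model = Sequential([
    Input(shape=(2125,)),             # 2125 extracted features per sample
    BatchNormalization(),
    Dense(1700, activation="relu"),
    Dense(1200, activation="relu"),
    Dense(700, activation="relu"),
    Dense(2, activation="softmax"),   # pleasant vs. unpleasant
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```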
Fig. 8. Accuracy prediction using cross-validation methods for various classifiers on the proposed dataset
Cross-validation techniques first train the model on a benchmark data set and then
perform evaluation. For the current study three methods, namely holdout, k-fold cross
validation, and leave-one-out cross validation (LOOCV), were used.
Holdout - The holdout (train/test splitting) method performs training on 50% of the
data set and uses the remaining 50% as the test dataset. The results of DNN and k-NN
are better than those of random forest (RF) and support vector machine (SVM), as shown
in Table 3.
LOOCV - This method performs training on the whole dataset leaving aside only one
data point, and then iterates over each data point. It is a very time-consuming
process. Random forest (RF) outperformed the other classifiers, as shown in Table 4.
K-fold Cross Validation - The data set is split into a number of subsets known as
folds. The model uses k − 1 folds for training and 1 fold for testing, iterating over
every fold. Random forest and k-nearest neighbor performed better than the other
classifiers, as shown in Table 5.
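The three validation schemes can be reproduced with scikit-learn as in the hedged sketch below; the random forest and the placeholder data stand in for any of the studied classifiers and the extracted features.

```python
# Comparing holdout, k-fold and LOOCV on a generic classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X = np.random.randn(100, 5)              # placeholder features
y = np.random.randint(0, 2, size=100)    # placeholder labels
clf = RandomForestClassifier(random_state=0)

# Holdout: 50/50 train-test split, as described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: train on k-1 folds, test on the remaining fold, iterate.
kfold_acc = cross_val_score(clf, X, y, cv=KFold(n_splits=5)).mean()

# LOOCV: one sample held out per iteration (slow on large datasets).
loocv_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```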
Since the best results among all the validation techniques were achieved using
holdout validation, this technique was chosen when applying the loss functions and
tuning the hyperparameters of the DNN framework. Figure 8 presents the summary of all
the cross-validation techniques.
Figure 9 presents the accuracy results for SVM, RF, k-NN, and DNN using three
different loss functions: binary cross-entropy, categorical cross-entropy,
and hinge loss. Categorical cross-entropy loss combines a softmax activation function
with cross-entropy loss and is used for multi-class classification. Binary
cross-entropy combines a sigmoid activation with cross-entropy loss and is used for
multi-label classification. Hinge loss is used for max-margin classification and shows
the best results with SVM classifiers.
Fig. 9. Classifier accuracy for loss functions for emotional state classification using hold-out
validation
The results demonstrate that the k-NN classifier reached its highest accuracy of 88%
with k = 1 under cross validation. Although an accuracy of 92% was achieved for RF,
the DNN also reached an accuracy of 91%, its highest, with the hinge loss function,
as compared to the other studied algorithms.
Further, this research compared the present work with prior DNN models for EEG-based
emotion recognition. Table 6 summarizes the results of the comparison with existing
research. Two studies that used PSD feature extraction on the DEAP dataset and worked
on detecting arousal were considered. The comparison shows that the proposed method
gives comparable results when applied with a DNN model.
6 Conclusion
In this paper, a DNN-based learning model has been proposed to detect consumer
emotional states from EEG signals. The complete work is carried out on the proposed
dataset and the DEAP dataset. Initially, two types of features are extracted from the
EEG, i.e. PSD and valence; there are around 2125 features per EEG activity. Various
evaluation parameters of accuracy are used to test classifier performance, with
validation using LOOCV, holdout and k-fold techniques. In total, four different
classifiers were used (DNN, SVM, k-NN, RF); the proposed method achieves accuracies of
around 70%, 93%, 91%, 84% and 87% across the validation settings, the highest in
contrast with all other methods. The results were compared with existing research. The
major limitation of the research is that it covers only two emotional states and
evaluates a smaller number of parameters. In future, the DNN method can be explored
further on certain parameters to improve the achieved accuracy for emotional state
evaluation; the exploration of an enhanced DNN model for the valence-arousal model is
proposed as future work. The authors recommend applying the DNN model to multiple
modalities in order to understand consumer emotional states.
References
Abdulkader, S.N.: Brain computer interfacing: applications and challenges. Egypt. Inform. J.
16(2), 213–230 (2015)
Agarwal, S.: Neuromarketing and consumer neuroscience: current understanding and the way
forward. Decision 457–462 (2015)
Aldayel, M., Ykhlef, M., Al-Nafjan, A.: Deep learning for EEG-based preference classification
in neuromarketing. Appl. Sci. 10(4), 1525–1548 (2020)
Al-Nafjan, A., Hosny, M., Al-Ohali, Y., Al-Wabil, A.: Review and classification of emotion recog-
nition based on EEG brain-computer interface system research: a systematic review. Appl. Sci.
7(12), 1239 (2017a)
Al-Nafjan, A., Hosny, M., Al-Wabil, A., Al-Ohali, Y.: Classification of human emotions from
electroencephalogram (EEG) signal using deep neural network. Int. J. Adv. Comput. Sci. Appl.
8(9), 419–425 (2017b)
Alvino, L.C.: Towards a better understanding of consumer behavior: marginal utility as a parameter
in neuromarketing research. Int. J. Mark. Stud. 10(1), 90–106 (2018)
Ameera, A., Saidatul, A., Ibrahim, Z.: Analysis of EEG spectrum bands using power spectral
density for pleasure and displeasure state. In: IOP Conference Series: Materials Science and
Engineering, vol. 557, no. 1, pp. 012030–01203. IOP Publishing (2019)
Barros, R.Q., et al.: Analysis of product use by means of eye tracking and EEG: a study of
neuroergonomics. In: Marcus, A. (ed.) DUXU 2016. LNCS, vol. 9747, pp. 539–548. Springer,
Cham (2016). [Link]
Boksem, M.A.: Brain responses to movie trailers predict individual preferences for movies and
their population-wide commercial success. J. Mark. Res. 52(4), 482–492 (2015)
Chew, L., Teo, J., Mountstephens, J.: Aesthetic preference recognition of 3D shapes using EEG.
Cogn. Neurodyn. 10(2), 165–173 (2016)
Cherubino, P., et al.: Consumer behaviour through the eyes of neurophysiological measures:
state-of-the-art and future trends. Comput. Intell. Neurosci. 1–41 (2019)
Qin, X., Zheng, Y., Chen, B.: Extract EEG features by combining power spectral density and
correntropy spectral density. In: 2019 Chinese Automation Congress (CAC), pp. 2455–2459.
IEEE (2019)
Teo, J.H.: Deep learning for EEG-based preference classification. In: AIP Conference Proceedings,
vol. 1891, p. 020141. AIP Publishing LLC (2017)
Teo, J.C.: Classification of affective states via EEG and deep learning. Int. J. Adv. Comput. Sci.
Appl. 9(5), 132–142 (2018a)
Teo, J.H.: Preference classification using electroencephalography (EEG) and deep learning. J.
Telecommun. Electron. Comput. Eng. (JTEC) 10(1–11), 87–91 (2018b)
Yadava, M.K.: Analysis of EEG signals and its application to neuromarketing. Multimed. Tools
Appl. 76(18), 19087–19111 (2017)
Covid Prediction from Chest X-Rays Using
Transfer Learning
Abstract. The novel coronavirus causes a rapidly spreading viral infection that has
become a pandemic with destructive effects on public health and the global economy.
Early detection and early quarantine of Covid-19 patients therefore have a significant
impact on curtailing its transmission rate, but this has become a major challenge due
to a critical shortage of test kits. A promising method that addresses this challenge
by predicting Covid-19 from patient X-rays using transfer learning, a deep learning
technique, is proposed in this paper. For this we used a dataset consisting of chest
X-rays of Covid-19-infected and normal people. We used the VGG, GoogleNet-Inception
v1, ResNet and CheXNet transfer learning models, which reduce the training time of a
neural network model, and achieve accuracies of 99.49%, 99%, 98.63% and 99.93%
respectively in predicting Covid-19 from the X-ray of a suspected patient.
1 Introduction
In December 2019, the disease caused by the most recently discovered coronavirus was
first reported in Wuhan, China as a special case of pneumonia; it was later named
Covid-19 and the virus SARS-CoV-2. It infects the respiratory system, ranging from a
mild common cold to severe illness such as MERS (Middle East Respiratory Syndrome) and
SARS (Severe Acute Respiratory Syndrome). The clinical features of the disease include
fever, sore throat, headache, cough and mild respiratory symptoms, even leading to
pneumonia. The most accurate test techniques currently used for Covid diagnosis are
the Polymerase Chain Reaction and Reverse Transcription PCR [1] tests: laboratory
methods that interact with RNA and DNA to determine the volume of a specific RNA using
fluorescence, performed on collected samples of nasal secretions. Due to the limited
availability of these test kits, early detection cannot be carried out, which in turn
increases the spread of the disease. Covid became a pandemic affecting the whole
world, and at the time of writing there is no vaccine available to cure it. In this
epidemic situation, Artificial Intelligence techniques are becoming vital. Some
applications in the Covid pandemic scenario that show promising uses of AI are: AI
techniques embedded in cameras to identify infected patients with recent travel
history using facial recognition, robot services to deliver food and medicines to
Covid-infected patients, and drones to disinfect surfaces in public places [2]. A lot
of research is being carried out in using AI for drug discovery for a Covid cure and
for vaccine development by learning about the RNA of the virus. Machine learning
techniques are being used in medical disease diagnosis to reduce manual intervention
and enable automatic diagnosis, and are becoming a supportive tool for clinicians.
Deep learning techniques have been successfully applied to problems such as carcinoma
detection, carcinoma classification, and respiratory disorder detection from chest
X-ray pictures. Since Covid-19 is growing at an exponential rate day by day, the use
of deep learning techniques for Covid prediction may help to increase the testing rate
and thereby reduce the transmission rate. Covid affects the lining of the respiratory
tract and shows preliminary symptoms like pneumonia; as doctors frequently use X-rays
to test for pneumonia, identification of Covid using X-rays can play a significant
role in corona testing. So, to increase the Covid testing rate, an X-ray test can be
used as a preliminary test, and if the AI prediction is positive the patient can
undergo a medical test. In this paper, transfer learning is used: a machine learning
technique that reserves knowledge gained in solving one problem and applies it to
other similar problems. A dataset consisting of X-rays of normal and Covid-19 patients
is used for the transfer learning. A deep neural network is built and implemented with
the VGG, Inception v1, ResNet and CheXNet models. We chose these models because they
are CNNs trained on the large ImageNet dataset and are widely used in image
classification and disease prediction; we selected CheXNet in particular as it was
trained on chest X-rays. Section 2 briefs some recent work on Covid prediction using
AI and Deep Learning (DL) techniques. Section 3 presents our methodology for Covid
prediction using transfer learning. Section 4 discusses the results obtained in
applying the four models VGG, GoogleNet-Inception v1, ResNet and CheXNet. In Sect. 5
the use of transfer learning in Covid prediction is concluded.
2 Related Work
Many researchers have been working rigorously on possibilities of early Covid-19
detection since early 2020. Both laboratory clinical testing methods and
computer-aided testing using Artificial Intelligence, machine learning and deep
learning (DL) approaches are being developed. As this disease does not show symptoms
immediately, early identification of infected persons has become difficult. Artificial
Intelligence can aid easy and rapid X-ray diagnosis using deep learning. The idea of
using X-ray images for predicting Covid-19 came from the deep neural network
approaches used for pneumonia detection in chest X-rays [3]. A deep learning based
automated diagnosis system for X-ray mammograms was proposed by Al-Antari et al. [4];
they used YOLO, a regional deep learning approach, which resulted in a detection
accuracy of 98.96%.
Bar et al. detected chest pathology in chest radiographs using deep learning
models [5], observing the feasibility of detecting pathology through non-medical
pre-training with DL approaches. Later, many works for detection of lung
abnormalities, tuberculosis patterns and vessel extraction using X-rays were developed
[6, 7]. In recent days, extensive work is being carried out in using deep
learning and AI techniques for Covid-19 prediction. More accurate and faster Covid-19
detection can be achieved by AI and DL using chest X-rays. There have been numerous
previous works applying transfer learning models based on Convolutional Neural
Networks to different disease predictions. Apostolopoulos et al. took X-ray image
datasets of patients with common bacterial pneumonia, Covid-19-positive patients, and
normal cases from public repositories for the automated detection of the coronavirus
disease [8]. They used transfer learning models based on CNNs for detecting varied
abnormalities in small medical image datasets, yielding outstanding results of
approximately 96% accuracy. Their promising results show that deep learning techniques
can extract important biomarkers associated with Covid-19 from X-ray images. Three
CNN-based models, ResNet50, InceptionV3 and Inception-ResNetV2, were applied for the
detection of coronavirus using chest X-ray radiographs by Narin, Ceren and Pamuk [9];
they obtained 98%, 97% and 87% accuracies respectively. Salman et al. used a
Convolutional Neural Network for Covid-19 detection [10, 12]. As an alternative to
building a model from scratch, transfer learning helps in reducing the computational
overhead and has proved to be a promising technique in many deep learning
applications. In this paper we propose Covid-19 prediction from X-rays using transfer
learning models with better accuracy.
3 Methodology
Transfer learning is one of the advanced deep learning approaches, in which a model
trained on a similar problem is used as the starting point for another, similar
problem. It decreases the training time of a neural network and eases the optimization
and tuning of hyperparameters. One or more layers from the trained model are reused in
the new model: some are frozen, and fine-tuning is applied to the output layers that
are to be customized. Figure 2 shows the working of the transfer learning technique.
Popular models for this approach are VGG (VGG16 or VGG19), GoogleNet (Inception v1 or
v3), Residual Network (ResNet50) and CheXNet; Keras provides access to a number of
such pretrained models. In transfer learning, Convolutional Neural Networks (CNNs) are
first trained on large datasets and then employed to process new sets of images and
extract features. In
medical tasks, we use transfer learning to exploit CNNs with these models and evaluate
algorithms for image classification and object detection. In this section we discuss
the architectures of the four models VGG, GoogleNet, ResNet and CheXNet, and explore
their applicability, using pretrained weights as part of transfer learning, for
Covid-19 prediction.
GoogleNet: GoogleNet is a 22-layer CNN with almost 12× fewer parameters than AlexNet.
It uses strategies like 1 × 1 convolutions and average pooling that enable it to
create a deeper design. Figure 4 depicts the architecture of the GoogleNet model.
ResNet: ResNet, short for Residual Network, was proposed in 2015 as part of the
ImageNet challenge for computer vision [15]. It won that challenge and is widely used
for computer vision projects. Using the transfer learning concept, its 150-plus layers
can be trained successfully. Residual (skip) connections allow the last two or three
layers containing non-linearities to be skipped, which helps avoid the vanishing
gradient problem. Its architecture is shown in Fig. 5.
CheXNet: CheXNet consists of 121 CNN layers. It produces a heatmap of localized areas
indicating the regions of the image affected by the disease, along with the prediction
probability [16]. It was developed to predict pneumonia from chest X-rays and trained
on the ChestX-ray14 dataset, which contains X-ray images for 14 different pathologies.
Its architecture is shown below in Fig. 6. The test set labels were annotated by four
reputed radiologists and used to evaluate the performance of the model against the
radiologists' annotations.
3.3 Implementation
In our paper, we applied transfer learning models for Covid-19 prediction from
X-rays. The deep architectures helped in predicting the results with good accuracies
for the VGG, GoogleNet, ResNet and CheXNet models. Figure 7 describes our proposed
implementation model.
Algorithm
Step 1: Load the dataset, which contains 1824 images with 2 classes for binary
classification.
Step 2: Resize the images in the dataset to 224 × 224, as the transfer learning CNN
models take input images of size 224 × 224.
Step 3: Select pretrained layers from VGG/GoogleNet/ResNet/CheXNet and modify the
output layers. The number of layers selected and the modifications carried out are
described below for each model individually.
Step 4: Fine-tune the hyperparameters of each model individually; the tuned parameters
are indicated in Table 1.
Step 5: Evaluate the performance of each model using the metrics explained in the next
subsection.
Step 6: Pass a new X-ray image to detect whether the patient has Covid-19 or not.
The VGG16 model contains 16 weight layers, including convolutional, pooling, fully
connected and final dense layers. The final layer predicts 1000 output classes, of
which we consider 2 classes for our model; this is done by freezing the convolutional
layers and constructing 2 new fully connected layers (sketched below). GoogleNet
contains 22 layers with average pooling; all are trained, and in the output layer
2 softmax units are used for prediction. The ResNet model has 50 layers with an output
layer capable of classifying 1000 objects; we froze the final dense layer and added
2 layers for predicting our 2 classes, Covid-19 and non-Covid. Finally, for CheXNet we
considered the DenseNet121 network with pretrained weights and froze the convolutional
weights; then new fully connected sigmoid layers were constructed and appended on top
of DenseNet.
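A hedged Keras sketch of the VGG16 variant follows; the size of the new fully connected layer is an illustrative assumption, while the frozen convolutional base, the 224 × 224 input and the binary output follow the text.

```python
# Sketch: VGG16 transfer learning for Covid / non-Covid classification.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False                     # freeze pretrained conv layers

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)       # new fully connected layer (assumed size)
out = Dense(1, activation="sigmoid")(x)    # binary Covid / non-Covid output

model = Model(base.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```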
The hyperparameters are tuned in order to obtain a highly performing model. We tuned
around 5 different parameters, comprising the learning rate, the choice of optimizer
and loss function, the number of epochs, the batch size, the test size, the rotation
range, etc. The learning rate is passed as a parameter to the optimizer function.
Working with different optimizers and loss functions did not affect the behaviour of
the model much, so we used Adam as the optimizer and binary cross-entropy as the loss
function throughout. The batch size is the number of samples propagated through the
network at once, and the number of epochs is the number of passes of the model over
the training data. Dropout is a regularization technique in which random neurons are
ignored during training; increasing dropout generally increased accuracy in our
experiments. Table 1 shows the values of the hyperparameters that we used for the
different transfer learning models.
In a model, values like accuracy, precision, recall and F1-score are considered as
performance metrics, since they are used to evaluate model performance. Accuracy is
the ratio of correctly classified predictions to the total number of predictions.
Precision is the ratio of true positives to the predicted positives. Recall is the
ratio of true positives to the total actual positives. F1-score is the weighted
average (harmonic mean) of precision and recall. Precision and recall are useful when
the dataset is imbalanced, i.e. when there is a large difference between the number of
X-rays with Covid and without Covid.
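For reference, in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), these metrics take the standard forms:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}\\
\text{Precision} &= \frac{TP}{TP + FP}\\
\text{Recall}    &= \frac{TP}{TP + FN}\\
\text{F1} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```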
4.4 Result
The VGG16 model achieved a good accuracy of 99.49%, with sensitivity and specificity
of 1.0000 and 0.9890 respectively; the GoogleNet-Inception v1 model achieved 99%
accuracy with sensitivity and specificity of 1.0000 and 0.9834; the ResNet50 model
achieved 98.63% accuracy with sensitivity and specificity of 1.0000 and 0.9725; and
the CheXNet model achieved 99.93% accuracy with sensitivity and specificity of 1.000
and 1.000, for the Covid and normal classes in Covid prediction. The performance
measures of all these models are shown below in Table 2.
Fig. 9. Graph showing variations in different measures for GoogleNet inceptionV1 model.
Owing to the strong performance of these proposed models, they can be incorporated
in real-time testing, which in turn increases the testing rate. The graphs in Figs. 8,
9, 10 and 11 show the variation in the different measures of accuracy and loss for the
VGG, GoogleNet, ResNet and CheXNet models.
Fig. 10. Graph showing variations in different measures for ResNet50 model.
Fig. 11. Graph showing variations in different measures for CheXNet model.
In this paper, we used a transfer learning approach to train CNNs on X-ray images to
predict the novel Covid-19 disease. This idea can be implemented in real-time
Covid-19 detection scenarios with further development, and it can also be implemented using
other transfer learning methods. Our work can be further extended by training with
larger datasets so that still better accuracy can be achieved, even for unseen data.
It can also be enhanced to predict the survival chances of Covid-affected patients.
The work carried out in this paper offers potential insight and will contribute
towards further research on COVID-19 prediction.
References
1. World Health Organization: Laboratory testing for coronavirus disease 2019 (Covid-19) in
suspected human cases: interim guidance, 2 March 2020. World Health Organization, World
Health Organization (2020)
2. Ruiz Estrada, M.A.: The uses of drones in case of massive epidemics contagious diseases
relief humanitarian aid: Wuhan-Covid-19 crisis. SSRN Electron. J. (2020).
https://doi.org/10.2139/ssrn.3546547
3. Wu, H., et al.: Predict pneumonia with chest X-ray images based on convolutional deep neural
learning networks. J. Intell. Fuzzy Syst. Preprint (2020)
4. Al-Antari, M.A., et al.: A fully integrated computer-aided diagnosis system for digital X-ray
mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inf.
117, 44–54 (2018)
5. Bar, Y., et al.: Chest pathology detection using deep learning with non-medical training. In:
2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI). IEEE (2015)
6. Bhandary, A., et al.: Deep-learning framework to detect lung abnormality-a study with chest
X-ray and lung CT scan images. Pattern Recogn. Lett. 129, 271–278 (2020)
7. Nasr-Esfahani, E., et al.: Vessel extraction in X-ray angiograms using deep learning. In: 2016
38th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC). IEEE (2016)
8. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from x-ray images
utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 6, 1
(2020)
9. Narin, A., Ceren, K., Pamuk, Z.: Automatic detection of coronavirus disease (Covid-19)
using x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849
(2020)
10. Salman, F.M., et al.: Covid-19 detection using artificial intelligence (2020)
11. [Link]
844e-4e8246751706
12. Ozturk, T., et al.: Automated detection of Covid-19 cases using deep neural networks with
X-ray images. Comput. Biol. Med. 121, 103792 (2020)
13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego,
7–9 May 2015 (2015)
14. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition (2015)
15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
16. Rajpurkar, P., et al.: CheXNet: radiologist-level pneumonia detection on chest X-rays
with deep learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition (2017)
Machine Learning Based Prediction of H1N1
and Seasonal Flu Vaccination
Abstract. The H1N1 flu that came into existence in 2009 had a great impact on the
lives of people around the world. It was a life-threatening season for hundreds of
people, mainly those below 65 years old, which eventually made the World Health
Organization (WHO) declare it the greatest pandemic in more than 40 years. To find out
the vaccination status of the population, the National 2009 H1N1 Flu Survey (NHFS) was
conducted in the U.S. In this paper, the data from this survey is used to develop a
model that predicts how likely people were to get the H1N1 and seasonal flu vaccines.
For this purpose, various Machine Learning (ML) and Artificial Neural Network (ANN)
models are used to determine the probability of a person receiving the H1N1 and
seasonal flu vaccines.
1 Introduction
The H1N1 or swine flu virus first emerged in the spring of 2009 in Mexico, then in
the United States, and quickly spread across the globe. A distinctive combination of
influenza genes, not previously identified in humans or animals, was discovered in
this novel H1N1 virus [1]. This contagious novel virus had a very powerful impact on
the whole world, spreading like a forest fire; as a result, on June 11, 2009 the World
Health Organization (WHO) declared that a pandemic of 2009 H1N1 flu, or swine flu, had
begun [2]. The effects of this novel H1N1 virus were more severe on people below the
age of 65: there was significantly high pediatric mortality and a higher rate of
hospitalizations for young adults and children [3].
According to the Centers for Disease Control and Prevention (CDC), the first and
foremost step in protecting oneself from this virus is a yearly flu vaccination [4].
Various factors affect the ability of the vaccination to protect the vaccinated
person, such as age, an individual's health perceptions, and the similarity or "match"
between the virus structure in the vaccine and the virus structure affecting the
community [5]. Various social media platforms and broadcasting networks were employed
during the pandemic; for example, Twitter was used to track levels of disease activity
and public concern [6], and social media played an important role in assessing
sentiment towards vaccination and its implications for disease dynamics and control
[7]. Notable among these efforts is the phone survey conducted in the U.S., where
respondents were asked whether they had received the H1N1 and seasonal flu vaccines,
in conjunction with questions about themselves.
In the present study, we used the data obtained from the National 2009 H1N1 Flu
Survey (NHFS) to predict how likely people were to get the H1N1 and seasonal flu
vaccines. The NHFS data is used to estimate the probability of a person receiving the
H1N1 and seasonal flu vaccines using various Machine Learning (ML) and Artificial
Neural Network (ANN) models, and the performance of these techniques is also
discussed. In Sect. 2 the literature review is presented. Section 3 discusses the data
resource, i.e. the NHFS survey, and Sect. 4 presents the methodology used. Section 5
discusses the results obtained, and Sects. 6 and 7 present the conclusion and future
research scope.
2 Literature Review
Mabrouk et al. [8], in "A chaotic study on pandemic and classical (H1N1) using EIIP
sequence indicators", state that methods such as moment invariants, correlation
dimension, and largest Lyapunov exponent, which were used to detect H1N1, indicated
the differences between the pandemic and classical influenza viruses. Chinh et al.
[9], in "A possible mutation that enables the H1N1 influenza A virus to escape
antibody recognition", explained methods such as phylogenetic analysis of pandemic
strains and molecular docking for the predicted epitopes. Huang et al. [10], in
"Aptamer-modified CNTFET (Carbon NanoTube Field Effect Transistor) biosensor for
detecting H1N1 virus in a droplet", suggested an aptamer-modified carbon nanotube
combination yielding a CNTFET that acts as a biosensor for the detection of the H1N1
virus in a droplet.
M. S. Ünlü [11], in "Optical interference for multiplexed, label-free, and dynamic
biosensing: Protein, DNA and single virus detection," described an interferometric
reflectance imaging sensor that can be used for label-free, high-throughput,
high-sensitivity and dynamic detection of the H1N1 virus and nanoparticles. Kamikawa
et al. [12], in "Pandemic influenza detection by electrically active magnetic
nanoparticles and surface plasmon resonance", indicated that detection consists of
several processes, such as nanoparticle synthesis, glycans, polyaniline, and sensor
modification, as means to find H1N1 by nanoparticles and resonance. Jerald et al.
[13], in "Influenza virus vaccine efficacy based on conserved sequence alignment,"
discussed the vital strain sequences from the National Center for Biotechnology
Information (NCBI) and sequence alignment, which help vaccine efficiency for influenza.
Chrysostomou et al. [14], in "Signal-processing-based bioinformatics approach for the
identification of influenza A virus subtypes in Neuraminidase genes", discussed
methods for identification of the influenza virus such as neuraminidase genes, signal
processing, F-score and Support Vector Machines (SVM). Wiriyachaiporn et al. [15], in
"Rapid influenza A antigen detection using carbon nanostrings as the label for lateral
flow immunochromatographic assay," presented the preparation of allantoic fluid
infected with influenza A virus, the conjugation of carbon nanostrings (CNS) to
antibody, and the evaluation of CBNS-MAb using a Lateral Flow Immunoassay (LFIA),
and Ma et al. [16], in "An integrated passive microfluidic device for rapid detection
of influenza A (H1N1) virus by reverse transcription loop-mediated isothermal
amplification (RT-LAMP)", demonstrated the loading of virus and magnetic beads and
discussed virus capture, collection of virus-magnetic-bead complexes, removal of
excess waste, virus particle lysis, the RT-LAMP reaction and the coloration steps to
detect the H1N1 virus.
Nieto-Chaupis [17], in "Face To Face with Next Flu Pandemic with a Wiener-Series-Based
Machine Learning: Fast Decisions to Tackle Rapid Spread", explained the Wiener model
used to increase optimization, efficiency and performance in finding the spread of
seasonal flu. Stalder et al. [18], in "Tracking the flu pandemic by monitoring the
social web", related how retrieving data from Twitter and official health reports
provides inexpensive and timely information about the epidemic, and Motoyama et al.
[19], in "Predicting Flu Trends using Twitter Data", demonstrated the use of the SNEFT
model and Twitter crawler methods for predicting flu from Twitter data.
Wong et al. [20], in "Diagnosis of Response Behavioural Patterns Towards the Risk of
Pandemic Flu Influenza A (H1N1) of Urban Community Based on Rasch Measurement Model",
presented the source of data and the data analysis methodology used for response
behavioral patterns towards H1N1; Bao et al. [21], in "Influenza-A Circulation in
Vietnam through data analysis of Hemagglutinin entries", provided NCBI influenza virus
resource datasets (2001–2012) used for the analysis of the influenza virus; and Hu et
al. [22], in "Computational Study of Interdependence Between Hemagglutinin and
Neuraminidase of Pandemic 2009 H1N1", explained sequence data and the informational
spectrum model.
3 Data Resources
Data is one of the most important and vital aspects of any research study. The
National Flu Survey (NFS) has been conducted since the 2010–11 influenza season [23].
The data for our study is obtained from the National 2009 H1N1 Flu Survey (NHFS),
which was carried out for the Centers for Disease Control and Prevention (CDC). The
main aim of the survey was to monitor and evaluate H1N1 flu vaccination efforts among
adults and children. The survey was conducted through telephones, Twitter and various
other electronic media in all 50 states. It consisted of a national random-digit-
dialed telephone survey, based on a rolling weekly sample of landline and cellular
telephone numbers, contacted to identify residential households. Various questions
were asked about flu-related behaviors, opinions about the flu vaccine's safety and
effectiveness, and medical history such as recent respiratory illness and pneumococcal
vaccination status, apart from the major question about H1N1 and seasonal flu
vaccination status. The NHFS data was collected from October 2009 to May 2010. This
data was obtained to get a fair idea of people's knowledge of the effectiveness and
safety of flu vaccines and to learn why some people refrained from getting vaccinated
against the H1N1 flu and seasonal flu. A huge amount of data was gathered through this
survey, which is commonly used for analysis and research purposes; the data also
measures the number of children and adults nationwide who received vaccinations.
4 Methodology
A methodology is proposed to determine the probability that a person will receive
the H1N1 and seasonal flu vaccinations based on many parameters. The data obtained
from the National 2009 H1N1 Flu Survey (NHFS) contains 3 CSV files, namely the
training set features, the training set labels, and the test set features. The data
has been obtained from over 53,000 people, of which around 26,000 observations have
been considered for the training set and the rest for the testing set.
We have considered various methodologies and compared different Machine Learning and
Artificial Neural Network models to predict the probability. Machine learning
algorithms such as multiple linear regression, support vector regression, random
forest regression and logistic regression were used. The system architecture of the
machine learning model is presented in Fig. 1.
An Artificial Neural Network (ANN) with different optimizers, such as Adam, RMSprop
and SGD, was used to predict the probability on the test set features. The system
architecture of the ANN is presented in Fig. 2.
The training set features and training set labels have been split into a training set
(80%) and a testing set (20%) using train_test_split from sklearn.model_selection,
which splits a dataset into training and testing sets, as sketched below.
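A minimal sketch, with placeholder data standing in for the NHFS CSV contents:

```python
# 80/20 train-test split with scikit-learn, as described above.
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.randn(26000, 35)          # placeholder NHFS features
labels = np.random.randint(0, 2, size=26000)   # placeholder vaccine labels

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)
```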
Hyperparameter tuning is done to find the most optimal parameters for the model, on
which the model gives the best results. We used hyperparameter tuning methods such as
GridSearchCV and RandomSearchCV for our machine learning models to obtain better
results, and the k-fold cross-validation method to tune the hyperparameters of the
Artificial Neural Network.
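A hedged GridSearchCV sketch is shown below; the grid values are illustrative except C = 5, which the results section reports as the optimum for logistic regression.

```python
# Hypothetical hyperparameter sweep for the logistic regression model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X_train = np.random.randn(200, 35)             # placeholder features
y_train = np.random.randint(0, 2, size=200)    # placeholder labels

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.1, 1, 5, 10]},
                    cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)     # e.g. {'C': 5}
```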
Table 1. Results for H1N1 flu and Seasonal flu vaccination prediction
Fig. 3. ROC AUC Curve using Support Vector Machine: RBF Kernel for (a) h1n1 vaccine and
(b) seasonal flu vaccine
Fig. 4. ROC AUC Curve using Random Forest Regressor for (a) h1n1 vaccine and (b) seasonal
flu vaccine
146 S. Inampudi et al.
Fig. 5. ROC AUC Curve using Logistic Regression for (a) h1n1 vaccine and (b) seasonal flu
vaccine
Fig. 6. ROC AUC Curve using Artificial Neural Network for (a) h1n1 vaccine and (b) seasonal
flu vaccine
random forest regression involve training the model with 10 n_estimators, and the
optimal parameter for logistic regression is C = 5. All these results are presented in
tabulated form in Tables 2 and 3. It is observed that the results of seasonal flu
vaccination prediction have not been up to the mark using hyperparameter tuning; they
were better predicted using the default models.
Table 2. Results with Hyperparameter tuning (GridSearchCV) for H1N1 flu vaccination
prediction
Table 3. Results with Hyperparameter tuning (RandomSearchCV) for H1N1 flu vaccination
prediction
The k-fold method is used to fine-tune the hyperparameters of the Artificial Neural
Network. The obtained results are more or less equal to those of the default method,
but a marginal increase in performance is noted, as can be clearly seen in Table 4.
The most optimal parameters obtained for the ANN with the k-fold method are a 1st
hidden layer with selu activation and 60 units, a 2nd hidden layer with selu
activation and 3 units, and an output layer with sigmoid activation and 2 units, as
sketched below. All the results are presented in Table 4.
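A Keras sketch of this tuned network follows; the input width is a placeholder, while the layer sizes and activations match the description above.

```python
# Sketch of the tuned ANN: selu hidden layers of 60 and 3 units, and a
# 2-unit sigmoid output (one unit per vaccine label).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

ann = Sequential([
    Input(shape=(35,)),               # placeholder feature count
    Dense(60, activation="selu"),     # 1st hidden layer
    Dense(3, activation="selu"),      # 2nd hidden layer
    Dense(2, activation="sigmoid"),   # H1N1 and seasonal-flu probabilities
])
ann.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```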
Table 4. Results with Hyperparameter tuning (kfold method) for H1N1 flu and Seasonal
vaccination prediction
6 Conclusion
In this paper, prediction of H1N1 and seasonal flu vaccination is carried out using
the data source provided by the National 2009 H1N1 Flu Survey (NHFS) for the Centers
for Disease Control and Prevention (CDC). Various ML and ANN models are used for the
prediction of H1N1 and seasonal flu vaccination. The model studies are improved using
several techniques, such as taking care of missing data, encoding categorical data,
hyperparameter tuning, and splitting the dataset for training and testing purposes.
The results obtained from the various models are compared and evaluated. They indicate
that H1N1 vaccination is predicted best by the SVM model with RBF kernel, with
hyperparameter tuning using GridSearchCV, yielding an accuracy of 83.97%, while
seasonal flu vaccination is predicted best by the Artificial Neural Network, which
yielded an accuracy of 86.10%.
Acknowledgement. The work presented in this paper was carried out as part of an
internship project at Bennett University, Noida, India. The success of an internship
project demanding such technical proficiency requires patience and the extensive
support of guides. We take this opportunity to express our gratitude to those who have
been instrumental in the successful completion of this work. Big thanks to Dr.
Madhushi Verma for all the encouragement, timely details and guidelines given to our
team. We would also like to thank Dr. Deepak Garg, HOD of the Computer Science
Engineering Department, and Dr. Sudhir Chandra, Dean, School of Engineering & Applied
Sciences, Bennett University, for giving us the opportunity and the environment to
learn and grow.
References
1. CDC. [Link] Accessed 21
June 2020
2. CDC. [Link] Accessed 22 May 2020
3. CDC. [Link] Accessed 22 May
2020
4. CDC. [Link] Accessed 22 May 2020
22. Hu, W.: Molecular features of highly pathogenic Avian and Human H5N1 Influenza a viruses
in Asia. Comput. Mol. Biosci. 2(2), 45–59 (2012)
23. Smith, P.J., Wood, D., Darden, P.M.: Highlights of historical events leading to national surveil-
lance of vaccination coverage in the United States. Public Health Rep. 126(Suppl 2), 3–12
(2011)
24. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12,
2825–2830 (2012)
25. Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn
project (2013)
26. Dubosson, F., Bromuri, S., Schumacher, M.: A python framework for exhaustive machine
learning algorithms and features evaluations. In: Proceedings of IEEE 30th International
Conference on Advanced Information Networking and Applications (AINA), Crans-Montana,
pp. 987–993 (2016)
27. Virtanen, P., Gommers, R., Oliphant, T.E., et al.: SciPy 1.0: fundamental algorithms for
scientific computing in Python. Nat Methods 17, 261–272 (2020)
A Model for Heart Disease Prediction
Using Feature Selection with Deep
Learning
1 Introduction
In the research field, heart disease has raised a lot of serious concerns, and
the significant challenge is accurate detection or prediction at an early stage to
minimize the risk of death. According to the World Health Organization (WHO) [1],
medical professionals have predicted only 67% of heart diseases correctly, and
hence there exists a vast research scope in the area of heart disease prediction.
A lot of technicalities and parameters are involved in predicting the disease
accurately. Various machine learning and deep learning algorithms and several
optimization techniques have been used to predict heart-disease risk. All these
techniques mainly focus on higher accuracy, which shows the importance of correct
prediction of heart disease; it would help doctors predict heart disease at an early
stage and save millions of lives [2]. For temporally sequenced data, recurrent neural
network (RNN) models are best suited, and several variants have been designed for
sequenced features. In various sequence-based tasks such as language modelling and
handwriting recognition, long short-term memory (LSTM) networks have been used and
show impressive performance [3,4]. For better performance, evolutionary algorithms
(EAs) are used for model optimization. Population-based, self-adaptive evolutionary
algorithms are very useful for feature selection and extraction. EAs used in recent
years include ant colony optimization (ACO), particle swarm optimization (PSO) and the
genetic algorithm (GA). The GA is a stochastic method for optimization and global
search, which is very helpful in handling medical data. Possible solutions are
obtained from a set of individuals using the GA, which creates better-quality
solutions for global search and optimization based on the mutation, crossover and
selection operators. The PSO, a meta-heuristic algorithm, is considered in this study
due to its simplicity and ease of implementation: it uses only a few parameters and
requires little parameter tuning. The PSO combines an information-sharing mechanism
with population-based search, and hence has been extended from single- to
multi-objective optimization. It has been successfully applied in the medical field
for heart disease prediction, with good recorded performance [5,6]. The main
contributions of this study are:
– Improving the accuracy of heart disease prediction in humans using efficient
feature selection and classification methods.
– Implementing the GA and PSO for efficient feature selection.
– Implementing the RNN and LSTM to improve the accuracy of heart disease prediction.
– Comparing the performance of the proposed method with existing techniques in terms
of accuracy, precision, recall and f-measure.
The rest of the paper is organized as follows: Sect. 2 includes a literature survey
of existing research related to feature selection techniques and deep learning
classification methods for heart disease prediction. Section 3 discusses the
implementation of the GA and PSO optimization algorithms and the LSTM and RNN
classification. Section 4 discusses the performance analysis of the proposed work. The
conclusion is presented in Sect. 5.
2 Related Work
In [7], researchers showed that optimization algorithms are necessary for efficient
heart disease diagnosis and for estimating disease level. They used a
support vector machine (SVM) and generated an optimization function using the GA for
the selection of the most substantial features to identify heart disease; the dataset
used in this research is the Cleveland heart disease database. G. T. Reddy et al.
developed an adaptive GA with fuzzy logic design (AGAFL) in [8], which helps medical
practitioners diagnose heart disease at an early stage; using the hybrid AGAFL
classifier, heart disease was predicted, with the research performed on the UCI heart
disease datasets. For diagnosing coronary artery disease, the angiography method is
usually used, but it has significant side effects and is highly expensive. Alternative
modalities using data mining and machine learning techniques are stated in [9], where
coronary artery disease diagnosis is done with more accurate hybrid techniques that
increase the performance of a neural network, using a GA to enhance its accuracy; the
Z-Alizadeh Sani dataset is used, yielding above 90% specificity, accuracy and
sensitivity.
In [10], researchers proposed a trained recurrent fuzzy neural network (RFNN) based
on the GA for heart disease prediction, using the UCI Cleveland heart disease dataset;
an accuracy of 97.78% resulted on the testing set. For large health-diagnosis data,
machine learning has been considered an effective support system, although analyzing
this kind of massive data generally requires more execution time and resources. An
effective feature selection algorithm was proposed by J. Vijayashree et al. in [11] to
identify the significant features which contribute most to disease diagnosis; the PSO
was implemented to identify the best solution in reduced time. The PSO also removes
redundant and irrelevant features in addition to selecting the important features in
the given dataset. A novel fitness function for the PSO was designed in this work
using the support vector machine (SVM) to solve the optimal weight selection issue for
updating particle velocity and position. Overall, the optimization algorithms show the
merit of handling difficult non-linear problems with adaptability and flexibility. To
improve heart disease classification quality, the fast correlation-based feature
selection (FCBF) method was used in [12] by Y. Khourdifi et al. to enhance the
classification of heart disease and filter redundant features. Classification based on
SVM, random forest, MLP, k-nearest neighbor and an artificial neural network,
optimized using the PSO mixed with ant colony optimization (ACO) techniques, was
applied to a heart disease dataset, resulting in robust and effective heart disease
classification. Using data mining and artificial intelligence, heart disease was
predicted with less time and cost in [13], which focused on the PSO and a feed-forward
back-propagation neural network, using feature ranking on the disease's effective
factors presented in the Cleveland clinical database; after evaluating the selected
features, the results show that the proposed classification methods achieve the best
accuracy. In [14], machine learning algorithms play a major role in disease risk
prediction, where prediction accuracy is influenced by attribute selection in the
dataset; Matthews correlation coefficient was considered as the performance metric,
and for attribute selection an altered PSO has been
applied. N. S. R. Pillai et al. in [15], using deep RNNs, demonstrated a
language-model-like technique to predict high-risk diagnosis patients (prognosis
prediction), named PP-RNN. The proposed PP-RNN uses several RNNs to learn from
patients' diagnosis codes to predict the existence of high-risk diseases, and achieved
a higher accuracy. In [16], M. S. Islam et al. suggested the grey wolf optimization
algorithm (GWO) combined with an RNN for predicting medical diseases: irrelevant and
redundant attributes are removed by feature selection using GWO, and the feature
dimensionality problem is avoided by the RNN classifier, with which different diseases
were predicted; UCI datasets were used, and enhanced accuracy in disease prediction
was obtained on the Cleveland dataset. From structured and unstructured medical data,
deep learning techniques can extract hidden information. In [17], researchers used the
LSTM for predicting cardiovascular disease (CVD) risk factors, yielding a Matthews
correlation coefficient (MCC) of 0.90 and an accuracy of 95%; compared with other
statistical machine learning algorithms, the proposed LSTM-based module shows the best
performance in predicting CVD risk factors. A novel LSTM deep learning method in [18]
helped in predicting heart failure at an early stage; compared with general methods
like SVM, logistic regression, MLP and KNN, the proposed LSTM method shows superior
performance. CVD also occurs due to mental anxiety, which may increase during a
COVID-19 lockdown period. In [19], researchers proposed an automated tool using an
RNN for a health-care assistance system: a stacked bi-directional LSTM layer was used
to detect cardiac problems from patients' previous health records, and cardiac
troubles were predicted with 93.22% accuracy in the experimental results. In [21],
Senthilkumar Mohan et al. proposed a hybrid machine learning technique for effective
prediction of heart disease: a new method that finds the major features to improve
accuracy in cardiovascular prediction, using different feature combinations and
several known classification techniques. Machine learning techniques were used in this
work to process raw data and provide a new and novel discernment of heart disease. The
challenges seen in existing studies are:
– In the medical field, a challenging requirement is that a large amount of training
data is necessary to avoid over-fitting: if the dataset is imbalanced, predictions are
biased towards the majority samples and over-fitting occurs.
– Deep learning algorithms are optimized through the tuning of hyperparameters such as
activation functions, learning rates and network architecture. However, hyperparameter
selection is a long process, as several values are interdependent and multiple trials
are required.
– Significant memory and computational resources are required to ensure timely
completion. There is also a need to improve the accuracy on the Cleveland heart
disease dataset using deep learning with feature selection techniques.
3 Methodology
The main purpose of this study is to predict heart disease in humans. The proposed
workflow is shown in Fig. 1: it starts with the collection of the dataset and data
pre-processing, implements the PSO and GA for feature selection, and uses the RNN and
LSTM classifiers for classification. Finally, the proposed model is evaluated with
respect to accuracy, precision, recall and f-measure. This section describes the
workflow of the proposed study.
Fig. 1. Proposed heart disease prediction flow with RNN and LSTM classification
The fitness function used for feature selection is given in Eq. (1),

fit = \alpha E(C) + \beta \frac{|s_f|}{|A_f|}    (1)

where E(C) is the classifier's error rate, |s_f| is the length of the selected feature subset, |A_f| is the total count of available features, and the weights controlling feature reduction and classification accuracy satisfy \beta = 1 - \alpha with \alpha \in [0, 1].
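As an illustration, a minimal sketch of how the Eq. (1) fitness could be computed for a binary feature mask is given below. The wrapped classifier and the cross-validation split are illustrative assumptions (a cheap KNN stands in for the classifier here; the paper itself wraps deep models):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, alpha=0.9):
    """Eq. (1): fit = alpha * E(C) + beta * |s_f| / |A_f|, beta = 1 - alpha.

    `mask` is a binary vector over the available features. Lower values
    are better, so GA/PSO treat this as a cost to minimize."""
    if mask.sum() == 0:
        return 1.0  # no features selected: worst possible cost
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    error = 1.0 - acc                      # E(C), the classifier's error rate
    beta = 1.0 - alpha
    return alpha * error + beta * mask.sum() / mask.size
```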
Selection
A portion of the population is selected to breed the next generation. The selection is made based on the fitness values measured using Eq. (1).
Crossover
For further breeding, two parents are randomly selected from the previously selected pool, and the process continues until a suitable population size is reached. Crossover takes place at only one point, the mid-point of the parent solutions. The crossover probability parameter prob_c controls the crossover frequency.
Mutation
Random solutions are selected from the candidates chosen for breeding, and bit flipping is carried out on them. This produces a diverse group of solutions that retain various characteristics of their parents. The mutation probability parameter Prob_m controls the mutation frequency.
Table 1. Algorithm 1

The following steps are repeated until the ending criterion is met: i) evaluate the fitness value of each solution using f(x_i); ii) select the breeding population as x_val = Top_{N/2}(fit_sort); iii) if a generated random value exceeds Prob_c, a random sample from x_val is taken and crossover is applied; iv) the existing solutions are updated with the enhanced new solutions; v) if a generated random value exceeds Prob_m, a random sample from x_val is taken and mutation is applied; vi) the existing solutions are updated with the enhanced new solutions; vii) the combination of x_val and x_newval is generated and considered the new population. Finally, the global best solution is produced and considered the best found solution.
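A compact sketch of Algorithm 1 under these conventions is shown below: binary chromosomes over the feature set, Top_{N/2} selection, mid-point crossover applied with probability prob_c, and bit-flip mutation applied with probability prob_m (the usual comparison direction is assumed for the probability tests). The `fitness` argument is a cost such as Eq. (1), e.g. `lambda m: fitness(m, X, y)` with the sketch above; all parameter values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_feature_selection(fitness, n_features, pop_size=20,
                         prob_c=0.8, prob_m=0.1, max_it=50):
    """Sketch of Algorithm 1; `fitness` is a cost over binary masks
    (lower is better)."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    best, best_cost = None, np.inf
    for _ in range(max_it):
        costs = np.array([fitness(ind) for ind in pop])   # i) evaluate fitness
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:                   # track global best
            best, best_cost = pop[order[0]].copy(), costs[order[0]]
        x_val = pop[order[:pop_size // 2]]                # ii) Top_{N/2} selection
        children = []
        while len(children) < pop_size:
            p1, p2 = x_val[rng.integers(len(x_val), size=2)]
            if rng.random() < prob_c:                     # iii)-iv) mid-point crossover
                mid = n_features // 2
                child = np.concatenate([p1[:mid], p2[mid:]])
            else:
                child = p1.copy()
            if rng.random() < prob_m:                     # v)-vi) bit-flip mutation
                child[rng.integers(n_features)] ^= 1
            children.append(child)
        pop = np.array(children)                          # vii) new population
    return best
```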
The PSO algorithm is described in Table 2. First, the swarm size N, the acceleration constants Ac1 and Ac2, and w_max, w_min, v_max and max_it are initialized. The population and the velocity vectors are randomly initialized as in Eq. (2) and Eq. (3), respectively. The following steps are repeated until the ending criterion is met: i) the inertia weight value w is updated; ii) each solution's fitness value is updated using f(x_i); iii) the personal-best solution pbest and the global-best solution gbest are assigned; iv) the velocity of each particle is updated at each iteration c; v) using the transfer function k, the continuous values are mapped into binary values and new solutions are generated. Finally, the global best is produced as the best found solution.

Table 2. Algorithm 2
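A matching sketch of Algorithm 2 is given below. Since Table 2 is not reproduced here, a sigmoid transfer function, a linearly decreasing inertia weight and the usual velocity update are assumed for k, w and step iv); all parameter values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def bpso_feature_selection(fitness, n_features, n_particles=20, ac1=2.0,
                           ac2=2.0, w_max=0.9, w_min=0.4, v_max=4.0,
                           max_it=50):
    """Sketch of Algorithm 2 (binary PSO); `fitness` is a cost
    (lower is better), e.g. the Eq. (1) sketch above."""
    x = rng.integers(0, 2, size=(n_particles, n_features)).astype(float)
    v = rng.uniform(-v_max, v_max, size=(n_particles, n_features))
    pbest = x.copy()
    pbest_cost = np.array([fitness(p) for p in pbest])
    gbest = pbest[pbest_cost.argmin()].copy()
    for it in range(max_it):
        w = w_max - (w_max - w_min) * it / max_it         # i) inertia weight
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + ac1 * r1 * (pbest - x) + ac2 * r2 * (gbest - x)  # iv)
        v = np.clip(v, -v_max, v_max)
        prob = 1.0 / (1.0 + np.exp(-v))                   # v) transfer function
        x = (rng.random(x.shape) < prob).astype(float)    # new binary solutions
        cost = np.array([fitness(p) for p in x])          # ii) fitness update
        improved = cost < pbest_cost                      # iii) pbest / gbest
        pbest[improved], pbest_cost[improved] = x[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest
```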
LSTM and RNN for Classification
A classification technique to predict heart disease using the RNN and LSTM models is developed. The LSTM model was first proposed by Hochreiter et al. in 1997 and is considered a special RNN model [20]. The RNN connects the current hidden layer state to the previous n-level hidden layer states to obtain long-term memory. On the basis of the RNN network, the LSTM adds valve (gate) nodes to the layers, which overcomes the RNN's long-term memory evaluation problems. Generally, LSTM adds three gates to the original RNN network: an input gate, a forget gate and an output gate. The key insight of the LSTM design is to integrate data-dependent, non-linear controls into the RNN cell so that, once trained, the objective function gradient does not vanish with respect to the state signal. The specifications of RNN and LSTM are shown in Table 3.
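To make the three-gate structure concrete, the following minimal numpy sketch performs one step of a standard LSTM cell (the Hochreiter-Schmidhuber formulation the text refers to); the weight and bias containers are placeholders, not the paper's notation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b hold one weight matrix / bias vector per
    gate, keyed 'i', 'f', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])       # recurrent state + current input
    i = sigmoid(W['i'] @ z + b['i'])        # input gate
    f = sigmoid(W['f'] @ z + b['f'])        # forget gate
    o = sigmoid(W['o'] @ z + b['o'])        # output gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # candidate cell state
    c = f * c_prev + i * c_tilde            # additive, data-dependent state
                                            # update: gradients along c need
                                            # not vanish
    h = o * np.tanh(c)                      # new hidden state
    return h, c
```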
The GA and PSO algorithms with the LSTM deep learning model are shown in Fig. 2 and Fig. 3. Here, GA and PSO are used as the feature selection algorithms, and LSTM is used as the classifier to assign patients to the normal or abnormal class. The selected features are given as input to the classifier. The details of the selected features are given in Table 6.
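A sketch of how the selected features could feed an LSTM binary classifier is shown below, using tf.keras. The layer size, optimizer and training settings are illustrative assumptions rather than the specification in Table 3, and each tabular record is treated as a length-1 sequence:

```python
import tensorflow as tf

def build_lstm_classifier(n_selected_features):
    """Binary normal/abnormal classifier over the GA/PSO-selected features."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1, n_selected_features)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Hypothetical usage: X_sel is (n_samples, n_selected_features), y is 0/1.
# model = build_lstm_classifier(X_sel.shape[1])
# model.fit(X_sel[:, None, :], y, epochs=50, validation_split=0.2)
```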
Accuracy: Accuracy is the percentage of correctly classified samples in the test data set. It is calculated using the formula given in Eq. (5),

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (5)
Precision: The precision value shows the proportion of positively classified subjects that are classified correctly. It is calculated using the formula given in Eq. (6),

Precision = \frac{TP}{TP + FP}    (6)
Recall: Recall is the proportion of relevant instances that have been retrieved; both precision and recall are therefore based on an understanding and measure of relevance. It is estimated by the formula given in Eq. (7),

Recall = \frac{TP}{TP + FN}    (7)
F-measure: The F1 score is defined as the harmonic mean of precision and recall. It can be computed with the aid of the formula given in Eq. (8),

F1 Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (8)
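The four metrics of Eqs. (5)-(8) follow directly from the confusion-matrix counts, as in the self-contained sketch below (label convention assumed: 1 = abnormal, 0 = normal):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score from Eqs. (5)-(8)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)             # Eq. (5)
    precision = tp / (tp + fp) if tp + fp else 0.0         # Eq. (6)
    recall = tp / (tp + fn) if tp + fn else 0.0            # Eq. (7)
    f1 = (2 * precision * recall / (precision + recall)    # Eq. (8)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```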
Fig. 4 shows the accuracy of the deep learning models RNN and LSTM with and without the GA and PSO feature selection algorithms. All six models are compared, and LSTM + PSO shows the best accuracy of 93.5%. Out of 61 records tested, 57 were predicted accurately, of which 25 records are from the normal class and 32 records are from the abnormal class. Moreover, LSTM achieves its accuracy in less time than RNN, as shown in Table 5.
Table 6 shows the features selected by PSO and GA in the evaluation of the proposed method. PSO selected 8 features, achieving an accuracy level of 91%, but takes more time; GA selected 11 features, achieving an accuracy level of 90%, in less time than PSO. In terms of accuracy, however, PSO shows better performance than GA.
Fig. 8 shows the evaluation of RNN performance with the GA and PSO feature selection algorithms. RNN with PSO performs better than both RNN with GA and RNN without any feature selection; the PSO algorithm increases accuracy by 3%.
Fig. 9 shows the evaluation of LSTM performance with the GA and PSO feature selection algorithms. LSTM with PSO performs better than both LSTM with GA and LSTM without any feature selection; the PSO algorithm increases accuracy by 7%.
Table 7. Accuracy comparison with existing methods

Methods                                       Accuracy (%)
DNN + χ2 statistical model [22] (K-fold)      91.57
DNN + χ2 statistical model [22] (holdout)     93.33
RNN + GA (proposed method)                    90
RNN + PSO (proposed method)                   92
LSTM + GA (proposed method)                   90
LSTM + PSO (proposed method)                  93.5
Table 7 shows that, compared with the existing methods, the proposed LSTM + PSO method achieves higher accuracy for predicting heart disease.
5 Conclusion
In this study, an efficient diagnosis approach has been developed for the accurate prediction of heart disease. The proposed approach uses enhanced GA and PSO for optimized feature selection from the heart disease data set. Classification is then achieved using the deep learning models RNN and LSTM. The proposed model has been evaluated using the accuracy, precision, recall and f-measure performance metrics. The obtained results show that the proposed method implementing LSTM with PSO yields an accuracy of 93.5%; its computational time is slightly higher due to the feature selection phase, but it leads to more accurate prediction of heart disease than the existing methods. LSTM + PSO also shows better performance for the other metrics: precision, recall and f-measure. In the future, further enhancement of the proposed model's performance may be considered.
References
1. Kirubha, V., Priya, S.M.: Survey on data mining algorithms in disease prediction.
Int. J. Comput. Trends Tech. 38, 124–128 (2016)
2. Sharma, H., Rizvi, M.: Prediction of heart disease using machine learning algo-
rithms: a survey. Int. J. Recent Innov. Trends Comput. Commun. 5, 99–104 (2017)
3. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network
models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24,
361–370 (2017)
4. Jin, B., Che, C., Liu, Z., Zhang, S., Yin, X., Wei, X.: Predicting the risk of heart
failure with EHR sequential data modelling. IEEE Access 6, 9256–9261 (2018)
5. Salem, T.: Study and analysis of prediction model for heart disease: an optimization
approach using genetic algorithm. Int. J. Pure Appl. Math. 119, 5323–5336 (2018)