Sign Language Detection Using Deep Learning
Authors: 5, including Pranali Kosamkar (Dr. Vishwanath Karad MIT World Peace University, Pune)
Abstract—In our digital age, communication should be accessible to everyone. Communication is a fundamental human right, yet millions of individuals who use sign language face daily challenges in expressing themselves and understanding others. Sign language detection and text conversion can help break down the barriers that make communication difficult for sign language users. This paper presents a solution to this problem in which we applied MobileNetV2, ResNet50 and EfficientNet-B0 to a Kaggle dataset of American Sign Language (ASL) images. We achieved 98.96% accuracy with EfficientNet-B0, 98.33% with ResNet50 and 98.12% with MobileNetV2. The models were trained on 55,680 images, with 13,920 images used for validation and 17,400 for testing. MobileNetV2, ResNet50 and EfficientNet-B0 were implemented in the TensorFlow framework. The network includes layers such as global average pooling, a dense layer with 128 units and ReLU activation, and an output layer with 29 units and softmax activation, and was trained for 5 epochs. Importantly, we convert the detected gestures into alphabet letters, making communication more accessible.

Keywords— Sign Language, Communication, MobileNetV2, ResNet50, EfficientNet-B0

I. INTRODUCTION
In our rapidly evolving world, inclusive communication is very significant. While technology is advancing, some barriers still affect certain individuals, especially those who rely on sign language. The Deaf and Hard of Hearing (DHH) population and mute people often face challenges in interacting and communicating with the majority, whose communication is centered on spoken and written language. Communication difficulties can contribute to social isolation, stigma and employment challenges, and these individuals may also face educational hurdles that hinder their academic success.

The core objective of our research is to create a technology that detects sign language and converts it into text for seamless communication between individuals who depend on it and those who don't. Such sign language systems empower deaf and mute individuals to communicate independently without relying on interpreters or third parties. They can also be used in emergency situations to convey critical information to emergency responders. These systems also help in the learning process by providing access to educational materials and making it easier to communicate with teachers and peers.

Our research is driven by a commitment to advancing assistive technologies and breaking down communication barriers. By presenting our findings in this paper, we hope to add valuable insights to the scientific community and inspire further developments in the field of accessible communication technologies.

The paper is organized as follows: Section I introduces sign language detection using deep learning, Section II presents the literature survey, Section III describes the system architecture, Section IV presents the result analysis, and Section V concludes the paper.

II. LITERATURE SURVEY
Over the last three decades, several works have addressed the hand gesture recognition problem. P. S. Neethu et al. developed a hand gesture detection and recognition methodology using a CNN classification approach that achieves an accuracy of 94.8% and a recognition rate of 96.2%. The dataset images are openly available in Kawulok et al. (2012); 1600 gesture images from this dataset were used as the training set, and the testing set contains 800 images representing eight different gestures [1]. Muneer Hammadi et al. implemented a system for dynamic hand gesture recognition using multiple deep learning architectures, which demonstrated its effectiveness and outperformed state-of-the-art approaches. They used the King Saud University Saudi Sign Language (KSU-SSL) dataset and achieved an accuracy of 98.62% [2]. Mehmet Akif Ozdemir et al. demonstrated the effectiveness of deep learning in improving the accuracy of EMG-based hand gesture recognition systems. Spectrogram images of the segmented sEMG signals, created using the Short-Time Fourier Transform (STFT), were used. The proposed algorithm achieved a test accuracy of 99.59% [3]. Xiaoguang Yu et al. proposed a hand gesture recognition system based on the Faster-RCNN deep learning algorithm, which achieves high recognition accuracy and processing speed. They achieved an accuracy of 99.2% and a response time of less than 10 ms [4]. Samer Alashhab et al. present a solution for the detection and localization of hand gestures using deep learning techniques, specifically Convolutional Neural Networks (CNNs). They manually collected a dataset of hand gestures using a simple mobile phone; Xception, Inception v3 and MobileNetV2 achieved results above 99%, while SqueezeNet achieved competitive results (98.36%) [5]. In [6] Qinglian Yang, Weikang Ding et al., has
Authorized licensed use limited to: MIT-World Peace University. Downloaded on September 20,2024 at 03:49:42 UTC from IEEE Xplore. Restrictions apply.
comprising a Deep Convolutional BiLSTM model, achieving an accuracy of 83.36%.

III. SYSTEM ARCHITECTURE
The system architecture for hand gesture detection using deep learning involves collecting a diverse dataset, preprocessing images, extracting features with models like ResNet, MobileNetV2, and EfficientNet, followed by classification. The models are trained, evaluated, and optionally combined for improved accuracy. In deployment, real-time inference is performed, and a user interface provides feedback based on detected gestures. Continuous improvement includes periodic dataset updates and model retraining for adaptability.

Fig. 1. Architecture Diagram

A. Image Collection
For the study of sign language detection, we used approximately 87,000 images of American Sign Language (ASL) signs collected from the Kaggle website [32]. The broad variety of the dataset lays a solid groundwork for developing models capable of not only detecting complex hand movements but also converting them into understandable text. The collected data acts as a valuable repository, enabling the exploration and implementation of creative solutions for hand gesture detection and text conversion through deep learning techniques.

Fig. 2. Dataset

B. Image pre-processing
Data augmentation settings such as shear range, zoom range and horizontal flip, together with a validation split, were used to enhance the model's adaptability to variations in the input image. The ImageDataGenerator in TensorFlow/Keras applies these transformations randomly to the input images, increasing the dataset's diversity. The GaussianBlur() function was used to remove noise from the images.

C. Feature Extraction
Depthwise separable convolutions and inverted residuals are used for feature extraction. Inverted residuals capture features efficiently through lightweight depthwise separable convolutions and linear bottleneck layers. The linear bottleneck consists of 1x1 pointwise convolutions for channel expansion and reduction. This helps to extract features from the input data efficiently.

D. ResNet50
ResNet50 is a residual network with 50 layers that is widely used for image processing and classification in computer vision. It works by introducing residual blocks with skip connections. Skip connections help the network learn residuals, making it easier to train deeper architectures. The defining features of ResNet50 are residual blocks, skip connections, a deep architecture and bottleneck layers.

E. MobileNetV2
MobileNetV2 is a neural network architecture designed to be lightweight and intended for deployment on mobile and other resource-constrained devices. Its essential traits are depthwise separable convolutions, inverted residuals followed by linear bottlenecks, global average pooling (GAP) and shortcut connections.

F. EfficientNet-B0
EfficientNet-B0 is a lightweight CNN architecture known for its balance between computational efficiency and model performance. Custom layers were added to EfficientNet-B0 for the specific task of recognizing gestures: a global average pooling layer to reduce the spatial dimensions, a dense layer with ReLU activation, and a final dense layer with softmax activation that predicts the hand gesture classes. The model was compiled with the Adam optimizer, and metrics such as accuracy, precision, recall and AUC were monitored during training. Training ran for 5 epochs, and the model with the lowest validation loss was saved.
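To make the "lightweight" property of depthwise separable convolutions (Section III-C) concrete, the following sketch compares the number of weights in a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. It is an illustration only, not code from this work: the function names are hypothetical, the layer sizes (3x3 kernel, 32 input and 64 output channels) are assumed for the example, and bias terms are omitted.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))  # 18432 2336 7.89
```

For this assumed layer, the separable form needs roughly one eighth of the weights, which is the kind of saving that MobileNetV2-style blocks rely on.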
IV. PERFORMANCE ANALYSIS
This section presents the result analysis of the MobileNetV2, ResNet50 and EfficientNet-B0 models for sign language classification. The dataset contains 87,000 images and is divided into three parts: 55,680 images for training, 13,920 images for validation and 17,400 images for testing.

In the Fig. 6 graph, the training accuracy for ResNet50 is 98.33%.
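The three subset sizes reported for the 87,000-image dataset (55,680 training, 13,920 validation, 17,400 testing) are consistent with a two-stage hold-out split. The paper does not state the ratios, so the 20% fractions in the sketch below are an assumption inferred from the numbers, and the function name `split_sizes` is hypothetical.

```python
def split_sizes(total, test_fraction=0.2, val_fraction=0.2):
    """Return (train, val, test) sizes for a two-stage hold-out split.

    The test set is held out first; the validation set is then taken
    from the remaining images. Both fractions are assumed values.
    """
    test = round(total * test_fraction)
    remaining = total - test
    val = round(remaining * val_fraction)
    train = remaining - val
    return train, val, test


train, val, test = split_sizes(87_000)
print(train, val, test)  # 55680 13920 17400
```

Under this assumption, 20% of the data is held out for testing and 20% of the remaining 69,600 images is used for validation, which reproduces the reported counts exactly.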
Fig. 8. Comparison graph of training metrics precision, AUC, recall and loss for MobileNetV2, ResNet50 and EfficientNetB0

V. CONCLUSION
Image classification was carried out to improve the model's generalization and robustness to variations in the input images. We used the deep learning models ResNet50, MobileNetV2 and EfficientNet-B0 to recognize hand gestures efficiently and compared their results. The EfficientNet-B0 model achieved the maximum accuracy of 98.96%, while the accuracy achieved by ResNet50 and MobileNetV2 is approximately 98% (98.33% and 98.12%, respectively). The models were trained using 55,680 images, with 13,920 images used for validation and 17,400 images for testing. We used the TensorFlow framework to implement MobileNetV2, ResNet50 and EfficientNet-B0. The developed model achieved a high accuracy of 98.96% on the images obtained from the Kaggle dataset.

REFERENCES
[1] P. S. R. &. S. Neethu, "An efficient method for human hand gesture detection and recognition using deep learning convolutional neural networks," Soft Comput, vol. 24, p. 15239–15248, 2020.
[2] M. A.-H. e. al., "Deep Learning-Based Approach for Sign Language Gesture Recognition With Efficient Hand Gesture Representation," IEEE, vol. 8, pp. 192527-192542, 2020.
[3] M. K. D. G. O. O. A. a. A. A. Ozdemir, "EMG based hand gesture recognition using deep learning," Medical Technologies Congress (TIPTEKNO) IEEE, p. 14, 2020.
[4] X. a. Y. Y. Yu, "Hand Gesture Recognition Based on Faster-RCNN Deep Learning," J. Comput, vol. 14(2), pp. 101-110, 2019.
[5] A.-J. G. &. M. Á. L. Samer Alashhab, "Hand Gesture Detection with Convolutional Neural Networks," 15th International Conference, vol. 800, 2019.
[6] W. D. X. Z. D. Z. S. Y. Qinglian Yang, "Leap Motion Hand Gesture Recognition Based on Deep Neural Network," Chinese Control And Decision Conference, pp. 2089-2093, 2020.
[7] B. H. &. J. Wang, "Deep Learning Based Hand Gesture Recognition and UAV Flight Controls," Int. J. Autom. Comput, vol. 17, p. 17–29, 2020.
[8] J. J. G. L. G. e. a. Qi, "Surface EMG hand gesture recognition system based on PCA and GRNN," Neural Comput & Applic, vol. 32, p. 6343–6351, 2020.
[9] S. S. Sakshi Sharma, "Vision-based hand gesture recognition using deep learning for the interpretation of sign language," Expert Systems with Applications, vol. 182, p. 115657, 2021.
[10] T. F. D. a. M. E. Ahmed, "Using YOLOv5 Algorithm to Detect and Recognize American Sign Language," International Conference on Information Technology (ICIT), pp. 603-607, 2021.
[11] O. M. ,. A. M. H. a. A. Y. S. Jungpil Shin, "American Sign Language Alphabet Recognition by Extracting Feature from Hand Pose Estimation," Sensors, vol. 21, no. 17, p. 5856, 2021.
[12] R. R. e. al., "Wearable Smart Band for American Sign Language Recognition With Polymer Carbon Nanocomposite-Based Pressure Sensors," IEEE Sensors Letters, vol. 5, pp. 1-4, 2021.
[13] R. R. Adithya Venugopalan, "Applying deep neural networks for the automatic recognition of sign language words: A communication aid to deaf agriculturists," Expert Systems with Applications, vol. 185, p. 115601, 2021.
[14] X. Y. a. J. S. S. Chavan, "Convolutional Neural Network Hand Gesture Recognition for American Sign Language," IEEE International Conference on Electro Information Technology (EIT), pp. 188-192, 2021.
[15] K. J. J. a. S. S. T. Josepha, "Recognition of Hand Signs Based on Geometrical Features using Machine Learning and Deep Learning appraoch," Revista Argentina de Clínica Psicológica, vol. 30(3), pp. 175-183, 2021.
[16] F. J. M. C. S. B. M. M. K. M. S. N. S. &. S. S. Shamrat, "Bangla numerical sign language recognition using convolutional neural networks," Indonesian Journal of Electrical Engineering and Computer Science, vol. 23(1), pp. 405-413, 2021.
[17] V. S. U. S. T. Shagun Katoch, "Indian Sign Language recognition system using SURF with SVM and CNN," Array, vol. 14, p. 100141, 2022.
[18] Y. A. A. D. K. Y. G. M. M. E. Mohammed Zakariah, "Sign Language Recognition for Arabic Alphabets Using Transfer Learning Technique," Computational Intelligence and Neuroscience, 2022.
[19] A. E. A. E. ,. O. A.-H. A. Y. Ahmed KASAPBAŞI, "DeepASLR: A CNN based human computer interface for American Sign Language recognition for hearing-impaired individuals," Computer methods and programs in biomedicine update, vol. 2, p. 100048, 2022.
[20] M. T. &. A. O. S. Nigus Kefyalew Tamiru, "Recognition of Amharic sign language with Amharic alphabet signs using ANN and SVM," The Visual Computer, vol. 38, p. 1703–1718, 2022.
[21] O. a. K. C. Sunusi Bala Abdullahi, "American Sign Language Words Recognition of Skeletal Videos Using Processed Video Driven Multi-Stacked Deep LSTM," Sensors, vol. 22(4), p. 1406, 2022.
[22] Puchuan Tan, Xi Han et al., "Self-Powered Gesture Recognition Wristband Enabled by Machine Learning for Full Keyboard and Multicommand Input," Advanced Materials, vol. 34, p. 2200793, 2022.
[23] C. S. A. L. P.K. Athira, "A Signer Independent Sign Language Recognition with Co-articulation Elimination from Live Videos: An Indian Scenario," Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 3, pp. 771-781, 2022.
[24] M. S. I. N. H. N. N. S. H. W. Sunanda Das, "A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier," Expert Systems with Applications, vol. 213, p. 118914, 2023.
[25] Z. H. D. Z. Z. a. M. L. Yuejiao Wang, "UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals," Sensors, vol. 23(4), p. 1790, 2023.
[26] H.-J. K. a. S.-W. Baek, "Application of Wearable Gloves for Assisted Learning of Sign Language Using Artificial Neural Networks," Processes, vol. 11(4), p. 1065, 2023.
[27] R. R. G. F. G. G. Giulia Zanon de Castro, "Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps," Expert Systems with Applications, vol. 215, p. 119394, 2023.
[28] R. R. ,. J. ,. S. K. ,. H. ,. H. a. N. K. Nasima Begum, "Borno-Net: A Real-Time Bengali Sign-Character Detection and Sentence Generation System Using Quantized Yolov4-Tiny and LSTMs," Applied Sciences, vol. 13(9), p. 5219, 2023.
[29] M. T. F. S. A. ,. M. A. A. Nehal F. Attia, "Efficient deep learning models based on tension techniques for sign language recognition," Intelligent Systems with Applications, vol. 20, p. 200284, 2023.
[30] S. E.-K. H. M. B. M. S. E. E. M. H. &. M. M. S. Mostafa Magdy Balaha, "A vision-based deep learning approach for independent-users Arabic sign language interpretation," Multimedia Tools and Applications, vol. 82, p. 6807–6826, 2023.
[31] A. R. Venugopalan, "Applying Hybrid Deep Neural Network for the Recognition of Sign Language Words Used by the Deaf COVID-19 Patients," Arab J Sci Eng, vol. 48, p. 1349–1362, 2023.
[32] A. Nagaraj, "ASL Alphabet," 2018.