Sign Language Detection
BACHELOR OF TECHNOLOGY IN
ELECTRONICS AND COMMUNICATION ENGINEERING
CERTIFICATE

This is to certify that the major project report entitled "Sign Language", submitted by Dipanshu Shukla (9916102138), Lakshya Sharma (9916102030), and Sagar Kapoor (9916102028) in partial fulfillment of the requirements for the award of the Bachelor of Technology degree in Electronics and Communication Engineering of the Jaypee Institute of Information Technology, Noida, is an authentic work carried out by them under my supervision and guidance. The matter embodied in this report is original and has not been submitted for the award of any other degree.
Signature of Supervisor
Dr. Satyendra Kumar
ECE Department
Jaypee Institute of Information Technology, NOIDA
May 2020
DECLARATION
We hereby declare that this written submission represents our own ideas in our own words and that, where others' ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated, or falsified any idea/data/fact/source in our submission.
Place:
Date:
ABSTRACT
Sign language, expressed through hand movements, is commonly used by people with hearing loss to communicate easily. However, it is very difficult for non-signers to communicate with people with speech or hearing disabilities, as interpreters are not always readily available. Most nations have their own sign languages, such as American Sign Language (ASL), used primarily in the United States and Canada. The proposed system allows non-signers to understand expressions in American Sign Language.

In this project, SURF (Speeded-Up Robust Features) is used for feature detection, and the efficiency of various classification algorithms, such as SVM and Naive Bayes, is compared on ASL gestures. First, the signs are captured with a webcam. The input image is then processed and the skin is masked. Edge detection is then used to identify the hand boundary. Once this is achieved, SURF feature detection is applied, and the image is labeled using the SVM and Naive Bayes algorithms.

We then used a deep learning model, a CNN, to predict the input gesture and measure its accuracy.
ACKNOWLEDGEMENT
We would like to place on record our deep sense of gratitude to Dr. Satyendra Kumar for his generous guidance, help, and useful suggestions. We would like to thank the Jaypee Institute of Information Technology, Noida, for its invaluable guidance and assistance, without which the accomplishment of this task would never have been possible. We also thank them for giving us this opportunity to explore the real world and realize the interrelation between theoretical concepts and their practical application. We also wish to extend our thanks to our classmates for their insightful comments and constructive suggestions to improve the quality of this project work.
INDEX
Contents
CERTIFICATE
ABSTRACT
ACKNOWLEDGEMENT
INDEX
TABLE OF FIGURES
CHAPTER 1
INTRODUCTION
1.1 Social Media Influence
1.2 Overview
CHAPTER 2
LITERATURE SURVEY
2.1 Discovering and Analyzing Important Real-Time Trends in Noisy Twitter Stream
2.2 Techniques for sentiment analysis of Twitter data
2.3 A Mobile Application of ASL Translation via Image Processing Algorithms
2.4 Sign Language Recognition using Convolutional Neural Networks
CHAPTER 3
REQUIREMENT ANALYSIS
3.1 Minimum System Requirements
3.2 Software Requirements
3.3 Functional Requirements
CHAPTER 4
ALGORITHMS USED
4.1 Bag of Words Model
4.2 Support Vector Machine
4.3 Naïve Bayes Algorithm
4.4 CNN Algorithm
CHAPTER 5
DETAILED DESIGN
5.1 Convolution
5.2 Maxpooling
5.3 Flattening
5.4 Full Connection
CHAPTER 6
IMPLEMENTATION
6.1 Twitter Podcast
6.1.1 Data Retrieval
6.1.2 Data Processing
6.1.3 Data Filtering
6.2 Sign Language
6.2.1 Fetching the data
CHAPTER 7
RESULTS
7.1 Sentiment Analysis
7.2 Sign Language Detection
CONCLUSION
FUTURE WORK
REFERENCES
TABLE OF FIGURES
CHAPTER 1
INTRODUCTION
Our introduction includes overviews of our two projects, namely Twitter Podcast and Sign Language. The first section gives a brief account of sentiment analysis, and the second section gives a brief account of sign languages and gestures.
1.1 Social Media Influence

Social networking and microblogging sites such as Facebook and Twitter spread condensed news and trending topics rapidly around the world. A subject becomes a phenomenon as more and more users add their thoughts and viewpoints, making these sites a reliable source of online perception. In general, such topics raise awareness of or promote public figures, political campaigns, product endorsements, and entertainment such as movies and award shows. The scope for identifying and studying fascinating trends in this endless stream of social media data is immense. Sentiment analysis is the prediction of the emotion expressed in a phrase, paragraph, or corpus of documents. It serves as an application to recognize the beliefs, opinions, and emotions reflected in an online post; in essence, it classifies text as positive, negative, or neutral. Most people use social media networks to connect with other people and keep up with news and current events. These sites (Twitter, Facebook, Instagram, Google+) offer people a forum to share their views. Such information is the basis on which people assess the performance of any film or product and judge whether it is good or not, and this comprehensive knowledge can be used in marketing and social studies. Sentiment analysis therefore has wide applications, including opinion mining, polarity detection, clustering, and influence analysis. Twitter is an online networking site whose messages, called tweets, were restricted to 140 characters; the character limit encourages the use of hashtags for classifying text. Roughly 6,500 tweets are currently posted per second, resulting in about 561.6 million tweets a day. Such tweet streams are usually noisy, representing multi-topic information with shifting attitudes in an unfiltered and unstructured format [1].
1.2 Overview
Sign language is, in general terms, a method used by deaf people for communication. It is a three-dimensional language based on visual gestures and moving signs that represent letters and words. Hand-gesture recognition has long received attention in both academic and applied settings.

The use of signing is not restricted to people with impaired hearing or speech communicating with each other or with non-signers; it is also viewed as a popular form of communication. Within this project, we plan to analyze and classify different alphabets from a database of sign images. The database consists of a set of pictures, each image captured with a different hand orientation and under various lighting conditions. With such a diverse collection of data we can train our program to a good standard and thus obtain good performance.
Gesture recognition is the mathematical interpretation of a human action by a computer system. A gesture is an expression of a physical or emotional state and includes movements of the body and the hands. Gestures fall into two categories: static and dynamic. In the former, the posture of the body or hand conveys a sign; in the latter, the movement of the body or hand conveys the signal. Gestures are also used as an interface between machines and human beings. This is quite different from standard hardware-based approaches, as the machine can communicate with people by understanding gestures. Gesture recognition identifies the user's intent from the meaning of a gesture or motion of the body or its parts.

We used a CNN machine-learning model to predict sign language gestures in this project [2].
CHAPTER 2
LITERATURE SURVEY
Apart from the technical studies needed to build the main projects, the planning stage required considerable knowledge of sentiment analysis and image recognition. We had to read up on the different algorithms to arrive at the most applicable algorithm and technique.
2.2 Techniques for sentiment analysis of Twitter data

The approach the algorithms take to extract and track features distinguishes this work from the rest of the literature. After pre-processing, feature extraction involved building n-grams along with POS taggers for negation handling, improving classification accuracy. Two algorithms were chosen for further study and evaluation. The first is the People Rank algorithm, inspired by Google's PageRank: the greater a user's significance on Twitter in terms of followers, retweets, and comments, the more important the corresponding node in the graph. The other algorithm is TwitterRank, an extension of PageRank that assesses user influence by considering the similarity between users and the node structure, i.e., the other users to whom they are connected. The weaknesses of PageRank are discussed and established. The influence measures account for the fact that popular or influential people following you function as media for broadcasting your content. After some mathematical calculation over parameters such as follower ratios and retweets, the weights are computed, and a mathematical formulation is finally derived for monitoring the influence of a certain individual. The suggested methodology can rank Twitter personalities and entities and can be used for advertising and branding purposes [4].
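To make the ranking idea concrete, the following is a minimal sketch of a PageRank-style power iteration over a toy follower graph, written by us in Python purely for illustration; it is not code from [4], and the graph, damping factor, and iteration count are hypothetical choices.

# Illustrative only: PageRank-style scoring over a hypothetical follower
# graph. An edge u -> v means "u follows v", so rank flows from follower
# to followee; People Rank additionally weights retweets and comments.
followers = {
    "alice": ["carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice"],
}
users = list(followers)
d = 0.85                                    # damping factor, the usual default
rank = {u: 1.0 / len(users) for u in users}

for _ in range(50):                         # plain power iteration
    new_rank = {u: (1 - d) / len(users) for u in users}
    for u, followed in followers.items():
        for v in followed:
            new_rank[v] += d * rank[u] / len(followed)
    rank = new_rank

for user, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(user, round(score, 3))            # most influential user first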
2.4 Sign Language Recognition using Convolutional Neural Networks
Authors: L. Pigou, S. Dieleman, P.-J. Kindermans & B. Schrauwen
This paper deals with the recognition of Italian sign language. A recognition framework is built with the Microsoft Kinect, CNNs, and GPU acceleration. Instead of constructing complex hand-crafted features, CNNs automate the feature-building process. The system was able to reliably interpret 20 Italian gestures. The predictive model generalizes with 91.7 percent accuracy to users and situations that do not occur during training. This research demonstrates that convolutional neural networks can help correctly classify the various signs of a sign language, even for users and environments not included in training. This generalization capacity of CNNs on spatiotemporal data can contribute to the wider field of research on the automatic recognition of sign language [6].
CHAPTER 3
REQUIREMENT ANALYSIS
Much of the software used for data analysis, such as Anaconda, Jupyter, and Spyder, requires a certain system configuration to achieve optimum functionality. Below is a table showing the exact configuration required for this software to function properly.
3.1 Minimum System Requirements –

Particulars        Recommended Configuration
RAM                4 GB or more*

Table 1: Requirements
3.3 Functional Requirements -
The system should be able to capture the image from the mounted camera.
The system should be able to differentiate between the classification models used, based on their accuracy.
The device should be able to eliminate noise from the images and detect image edges.
The software should be able to recognize the movement of a hand.
CHAPTER 4
ALGORITHMS USED
Algorithms play a vital role in determining the accuracy and efficiency of the analysis. The algorithms used in Twitter Podcast and Sign Language are the Bag of Words model, Support Vector Machine, the Naïve Bayes algorithm, and CNN.
4.3 Naïve Bayes Algorithm

A Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model." Maximum entropy classifiers are also used as alternatives to Naive Bayes classifiers; they do not require statistical independence of the predictor features.
CHAPTER 5
DETAILED DESIGN
First, the image is segmented. This is achieved by skin masking, which applies a threshold in the RGB scheme, and by converting the RGB color space to a grayscale image. Finally, the Canny edge technique is used to identify sharp discontinuities in the image and thus detect the edges of the object in focus. The Speeded-Up Robust Features (SURF) technique is used to extract descriptors from the segmented hand-gesture images; the SURF descriptors extracted from each image have the same length (64). A Bag of Features (BoF) is used to describe each image as a histogram over a visual vocabulary rather than by the raw descriptors. Once we obtain the bag-of-features model, we predict results for new raw images in order to test our model.
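A minimal sketch of this pipeline follows, assuming OpenCV with the opencv-contrib modules (SURF is patented and disabled in some builds). The file name and threshold values are illustrative guesses, and the YCrCb skin range is a common stand-in for the RGB threshold mentioned above.

# Illustrative sketch, not the report's exact code. Requires
# opencv-contrib-python with non-free modules enabled for SURF.
import cv2

img = cv2.imread("gesture.jpg")                    # hypothetical input frame

# Skin masking: YCrCb thresholds are a common stand-in for RGB ones.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
mask = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))
skin = cv2.bitwise_and(img, img, mask=mask)

# Grayscale conversion and Canny edge detection on the masked hand.
gray = cv2.cvtColor(skin, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# SURF keypoints and 64-D descriptors on the segmented region.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(gray, mask)
print(len(keypoints), "keypoints detected")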
We then implemented the CNN deep learning model. The CNN stages are as follows:
5.1 Convolution
Convolution is the first layer of a CNN and derives features from the input image. Because we work with images, which are basically two-dimensional arrays, we use 2-D convolution. We introduced a convolution layer using the "Convolution2D" function. This function takes four arguments: the first is the number of filters (32 here); the second is the shape of each filter, i.e., 3x3 here; the third is the input shape and image type (RGB or black & white), the input image our CNN takes being of 64x64 resolution; and the fourth is the activation function.
5.2 Maxpooling
To construct this neural network we use a max-pooling layer; there are other types of pooling operations, such as min pooling and mean pooling. In max pooling, the maximum-valued pixel of each region of interest is kept. The primary goal of pooling is to reduce the image size as much as possible.
5.3 Flattening
Flattening converts the pooled feature maps into a continuous vector. This is a particularly important step: we take the 2-D array of pooled image pixels and transform it into a single one-dimensional vector.
9
5.4 Full Connection
In this step we construct a fully connected layer: the vector of nodes obtained in the flattening step is fed as the input layer to the fully connected layers. This layer sits between the input layer and the output layer [7].
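The four stages above can be sketched as a Keras Sequential model, shown below for illustration; the filter count (32), filter shape (3x3), and 64x64 RGB input mirror the figures given in the text, while the hidden-layer width and the 26-class output are our assumptions.

# Illustrative sketch (assumes TensorFlow/Keras). Layer sizes follow the
# text; the Dense widths and 26 output classes are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # 5.1 convolution
    MaxPooling2D(pool_size=(2, 2)),                                  # 5.2 max pooling
    Flatten(),                                                       # 5.3 flattening
    Dense(128, activation="relu"),                                   # 5.4 fully connected hidden layer
    Dense(26, activation="softmax"),                                 # output: one unit per sign class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()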
Figure 3: Flow chart. The user starts the webcam; the system captures the image, creates the dataset, extracts features, trains on the dataset, performs classification, and displays the result.
CHAPTER 6
IMPLEMENTATION
This chapter discusses the functioning of both projects, i.e., Twitter Podcast and Sign Language. First, we discuss all the steps involved in the sentiment analysis of tweets; later, we discuss the steps involved in the recognition of sign language.
6.1 Twitter Podcast

6.1.1 Data Retrieval

Data is accessed through the [Link]/API website, which offers a package for streaming the Twitter API in real time. The API requires us to register a Twitter developer account and fill in parameters such as the consumer key, consumer secret, access token, and access token secret. With this API, either all random tweets or filtered data can be accessed using keywords. Filters enable the collection of tweets that meet the developer's specific criteria; we used this to retrieve tweets linked to keywords taken as user input. Initially, we set the application name and the master mode, and we run the program in local rather than cluster mode. The keyword input array is then given to the streaming context "ssc", created using "sc", where "sc" is the Spark context.
6.1.2 Data Processing

Data processing begins with tokenization, which splits tweets into individual words called tokens. Tokens can be separated by whitespace or punctuation characters. Depending on the classification model used, the features may be unigrams or bigrams. The bag-of-words model is one of the most commonly used classification models. It is based on the premise that text can be treated as a bag, i.e., an unordered collection of words without ties or interdependencies. In our project, the easiest way to implement this model is to use unigrams as features: the text to be categorized is just a set of individual words, so we split every tweet on whitespace.
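As a tiny illustration of this step (our own sketch, with a made-up tweet), whitespace tokenization followed by counting yields exactly such an unordered unigram bag:

# Unigram bag-of-words over a placeholder tweet: split on whitespace,
# then count; word order and dependencies are deliberately discarded.
from collections import Counter

tweet = "this movie is good good fun"    # made-up example tweet
tokens = tweet.lower().split()           # whitespace tokenization -> unigrams
bag = Counter(tokens)
print(bag)                               # Counter({'good': 2, 'this': 1, ...})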
6.1.3 Data Filtering
A tweet received after data processing still carries raw information that may or may not be useful for our application. These tweets are further filtered by eliminating stop words, numbers, and punctuation.
Stop words: tweets contain quite common words like "is," "am," and "are" that carry no extra detail. Such terms are meaningless for analysis; this filter is implemented using a list stored in [Link]. Every word in a tweet is compared with that list, and words matching the stop list are deleted.
Deleting non-alphabetical characters: symbols like "#" and "@" and numbers are meaningless for sentiment analysis and are removed by pattern matching. Only alphabetical characters match standard words; the rest are ignored. This helps reduce the clutter in the Twitter stream.
Stemming: this is the process of reducing derived words to their roots. For example, words like "fishing" and "fishes" share the root "fish." The library used for stemming is Stanford NLP, which also provides specific algorithms such as Porter stemming. We did not use a stemming algorithm in our case because of time constraints.
Sentiment analysis is conducted using a customized algorithm, which determines polarity as described below.

Finding polarity: we used a simple algorithm that counts the positive and negative words in a tweet to determine its polarity. Separate lists were made of positive and negative terms. Each word in a tweet is then compared against both lists: when a word matches the positive list, the score is increased by 1; when a negative term is found, the score is decreased by 1. More positive words lead to a higher sentiment score. Stanford NLP, with its more complex algorithms, could alternatively be used for more accurate sentiment prediction.
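A minimal sketch of this counting scheme is given below; the stop, positive, and negative lists are tiny placeholders standing in for the full word lists the project keeps in stored files.

# Illustrative word-counting polarity, as described above. The word
# lists are small placeholders, not the project's real lists.
import re

STOP_WORDS = {"is", "am", "are", "the", "a", "and"}
POSITIVE = {"good", "wonderful", "great"}
NEGATIVE = {"bad", "poor", "terrible"}

def polarity(tweet):
    text = re.sub(r"[^a-z\s]", " ", tweet.lower())   # drop '#', '@', digits, punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("The movie was good, wonderful cast!"))   # 2  -> positive
print(polarity("bad service and terrible food"))          # -2 -> negative
print(polarity("we went to the market"))                  # 0  -> ignored as neutral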
Sentiment analysis output: the output contains a real-time list of tweets with the sentiment score shown on the left. Two negative keywords give the first tweet a score of -2. The next two tweets are positive because they contain the keywords "good" and "wonderful," both of which appear in the list of positive terms. Note that if a tweet has a score of 0, it is dropped from the final output. The problem with neutral tweets is that they serve no purpose, as they attach no sentiment to the product.
6.2 Sign Language
6.2.1 Fetching the data

First, we include the data in our model. Fig. 6.2 demonstrates how the classifier model is fitted to the data; "steps per epoch" holds the number of pictures to be trained on, i.e., the number of pictures the dataset folder contains.
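For illustration, the fitting step the figure shows can be sketched as below, assuming Keras and the compiled model from the Chapter 5 sketch; the directory name, batch size, and epoch count are placeholders.

# Illustrative sketch; `model` is assumed to be the compiled CNN from the
# Chapter 5 sketch, and "dataset/train" is a placeholder folder of images
# arranged one sub-folder per class.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "dataset/train", target_size=(64, 64), batch_size=32,
    class_mode="categorical")

model.fit(train_gen,
          steps_per_epoch=len(train_gen),   # batches covering every image in the folder
          epochs=10)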
To test an image, we must prepare it for the model by resizing it to 64x64, as the model only accepts that resolution. Then we call the predict() method on our classifier object (see Fig. 6.3).
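The prediction step can likewise be sketched as follows, assuming Keras; the saved-model and image file names are placeholders.

# Illustrative sketch of the predict() call; file names are placeholders.
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

model = load_model("classifier.h5")                            # hypothetical trained model
img = image.load_img("test_sign.jpg", target_size=(64, 64))   # resize to 64x64
x = image.img_to_array(img) / 255.0                            # scale pixels to [0, 1]
x = np.expand_dims(x, axis=0)                                  # add the batch dimension

probs = model.predict(x)                                       # per-class probabilities
print("predicted class index:", int(np.argmax(probs)))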
CHAPTER 7
RESULTS
In this chapter we discuss the outcomes of our projects. First we discuss the sentiment analysis of tweets, and later we look at the results for sign gestures.

7.1 Sentiment Analysis:

For sentiment analysis, we picked three events that were hot topics of the calendar year 2019.
7.2 Sign Language Detection:
Gesture Prediction:
Figure 12: Gesture R (wrong prediction)
CONCLUSION
In this report, we discussed the SVM and Naive Bayes classifiers for ASL gesture recognition. The accuracy of these models was determined and their shortcomings addressed. SURF was used for feature detection, and the images were labeled using both the SVM and Naive Bayes algorithms. The findings revealed the effective use of SURF for both models. We obtained greater accuracy from SVM than from the Naive Bayes classifier. In addition, we used a deep learning CNN model to predict the input gestures, and our model could reliably predict most of them.
FUTURE WORK
In addition, we would like to extend this project by introducing other machine learning algorithms for applications such as electoral outcomes, product ratings, and films, and by running the project on clusters to improve its performance. We also want to build a web application where users can enter keywords and obtain analyzed results. We have worked only with unigram models in this project but want to expand to bigrams and beyond, to capture dependencies between words and provide more accurate sentiment analysis. An overall tweet score can also be computed for a single keyword, giving the overall public sentiment on a topic. In terms of sign language, we want to increase the scope of our dataset and add more movements, signals, and signs to enhance the model's effectiveness and accuracy.
REFERENCES
[1] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," 2010.
[2] A. Akoum and N. Al Mawla, "Hand Gesture Recognition Approach for ASL Language Using Hand Extraction Algorithm," Journal of Software Engineering and Applications, 8(08), 419, 2015.
[3] K. N. Alhayyan and I. Ahmad, "Discovering and Analysing Important Real-Time Trends in Noisy Twitter Streams," IEEE, 2014.
[4] M. Desai and M. Mehta, "Techniques for Sentiment Analysis of Twitter Data," International Conference on Computing, Communication and Automation (ICCCA), 2016.
[5] C. M. Omar and M. H. Jaward, "A Mobile Application of American Sign Language Translation via Image Processing Algorithms," IEEE Region 10 Symposium (TENSYMP), pp. 104-109, 2016.
[6] L. Pigou, S. Dieleman, P.-J. Kindermans, and B. Schrauwen, "Sign Language Recognition Using Convolutional Neural Networks," Workshop at the European Conference on Computer Vision (ECCV), pp. 572-578, Springer, Cham, 2014.
[7] A. Akoum and N. Al Mawla, "Hand Gesture Recognition Approach for ASL Language Using Hand Extraction Algorithm," Journal of Software Engineering and Applications, 8(08), 419, 2015.