Sign Language Detection

A PROJECT REPORT

Submitted by

Dave Yaashu Digant


Darji Het Shaileshbhai
Tanishka Gaur
Mehta Aesha Vipulkumar

In fulfillment for the award of the degree


of
BACHELOR OF ENGINEERING
in
Computer Engineering

LDRP Institute of Technology and Research, Gandhinagar

Kadi Sarva Vishwavidyalaya


July, 2024

LDRP INSTITUTE OF TECHNOLOGY AND RESEARCH
GANDHINAGAR

CE-IT Department

CERTIFICATE
This is to certify that the Project Work entitled “Sign Language Detection” has been carried out
by Dave Yaashu Digant (21BECE30044) under my guidance in fulfillment of the degree of
Bachelor of Engineering in Computer Engineering, Semester-7 of Kadi Sarva Vishwavidyalaya
University during the academic year 2024-2025.

Hitesh Barot Ashish Patel

Internal Guide Head of the Department

LDRP-ITR LDRP-ITR

LDRP INSTITUTE OF TECHNOLOGY AND RESEARCH
GANDHINAGAR

CE-IT Department

CERTIFICATE
This is to certify that the Project Work entitled “Sign Language Detection” has been carried out
by Darji Het Shaileshbhai (21BECE30040) under my guidance in fulfillment of the degree of
Bachelor of Engineering in Computer Engineering, Semester-7 of Kadi Sarva Vishwavidyalaya
University during the academic year 2024-2025.

Hitesh Barot Ashish Patel

Internal Guide Head of the Department

LDRP-ITR LDRP-ITR

LDRP INSTITUTE OF TECHNOLOGY AND RESEARCH
GANDHINAGAR

CE-IT Department

CERTIFICATE
This is to certify that the Project Work entitled “Sign Language Detection” has been carried out
by Tanishka Gaur (21BECE30064) under my guidance in fulfillment of the degree of Bachelor of
Engineering in Computer Engineering, Semester-7 of Kadi Sarva Vishwavidyalaya University
during the academic year 2024-2025.

Hitesh Barot Ashish Patel

Internal Guide Head of the Department

LDRP-ITR LDRP-ITR

LDRP INSTITUTE OF TECHNOLOGY AND RESEARCH
GANDHINAGAR

CE-IT Department

CERTIFICATE
This is to certify that the Project Work entitled “Sign Language Detection” has been carried out
by Mehta Aesha Vipulkumar (21BECE30134) under my guidance in fulfillment of the degree
of Bachelor of Engineering in Computer Engineering, Semester-7 of Kadi Sarva Vishwavidyalaya
University during the academic year 2024-2025.

Hitesh Barot Ashish Patel

Internal Guide Head of the Department

LDRP-ITR LDRP-ITR

Project Presentation - II
1. Name & Signature of Internal Guide

2. Comments from Panel Members

3. Name & Signature of Panel Members


ACKNOWLEDGEMENT
We take this opportunity to express our gratitude towards all those concerned with our project.
Firstly, we are sincerely thankful to Prof. Hitesh Barot for giving us the opportunity to work on
this project under his guidance, which proved to be the key to our collective success in
overcoming the challenges we faced during the project work.

Our heartfelt thanks to our HOD Ashishkumar Patel for providing us this great opportunity. We
would also like to express our gratitude to our friends and of course, the CE Department of
LDRP-ITR.

We would also like to thank one another as teammates for the collaboration and support. We
worked together to overcome challenges and achieve our goals, shared resources and ideas, and
helped each other stay on track. We are grateful for this friendship and support.

Finally, we would like to thank our families and friends for their love and support. They
encouraged us to pursue our goals and never give up. We could not have completed this project
without them.

Tanishka Gaur(21BECE30064)

Aesha Mehta(21BECE30134)

Yaashu Dave(21BECE30044)

Het Darji(21BECE30040)

ABSTRACT
This project aims to develop a real-time sign language recognition system using a computer
vision and deep learning approach. Communication is essential in society, and for deaf and
hard-of-hearing people sign language is vital; yet there is little technology that translates sign
language into spoken or written language in real time. The system uses a webcam to capture a
live video feed and scans each frame for hand movements, which are then interpreted as sign
language gestures.
Hand landmark detection was done with the help of MediaPipe, while gesture recognition was
performed using a Convolutional Neural Network (CNN) model. The model was trained on a
labeled dataset of gestures performed in various sign languages to enable it to make correct
classifications. The real-time detection system was then created by connecting the trained model
to a video capture interface built with OpenCV, which detects each gesture and displays the
result in real time.
The efficacy of the system was evaluated using signs other than those used during training.
Particular attention was paid to an easy and friendly user interface so that users can interact with
the system comfortably. Because the system translates sign language immediately, it can be a
valuable tool for education, communication, and training, especially for people with hearing
disabilities.
Future work should collect more data for additional gestures, further tune the model, and reduce
the influence of lighting and background conditions. This project serves as a backbone for more
advanced sign language recognition applications; it can greatly help people who are deaf or hard
of hearing by giving them a means of communication and self-expression in their daily
activities.

TABLE OF CONTENTS

NO CHAPTER NAME PAGE NO
Acknowledgement i
Abstract ii
List of Figures iii
Table of Contents vi
1 Introduction 1
1.1 Background 1
1.2 Aim and Objective of the work 2
1.3 Literature Review 2
1.3.1 LSTM Architecture 4
1.4 Problem Definition 6
1.5 Implementation details 7
1.5.1 Proposed Methodology 7
2 Technology review 11
2.1 Technology Review 11

3 System Diagrams 12
3.1 Design Model 12
3.2 Use Case Diagram 13
3.3 Sequence Diagram 14
4 Evaluation of Model 15

5 Outputs and Run-time Testing 16

6 Conclusion 18
7 Bibliography 19



LIST OF FIGURES

NO NAME PAGE NO

1 Diagram of LSTM Architecture 5

2 Diagram of Model Architecture 10

3 Activity Diagram 12

4 Use Case Diagram 13

5 Sequence Diagram 14

6 Confusion Matrix 15

7 Output image 1 16

8 Output image 2 16

9 Output image 3 17

1. Introduction
1.1 Background
Sign language recognition has evolved significantly over the years, driven by the need to bridge
communication gaps between sign language users and non-users. Early research focused on
traditional computer vision techniques, such as handcrafted feature extraction and shape
analysis. For instance, Starner and Pentland utilized Hidden Markov Models (HMMs) for
recognizing American Sign Language (ASL) by analyzing hand shapes and movements.

With the advancement of machine learning, techniques like Support Vector Machines (SVMs)
and neural networks became prominent. Ong and Bowden applied SVMs to British Sign
Language gestures, while Cooper et al. combined neural networks with HMMs for continuous
sign language recognition, highlighting the potential of machine learning in improving gesture
recognition accuracy.

The advent of deep learning marked a significant breakthrough in the field. Convolutional
Neural Networks (CNNs) were employed for static gesture recognition, and later work advanced
the field further by integrating CNNs with Long Short-Term Memory (LSTM) networks to
handle dynamic gestures, achieving enhanced recognition performance through spatiotemporal
modeling.

Building on these advancements, our project aims to develop a real-time sign language
detection system utilizing CNNs and LSTMs, integrating state-of-the-art techniques to achieve
accurate and efficient sign language recognition.

1.2 Aim and Objective of the Work

The primary aim of this project is to develop a real-time sign language detection system that
accurately recognizes and translates hand gestures into text or spoken language. This system is
intended to facilitate communication for the deaf and hard of hearing community, promoting
inclusivity and accessibility by bridging the communication gap between sign language users
and non-users.
 First, it seeks to design and implement a convolutional neural network (CNN)
model capable of accurately recognizing a wide variety of sign language gestures
from video inputs.
 This involves collecting and preprocessing a comprehensive dataset of sign
language gestures, training the CNN model, and fine-tuning its parameters to
achieve high accuracy.
 Additionally, the project aims to integrate this trained model into a real-time sign
detection system, ensuring it can process live video feeds and provide immediate
feedback.
 By achieving these objectives, the project aspires to make a meaningful impact on
the lives of individuals who rely on sign language as their primary means of
communication.

1.3 Literature Review

1. Traditional Computer Vision Techniques:


Initially, researchers relied on traditional computer vision techniques to recognize hand
gestures. These methods involved manually extracting features from images or videos of hand
gestures.

 Handcrafted Features: Early systems used features such as hand shapes, positions, and
movements. For example, Starner and Pentland (1995) used Hidden Markov Models
(HMMs) to recognize American Sign Language (ASL) by analyzing these features.

 Contour and Shape Analysis: Another approach was to analyze the contours and shapes
of hands. Triesch and Malsburg (1996) used a method called elastic graph matching to
recognize hand shapes in static images.

2. Machine Learning Approaches:


With the introduction of machine learning, more advanced techniques were developed to
improve the accuracy and robustness of sign language recognition systems.

 Support Vector Machines (SVMs): Researchers like Ong and Bowden (2004) used SVMs
to recognize British Sign Language (BSL) gestures. They extracted features from video
sequences and trained SVM classifiers to recognize different signs.

 Neural Networks: Cooper et al. (2011) combined neural networks with HMMs to
recognize continuous sign language. Neural networks were used to capture spatial features,
while HMMs handled the temporal aspect of gestures.

These machine learning approaches provided better performance compared to traditional
methods, but still faced challenges in handling complex gestures and variations in signing
styles.

3. Advances with Deep Learning:


 Convolutional Neural Networks (CNNs): CNNs have been particularly effective for image
and gesture recognition. Molchanov et al. (2015) used CNNs to recognize static hand gestures
from depth images, achieving high accuracy.

 CNN-LSTM Combination: For dynamic gestures, combining CNNs with Long Short-Term
Memory (LSTM) networks proved to be effective. Pigou et al. (2016) used CNNs to extract
spatial features from video frames and LSTMs to capture the temporal dependencies between
frames, allowing for accurate recognition of continuous gestures.

 3D CNNs: Recent advancements include the use of 3D CNNs, which can capture both spatial
and temporal information from video sequences. This approach, as shown by Jhuang et al.
(2013), improves the ability to recognize complex gestures over time.

Deep learning techniques have significantly enhanced the performance and robustness of sign
language recognition systems, making them more practical for real-world applications.

4. Real-Time Sign Language Recognition:
Developing systems that can recognize sign language gestures in real-time has been a key focus
in recent research. Real-time systems offer immediate feedback, making them useful for
practical applications and improving accessibility for sign language users.

 Kinect-Based Systems: Microsoft Kinect has been widely used for real-time gesture
recognition. Liang and Ouhyoung (2012) developed a system using Kinect that captured
depth and skeletal data to recognize ASL gestures in real-time.

 Mobile Applications: Mobile applications have also emerged, leveraging the processing
power of smartphones and cloud computing. Koller et al. (2016) developed a mobile app
that used deep learning models for real-time ASL recognition, providing a convenient tool
for users.

1.3.1 Long Short-Term Memory (LSTM) Architecture:

LSTM networks are used in combination with CNNs to handle the temporal aspect of sign
language gestures. While CNNs extract spatial features from individual frames, LSTMs process
these features over time to recognize sequences of gestures, enabling accurate recognition of
dynamic sign language.

1. Cell State and Hidden State:

 Cell State (ct): This is the LSTM's internal memory, which retains information over long
periods. It flows through the network relatively unchanged, except for minor adjustments by
the gates. This helps in preserving information over many time steps.

 Hidden State (ht): This represents the output of the LSTM unit at each time step and is used for
the current time step's prediction or passed to the next unit in the sequence. It encapsulates
short-term information.

2. Input (xt):

 This is the current input data point that is fed into the LSTM unit. It could be a word in a
sentence, a data point in a time series, or any other element in a sequence.

3. Previous States:

 Previous Cell State (ct-1): The cell state from the previous time step carries information
accumulated over past time steps.

 Previous Hidden State (ht-1): The hidden state from the previous time step captures
information relevant to the most recent input and serves as an intermediary for combining new
and past information.

4. Gates:

 Forget Gate: This gate decides which information from the previous cell state should be
discarded. It helps the model forget irrelevant information that is no longer needed for the
current context.

 Input Gate: This gate determines which new information from the current input and the
previous hidden state should be added to the cell state. It controls the extent to which new
information should influence the cell state.

 Output Gate: This gate controls the output of the hidden state. It decides which parts of the
cell state should be output as the new hidden state, effectively determining what information
should be carried forward to the next time step and what should be used for the current
prediction.

5. Cell State Update

 This part of the LSTM unit combines the old cell state with the new candidate values. The
combination ensures that the cell state is updated with relevant new information while
discarding outdated information.

6. Hidden State Update

 The new hidden state is calculated based on the updated cell state and the output gate's
modulation. This updated hidden state is what will be used for the current output and passed to
the next LSTM unit in the sequence.
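
For reference, the gates and state updates described above correspond to the standard LSTM
equations (a textbook formulation, not equations taken from the project code):

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   (cell state update)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            (output gate)
h_t = o_t \odot \tanh(c_t)                        (hidden state update)

where \sigma is the logistic sigmoid, \odot denotes element-wise multiplication, and
[h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input.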

Fig 1.1: Diagram Of LSTM Architecture

1.4 Problem Definition
Effective communication is a fundamental human need, and for individuals who are deaf or hard
of hearing, sign language serves as a crucial means of interaction. However, the majority of people
do not understand sign language, leading to significant communication barriers. These barriers can
impede access to essential services, limit social interactions, and create challenges in educational
and professional settings for the deaf and hard of hearing community.
The primary challenge addressed in this project is the development of a system that can accurately
and efficiently recognize and translate sign language gestures into text in real time. Current methods
for sign language recognition often fall short due to various reasons:

1. Complexity of Gestures: Sign language comprises a wide range of gestures, including both
static hand shapes and dynamic movements. Recognizing these gestures requires capturing
intricate spatial and temporal patterns.

2. Variability in Signing Styles: Different individuals may sign differently due to variations in
hand shape, size, signing speed, and style. A robust recognition system must accommodate
these variations to be effective across diverse users.

3. Real-Time Processing: For practical use, especially in real-time communication scenarios, the
system must process gestures quickly and provide immediate feedback. Achieving this requires
efficient algorithms and optimized hardware implementation.

4. Environmental Conditions: The system must perform reliably under varying environmental
conditions, such as changes in lighting, background clutter, and occlusions. Robustness to these
factors is crucial for the system’s practical deployment.

5. User-Friendly Interface: An intuitive and accessible user interface is essential to ensure that
the system can be easily used by individuals with varying levels of technical expertise.

By addressing these challenges, the project seeks to create a tool that enhances communication for
the deaf and hard of hearing community, thereby promoting inclusivity and accessibility. The
ultimate goal is to develop a system that not only accurately recognizes sign language gestures but
also operates efficiently in real-time, providing a seamless and user-friendly experience for all
users.

1.5 Implementation details

1.5.1 Proposed Methodology

1. Data Collection:

 Gesture Definition: The first step in data collection is to define the scope of the gestures
that the system will recognize. This project focuses on basic sign language gestures such as
“hello”, “thank you”, “yes”, and “no”. Clearly defining the gestures helps in organizing the
data collection process and ensures that all necessary gestures are included.
 Data Capturing: Data capturing involves recording videos or capturing images of
individuals performing the defined gestures. Multiple instances of each gesture are recorded
to ensure diversity and robustness. Videos capture the dynamic nature of the gestures, while
still images provide an additional data format for the model to learn from. A minimal capture
sketch is given below.
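
The following is an illustrative sketch of webcam data capture with OpenCV; the gesture list,
number of samples, and directory layout are assumptions for illustration, not the project's exact
settings.

# Illustrative sketch: capture a fixed number of frames per gesture from the
# webcam and save them into per-gesture folders.
import os
import cv2

GESTURES = ["hello", "thank_you", "yes", "no"]   # assumed gesture labels
FRAMES_PER_GESTURE = 30                          # assumed number of samples per gesture
DATA_DIR = "dataset"                             # assumed output directory

cap = cv2.VideoCapture(0)                        # default webcam
for gesture in GESTURES:
    out_dir = os.path.join(DATA_DIR, gesture)
    os.makedirs(out_dir, exist_ok=True)
    for i in range(FRAMES_PER_GESTURE):
        ret, frame = cap.read()
        if not ret:
            break
        # Show the live feed so the signer can see what is being recorded.
        cv2.imshow("Recording: " + gesture, frame)
        cv2.imwrite(os.path.join(out_dir, f"{gesture}_{i}.jpg"), frame)
        if cv2.waitKey(100) & 0xFF == ord("q"):  # small delay between captures
            break
cap.release()
cv2.destroyAllWindows()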

2. Data Preprocessing:

After capturing the data, preprocessing steps are applied to prepare the data for training.
Preprocessing ensures that the data is in a consistent format suitable for input into the
machine learning model. The preprocessing steps are listed below, followed by a short
illustrative sketch:

 Frame Extraction: If the data is in video format, the videos are converted into individual frames.
This involves extracting a sequence of images from each video file.

 Resizing: The extracted frames or captured images are resized to a consistent size, such as
64x64 pixels. This ensures uniformity and reduces computational complexity.

 Normalization: The pixel values of the images are normalized to a range of [0, 1] by dividing
by 255. Normalization helps in accelerating the training process and achieving better convergence.

 Labeling: Each image or sequence of images is labeled according to the corresponding sign
language gesture. This involves organizing the data into directories named after each gesture or
using annotation tools to assign labels to specific regions in the images.
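
A minimal sketch of the resizing, normalization, and labeling steps follows, assuming the
captured images are organized in per-gesture folders; the directory names and variable names
are illustrative assumptions, not the project's exact code.

# Illustrative preprocessing sketch: resize, normalize, and label images
# organized as dataset/<gesture>/<image>.jpg
import os
import cv2
import numpy as np

DATA_DIR = "dataset"        # assumed folder layout
IMG_SIZE = (64, 64)         # resize target mentioned in the text

images, labels = [], []
gesture_names = sorted(os.listdir(DATA_DIR))
for label_index, gesture in enumerate(gesture_names):
    gesture_dir = os.path.join(DATA_DIR, gesture)
    for file_name in os.listdir(gesture_dir):
        frame = cv2.imread(os.path.join(gesture_dir, file_name))
        if frame is None:
            continue
        frame = cv2.resize(frame, IMG_SIZE)          # resizing step
        frame = frame.astype("float32") / 255.0      # normalization to [0, 1]
        images.append(frame)
        labels.append(label_index)                   # label taken from folder name

X = np.array(images)
y = np.array(labels)
print(X.shape, y.shape)

Frame extraction from recorded videos would follow the same pattern, reading frames with
cv2.VideoCapture before resizing and normalizing them.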

3. Model Development:

The model architecture is designed to effectively capture both spatial and temporal features from
the input video frames. This involves using Long Short-Term Memory (LSTM) networks.

Long Short-Term Memory (LSTM) Networks: LSTMs are a type of recurrent neural network
(RNN) that are capable of learning long-term dependencies and temporal patterns. Since gestures
are dynamic and involve a sequence of movements, LSTMs are used to capture the temporal
relationships between consecutive frames. This helps in accurately recognizing gestures that
involve motion over time.

This architecture enables the model to use the features extracted from individual frames and to
understand the temporal context across a sequence of frames.

The LSTM neural network model built and trained for this project is designed to recognize
actions from sequences of keypoints extracted from videos. The architecture of the model
includes three LSTM layers followed by three Dense layers, as visualized in the model
architecture diagram (Fig 1.2). Below is a detailed description of the model architecture and the
training process; a code sketch consistent with this description is given at the end of this
subsection.

Model Architecture:

 Input Layer:

The model accepts input sequences with a shape of (30, 1662), where 30 represents the number of
time steps in the sequence and 1662 represents the number of features (keypoints) at each time step.

 First LSTM Layer:

Units: 64
Activation: ReLU
return_sequences=True ensures that the output is a sequence of the same length as the input.

 Second LSTM Layer:

Units: 128
Activation: ReLU
return_sequences=True for passing the sequence to the next LSTM layer.

 Third LSTM Layer:

Units: 64
Activation: ReLU
return_sequences=False as it outputs the final hidden state to the dense layers.

Dense Layers:

 First Dense Layer:

Units: 64
Activation: ReLU

 Second Dense Layer:

Units: 32
Activation: ReLU

 Output Dense Layer:

Units: Number of actions (one unit per gesture class)
Activation: Softmax for multi-class classification.

Model Compilation:

Optimizer: Adam
Loss Function: Categorical Crossentropy
Metrics: Categorical Accuracy

Training Process:

Epochs: 2000
Callbacks: TensorBoard callback for monitoring training progress.

 Compilation: The model is compiled with an appropriate loss function, optimizer, and
evaluation metric. The categorical cross-entropy loss function is used for multi-class
classification, and the Adam optimizer is chosen for its efficiency and adaptability.

 Training: The dataset is split into training and validation sets. The model is trained on the
training data, with the validation set used to monitor its performance and prevent overfitting.
Data augmentation techniques are applied during training to increase the model's robustness.

 Saving the Model: After training, the model is saved to disk for later use in the real-time
detection system.
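
A Keras sketch consistent with the architecture and training settings described above is given
below. Variable names such as actions and X_train, and the saved file name, are assumptions for
illustration; the fit and save calls are left commented out because the training arrays are not
defined in this sketch.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import TensorBoard

actions = np.array(["hello", "thank_you", "yes", "no"])   # assumed gesture labels

# Three LSTM layers (64, 128, 64 units) followed by three Dense layers,
# matching the description above; input shape is (30 time steps, 1662 keypoints).
model = Sequential([
    LSTM(64, return_sequences=True, activation="relu", input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(actions.shape[0], activation="softmax"),
])

model.compile(optimizer="Adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])

tb_callback = TensorBoard(log_dir="Logs")   # monitors training progress

# X_train: (num_sequences, 30, 1662) keypoint sequences; y_train: one-hot labels.
# model.fit(X_train, y_train, epochs=2000, callbacks=[tb_callback])
# model.save("action.h5")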

4. Real-time Detection System:

 The real-time detection system integrates a webcam to capture video feeds in real time. Each
captured frame undergoes preprocessing, including resizing and normalization, before being
fed into the trained model (a sketch of such a loop is given after these points).

 The system loads the trained model to predict the gesture for each frame. Post-processing steps
involve applying thresholds to filter out weak predictions and using sliding windows or majority
voting techniques to stabilize the output.

 The detected gesture labels are then overlaid on the video feed, providing real-time visualization
of the recognized gestures. The results are displayed in a window, allowing users to see the
detected gestures in real-time.
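
An illustrative sketch of such a detection loop follows. It uses MediaPipe Holistic keypoints to
build the (30, 1662) input described earlier rather than raw resized frames; the gesture labels,
model file name, and confidence threshold are assumptions, not values taken from the project.

import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

mp_holistic = mp.solutions.holistic
actions = np.array(["hello", "thank_you", "yes", "no"])   # assumed labels
model = load_model("action.h5")                           # assumed model path
THRESHOLD = 0.7                                           # assumed confidence threshold

def extract_keypoints(results):
    # 33 pose landmarks (x, y, z, visibility) + 468 face + 2 x 21 hand landmarks
    # (x, y, z) = 1662 values per frame, matching the model's input shape.
    pose = np.array([[lm.x, lm.y, lm.z, lm.visibility]
                     for lm in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33 * 4)
    face = np.array([[lm.x, lm.y, lm.z]
                     for lm in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(468 * 3)
    lh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21 * 3)
    rh = np.array([[lm.x, lm.y, lm.z]
                   for lm in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21 * 3)
    return np.concatenate([pose, face, lh, rh])

sequence = []
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        sequence = sequence[-30:]                 # sliding window of the last 30 frames
        if len(sequence) == 30:
            probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
            if probs[np.argmax(probs)] > THRESHOLD:   # filter out weak predictions
                cv2.putText(frame, actions[np.argmax(probs)], (10, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(10) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()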

5. Performance Evaluation:

 The performance of the sign language detection system is essential to ensure its accuracy and
reliability. The model is assessed using metrics such as accuracy, precision, recall, and F1-
score.

 Testing is conducted in various lighting conditions and backgrounds to ensure the system's
robustness. Additionally, user testing involves members of the deaf and hard-of-hearing
community to provide feedback on the system's effectiveness and usability.

Fig 1.2: Diagram Of Model Architecture

2. Technology Review

1. Interactive Environments: Google Colab (TPU), Jupyter Notebook

2. Core Language: Python: A general-purpose, high-level language known for its readability,
versatility, and extensive libraries.

3. Key Libraries:

NumPy: Foundation for numerical computing in Python, providing efficient array operations
and mathematical functions.

Pandas: High-performance data analysis and manipulation tool, offering data structures like
Data Frames and Series for handling large datasets.

cv2 (OpenCV): Open-source library for real-time computer vision, image processing, and video
analysis.

Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations
in Python.

Scikit-learn (sklearn): Versatile library for various machine learning algorithms, including
classification, regression, clustering, dimensionality reduction, and model selection.

scikit-image (skimage): Collection of algorithms for image processing, including filtering,
feature extraction, segmentation, and geometric transformations.

Tkinter and PIL Integration: Tkinter, a standard GUI toolkit in Python, coupled with the Python
Imaging Library (PIL), facilitated the development of a user-friendly interface for manual mask
creation.

4. Frameworks: TensorFlow, Keras

3. System Diagram
3.1 Design Model

Fig 3.1: Design model

3.2 Use Case Diagram:

A use case diagram is a visual representation of how a user interacts with a system.

Key Actors:

1. User:
➔ Initiates sign language detection by providing image or video input.
➔ Interprets and uses the detection results.

2. System:
➔ Performs sign language detection using the LSTM model.
➔ Captures the gesture and extracts it from the image or video.
➔ After extraction, matches the extracted gesture against the gesture database.
➔ Presents the recognized sign to the user.

Fig 3.2: Use case diagram

3.3 Sequence Diagram:

A sequence diagram is a UML diagram that illustrates the interactions and messages exchanged
between different components or objects in a system over time. It represents the dynamic
behavior of a system, showing the sequence of interactions between objects and the order in
which these interactions occur, helping to visualize the flow of control in a specific scenario or
use case.

Fig 3.3: Sequence diagram

4. Evaluation:
The evaluation of the model for real-time detection of Indian Sign Language using MediaPipe Holistic
Keypoints and an LSTM-based model involves the following metrics:

Accuracy: Measures the proportion of correctly predicted gestures out of the total predictions.
In our case, the model achieved an accuracy of 1.0 (100%), indicating perfect prediction
performance.

Confusion Matrix: Provides a detailed breakdown of true positives, true negatives, false
positives, and false negatives for each class. This matrix is essential for understanding the
performance across different sign language gestures. Here, we also performed manual testing
of the model; its confusion matrix is as follows:

Fig 4.1: Confusion Matrix

Validation Loss: Measures the error on the validation set, helping to understand the model's
ability to generalize to unseen data. The observed validation loss is very low, indicating a good
fit of the model on the validation data.
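
A minimal sketch of how these metrics can be computed with scikit-learn is shown below; the
trained model and the held-out test arrays (X_test, y_test) are assumed to be available and are
not defined here.

import numpy as np
from sklearn.metrics import accuracy_score, multilabel_confusion_matrix

# Assumed to exist: the trained `model`, test sequences X_test of shape
# (num_samples, 30, 1662), and one-hot encoded labels y_test.
y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(model.predict(X_test), axis=1)

print("Accuracy:", accuracy_score(y_true, y_pred))
# One 2x2 matrix (TN, FP / FN, TP) per gesture class:
print(multilabel_confusion_matrix(y_true, y_pred))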

5. Outputs and Run-time Testing:
Here, from the figure below, we can clearly see that when a person performs the hand sign for
"help", the model correctly predicts it.

Fig 5.1: Output 1


Here, from the figure below, we can clearly see that when a person performs the hand sign for
"cat", the model correctly predicts it.

Fig 5.2: Output 2

Here, from the figure below, we can clearly see that when a person performs the hand sign for
"Food", the model correctly predicts it.

Fig 5.3: Output 3

6. CONCLUSION:
This project presents a highly efficient real-time sign language detection system using a
combination of Long Short-Term Memory (LSTM) networks and MediaPipe Holistic key point
detection. The model, specifically designed for recognizing Indian Sign Language, achieved a
perfect accuracy of 1.0, signifying its effectiveness in classifying gestures with high precision.

Through the use of LSTM networks, the system successfully captures both spatial and temporal
aspects of hand gestures, allowing for dynamic gesture recognition. The integration of real-time
processing and visual outputs, including key points, class labels, and confidence scores, adds to
its practicality and usability.

This system represents a significant leap forward in enhancing communication accessibility for
the hearing-impaired community. The robust evaluation metrics confirm the system's reliability,
making it a valuable tool for real-world applications. Further improvements can make the model
more versatile, such as adding more gestures and improving environmental adaptability,
potentially revolutionizing the field of sign language recognition and communication.

7. Bibliography:

1. Ong, E. J., & Bowden, R. (2004). A boosted classifier tree for hand shape detection. Proceedings
of the 6th International Conference on Automatic Face and Gesture Recognition.
2. Cooper, H., Bowden, R., Saeed, A., & Othman, A. (2011). Sign language recognition using
Sub-Units. Journal of Machine Learning Research, 13(1), 2205–2231.
3. Pigou, L., Dieleman, S., Kindermans, P. J., & Schrauwen, B. (2016). Sign language recognition
using convolutional neural networks. European Conference on Computer Vision.
4. Jhuang, H., Gall, J., Zuffi, S., Schmid, C., & Black, M. J. (2013). Towards understanding action
recognition. IEEE International Conference on Computer Vision (ICCV).
5. Liang, R.-H., & Ouhyoung, M. (2012). A real-time continuous gesture recognition system for
sign language. Proceedings of the IEEE International Conference on Automatic Face and
Gesture Recognition.
6. Koller, O., Ney, H., & Bowden, R. (2016). Deep learning of mouth shapes for sign language
recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
7. TensorFlow Team. (2023). TensorFlow: Large-scale machine learning on heterogeneous
systems. Retrieved from [Link]
8. MediaPipe Documentation. (2023). Real-time machine learning framework by Google.
Retrieved from [Link]
9. OpenCV Team. (2023). OpenCV library for computer vision. Retrieved from [Link]

